Open to any feedback on this here or over email (jaan@onefact.org)!
I quit academia to start a non-profit focused on using open-source tools to analyze public hospital price transparency data.
We are now building similar dashboards for every hospital in the country, and we need all the help we can get. If you would be interested in using the latest geospatial mapping tools, databases (DuckDB), and large language models to make sense of this massive amount of data, please reach out.
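To give a flavor of the DuckDB side, here is a minimal sketch of pulling summary stats out of one hospital's standard-charges file (the file name and column names are hypothetical; real machine-readable files vary wildly in schema and format):

    import duckdb

    # Hypothetical standard-charges CSV; real hospital files differ in
    # schema, encoding, and even file format (CSV, JSON, XML).
    con = duckdb.connect()
    rows = con.execute("""
        SELECT code, description,
               median(negotiated_rate) AS median_rate,
               count(*) AS n_plans
        FROM read_csv_auto('standard_charges.csv')
        GROUP BY code, description
        ORDER BY median_rate DESC
        LIMIT 10
    """).fetchall()
    for code, desc, rate, n in rows:
        print(code, desc, f"median ${rate:,.2f}", f"({n} plans)")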
Bitly was expensive, so this is a fun little use case for redirect rules for subdomains. It was coming in increasingly handy as we scaled our nonprofit, so I wanted to share and open-source this!
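The repo has the real implementation, but the core idea is tiny. Here is a hedged sketch using only the Python standard library (the subdomain mapping and port are made up, and our open-sourced version may work differently):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical mapping: short subdomain -> destination URL.
    REDIRECTS = {
        "go.example.org": "https://example.org/some/long/destination",
        "docs.example.org": "https://example.org/another/destination",
    }

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            host = self.headers.get("Host", "").split(":")[0]
            target = REDIRECTS.get(host)
            if target:
                self.send_response(301)  # permanent redirect
                self.send_header("Location", target)
            else:
                self.send_response(404)
            self.end_headers()

    HTTPServer(("", 8080), RedirectHandler).serve_forever()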
Yes! We are working on this and integrating with the OMOP common data model, so that we can link the health outcomes in our data partners' clinical repositories to the cost of care. For example, we work with the NIH All of Us study for outcome data (joinallofus.org -- I signed up both to contribute to this science and to get my whole genome sequenced for free!)
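To make the linkage concrete, here is a rough sketch of the kind of query this enables, with DuckDB over an OMOP-shaped database (procedure_occurrence and procedure_source_value follow OMOP v5.x naming, but the hospital_prices table, its columns, and the database file are hypothetical):

    import duckdb

    con = duckdb.connect("omop.duckdb")  # hypothetical database file
    # Join procedures recorded in the OMOP CDM to negotiated prices
    # scraped from hospital price transparency files (hypothetical table).
    rows = con.execute("""
        SELECT po.procedure_concept_id,
               count(*)                AS n_procedures,
               avg(hp.negotiated_rate) AS avg_negotiated_rate
        FROM procedure_occurrence po
        JOIN hospital_prices hp
          ON hp.cpt_code = po.procedure_source_value
        GROUP BY po.procedure_concept_id
        ORDER BY n_procedures DESC
    """).fetchall()
    print(rows[:10])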
If you look at the files, many of them are not compliant, so we need to figure out what each line item corresponds to: a CPT code? HCPCS code? ICD code? etc. :)
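A surprising amount of this can be triaged with format heuristics before any NLP. A rough sketch (the patterns are simplified and ignore edge cases like CPT Category II/III codes and ICD-10-PCS):

    import re

    # Simplified format heuristics; real billing codes have many more
    # edge cases. Note that some formats overlap (e.g. HCPCS E-codes
    # vs. ICD-10 codes without a dot), so match order matters.
    PATTERNS = [
        ("CPT",    re.compile(r"^\d{5}$")),                # e.g. 99213
        ("HCPCS",  re.compile(r"^[A-V]\d{4}$")),           # e.g. J1100
        ("ICD-10", re.compile(r"^[A-Z]\d{2}\.?\w{0,4}$")), # e.g. E11.9
    ]

    def guess_code_type(line_item: str) -> str:
        token = line_item.strip().upper()
        for name, pattern in PATTERNS:
            if pattern.match(token):
                return name
        return "unknown"

    for item in ["99213", "J1100", "E11.9", "room charge"]:
        print(item, "->", guess_code_type(item))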
Here's an example NLP tool I helped build that we're using to do this: https://arxiv.org/abs/1904.05342 -- it's now in several pipelines for data annotation and crowdsourcing.
This was really meant as a critique of late-stage capitalism, where companies like Headspace run ads in New York encouraging us to “optimize” our mindfulness practice so we can train harder…
This is what we need for the large language models I am training for health care use cases.
For example, constraining LLM output is currently done by masking logits during decoding, and having this Rust-based library would enable novel ways to train LLMs (a minimal sketch of what masking looks like follows the links below).
Relevant work:
https://github.com/epfl-dlab/transformers-CFG
https://neurips.cc/virtual/2023/poster/70782
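To spell out what masking means here: at each decoding step you add -inf to the logits of tokens the grammar disallows, so they get zero probability. A minimal NumPy sketch (the logits are faked with random numbers; a real implementation hooks a grammar into the model's decoding loop, as in the links above):

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size = 16
    logits = rng.normal(size=vocab_size)  # stand-in for model output

    # Suppose the grammar says only tokens 3, 7, and 9 are valid here.
    allowed = np.array([3, 7, 9])
    mask = np.full(vocab_size, -np.inf)
    mask[allowed] = 0.0

    # Masking: -inf on disallowed logits drives their probability to 0.
    constrained = logits + mask
    probs = np.exp(constrained - constrained.max())
    probs /= probs.sum()

    print("next token:", int(np.argmax(probs)))  # always in {3, 7, 9}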