Open to any feedback on this here or over email (jaan@onefact.org)!
I quit academia to start a non-profit focused on using open-source tools to analyze public hospital price transparency data.
We are now building similar dashboards for every hospital in the country, and we need all the help we can get. If you would be interested in using the latest geospatial mapping tools, databases (DuckDB), and large language models to make sense of this massive amount of data, please reach out.
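To give a flavor of the DuckDB side, here is a minimal sketch of pulling summary stats out of one hospital's standard-charges file (the file name and column names are hypothetical; real machine-readable files vary wildly in schema and format):

    import duckdb

    # Hypothetical standard-charges CSV; real hospital files differ in
    # schema, encoding, and even file format (CSV, JSON, XML).
    con = duckdb.connect()
    rows = con.execute("""
        SELECT code, description,
               median(negotiated_rate) AS median_rate,
               count(*) AS n_plans
        FROM read_csv_auto('standard_charges.csv')
        GROUP BY code, description
        ORDER BY median_rate DESC
        LIMIT 10
    """).fetchall()
    for code, desc, rate, n in rows:
        print(code, desc, f"median ${rate:,.2f}", f"({n} plans)")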
Bitly was expensive, so this is a fun little use case for redirect rules for subdomains. It was coming in increasingly handy as we scaled our nonprofit, so I wanted to share and open-source this!
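The repo has the real implementation, but the core idea is tiny. Here is a hedged sketch using only the Python standard library (the subdomain mapping and port are made up, and our open-sourced version may work differently):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical mapping: short subdomain -> destination URL.
    REDIRECTS = {
        "go.example.org": "https://example.org/some/long/destination",
        "docs.example.org": "https://example.org/another/destination",
    }

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            host = self.headers.get("Host", "").split(":")[0]
            target = REDIRECTS.get(host)
            if target:
                self.send_response(301)  # permanent redirect
                self.send_header("Location", target)
            else:
                self.send_response(404)
            self.end_headers()

    HTTPServer(("", 8080), RedirectHandler).serve_forever()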
Yes! We are working on this and integrating with the OMOP common data model, so that we can link the health outcomes in our data partners' clinical repositories to the cost of care. For example, we work with the NIH All of Us study for outcome data (joinallofus.org -- I signed up both to contribute to this science and to get my whole genome sequenced for free!)
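To make the linkage concrete, here is a rough sketch of the kind of query this enables, with DuckDB over an OMOP-shaped database (procedure_occurrence and procedure_source_value follow OMOP v5.x naming, but the hospital_prices table, its columns, and the database file are hypothetical):

    import duckdb

    con = duckdb.connect("omop.duckdb")  # hypothetical database file
    # Join procedures recorded in the OMOP CDM to negotiated prices
    # scraped from hospital price transparency files (hypothetical table).
    rows = con.execute("""
        SELECT po.procedure_concept_id,
               count(*)                AS n_procedures,
               avg(hp.negotiated_rate) AS avg_negotiated_rate
        FROM procedure_occurrence po
        JOIN hospital_prices hp
          ON hp.cpt_code = po.procedure_source_value
        GROUP BY po.procedure_concept_id
        ORDER BY n_procedures DESC
    """).fetchall()
    print(rows[:10])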
If you look at the files, many of them are not compliant, so we need to figure out what each line item corresponds to: a CPT code? HCPCS code? ICD code? etc. :)
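A surprising amount of this can be triaged with format heuristics before any NLP. A rough sketch (the patterns are simplified and ignore edge cases like CPT Category II/III codes and ICD-10-PCS):

    import re

    # Simplified format heuristics; real billing codes have many more
    # edge cases. Note that some formats overlap (e.g. HCPCS E-codes
    # vs. ICD-10 codes without a dot), so match order matters.
    PATTERNS = [
        ("CPT",    re.compile(r"^\d{5}$")),                # e.g. 99213
        ("HCPCS",  re.compile(r"^[A-V]\d{4}$")),           # e.g. J1100
        ("ICD-10", re.compile(r"^[A-Z]\d{2}\.?\w{0,4}$")), # e.g. E11.9
    ]

    def guess_code_type(line_item: str) -> str:
        token = line_item.strip().upper()
        for name, pattern in PATTERNS:
            if pattern.match(token):
                return name
        return "unknown"

    for item in ["99213", "J1100", "E11.9", "room charge"]:
        print(item, "->", guess_code_type(item))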
Here's an example NLP tool I helped build that we're using to do this: https://arxiv.org/abs/1904.05342 -- it's now in several pipelines for data annotation and crowdsourcing.
This was really meant as a critique of late-stage capitalism, where companies like Headspace run ads in New York encouraging us to “optimize” our mindfulness practice so we can train harder…
This is what we need for the large language models I am training for health care use cases.
For example, constraining LLM output is currently done by masking logits during decoding, and having this Rust-based library would enable novel ways to train LLMs (a minimal sketch of what masking looks like follows the links below).
Relevant work:
https://github.com/epfl-dlab/transformers-CFG
https://neurips.cc/virtual/2023/poster/70782
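To spell out what masking means here: at each decoding step you add -inf to the logits of tokens the grammar disallows, so they get zero probability. A minimal NumPy sketch (the logits are faked with random numbers; a real implementation hooks a grammar into the model's decoding loop, as in the links above):

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size = 16
    logits = rng.normal(size=vocab_size)  # stand-in for model output

    # Suppose the grammar says only tokens 3, 7, and 9 are valid here.
    allowed = np.array([3, 7, 9])
    mask = np.full(vocab_size, -np.inf)
    mask[allowed] = 0.0

    # Masking: -inf on disallowed logits drives their probability to 0.
    constrained = logits + mask
    probs = np.exp(constrained - constrained.max())
    probs /= probs.sum()

    print("next token:", int(np.argmax(probs)))  # always in {3, 7, 9}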