Hacker News | jaan's comments

Does it support constrained generation during training?

This is what we need for the large language models I am training for health care use cases.

For example, LLM output is currently constrained by masking, and having this Rust-based library would enable novel ways to train LLMs.

Relevant papers:

https://github.com/epfl-dlab/transformers-CFG

https://neurips.cc/virtual/2023/poster/70782
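
For readers unfamiliar with the masking approach mentioned above, here is a minimal sketch in plain Python. It is not code from the library under discussion or from the linked projects; the function name `masked_softmax` and the toy logits are hypothetical, and a real setup would get the allowed token ids from a grammar or schema at each decoding step.

```python
import math


def masked_softmax(logits, allowed_ids):
    """Turn logits into probabilities, giving zero mass to disallowed tokens.

    `allowed_ids` stands in for whatever the constraint (grammar, schema)
    permits at this decoding step; here it is just a hypothetical list.
    """
    allowed = {i for i in allowed_ids if 0 <= i < len(logits)}
    if not allowed:
        raise ValueError("constraint allows no token at this step")

    # Disallowed tokens get -inf, so exp() maps them to exactly 0.0.
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]

    # Subtract the max before exponentiating for numerical stability.
    peak = max(masked)
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary of four tokens; only tokens 1 and 2 are permitted.
probs = masked_softmax([2.0, 1.0, 0.5, 3.0], allowed_ids=[1, 2])
```

The mask is applied before normalization, so the remaining probability mass is redistributed over the permitted tokens only.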


It's definitely a very exciting direction, which we have not explored at all!


Also trying to figure out how to use this with Vite! I added a question here: https://github.com/observablehq/framework/discussions/855


Open to any feedback on this here or over email (jaan@onefact.org)!

I quit academia to start a non-profit focused on using open source to analyze the public hospital price transparency data.

We are now making similar dashboards for every hospital in the country and need all the help we can get. Reach out if you would be interested in using the latest geospatial mapping tools, databases (DuckDB), and large language models to make sense of this massive amount of data.

Through a data bounty (https://www.dolthub.com/repositories/onefact/paylesshealth) we collected 4000+ hospital price sheets and made them public here: https://data.payless.health/#hospital_price_transparency/. This was on HN previously.

Grateful for all your support so far!!


Bitly was expensive, so this is a fun little use case for redirect rules for subdomains. It has become increasingly handy as we scale our nonprofit, so we open sourced it and wanted to share!


Wow. Would love to help you build this and pay you for an Apache 2.0 release of your work with our grant money - emailing you now :)


Yes! We are working on this and integrating with the OMOP common data model, to be able to link the health outcomes in our data partners' clinical repositories to the cost of care. For example, we work with the NIH All of Us study for outcome data (joinallofus.org -- I signed up both to contribute to this science and to get my whole genome sequenced free!)


You're right! We're linking this data to the negotiated rates :) and building the search engine for both of these at payless.health.


Thank you!


Nailed it! :)


If you look at the files, many of them are not compliant, so we need to figure out what each line item corresponds to: a CPT code? HCPCS code? ICD code? etc :)

Here's an example of an NLP tool I helped build that we're using to do this: https://arxiv.org/abs/1904.05342 -- it's in several pipelines now for data annotation and crowdsourcing.
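
To illustrate why the line items above are hard to label, here is a rough sketch that lists which code systems a raw string could belong to based on its shape alone. This is a plain regex heuristic, not the NLP tool linked above; the patterns are simplified assumptions about the published code formats, and the function name `candidate_code_systems` is hypothetical.

```python
import re

# Simplified shape patterns (assumptions, not complete format definitions):
#   CPT: five digits, or four digits followed by F or T.
#   HCPCS Level II: one letter A-V followed by four digits.
#   ICD-10-CM: letter, digit, alphanumeric, then up to four more
#   alphanumerics with an optional dot (often missing in price sheets).
PATTERNS = {
    "CPT": re.compile(r"^\d{4}[0-9FT]$"),
    "HCPCS": re.compile(r"^[A-V]\d{4}$"),
    "ICD-10-CM": re.compile(r"^[A-Z]\d[0-9A-Z](\.?[0-9A-Z]{1,4})?$"),
}


def candidate_code_systems(raw):
    """Return every code system whose shape matches the raw string."""
    code = raw.strip().upper()
    return [name for name, pattern in PATTERNS.items() if pattern.match(code)]


for value in ["99213", "J1100", "E11.9", "ROOM AND BOARD"]:
    print(value, candidate_code_systems(value))
```

A string like `J1100` fits both the HCPCS and the dotless ICD-10-CM shape, so shape alone cannot settle it; the surrounding description text is needed, which is where a model helps.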


I got a friend to record guided meditations geared at techbros: https://dystopia.guide/

Really meant as a critique of late-stage capitalism, where companies like Headspace have ads in New York encouraging us to “optimize” our mindfulness practice so we can train harder…

