
We ran some qualitative tests and there was a clear quality difference. In fact, public benchmarks suggest that trend generally holds: https://archersama.github.io/coir/

That being said, our goal was to make the library modular so you can easily add support for whatever embeddings you want. We definitely encourage experimenting for your use case: even in our tests, we found that trends which hold true in research benchmarks don't always translate to custom use-cases.
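
For illustration, a pluggable interface in that spirit might look something like the sketch below. This isn't our exact code; the names are hypothetical, and the OpenAI backend is just one example of a provider you could drop in:

    from abc import ABC, abstractmethod

    class EmbeddingProvider(ABC):
        # Any backend just has to turn a batch of strings into vectors.
        @abstractmethod
        def embed(self, texts: list[str]) -> list[list[float]]:
            ...

    class OpenAIEmbeddings(EmbeddingProvider):
        # One possible backend; assumes the `openai` package (>=1.0) is installed.
        def __init__(self, model: str = "text-embedding-3-small"):
            from openai import OpenAI
            self.client = OpenAI()
            self.model = model

        def embed(self, texts: list[str]) -> list[list[float]]:
            resp = self.client.embeddings.create(model=self.model, input=texts)
            return [item.embedding for item in resp.data]

The retriever only ever talks to embed(), so swapping in a different model is a one-line change.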



> we found that trends which hold true in research benchmarks don't always translate to custom use-cases.

Exactly why I asked! If you don't mind a follow-up question, how were you evaluating embedding models: was it mostly just vibes on your own repos, or something more rigorous? Asking because I'm working on something similar, and based on what you've shipped, I think I could learn a lot from you!


Happy to help!

At the beginning, we started with qualitative "vibe" checks, where we could iterate quickly and the delta in quality was significant enough that we could clearly see which model was performing better.

Once we stopped trusting our ability to discern differences, we actually bit the bullet and made a small eval benchmark set (~20 queries across 3 repos of different sizes) and then used that to guide algorithmic development.
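
The harness itself doesn't need to be fancy. A minimal sketch of the kind of thing we mean (simplified, hypothetical names; recall@k over hand-labeled gold files):

    # Each entry pairs a query with the set of files a correct
    # retrieval should surface (paths here are made up).
    EVAL_SET = [
        ("where is the auth middleware configured?", {"src/middleware/auth.py"}),
        # ... ~20 of these, spread across repos of different sizes
    ]

    def recall_at_k(retrieve, k=10):
        # retrieve(query) -> ranked list of file paths (assumed signature)
        hits = 0
        for query, gold in EVAL_SET:
            ranked = retrieve(query)
            if any(path in gold for path in ranked[:k]):
                hits += 1
        return hits / len(EVAL_SET)

Even a crude number like that was enough to keep algorithmic changes honest once eyeballing stopped working.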


Thank you, I appreciate the details.



