
Wow, actually a good point I haven't seen anyone make.

Taking raw embeddings and storing them in a vector database would be like taking raw n-grams of your text and putting them into a database for search.

Storing documents makes much more sense.
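A minimal sketch of the "store documents, not just vectors" idea. It assumes toy hand-written 3-dimensional embeddings in place of a real embedding model and uses brute-force cosine similarity. `Record` and `DocumentStore` are hypothetical names for illustration, not any library's API.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    """One stored item: the source text kept right next to its embedding."""
    doc_id: str
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class DocumentStore:
    """Toy in-memory store (hypothetical): search returns documents, not bare vectors."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def add(self, doc_id: str, text: str, embedding: list[float]) -> None:
        self.records.append(Record(doc_id, text, embedding))

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, str, float]]:
        # Brute-force scan: score every record, return the top-k with their text.
        scored = [(r.doc_id, r.text, cosine(query, r.embedding)) for r in self.records]
        scored.sort(key=lambda t: t[2], reverse=True)
        return scored[:k]


# Usage with made-up embeddings; a real system would get these from a model.
store = DocumentStore()
store.add("a", "pgvector is a Postgres extension", [1.0, 0.0, 0.0])
store.add("b", "FAISS is a similarity search library", [0.0, 1.0, 0.0])
results = store.search([0.9, 0.1, 0.0], k=1)
```

The point is that the search result already carries the text a caller actually wants, so there is no second lookup from a vector ID back to the source document.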



Been using pgvector for a while, and to me it was kind of obvious that the source document and its embeddings are fundamentally linked, so we always stored them "together". Basically anyone doing embeddings at scale is doing something similar to what Pgai Vectorizer does, and it is certainly a nice abstraction.


I used FAISS as it also allowed me to trivially store them together.

Idk how well it scales, though; it's just doing its job at my hobby-project scale.

For my few hundred thousand embeddings, I must say the performance was satisfactory.


This is how most modern vector dbs work, you usually can store much more than just the raw embeddings (full text, metadata fields, secondary/named vectors, geospatial data, relational fields, etc).
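To illustrate why keeping metadata next to the vectors is useful, here is a minimal sketch of a filtered search over rows that hold full text, metadata fields, and an embedding together. It is a brute-force pre-filter (apply the metadata filter, then rank by cosine similarity); real vector databases may filter before or after the approximate search. The rows, field names, and `filtered_search` are hypothetical, with toy 2-dimensional vectors.

```python
import math
from typing import Any


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical rows: full text, metadata fields and the vector live in one record.
rows: list[dict[str, Any]] = [
    {"id": 1, "text": "Intro to pgvector", "lang": "en", "year": 2023, "vec": [1.0, 0.0]},
    {"id": 2, "text": "pgvector tutorial (German)", "lang": "de", "year": 2023, "vec": [0.9, 0.1]},
    {"id": 3, "text": "FAISS benchmarks", "lang": "en", "year": 2021, "vec": [0.0, 1.0]},
]


def filtered_search(rows: list[dict[str, Any]], query: list[float], limit: int = 2,
                    **filters: Any) -> list[tuple[int, str]]:
    """Keep rows whose fields equal every filter value, then rank them by cosine similarity."""
    candidates = [r for r in rows if all(r.get(f) == v for f, v in filters.items())]
    candidates.sort(key=lambda r: cosine(query, r["vec"]), reverse=True)
    return [(r["id"], r["text"]) for r in candidates[:limit]]


english_hits = filtered_search(rows, [1.0, 0.0], limit=2, lang="en")
all_hits = filtered_search(rows, [1.0, 0.0], limit=2)
```

With `lang="en"` the German row is never considered, even though its vector is closer to the query than row 3; without the filter it ranks second.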



