I was using embeddings to group articles by topic, and hit a specific issue. Say...

eamag · on Nov 1, 2024

I agree. I tried several methods in my pet project [1], and all of them have their pros and cons. Creating the topics first and then having an LLM predict them seems to work best.

[1] https://eamag.me/2024/Automated-Paper-Classification
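For anyone wondering what "topics first, then predict" might look like, here is a minimal sketch, not the code from the linked project. It assumes a fixed topic list and an `llm` callable that takes a prompt string and returns text. `build_prompt`, `classify`, and the `fallback` label are hypothetical names made up for illustration, so swap in whatever client you actually use.

```python
from typing import Callable, List


def build_prompt(article: str, topics: List[str]) -> str:
    """Build a prompt that restricts the model to a predefined topic list."""
    topic_lines = "\n".join(f"- {t}" for t in topics)
    return (
        "Pick the single best topic for the article from this list.\n"
        f"{topic_lines}\n\n"
        f"Article:\n{article}\n\n"
        "Answer with the topic name only."
    )


def classify(
    article: str,
    topics: List[str],
    llm: Callable[[str], str],  # assumption: any function mapping prompt -> reply
    fallback: str = "other",    # hypothetical label for replies outside the list
) -> str:
    """Ask the model for a topic and accept the reply only if it is in the list."""
    answer = llm(build_prompt(article, topics)).strip().lower()
    for topic in topics:
        if topic.lower() == answer:
            return topic
    return fallback
```

Checking the reply against the list means a hallucinated or free-form answer lands in the fallback bucket instead of silently creating a new topic.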

coredog64 · on Nov 1, 2024

Allegedly, the new hotness in RAG is exactly that. Use a smaller LLM to summarize the article and include that summary alongside the article when generating the embedding.

It potentially solves your issue, and it is also handy when you have to chunk a larger document and would otherwise lose context by computing the embedding on the chunk alone.
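A rough sketch of the idea, under loud assumptions: `toy_summarize` and `toy_embed` are stand-ins I made up so the example runs without any model (the first just truncates the text, the second is a hashed bag-of-words vector). In practice you would pass in a call to your small LLM and your real embedding model. The part that matters is that the summary is prepended to every chunk before embedding.

```python
import hashlib
import math
from typing import Callable, List, Tuple


def chunk_text(text: str, max_words: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_summarize(text: str, max_words: int = 12) -> str:
    """Placeholder summarizer: keeps the first few words. Replace with an LLM call."""
    return " ".join(text.split()[:max_words])


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Placeholder embedder: L2-normalized hashed bag of words. Replace with a real model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def embed_chunks_with_summary(
    document: str,
    summarize: Callable[[str], str] = toy_summarize,
    embed: Callable[[str], List[float]] = toy_embed,
    max_words: int = 50,
) -> List[Tuple[str, List[float]]]:
    """Embed each chunk together with a document-level summary.

    Returns (original_chunk, embedding) pairs; the summary only affects the
    embedding, so the stored chunk text stays unchanged.
    """
    summary = summarize(document)
    results = []
    for chunk in chunk_text(document, max_words):
        contextualized = f"Document summary: {summary}\n\nChunk: {chunk}"
        results.append((chunk, embed(contextualized)))
    return results
```

The summary is computed once per document and reused for every chunk, so the extra cost is one summarization call plus slightly longer inputs to the embedder.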