Allegedly, the new hotness in RAG is exactly that. Use a smaller LLM to summarize the article and include that summary alongside the article when generating the embedding.

That potentially solves your issue, and it is also handy when you have to chunk a larger document, since an embedding calculated on a chunk alone loses the context of the rest of the document.
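
For what it's worth, here is a rough sketch of how I'd picture the chunking case, not a reference implementation. The names `chunk_text`, `build_embedding_inputs` and `first_sentence_summary` are made up for illustration, the "summarizer" is just a stand-in for a call to a small LLM, and the word-count chunking is an assumption. The returned strings are what you would then pass to whatever embedding model you use.

```python
from typing import Callable, List


def chunk_text(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_embedding_inputs(
    document: str,
    summarize: Callable[[str], str],
    max_words: int = 200,
) -> List[str]:
    """Prefix every chunk with a summary of the whole document.

    Each returned string is meant to be embedded in place of the bare chunk,
    so the vector carries some document-level context.
    """
    summary = summarize(document)  # summarize once, reuse for every chunk
    return [
        f"Summary: {summary}\n\nChunk: {chunk}"
        for chunk in chunk_text(document, max_words)
    ]


def first_sentence_summary(text: str) -> str:
    """Hypothetical placeholder for a small-LLM call: returns the first sentence."""
    return text.split(".")[0].strip() + "."


doc = (
    "Postgres supports vector search through an extension. "
    "It stores embeddings in a column. "
    "Queries rank rows by distance."
)
inputs = build_embedding_inputs(doc, first_sentence_summary, max_words=8)
for text in inputs:
    print(text)
    print("---")
```

Store the original chunk next to the vector, so that retrieval returns the chunk itself rather than the summary-prefixed text.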