That's just basic collaborative filtering. Drdaeman is talking about using the actual content of the songs in your vector embeddings.

This is not really important if you have a lot of user behavior data and/or playlists for each song. But if you have a niche song that few people have listened to, recommendations based on collaborative filtering aren't going to be good.
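To make the niche-song problem concrete, here is a minimal sketch of item-item collaborative filtering with NumPy. The `plays` matrix and the `item_similarity` helper are made up for illustration, and the last song has zero listeners as an extreme case of "few people have listened to it".

```python
import numpy as np

def item_similarity(interactions):
    """Cosine similarity between songs, using only who played what.

    interactions: (n_users, n_songs) matrix, 1.0 if the user played the song.
    """
    norms = np.linalg.norm(interactions, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for unplayed songs
    unit = interactions / norms
    return unit.T @ unit

# Hypothetical toy data: 4 users x 4 songs. Song 3 has no listeners.
plays = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
], dtype=float)

sim = item_similarity(plays)
print(sim[0])  # song 0 has neighbors derived from shared listeners
print(sim[3])  # song 3 has no behavioral signal at all
```

With no play data, every similarity for the last song is zero, so a purely behavioral model has nothing to rank it by. A content-based embedding would still give it a usable vector.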

Real semantic embeddings (which can then be part of the input to the recommendation model) can be trained using self-supervision, e.g. an autoencoder or a separate "next audio token" predicting transformer.
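As a rough sketch of the autoencoder option, the snippet below trains a tiny linear autoencoder with plain gradient descent in NumPy. Everything here is an assumption for illustration: the random `features` matrix stands in for real per-song audio features (e.g. averaged spectrogram frames), and `train_autoencoder` is a hypothetical helper, not anyone's production setup. A real system would use a deeper, nonlinear model on actual audio.

```python
import numpy as np

def train_autoencoder(features, dim=4, lr=0.01, epochs=500, seed=0):
    """Fit a linear autoencoder; return encoder weights and the loss history."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    w_enc = rng.normal(scale=0.1, size=(d, dim))
    w_dec = rng.normal(scale=0.1, size=(dim, d))
    losses = []
    for _ in range(epochs):
        z = features @ w_enc          # encode: song features -> embedding
        recon = z @ w_dec             # decode: embedding -> reconstruction
        err = recon - features
        losses.append(float(np.mean(err ** 2)))
        # Update directions proportional to the gradient of the squared error.
        grad_dec = z.T @ err / n
        grad_enc = features.T @ (err @ w_dec.T) / n
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, losses

# Synthetic stand-in for audio features: 200 songs, 16 dims, rank-4 structure.
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 16))
features = (features - features.mean(axis=0)) / features.std(axis=0)

w_enc, losses = train_autoencoder(features)
embeddings = features @ w_enc  # one content-based vector per song
print(embeddings.shape, losses[0], losses[-1])
```

The training signal is just reconstruction of the song's own features, so no play counts or playlists are needed. The resulting `embeddings` could then be concatenated with collaborative-filtering features as input to the recommendation model.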