I've been wanting to do something similar, but shied away because I wasn't in the mood to find out whether this sort of thing is legal. Can someone here who knows this space better talk about the legal side of doing something like this?
You're just interacting with the API, so you're not gonna have any legal issues. I have some experience with Spotify's API (not MusicKit), so I'm gonna try guessing how it works based on that.
There's an API endpoint called audio_features[0] that tells you things about the song (tempo, danceability, acousticness, major/minor key...), so while you can't get full versions of every song, you can approximate what they sound like based on Spotify's audio analysis of them.
So, build a database of audio_features while respecting API limits, find the most similar ones based on about a dozen variables, and you're good to go.
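Rough sketch of the "find the most similar ones" step, in case it helps. Everything here is an assumption on my part: the track names and numbers are made up, I only use four of the features mentioned above, and min-max scaling plus plain Euclidean distance is just one reasonable choice, not necessarily what anyone actually does. Fetching the features from the API (and respecting rate limits) is left out.

```python
import math

# Features to compare on; a real database would likely have about a dozen.
FEATURES = ("tempo", "danceability", "acousticness", "mode")

# Hypothetical rows; the values are invented for illustration only.
tracks = {
    "track_a": {"tempo": 120.0, "danceability": 0.80, "acousticness": 0.10, "mode": 1},
    "track_b": {"tempo": 122.0, "danceability": 0.78, "acousticness": 0.12, "mode": 1},
    "track_c": {"tempo": 70.0, "danceability": 0.30, "acousticness": 0.90, "mode": 0},
    "track_d": {"tempo": 125.0, "danceability": 0.60, "acousticness": 0.40, "mode": 1},
}


def normalize(tracks):
    """Min-max scale each feature to [0, 1] so tempo (in BPM) doesn't dominate."""
    lows = {f: min(t[f] for t in tracks.values()) for f in FEATURES}
    highs = {f: max(t[f] for t in tracks.values()) for f in FEATURES}
    scaled = {}
    for name, t in tracks.items():
        scaled[name] = [
            (t[f] - lows[f]) / (highs[f] - lows[f]) if highs[f] > lows[f] else 0.0
            for f in FEATURES
        ]
    return scaled


def most_similar(name, tracks, k=2):
    """Return the k tracks closest to `name` by Euclidean distance."""
    scaled = normalize(tracks)
    target = scaled[name]
    distances = [
        (math.dist(target, vec), other)
        for other, vec in scaled.items()
        if other != name
    ]
    return [other for _, other in sorted(distances)[:k]]


print(most_similar("track_a", tracks))
```

With a big database you'd want a proper nearest-neighbour index instead of scanning every row, but for a few thousand tracks a brute-force loop like this is probably fine.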
I doubt T&Cs include "no training embeddings with our data" yet, so they're probably in the clear there (and it might be Fair Use in the USA??).
On the main question, I think we'll be waiting for the Getty v. whoever (Dall-E?) lawsuit to see what the courts think.
A useful indicator might be whether any major corporations have released AIs trained on public data, because they would be the prime targets for people looking to sue and walk away with lots of money. You can get plugins for Photoshop to do AI imagery, but I don't think Adobe sell any?