Working on https://greatreads.dev/
A place to aggregate and find articles from developers' blogs. Right now, I'm building a submission form for people to submit new sources.
There is also a way to search for articles using vectors, called "Semantic Search". For example, you can ask "PostgreSQL and how to best optimize it" and it will return articles touching on that subject, or at least related to it.
Wondering about the best way I can add a weekly newsletter built on top of the content currently being ingested, and still looking for more sources to add to the database (let me know if you have any good recommendations).
Doing something similar for a non-public project. How do you deal with remixing feeds and the potential mix of formats (RSS, Atom, etc.)? I need to create new feeds as well, and if that's done by normalizing and sanitizing content, I feel I misrepresent the original content and probably breach the implicit license granted by syndicating via feed.
Probably very few creators care one way or another, as the links are going to the original content. Just interested if people had an opinion on the matter.
It’s honestly a bit of a pain. I’m using a library to help parse different formats, but there are many custom cases to handle. Dates are a good example. I’m parsing more than a dozen formats, and there’s no real pattern in how sites display their published dates. Some blogs even use unusual formats that aren’t common anywhere else.
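To make the date problem concrete, here is a minimal sketch of a fallback chain, assuming Python and only the standard library. The function name `parse_published`, the `FALLBACK_FORMATS` list, and the choice to treat timezone-less dates as UTC are illustrative assumptions, not what GreatReads actually runs.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Hypothetical site-specific formats; a real list would grow with each new source.
# Month names are matched using the current locale (English assumed here).
FALLBACK_FORMATS = ["%B %d, %Y", "%b %d, %Y", "%d %b %Y"]


def parse_published(raw: str) -> Optional[datetime]:
    """Try ISO 8601 (common in Atom), then RFC 2822 (common in RSS), then custom formats."""
    text = raw.strip()
    if not text:
        return None

    parsed = None

    # 1. ISO 8601. Older Python versions reject a trailing "Z", so rewrite it.
    iso_text = text[:-1] + "+00:00" if text.endswith("Z") else text
    try:
        parsed = datetime.fromisoformat(iso_text)
    except ValueError:
        pass

    # 2. RFC 2822 style, e.g. "Tue, 02 Jan 2024 10:04:05 -0500".
    if parsed is None:
        try:
            parsed = parsedate_to_datetime(text)
        except (TypeError, ValueError):
            pass

    # 3. One-off formats seen on individual blogs.
    if parsed is None:
        for fmt in FALLBACK_FORMATS:
            try:
                parsed = datetime.strptime(text, fmt)
                break
            except ValueError:
                continue

    if parsed is None:
        return None

    # Assumption: dates without a timezone are treated as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

Returning `None` instead of raising lets the ingestion step log the unparsed string and keep going, which makes it easier to spot new formats that need an entry in the list.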
I try to avoid altering the original content as much as possible. I do need to sanitize and adjust parts of it to produce clean text on my site, but I’m careful not to change anything in a way that misrepresents the source. Only a few short phrases appear on GreatReads, and users cannot read the full article without visiting the original source.
I'm using a Postgres database. When articles are ingested, I generate an embedding with the Gemini Embedding model (it has a great free tier) and save it in a vector column that is later used for the search.
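The ranking idea behind that search can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the three-dimensional vectors and article titles are made up, real embeddings have far more dimensions, and in practice the comparison would run inside the database against the vector column rather than in application code. Cosine similarity is assumed as the metric; the source does not say which one is used.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: List[float],
           articles: Dict[str, List[float]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (title, score) pairs, most similar first."""
    scored = [(title, cosine_similarity(query_vec, vec))
              for title, vec in articles.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical stored embeddings, standing in for rows in the vector column.
ARTICLES = {
    "Tuning Postgres indexes": [0.9, 0.1, 0.0],
    "CSS grid tricks": [0.0, 0.2, 0.9],
    "Query planning in PostgreSQL": [0.8, 0.3, 0.1],
}

# Hypothetical embedding of the user's query text.
QUERY = [1.0, 0.2, 0.0]

results = search(QUERY, ARTICLES, top_k=2)
```

The query text is embedded with the same model as the articles, so that related topics end up close together even when they share no exact keywords.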