peterstjohn's comments | Hacker News

Try Hey Duggee - it's not as explicitly British-coded, but there's a ton of stuff in there if you were watching Spaced in your late teens and now find yourself a parent…


Seconded, Hey Duggee is a fantastic show. In a way it's the anti-Bluey - same delightful vibes, just as playfully animated, but intentionally ridiculous (and, to me, hilarious) stories.


Ha, same here! It really helped my imposter syndrome, as I overheard a couple of guys talking about the ARM assembly they were doing on their Archimedes on the first day…and I hadn't written anything fancier than QuickBASIC at the time…


Was actually lucky enough to be taught by the co-inventor of the ARM CPU. Furber is awesome.


For my sins, I didn't actually realise how great that was until quite a bit afterwards! ;)


I even hosted a mirror of the original Mozilla source code dump from St. Anselm Hall, and nobody ever complained ;P


If you think that of Owen's output, for heaven's sake I fear for you if you ever read a Jonathan Meades article…


You would just film almost _directly_ across the river and shoot on the South Bank, one of the major brutalist outposts in London. It's lovely.


I no longer work there, but Lucidworks has had embedding training as a first-class feature in Fusion since January 2020 (I know because I wrapped up adding it just as COVID became a thing). We definitely saw that even with just slightly out-of-band use of language - e.g. in e-commerce, things like "RD TSHRT XS", embedding search with open (and closed) models would fall below bog-standard* BM25 lexical search. Once you trained a model, performance would kick up above lexical search…and if you combined lexical _and_ vector search, things were great.

Also, a member on our team developed an amazing RNN-based model that still today beats the pants off most embedding models when it comes to speed, and is no slouch on CPU either…

(* I'm being harsh on BM25 - it is a baseline that people often forget in vector search, but it can be a tough one to beat at times)
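The "lexical _and_ vector" combination above can be sketched with reciprocal rank fusion (RRF), one common way to merge the two ranked lists (the doc ids and the `k` constant here are made up for illustration, not from any particular product):

```python
# Hypothetical hybrid-search sketch: fuse a BM25 ranking and an
# embedding ranking with reciprocal rank fusion (RRF).

def rrf(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranked list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Earlier ranks contribute more; k damps the head of the list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]    # lexical ranking, e.g. for "RD TSHRT XS"
vector_hits = ["d1", "d9", "d3"]  # embedding ranking from a trained model
fused = rrf([bm25_hits, vector_hits])  # ["d1", "d3", "d9", "d7"]
```

Documents that appear high in both lists (d1, d3) float to the top, which is the whole point of combining the two signals.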


Heh. A lot of what search people have known for a while is suddenly being re-learned by the population at large in the context of RAG, etc. :)


The thing with tech is, if you're too early, it's not like you eventually get discovered and adopted.

When the time is finally right, people just "invent" what you made all over again.


Totally. And this has even happened in search: open-source search engines like Elasticsearch did this, Google did it in the early Web days, and so on :)


Sorry, what is it that people in search _have_ known?

I know nothing about search, but a bit about ML, so I'm curious


That ranking is a lot more complicated than cosine similarity on embeddings


What’s the model?


Well, why wouldn't they sell (license) the rights to make Transformers films (which as far as I know is just extending their existing contract with Paramount)?

They still own the underlying IP[1], so as long as the contract is a decent one, Paramount deals with actually making and distributing the film, and Hasbro just collects the money, plus a toy line off the back of the film. Feels like an easier setup than taking the risk of movie-making yourself (which they did attempt with eOne for other properties, but seemingly decided wasn't a good deal for them).

[1] yes, yes, it's a bit more complicated with Takara in the mix too, but you can essentially view it as a Hasbro-owned property


+1 to everybody who mentioned that Vespa has great vector support _and_ lexical filtering. And you'll likely end up needing both.

Don't sleep on some of its newer features like multi-vector document fields, either…


That paper does a terrible job of making Lucene look useful, though. 10 QPS from a server with 1 TB of RAM is not great (and I know Lucene HNSW can perform better than that in the real world, so I am somewhat mystified that this paper is being pushed by the community).


It definitely depends on your use case. If you are just searching through the entire array at all times, then this is certainly an acceptable option (you could even flip it all onto a GPU too).

But when you start to require filtering or combining the vector search with a lexical search, then something like Pinecone, Vespa, Qdrant, Lucene-based options (e.g. Solr and ES), etc. become a lot more practical than building all that functionality yourself.
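The "searching through the entire array" option is a brute-force (flat) scan, which is a few lines with NumPy; a minimal sketch, with made-up corpus sizes. It's exact and simple, but bolting lexical filters onto it is where dedicated engines start to earn their keep.

```python
import numpy as np

# Brute-force nearest-neighbour search: score every document vector
# against the query on each call. Exact, simple, GPU-friendly.
rng = np.random.default_rng(0)
docs = rng.standard_normal((10_000, 128)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # unit-normalise

def search(query, top_k=10):
    q = query / np.linalg.norm(query)
    sims = docs @ q                 # cosine similarity vs. all docs
    return np.argsort(-sims)[:top_k]  # indices of the best matches
```

Querying with a document's own vector should return that document first, since its cosine similarity with itself is 1.0.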

