peterstjohn's comments | Hacker News

Try Hey Duggee - it's not as explicitly British-coded, but there's a ton of stuff in there if you were watching Spaced in your late teens and now find yourself a parent…


Seconded, Hey Duggee is a fantastic show. In a way it's the anti-Bluey - same delightful vibes, just as playfully animated, but intentionally ridiculous (and, to me, hilarious) stories.


Ha, same here! It really helped my imposter syndrome, as I overheard a couple of guys talking about the ARM assembly they were doing on their Archimedes on the first day…and I hadn't written anything fancier than QuickBASIC at the time…


Was actually lucky enough to be taught by the co-inventor of the ARM CPU. Furber is awesome.


For my sins, I didn't actually realise how great that was until quite a bit afterwards! ;)


I even hosted a mirror of the original Mozilla source code dump from St. Anselm Hall, and nobody ever complained ;P


If you think that of Owen's output, for heaven's sake I fear for you if you ever read a Jonathan Meades article…


You would just film almost _directly_ across the river and shoot on the South Bank, one of the major brutalist outposts in London. It's lovely.


I no longer work there, but Lucidworks has had embedding training as a first-class feature in Fusion since January 2020 (I know because I wrapped up adding it just as COVID became a thing). We definitely saw that even with just slightly out-of-band use of language - e.g. in e-commerce, things like "RD TSHRT XS", embedding search with open (and closed) models would fall below bog-standard* BM25 lexical search. Once you trained a model, performance would kick up above lexical search…and if you combined lexical _and_ vector search, things were great.

Also, a member on our team developed an amazing RNN-based model that still today beats the pants off most embedding models when it comes to speed, and is no slouch on CPU either…

(* I'm being harsh on BM25 - it is a baseline that people often forget in vector search, but it can be a tough one to beat at times)
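The "lexical _and_ vector" combination above can be sketched with reciprocal rank fusion (RRF), one common way to merge the two ranked lists (the doc ids and the `k` constant here are made up for illustration, not from any particular product):

```python
# Hypothetical hybrid-search sketch: fuse a BM25 ranking and an
# embedding ranking with reciprocal rank fusion (RRF).

def rrf(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranked list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Earlier ranks contribute more; k damps the head of the list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]    # lexical ranking, e.g. for "RD TSHRT XS"
vector_hits = ["d1", "d9", "d3"]  # embedding ranking from a trained model
fused = rrf([bm25_hits, vector_hits])  # ["d1", "d3", "d9", "d7"]
```

Documents that appear high in both lists (d1, d3) float to the top, which is the whole point of combining the two signals.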


Heh. A lot of what search people have known for a while is suddenly being re-learned by the population at large in the context of RAG, etc. :)


The thing with tech is, if you're too early, it's not like you eventually get discovered and adopted.

When the time is finally right, people just "invent" what you made all over again.


Totally. And this has even happened in search: open-source search engines like Elasticsearch did this, Google did it in the early Web days, and so on :)


Sorry, what is it that people in search _have_ known?

I know nothing about search, but a bit about ML, so I'm curious


That ranking is a lot more complicated than cosine similarity on embeddings


What’s the model?


Well, why wouldn't they sell (license) the rights to make Transformers films (which as far as I know is just extending their existing contract with Paramount)?

They still own the underlying IP[1], so as long as the contract is a decent one, Paramount deals with actually making and distributing the film, and Hasbro just collects the money, plus a toy line off the back of the film. Feels like an easier setup than taking the risk of movie-making yourself (which they did attempt with eOne for other properties, but seemingly decided wasn't a good deal for them).

[1] yes, yes, it's a bit more complicated with Takara in the mix too, but you can essentially view it as a Hasbro-owned property


+1 to everybody who mentioned that Vespa has great vector support _and_ lexical filtering. And you'll likely end up needing both.

Don't sleep on some of its newer features like multi-vector document fields, either…


That paper does a terrible job of making Lucene look useful, though. 10 QPS from a server with 1 TB of RAM is not great (and I know Lucene HNSW can perform better than that in the real world, so I am somewhat mystified that this paper is being pushed by the community).


It definitely depends on your use case. If you are just searching through the entire array at all times, then this is certainly an acceptable option (you could even flip it all onto a GPU too).

But when you start to require filtering or combining the vector search with a lexical search, then something like Pinecone, Vespa, Qdrant, Lucene-based options (e.g. Solr and ES), etc. become a lot more practical than building all that functionality yourself.
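The "searching through the entire array" option is a brute-force (flat) scan, which is a few lines with NumPy; a minimal sketch, with made-up corpus sizes. It's exact and simple, but bolting lexical filters onto it is where dedicated engines start to earn their keep.

```python
import numpy as np

# Brute-force nearest-neighbour search: score every document vector
# against the query on each call. Exact, simple, GPU-friendly.
rng = np.random.default_rng(0)
docs = rng.standard_normal((10_000, 128)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # unit-normalise

def search(query, top_k=10):
    q = query / np.linalg.norm(query)
    sims = docs @ q                 # cosine similarity vs. all docs
    return np.argsort(-sims)[:top_k]  # indices of the best matches
```

Querying with a document's own vector should return that document first, since its cosine similarity with itself is 1.0.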

