Yes! We've been running Milvus in production for about three years now, powering workloads for customers who really do query at that scale. It has its foibles like all of these systems (the lack of non-int id fields in the 1.x line is maddening and has forced a bunch of extra engineering on our side to integrate with our other systems), but it has held up pretty well in our experience.
(I can't speak to Milvus 2.x as we are probably not going to upgrade to that for a number of non-performance reasons)
So just use their base model and fine-tune with a non-restrictive dataset (e.g. the databricks-dolly-15k instruction set behind Dolly 2.0)? You can get a decent LoRA fine-tune done in a day or so on consumer GPU hardware, I would imagine.
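For the curious, a minimal sketch of what that kind of run looks like with the Hugging Face peft/transformers stack. The base model, prompt format, and hyperparameters here are illustrative assumptions (and 8-bit loading needs bitsandbytes), not a tested recipe:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "EleutherAI/pythia-2.8b"  # stand-in: any permissively licensed base

tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token

# 8-bit loading (via bitsandbytes) is what makes this fit on a consumer GPU.
model = AutoModelForCausalLM.from_pretrained(
    base, load_in_8bit=True, device_map="auto")
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["query_key_value"]))  # attention proj in GPT-NeoX models

# databricks-dolly-15k is the instruction dataset Dolly 2.0 was tuned on.
ds = load_dataset("databricks/databricks-dolly-15k", split="train")
def fmt(ex):
    return tok(f"### Instruction:\n{ex['instruction']}\n"
               f"### Response:\n{ex['response']}",
               truncation=True, max_length=512)
ds = ds.map(fmt, remove_columns=ds.column_names)

Trainer(
    model=model,
    train_dataset=ds,
    args=TrainingArguments("dolly-lora", per_device_train_batch_size=4,
                           gradient_accumulation_steps=4, num_train_epochs=1,
                           learning_rate=2e-4, fp16=True, logging_steps=50),
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```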
The point here is that you can use their bases in place of LLaMA and not have to jump through the hoops, so the fine-tuned models are really just there for a bit of flash…
I once travelled with a 5kg vat of fondant icing on a transatlantic flight. "Yes, it looks very much like Semtex, but it's fine!" Still not exactly sure how I got away with it…
It really does give you the best of both worlds: resilient to typos, handling synonyms without all the usual hand-written rules, yet still able to serve direct lookups like ISBNs.
(disclaimer: I work on Semantic Search at Lucidworks)
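A self-contained toy illustrating the point: fuse a lexical ranking (which nails exact matches like ISBNs) with a vector ranking (which tolerates typos and synonyms) via reciprocal rank fusion. This is a generic sketch with made-up doc ids, not Lucidworks' actual implementation:

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Reciprocal rank fusion: merge several ranked lists of doc ids."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# The lexical list nails the literal ISBN query; the vector list surfaces
# the synonym-ish matches; fusion keeps both strengths.
lexical = ["isbn-9780262046305", "doc-12"]
vector  = ["doc-7", "doc-12", "isbn-9780262046305"]
print(rrf([lexical, vector]))
```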
If you control the HNSW implementation, it can definitely do pre-filtering. Vespa does it, and you can modify open source HNSW libs easily. I added pre-filtering support to an internal fork of HNSWLIB last week, for example…
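For anyone wanting to try this without maintaining a fork: stock hnswlib releases from 0.7.0 on expose a filter callback on knn_query, which evaluates the predicate during graph traversal so rejected points never occupy a top-k slot. A minimal sketch (the tenant predicate is a made-up example):

```python
import hnswlib
import numpy as np

dim, n = 16, 1000
data = np.random.rand(n, dim).astype(np.float32)

index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data, np.arange(n))

# Pretend even-numbered labels belong to the tenant we're allowed to see.
allowed = lambda label: label % 2 == 0

# Python-level filters should run single-threaded (per the hnswlib docs).
labels, dists = index.knn_query(data[:1], k=5, num_threads=1, filter=allowed)
assert all(l % 2 == 0 for l in labels[0])  # only pre-filtered points returned
```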