
You probably don't need full-text search, only exact-match search and very efficient time-based retrieval of contiguous log fragments. As an engineer who spends quite a lot of time debugging and reading logs, I've found our OpenSearch almost useless (and a nightmare for our ops folks): it can miss searches on terms like filenames, and the OSD UX is slow and generally unpleasant. I'd rather have 100MB of text logs downloaded locally.

Please enlighten me: what are the use cases for real full-text search (with fuzzy matching, linguistic normalization, etc.) in logs and similar machine-generated transactional data? I understand its use for human-written text, but that is rarely in the TB range unless you are indexing the Web or the logs of some large-scale communication platform.



I agree that fuzzy matching etc. are usually not needed, but in my experience I need at least substring match. A log message may say "XYZ failed for FOO id 123456789", and I want to be able to search the logs for 123456789 to see all related information (plus the trace id, if available).

In systems that deal with asynchronous actions, log entries relating to "123456789" may be spread over minutes, hours, or even days. When researching issues, I have found search tools like OpenSearch, Splunk, etc. invaluable, and I think the additional cost is worth it. But we also don't have PB of logs to handle, so there may be a point where the cost outweighs the benefit.


This is why you should always do structured logging. Finding logs by string matching can be fragile.
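For context, structured logging means emitting each record as named fields (typically JSON) rather than free-form text, so an ID becomes an exact field query instead of a substring guess. A minimal sketch with Python's stdlib `logging`; the field names (`order_id`, `trace_id`) are illustrative:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line, so fields
    like order_id can be queried exactly rather than substring-matched
    out of free-form message text."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Pick up structured fields passed via logging's `extra=` kwarg.
        for key in ("order_id", "trace_id"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)

logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.error("XYZ failed", extra={"order_id": "123456789"})
```

The emitted line can then be filtered on `order_id == "123456789"` in any log pipeline, immune to message-wording changes that break string matches.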



