
What's your take on handling log compaction to prevent unbounded growth, especially in systems with high write throughput?


Nice question! Restate is not a log that retains the raw events for a long time - conceptually it keeps them only until they have been processed by the handlers, DB, locking, etc.

When you build stateful handlers, the state per key lives in the internal DB, and that gets you a similar effect to log compaction, i.e., it retains one value per key.
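The per-key effect described above can be sketched as a simple replay that keeps only the most recent value per key. This is an illustrative sketch, not Restate's actual implementation; the log format (a list of key/value pairs) is a hypothetical stand-in.

```python
# Key-based compaction sketch: replay a log but keep only the latest
# value per key, which bounds state to one entry per key regardless of
# how many writes the log saw.

def compact(log):
    """Return one value per key: the most recent write wins."""
    state = {}
    for key, value in log:
        state[key] = value  # later entries overwrite earlier ones
    return state

log = [("a", 1), ("b", 2), ("a", 3)]
print(compact(log))  # {'a': 3, 'b': 2}
```

The growth of `state` is bounded by the number of distinct keys, not by write throughput, which is the same property log compaction gives you.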


I have a “summarise” log entry: the parts of the current log’s contents that will remain relevant in the future are summarised. For example, for FY2023’s financial transactions, we compute the final balances at the end of the year. We then close the log by writing an entry of “no more log entries after this are valid”.

We then copy the summary transactions to a new log, and compress and archive the old log.
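The summarise-and-rollover scheme above could look roughly like this. The transaction format (account, amount) and the close marker are hypothetical; the point is that the new log starts from closing balances rather than replaying the full history.

```python
# Sketch: summarise a period's log into closing balances, seal the old
# log, and start a new log whose opening entries are the summary.

from collections import defaultdict

def close_period(old_log):
    """Seal old_log and return a new log seeded with closing balances."""
    balances = defaultdict(int)
    for account, amount in old_log:
        balances[account] += amount
    # Seal the old log: no entries after this marker are valid.
    old_log.append(("__CLOSED__", 0))
    # The new log opens with one summary entry per account.
    return [(account, bal) for account, bal in balances.items()]

fy2023 = [("alice", 100), ("bob", 50), ("alice", -30)]
print(close_period(fy2023))  # [('alice', 70), ('bob', 50)]
```

After this, the sealed log can be compressed and archived, and readers only ever need the new log.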

You can identify high-throughput and low-throughput types of log entries and segregate them into different log streams. For example, the “new customer/change customer info” stream probably gets way less traffic than the “customer has logged in” stream. The former is also harder to summarise. Put the hard-to-summarise but low-volume stuff in its own log.
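The segregation idea above amounts to a routing table from entry type to stream. The event names and stream names here are made up for illustration.

```python
# Sketch: route entry types to separate log streams by expected volume,
# so the hard-to-summarise, low-volume entries live apart from the
# high-volume ones that compact easily.

STREAMS = {
    "customer_created": "customer-info",    # low volume, hard to summarise
    "customer_updated": "customer-info",
    "customer_login": "customer-activity",  # high volume, easy to compact
}

def route(event_type):
    """Pick the stream for an event type, falling back to a default."""
    return STREAMS.get(event_type, "default")

print(route("customer_login"))    # customer-activity
print(route("customer_created"))  # customer-info
```

Each stream can then get its own retention and summarisation policy instead of one policy fitting all traffic.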



