An interesting exercise would be to compare this with the (confusingly similarly named) `fsst` string compression strategy: https://github.com/cwida/fsst
After reading, I don't get how locks held in memory affect WAL shipping.
The WAL reader reads it in a single thread, updating in-memory data
structures and periodically dumping them to disk. Perhaps you want to read
one big instruction from the WAL and apply it to many buffers using multiple threads?
We currently use an unmodified/generic WAL entry, and don't implement our own replay. That means we don't control the order of locks acquired/released during replay, and the default is to acquire exactly one lock to update a buffer.
But as far as I know, even with a custom WAL entry implementation, the maximum in one entry would still be ~8k, which might not be sufficient for a multi-block atomic operation. So the data structure needs to support block-at-a-time atomic updates.
I guess your implementation generates a lot of dead tuples during
compaction. You're clearly fighting PG here. Could a custom storage
engine be a better option?
`pg_search`'s LSM tree is effectively a custom storage engine, but it is an index (Index Access Method and Custom Scan) rather than a table. See more on it here: https://www.paradedb.com/blog/block_storage_part_one
LSM compaction does not generate any dead tuples on its own; what is dead is controlled by what is "dead" in the heap/table due to deletes/updates. Instead, the LSM cycles blocks into and out of a custom free space map (which we implemented to reduce WAL traffic).
Thanks for reporting this! I'm having trouble finding the link you are referring to though. Would you mind sharing a link to the file/page containing the dead link?
I think that `inverting` also subsumes async functions/values, which is pretty neat!
In the case where asynchrony was actually necessary, it seems like a great alternative to function coloring.
But whether you should actually use it for something like their `sub_min` example is highly dependent on how good the performance of their implementation is. Creating a graph of references rather than making two passes over an array of integers is not clearly faster ... or clearer, for that matter.
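For reference, here's the plain two-pass version that any graph-of-references implementation has to beat (a sketch; only the name `sub_min` comes from their example):

```python
# Straightforward two-pass baseline: one pass to find the minimum,
# one pass to subtract it from every element.
def sub_min(xs):
    m = min(xs)                  # pass 1: find the minimum
    return [x - m for x in xs]   # pass 2: subtract it

print(sub_min([5, 3, 8]))  # [2, 0, 5]
```

Two tight loops over a contiguous array are hard to outrun with pointer-chasing, so the incremental version only pays off when inputs change a little at a time.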
When it comes to understanding the risks involved with having this many dependencies, one thing that folks might not realize is that Rust's support for dependency resolution and lock files is fantastic.
Tools like `cargo audit` can tell you statically, based on the lockfile, which dependencies have security vulnerabilities reported against them (but you have to run it!). And GitHub's Dependabot (https://github.com/dependabot/) will do the same thing automatically, just based on the existence of the lockfile in your repo (and will also open PRs to bump deps for you).
And as mentioned elsewhere: Cargo's dependency resolver supports providing multiple versions of a dep in different dependency subgraphs, which all but eliminates the "dependency hell" that folks expect from ecosystems like Python or the JVM. Two copies of a dep at different versions? Totally fine.
Yes. AFAIK, it evolved over time across 3+ package managers (`npm`, `yarn`, `pnpm`, etc), but the current state of that ecosystem is similar (including the behavior of dependabot).
Python's Poetry has `poetry audit` as well, and there are third-party tools such as Safety (Python), Nancy (Golang), etc. Lots of languages have something like this.
They support lockfiles and tools like `audit`, yes. But they do not support having multiple versions of a dependency.
Tools based on loading libraries from a *PATH (Go, Python, JVM) usually do so by grabbing the first one that they encounter that contains the appropriate symbols. That is incompatible with having multiple versions of a package.
On the other hand, Rust and node.js support this -- each in their own way. In Rust, artifact names are transparently suffixed with a hash to prevent collisions. And in node.js, almost all symbol lookups are accomplished with relative filesystem paths.
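To make the *PATH behavior concrete, here's a small Python demonstration (using throwaway temp directories) of why first-match-wins resolution precludes multiple versions:

```python
import pathlib
import sys
import tempfile

# Create two directories, each containing a different "version" of the
# same module name.
tmp = pathlib.Path(tempfile.mkdtemp())
for sub, version in [("first", "1.0"), ("second", "2.0")]:
    d = tmp / sub
    d.mkdir()
    (d / "mylib.py").write_text(f"VERSION = {version!r}\n")

# Python scans sys.path in order and takes the first match, so only one
# version of `mylib` can ever be visible in a process.
sys.path[:0] = [str(tmp / "first"), str(tmp / "second")]
import mylib

print(mylib.VERSION)  # 1.0 -- the copy in "second" is shadowed
```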
We have built something that hits on points 1, 3, 5, and 7 at https://reboot.dev/ ... but in a multi-language framework (supporting Python and TypeScript to start).
The end result is something that looks a lot like distributed, persistent, transactional memory. Rather than explicit interactions with a database, local variable writes to your state are transactionally persisted if a method call succeeds, even across process/machine boundaries. And that benefits point 7, because transactional method calls compose across team/application boundaries.
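As a toy illustration of those semantics (this is not Reboot's actual API, just a sketch of the idea): mutations made inside a method are committed only if the method returns successfully, and discarded otherwise.

```python
import copy

class TxState:
    """Toy transactional state: writes made by `method` become
    visible only if the method returns without raising."""

    def __init__(self, **fields):
        self.data = dict(fields)

    def run(self, method):
        draft = copy.deepcopy(self.data)  # method works on a draft
        result = method(draft)            # an exception discards the draft
        self.data = draft                 # commit only on success
        return result

acct = TxState(balance=100)

def withdraw(state):
    state["balance"] -= 150
    if state["balance"] < 0:
        raise ValueError("insufficient funds")

try:
    acct.run(withdraw)
except ValueError:
    pass

print(acct.data["balance"])  # 100 -- the failed write rolled back
```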
[1] Loosen Up The Functions
[3] Production-Level Releases
[5] Value Database
[7] A Language To Encourage Modular Monoliths
They are related, for sure. But one of the biggest differences is that operations affecting multiple Reboot states are transactional, unlike Azure's "entity functions".
Because multiple Azure entity functions are not updated transactionally, you are essentially always implementing the saga pattern: you have to worry about cleaning up after yourself in case of failure.
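For illustration, here's the kind of compensation boilerplate the saga pattern forces on you (a toy sketch, not Azure's API): every step needs an explicit undo in case a later step fails.

```python
def transfer(src, dst, amount):
    compensations = []  # manual undo log -- the saga boilerplate
    try:
        src["balance"] -= amount
        compensations.append(
            lambda: src.update(balance=src["balance"] + amount))
        if dst.get("frozen"):
            raise RuntimeError("destination account frozen")
        dst["balance"] += amount
    except Exception:
        # Run the compensating actions in reverse order, by hand.
        for undo in reversed(compensations):
            undo()
        raise

a = {"balance": 100}
b = {"balance": 0, "frozen": True}
try:
    transfer(a, b, 40)
except RuntimeError:
    pass

print(a["balance"], b["balance"])  # 100 0 -- compensation restored src
```

With transactional calls, the `compensations` list and the except-block cleanup simply disappear.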
In Reboot, transactional function calls automatically roll back all state changes if they fail, without any extra boilerplate code. Our hypothesis is that that enables a large portion of an application to skip worrying about failure entirely.
Code that has side-effects impacting the outside world can be isolated using our workflow mechanism (effectively durable execution); workflows can themselves be encapsulated inside of libraries and composed. But we don't think that that is the default mode that developers should be operating in.
> Code that has side-effects impacting the outside world can be isolated using our workflow mechanism (effectively durable execution)
Sounds very interesting!
I have been thinking about something like this for a new PL, and many kinds of side effects can actually be reversed, as if they never happened.
I have also read that exceptions can complicate control flow, disallowing some optimizations - but if they are transactional, then we can just add their reverse to the supposedly already slow error path, and enjoy our performance boost!
Every method in Reboot has a type: reader, writer, transaction, or workflow. Our retry semantics are such that any method can always be retried from the top, but for different reasons:
In readers, no state changes are possible. And in writers and transactions, retry/abort is always safe because no state changes occur until the method completes successfully.
In workflows, retry is always safe, and is in fact required due to the primitives we use to implement durable execution (we will publish more docs on this soon!). The workflow retries durably until it eventually completes, one way or another.
That means that a workflow is always the right spot to execute an external side effect: if a reader/writer/transaction wants to execute a side effect, it does so by spawning a task, which is only actually spawned if the method completes successfully. And we do "effect validation" (effectively: running your method twice!) to make it very hard to write a side effect in the wrong place.
> I have also read that exceptions can complicate control flow, disallowing some optimizations - but if they are transactional, then we can just add their reverse to the supposedly already slow error path, and enjoy our performance boost!
Somewhat...! When you write a transaction method in Reboot, code that fails with an exception cannot have had a side effect on the outside world, and all state changes will vanish if the transaction aborts. So there is never any need to clean something up, unless you are using exceptions to implement control flow.
The fundamental difference between an ECS and a struct/object layout is that an ECS is column-oriented (aka columnar), while a struct/object layout is row-oriented.
Everything else about how you might query these layouts is more superficial... you can provide the same API with either layout, the same way you can in relational database systems (both layouts can be queried with SQL, but with different performance characteristics.)
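A minimal sketch of the two layouts (illustrative only), showing the same query running against both:

```python
# Row-oriented (struct/object) layout: one record per entity.
rows = [
    {"x": 0.0, "y": 0.0, "hp": 10},
    {"x": 1.0, "y": 2.0, "hp": 7},
]

# Column-oriented (ECS-style) layout: one array per component field,
# indexed by entity. Each field is contiguous in memory, which is what
# makes per-component iteration cache-friendly.
cols = {
    "x": [0.0, 1.0],
    "y": [0.0, 2.0],
    "hp": [10, 7],
}

# The same query can be answered from either layout:
total_row = sum(r["hp"] for r in rows)
total_col = sum(cols["hp"])
print(total_row, total_col)  # 17 17
```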
Very interesting. I’m not sure if it’s actually the case, but I end up doing a ton of maintenance on BUILD files every day. I’ve become a sort of mini-master of designing target DAGs - in a way that’s upfront costly, but pays for itself over time with fast(er) rebuilds: I get very conservative with structure.
When deps are automatically gathered, do you tend to see that developers’ discipline becomes softened and code becomes highly interdependent again?
> In Guice, this would involve maintaining 23 BUILD files containing 622 individual java_library targets, each with many dependencies and exports listed. In larger monorepos, there’d be even more.