An interesting exercise would be to compare this with the (confusingly similarly named) `fsst` string compression strategy: https://github.com/cwida/fsst
After reading, I don't get how locks held in memory affect WAL shipping.
The WAL reader reads it in a single thread, updating in-memory data
structures and periodically dumping them to disk. Perhaps you want to read
one big instruction from the WAL and apply it to many buffers using multiple threads?
We currently use an unmodified/generic WAL entry, and don't implement our own replay. That means we don't control the order of locks acquired/released during replay, and the default is to acquire exactly one lock to update a buffer.
But as far as I know, even with a custom WAL entry implementation, the maximum in one entry would still be ~8k, which might not be sufficient for a multi-block atomic operation. So the data structure needs to support block-at-a-time atomic updates.
I guess your implementation generates a lot of dead tuples during
compaction. You're clearly fighting PG here. Could a custom storage
engine be a better option?
`pg_search`'s LSM tree is effectively a custom storage engine, but it is an index (Index Access Method and Custom Scan) rather than a table. See more on it here: https://www.paradedb.com/blog/block_storage_part_one
LSM compaction does not generate any dead tuples on its own; what is dead is controlled by what is "dead" in the heap/table due to deletes/updates. Instead, the LSM cycles blocks into and out of a custom free space map (which we implemented to reduce WAL traffic).
Thanks for reporting this! I'm having trouble finding the link you are referring to though. Would you mind sharing a link to the file/page containing the dead link?
I think that `inverting` also subsumes async functions/values, which is pretty neat!
In the case where asynchrony was actually necessary, it seems like a great alternative to function coloring.
But whether you should actually use it for something like their `sub_min` example is highly dependent on how good the performance of their implementation is. Creating a graph of references rather than making two passes over an array of integers is not clearly faster ... or clearer, for that matter.
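For reference, here's the plain two-pass version that any graph-of-references implementation has to beat (a sketch; only the name `sub_min` comes from their example):

```python
# Straightforward two-pass baseline: one pass to find the minimum,
# one pass to subtract it from every element.
def sub_min(xs):
    m = min(xs)                  # pass 1: find the minimum
    return [x - m for x in xs]   # pass 2: subtract it

print(sub_min([5, 3, 8]))  # [2, 0, 5]
```

Two tight loops over a contiguous array are hard to outrun with pointer-chasing, so the incremental version only pays off when inputs change a little at a time.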
When it comes to understanding the risks involved with having this many dependencies, one thing that folks might not realize is that Rust's support for dependency resolution and lock files is fantastic.
Tools like `cargo audit` can tell you statically, based on the lockfile, which dependencies have security vulnerabilities reported against them (but you have to run it!). And GitHub's Dependabot (https://github.com/dependabot/) will do the same thing automatically, just based on the existence of the lockfile in your repo (and will also open PRs to bump deps for you).
And as mentioned elsewhere: Cargo's dependency resolver supports providing multiple versions of a dep in different dependency subgraphs, which all but eliminates the "dependency hell" that folks expect from ecosystems like Python or the JVM. Two copies of a dep at different versions? Totally fine.
Yes. AFAIK, it evolved over time across 3+ package managers (`npm`, `yarn`, `pnpm`, etc), but the current state of that ecosystem is similar (including the behavior of dependabot).
Python's Poetry has `poetry audit` as well, and there are third-party tools such as Safety (Python), Nancy (Golang), etc. Lots of languages have something like this.
They support lockfiles and tools like `audit`, yes. But they do not support having multiple versions of a dependency.
Tools based on loading libraries from a *PATH (Go, Python, JVM) usually do so by grabbing the first one that they encounter that contains the appropriate symbols. That is incompatible with having multiple versions of a package.
On the other hand, Rust and node.js support this -- each in their own way. In Rust, artifact names are transparently suffixed with a hash to prevent collisions. And in node.js, almost all symbol lookups are accomplished with relative filesystem paths.
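To make the *PATH behavior concrete, here's a small Python demonstration (using throwaway temp directories) of why first-match-wins resolution precludes multiple versions:

```python
import pathlib
import sys
import tempfile

# Create two directories, each containing a different "version" of the
# same module name.
tmp = pathlib.Path(tempfile.mkdtemp())
for sub, version in [("first", "1.0"), ("second", "2.0")]:
    d = tmp / sub
    d.mkdir()
    (d / "mylib.py").write_text(f"VERSION = {version!r}\n")

# Python scans sys.path in order and takes the first match, so only one
# version of `mylib` can ever be visible in a process.
sys.path[:0] = [str(tmp / "first"), str(tmp / "second")]
import mylib

print(mylib.VERSION)  # 1.0 -- the copy in "second" is shadowed
```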
We have built something that hits on points 1, 3, 5, and 7 at https://reboot.dev/ ... but in a multi-language framework (supporting Python and TypeScript to start).
The end result is something that looks a lot like distributed, persistent, transactional memory. Rather than explicit interactions with a database, local variable writes to your state are transactionally persisted if a method call succeeds, even across process/machine boundaries. And that benefits point 7, because transactional method calls compose across team/application boundaries.
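As a toy illustration of those semantics (this is not Reboot's actual API, just a sketch of the idea): mutations made inside a method are committed only if the method returns successfully, and discarded otherwise.

```python
import copy

class TxState:
    """Toy transactional state: writes made by `method` become
    visible only if the method returns without raising."""

    def __init__(self, **fields):
        self.data = dict(fields)

    def run(self, method):
        draft = copy.deepcopy(self.data)  # method works on a draft
        result = method(draft)            # an exception discards the draft
        self.data = draft                 # commit only on success
        return result

acct = TxState(balance=100)

def withdraw(state):
    state["balance"] -= 150
    if state["balance"] < 0:
        raise ValueError("insufficient funds")

try:
    acct.run(withdraw)
except ValueError:
    pass

print(acct.data["balance"])  # 100 -- the failed write rolled back
```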
[1] Loosen Up The Functions
[3] Production-Level Releases
[5] Value Database
[7] A Language To Encourage Modular Monoliths
They are related, for sure. But one of the biggest differences is that operations affecting multiple Reboot states are transactional, unlike Azure's "entity functions".
Because multiple Azure entity functions are not updated transactionally, you are essentially always implementing the saga pattern: you have to worry about cleaning up after yourself in case of failure.
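For illustration, here's the kind of compensation boilerplate the saga pattern forces on you (a toy sketch, not Azure's API): every step needs an explicit undo in case a later step fails.

```python
def transfer(src, dst, amount):
    compensations = []  # manual undo log -- the saga boilerplate
    try:
        src["balance"] -= amount
        compensations.append(
            lambda: src.update(balance=src["balance"] + amount))
        if dst.get("frozen"):
            raise RuntimeError("destination account frozen")
        dst["balance"] += amount
    except Exception:
        # Run the compensating actions in reverse order, by hand.
        for undo in reversed(compensations):
            undo()
        raise

a = {"balance": 100}
b = {"balance": 0, "frozen": True}
try:
    transfer(a, b, 40)
except RuntimeError:
    pass

print(a["balance"], b["balance"])  # 100 0 -- compensation restored src
```

With transactional calls, the `compensations` list and the except-block cleanup simply disappear.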
In Reboot, transactional function calls automatically roll back all state changes if they fail, without any extra boilerplate code. Our hypothesis is that that enables a large portion of an application to skip worrying about failure entirely.
Code that has side-effects impacting the outside world can be isolated using our workflow mechanism (effectively durable execution); workflows can themselves be encapsulated inside of libraries and composed. But we don't think that that is the default mode that developers should be operating in.
> Code that has side-effects impacting the outside world can be isolated using our workflow mechanism (effectively durable execution)
Sounds very interesting!
I have been thinking about something like this for a new PL, and many kinds of side effects can actually be reversed, as if they never happened.
I have also read that exceptions can complicate control flow, disallowing some optimizations - but if they are transactional, then we can just add their reverse to the supposedly already slow error path, and enjoy our performance boost!
Every method in Reboot has a type: reader, writer, transaction, or workflow. Our retry semantics are such that any method can always be retried from the top, but for different reasons:
In readers, no state changes are possible. And in writers and transactions, retry/abort is always safe because no state changes occur until the method completes successfully.
In workflows, retry is always safe, and is in fact required due to the primitives we use to implement durable execution (we will publish more docs on this soon!). The workflow retries durably until it eventually completes, one way or another.
That means that a workflow is always the right spot to execute an external side effect: if a reader/writer/transaction wants to execute a side effect, it does so by spawning a task, which is only actually spawned if the method completes successfully. And we do "effect validation" (effectively: running your method twice!) to make it very hard to write a side effect in the wrong place.
> I have also read that exceptions can complicate control flow, disallowing some optimizations - but if they are transactional, then we can just add their reverse to the supposedly already slow error path, and enjoy our performance boost!
Somewhat...! When you write a transaction method in Reboot, code that fails with an exception cannot have had a side effect on the outside world, and all state changes will vanish if the transaction aborts. So there is never any need to clean something up, unless you are using exceptions to implement control flow.
The fundamental difference between an ECS and a struct/object layout is that an ECS is column-oriented (aka columnar), while a struct/object layout is row-oriented.
Everything else about how you might query these layouts is more superficial... you can provide the same API with either layout, the same way you can in relational database systems (both layouts can be queried with SQL, but with different performance characteristics.)
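A minimal sketch of the two layouts (illustrative only), showing the same query running against both:

```python
# Row-oriented (struct/object) layout: one record per entity.
rows = [
    {"x": 0.0, "y": 0.0, "hp": 10},
    {"x": 1.0, "y": 2.0, "hp": 7},
]

# Column-oriented (ECS-style) layout: one array per component field,
# indexed by entity. Each field is contiguous in memory, which is what
# makes per-component iteration cache-friendly.
cols = {
    "x": [0.0, 1.0],
    "y": [0.0, 2.0],
    "hp": [10, 7],
}

# The same query can be answered from either layout:
total_row = sum(r["hp"] for r in rows)
total_col = sum(cols["hp"])
print(total_row, total_col)  # 17 17
```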
Very interesting. I’m not sure if it’s actually the case, but I end up doing a ton of maintenance on BUILD files every day. I’ve become a sort of mini-master of designing target DAGs - in a way that’s upfront costly, but pays for itself over time with fast(er) rebuilds: I get very conservative with structure.
When deps are automatically gathered, do you tend to see that developers’ discipline becomes softened and code becomes highly interdependent again?
> In Guice, this would involve maintaining 23 BUILD files containing 622 individual java_library targets, each with many dependencies and exports listed. In larger monorepos, there’d be even more.