Hacker News

> each consumer can perform their own targeted lookups as needed

That puts you into tricky race-condition territory: the data targeted by an event might have changed (or been deleted) between the time it was emitted and the time you're processing it. It's not always a problem, but you have to analyse whether it could be, every time.

It also means that you're losing information about what this event actually represents: looking at an old event, you wouldn't know what it actually did, since the data has changed since then.

It also introduces a synchronous dependency between services: your consumer has to query the service that dispatched the event for additional information (which is complexity and extra load).

Ideally you'd design your event so that downstream consumers don't need extra information, or at least so that the information they need is independent of the data described by the event. E.g. a consumer needs the user's name to format an email in reaction to the "user_changed_password" event? No problem querying the service for the name: these are independent concepts. Updates to these things (password & name) can happen concurrently, but it doesn't really matter if a race condition happens.
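A minimal sketch of that pattern, with made-up names for illustration: the consumer reacts to the event and fetches the independent data (the user's name) on demand.

```python
# Hypothetical sketch: a consumer of "user_changed_password" that queries the
# user service for the name. `user_service` stands in for a remote lookup.

def fetch_user_name(user_id, user_service):
    # In a real system this would be an RPC/HTTP call to the user service.
    return user_service[user_id]["name"]

def on_user_changed_password(event, user_service, send_email):
    # The name is independent of the password change: a concurrent name
    # update doesn't affect the correctness of this email.
    name = fetch_user_name(event["user_id"], user_service)
    send_email(event["user_id"], f"Hi {name}, your password was changed.")
```

A race on the name here is harmless: whichever name the lookup returns, the email is still correct for the password-change event.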



There should be some law that says a strictly serialized process should never be broken into discrete services. Distributed locks and transactions are hell.


The best way to avoid distributed locks and transactions is to manually do the work. For example, instead of doing a distributed lock on two accounts when transferring funds, you might do this (which is the same as a distributed transaction, without the lock):

1. take money from account A

2. if that failed, abort (nothing has moved yet)

3. put money into account B

4. if failed, put money back into account A

In other words, perform compensating actions instead of doing transactions.
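Sketched in code (in-memory accounts stand in for two separate services; `TransferFailed` and the function names are made up):

```python
# Hypothetical sketch of the compensating-action pattern described above.
# `withdraw`/`deposit` stand in for calls to two independent services.

class TransferFailed(Exception):
    pass

def withdraw(accounts, acct, amount):
    if accounts.get(acct, 0) < amount:
        raise TransferFailed(f"insufficient funds in {acct}")
    accounts[acct] -= amount

def deposit(accounts, acct, amount):
    if acct not in accounts:
        raise TransferFailed(f"unknown account {acct}")
    accounts[acct] += amount

def transfer(accounts, src, dst, amount):
    withdraw(accounts, src, amount)       # take money from A
    try:
        deposit(accounts, dst, amount)    # put money into B
    except TransferFailed:
        deposit(accounts, src, amount)    # compensate: refund A
        raise
```

Note the compensation only runs if the second step fails; as the thread goes on to discuss, a crash between the two steps still needs a separate recovery mechanism.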

This also requires that you have some kind of mechanism to handle an application crash between 2 and 3, but that is something else entirely. I've been working on this for a couple of years now and getting close to something really interesting ... but not quite there yet.


> This also requires that you have some kind of mechanism to handle an application crash between 2 and 3, but that is something else entirely

Like a distributed transaction or lock. This is the entire problem space, your example above is very naive.


You _can_ use them, but even that won't save you. If you take a distributed lock and crash, you still have the same issue. What I wrote is essentially a distributed transaction: it is what happens inside a distributed transaction at the "read uncommitted" isolation level. A database that supports this handles all the potential failure cases for you, but that doesn't magically make the errors disappear; some (e.g. a node running out of disk space in the middle of the transaction) may not even be fully or properly handled by your code or database. It isn't naive; it is literally pseudo-code of what you are proposing.


It is not a DTC, despite the DB world wrongly using ATMs as an example for decades. Follow their model for actually moving money and you would be sent to jail.

"Accountants don't use erasers"

The ledger is the source of truth in accounting; if you use event streams as the source of truth, you gain the same advantages and disadvantages.

An event being past tense ONLY is a very critical part.

There are lots of ways to address this, all with their own tradeoffs; it is a case of the least-worst option for a particular context.

But over-reliance on ACID DBMSs and the false claim that ATMs use DTC really do hamper the ability to teach these concepts.


The better version of this is sagas, which are a kind of simplified distributed transaction. If you do this without actually using sagas, you can really mess it up.

E.g. you perform step 2, but fail to record it. When resuming from crash, you perform step 2 again. Now A has too much money in their account.
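One way to avoid exactly that double-refund is to record each completed step together with the step itself. A sketch, with an in-memory set standing in for durable saga state (all names are illustrative):

```python
# Hypothetical sketch: recording completed saga steps so a crash-and-resume
# does not repeat a compensation. `completed` models durable state that, in a
# real system, would be persisted atomically with the balance change.

def refund(accounts, acct, amount, step_id, completed):
    if step_id in completed:      # already applied: we are resuming after a crash
        return
    accounts[acct] += amount
    completed.add(step_id)        # record in the same (local) transaction
```

The key point is that the refund and the record of the refund must commit together; if they are two separate writes, the original problem reappears in the gap between them.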


Sagas are great for this and should be used when able, IMHO. It's still possible to mess it up, as there are basically two guarantees you can make in a distributed system: at-least-once, and at-most-once. Thus, you will either need to accept the possibility of lost messages or be able to make your event consumers idempotent to provide an illusion of exactly-once.

Sagas require careful consideration to make sure you can provide one of these guarantees during a "commit" (the order in which you ACK a message, send resulting messages, and record your own state -- if necessary) as these operations are non-atomic. If you mess it up, you can end up providing the wrong level of guarantee by accident. For example:

1. fire resulting messages

2. store state (which includes ids of processed messages for idempotency)

3. ACK original event

In this case, you guarantee that you will always send results at-least-once if a crash happens between 1&2. Once we get past 2, we provide exactly-once semantics, but we can only guarantee at-least-once. If we change the order to:

1. store state

2. fire messages

3. ACK original message

We now only provide at-most-once semantics. In this case, if we crash between 1&2, when we resume, we will see that we've already handled the current message and not process it again, despite never having sent any result yet. We end up with at-most-once if we swap 1&3 as well.
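A toy simulation of the two orderings (everything here is made up for illustration; a "crash" is modeled by returning early, and resuming means calling the handler again):

```python
# Hypothetical simulation of the two commit orders above. `state` models the
# durable set of processed message ids; `outbox` the sent result messages.

def handle(msg, state, outbox, ack_log, order, crash_after=None):
    if msg in state:                  # idempotency check on resume
        return
    steps = {
        "at_least_once": ["fire", "store", "ack"],
        "at_most_once":  ["store", "fire", "ack"],
    }[order]
    for i, step in enumerate(steps, start=1):
        if step == "fire":
            outbox.append(f"result-of-{msg}")
        elif step == "store":
            state.add(msg)
        elif step == "ack":
            ack_log.append(msg)
        if crash_after == i:
            return                    # simulated crash
```

Crashing after step 1 and resuming shows the difference: the fire-first order duplicates the result (at-least-once), while the store-first order silently drops it (at-most-once).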

So, yes, Sagas are great, but still pretty easy to mess up.


Here's how you can do this. You have 3 accounts: A, B and in-flight.

1. Debit A, Credit in-flight.

2. Credit B, Debit in-flight.

If 1. fails, nothing happened and everything is consistent.

If 2. fails, you know (because you have money left on in-flight), and you can retry later, or refund A.

This way at no point your total balance decreases, so everything is always consistent.
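Sketched with an in-memory ledger (in a real system each step would be one atomic, double-entry transaction in its own database; the function names are made up):

```python
# Hypothetical sketch of the in-flight account scheme above. Each step is a
# balanced double entry, so the total balance never changes; money only sits
# in "in_flight" until step 2 lands (or is refunded).

def step1(ledger, src, amount):
    ledger[src] -= amount
    ledger["in_flight"] += amount

def step2(ledger, dst, amount):
    ledger["in_flight"] -= amount
    ledger[dst] += amount

def refund(ledger, src, amount):      # compensation if step 2 keeps failing
    ledger["in_flight"] -= amount
    ledger[src] += amount
```

After step 1 the in-flight balance tells you exactly which transfers are incomplete, which is what makes retry-or-refund possible after a crash.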


It can fail between the two operations around the comma in 1 and 2; then you are just as broke as everyone else.

This isn't an easy-to-solve problem when it comes to distributed computing.


It should be an atomic transaction, double-entry style, so it can’t fail between commas.

The important thing is not having money go missing.


The only way that is possible is if the tables exist on the same database server. Otherwise, you are right back at distributed transaction problems.


That's the whole point of double-entry. You don't split the entries.


Then this is off-topic af. We are talking about distributed systems here.


> Distributed locks and transactions are hell.

Which distributed transaction scenario have you ever dealt with that wasn't correctly handled by a two-phase commit or at worst a three-phase commit?


The scenario where one of the processes crashes cannot be handled by any number of commit phases.


Your event data must not be mutable.

That's kind of the first rule of any event-based system. It doesn't really matter the architecture, if you decide to name the things "event", everybody's head will break if you make them mutable.

If you decide to add mutation there in some way, you will need to rewrite the event stream, replacing entire events.


It's not about mutability of events, but about mutating the underlying data itself. If the event only says "customer 123 has been updated", and a consumer of that event goes back to the source of the event to query the full state of that customer 123, it may have been updated again (or even deleted) since the event was emitted. Depending on the use case, this may or may not be a problem. If the consumer is only interested in the current state of the data, this is typically acceptable, but if the complete history of changes is needed, it is not.


Making a wacky two-step announcement protocol doesn't change the nature of your events.

If the consumer goes to your database and asks "what's the data for customer 123 at event F52A?" it better always get back the same data or "that event doesn't exist, everything you know is wrong".


> ... at event F52A

Sure, if the database supports this sort of temporal query, then you're good with such id-only events. But that's not exactly the default for most databases / data models.


From what I understand, what you have isn't really "events", but some kind of "notifications".

Events are part of a stream that define your data. The stream doesn't have to be complete, but if it doesn't make sense to do things like buffer or edit it, it's probably something else and using that name will mislead people.


> (...) and a consumer of that event goes back to the source of the event to query the full state of that customer 123, it may have been updated again (or even deleted) since the event was emitted.

So the entity was updated. What's the problem?


I understand gp to say that the database data is changed not the data in the event.

Surely, some data needs to change if a password is updated?


> an event might have changed (or be deleted) between the time it was emitted

Then I would argue it isn't a meaningful event. If some attributes of the event could become "out of date" such that the logical event risks invalidation in the future, you have probably put too much data into the event.

For example, including a user's preferences (e.g., display name) in a logon event - while convenient - means that if those preferences ever change, the event is invalid to reference for those facts. If you only include the user's id, your event should be valid forever (for most rational systems).
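As a toy illustration of the two designs (field names are made up):

```python
# Hypothetical event payloads. The "fat" event embeds a mutable preference;
# the minimal event carries only an immutable identifier.

fat_logon_event = {
    "type": "user_logged_on",
    "user_id": 123,
    "display_name": "Alice",   # a preference: may change later, silently
}                              # invalidating old copies of this event

minimal_logon_event = {
    "type": "user_logged_on",
    "user_id": 123,            # immutable id: the event stays valid forever
}
```

Consumers of the minimal event must look the name up when they need it, which is exactly the trade-off discussed below.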

> your consumer has to query the service that dispatched the event

An unfortunate but necessary consequence of integrating multiple systems together. You can't take out a global database lock for every event emitted.


https://medium.com/geekculture/the-eight-fallacies-of-distri...

Also, CAP is a thing too.

Sure, try to keep transactions single-node. If you can't, let me give you the advice of people FAR smarter than I:

- DO NOT DESIGN YOUR OWN DISTRIBUTED TRANSACTION SERVICE

Use a vetted one.



