Yordis Prieto

The Outbox Pattern

First, you save an order to your database. Next, you publish an OrderPlaced event to a Broker. These are two operations on two different systems.

One of these operations will fail at some point.

If the save succeeds but the publish fails, the order exists but nobody knows about it. The shipping service never picks it up. The customer never gets a confirmation email. The order is a ghost.

If you flip the order of operations and publish first, the outcome is even worse. The event goes out. Consumers start acting on it. But if the save then fails, now the world thinks an order exists that does not actually exist. The shipping service packs a box for an order that was never recorded.
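To make the first failure concrete, here is a minimal Python sketch of the dual write. The names `FlakyBroker`, `BrokerDown`, and `place_order`, and the in-memory `orders` dict standing in for the database, are hypothetical and only serve the illustration.

```python
class BrokerDown(Exception):
    """Raised by the stand-in broker when a publish fails."""


class FlakyBroker:
    """Hypothetical in-memory broker that can be switched off."""

    def __init__(self, available=True):
        self.available = available
        self.events = []

    def publish(self, event):
        if not self.available:
            raise BrokerDown("broker unreachable")
        self.events.append(event)


def place_order(orders, broker, order_id):
    """Naive dual write: save first, then publish. Nothing ties the two together."""
    orders[order_id] = {"status": "placed"}                       # write 1: the "database"
    broker.publish({"type": "OrderPlaced", "orderId": order_id})  # write 2: the broker


orders = {}
broker = FlakyBroker(available=False)
try:
    place_order(orders, broker, "ord-8821")
except BrokerDown:
    pass  # the save already happened, and nothing here undoes it

# Orders that exist in the "database" but were never announced.
ghost_orders = [
    oid for oid in orders
    if not any(e["orderId"] == oid for e in broker.events)
]
print(ghost_orders)  # ['ord-8821']
```

Swapping the two lines inside `place_order` produces the opposite failure: an event in the broker with no order behind it.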

The Obvious Fixes Don't Work

A common first instinct is to wrap both operations in a try/catch and retry on failure. A retry gives you at-least-once behavior at best, not atomicity. It can publish the same event twice, and it still doesn't guarantee that both operations will ever finish.

Furthermore, a retry only works if the process is still running. If the Database commits, but the process crashes, the retry won’t happen. Now what?
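As a rough illustration of why a retry is not atomicity, the sketch below uses a hypothetical `LossyAckBroker` that stores the event but drops its first acknowledgement. The class, the `AckLost` exception, and `publish_with_retry` are made up for this example.

```python
class AckLost(Exception):
    """The broker stored the event, but the acknowledgement never arrived."""


class LossyAckBroker:
    """Hypothetical broker whose first `lost_acks` acknowledgements get dropped."""

    def __init__(self, lost_acks):
        self.lost_acks = lost_acks
        self.events = []

    def publish(self, event):
        self.events.append(event)   # the broker has the event...
        if self.lost_acks > 0:
            self.lost_acks -= 1
            raise AckLost()         # ...but the caller cannot tell


def publish_with_retry(broker, event, attempts=3):
    """Retry until one publish call returns without error."""
    for _ in range(attempts):
        try:
            broker.publish(event)
            return True
        except AckLost:
            continue                # looks like a failure, so try again
    return False


retry_broker = LossyAckBroker(lost_acks=1)
delivered = publish_with_retry(
    retry_broker, {"type": "OrderPlaced", "orderId": "ord-8821"}
)
print(delivered, len(retry_broker.events))  # True 2
```

The loop also lives only in process memory. If the process dies between the database commit and the first `publish` call, there is no loop left to run.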

We can try to send the message before the database transaction commits. But if the Broker receives the event before the transaction finishes, a fast consumer might pull it and query the database too soon, getting outdated data or no result at all. That race condition leaves your systems in inconsistent states that are hard to manage. And all of that assumes the database transaction succeeds. If it rolls back, the event describes something that never happened.
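The race can be reproduced with two connections to the same database. This is a minimal sketch using Python's built-in `sqlite3` module and a temporary file rather than the author's actual stack. The `service` and `consumer` connections and the trimmed-down `orders` table are assumptions for the example.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "shop.db")

service = sqlite3.connect(db_path)    # the service's connection
consumer = sqlite3.connect(db_path)   # a separate connection, like another process

service.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
service.commit()

# The service writes the order but has not committed yet.
service.execute("INSERT INTO orders (id, total) VALUES ('ord-8821', 59.38)")

# Imagine OrderPlaced was published here. A fast consumer reacts immediately:
seen_before_commit = consumer.execute(
    "SELECT id FROM orders WHERE id = 'ord-8821'"
).fetchall()

service.commit()

seen_after_commit = consumer.execute(
    "SELECT id FROM orders WHERE id = 'ord-8821'"
).fetchall()

print(seen_before_commit)  # []
print(seen_after_commit)   # [('ord-8821',)]
```

The consumer that acted on the early event found nothing, even though the order was about to exist.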

Lastly, the final boss: two-phase commit[3]. The Database and the Broker either commit together or roll back together. In practice, that means each system blocks while waiting for the other. Latency goes up. Availability goes down. Most Brokers don't support two-phase commit at all, and those that do pay for it in throughput.

We have Problems++ everywhere.

The problem isn't that you need both operations to happen at the same time. It's that you're treating two systems as one. The Database is reliable. The Broker is reliable. But the gap between them is where messages go to die.

Close the Gap

The Outbox pattern[1][2] works as follows:

  1. Your service opens a database transaction.
  2. The service inserts the business data and the outbox event as rows in that same transaction, then commits. Both rows land or neither does.
  3. A Relay process reads new rows from the outbox table and publishes them to the Broker.
  4. Once the Broker confirms receipt, the Relay updates the outbox row to "sent."

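The Broker might be down. The Relay can crash and restart. Either way, the event stays in the outbox until it's sent. If the Relay crashes after the Broker confirms but before the row is marked, the event is published again on restart, so consumers should tolerate duplicates.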
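Steps 3 and 4 can be sketched as a single Relay pass. This is a minimal illustration using Python's built-in `sqlite3` module, not a production Relay. The `sent` column, the `relay_once` function, and the two stand-in publish callbacks are assumptions made for the example.

```python
import sqlite3


def relay_once(conn, publish):
    """One Relay pass: publish unsent outbox rows in order, marking each as sent."""
    rows = conn.execute(
        "SELECT sequence, type, data FROM outbox WHERE sent = 0 ORDER BY sequence"
    ).fetchall()
    published = 0
    for sequence, event_type, data in rows:
        try:
            publish(event_type, data)   # returns only once the Broker confirms receipt
        except ConnectionError:
            break                       # Broker is down: leave the rest for the next pass
        conn.execute("UPDATE outbox SET sent = 1 WHERE sequence = ?", (sequence,))
        conn.commit()
        published += 1
    return published


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox ("
    "sequence INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT, data TEXT, "
    "sent INTEGER NOT NULL DEFAULT 0)"   # `sent` is an assumed status column
)
conn.execute(
    "INSERT INTO outbox (type, data) VALUES (?, ?)",
    ("com.myshop.orders.v1.OrderPlaced", '{"orderId": "ord-8821"}'),
)
conn.commit()

received = []


def broker_down(event_type, data):
    raise ConnectionError("broker unreachable")


def broker_up(event_type, data):
    received.append((event_type, data))


first_pass = relay_once(conn, broker_down)   # nothing is published, the row stays unsent
second_pass = relay_once(conn, broker_up)    # the same row goes out on the next pass
third_pass = relay_once(conn, broker_up)     # nothing left to send
print(first_pass, second_pass, third_pass)   # 0 1 0
```

A real Relay would call something like `relay_once` in a loop on an interval, or react to database change notifications instead of polling.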

Here's what that transaction may look like:

BEGIN;

INSERT INTO orders (id, customer_id, total)
VALUES ('ord-8821', 'cust-441', 59.38);

INSERT INTO outbox (id, type, source, partitionkey, sequence, data, time, metadata)
VALUES (
  gen_random_uuid(),                     -- id
  'com.myshop.orders.v1.OrderPlaced',    -- type
  'orders/ord-8821',                     -- source
  'ord-8821',                            -- partitionkey
  DEFAULT,                               -- sequence (auto-increment)
  '{"orderId": "ord-8821"}',             -- data
  now(),                                 -- time
  '{"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}' -- metadata
);

COMMIT;

One transaction. One database. One commit. All or nothing.
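From application code, the same all-or-nothing behavior might look like the sketch below. It uses Python's built-in `sqlite3` module in place of the database above, with trimmed-down tables. The `place_order` helper and the forced failure (a NULL payload hitting a NOT NULL column) are assumptions chosen to show the rollback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer_id TEXT, total REAL)")
conn.execute(
    "CREATE TABLE outbox ("
    "sequence INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT NOT NULL, data TEXT NOT NULL)"
)
conn.commit()


def place_order(conn, order_id, customer_id, total, event_data):
    """Insert the order and its outbox event in one transaction."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute(
            "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
            (order_id, customer_id, total),
        )
        conn.execute(
            "INSERT INTO outbox (type, data) VALUES (?, ?)",
            ("com.myshop.orders.v1.OrderPlaced", event_data),
        )


place_order(conn, "ord-8821", "cust-441", 59.38, '{"orderId": "ord-8821"}')

try:
    # A NULL payload violates NOT NULL on outbox.data, so the second insert fails.
    place_order(conn, "ord-8822", "cust-441", 12.00, None)
except sqlite3.IntegrityError:
    pass

order_ids = [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]
outbox_count = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
print(order_ids, outbox_count)  # ['ord-8821'] 1
```

The failed outbox insert took the `ord-8822` order row down with it, so there is no order without an event and no event without an order.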

Try It

[Interactive sequence diagram: Service, Database, Broker, and Consumer, showing the flow when the Service publishes directly to the Broker.]

Write to database

  1. Service saves the order to the database.
  2. Database confirms the write.

Publish to broker

  3. Service publishes the event to the broker.
  4. Broker acknowledges the event.
  5. Consumer polls the broker for new events.
  6. Broker delivers the event to the consumer.
  7. Consumer acknowledges processing.


When to Reach for It

  • Database + Broker. You write to a database and need to publish events to a Broker, and fire-and-forget isn't acceptable.

What It Costs

  • Operational overhead. Monitor the Relay process. Set up alerts and a restart plan.
  • Storage. Outbox rows take up space until something cleans them up.
  • Latency. The Relay polls on an interval, so events don't publish the instant they're written.
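The storage cost is usually handled by a periodic cleanup job. A minimal sketch, again with Python's built-in `sqlite3` module: the `purge_sent` function, the `sent` flag, and a numeric `time` column holding Unix seconds are assumptions for this example, not part of the schema shown earlier.

```python
import sqlite3
import time


def purge_sent(conn, keep_seconds, now=None):
    """Delete sent outbox rows older than the retention window; return how many went."""
    now = time.time() if now is None else now
    with conn:  # one transaction for the whole delete
        cursor = conn.execute(
            "DELETE FROM outbox WHERE sent = 1 AND time < ?",
            (now - keep_seconds,),
        )
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox ("
    "sequence INTEGER PRIMARY KEY, sent INTEGER NOT NULL, time REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO outbox (sequence, sent, time) VALUES (?, ?, ?)",
    [
        (1, 1, 1_000.0),   # sent long ago: safe to delete
        (2, 1, 9_990.0),   # sent recently: kept for the retention window
        (3, 0, 1_000.0),   # old but never sent: must not be deleted
    ],
)
conn.commit()

removed = purge_sent(conn, keep_seconds=3_600, now=10_000.0)
remaining = [row[0] for row in conn.execute("SELECT sequence FROM outbox ORDER BY sequence")]
print(removed, remaining)  # 1 [2, 3]
```

Filtering on `sent = 1` matters more than the age check: an old row that was never published is exactly the event the pattern exists to protect.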

The Principle

You cannot commit to two different systems at the same time. The moment you write to a database and publish to a Broker as separate operations, you have a gap. A failure can always occur between these two steps.

The gap between your database and your broker is real. The outbox just makes sure nothing falls into it.

References

  [1] microservices.io
  [2]
  [3]
  [4]
  [5]
  [6] wolverine.netlify.app
  [7] masstransit.io

Talk to you later 🐊 alligator.
