Transactional Outbox: Atomicity in Microservices with Debezium & Kafka

15 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Specter of Dual-Writes in Distributed Systems

In any non-trivial microservices architecture, the need to both persist a state change and notify other services of that change is a ubiquitous requirement. A classic example is an OrderService: when an order is created, it must insert a record into its orders table and publish an OrderCreated event to a message bus like Kafka. Other services, such as NotificationService or InventoryService, subscribe to this event to perform their respective duties.

The naive implementation looks something like this:

java
// WARNING: THIS IS AN ANTI-PATTERN
@Transactional
public void createOrder(OrderData data) {
    // 1. Save to the database
    Order order = new Order(data);
    orderRepository.save(order);

    // 2. Publish event to Kafka
    OrderCreatedEvent event = new OrderCreatedEvent(order);
    kafkaTemplate.send("order_events", event);
}

This code is a ticking time bomb. It suffers from the dual-write problem. The database save and the Kafka send are two separate, non-atomic operations. Consider the failure modes:

  • Database Commit Fails: The transaction rolls back. No order is saved, and no event is sent. This is the only 'safe' failure.
  • Database Commit Succeeds, but Kafka Send Fails: The order is saved in the database, but the event is never published. The rest of the system is oblivious to the new order. Inventory is not updated, notifications are not sent. Data consistency is lost.
  • Database Commit Succeeds, Kafka Send Succeeds, but the Service Crashes Before Returning: The caller might retry the operation, leading to a duplicate order and duplicate events, unless perfect idempotency is implemented at the API layer.

Traditional distributed transaction protocols like two-phase commit (2PC) could bridge this gap, but they are slow, operationally complex, and introduce tight coupling between services and the message broker, which makes them an anti-pattern for scalable microservices.

    We need a mechanism that guarantees atomicity: either both the database write and the event publication conceptually succeed, or they both fail. This is precisely what the Transactional Outbox pattern provides.

    Architecture Deep Dive: The Transactional Outbox Pattern

    The pattern's brilliance lies in its simplicity. Instead of directly publishing an event to a message broker, we persist the event in an outbox table within the same local database transaction as the business entity itself.

    Here's the refined workflow:

  1. Begin Transaction: A database transaction is initiated.
  2. Business Logic: The service executes its business logic, creating or updating business entities (e.g., an Order).
  3. Persist Event: The service creates an event object representing the change and inserts it into an outbox table.
  4. Commit Transaction: The transaction is committed. This single, atomic action saves both the business data and the outbox event. The atomicity is now guaranteed by the ACID properties of the relational database.

    At this point, the service's primary responsibility is complete. A separate, asynchronous process is now responsible for relaying the event from the outbox table to the message broker.
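
    Conceptually, the entire write path is one local transaction. Here is a minimal SQL sketch with placeholder values, using the outbox_events schema defined later in Step 2:

    sql
    BEGIN;

    -- Business data and the outbox event are written in the same transaction
    INSERT INTO orders (id, customer_id, order_total, created_at)
    VALUES ('6b9f2c1e-0000-0000-0000-000000000001', 'customer-42', 99.90, now());

    INSERT INTO outbox_events (id, aggregate_type, aggregate_id, event_type, payload, created_at)
    VALUES (gen_random_uuid(), 'Order', '6b9f2c1e-0000-0000-0000-000000000001', 'OrderCreated',
            '{"orderId": "6b9f2c1e-0000-0000-0000-000000000001", "customerId": "customer-42", "total": 99.90}',
            now());

    COMMIT; -- either both rows become visible, or neither does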

    mermaid
    graph TD
        A[Client] --> B(Order Service API);
        subgraph "DB Transaction (Atomic)"
            B --> C{Create Order Logic};
            C --> D[INSERT INTO orders];
            C --> E[INSERT INTO outbox_events];
        end
        subgraph "Asynchronous Relay"
            F(Database WAL) -- Debezium CDC --> G(Kafka Connect);
            G -- Publishes Event --> H(Kafka Topic);
        end
        E -- Triggers DB Log --> F;
        H --> I[Inventory Service];
        H --> J[Notification Service];

    The Message Relay: Polling vs. Change Data Capture (CDC)

    How do we get the event from the outbox table to Kafka? Two primary approaches exist:

  • Polling Publisher: A separate thread or process in your service periodically polls the outbox table for unprocessed events, publishes them to Kafka, and then marks them as processed. This works, but it has drawbacks: polling latency, resource consumption from constant querying, and complexity in ensuring exactly-once processing and avoiding race conditions in a scaled-out service. A minimal sketch of this approach follows this list.
  • Change Data Capture (CDC): This is the superior, production-grade approach. We can tail the database's transaction log (e.g., PostgreSQL's Write-Ahead Log, or WAL) to capture committed changes in real time. Tools like Debezium are built for this. Debezium is a distributed CDC platform that runs on Kafka Connect: it monitors the database transaction log and, upon seeing a new row inserted into our outbox table, generates a corresponding event and publishes it to a Kafka topic. This is highly efficient, low-latency, and completely decouples the event publishing mechanism from our service's business logic.

    This article focuses exclusively on the CDC approach with Debezium, as it represents the most robust and scalable implementation.
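
    For comparison, here is a minimal sketch of a polling publisher. It is not part of the implementation below, and it assumes a hypothetical processed flag on the outbox entity plus a derived findByProcessedFalse() query:

    java
    // OutboxPoller.java -- polling alternative, shown for comparison only
    @Component
    @RequiredArgsConstructor
    public class OutboxPoller {

        private final OutboxRepository outboxRepository;
        private final KafkaTemplate<String, String> kafkaTemplate;

        @Scheduled(fixedDelay = 1000) // polling interval == worst-case publishing latency
        @Transactional
        public void publishPendingEvents() {
            for (OutboxEvent event : outboxRepository.findByProcessedFalse()) {
                try {
                    // Block until the broker acknowledges, so we only mark events
                    // that were actually handed to Kafka (still at-least-once overall).
                    kafkaTemplate.send(event.getEventType() + "_events",
                                       event.getAggregateId(),
                                       event.getPayload()).get();
                } catch (Exception e) {
                    throw new IllegalStateException("Failed to publish outbox event " + event.getId(), e);
                }
                event.setProcessed(true); // flushed when the surrounding transaction commits
            }
        }
    }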

    Production Implementation: PostgreSQL, Debezium, and Spring Boot

    Let's build a complete, runnable example. Our stack will be:

    * Database: PostgreSQL

    * Message Broker: Apache Kafka

    * CDC Platform: Debezium running on Kafka Connect

    * Application: Spring Boot with JPA/Hibernate

    Step 1: Environment Setup with Docker Compose

    First, we need to configure our infrastructure. A key prerequisite for Debezium with PostgreSQL is setting wal_level = logical. This instructs PostgreSQL to write enough information to the WAL to enable logical decoding.

    docker-compose.yml

    yaml
    version: '3.8'
    
    services:
      zookeeper:
        image: confluentinc/cp-zookeeper:7.3.0
        container_name: zookeeper
        environment:
          ZOOKEEPER_CLIENT_PORT: 2181
          ZOOKEEPER_TICK_TIME: 2000
    
      kafka:
        image: confluentinc/cp-kafka:7.3.0
        container_name: kafka
        depends_on:
          - zookeeper
        ports:
          - "9092:9092"
        environment:
          KAFKA_BROKER_ID: 1
          KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
          KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
          KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
          KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
          KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
          KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
          KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
    
      postgres:
        image: debezium/postgres:14
        container_name: postgres
        ports:
          - "5432:5432"
        environment:
          - POSTGRES_USER=user
          - POSTGRES_PASSWORD=password
          - POSTGRES_DB=order_db
        command: >
          postgres -c wal_level=logical
    
      connect:
        image: debezium/connect:2.1
        container_name: connect
        ports:
          - "8083:8083"
        depends_on:
          - kafka
          - postgres
        environment:
          - BOOTSTRAP_SERVERS=kafka:29092
          - GROUP_ID=1
          - CONFIG_STORAGE_TOPIC=my_connect_configs
          - OFFSET_STORAGE_TOPIC=my_connect_offsets
          - STATUS_STORAGE_TOPIC=my_connect_statuses
          - CONNECT_KEY_CONVERTER=org.apache.kafka.connect.json.JsonConverter
          - CONNECT_VALUE_CONVERTER=org.apache.kafka.connect.json.JsonConverter

    Run docker-compose up -d to launch the environment.
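
    Once the containers are up, you can verify that the setting took effect (container name and credentials as defined in the compose file above):

    bash
    docker exec -it postgres psql -U user -d order_db -c "SHOW wal_level;"
    # Expected output: logical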

    Step 2: Database Schema

    In our order_db database, we'll create two tables: orders for our business data and outbox_events for the events we want to publish.

    sql
    CREATE TABLE orders (
        id UUID PRIMARY KEY,
        customer_id VARCHAR(255) NOT NULL,
        order_total DECIMAL(10, 2) NOT NULL,
        created_at TIMESTAMPTZ NOT NULL
    );
    
    CREATE TABLE outbox_events (
        id UUID PRIMARY KEY,
        aggregate_type VARCHAR(255) NOT NULL, -- e.g., 'Order'
        aggregate_id VARCHAR(255) NOT NULL,   -- e.g., the order ID
        event_type VARCHAR(255) NOT NULL,     -- e.g., 'OrderCreated'
        payload JSONB NOT NULL,               -- The actual event data
        created_at TIMESTAMPTZ NOT NULL
    );

    Step 3: Configuring the Debezium Connector

    With Kafka Connect running, we can configure the Debezium PostgreSQL connector via its REST API on port 8083. This is where the magic happens. We'll use two of its more advanced features:

    * table.include.list: We instruct Debezium to monitor only our public.outbox_events table.

    * transforms: We use a Single Message Transform (SMT), Debezium's outbox event router (EventRouter), to reshape the raw CDC event into the actual domain event we want to publish.

    debezium-pg-connector.json

    json
    {
      "name": "outbox-connector",
      "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "user",
        "database.password": "password",
        "database.dbname": "order_db",
        "database.server.name": "pg-orders-server",
        "table.include.list": "public.outbox_events",
        "tombstones.on.delete": "false",
        "transforms": "outbox",
        "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
        "transforms.outbox.route.by.field": "event_type",
        "transforms.outbox.route.topic.replacement": "${routedByValue}_events",
        "transforms.outbox.table.field.event.key": "aggregate_id",
        "transforms.outbox.table.field.event.payload": "payload"
      }
    }

    Let's break down the critical transforms.outbox.* configuration:

    * transforms.outbox.type: Specifies Debezium's built-in EventRouter SMT. This SMT is designed specifically for the outbox pattern.

    * transforms.outbox.route.by.field: Tells the router to look at the event_type column in the outbox_events table to determine the destination topic.

    * transforms.outbox.route.topic.replacement: This is a powerful expression. It takes the value from the event_type field (e.g., "OrderCreated") and constructs the destination topic name, resulting in OrderCreated_events.

    * transforms.outbox.table.field.event.key: Sets the Kafka message key to the value of the aggregate_id column. This is CRITICAL for ordering guarantees, as it ensures all events for the same aggregate (e.g., the same order) land in the same Kafka partition.

    * transforms.outbox.table.field.event.payload: Instructs the router to use the payload column's content as the Kafka message's value, effectively unwrapping the event from the outbox structure.

    To register this connector, execute:

    bash
    curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" \
    localhost:8083/connectors/ -d @debezium-pg-connector.json
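
    After registration, confirm that the connector and its task are healthy (this is the same /connectors/{name}/status endpoint referenced in the monitoring section later):

    bash
    curl -s localhost:8083/connectors/outbox-connector/status
    # Look for "state": "RUNNING" on both the connector and its task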

    Step 4: The Spring Boot Application

    Now for the application code that performs the atomic write.

    pom.xml (dependencies)

    xml
    <!-- Spring Boot, Web, Data JPA, PostgreSQL Driver -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
    </dependency>
    <dependency>
        <groupId>com.vladmihalcea</groupId>
        <artifactId>hibernate-types-60</artifactId>
        <version>2.21.1</version>
    </dependency>

    Note: hibernate-types is used for convenient JSONB mapping.

    JPA Entities

    java
    // Order.java
    @Entity
    @Table(name = "orders")
    public class Order {
        @Id
        private UUID id;
        private String customerId;
        private BigDecimal orderTotal;
        private Instant createdAt;
        // Constructors, getters, setters
    }
    
    // OutboxEvent.java
    @Entity
    @Table(name = "outbox_events")
    public class OutboxEvent {
        @Id
        private UUID id;
        private String aggregateType;
        private String aggregateId;
        private String eventType;
    
        @Type(JsonType.class) // from hibernate-types
        @Column(columnDefinition = "jsonb")
        private String payload; // Store payload as a JSON string
    
        private Instant createdAt;
        // Constructors, getters, setters
    }
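
    The service in the next section injects two Spring Data repositories. They are plain JpaRepository interfaces; a minimal sketch:

    java
    // OrderRepository.java
    public interface OrderRepository extends JpaRepository<Order, UUID> {}

    // OutboxRepository.java
    public interface OutboxRepository extends JpaRepository<OutboxEvent, UUID> {}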

    The Transactional Service

    This is the core of the application-side implementation. The @Transactional annotation ensures that both orderRepository.save() and outboxRepository.save() are part of the same atomic database transaction.

    java
    // OrderService.java
    @Service
    @RequiredArgsConstructor
    public class OrderService {
    
        private final OrderRepository orderRepository;
        private final OutboxRepository outboxRepository;
        private final ObjectMapper objectMapper;
    
        @Transactional
        public Order createOrder(CreateOrderRequest request) {
            // 1. Create and save the business entity
            Order order = new Order(
                UUID.randomUUID(),
                request.getCustomerId(),
                request.getOrderTotal(),
                Instant.now()
            );
            orderRepository.save(order);
    
            // 2. Create and save the outbox event within the same transaction
            OrderCreatedEvent eventPayload = new OrderCreatedEvent(
                order.getId(),
                order.getCustomerId(),
                order.getOrderTotal()
            );
    
            OutboxEvent outboxEvent = new OutboxEvent(
                UUID.randomUUID(),
                "Order",
                order.getId().toString(),
                "OrderCreated",
                convertToJson(eventPayload),
                Instant.now()
            );
            outboxRepository.save(outboxEvent);
    
            return order;
        }
    
        private String convertToJson(Object payload) {
            try {
                return objectMapper.writeValueAsString(payload);
            } catch (JsonProcessingException e) {
                throw new RuntimeException("Error serializing payload to JSON", e);
            }
        }
    }
    
    // DTO for the event payload
    record OrderCreatedEvent(UUID orderId, String customerId, BigDecimal total) {}

    When createOrder is called, rows are committed to the orders and outbox_events tables in a single atomic transaction. Debezium, tailing the WAL, will see the insertion into outbox_events, process it through the EventRouter SMT, and publish a clean OrderCreated event to the OrderCreated_events Kafka topic with the order ID as the message key. Atomicity is achieved.
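
    To see the end result, you can watch the routed topic with a console consumer from inside the kafka container (the topic name follows from the route.topic.replacement expression configured earlier):

    bash
    docker exec -it kafka kafka-console-consumer \
      --bootstrap-server kafka:29092 \
      --topic OrderCreated_events \
      --from-beginning \
      --property print.key=true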

    Advanced Considerations and Edge Cases

    Implementing the pattern correctly requires thinking beyond the happy path.

    1. Idempotent Consumers

    The entire Outbox/CDC pipeline provides at-least-once delivery semantics. Kafka Connect or Kafka itself could, in rare failure scenarios, redeliver a message. Therefore, all downstream consumers must be idempotent.

    * Strategy: The id of the OutboxEvent is a perfect unique identifier for the event. Consumers should track the IDs of events they have already processed. A simple approach is to store the eventId in a table (e.g., processed_events) within the same transaction as the consumer's business logic.

    java
    // In a consumer service (e.g., NotificationService)
    @Transactional
    public void handleOrderCreated(OrderCreatedEvent event, UUID eventId) {
        // Check if event has already been processed
        if (processedEventRepository.existsById(eventId)) {
            log.warn("Duplicate event received, ignoring: {}", eventId);
            return;
        }
    
        // Business logic for handling the event
        sendEmailNotification(event.customerId());
    
        // Mark event as processed
        processedEventRepository.save(new ProcessedEvent(eventId, Instant.now()));
    }
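
    The processedEventRepository used above is assumed rather than shown; a minimal sketch of the backing entity and repository might look like this:

    java
    // ProcessedEvent.java -- hypothetical entity backing the idempotency check
    @Entity
    @Table(name = "processed_events")
    public class ProcessedEvent {

        @Id
        private UUID eventId;
        private Instant processedAt;

        protected ProcessedEvent() {} // required by JPA

        public ProcessedEvent(UUID eventId, Instant processedAt) {
            this.eventId = eventId;
            this.processedAt = processedAt;
        }
    }

    // ProcessedEventRepository.java
    public interface ProcessedEventRepository extends JpaRepository<ProcessedEvent, UUID> {}

    Because eventId is the primary key, a concurrent duplicate insert fails and rolls back that consumer's transaction, which adds a second line of defense beyond the existsById check.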

    2. Event Schema Evolution and Schema Registry

    Storing payloads as JSONB is flexible but offers no schema enforcement. In a mature system, this is risky. A change in the producer can break all consumers.

    * Solution: Integrate a Schema Registry (like Confluent Schema Registry) and use a structured format like Avro or Protobuf. Debezium has first-class support for this.

    Your Kafka Connect configuration would be updated:

    json
    {
      "name": "outbox-connector-avro",
      "config": {
        // ... other properties
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://schema-registry:8081",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://schema-registry:8081",
        // ... transforms config
      }
    }

    This requires your producer to serialize the payload to Avro binary format (with the schema ID) before storing it in the outbox_events.payload column (which would now be a bytea type). The EventRouter SMT can handle this, but it requires more complex configuration, often involving custom converters if the payload field itself isn't raw Avro.
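
    As a rough illustration only (the schema and registry URL here are assumptions, and this is not a drop-in for the JPA code above), the producer side might serialize the payload with Confluent's KafkaAvroSerializer and store the resulting bytes, which embed the registered schema ID, in the payload column:

    java
    // Sketch: produce Confluent Avro wire format (magic byte + schema ID + Avro binary)
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"OrderCreated\",\"fields\":["
      + "{\"name\":\"orderId\",\"type\":\"string\"},"
      + "{\"name\":\"customerId\",\"type\":\"string\"}]}");

    GenericRecord avroRecord = new GenericData.Record(schema);
    avroRecord.put("orderId", order.getId().toString());
    avroRecord.put("customerId", order.getCustomerId());

    try (KafkaAvroSerializer serializer = new KafkaAvroSerializer()) {
        serializer.configure(
            Map.of("schema.registry.url", "http://schema-registry:8081"), /* isKey */ false);
        byte[] avroPayload = serializer.serialize("OrderCreated_events", avroRecord);
        // avroPayload is what would be stored in outbox_events.payload (now bytea)
    }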

    3. Handling Deletes and Tombstone Events

    What happens when an event is processed and you want to clean up the outbox_events table? If you simply DELETE the row, Debezium will, by default, publish a null message (a tombstone) to the Kafka topic. This is often desired for log-compacted topics, as it signals consumers to delete their local state for that key.

    However, for an outbox, you usually don't want a tombstone. You've already processed the event. That's why we set "tombstones.on.delete": "false" in our connector config. This tells Debezium to simply ignore DELETE operations on the outbox table. A separate, periodic cleanup job can then safely purge old, processed events from the outbox_events table without generating unwanted Kafka messages.
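
    Since Debezium only needs to observe the INSERT, the cleanup can be a simple scheduled statement; the retention window below is an arbitrary example:

    sql
    -- With tombstones.on.delete=false these DELETEs produce no Kafka messages
    DELETE FROM outbox_events WHERE created_at < now() - interval '7 days';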

    4. Monitoring and Operational Health

    This architecture introduces a critical component: Kafka Connect/Debezium. It must be monitored.

    * PostgreSQL Replication Slots: Debezium creates a logical replication slot on PostgreSQL. This slot prevents the WAL from being purged until Debezium has consumed it. If the Debezium connector is down for an extended period, the WAL can grow indefinitely, eventually filling the database server's disk. You must monitor the age and size of the replication slot.

    sql
    SELECT slot_name, active, wal_status,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replication_lag
    FROM pg_replication_slots;

    * Kafka Connect Lag: Monitor the Kafka Connect REST API (/connectors/{name}/status) and JMX metrics to track connector and task status. High lag can indicate performance issues in the connector, Kafka, or the network.

    Conclusion: Robustness Through Decoupling

    The Transactional Outbox pattern, when implemented with a powerful CDC tool like Debezium, is the definitive solution to the dual-write problem in event-driven microservices. It elegantly leverages the ACID guarantees of the local database to provide an atomic guarantee for state changes and event publication.

    While it introduces operational complexity—requiring management of Kafka Connect and monitoring of database replication logs—the trade-off is a massive gain in system resilience and data consistency. By decoupling the act of event persistence from the act of event transport, you create a system that can withstand message broker outages, service crashes, and network partitions without losing a single critical event. For senior engineers building mission-critical distributed systems, mastering this pattern isn't just a best practice; it's a necessity.
