Event sourcing with CQRS for mobile backends
Meta description: How to implement event sourcing with CQRS using PostgreSQL and Kafka for mobile app backends, with concrete schema designs and failure analysis.
TL;DR
Event sourcing paired with CQRS gives your mobile backend a complete audit trail, temporal queries, and the ability to rebuild read models from scratch — but it demands disciplined schema design and a clear snapshotting strategy. This post walks through an order system using PostgreSQL as the event store, Kafka for async command processing, and concrete failure mode analysis so you ship this pattern without the usual landmines.
Why event sourcing pays off non-linearly
Paul Graham writes about superlinear returns — how certain investments compound non-linearly over time. Event sourcing is one of those bets. Early on, it feels like overhead. Six months in, when compliance asks for a full audit trail, when a PM wants to know “what did the cart look like before the user removed item X,” or when you need to rebuild an entire read model after a schema migration — that upfront investment pays for itself many times over.
I’ve built a few production systems for mobile-first products, and the order domain is where event sourcing fits most naturally. Orders are inherently state machines with high regulatory and business visibility.
The event store: PostgreSQL schema design
Resist the temptation to reach for a specialized event store database. PostgreSQL handles this workload well up to tens of billions of events with proper partitioning.
```sql
CREATE TABLE event_store (
    event_id       UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_id   UUID NOT NULL,
    aggregate_type VARCHAR(128) NOT NULL,
    event_type     VARCHAR(256) NOT NULL,
    event_data     JSONB NOT NULL,
    metadata       JSONB DEFAULT '{}',
    version        INTEGER NOT NULL,
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (aggregate_id, version)
);

-- The UNIQUE constraint already backs lookups by (aggregate_id, version),
-- so no separate index on those columns is needed.
CREATE INDEX idx_event_store_type_time ON event_store (event_type, created_at);
```
The UNIQUE (aggregate_id, version) constraint is your optimistic concurrency control. Two concurrent commands on the same aggregate will conflict at the database level — no distributed locks required.
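In application code, that conflict surfaces as a unique-violation error. Here is a minimal sketch of the append path over plain JDBC; the function and exception names are illustrative, not from any particular framework:

```kotlin
import java.sql.Connection
import java.sql.SQLException
import java.util.UUID

class ConcurrentModification(message: String) : RuntimeException(message)

// Appends one event, letting UNIQUE (aggregate_id, version) reject
// concurrent writers. expectedVersion is the version the command handler
// observed when it loaded the aggregate.
fun appendEvent(
    conn: Connection,
    aggregateId: UUID,
    aggregateType: String,
    eventType: String,
    eventDataJson: String,
    expectedVersion: Int,
) {
    val sql = """
        INSERT INTO event_store (aggregate_id, aggregate_type, event_type, event_data, version)
        VALUES (?, ?, ?, ?::jsonb, ?)
    """.trimIndent()
    try {
        conn.prepareStatement(sql).use { st ->
            st.setObject(1, aggregateId)
            st.setString(2, aggregateType)
            st.setString(3, eventType)
            st.setString(4, eventDataJson)
            st.setInt(5, expectedVersion + 1)
            st.executeUpdate()
        }
    } catch (e: SQLException) {
        // 23505 = unique_violation: another command won the race on this aggregate
        if (e.sqlState == "23505") {
            throw ConcurrentModification("Aggregate $aggregateId changed; reload and retry")
        }
        throw e
    }
}
```

A common policy is to reload the aggregate, re-validate, and retry the command a bounded number of times server-side, returning 409 Conflict to the mobile client only if the command still fails.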
Event payload example
```json
{
  "event_type": "OrderItemAdded",
  "aggregate_id": "ord-881a-...",
  "version": 3,
  "event_data": {
    "sku": "WIDGET-42",
    "quantity": 2,
    "unit_price_cents": 1499,
    "added_by": "user-7f3c..."
  },
  "metadata": {
    "correlation_id": "req-a1b2...",
    "device": "iOS/17.4",
    "ip": "redacted"
  }
}
```
CQRS: separating writes from reads
Most teams get CQRS wrong in the same way: they commit to “eventual consistency” for their read model without defining what “eventually” actually means for their mobile clients. Pin it down. For us, it was under 200ms.
| Concern | Command side | Query side |
|---|---|---|
| Storage | event_store (append-only) | order_projections (mutable, denormalized) |
| Consistency | Strongly consistent per aggregate | Eventually consistent (typically <200ms) |
| Scaling | Vertical (single writer per aggregate) | Horizontal (read replicas, caching) |
| Schema changes | Never — events are immutable | Rebuild from event log anytime |
| Mobile API pattern | POST commands, return command ID | GET queries with If-None-Match / ETag |
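That consistency budget only means something if you measure it. Here is a simple lag probe against the schemas in this post (the projection table is defined just below); the function name and metric shape are mine:

```kotlin
import java.sql.Connection

// Age in milliseconds of the oldest event not yet applied to its
// projection; 0 when every projection is caught up. Feed this into your
// metrics pipeline and alert when it breaches the SLO.
fun maxProjectionLagMs(conn: Connection): Long {
    val sql = """
        SELECT coalesce(extract(epoch FROM max(now() - e.created_at)) * 1000, 0)::bigint AS lag_ms
          FROM event_store e
          JOIN order_projections p ON p.order_id = e.aggregate_id
         WHERE e.version > p.last_event_version
    """.trimIndent()
    return conn.prepareStatement(sql).use { st ->
        st.executeQuery().use { rs ->
            rs.next()
            rs.getLong("lag_ms")
        }
    }
}
```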
The projection table is purpose-built for your mobile API’s read patterns:
```sql
CREATE TABLE order_projections (
    order_id           UUID PRIMARY KEY,
    user_id            UUID NOT NULL,
    status             VARCHAR(32) NOT NULL,
    total_cents        BIGINT NOT NULL,
    item_count         INTEGER NOT NULL,
    last_event_version INTEGER NOT NULL,  -- high-water mark; enables idempotent projection
    projected_at       TIMESTAMPTZ NOT NULL,
    snapshot           JSONB NOT NULL     -- denormalized order payload served to clients
);
```
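Because Kafka delivers at least once, the projector maintaining this table should be idempotent from day one. A sketch for OrderItemAdded, assuming the fields from the payload example above (the snapshot column update is omitted for brevity); the version guard in the WHERE clause turns duplicates and out-of-order deliveries into no-ops:

```kotlin
import java.sql.Connection
import java.util.UUID

// Applies OrderItemAdded to the projection. The guard
// last_event_version = version - 1 enforces strictly in-order application:
// a duplicate or out-of-order event updates zero rows and can be
// acknowledged or retried safely.
fun projectOrderItemAdded(
    conn: Connection,
    orderId: UUID,
    version: Int,
    quantity: Int,
    unitPriceCents: Long,
): Boolean {
    val sql = """
        UPDATE order_projections
           SET total_cents        = total_cents + ? * ?,
               item_count         = item_count + ?,
               last_event_version = ?,
               projected_at       = now()
         WHERE order_id = ?
           AND last_event_version = ? - 1
    """.trimIndent()
    return conn.prepareStatement(sql).use { st ->
        st.setLong(1, unitPriceCents)
        st.setInt(2, quantity)
        st.setInt(3, quantity)
        st.setInt(4, version)
        st.setObject(5, orderId)
        st.setInt(6, version)
        st.executeUpdate() == 1  // false: duplicate or out-of-order delivery
    }
}
```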
Kafka integration: async command processing
Commands arrive via your mobile API, get validated, and publish domain events to Kafka. Projectors consume these events to update read models.
```
Mobile App → API Gateway → Command Handler → event_store (PG)
                                           → Kafka topic: order.events
                                                    ↓
                                  Projector → order_projections (PG)
                                  Notifier  → Push Notification Service
```
Write to PostgreSQL first, then publish to Kafka. Use the transactional outbox pattern: write events and an outbox row in a single PG transaction, then a poller or CDC (Debezium) pushes to Kafka. This avoids dual-write inconsistencies, which is the number one failure mode I’ve seen in production event-sourced systems. Do not skip this step.
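A minimal sketch of that transaction, extending the append example from earlier with the outbox row. It assumes a hypothetical outbox table with (id, topic, payload) columns that a relay drains into Kafka; that table isn't shown above, so treat its shape as illustrative:

```kotlin
import java.sql.Connection
import java.util.UUID

// Event and outbox row commit atomically. If the process dies between
// commit and Kafka publish, the relay picks the row up later; Kafka may
// then deliver it twice, which is why projectors must be idempotent.
fun appendEventWithOutbox(
    conn: Connection,
    aggregateId: UUID,
    eventType: String,
    eventDataJson: String,
    newVersion: Int,
) {
    conn.autoCommit = false
    try {
        conn.prepareStatement(
            """INSERT INTO event_store (aggregate_id, aggregate_type, event_type, event_data, version)
               VALUES (?, 'Order', ?, ?::jsonb, ?)"""
        ).use { st ->
            st.setObject(1, aggregateId)
            st.setString(2, eventType)
            st.setString(3, eventDataJson)
            st.setInt(4, newVersion)
            st.executeUpdate()
        }
        conn.prepareStatement(
            """INSERT INTO outbox (id, topic, payload)
               VALUES (gen_random_uuid(), 'order.events', ?::jsonb)"""
        ).use { st ->
            st.setString(1, eventDataJson)
            st.executeUpdate()
        }
        conn.commit()  // poller or Debezium relays outbox rows to Kafka
    } catch (e: Exception) {
        conn.rollback()
        throw e
    }
}
```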
Snapshotting for read-model performance
Without snapshots, rebuilding an aggregate with 500 events means replaying all 500 on every command. The numbers speak for themselves:
| Events per aggregate | Replay without snapshot | Replay with snapshot (every 50) |
|---|---|---|
| 50 | ~4ms | ~4ms |
| 200 | ~16ms | ~4ms |
| 1,000 | ~82ms | ~5ms |
| 10,000 | ~810ms | ~6ms |
Benchmarked on PostgreSQL 16, m6i.xlarge, JSONB deserialization in Kotlin/JVM.
```sql
CREATE TABLE aggregate_snapshots (
    aggregate_id   UUID NOT NULL,
    aggregate_type VARCHAR(128) NOT NULL,
    version        INTEGER NOT NULL,
    state          JSONB NOT NULL,
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (aggregate_id, version)
);
```
Snapshot every N events (50 is a solid default). On aggregate load: fetch the latest snapshot, then replay only the events after it.
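Here is what that load path can look like, with a deliberately minimal OrderState standing in for your real aggregate; fromJson and apply are placeholders for your own serialization and event-application logic:

```kotlin
import java.sql.Connection
import java.util.UUID

// Minimal stand-in for the real aggregate root.
data class OrderState(val orderId: UUID, val version: Int, val itemCount: Int) {
    fun apply(eventType: String, eventDataJson: String): OrderState = when (eventType) {
        "OrderItemAdded" -> copy(version = version + 1, itemCount = itemCount + 1)
        else -> copy(version = version + 1)
    }
    companion object {
        fun empty(orderId: UUID) = OrderState(orderId, 0, 0)
        fun fromJson(json: String): OrderState = TODO("deserialize snapshot state")
    }
}

// Latest snapshot first, then replay only the events past it.
fun loadOrder(conn: Connection, orderId: UUID): OrderState {
    var state = OrderState.empty(orderId)

    conn.prepareStatement(
        """SELECT version, state FROM aggregate_snapshots
           WHERE aggregate_id = ? ORDER BY version DESC LIMIT 1"""
    ).use { st ->
        st.setObject(1, orderId)
        st.executeQuery().use { rs ->
            if (rs.next()) state = OrderState.fromJson(rs.getString("state"))
        }
    }

    conn.prepareStatement(
        """SELECT event_type, event_data FROM event_store
           WHERE aggregate_id = ? AND version > ? ORDER BY version"""
    ).use { st ->
        st.setObject(1, orderId)
        st.setInt(2, state.version)  // snapshot version is the replay floor
        st.executeQuery().use { rs ->
            while (rs.next()) {
                state = state.apply(rs.getString("event_type"), rs.getString("event_data"))
            }
        }
    }
    return state
}
```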
Failure mode analysis
| Failure | Impact | Mitigation |
|---|---|---|
| Kafka broker down | Projections stall, reads serve stale data | Outbox pattern + retry; mobile client shows cached state with staleness indicator |
| Duplicate event publish | Projector processes event twice | Idempotent projectors keyed on (aggregate_id, version) |
| Projection bug ships | Corrupted read model | Rebuild projection from event log — this is the whole reason you did event sourcing |
| Schema evolution | Old events lack new fields | Upcasters transform events at read time (sketch after this table); never mutate stored events |
| Snapshot corruption | Aggregate loads incorrect state | Fallback to full replay; validate snapshot checksums |
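To make the schema-evolution row concrete: an upcaster is a pure function from the stored payload to the current shape, applied after deserialization and before the event is handled. A hypothetical v1-to-v2 rename, where v1 events called the field price_cents:

```kotlin
// Upcasts a stored OrderItemAdded payload from v1 to the current shape.
// v1 used "price_cents"; current consumers expect "unit_price_cents".
// The stored event row is never touched.
fun upcastOrderItemAdded(raw: Map<String, Any?>): Map<String, Any?> =
    if ("price_cents" in raw) {
        raw - "price_cents" + ("unit_price_cents" to raw["price_cents"])
    } else {
        raw
    }
```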
What actually matters
Start with the outbox pattern, not direct Kafka writes. Dual-write bugs are subtle and devastating. A single PostgreSQL transaction for events + outbox row eliminates an entire class of consistency failures. I learned this the hard way.
Snapshot aggressively but lazily. Set a threshold (50 events), generate snapshots asynchronously after command processing, and always keep the full event log as your source of truth. The ability to rebuild is worth more than the storage cost.
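Concretely, the threshold check can run right after the command commits, against the aggregate_snapshots table above. SNAPSHOT_EVERY and the function name are mine; ON CONFLICT keeps concurrent snapshotters harmless:

```kotlin
import java.sql.Connection
import java.util.UUID

const val SNAPSHOT_EVERY = 50

// Fire-and-forget after the command commits; never on the request path.
// ON CONFLICT makes it safe if two workers snapshot the same version.
fun maybeSnapshot(conn: Connection, aggregateId: UUID, version: Int, stateJson: String) {
    if (version % SNAPSHOT_EVERY != 0) return
    conn.prepareStatement(
        """INSERT INTO aggregate_snapshots (aggregate_id, aggregate_type, version, state)
           VALUES (?, 'Order', ?, ?::jsonb)
           ON CONFLICT (aggregate_id, version) DO NOTHING"""
    ).use { st ->
        st.setObject(1, aggregateId)
        st.setInt(2, version)
        st.setString(3, stateJson)
        st.executeUpdate()
    }
}
```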
Design projections for your mobile API, not your domain model. Each screen or endpoint should have a purpose-built projection. The whole point of CQRS is that reads and writes have different shapes — lean into it.
Event sourcing is not free. It adds real complexity to your system, and I wouldn’t use it for a simple CRUD app. But in domains with audit requirements, complex state transitions, and mobile clients that need fast denormalized reads, it has paid off for me every time. The cost is upfront; the value compounds.
TAGS: architecture, backend, kotlin, api, microservices