
JSONB indexing: GIN vs expression indexes for mobile APIs

Krystian Wiewiór · 5 min read


Tags: backend, api, architecture, mobile, cleanarchitecture


TL;DR

JSONB columns are not a free schema. GIN indexes only accelerate containment (@>), not extraction (->>). Expression indexes beat GIN for equality lookups by 10-50x at scale. Wide JSONB rows trigger TOAST decompression that silently destroys p99 latency. The fix is a hybrid model: normalize your hot query paths, keep JSONB for truly flexible data, and index surgically.


The schemaless trap

Every mobile backend team I’ve worked with has done this: ship v1 with a metadata JSONB column to “stay flexible,” then watch query performance degrade superlinearly as the dataset grows. A query that runs in 2ms at 100K rows doesn’t take 20ms at 1M rows; it takes 200ms, because sequential scans over TOAST-compressed JSONB hit a wall that no amount of connection pooling will fix.

Most teams assume CREATE INDEX ON events USING GIN (metadata) solves everything. It does not.

GIN index internals: what actually gets indexed

A GIN (Generalized Inverted Index) is a B-tree of keys, each pointing to a posting list (or, for frequent keys, a posting tree) of row pointers. For JSONB with the default jsonb_ops operator class, every key and every value becomes an entry. The part that trips people up: GIN indexes support containment and existence operators (@>, ?, ?|, ?&), not extraction operators (->>, ->, #>>).

This means:

Query Pattern | Uses GIN Index? | Operator
WHERE metadata @> '{"status":"active"}' | Yes | @> containment
WHERE metadata->>'status' = 'active' | No | ->> extraction
WHERE metadata ? 'status' | Yes | ? key existence
WHERE metadata->>'name' LIKE '%john%' | No | ->> + LIKE
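For reference, the GIN index assumed in the EXPLAIN examples below (the index name is illustrative) would be created like this:

```sql
-- Default jsonb_ops operator class: indexes every key and value,
-- supports @>, ?, ?|, ?& -- but nothing involving ->> extraction.
CREATE INDEX idx_events_metadata_gin ON events USING GIN (metadata);
```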

Look at the EXPLAIN ANALYZE on a 2M-row table:

-- GIN index EXISTS, but this query IGNORES it:
EXPLAIN ANALYZE SELECT * FROM events
WHERE metadata->>'status' = 'active';

-- Seq Scan on events  (cost=0.00..285431.00 rows=10000 width=312)
--   Filter: ((metadata ->> 'status') = 'active')
--   Rows Removed by Filter: 1990000
--   Execution Time: 1842.331 ms

-- Rewrite using containment:
EXPLAIN ANALYZE SELECT * FROM events
WHERE metadata @> '{"status": "active"}';

-- Bitmap Heap Scan on events  (cost=92.40..12431.20 rows=10000 width=312)
--   Recheck Cond: (metadata @> '{"status": "active"}'::jsonb)
--   ->  Bitmap Index Scan on idx_events_metadata_gin
--   Execution Time: 18.442 ms

100x difference. Same data, same index — just the wrong operator.
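If containment is the only operator you need, the jsonb_path_ops operator class is worth knowing: it indexes hashed paths instead of individual keys and values, which typically yields a smaller, faster index, at the cost of dropping support for the key-existence operators. A sketch, using the same table:

```sql
-- jsonb_path_ops: smaller GIN index, supports @> (and jsonpath @@),
-- but NOT the key-existence operators ?, ?|, ?&.
CREATE INDEX idx_events_metadata_path ON events
USING GIN (metadata jsonb_path_ops);

-- Still served by the index:
SELECT * FROM events WHERE metadata @> '{"status": "active"}';
```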

Expression indexes: the surgical fix

For high-frequency equality lookups on known paths, expression indexes crush GIN:

CREATE INDEX idx_events_status ON events ((metadata->>'status'));

EXPLAIN ANALYZE SELECT * FROM events
WHERE metadata->>'status' = 'active';

-- Index Scan using idx_events_status  (cost=0.43..8421.10 rows=10000 width=312)
--   Execution Time: 4.218 ms

Four milliseconds. No query rewrite needed. The tradeoff is you need one index per path, and schema flexibility decreases. But honestly, 80% of JSONB queries in most mobile backends I’ve built hit fewer than five paths. You probably know which fields they are right now.
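When one value dominates your workload (say, the app almost always filters for active events), a partial expression index shrinks the index further. The predicate here is an illustrative assumption; the planner only uses a partial index when the query's WHERE clause provably implies the index predicate:

```sql
-- Index only the rows the hot query actually touches.
CREATE INDEX idx_events_active ON events ((metadata->>'status'))
WHERE metadata->>'status' = 'active';
```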

Trigram indexes for LIKE queries

When your mobile app needs fuzzy search inside JSONB text fields:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_events_name_trgm ON events
USING GIN ((metadata->>'name') gin_trgm_ops);

-- Now this works:
SELECT * FROM events WHERE metadata->>'name' LIKE '%john%';
-- Bitmap Index Scan on idx_events_name_trgm
-- Execution Time: 12.108 ms (vs 1900ms sequential)

The TOAST problem nobody talks about

PostgreSQL TOAST-compresses any value exceeding ~2KB. Wide JSONB rows — common when teams dump entire API responses into a column — mean every row access triggers decompression. This is where p99 latency goes to die:

JSONB Row Size | Avg Read Latency | p99 Latency
500 bytes | 0.08 ms | 0.4 ms
4 KB (TOASTed) | 0.31 ms | 2.1 ms
32 KB (TOASTed) | 1.42 ms | 18.7 ms

Even with a perfect index, fetching the row still decompresses the entire JSONB value. jsonpath predicates (the @@ operator, GIN-indexable since PostgreSQL 12) let the planner push filters into the index, but they do not eliminate TOAST overhead on wide rows.
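If you must keep wide JSONB rows, one mitigation is to trade disk for CPU by disabling compression on the column: values are still stored out-of-line, but reads skip the decompression step. A sketch against the same events table:

```sql
-- EXTERNAL = out-of-line TOAST storage without compression.
-- Affects only newly written rows; rewrite existing rows to apply it.
ALTER TABLE events ALTER COLUMN metadata SET STORAGE EXTERNAL;
```

Measure before and after: on highly compressible payloads this can multiply storage size, so it only pays off when decompression CPU dominates your read path.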

Migration pattern: hybrid model without downtime

-- Step 1: Add normalized columns
ALTER TABLE events ADD COLUMN status TEXT;
ALTER TABLE events ADD COLUMN event_type TEXT;

-- Step 2: Backfill (batched, non-blocking)
UPDATE events SET status = metadata->>'status',
  event_type = metadata->>'type'
WHERE id BETWEEN $1 AND $2;

-- Step 3: Add indexes on normalized columns
CREATE INDEX CONCURRENTLY idx_events_status_norm ON events (status);

-- Step 4: Update application to write both paths
-- Step 5: Migrate reads, then drop JSONB extraction queries
-- Step 6: Remove extracted keys from JSONB (optional, saves TOAST)

Use CREATE INDEX CONCURRENTLY — it won’t lock the table. Run backfills in batches of 10K-50K rows to avoid long transactions.
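Step 2's batching can also be driven from a PL/pgSQL DO block instead of application code. A minimal sketch; the batch size is an assumption to tune, and the in-block COMMIT requires PostgreSQL 11+ with the DO statement run outside an explicit transaction:

```sql
DO $$
DECLARE
  batch_start bigint := 0;
  batch_size  bigint := 10000;
  max_id      bigint;
BEGIN
  SELECT max(id) INTO max_id FROM events;
  WHILE batch_start <= max_id LOOP
    UPDATE events
    SET status = metadata->>'status',
        event_type = metadata->>'type'
    WHERE id BETWEEN batch_start AND batch_start + batch_size - 1
      AND status IS NULL;                -- idempotent on re-runs
    COMMIT;                              -- keep each transaction short
    batch_start := batch_start + batch_size;
  END LOOP;
END $$;
```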

What to do about it

Grep your codebase for ->> used with GIN-indexed JSONB columns. Every one of those is a sequential scan waiting to happen. Rewrite to @> or add expression indexes.

Check your TOAST exposure. Run SELECT avg(pg_column_size(metadata)) FROM your_table. If the average exceeds 2KB, your p99 latency is paying a decompression tax on every read. Normalize or trim.
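The average hides the tail, and the tail is what sets p99. A percentile breakdown (column and table names assumed) shows how much of the table actually crosses the ~2KB TOAST threshold:

```sql
SELECT
  percentile_cont(ARRAY[0.5, 0.9, 0.99])
    WITHIN GROUP (ORDER BY sz)            AS p50_p90_p99_bytes,
  count(*) FILTER (WHERE sz > 2048)       AS rows_over_toast_threshold
FROM (SELECT pg_column_size(metadata) AS sz FROM events) t;
```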

Adopt the hybrid model before you think you need it. Keep JSONB for genuinely unstructured data — user preferences, feature flags, third-party webhook payloads. Normalize any field that appears in a WHERE, JOIN, or ORDER BY clause. The migration is mechanical and you can do it without downtime.

