MVP Factory
AI startup development

Stop using OFFSET for pagination — it won't scale

Krystian Wiewiór · 5 min read

Meta description: LIMIT/OFFSET pagination degrades linearly with page depth. Keyset cursor pagination delivers near-constant performance no matter how deep you go.

Tags: backend, api, rest, architecture, graphql

TL;DR

LIMIT/OFFSET pagination forces your database to scan and discard N rows before returning results, so page 10,000 is orders of magnitude slower than page 1. Keyset (cursor) pagination uses indexed column seeks to jump directly to the next result set, delivering consistent performance regardless of depth. If your dataset exceeds a few thousand rows and users (or integrations) page deeply, you need cursor pagination. Full stop.


The hidden cost of OFFSET

Most teams assume the database “skips” rows efficiently when using OFFSET. It doesn’t. When you run:

SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;

PostgreSQL scans 100,020 rows, discards the first 100,000, and returns the remaining 20. Every discarded row costs CPU and I/O. The deeper you page, the worse it gets. Performance degrades linearly with the offset value.

In my experience, this is the single most common API performance issue in backend services that have outgrown their MVP phase. It starts small. Pagination works fine with 5,000 rows. Then you hit 500,000 rows, and your page-50 request takes 2 seconds. By the time you reach millions of rows, deep pages are timing out entirely.

What EXPLAIN ANALYZE reveals

-- OFFSET approach (page 5000, 20 rows per page)
EXPLAIN ANALYZE
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;

-- Result: Index Scan using orders_pkey (100,000 rows read and discarded)
-- Planning Time: 0.08 ms
-- Execution Time: 112.45 ms  (scanned 100,020 rows)

-- Keyset approach (same logical page)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE id > 100000 ORDER BY id LIMIT 20;

-- Result: Index Scan using orders_pkey
-- Planning Time: 0.07 ms
-- Execution Time: 0.12 ms  (scanned 20 rows)

That’s ~900x faster at this depth. The keyset query performs an index seek directly to the starting point, then reads exactly 20 rows. No scanning, no discarding. Consistent O(log n + k) performance where k is your page size.

OFFSET vs cursor: the full comparison

Criteria                          | LIMIT/OFFSET                    | Keyset cursor
----------------------------------|---------------------------------|--------------------------------
Performance at depth              | Degrades linearly with offset   | Constant regardless of position
DB operation                      | Sequential scan + discard       | Index seek + limited scan
CPU/I/O cost                      | Grows with page number          | Fixed per request
Duplicate/skipped rows on INSERT  | Yes — rows shift between pages  | No — cursor is stable
Arbitrary page jumping            | Supported (page 1, 50, 200)     | Not natively supported
Implementation complexity         | Trivial                         | Moderate
Suited for infinite scroll/feeds  | Poor                            | Excellent
Suited for numbered page UIs      | Adequate for small datasets     | Requires adaptation
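
The duplicate-row behaviour in the table above is easy to reproduce. A minimal in-memory sketch, with hypothetical offsetPage/keysetPage helpers standing in for the two SQL strategies:

```kotlin
// Hypothetical helpers standing in for the two pagination strategies,
// run against a plain list instead of a database table.
fun offsetPage(rows: List<String>, offset: Int, limit: Int): List<String> =
    rows.drop(offset).take(limit)

fun keysetPage(rows: List<String>, after: String?, limit: Int): List<String> =
    (if (after == null) rows
     else rows.dropWhile { it != after }.drop(1)).take(limit)

fun main() {
    // A newest-first feed of 40 posts.
    var feed = (40 downTo 1).map { "post-$it" }

    val page1 = offsetPage(feed, 0, 20)       // post-40 … post-21

    // A new post arrives before the client requests page 2.
    feed = listOf("post-41") + feed

    // OFFSET: every row shifted by one, so post-21 shows up twice.
    val page2Offset = offsetPage(feed, 20, 20)
    println(page2Offset.first() in page1)     // true: duplicate row

    // Keyset: "rows after the last one I saw" is stable under inserts.
    val page2Keyset = keysetPage(feed, page1.last(), 20)
    println(page2Keyset.none { it in page1 }) // true: no duplicates
}
```

The same drift also produces skipped rows when data is deleted between page requests.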

Implementing cursor pagination

Basic REST API pattern

Your API response includes a next_cursor that the client passes back:

{
  "data": [...],
  "pagination": {
    "next_cursor": "eyJpZCI6MTAwMDIwfQ==",
    "has_more": true
  }
}

The cursor is typically a Base64-encoded representation of the last row’s sort key. On the server side:

// Ktor route example (query DSL is illustrative; adapt to your ORM)
get("/orders") {
    // Opaque token in, Cursor out: Base64 → JSON → Cursor
    val cursor = call.parameters["cursor"]
        ?.let { String(Base64.getUrlDecoder().decode(it)) }
        ?.let { Json.decodeFromString<Cursor>(it) }

    val orders = db.orders
        .run { if (cursor != null) where { id greater cursor.lastId } else this }
        .orderBy(Orders.id)
        .limit(20)
        .toList()

    // The next cursor is derived from the last row's sort key
    val nextCursor = orders.lastOrNull()?.let {
        Base64.getUrlEncoder().withoutPadding()
            .encodeToString(Json.encodeToString(Cursor(lastId = it.id)).toByteArray())
    }

    call.respond(PagedResponse(orders, nextCursor))
}

Spring Boot follows the same pattern: decode the cursor, apply a WHERE clause, encode the next cursor from the last result.
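
The token round-trip itself is framework-agnostic. Here is a minimal sketch using java.util.Base64, assuming a hypothetical single-field cursor (real cursors often carry a composite key, which the next section covers):

```kotlin
import java.util.Base64

// Hypothetical single-field cursor; a JSON library (kotlinx.serialization,
// Jackson) would replace the hand-rolled parsing in a real service.
data class Cursor(val lastId: Long)

fun encode(cursor: Cursor): String =
    // URL-safe, unpadded Base64 so the token survives query strings untouched
    Base64.getUrlEncoder().withoutPadding()
        .encodeToString("""{"id":${cursor.lastId}}""".toByteArray())

fun decode(token: String): Cursor {
    val json = String(Base64.getUrlDecoder().decode(token))
    val id = json.removePrefix("""{"id":""").removeSuffix("}").toLong()
    return Cursor(lastId = id)
}

fun main() {
    val token = encode(Cursor(lastId = 100_020))
    println(token)          // opaque to the client
    println(decode(token))  // round-trips back to the original cursor
}
```

Treat the token as opaque on the client side; the encoding is an implementation detail you are free to change later.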

The composite cursor edge case

When sorting by a non-unique column (e.g., created_at), you need a composite cursor that includes a tiebreaker:

-- Sorting by created_at (non-unique) + id (unique tiebreaker)
SELECT * FROM orders
WHERE (created_at, id) > (:last_timestamp, :last_id)
ORDER BY created_at ASC, id ASC
LIMIT 20;

This tuple comparison leverages PostgreSQL’s row-value comparison and can use a composite index on (created_at, id) efficiently. Without the tiebreaker, rows with identical timestamps get skipped or duplicated.
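
For that tuple comparison to stay fast, a composite index with the same column order as the ORDER BY clause has to exist (the index name here is illustrative):

```sql
-- Columns in the same order as ORDER BY created_at, id
CREATE INDEX idx_orders_created_at_id ON orders (created_at, id);
```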

GraphQL connections

The Relay Connection specification was built for cursor pagination. If you’re building a GraphQL API, this is the standard approach:

query {
  orders(first: 20, after: "eyJpZCI6MTAwMDIwfQ==") {
    edges {
      node { id total createdAt }
      cursor
    }
    pageInfo { hasNextPage endCursor }
  }
}

When OFFSET is still fine

I’m not dogmatic about this. OFFSET works when your dataset is small (under ~10,000 rows), users never page past the first few pages, and you need numbered page navigation. Admin dashboards with modest data volumes are a classic example. The real problem is when teams pick OFFSET as a default and never revisit the decision as data grows.

What to do now

Run EXPLAIN ANALYZE on your paginated queries at realistic offsets (page 100, 500, 1000). If execution time grows linearly with the page number, you’ve got a problem that will only get worse.

For any API-facing or feed-style pagination, default to keyset cursors. Encode composite cursors (sort column + unique tiebreaker) as opaque Base64 tokens so you handle the non-unique sort column case from day one.

Reserve OFFSET for admin UIs with small, bounded datasets. If you need page-number navigation on larger datasets, consider a hybrid: keyset pagination under the hood with a separate count query (cached) for the page count display.
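
For that cached count on PostgreSQL, the planner's statistics give a cheap approximation when an exact number isn't required (it's refreshed by autovacuum/ANALYZE, so it can lag slightly behind reality):

```sql
-- Approximate row count from planner statistics: no table scan needed
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'orders';
```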
