MVP Factory
ai startup development

Row-level security in PostgreSQL: SaaS tenant isolation without query changes

KW
Krystian Wiewiór · · 6 min read

Meta description: Implement PostgreSQL RLS with JWT tenant context, PgBouncer pooling, and Ktor/Spring Boot — with benchmarks, pitfalls, and CI testing via pgTAP.

Tags: backend, saas, architecture, api, productengineering

TL;DR

Row-Level Security (RLS) in PostgreSQL gives you multi-tenant data isolation at the database layer — no WHERE tenant_id = ? scattered across every query. Combine it with set_config/current_setting for JWT-based tenant context, and you get an enforcement layer that application code can’t bypass. In my benchmarks on a Ktor backend, RLS adds 2-4% overhead on indexed tenant columns. Compare that to the cost of a tenant data leak. But there are real pitfalls: superuser bypass, PgBouncer session state leaks, and policy caching behavior that will bite you in CI.

Why RLS beats application-layer filtering

Paul Graham once wrote that the best new ventures succeed by doing something that looks like a bad idea but is actually good. RLS feels like that. Most teams avoid it — “too magical,” “hard to debug,” “what if it breaks?” — and instead litter every repository, every ORM call, every raw query with AND tenant_id = :tenantId. One missed filter, one junior dev’s PR, and you have a data breach.

RLS moves the enforcement to the only place that matters: the database itself.

ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON orders
  USING (tenant_id = current_setting('app.current_tenant')::uuid);

Every query — SELECT, UPDATE, DELETE — now physically cannot return rows outside the current tenant. No application code change required.

Setting tenant context from JWTs

In a Ktor or Spring Boot mobile backend, you extract the tenant_id from the JWT and inject it into the PostgreSQL session before any query executes:

// Ktor interceptor example
fun Transaction.setTenantContext(tenantId: UUID) {
    exec("SELECT set_config('app.current_tenant', '$tenantId', true)")
    // 'true' = local to transaction, critical for connection pooling
}

The third parameter (true) scopes the setting to the current transaction. This is the part you cannot get wrong when using connection pooling.

The PgBouncer problem

Most teams get this wrong: PgBouncer in transaction mode reuses connections across different clients. If you set app.current_tenant at the session level (third param = false), the next request on that connection inherits the previous tenant’s context. That’s a data breach waiting to happen.

Connection pooling compatibility

Pooling Modeset_config ScopeSafe?Notes
Session modeSession (false)YesOne connection per client, no reuse risk
Transaction modeTransaction (true)YesContext resets after each transaction
Transaction modeSession (false)NoContext leaks to next tenant
Statement modeEitherNoCannot use set_config reliably

If you run PgBouncer in transaction mode — and you probably should, for connection efficiency — always use transaction-scoped set_config.

Benchmarks: real overhead on a Ktor backend

I ran these on a Ktor 2.3 service with Exposed ORM, PostgreSQL 16, 10M rows in the orders table distributed across 500 tenants. Hardware: 4 vCPU, 16GB RAM, NVMe storage.

Query PatternWithout RLSWith RLSOverhead
Single row by PK + tenant0.42ms0.43ms+2.4%
Filtered list (100 rows)3.1ms3.2ms+3.2%
Aggregation (COUNT by status)12.8ms13.1ms+2.3%
Join across 3 tables8.4ms8.9ms+5.9%

With a proper composite index on (tenant_id, ...), RLS overhead is noise. 2-4% on typical queries. The join overhead is higher because each table’s policy gets evaluated independently, but still well within acceptable bounds for a mobile backend where network latency dominates anyway.

The pitfalls that will burn you

1. Superuser bypass

RLS policies do not apply to table owners or superusers. Your application role must be a non-owner role with explicit grants:

CREATE ROLE app_user NOINHERIT;
GRANT SELECT, INSERT, UPDATE, DELETE ON orders TO app_user;
-- The role that CREATED the table bypasses RLS entirely

I’ve seen teams spend a week implementing RLS only to run their app as the table owner in production. Test with your actual production role.

2. Policy plan caching

PostgreSQL caches query plans. If your RLS policy references current_setting(), the planner correctly treats it as volatile — but prepared statements in some ORMs can behave unexpectedly. In Spring Boot with HikariCP, verify that plan caching doesn’t pin a stale tenant context by enabling log_min_messages = debug2 during testing.

3. Missing FORCE on existing tables

ENABLE ROW LEVEL SECURITY without FORCE still lets the table owner bypass policies. Always use:

ALTER TABLE orders FORCE ROW LEVEL SECURITY;

This one’s easy to miss because nothing errors out. The table looks protected, queries seem filtered, but the owner role quietly sees everything.

Testing RLS in CI with pgTAP

You need automated proof that RLS works. pgTAP integrates directly into your CI pipeline:

BEGIN;
SELECT plan(2);

SET LOCAL app.current_tenant = 'tenant-a-uuid';
SELECT is(
    (SELECT count(*) FROM orders WHERE tenant_id = 'tenant-b-uuid'::uuid),
    0::bigint,
    'Tenant A cannot see Tenant B orders'
);

SET LOCAL app.current_tenant = 'tenant-b-uuid';
SELECT is(
    (SELECT count(*) FROM orders WHERE tenant_id = 'tenant-b-uuid'::uuid),
    5::bigint,
    'Tenant B sees own orders'
);

SELECT finish();
ROLLBACK;

Run this in a GitHub Actions step with pg_prove. If a policy change breaks isolation, your build fails before it reaches production.

What to do with all this

Use transaction-scoped set_config. If you use PgBouncer in transaction mode, set the third parameter to true. Audit this in code review — a single false is a latent breach.

Benchmark with your actual schema, not toy tables. RLS overhead scales with policy complexity and join depth. Run EXPLAIN ANALYZE with RLS enabled on your top 10 queries. If any exceed 10% overhead, add composite indexes with tenant_id as the leading column.

Test RLS policies in CI with pgTAP, not just in dev. Policy regressions are silent — no query error, just wrong data returned. Automated cross-tenant isolation tests are the only reliable guard. Treat them like auth tests: if they don’t pass, nothing ships.


Share: Twitter LinkedIn