Server-driven paywall A/B testing that moves revenue
TL;DR
Most teams A/B test paywall conversion rate, ship the “winner,” and watch revenue stay flat. Conversion rate without discount depth is a vanity metric. This post covers the full architecture: RevenueCat custom placements for server-driven paywalls, feature flag integration for cohort targeting, platform-specific exit offer triggers on Android and iOS, and the statistical framework that tests revenue-per-user instead of conversion rate. Includes sample size math, sequential testing to avoid peeking problems, and cohort isolation for small-audience apps.
The architecture: server-driven paywalls
You want to change what users see on the paywall (offer tiers, copy, discount depth, exit intent triggers) without shipping an app update. The pipeline:
```
[RevenueCat Offerings + Custom Placements]
              ↓
[Feature Flag Service (LaunchDarkly / Statsig)]
              ↓  cohort assignment + payload
[Client SDK fetches placement config]
              ↓
[Render paywall variant → track events → measure LTV]
```
RevenueCat’s custom placements let you define named paywall surfaces (main_paywall, exit_offer, upgrade_nudge) and map each to a specific offering remotely. Combine this with a feature flag service that assigns users to experiment cohorts, and you control the entire presentation layer from your dashboard.
The client code stays thin. On Android with Kotlin:
```kotlin
// Suspending API from the purchases-android coroutine extensions
val offerings = Purchases.sharedInstance.awaitOfferings()
val packages = offerings.getCurrentOfferingForPlacement("exit_offer")
    ?.availablePackages ?: return
// Render the server-defined paywall variant from `packages`
```
No hardcoded product IDs. No app update to test a new discount tier.
Exit offers: platform-specific triggers
Exit offers fire when a user signals intent to leave the paywall. Detection differs quite a bit across platforms.
| Signal | Android | iOS |
|---|---|---|
| Back navigation | OnBackPressedCallback via BackHandler | UIAdaptivePresentationControllerDelegate.presentationControllerDidAttemptToDismiss |
| Swipe dismiss | N/A (back gesture covers this) | UISheetPresentationController delegate callbacks |
| Lifecycle-aware timeout | Lifecycle.Event.ON_PAUSE after threshold | viewWillDisappear with timer validation |
| Trigger control | Server flag: exit_offer_enabled | Same flag, shared config |
One thing that will bite you: on iOS with StoreKit 2, subscription offer eligibility (isEligibleForIntroOffer) is async and user-specific. On Android with Google Play Billing Library 7, offer eligibility lives in ProductDetails.SubscriptionOfferDetails. You must pre-fetch eligibility before showing the exit offer. A 300ms delay on an exit intent screen kills the interaction.
The statistical framework that actually works
Most teams test conversion rate as the primary metric. This is the wrong metric.
Consider two variants:
| Variant | Conversion Rate | Avg Discount | Revenue Per User |
|---|---|---|---|
| A (no discount) | 3.2% | 0% | $1.92 |
| B (50% off annual) | 5.8% | 50% | $1.45 |
Variant B “wins” on conversion. Variant A generates 32% more revenue per user exposed. I’ve seen teams ship Variant B and then spend quarters trying to figure out why MRR didn’t move.
Your primary metric should be revenue-per-user (RPU): total revenue generated divided by total users exposed to the paywall, including non-converters.
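The gap is easy to reproduce from the table above. The per-converter revenue figures here are backed out from the table (RPU divided by conversion rate), so treat the $60 and $25 as illustrative:

```python
def rpu(conversion_rate: float, avg_revenue_per_converter: float) -> float:
    # Non-converters contribute $0 to the numerator but still count
    # in the denominator, which is what makes RPU an honest metric.
    return conversion_rate * avg_revenue_per_converter

variant_a = rpu(0.032, 60.00)  # no discount
variant_b = rpu(0.058, 25.00)  # 50% off, higher conversion
assert variant_a > variant_b   # A wins on revenue despite losing on conversion
```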
Sample size and sequential testing
RPU has high variance (a coefficient of variation of roughly 3-5 for typical subscription apps), so you need much larger samples than conversion rate tests. A rough formula for the per-variant sample size:
n = (2 * (Z_α/2 + Z_β)² * σ²) / δ²
To detect a 10% RPU lift at 80% power and 95% confidence with high-variance revenue data, expect to need at least 5,000-10,000 users per variant, and substantially more as variance grows, since n scales with the square of the coefficient of variation.
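A minimal calculator that plugs σ = CV·μ and δ = lift·μ into the formula above; the μ terms cancel, so only the coefficient of variation and the relative lift matter (the function name is mine):

```python
from math import ceil
from statistics import NormalDist

def rpu_sample_size(cv: float, rel_lift: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-variant n for detecting a relative RPU lift.

    n = 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2,
    with sigma = cv * mu and delta = rel_lift * mu, so mu cancels.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 * (cv / rel_lift) ** 2)

# n scales with (cv / rel_lift)^2: doubling the variance quadruples the sample.
print(rpu_sample_size(cv=3.0, rel_lift=0.10))
```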
The peeking problem: checking results daily and stopping when you see significance inflates your false positive rate from 5% to over 25%. Use sequential testing, either a Bayesian approach with credible intervals or group sequential methods with O’Brien-Fleming spending functions. Statsig handles this natively. With LaunchDarkly, you’ll need to implement the stopping rules yourself or export to a proper experimentation platform.
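Group sequential methods budget the total 5% alpha across interim looks instead of spending it fresh at every peek. A sketch of the Lan-DeMets O'Brien-Fleming-type spending function (the step of converting spent alpha into per-look z thresholds is omitted, and the function name is mine):

```python
from math import sqrt
from statistics import NormalDist

def obf_alpha_spent(t: float, alpha: float = 0.05) -> float:
    """Cumulative alpha spent at information fraction t (0 < t <= 1),
    via the Lan-DeMets O'Brien-Fleming-type spending function:
    alpha(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / sqrt(t)))

# Early looks get almost no alpha, so an early stop needs overwhelming evidence.
for t in (0.25, 0.50, 0.75, 1.00):
    print(f"look at {t:.0%} of data: cumulative alpha = {obf_alpha_spent(t):.4f}")
```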
Cohort isolation in small-audience apps
For apps with smaller user bases (I run into this with niche productivity tools like HealthyDesk, which is a break reminder and desk exercise app for developers), experiment contamination is a real risk. A user who sees the exit offer in one session and the control in another pollutes both cohorts.
The fix: assign cohorts at the user level, persist the assignment in RevenueCat subscriber attributes, and use that as the source of truth across sessions.
```kotlin
Purchases.sharedInstance.setAttributes(
    mapOf("experiment_cohort" to flagService.getCohort(userId))
)
```
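Under the hood, flag services make user-level assignment deterministic by hashing. A sketch of the idea; the sha256-based scheme here is illustrative, not LaunchDarkly's or Statsig's actual bucketing algorithm:

```python
import hashlib

def assign_cohort(user_id: str, experiment: str, variants: list[str]) -> str:
    # Hash user + experiment so the same user always lands in the same
    # variant, and different experiments bucket independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest[:8], 16) % len(variants)]

cohort = assign_cohort("user-42", "exit_offer_v1", ["control", "treatment"])
# Persist `cohort` as the experiment_cohort subscriber attribute shown above.
```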
Event taxonomy: connecting impressions to LTV
Your event pipeline needs these minimum events to close the loop:
| Event | Properties | Purpose |
|---|---|---|
| paywall_impression | placement_id, variant, cohort | Denominator for RPU |
| exit_offer_triggered | trigger_type, variant | Exit funnel tracking |
| purchase_initiated | product_id, offer_type, discount_pct | Conversion + discount depth |
| purchase_completed | revenue, currency, is_trial | Revenue attribution |
| subscription_renewed | period, revenue | LTV calculation |
Without discount_pct on the purchase event, you can’t decompose whether a revenue change came from volume or price. Non-negotiable.
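With discount_pct captured, the decomposition is a few lines. The purchase dicts here are stand-ins for whatever shape your event pipeline emits:

```python
def decompose(purchases: list[dict], exposed_users: int) -> dict:
    """Split RPU into a volume term (conversion) and a price term
    (average realized revenue per converter)."""
    conversion = len(purchases) / exposed_users
    avg_revenue = sum(p["revenue"] for p in purchases) / len(purchases)
    avg_discount = sum(p["discount_pct"] for p in purchases) / len(purchases)
    return {
        "rpu": conversion * avg_revenue,
        "conversion": conversion,
        "avg_revenue_per_converter": avg_revenue,
        "avg_discount_pct": avg_discount,
    }

stats = decompose(
    [{"revenue": 30.0, "discount_pct": 50}, {"revenue": 60.0, "discount_pct": 0}],
    exposed_users=100,
)
```

Comparing conversion and avg_revenue_per_converter across variants tells you whether an RPU change came from volume or from price.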
What to do with all this
Test RPU, not conversion rate. When discount depth varies across variants, conversion rate decouples from revenue. Wire revenue-per-exposed-user as your primary metric from day one.
Pre-fetch offer eligibility before exit triggers fire. StoreKit 2 and Play Billing Library 7 handle subscription offer eligibility differently. Cache it when the paywall loads, not when the exit offer appears.
Isolate cohorts at the user level, not the session level. Persist experiment assignments in RevenueCat subscriber attributes and enforce them across sessions. For small-audience apps, contamination will destroy your statistical power faster than insufficient sample size will.