The Production Readiness Review Checklist

Published

10 Jun 2026

Author
Akash Shakya


50 points across 5 dimensions. The pre-launch audit that prevents the $23K crisis on day one — yours to use against any agency's deliverable or your own product.


Why This Checklist Is Free

The Production Readiness Review™ is part of every EB Pearls engagement. We could keep the framework internal — most agencies do. We are publishing it because the checklist works regardless of who runs it, and founders deserve to know what "launch-ready" actually means.

Use this checklist against any agency's deliverable. Use it against your own product. Use it against a freelancer's work. If after running the checklist you decide we are the right partner to fix what it reveals, great. If you decide you can fix it yourself or with another partner, that is also a good outcome. The checklist is the standard. The agency is optional.


A note on Hana: Hana is a composite character drawn from patterns we have observed across hundreds of post-launch crises. The product, launch failures, security incident, and recovery described here reflect real scenarios — compressed into a single narrative.

Hana's $23K Lesson: Launching Without a PRR

"I built a women's hormonal health tracking app — cycle tracking, symptom logging, and personalised insights drawn from medical literature. Validated the problem with 22 user interviews. Built a $36K MVP over 9 weeks. The product worked beautifully in QA. The founder-team test went perfectly. I felt confident.

My agency offered a 'pre-launch checklist review' for an extra $2,500. I declined. The app worked. We had tested everything. Why pay $2,500 for someone to confirm what was already true?

I launched on a Tuesday. By Wednesday afternoon I had 11 critical issues:

1. Push notifications fired at 3am for users in different timezones (we had only tested AEST)
2. Symptom logging crashed for entries longer than 280 characters (we had only tested short entries)
3. Charts displayed wrong cycle phases for women with cycles shorter than 24 days (we had only tested 28-day cycles)
4. The export-to-PDF feature failed silently with no error message — users thought their data was gone
5. App Store reviews dropped from 5-star pre-launch to 2.4-star within 48 hours
6. Database queries timed out at 400 concurrent users (we had only load-tested to 100)
7. Apple's medical content guidelines flagged 3 insights as 'unsubstantiated health claims'
8. The analytics events I needed to measure retention were not firing
9. Password reset emails were going to spam folders (no SPF/DKIM configured)
10. The privacy policy referenced 'Australia only' but the app was downloadable globally
11. No process for handling data deletion requests — required by GDPR for our European downloads

Three weeks of emergency rework. $23,000 in unplanned costs. App Store rating that took 4 months to recover. And the most painful part: a Production Readiness Review at $2,500 would have caught all 11 issues before launch. I paid $23,000 to skip a $2,500 audit.

I have run a PRR on every product I have shipped since."

— Hana (composite founder)

11 critical issues in the first 48 hours (all preventable with a PRR)

$23K emergency rework cost (3 weeks of crisis work)

$2,500 is what the PRR would have cost (9x ROI on prevention)

4 months to recover the App Store rating (from 2.4 stars to 4.6)

How to Use This Checklist

The checklist has 50 points across 5 dimensions: Reliability, Measurability, Usability, Scalability, and Security. Each point is scored 0, 1, or 2:

Score | Meaning | Action
0 | Not implemented or significantly deficient | Fix; blocks launch if the item is critical-severity
1 | Implemented but has notable gaps | Document and schedule remediation
2 | Fully implemented and tested | Pass, no action needed


Maximum score: 100. Launch threshold: 80/100 with zero critical-severity items at 0. Below 80, or with any critical at 0, launch is delayed.
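The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not an official EB Pearls tool: the `Item` class and `launch_decision` function are hypothetical names, and the example scores are made up.

```python
from dataclasses import dataclass


@dataclass
class Item:
    number: int    # checklist point, 1-50
    severity: str  # "Critical", "High", or "Medium"
    score: int     # 0, 1, or 2


def launch_decision(items, threshold=80):
    """Return (total, may_launch, blocking_critical_item_numbers)."""
    for item in items:
        if item.score not in (0, 1, 2):
            raise ValueError(f"item {item.number}: score must be 0, 1, or 2")
    total = sum(item.score for item in items)
    # Any critical-severity item at 0 blocks launch regardless of the total.
    blockers = [i.number for i in items if i.severity == "Critical" and i.score == 0]
    may_launch = total >= threshold and not blockers
    return total, may_launch, blockers


# Illustrative scores: everything at 2 except two items.
items = [Item(n, "High", 2) for n in range(1, 51)]
items[0] = Item(1, "Critical", 0)  # critical user paths have no automated tests
items[2] = Item(3, "High", 1)      # edge cases only partly tested
total, may_launch, blockers = launch_decision(items)
print(total, may_launch, blockers)
```

In this example the total is well above 80, yet the launch is still blocked because one critical item sits at 0.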

Run this 2-3 weeks before launch, not the day before. The audit takes 2-5 days of structured work, and the issues it reveals need time to fix. This matches the launch-checklist practice described in Google's SRE Book: a readiness review is most useful when it happens early enough that issues can be remediated without delaying launch. For a 6-10 week MVP build, that typically means about 3 weeks before launch.

Dimension 1: Reliability (10 Points)

Does the product work consistently under real conditions?

1. Critical user paths have automated tests (Critical)

The 3-5 most important user actions (sign-up, core action, payment, key data write) have automated tests that run on every build. Without this, every code change risks silent breakage.

2. Error handling for every external API call (Critical)

Payment processors, email services, push notifications, third-party APIs all fail occasionally. The product handles each failure gracefully without crashing or losing user data.
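One common way to handle a flaky third-party call is to retry with exponential backoff and return a clear failure instead of crashing. The sketch below assumes a hypothetical `ExternalServiceError` raised by your own client wrapper; `call_with_retry` and `flaky_send_email` are illustrative names, not part of any real SDK.

```python
import time


class ExternalServiceError(Exception):
    """Raised by a hypothetical client wrapper when a third-party call fails."""


def call_with_retry(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Run operation(); retry with exponential backoff.

    Returns (True, result) on success or (False, last_error) once attempts run out,
    so the caller can show a friendly message or queue the work for later.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return True, operation()
        except ExternalServiceError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return False, last_error


# A fake email sender that fails twice, then succeeds.
calls = {"count": 0}


def flaky_send_email():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ExternalServiceError("email provider timeout")
    return "sent"


ok, result = call_with_retry(flaky_send_email)
print(ok, result, calls["count"])
```

Retries suit idempotent operations. For payments, pair them with whatever idempotency mechanism your provider documents so a retry cannot charge twice.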

3. Edge cases tested with adversarial inputs (High)

Empty inputs, maximum-length inputs, special characters, Unicode, emojis, SQL injection attempts. Hana's app crashed on entries over 280 characters because nobody tested long inputs.
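A small table of hostile inputs, run against every text field, catches most of these failures cheaply. The sketch below is an assumption-heavy illustration: `validate_symptom_entry` and the 2,000-character limit are hypothetical, and your own fields will have different rules.

```python
MAX_ENTRY_CHARS = 2000  # hypothetical limit; choose one and enforce it on client and server


def validate_symptom_entry(text):
    """Return (ok, message). Must never raise, whatever the input."""
    if not isinstance(text, str):
        return False, "Entry must be text."
    cleaned = text.strip()
    if not cleaned:
        return False, "Entry cannot be empty."
    if len(cleaned) > MAX_ENTRY_CHARS:
        return False, f"Entry is too long (max {MAX_ENTRY_CHARS} characters)."
    return True, "ok"


ADVERSARIAL_INPUTS = [
    "",                           # empty
    "   ",                        # whitespace only
    "a" * 281,                    # just past the length that crashed Hana's app
    "a" * 100_000,                # far past any sensible limit
    "cramps 😖 and fatigue",      # emoji
    "Ünïcödé façade 日本語",       # non-ASCII text
    "'; DROP TABLE users; --",    # SQL injection attempt (safe as data if queries are parameterised)
    "<script>alert(1)</script>",  # XSS attempt (safe as data if output is escaped)
    None,                         # wrong type
]

results = [validate_symptom_entry(value) for value in ADVERSARIAL_INPUTS]
for value, (ok, message) in zip(ADVERSARIAL_INPUTS, results):
    print(repr(value)[:30], ok, message)
```

The point of the test is that every input produces a decision and a readable message, and none produces a crash.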


4. Offline / poor connection handling (High)

Mobile apps face spotty connectivity. Either the app degrades gracefully (cached data, queued writes) or it tells the user clearly what is happening. Silent failures destroy trust.


5. Data persistence verified end-to-end (Critical)

User writes data → close app → reopen on different device → data is there. Tested for every type of user data the product handles.


6. Timezone handling tested for all expected user locations (High)
All timestamps stored as UTC, displayed in user's local time. Hana's 3am push notifications happened because timezone handling was AEST-only.
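The failure mode is easy to reproduce: a reminder scheduled for 9am in one zone lands at 3am in another. The sketch below uses fixed UTC offsets to stay dependency-free, which is a simplification; production code would normally use IANA zone names (for example through Python's `zoneinfo`) so daylight saving is handled. `is_quiet_hours` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for illustration only; real code should use IANA zone names.
AEST = timezone(timedelta(hours=10), "AEST")
DUBAI = timezone(timedelta(hours=4), "GST")


def is_quiet_hours(utc_time, user_tz, start_hour=22, end_hour=8):
    """True if utc_time falls between start_hour and end_hour in the user's zone."""
    local_hour = utc_time.astimezone(user_tz).hour
    return local_hour >= start_hour or local_hour < end_hour


# A reminder scheduled for 9am AEST, stored as UTC.
send_at_utc = datetime(2026, 1, 15, 9, 0, tzinfo=AEST).astimezone(timezone.utc)

print(send_at_utc.isoformat())                  # stored value, in UTC
print(send_at_utc.astimezone(DUBAI).hour)       # local hour for a user in Dubai
print(is_quiet_hours(send_at_utc, AEST))        # fine for the Sydney user
print(is_quiet_hours(send_at_utc, DUBAI))       # should be held back for the Dubai user
```

Checking quiet hours in each user's own zone before sending is one simple guard; scheduling per user in local time is another.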
 
 
7. Rollback plan documented and tested (High)
If launch reveals a critical bug, can you roll back to the previous version within 30 minutes? Tested, not assumed.
 
 
8. Crash reporting active (Sentry, Crashlytics, etc.) (Critical)

Every crash in production is logged with stack trace, user context, and frequency. Without this, you find out about bugs from negative reviews.
 
 
9. Backup and recovery process documented (Critical)

Database backups run automatically, tested for restoration, retention policy documented. "We have backups" is not enough — "we restored from backup last week and it worked" is.
 
 

10. Status page or uptime monitoring configured (Medium)

UptimeRobot, Pingdom, or equivalent pings the API every 60 seconds. Alerts on downtime within 2 minutes. You know about outages before users do.

Dimension 2: Measurability (10 Points)

Can you measure user behaviour and product health?

11. Analytics platform integrated (Mixpanel, Amplitude, etc.) (Critical)

An analytics platform is installed and receiving events. Hana's analytics were not firing — she could not measure retention for 3 weeks.
 
 
12. North Star Metric defined and instrumented (Critical)

The single metric that defines success (DAU, weekly active retention, paid conversion) is identified before launch and tracked from day 1.
 
 
13. 5-10 key events instrumented (not 50+) (High)
 
Track the events that matter (sign-up, activation, core action, retention milestones). Over-instrumentation creates noise and slows analysis. Under-instrumentation creates blind spots.
 
 
14. Funnel from acquisition to activation visible (High)
 
You can see: downloads → sign-ups → activation → core action → retention. Each step has a measurable conversion rate.
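Step-to-step conversion is simple arithmetic once each step has a count. The sketch below uses made-up numbers and a hypothetical `funnel_conversion` helper; in practice the counts would come from your analytics platform.

```python
def funnel_conversion(steps):
    """steps: ordered list of (name, user_count). Returns step-to-step conversion rates."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(steps, steps[1:]):
        rate = count / prev_count if prev_count else 0.0
        rates.append((f"{prev_name} -> {name}", round(rate, 3)))
    return rates


# Illustrative numbers only.
funnel = [
    ("downloads", 1000),
    ("sign-ups", 600),
    ("activation", 300),
    ("core action", 240),
    ("retention", 120),
]
rates = funnel_conversion(funnel)
for step, rate in rates:
    print(f"{step}: {rate:.0%}")
```

The step with the lowest rate is usually the first place to look.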
 
 
15. User identification consistent across sessions and devices (High)
 
A user's events are linked to their account regardless of device. Required for accurate retention and LTV analysis.
 
 
16. A/B testing framework available (if applicable) (Medium)
 
If your product will iterate on features, an A/B testing framework lets you measure changes rather than guessing.


17. Performance monitoring active (response times, error rates) (High)
 
You know how long key actions take at each percentile (for example p50, p95, and p99), not just on average. Slow features become visible before users complain.
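Percentiles matter because an average can hide a slow tail. The sketch below uses invented latencies and the standard library's `statistics.quantiles`; a monitoring tool would normally compute these for you.

```python
import statistics

# Illustrative latencies in milliseconds: 95 fast requests and 5 very slow ones.
latencies_ms = [120] * 95 + [2000] * 5

mean_ms = statistics.mean(latencies_ms)
cuts = statistics.quantiles(latencies_ms, n=100)  # 99 cut points: cuts[k-1] is the k-th percentile
p50, p95, p99 = cuts[49], cuts[94], cuts[98]

print(f"mean={mean_ms} p50={p50} p95={p95} p99={p99}")
```

Here the mean and the median both look healthy, while p95 and p99 show that one request in twenty takes seconds.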


18. Revenue tracking integrated (Stripe, payment platform) (Critical)
 
Every paying user is identifiable. MRR, churn, LTV calculable from day 1.
 
 
19. Cohort analysis possible (users grouped by signup week) (Medium)
 
You can compare retention of users who signed up in different weeks. Essential for measuring whether product changes improve or degrade retention.
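As a rough sketch of what "cohort analysis possible" means, the snippet below groups users by ISO signup week and computes the share who were active in the week after signup. The data and the helper names (`cohort_key`, `weekly_cohort_retention`) are hypothetical; real analytics tools provide this as a built-in report.

```python
from collections import defaultdict
from datetime import date


def cohort_key(day):
    """ISO year and week of a date, for example '2026-W23'."""
    iso = day.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"


def weekly_cohort_retention(users, week_offset=1):
    """users: list of (signup_date, set_of_active_dates).

    Returns {cohort: share of that cohort active during the given week after signup},
    where week 0 is the first 7 days after signup, week 1 the next 7, and so on.
    """
    totals = defaultdict(int)
    retained = defaultdict(int)
    for signup, active_dates in users:
        key = cohort_key(signup)
        totals[key] += 1
        if any((day - signup).days // 7 == week_offset for day in active_dates):
            retained[key] += 1
    return {key: retained[key] / totals[key] for key in sorted(totals)}


# Illustrative data: two users per signup week.
users = [
    (date(2026, 6, 1), {date(2026, 6, 9)}),                     # came back in week 1
    (date(2026, 6, 2), {date(2026, 6, 3)}),                     # active in week 0 only
    (date(2026, 6, 8), {date(2026, 6, 16)}),                    # came back in week 1
    (date(2026, 6, 10), {date(2026, 6, 18), date(2026, 6, 25)}),  # came back in weeks 1 and 2
]
retention = weekly_cohort_retention(users)
print(retention)
```

If a release ships between the two signup weeks, the difference between the two numbers is the first evidence of whether it helped.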
 
 
20. Dashboard accessible to non-technical team members (Medium)
 
The founder can read the metrics without engineering help. Decisions slow when only the technical team can see the data.
 
 

Dimension 3: Usability (10 Points)

Can real users complete the core flow without help?

21. 5+ external testers complete the core flow unassisted (Critical)
 
Not your team. Not your friends. Real target users you find via Craigslist or User Interviews who have never seen the product. 80%+ should complete the flow without guidance.
 
 
22. Onboarding tested with users who have never seen the product (Critical)
 
First-time user experience matters more than any other moment. Watch 5+ people open the app for the first time. The drop-off points are obvious.
 
 
23. Error messages are human-readable, not technical (High)
 
"Something went wrong, please try again" beats "Error 500: Connection refused at 0x7F". Every error message tested for clarity.
 
 
24. Accessibility basics implemented (WCAG 2.1 AA minimum) (High)
 
Colour contrast, alt text on images, keyboard navigation, screen reader compatibility. Not optional in 2026.
 
 
25. Loading states for actions taking >1 second (Medium)
 
Users tap a button. If nothing happens within 1 second, they tap again. Loading spinners prevent double-actions and accidental duplicate data.
 
 
26. Empty states designed (not blank screens) (Medium)
 
First-time users see an explanation of what should appear, not a blank list. Power users see helpful context, not emptiness.
 
 
27. Support / help mechanism in-app (High)
 
FAQ, chat widget, or email support is one tap away. Users who hit a problem find help without leaving the app.
 
 
28. Forms validated client-side AND server-side (High)
 
Client-side validation gives immediate feedback. Server-side validation prevents malformed data. Both are necessary.
 
 
29. Mobile experience tested on real devices (not just emulators) (Critical)
 
iPhone SE (small screen), iPhone Pro Max (large screen), mid-range Android, older Android. Emulators miss touch-target issues, performance gaps, and rendering bugs.
 
 
30. App Store / Play Store assets prepared (screenshots, description, keywords) (High)
 
App Store rejection on day-of-launch delays go-live by 1-3 days. Apple's medical content guidelines flagged Hana's app — she should have submitted for review 2 weeks earlier.
 
 

Dimension 4: Scalability (10 Points)

Will the product survive sudden growth?

31. Load tested at 10x expected launch traffic (Critical)
 
If you expect 100 concurrent users at launch, test at 1,000. Ravi's app broke at 2,000 users because load testing stopped at 100.
 
 
32. Database queries indexed appropriately (Critical)
 
Every query that runs frequently has an index. Missing indexes are invisible at 100 users and catastrophic at 2,000.
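Most databases can tell you whether a query uses an index. The sketch below uses SQLite from Python's standard library purely as a stand-in for your production database; the `symptom_logs` table and the `query_plan` helper are hypothetical, and other databases expose the same idea through their own `EXPLAIN` output.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE symptom_logs (id INTEGER PRIMARY KEY, user_id INTEGER, note TEXT)")
db.executemany(
    "INSERT INTO symptom_logs (user_id, note) VALUES (?, ?)",
    [(n % 500, "entry") for n in range(5000)],
)


def query_plan(connection, sql, params=()):
    """Return SQLite's plan description for a query as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


sql = "SELECT note FROM symptom_logs WHERE user_id = ?"

plan_before = query_plan(db, sql, (42,))  # full table scan
db.execute("CREATE INDEX idx_symptom_logs_user ON symptom_logs (user_id)")
plan_after = query_plan(db, sql, (42,))   # index lookup

print("before:", plan_before)
print("after: ", plan_after)
```

Running the plan check on each frequent query before launch is faster than discovering the missing index under load.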
 
 
33. Static assets served via CDN (High)
 
Images, fonts, JavaScript bundles served from CloudFront or equivalent. Reduces server load and improves user experience globally.
 
 
34. Caching layer configured (Redis, Memcached, or equivalent) (High)
 
Frequently accessed data is cached. Database query volume typically drops by 40-60% with proper caching, depending on how read-heavy the workload is.
 
 
35. Background job processing for slow operations (Medium)
 
Email sending, image processing, report generation happen in background queues — not in user-facing API requests.
 
 
36. Rate limiting on public endpoints (High)
 
Login, sign-up, password reset, public APIs have rate limits to prevent abuse and accidental traffic floods.
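A sliding-window counter is one simple way to implement such a limit. The sketch below keeps state in process memory, which is an assumption that only holds for a single server; a shared store would be needed behind a load balancer. `SlidingWindowLimiter` and the key format are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each key."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.hits = defaultdict(deque)  # key -> timestamps of recent allowed requests

    def allow(self, key):
        now = self.clock()
        recent = self.hits[key]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


# A fake clock makes the behaviour easy to demonstrate.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=5, window_seconds=60, clock=lambda: fake_now[0])

first_six = [limiter.allow("login:203.0.113.7") for _ in range(6)]
fake_now[0] = 61.0
after_window = limiter.allow("login:203.0.113.7")
print(first_six, after_window)
```

Keying by endpoint plus IP address or account lets login, sign-up, and password reset each have their own limit.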
 
 
37. Push notification batching configured (High)
 
Sending 5,000 notifications simultaneously hits provider rate limits. Batched sending in waves of 500-1,000 prevents this.
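Batching is mostly a matter of slicing the recipient list and pausing between waves. The sketch below is provider-agnostic: `send_batch` stands in for whatever call your push provider offers, and the wave size and pause are illustrative values to tune against that provider's documented limits.

```python
import time


def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_in_waves(tokens, send_batch, wave_size=500, pause_seconds=1.0, sleep=time.sleep):
    """Send device tokens in waves, pausing between them. Returns the number of waves."""
    waves = 0
    for wave in batches(tokens, wave_size):
        if waves:
            sleep(pause_seconds)
        send_batch(wave)
        waves += 1
    return waves


# Demonstration with a fake sender that only records batch sizes.
sent_sizes = []
tokens = [f"device-{n}" for n in range(5000)]
waves = send_in_waves(
    tokens,
    send_batch=lambda wave: sent_sizes.append(len(wave)),
    sleep=lambda seconds: None,  # skip real pauses in the demo
)
print(waves, sent_sizes)
```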
 
 
38. Infrastructure cost monitoring and alerts (Medium)

You know your AWS bill in real-time. Alerts fire if costs spike unexpectedly (e.g., runaway query, DDoS attempt).
 
 
39. Auto-scaling configured for primary services (Medium)
 
Compute scales up under load and down at quiet hours. Saves cost. Survives traffic spikes.
 
 
40. Architecture documented for the next engineer (High)
 
A new developer joining the team can understand the system from documentation alone, without "tribal knowledge" handovers. Zara's $67K debt crisis happened because this was skipped.
 
 

Dimension 5: Security (10 Points)

Is user data protected and is the system resilient to attack?

41. HTTPS / TLS enforced on every endpoint (Critical)
 
No HTTP. No mixed content. Certificate auto-renewal configured. Hana's product had this, but the privacy policy webpage did not — a small embarrassment with regulatory implications.
 
 
42. Passwords hashed with bcrypt, argon2, or equivalent (Critical)
 
Never MD5, SHA1, or plain text. Salts used. OWASP password storage guidelines followed.
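As a minimal sketch of salted, slow hashing, the example below uses `hashlib.scrypt` from Python's standard library as the "or equivalent" option, because bcrypt and argon2 require third-party packages. The cost parameters are illustrative only and should be set from current OWASP guidance; `hash_password` and `verify_password` are hypothetical helper names.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters; tune them from current OWASP guidance.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password):
    """Return (salt, digest). Store both; never store the password itself."""
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected_digest)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))
print(verify_password("wrong guess", salt, digest))
```

In a real product this logic usually lives inside the auth provider or framework rather than in application code.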
 
 
43. Authentication uses standard library, not custom code (Critical)
 
Auth0, Firebase Auth, Cognito, or framework-native auth. Custom-rolled auth is one of the highest-risk pieces of code in any product.
 
 
44. Sensitive data encrypted at rest (Critical)
 
PII, payment info, health data encrypted in the database. AWS RDS encryption, field-level encryption for highest-sensitivity fields.
 
 
45. API keys and secrets in environment variables, not code (Critical)
 
Never committed to git. Rotated regularly. AWS Secrets Manager or equivalent for production.
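A small fail-fast helper makes a missing secret obvious at startup instead of at the first payment. The sketch below is illustrative: `require_secret`, `MissingSecretError`, and the `PAYMENT_API_KEY` name are hypothetical, and a managed secrets store would typically populate the environment.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised at startup when a required secret is absent."""


def require_secret(name, environ=os.environ):
    """Read a secret from the environment and fail loudly if it is missing or blank."""
    value = environ.get(name, "").strip()
    if not value:
        # Report the name only; never log the secret's value.
        raise MissingSecretError(f"Required secret {name} is not set")
    return value


# Demonstration with a fake environment so the example runs anywhere.
fake_env = {"PAYMENT_API_KEY": "sk_test_example"}
api_key = require_secret("PAYMENT_API_KEY", environ=fake_env)
print(len(api_key) > 0)
```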
 
 
46. SQL injection and XSS protection verified (Critical)
 
All user inputs sanitised. Parameterised queries used. OWASP Top 10 vulnerabilities tested.
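The difference between a vulnerable query and a parameterised one is easy to demonstrate. The sketch below uses SQLite and `html.escape` from Python's standard library as stand-ins; the `users` table and function names are hypothetical, and most web frameworks escape template output for you.

```python
import html
import sqlite3

user_db = sqlite3.connect(":memory:")
user_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
user_db.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)


def find_user_unsafe(email):
    # DO NOT do this: string formatting lets the input rewrite the query.
    return user_db.execute(f"SELECT id FROM users WHERE email = '{email}'").fetchall()


def find_user(email):
    # Parameterised query: the input is always treated as data.
    return user_db.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchall()


attack = "' OR '1'='1"
unsafe_rows = find_user_unsafe(attack)  # matches every user
safe_rows = find_user(attack)           # matches nobody

# For XSS, escape user-supplied text before placing it in HTML.
escaped = html.escape("<script>alert(1)</script>")

print(len(unsafe_rows), len(safe_rows), escaped)
```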
 
47. Privacy policy reflects actual data handling (High)
 
The policy is accurate, not boilerplate. Mentions every type of data collected, where it is stored, how long it is retained, how it can be deleted. Hana's policy referenced "Australia only" but the app was downloadable globally.
 
 
48. GDPR / CCPA compliance assessed for target markets (High)
 
If users are in Europe (GDPR) or California (CCPA), data deletion endpoints exist and are tested. Consent management implemented. Jaya's international expansion guide covers this in depth.
 
 
49. Industry-specific compliance (HIPAA, PCI-DSS, etc.) where applicable (Critical)
 
Healthcare = HIPAA or the local equivalent. Payments = PCI-DSS. Financial services = local regulatory requirements. Not optional. Not bolted on later. Architected from day 1.
 
 
50. Penetration test or security audit completed (High)
 
A third party has tested the product for vulnerabilities. Not a friend who "knows security" — an external firm or recognised tool (Burp Suite, OWASP ZAP). For high-stakes products: full penetration test by a certified firm.
 

Hana's Second Lesson: A PRR Is Only as Strong as Its Weakest Dimension

"Eighteen months after the first launch, I built a second product — a partner companion app that pairs with the original. This time I ran a PRR. I knew the playbook. I scored every dimension carefully — Reliability 9/10, Measurability 8/10, Usability 9/10, Scalability 8/10. Total: 87/100. Well above the 80 threshold. I felt confident.

I had not scored Security carefully. I had assumed the second product inherited the security architecture of the first. I gave Security 8/10 without auditing it specifically.

90 days post-launch, a user noticed they could see another user's partner data if they manipulated a URL parameter. The two products had different authorisation patterns — and the second product had a gap I had not tested for. Not a hack. Not a sophisticated attack. A teenager curious about how the API worked.

Privacy breach. 47 user records exposed. Mandatory notification to the Australian Privacy Commissioner. Public disclosure on the website. Trust damage that took 6 months to recover. Cost: $18,000 in incident response, legal review, security audit, and patch deployment.

The PRR works. I just did not run it properly. The dimensions are not optional. The security checklist is not a suggestion. Skipping any of the 50 points means accepting the risk of what that point catches."

— Hana (composite founder)


The Five-Dimension Rule. A PRR is only as strong as its weakest dimension. A product can score near-perfect across Reliability, Measurability, Usability, and Scalability and still fail catastrophically if Security is weak. Because each dimension contributes at most 20 of the 100 points, a strong total can hide a failing dimension. Run every dimension. Score every point. Do not skip the boring sections to get to the launch.

Built to Last™ — P05: The Right Code. The Production Readiness Review is how P05 enters the launch phase. P05 is not "code that compiles." It is "code that withstands real users in production." The 50 points in this checklist are the operational definition of P05 — and the reason every EB Pearls launch undergoes a PRR before going live. Zara's $67K technical debt would have been a fraction of that cost if a PRR had caught the missing tests and documentation before launch.

What to Do With Your Score

Score | Status | Action
90-100 | Excellent | Launch with confidence. Schedule a 30-day review.
80-89 | Pass | Launch with documented remediation plan for items scored 1.
70-79 | Conditional | Delay launch 1-2 weeks. Fix all 0s. Re-score.
60-69 | Significant gaps | Delay launch 2-4 weeks. Major remediation required.
Below 60 | Not ready | Launch will damage the product. Reassess timeline.

The non-negotiable rule: Regardless of total score, any critical-severity item scored 0 blocks launch. A 92/100 score with one critical at 0 is not a pass — it is a launch waiting to fail at the one thing that matters most.
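The score bands and the critical-item override can be written as a small lookup. This is a sketch only: `readiness_status` is a hypothetical helper, and the "Blocked" label is introduced here for illustration and does not appear in the table.

```python
# (minimum total, status, action), highest band first, as in the table above.
BANDS = [
    (90, "Excellent", "Launch with confidence. Schedule a 30-day review."),
    (80, "Pass", "Launch with documented remediation plan for items scored 1."),
    (70, "Conditional", "Delay launch 1-2 weeks. Fix all 0s. Re-score."),
    (60, "Significant gaps", "Delay launch 2-4 weeks. Major remediation required."),
    (0, "Not ready", "Launch will damage the product. Reassess timeline."),
]


def readiness_status(total, critical_zeros):
    """Map a total score and a count of critical items at 0 to (status, action)."""
    if not 0 <= total <= 100:
        raise ValueError("total must be between 0 and 100")
    if critical_zeros > 0:
        # "Blocked" is an illustrative label for the non-negotiable rule.
        return "Blocked", "Fix every critical item scored 0, then re-score."
    for floor, status, action in BANDS:
        if total >= floor:
            return status, action


print(readiness_status(92, 1))  # high total, one critical item at 0
print(readiness_status(87, 0))
print(readiness_status(68, 0))
```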

Hana's first launch would have scored approximately 68/100 with 5 critical items at 0. The 11 issues she hit in production were exactly the items the PRR would have flagged. Her second launch scored 87/100 — but the Security dimension was inflated because she did not audit it carefully. Score honestly. The PRR exists to tell you the truth, not to validate your readiness.

Founder FAQ

What is a Production Readiness Review?

A structured pre-launch audit across reliability, measurability, usability, scalability, and security. Produces a Production Readiness Score™. Below 80 = launch delayed.

When should I run it?

2-3 weeks before launch. Running it the day before defeats the purpose. The audit reveals issues that need time to fix.

Can I run a PRR without an agency?

Yes. The 50-point checklist in this article is freely usable. Run it against your product, your agency's deliverable, or a freelancer's work.

What if my agency does not offer PRR?

Ask why. Request it as a paid addition, or run the checklist yourself. Launching without a structured pre-launch audit is the pattern that cost Mika $30K and Hana $23K.

What is a passing score?

80/100 minimum, with zero critical-severity items at 0. Below 80 means launch is delayed. Some founders push back; we delay anyway. A launch at 65/100 will hit the same issues an 80/100 launch avoids.

Why 50 points specifically?

Ten points in each of 5 dimensions is the minimum useful coverage. Fewer points miss critical categories. More points become theatre. The list is derived from 600+ launches at EB Pearls.

The Founder's Edge

Hana launched twice. The first time, she skipped a $2,500 PRR and paid $23,000 in emergency rework. The second time, she ran a PRR but inflated her Security score and paid $18,000 in incident response. Total cost of one skipped audit and one inflated score: $41,000.

The checklist in this article is the framework that would have prevented both losses. Fifty points. Five dimensions. Two-to-five days of structured work. It is the cheapest insurance against the most expensive product failures.

Use it. Whether you engage us or not.
