Production Readiness Review: Never Go Live Unprepared

Published

11 Jun 2026

Author

Tiffany Palmer

Launch day arrives because the calendar says it should — not because the system has earned the right to be live. The marketing campaign is booked. The investor demo is on the diary. The team has been heads-down for months and "ready" feels close enough. So the deploy button gets pressed, and the next two hours are spent discovering everything that wasn't quite finished. A connection pool exhausts. An alert that was never configured doesn't fire. A rollback path that was never tested doesn't work. The product is live, and the team is firefighting in production while customers churn through the experience.

A production readiness review is the structured pre-launch session that exists to prevent that pattern. The Production Readiness Review™ is a defined gate in EB Pearls' Built to Last™ 2.0 framework: a walk through every category of production failure — performance, security, observability, backup, escalation — measured against a Production Readiness Score™. Either the score passes, and the system goes live; or it doesn't, and the launch waits until it does. There is no negotiation against the calendar.

What follows is a walkthrough of how the review actually runs: the categories it covers, who's in the room, what each step verifies, what the score means, and what a passing system looks like the day before it goes live.

The Cost of Skipping a Production Readiness Review

The damage from launching against a calendar instead of a checklist is rarely a single dramatic incident. It is a sequence of small surprises, each individually survivable, that arrive in the worst possible order.

A team launches with default logging that captures errors but not request traces. Twenty minutes after launch, the database connection pool starts to saturate under a load pattern the load test never produced. The error rate climbs. Customer-facing latency doubles, then triples. By the time the team identifies the cause, two hours of revenue and several support tickets have already happened. None of the individual problems were complicated. The team simply discovered them sequentially, in production, in front of users.

The cost has three components. The first is revenue lost during the incident — measurable but usually small relative to the build budget. The second is the trust cost: customers who churned in their first session, investors who saw the launch wobble, internal stakeholders whose confidence in the team has been quietly downgraded. The third is the compounding cost of every post-launch firefight: each one consumes engineering time that was meant to be spent on the next feature. In Built to Last 2.0, this failure sits in P02 — The Right Infrastructure. Infrastructure that performs in testing and fails under real users isn't faster to ship; it's more expensive to fix.

EB Pearls has shipped 600+ products since 2004 with a 98% on-time delivery rate over the past 12 years. The discipline that supports that rate isn't speed — it's the willingness to hold a launch when the review hasn't passed. The Production Readiness Review is the structural reason "on time" doesn't translate into "fragile at launch." It is also the artefact that makes the way we deliver custom software defensible to a board or an auditor after the fact.

Who Needs to Be in the Room

A production readiness review only works if the people who can actually fix the gaps are present. Status from a delegate is not the same as a hand on the keyboard.

For most product engagements, the room includes the engineering lead, the DevOps or infrastructure engineer responsible for the deployment, the QA lead who owns the test suite, and the product owner who can decide whether to launch with a known gap or to hold. For mobile app builds, the platform engineer who handles store submissions joins the room. For AI-enabled products, an additional seat goes to whoever owns model monitoring and evaluation. For regulated builds — payments, healthcare, anything with data residency requirements — the security lead is mandatory, not optional.

On the EB Pearls side, the review is run by a senior engagement lead alongside the technical architect who signed off the Architecture Decision Records during sprint one. The architect is in the room because production readiness is partially a check that the architecture survived contact with delivery. If the system has drifted from what was agreed, the review catches it.

The review is deliberately scheduled at least two weeks before the target launch date. That window is the difference between "fix the seven items found in review" and "negotiate which items we can survive going live with." A review three days before launch is a sign that something earlier in the process failed.

How the Production Readiness Review Works, Step by Step

The review follows a defined sequence. Each step is a category of risk with artefacts to verify and a pass/fail decision that contributes to the score. Here is what happens, in order.

Step 1: Verify Performance Benchmarks and Load Behaviour

The first category is performance under realistic load. The review asks: at what load does the system degrade, where does it degrade first, and what's the response when it does?

The artefacts are load test results, response-time baselines for every critical user path, and the documented behaviour at 1x, 5x, and 10x expected peak. The benchmark isn't whether the system handles average load — that's the floor, not the bar. The bar is whether the team knows the failure mode at 10x. Does the database become the bottleneck? Does the queue back up? Does the autoscaling policy kick in before users notice?
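One way to turn "where does it degrade first" into an artefact is to compute a tail-latency figure at each load level and report the first level that breaks the budget. The sketch below is illustrative only: the sample latencies, the 300 ms budget, and the function names are assumptions, not figures from any real review.

```python
from statistics import quantiles

def p95(samples_ms):
    """Return the 95th-percentile latency of a list of samples in milliseconds."""
    # quantiles(..., n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(samples_ms, n=20, method="inclusive")[-1]

def first_degraded_level(results, budget_ms):
    """Return the lowest load multiplier whose p95 exceeds the budget, or None."""
    for multiplier in sorted(results):
        if p95(results[multiplier]) > budget_ms:
            return multiplier
    return None

# Hypothetical latency samples (ms) for one critical path at 1x, 5x and 10x peak.
results = {
    1: [80, 90, 95, 100, 110, 120],
    5: [150, 170, 180, 200, 220, 260],
    10: [400, 650, 900, 1200, 1500, 2100],
}

print(first_degraded_level(results, budget_ms=300))  # prints 10
```

A result of `None` would mean the budget held at every tested level, which is a prompt to test higher rather than proof that the system has no failure mode.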

A common failure pattern at this step is "the load test passed but it didn't replicate production conditions." The fix is structural. Load tests must replicate real query patterns, real session lengths, real concurrency profiles. Synthetic load that hits one endpoint at constant rate is not a load test; it's a smoke test.
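The difference between a single-endpoint smoke test and a realistic profile can be sketched as a weighted request plan. The endpoint names and traffic shares below are hypothetical; in practice they would come from analytics or production logs, and a real load tool would also model session length and concurrency.

```python
import random

# Hypothetical traffic mix: share of requests per endpoint. These numbers are
# assumptions for illustration, not measurements.
TRAFFIC_MIX = {
    "/search": 0.55,
    "/product": 0.30,
    "/cart": 0.10,
    "/checkout": 0.05,
}

def build_request_plan(total_requests, mix, seed=0):
    """Return a list of endpoints sampled in proportion to the traffic mix."""
    rng = random.Random(seed)  # seeded so the plan is reproducible between runs
    endpoints = list(mix)
    weights = [mix[endpoint] for endpoint in endpoints]
    return rng.choices(endpoints, weights=weights, k=total_requests)

plan = build_request_plan(10_000, TRAFFIC_MIX)

# Observed share of each endpoint in the generated plan.
share = {endpoint: plan.count(endpoint) / len(plan) for endpoint in TRAFFIC_MIX}
print(share)
```

Seeding the generator keeps two runs comparable, so a change in results points at the system rather than at the test.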

Step 2: Validate Security Posture

Step two walks the security checklist. Authentication and authorisation flows verified end-to-end. Secrets rotated and stored in a dedicated manager, not in environment files. Dependencies scanned and high-severity vulnerabilities patched. Static and dynamic security testing run against the production build, not the development build. TLS configuration validated. Rate limiting in place on every public endpoint.

For builds with regulatory scope — Australian Privacy Principles, PCI-DSS, HIPAA, GDPR — the review confirms that controls are evidenced, not just claimed. The OWASP Top 10 categories each get a yes/no against the system's current posture. Items that fail to verify get filed as blockers, not deferrals.

The most common failure at this step is the secret left in an environment file, with a promise that "we'll move it to a vault after launch." It does not get moved after launch. The review's job is to prevent the deferred-cleanup pattern that becomes a six-month-old security debt.
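A rough check for this pattern can be automated. The sketch below scans the text of an environment file for key names that look like credentials and carry a literal value. The key-name pattern, the sample contents, and the rule that `${...}` references are acceptable are all assumptions for illustration; a dedicated secret scanner would be more thorough.

```python
import re

# Key names that usually indicate a credential. This pattern is an assumption
# and will miss secrets stored under unusual names.
SECRET_KEY_PATTERN = re.compile(
    r"(SECRET|PASSWORD|TOKEN|API_KEY|PRIVATE_KEY)", re.IGNORECASE
)

def find_plaintext_secrets(env_text):
    """Return env-file keys that look like secrets and hold a literal value."""
    findings = []
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip().strip("'\"")
        # An empty value or a ${...} reference is treated as resolved elsewhere.
        if SECRET_KEY_PATTERN.search(key) and value and not value.startswith("${"):
            findings.append(key.strip())
    return findings

# Hypothetical .env contents.
sample = """
# application settings
APP_PORT=8080
DB_PASSWORD=hunter2
PAYMENT_API_KEY=${PAYMENT_API_KEY}
JWT_SECRET='abc123'
"""

print(find_plaintext_secrets(sample))  # prints ['DB_PASSWORD', 'JWT_SECRET']
```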

Step 3: Confirm Observability and Monitoring Are Live

The third step verifies that the team will know before customers do. Structured logging in place. Metrics for every critical user path. Distributed tracing across services. Error tracking with assigned ownership. Alerting tuned — meaning real alerts fire on real conditions, and noise has been reduced so on-call engineers respond rather than tune out.

The pass condition isn't that observability tooling exists. It's that the team has tested the alerting paths: a deliberate failure is injected in staging, and the team observes, identifies, and responds to it using only the tools available in production. If the team can't see a failure they themselves caused in staging, they can't see one a customer will cause in production.
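The outcome of a failure-injection drill can be recorded and judged mechanically. The following sketch assumes the team notes three timestamps during the drill and has agreed two budgets, one for detection and one for acknowledgement. The field names and budget values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DrillResult:
    """Timestamps, in seconds since drill start, from a staging failure injection."""
    injected_at: float
    alert_fired_at: Optional[float]   # None means no alert fired
    acknowledged_at: Optional[float]  # None means nobody acknowledged the alert

def drill_passes(result, max_detect_s, max_ack_s):
    """A drill passes only if the alert fired and was acknowledged within budget."""
    if result.alert_fired_at is None or result.acknowledged_at is None:
        return False
    detect_delay = result.alert_fired_at - result.injected_at
    ack_delay = result.acknowledged_at - result.alert_fired_at
    return detect_delay <= max_detect_s and ack_delay <= max_ack_s

# Hypothetical drills: one healthy, one where the alert never fired.
healthy = DrillResult(injected_at=0, alert_fired_at=90, acknowledged_at=200)
silent = DrillResult(injected_at=0, alert_fired_at=None, acknowledged_at=None)

print(drill_passes(healthy, max_detect_s=120, max_ack_s=300))  # prints True
print(drill_passes(silent, max_detect_s=120, max_ack_s=300))   # prints False
```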

For AI products, this step expands to include accuracy monitoring, drift detection, and cost monitoring. An AI system that goes live without accuracy benchmarks is a system that will quietly degrade.

Step 4: Test Backup, Restore, and Disaster Recovery

Backups that have never been restored are not backups. They are hope.

Step four runs the actual restore. A backup from the previous day is restored to a test environment. The data is verified. The restore time is measured against the recovery time objective (RTO) agreed during architecture. If the restore takes longer than the RTO allows, the gap is documented and the launch is held until the restore fits within it.
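The two checks in this step, restore duration against the RTO and verification of the restored data, reduce to simple comparisons once the measurements exist. The sketch below assumes the team has recorded start and finish times for the restore and row counts per table. The timestamps, table names, counts, and one-hour RTO are made up for illustration, and row counts are only one possible form of data verification.

```python
from datetime import datetime, timedelta

def restore_within_rto(started, finished, rto):
    """Return (passed, duration) for a measured restore against the agreed RTO."""
    duration = finished - started
    return duration <= rto, duration

def row_count_mismatches(source_counts, restored_counts):
    """Return tables whose restored row count differs from the source snapshot."""
    return sorted(
        table
        for table, count in source_counts.items()
        if restored_counts.get(table) != count
    )

# Hypothetical measurements from a restore rehearsal.
started = datetime(2026, 6, 1, 9, 0)
finished = datetime(2026, 6, 1, 10, 25)
passed, duration = restore_within_rto(started, finished, rto=timedelta(hours=1))
print(passed, duration)  # prints False 1:25:00

mismatches = row_count_mismatches(
    {"orders": 1200, "users": 340},
    {"orders": 1200, "users": 338},
)
print(mismatches)  # prints ['users']
```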

The same applies to disaster recovery. If the architecture document says the system can fail over to a secondary region in fifteen minutes, the review verifies that claim by performing the failover in a non-production environment and measuring it. Claims become evidence or they become blockers.

Step 5: Confirm Escalation Paths and Runbooks Exist

A production system without an escalation path is a system whose 2am incidents become four-hour incidents. Step five verifies that the on-call rotation is defined, the most junior on-call engineer knows where the runbooks live, and the runbooks pass the test of being followable by someone who didn't write them.

The review walks at least one runbook end-to-end with a team member who wasn't involved in writing it. If they can follow the runbook to resolve a simulated incident, the runbook passes. If they get stuck on an undocumented step, the runbook fails and gets rewritten before launch.

Escalation paths are checked the same way. Every step from detection to resolution has a named owner, a documented next-step-if-unreachable, and a maximum response time. A pager that wakes a single engineer with no fallback is not an escalation path. This step is also where DevOps operating discipline gets pressure-tested before the first paying customer can find a gap.
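The three requirements for each escalation step (a named owner, a fallback if that owner is unreachable, and a maximum response time) lend themselves to a structural check. The sketch below uses a hypothetical `EscalationStep` record with made-up role names. It only finds missing fields and cannot tell whether the named people would actually respond.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EscalationStep:
    name: str
    owner: Optional[str]
    fallback: Optional[str]
    max_response_minutes: Optional[int]

def escalation_gaps(steps):
    """Return a human-readable list of missing fields across the escalation path."""
    gaps = []
    for step in steps:
        if not step.owner:
            gaps.append(f"{step.name}: no named owner")
        if not step.fallback:
            gaps.append(f"{step.name}: no fallback if owner is unreachable")
        if step.max_response_minutes is None:
            gaps.append(f"{step.name}: no maximum response time")
    return gaps

# Hypothetical escalation path with two gaps.
path = [
    EscalationStep("detect", "on-call primary", "on-call secondary", 5),
    EscalationStep("triage", "engineering lead", None, 15),
    EscalationStep("resolve", "infrastructure engineer", "technical architect", None),
]

for gap in escalation_gaps(path):
    print(gap)
```

An empty list is the pass condition for the structure. The runbook walkthrough remains the test of whether the path works in practice.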

Step 6: Score the Review and Gate the Launch

Each category produces a sub-score. The categories roll up into the Production Readiness Score. Items either pass, pass with a documented exception that the product owner has signed off, or fail. Failed items are blockers — launch does not happen until they pass.

The score is not a percentage to be averaged. A 90% score with a critical failure in security is not a pass. The review is gated on the categories, not on the aggregate. This is what makes "we won't go live with infrastructure that hasn't passed" a structural commitment rather than a slogan.
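The difference between averaging and gating is easy to see in code. The sketch below is not the Production Readiness Score itself, whose exact calculation is not described here. It is a minimal illustration, with hypothetical category contents, of a review where 90% of items pass and the launch is still blocked.

```python
from enum import Enum

class Outcome(Enum):
    PASS = "pass"
    EXCEPTION = "pass with signed-off exception"
    FAIL = "fail"

def pass_ratio(review):
    """Share of items that did not fail, across all categories (the naive average)."""
    outcomes = [outcome for items in review.values() for outcome in items]
    return sum(outcome is not Outcome.FAIL for outcome in outcomes) / len(outcomes)

def launch_allowed(review):
    """Gate on categories: any failed item in any category blocks the launch."""
    blockers = [category for category, items in review.items() if Outcome.FAIL in items]
    return not blockers, blockers

# Hypothetical review: ten items, one failure, in security.
review = {
    "performance": [Outcome.PASS, Outcome.PASS],
    "security": [Outcome.PASS, Outcome.PASS, Outcome.FAIL],
    "observability": [Outcome.PASS, Outcome.EXCEPTION],
    "backup": [Outcome.PASS],
    "escalation": [Outcome.PASS, Outcome.PASS],
}

print(pass_ratio(review))      # prints 0.9
print(launch_allowed(review))  # prints (False, ['security'])
```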

The output of the review is a written record: the categories tested, the items that passed, the items that failed, and the items that passed with exceptions. That record is what the team refers back to the first time someone asks, six months after launch, whether a particular control was verified before going live.

What a Passing Review Looks Like — And Where It Still Fails

A passing review produces a launch that is uneventful. Not because nothing happens, but because the things that happen were anticipated. The autoscaling kicks in when it was meant to. The alert that fires during launch is one the team configured deliberately. The rollback path nobody used is the one they tested last week.

The failure modes, even when a review is run, are worth naming. The most common: a category gets a pass because a senior person vouched for it rather than because the artefact was verified. The fix is to require evidence — a screenshot, a test log, a recorded restore — for every pass. The second most common: the review happens but the items found get deferred under launch pressure. The fix is the gate. Failed items are blockers, the launch slips, full stop.
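The evidence requirement can be enforced with a small check on the review record: any item marked as passed without an attached artefact is flagged as unverified. The record layout and file names below are assumptions for illustration.

```python
def unverified_passes(items):
    """Return names of items marked as passed that have no evidence attached."""
    return [
        item["name"]
        for item in items
        if item["status"] == "pass" and not item.get("evidence")
    ]

# Hypothetical review record.
items = [
    {"name": "restore rehearsal", "status": "pass", "evidence": "restore-log.txt"},
    {"name": "rate limiting", "status": "pass", "evidence": ""},
    {"name": "failover drill", "status": "fail"},
    {"name": "alert drill", "status": "pass"},
]

print(unverified_passes(items))  # prints ['rate limiting', 'alert drill']
```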

The third failure mode is more subtle. The review covers what the team already thought to test and misses what they didn't. The mitigation is to include a generalist in the room — someone who didn't build the system and asks the obvious questions. Their job is to find the categories the build team is too close to see.

The compound value of a passing review is invisible by design. The launch is calm. The first month in production is spent building features users want rather than fixing incidents users ran into. The team that ran the review has the artefact they need for the next compliance audit, the next investor diligence, the next executive question about whether the platform is ready for scale.

Two Launches, Two Outcomes

A mid-sized SaaS client (engineering team of around 15) we worked with had launched a previous product without a structured review. The team had monitoring — uptime checks against the homepage — but no application-level observability and no tested escalation path. Twenty-three minutes after the launch announcement went out, the database connection pool began saturating under a load pattern that hadn't surfaced in their load test, which had hit a single endpoint at constant rate. The team noticed when a customer emailed support about checkout failures. The fix took roughly two hours and was applied in production with the founder watching. The post-incident review identified specific items that a pre-launch readiness review would have flagged in advance.

For the next product, the same team booked the Production Readiness Review three weeks before launch. The first pass surfaced a connection pool tuning recommendation, secrets still in environment files, an alerting configuration that would have failed silently, two runbooks that no one outside the writer could follow, an untested restore procedure, a missing rate limit, and a deployment rollback that had only been tested in development. The team fixed the seven blockers across the next two weeks and re-ran the review. The system passed the second time. Launch day was a non-event. The first month in production was spent on features, not incidents.

The contrast is not that the second team was more skilled. It is that the second team had a structured checklist that surfaced the items the first team had unintentionally deferred. The review converts good intentions into verified artefacts.

When a Production Readiness Review Is Critical, When You Might Defer

The Production Readiness Review is critical for any launch where downtime translates into measurable cost — revenue lost, customers churned, reputation damaged, compliance exposure created. That covers every customer-facing system, every system handling regulated data, every system with an executive demo attached, and every system where the team won't be staffed to firefight 24/7 in the launch window.

It is also critical when the architecture has not been used in production before — a new AI service, a new platform, a new integration. Novelty multiplies the categories that can fail in unexpected ways. The path from project kick-off to a working launch is the same regardless of stack, but the failure modes are not.

Where you might defer it: an internal tool with a known small user base, a private beta with users who have signed up to experience early bugs, an experimental prototype that exists to test the riskiest assumption and is not yet handling real value. Even in these cases, a lightweight version of the review (performance, observability, escalation) is usually worth running. The full review is what the framework's commitment to "infrastructure that performs at scale" actually means in practice.

What to Do This Week

If you have a launch coming up in the next month and the production readiness review hasn't been scheduled, schedule it. Two weeks before launch is the latest the review should sit on the calendar. Anything closer and the review becomes a negotiation rather than a gate.

For a deeper view of the delivery process that surrounds the review, read our project delivery framework. For a sense of the full lifecycle, our guide from concept to launch covers the broader sequence end-to-end. If you'd like to discuss a specific upcoming launch, the custom software service line is where these reviews sit.

Frequently Asked Questions

What is a production readiness review?

A production readiness review is a structured pre-launch session that verifies a system meets a defined bar across performance, security, observability, backup, and escalation before it goes live. In Built to Last 2.0 it produces a Production Readiness Score. If the score doesn't pass, the launch is held until it does.

When should we schedule the production readiness review?

At least two weeks before the target launch date, and ideally three. Earlier scheduling gives the team time to fix the items the review finds. Any closer to launch, and the review creates pressure to defer items rather than fix them, which defeats the purpose. A review scheduled three days before launch is itself a signal that the project's pre-launch discipline needs strengthening.

What's in the review?

Five categories: performance and load behaviour, security posture, observability and monitoring, backup and disaster recovery, and escalation paths and runbooks. Each category has artefacts to verify and a pass/fail decision. A sixth step scores the review against those categories and gates the launch.

What happens if we fail the review?

The launch is held. Failed items are blockers, not negotiations. The team fixes the items and re-runs the review. This is what makes the review a gate rather than a status meeting. Launching with known critical gaps is exactly what the review exists to prevent.

How is this different from a normal pre-launch checklist?

A checklist tracks tasks. The review verifies evidence. Every item that passes has a corresponding artefact — a screenshot, a test log, a recorded restore, a runbook walked by someone who didn't write it. Items pass because they are proven, not because someone confirms they are done. The Production Readiness Score is the structural difference.

Do we still need the review if we've launched products before?

Yes — especially then. Teams that have launched before often skip the review on the assumption that their habits are enough. The categories the review catches are the ones habits miss. Every launch has a category the team didn't think to look at, and that category is the one a structured review is designed to surface.

Tiffany Palmer, Senior UX/UI Designer

Tiffany brings creativity, adapts quickly to new tools, and leads atomic design principles to enhance UI/UX efficiency.