Data Model Design: The Most Expensive Part to Change After Launch

Published

15 Jun 2026

Author

Tiffany Palmer

Table of Contents

The failure pattern arrives quietly. The product launched on time, the first cohort behaved roughly as expected, and the second item on the roadmap looked simple in the brief. Then engineering pulled the estimate apart. The shape of the data — settled in week three of the build, around the launch use case — could not accommodate the relationship the new feature required. The realistic path forward was a schema migration touching twelve tables, backfilling six months of production data, and running parallel writes for the duration. That was not a feature; it was a project.

Data model design is the architectural decision whose cost compounds most aggressively after launch. Refactoring application code is unpleasant; migrating a production schema against live writes, with referential integrity to preserve and audit history to maintain, sits in a different class of expensive. This article is for engineers and architects who can still make the design call before the cost compounds. It sits inside P03 — The Right Architecture — of the Built to Last™ 2.0 framework, alongside the Architecture Session, API and integration design, and the three-horizon architecture test. Our broader custom software delivery approach treats the data model as a pre-build artefact, not a sprint-one improvisation.

What schema migrations really cost

A poorly designed data model rarely shows up in launch metrics. It shows up in the quarters after, as estimates that quietly inflate.

The first cost is the migration. Adding a column with a default is cheap. Splitting an entity in two across six months of live data, with foreign keys to update, dual writes to maintain, and a rollback path to design, is not. Migrations of that class typically take a sprint to scope, a sprint to build, and most of another to run safely against production. While the migration is in flight, the feature that motivated it is not.

The second cost is data integrity. Every migration of any size leaves a fingerprint — orphaned rows, inconsistent timestamps, audit gaps — and the cost of cleaning these up is paid the quarter after, often by the people least equipped to spot them.

The third is trust. Stakeholders who saw a feature shipped in two weeks at launch and now hear "twelve weeks" for something that looks similar will, reasonably, ask why. "The data model didn't fit" is the honest answer and a hard one to use in the same roadmap meeting twice.

What data model design actually involves

Data model design is the deliberate, documented specification of how the system's data is shaped before code is written against it. The deliverable is not the schema dump. It is an entity-relationship diagram, a set of table or collection definitions, an indexing strategy tied to the actual queries the application will run, a documented normalisation level with rationale, an audit and history approach, a PII and compliance plan, and a versioned migration strategy. Each is a decision. Each has trade-offs that need to be visible to reviewers before code locks them in.

Entity identification. What deserves a table. Customer and account, order and invoice, post and revision commonly start as one entity and become two within the first year. Identifying them separately at design time costs nothing; resolving them once production data has accumulated under the collapsed shape costs a migration.

Data relationships. One-to-one, one-to-many, many-to-many — resolved against how the business actually works rather than how today's screens display it. A common failure is the relationship described as one-to-many in the workshop being faithfully implemented as a single foreign key, when the underlying reality is many-to-many and will surface as such inside two release cycles.

Database schema and normalisation. Third normal form is the sensible default for transactional systems. Selective denormalisation is appropriate where read profiles demand it, but should be a documented decision, not an accident. Materialised views, read replicas, and caches are usually better answers to read-heavy workloads than flattening the canonical schema.

Indexing strategy. Indexes follow queries; queries follow features. Build the indexing plan from the queries the application will run in its first six months — operational and reporting both. Choose composite index column order against selectivity. Add partial indexes where a column is nullable but only non-null rows are queried. The cost of an index is paid on every write; the cost of missing one is paid on every slow page ( PostgreSQL indexing documentation ).

Audit, history, and soft deletes. Decide once per entity whether you need a temporal record, an event log, a soft-delete pattern, or none of these. The default of "we'll add it later" is the most reliable predictor of a compliance retrofit two years in.

PII and segregation. Where sensitive data lives, how it is encrypted, who can read it, and how it is deleted on request. The architectural decision is not "do we comply" but "how is the schema shaped so compliance is the default path".

Growth assumptions. What does this look like at 10× data volume? Will partition keys hold? Will the index scan that's fine at 100,000 rows still hold at ten million? The three-horizon test applies to the data model as directly as it applies to compute.

Schema migration tooling. Schema changes belong in version control. Flyway, Alembic, Sqitch, Prisma Migrate, or Liquibase — the tool matters less than the rule: applied migrations are immutable; new changes are new migrations. Editing an applied migration is the single fastest route to corrupted environment parity.

Who is in the room

A real data model review has the backend lead who will own the schema, the engineer doing the work, the product owner with domain context, and the platform or database owner where one exists. For systems handling regulated data, the privacy or security owner attends. The Discovery Workshop™ surfaces the use cases; the Architecture Session locks the broader shape; the data model session focuses on entities, relationships, indexes, and migrations. It ends with an ERD checked into the repository and an Architecture Decision Record documenting why the model looks the way it does.

Failure modes even when the design is done

Three failure modes are common on teams who already take this seriously. Designing for the launch use case only — correct for today's screens, silently wrong for the second feature. Normalisation as dogma — pushing every system to third normal form regardless of read profile, then watching the application code accumulate joins. Reactive indexing — indexes added when a slow query is reported, without checking which other queries the same index affects. None of these is rare. All are preventable at design time.

A concrete example

Consider an eCommerce platform's Orders table built around the launch use case: one customer, one shipping address, one payment method, and a JSON column of line items. Correct for day one. Three months in, finance asks for per-SKU revenue reporting; the line items are not queryable as a relational set. Six months in, gift orders — one customer paying, a different recipient receiving — break the single shipping address. Twelve months in, partial refunds need tracking against individual line items, which means those items must exist as rows. The shape was wrong from the start, not because anyone got the launch wrong, but because the model was specified against today's screens rather than the next twelve months of features.

How to make data model design a pre-code discipline

Pre-code data model design does not require new tooling. It requires the design to happen, and to happen in the order that makes the decisions reviewable.

Start with the queries, not the tables. List the operational queries the application will run — the page loads, the API endpoints, the writes — and the reporting queries the business will need within six months. The list does not need to be exhaustive; it needs to be representative. The schema falls out of the queries far more reliably than it falls out of the conceptual model.

Build the ERD against twelve months of roadmap, not against today. Walk the next two or three features and check that the entities and relationships hold. Where they don't, decide now: design the relationship in, or document the migration you are knowingly deferring. Either is acceptable. An invisible migration is a future incident. Our project delivery framework treats this kind of forward-looking design as a standard pre-build deliverable, not a speculative exercise.

Decide audit, history, and soft-delete strategy at design time. Add the columns compliance and analytics will eventually want — created_at, updated_at, deleted_at, last_modified_by, tenant_id where multi-tenancy applies — even if today's product doesn't read them. Adding them at design time is free; adding them across six months of historical data is not.

Tie the indexing plan to the query list. For each operational query, identify the index that supports it. Capture the indexing plan in the same document as the ERD so it can be reviewed alongside the schema, not as an afterthought when a query plan turns red.

Choose schema migration tooling and commit to the rule that applied migrations are immutable. New schema changes become new migrations. Rollback paths are designed when the migration is written, not when production is in trouble. The DevOps and infrastructure approach we apply to deployment pipelines extends to schema deployment: migrations run through the same gates as code.

Wire the data model into the wider engineering process. The Architecture Session locks the broader stack; the data model lives inside that decision. The Production Readiness Review™ should include index coverage of the operational query list and a tested rollback path for every migration that touched production. The Production Readiness Score™ reflects those checks.

What to avoid: designing the schema in the same pull request that delivers the feature; treating the ORM's auto-generated migrations as the design; allowing one engineer to be the only person who can answer "why is the data shaped like this?"; adding indexes only after a query is slow in production; and putting PII in tables that don't have a documented retention and deletion policy.

The dependency on other Built to Last components is light but real. The data model needs the Discovery Workshop's clarity on what the product does, the Architecture Session's commitments on stack and integration boundaries, and the three-horizon test applied to growth assumptions. Most teams already have at least some version of these.

A composite scenario — twelve tables and six months of writes

A mid-sized Australian SaaS at the Scale stage had launched on time eighteen months earlier. The core product was a B2B workflow tool. The data model had been put together in week three of the original build by one engineer working from the wireframes; nothing about it was unreasonable for the launch use case. Customers had one workspace, workspaces had users, users had tasks. Relationships were single foreign keys; the audit story was a created_at on each table.

The second major feature on the roadmap was external collaboration — letting customers invite outside contributors to specific tasks, with constrained permissions and an audit trail of who saw what and when. The relationship the feature required was many-to-many with row-level permissions and full temporal history. The relationship the schema supported was one-to-many, with no permission layer and no history. The migration to bridge the gap touched twelve tables, required backfilling six months of permission inferences from application logs, and ran in production behind a feature flag for a fortnight.

The lift cost two engineers eight weeks and pushed the feature back almost a quarter. A review of the rest of the roadmap surfaced two further changes of similar shape inside the next year. The decision after the migration was to run a deliberate data model review against the next twelve months of features and rebuild the entity relationships before the third feature shipped, not during it. The shape was wrong from the start, but it was correctable. The cheaper version is to arrive at that realisation before launch.

When data model design matters most — and when it can wait

It matters most when any of three conditions hold. The data is regulated — health, finance, identity, payments, anything where audit and retention aren't negotiable. The data is long-lived — once written, it is expected to be queryable for years. The data is multi-tenant — schema decisions about tenant isolation are structurally hard to change once tenants are live. If you check any of those, the design is non-negotiable and the review is a pre-code deliverable. For teams on the custom software track, the discipline is part of the standard build, not a premium add-on.

It can wait — partially — for throwaway prototypes that will not survive the next pivot, for internal tools with small fixed user bases, and for pre-product-market-fit builds where the system will be rewritten before the data shape matters. Even there, the audit columns and a versioned migration tool are cheap insurance. The full discipline can wait. The minimum hygiene cannot.

The honest broker view: most teams defer data model design longer than they should because the cost of skipping it is invisible at launch. The first time it becomes urgent is usually the first time it is too late to do cheaply. For organisations that need dedicated data engineering capacity through the review and migration program it surfaces, our embedded engineering teams can take ownership of the schema rather than treating it as someone else's problem.

What to do next

If you have a roadmap with two or more features queued in the next six months, book a one-hour data model review this week. Bring the backend lead, the product owner, and the most likely-to-leave member of the team. Walk the next two features against today's ERD. Every relationship the schema does not natively support is on your migration backlog whether you've named it or not. For the broader engineering discipline this sits inside, our published view of how custom software is delivered end-to-end is a useful next read.

Frequently Asked Questions

How do we structure the data?

Begin with the queries, not the tables. List the operational queries the application will run in its first six months — page loads, API endpoints, reporting requests — and derive entities and relationships that support them. Default to third normal form for transactional data, denormalise selectively where read profiles demand it, and document every deviation as an architectural decision rather than an accident. The shape of the data should be defensible against the next twelve months of roadmap, not just the launch use case.

Will it scale to 10x data volume?

Apply the three-horizon test to every entity. At today's data volume, today's indexes and queries are fine by definition. At 10× volume the questions are whether your indexes still fit in memory, whether your joins still complete in the time the application allows, and whether your partition or shard keys hold up. The most reliable indicator that a model won't scale is the absence of any explicit thinking about 10× at design time. Plan the partitioning strategy and the index sizing against the larger horizon, not the smaller one.

Can we add new relationships without migration?

Sometimes — if the new relationship is purely additive and uses keys you already have. A new lookup table joined on existing IDs is cheap. Splitting an existing entity into two, changing a one-to-many into a many-to-many, or introducing row-level permissions onto a table that doesn't have them is a migration. The cheapest path to "yes, without migration" is designing the relationships you will plausibly need before the data accumulates against the relationships you currently have.

What indexes do we need?

Build the indexing plan from the operational query list, not from intuition. For each query, identify the smallest index that covers it efficiently. For composite filters, order indexed columns by selectivity. Add partial indexes where a column is nullable but only non-null rows are queried. Audit indexes quarterly — unused indexes cost on every write — and re-run query-plan analysis when traffic patterns shift. The right number of indexes is the number that supports the queries the application actually runs.

How much normalisation is enough?

Third normal form is the sensible default for transactional systems. Denormalise where read profiles clearly demand it, but treat the denormalisation as a documented trade-off — what consistency guarantee you are giving up, what update logic you are accepting in return. Materialised views, read replicas, and caches are usually better answers to read-heavy workloads than collapsing the canonical schema.

How do we handle PII and audit history without retrofitting?

Decide both at design time. Add timestamp, soft-delete, and last-modified-by columns on every entity that might plausibly need them — adding them later requires touching every existing row. Segregate PII into tables with explicit access controls and retention rules aligned to the Australian Privacy Principles or the equivalent regime in your customer base. Where a temporal record is needed, choose the pattern at design time: event-sourced, temporal table, or a separate history table.

Tiffany Palmer Senior UX/UI Designer

Tiffany brings creativity, adapts quickly to new tools, and leads atomic design principles to enhance UI/UX efficiency.