From PRD to Database Schema in 30 Minutes: AI-Powered Domain Modeling

Most database architecture mistakes happen in the first modeling session — before any code is written. A missing many-to-many relationship, weak normalization, no indexes on hot queries. Catching these after launch costs an order of magnitude more than catching them at the modeling stage.

AI changes the economics here. Not because it generates perfect schemas automatically — it doesn’t. But it compresses the “hypothesis → validation → correction” cycle from hours to minutes. PRD goes in, a validated PostgreSQL schema comes out. 30 minutes instead of two days.

Here are four steps that turn a requirements document into a production-ready migration: entity extraction, relationship building, SQL generation, and validation.

Step 1: Extract Entities from the PRD

A PRD describes a product in business language. A database works with entities, attributes, and relationships. The first step is crossing that boundary.

A typical PRD has User Stories, Functional Requirements, Non-Functional Requirements. Entities are hiding in the nouns — in User Stories and feature descriptions.

Prompt for Entity Extraction

You are a senior database architect. Analyze the PRD and extract all domain entities.

For each entity, define:
1. Name (snake_case, singular)
2. Business description (one sentence)
3. Key attributes with data types
4. Whether the entity is core or auxiliary (lookup/junction)

PRD:
"""
{paste PRD text here}
"""

Format the response as a table. Do not add entities that don't clearly follow from the requirements.

Example: Task Tracker PRD

Suppose the PRD describes a project management system. The User Stories section contains:

As a project manager, I want to create projects, add members, and assign tasks. As a member, I want to see tasks assigned to me, leave comments, and attach files. As an admin, I want to manage roles and permissions within an organization.

AI extracts the entities:

Entity	Type	Description
`organization`	core	Organization that owns projects
`user`	core	System user
`project`	core	Project within an organization
`task`	core	Task within a project
`comment`	core	Comment on a task
`attachment`	core	Attached file
`role`	lookup	User role (admin, manager, member)
`organization_member`	junction	Links user to organization with a role
`project_member`	junction	Links user to a project

Nine entities from three User Stories. Notice that AI surfaced the junction tables organization_member and project_member — neither is explicitly mentioned in the PRD, but both fall out of the phrase “add members.”

What to Check at This Stage

Three mistakes AI makes consistently during entity extraction:

Too many entities. It creates a separate table for every mentioned concept. “Task priority” becomes a priority table when an enum field would do fine. The rule: if a concept has no attributes of its own and won’t grow over time, it’s a field, not a table.

Missing audit fields. PRDs almost never say “record who changed a task and when.” But in production you’ll want created_at, updated_at, created_by on nearly every table. Add it explicitly to the prompt: “Include standard audit fields for every entity.”

Mixing domain and infrastructure. Tables like session, audit_log, notification_queue are infrastructure concerns. Don’t let them into the domain model — they’re a distraction at this stage.

Step 2: Build Relationships Between Entities

Entities without relationships are just a list of tables. Relationships define how data connects and how queries will be written. Mistakes here cause the worst kind of pain: impossible queries, or silent data duplication that you won’t notice until production.

Prompt for Building Relationships

Based on the entity list, identify all relationships between them.

For each relationship, specify:
1. Type: one-to-one, one-to-many, many-to-many
2. Nullability (nullable or NOT NULL)
3. Cascade delete rule (CASCADE, SET NULL, RESTRICT)
4. Ownership direction (which side "owns" the relationship)

Entities:
"""
{entity list from step 1}
"""

Format: relationship table. For many-to-many, specify the junction table.

Result: Relationship Graph

organization  1──N  project            (CASCADE)
organization  1──N  organization_member (CASCADE)
user          1──N  organization_member (CASCADE)
role          1──N  organization_member (RESTRICT)
project       1──N  project_member     (CASCADE)
user          1──N  project_member     (CASCADE)
project       1──N  task               (CASCADE)
task          N──1  user               (SET NULL, nullable — assignee)
task          N──1  user               (RESTRICT — creator)
task          1──N  comment            (CASCADE)
user          1──N  comment            (RESTRICT)
task          1──N  attachment         (CASCADE)
user          1──N  attachment         (RESTRICT)

The key decisions are right there in the graph. Delete an organization — projects and members go with it. Delete a user assigned to a task — assignee_id becomes NULL, the task survives. Delete a role that members hold — RESTRICT blocks it entirely.

The Trap: Hidden Many-to-Many

The PRD says “a user is assigned to a task.” Sounds like one-to-many — one assignee per task. Then two sprints in, someone says “we need multiple assignees.” Now the one-to-many becomes many-to-many: a junction table, a data migration, rewritten queries. Fun week.

Design for many-to-many upfront if there’s any chance the business logic expands “one → many.” An empty junction table costs nothing. A production migration costs days.

Ask AI directly:

For each one-to-many relationship, assess the likelihood
that it becomes many-to-many within the next 6 months.
Justify based on typical evolution of similar products.

Step 3: Generate the PostgreSQL Schema

Entities defined, relationships mapped. Time for the SQL migration. This is where AI saves the most time — picking correct data types, adding indexes on every FK, writing constraints you’d otherwise forget at 11pm.

Prompt for SQL Generation

Generate a PostgreSQL migration based on the entity and relationship list.

Requirements:
- UUID v4 for primary keys (gen_random_uuid())
- created_at/updated_at TIMESTAMPTZ on every table
- Indexes on all foreign keys
- Composite unique constraints where logically required
- CHECK constraints for enum-like fields
- COMMENT ON TABLE/COLUMN for documentation

Entities and relationships:
"""
{results from steps 1 and 2}
"""

Format: one SQL file, tables in dependency order (referenced before referencing).

Result: Migration

-- Migration: 001_initial_schema.sql
-- Generated from PRD: Project Management System

CREATE EXTENSION IF NOT EXISTS "pgcrypto";

-- ============================================================
-- Lookup tables
-- ============================================================

CREATE TABLE role (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name        VARCHAR(50) NOT NULL UNIQUE,
    description TEXT,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

COMMENT ON TABLE role IS 'User roles within the organization';

INSERT INTO role (name, description) VALUES
    ('admin',   'Full access to the organization'),
    ('manager', 'Manage projects and members'),
    ('member',  'Basic access to assigned projects');

-- ============================================================
-- Core tables
-- ============================================================

CREATE TABLE organization (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name        VARCHAR(255) NOT NULL,
    slug        VARCHAR(100) NOT NULL UNIQUE,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

COMMENT ON TABLE organization IS 'Organization — top level of the hierarchy';
CREATE INDEX idx_organization_slug ON organization (slug);

CREATE TABLE "user" (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email       VARCHAR(320) NOT NULL UNIQUE,
    full_name   VARCHAR(255) NOT NULL,
    avatar_url  TEXT,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

COMMENT ON TABLE "user" IS 'System user';
CREATE INDEX idx_user_email ON "user" (email);

CREATE TABLE organization_member (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organization_id UUID NOT NULL REFERENCES organization (id) ON DELETE CASCADE,
    user_id         UUID NOT NULL REFERENCES "user" (id) ON DELETE CASCADE,
    role_id         UUID NOT NULL REFERENCES role (id) ON DELETE RESTRICT,
    joined_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (organization_id, user_id)
);

COMMENT ON TABLE organization_member IS 'User membership in an organization';
CREATE INDEX idx_org_member_org ON organization_member (organization_id);
CREATE INDEX idx_org_member_user ON organization_member (user_id);

CREATE TABLE project (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organization_id UUID NOT NULL REFERENCES organization (id) ON DELETE CASCADE,
    name            VARCHAR(255) NOT NULL,
    description     TEXT,
    status          VARCHAR(20) NOT NULL DEFAULT 'active'
                    CHECK (status IN ('active', 'archived', 'completed')),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

COMMENT ON TABLE project IS 'Project within an organization';
CREATE INDEX idx_project_org ON project (organization_id);
CREATE INDEX idx_project_status ON project (status);

CREATE TABLE project_member (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id  UUID NOT NULL REFERENCES project (id) ON DELETE CASCADE,
    user_id     UUID NOT NULL REFERENCES "user" (id) ON DELETE CASCADE,
    added_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (project_id, user_id)
);

COMMENT ON TABLE project_member IS 'User participation in a project';
CREATE INDEX idx_proj_member_project ON project_member (project_id);
CREATE INDEX idx_proj_member_user ON project_member (user_id);

CREATE TABLE task (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id  UUID NOT NULL REFERENCES project (id) ON DELETE CASCADE,
    created_by  UUID NOT NULL REFERENCES "user" (id) ON DELETE RESTRICT,
    assignee_id UUID REFERENCES "user" (id) ON DELETE SET NULL,
    title       VARCHAR(500) NOT NULL,
    description TEXT,
    status      VARCHAR(20) NOT NULL DEFAULT 'todo'
                CHECK (status IN ('todo', 'in_progress', 'review', 'done')),
    priority    SMALLINT NOT NULL DEFAULT 0
                CHECK (priority BETWEEN 0 AND 3),
    due_date    DATE,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

COMMENT ON TABLE task IS 'Task within a project';
COMMENT ON COLUMN task.priority IS '0=none, 1=low, 2=medium, 3=high';
CREATE INDEX idx_task_project ON task (project_id);
CREATE INDEX idx_task_assignee ON task (assignee_id);
CREATE INDEX idx_task_status ON task (status);
CREATE INDEX idx_task_created_by ON task (created_by);

CREATE TABLE comment (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    task_id     UUID NOT NULL REFERENCES task (id) ON DELETE CASCADE,
    author_id   UUID NOT NULL REFERENCES "user" (id) ON DELETE RESTRICT,
    body        TEXT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

COMMENT ON TABLE comment IS 'Comment on a task';
CREATE INDEX idx_comment_task ON comment (task_id);
CREATE INDEX idx_comment_author ON comment (author_id);

CREATE TABLE attachment (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    task_id     UUID NOT NULL REFERENCES task (id) ON DELETE CASCADE,
    uploaded_by UUID NOT NULL REFERENCES "user" (id) ON DELETE RESTRICT,
    file_name   VARCHAR(500) NOT NULL,
    file_url    TEXT NOT NULL,
    file_size   BIGINT NOT NULL CHECK (file_size > 0),
    mime_type   VARCHAR(255) NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

COMMENT ON TABLE attachment IS 'File attached to a task';
CREATE INDEX idx_attachment_task ON attachment (task_id);

-- ============================================================
-- Trigger: auto-update updated_at
-- ============================================================

CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_organization_updated BEFORE UPDATE ON organization
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_user_updated BEFORE UPDATE ON "user"
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_project_updated BEFORE UPDATE ON project
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_task_updated BEFORE UPDATE ON task
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_comment_updated BEFORE UPDATE ON comment
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_role_updated BEFORE UPDATE ON role
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();

Breaking Down the Key Decisions

UUID instead of SERIAL. In distributed systems and with Supabase, UUID is the right choice. No collisions when merging data across environments, no record count leakage.

CHECK constraints instead of separate tables. status and priority use CHECK constraints here. If the set of values changes once a year, CHECK is enough. If users define their own statuses — you’ll need a separate table.

The updated_at trigger. One function, one pattern across all tables. Without it, updated_at requires an explicit SET in every UPDATE query — and someone will eventually forget.

Step 4: Validate the PostgreSQL Schema

Don’t commit a generated schema without three levels of review.

Level 1: Formal Correctness

Run the migration on a clean database. Do the tables create without errors? Do all foreign keys point at tables that actually exist?

psql -d test_db -f 001_initial_schema.sql

If it runs clean, formal correctness is done.

Level 2: Requirements Coverage

For each User Story, write the SQL query that implements it. If a query is impossible, or needs five JOINs to answer a basic question, the schema is wrong.

For each User Story in the PRD, write a SQL query
that implements the primary scenario.
Flag queries that require more than two JOINs
or cannot be executed against the current schema.

Example check for “manager sees all project tasks with assignee names”:

SELECT
    t.id,
    t.title,
    t.status,
    t.priority,
    u.full_name AS assignee_name,
    t.due_date
FROM task t
LEFT JOIN "user" u ON u.id = t.assignee_id
WHERE t.project_id = :project_id
ORDER BY t.priority DESC, t.created_at;

One JOIN, and there’s already an index on project_id. That’s what a good schema looks like.

Level 3: Load Analysis

Estimate data volume per table at 12 months. That determines which indexes you actually need — and where you’ll hit walls first.

Estimate the cardinality of each table after 12 months
for a SaaS product with 500 organizations,
average organization size — 20 users.

Based on those estimates, check:
1. Are the current indexes sufficient for typical queries?
2. Are composite indexes needed?
3. Is partitioning needed for large tables?

Typical result for a task tracker:

Table	Rows after 12 months	Indexes sufficient
`organization`	500	Yes
`user`	10,000	Yes
`task`	500,000	Add composite (project_id, status)
`comment`	2,000,000	Add composite (task_id, created_at)
`attachment`	200,000	Yes

So add two composite indexes:

CREATE INDEX idx_task_project_status ON task (project_id, status);
CREATE INDEX idx_comment_task_created ON comment (task_id, created_at);

Full Workflow: 30 Minutes from PRD to Schema

Time breakdown:

Step	Time	Result
Entity extraction	5 min	List of 8–15 entities with types
Relationship building	5 min	Relationship graph with cascade rules
SQL generation	10 min	Migration with indexes and constraints
Validation	10 min	Verified schema with test queries

The principle: AI generates, you validate. Every step ends with a review. Don’t pipe the output of one prompt straight into the next — that’s how you get confidently wrong schemas.

Common AI Mistakes in Schema Modeling

Over-normalization. AI will build an address table with country, city, street, zip. For an MVP, a JSONB field is usually fine. Normalize when you actually query individual address components.

Missing soft delete. If deletion needs to be reversible — tasks, comments, projects — you need deleted_at TIMESTAMPTZ. AI won’t add it without being told.

Ignoring RLS. With Supabase, Row Level Security isn’t a nice-to-have. Add it to the prompt: “Generate RLS policies for each table based on the role model in the PRD.” More on data protection patterns in the circuit breaker in edge functions article.

Missing partial indexes. For low-cardinality fields like status or is_active, a regular B-tree index is wasteful. A partial index WHERE status = 'active' is smaller and faster for the queries that actually run.

Extension: From Schema to Type-Safe Code

The schema isn’t the end. The next step is type safety at the application layer. For TypeScript + Supabase:

npx supabase gen types typescript --local > src/types/database.ts

This turns the PostgreSQL schema into TypeScript types. Each table becomes an interface, each column a typed field. Schema changes propagate to types automatically — the compiler catches the mismatches before you do.

For more advanced prompt engineering and context management when generating code, see the context engineering guide.

Checklist: What to Verify Before Committing a Migration

All tables are created in dependency order (no forward references)
Every foreign key has an index
ON DELETE rules match the business logic
updated_at is updated by a trigger, not manually
Enum-like fields use CHECK or a separate table (a conscious choice)
UUID instead of SERIAL for primary keys in distributed systems
Composite unique constraints on junction tables
Comments on tables and non-obvious columns
Partial indexes for low-cardinality fields
RLS policies (for Supabase)

This process scales without changing shape. A PRD with 3 User Stories gives you 9 tables. A PRD with 30 gives you 40–60. Same prompt structure, same step order. Only validation takes longer — more entities means more relationships to check. AI handles the generation. You handle the decisions that will still matter in two years.

Need help with AI-powered domain modeling? I help startups build AI products and automate processes — belov.works.

FAQ

Can I use this workflow with ORMs like Prisma or SQLAlchemy instead of raw SQL?

Yes. Run the four-step process as described to produce the validated SQL migration first, then ask the model to convert it into your ORM’s schema format. Starting from validated SQL is safer than generating ORM schemas directly — the SQL step forces explicit decisions about cascade rules and indexes that ORMs often leave as defaults.

How should I handle schema changes when the PRD evolves mid-development?

Treat the PRD diff as a new input. Feed the original entity list, the current schema, and the changed requirements into the relationship-building prompt and ask the model to identify what changes. Generate only the delta migration, not a full rewrite. Always run the Level 2 validation (User Story coverage) again after any schema change — new requirements often break queries that previously worked.

What is the practical limit on schema complexity before this workflow breaks down?

The workflow handles 40–60 entities reliably with a single context window. Beyond that, the model starts producing inconsistent FK references and missing indexes. The fix is to split the domain into bounded contexts, run the four-step process per context, then add a final cross-context relationship step to wire the boundaries together.