From PRD to Database Schema in 30 Minutes: AI-Powered Domain Modeling
What is AI-powered database schema design?
AI-powered database schema design is the practice of using large language models to accelerate the transformation of product requirements documents (PRDs) into production-ready PostgreSQL schemas. It matters because most database architecture mistakes occur during the first modeling session, and catching them before launch costs far less than fixing them post-release. In a structured workflow, AI compresses the hypothesis-validation-correction cycle from hours to minutes by generating entity lists, relationship graphs, and SQL migrations from plain-text requirements.
TL;DR
- -Most database architecture mistakes happen during the first modeling session — catching them early costs far less than fixing them post-launch.
- -A four-step AI workflow (entity extraction → relationship mapping → SQL generation → validation) turns a PRD into a production-ready PostgreSQL schema in 30 minutes.
- -AI consistently misses audit fields (created_at, updated_at, created_by) and RLS policies — add explicit instructions for both in your prompts.
- -Validate generated schemas at three levels: formal SQL correctness, User Story coverage (all scenarios executable with ≤2 JOINs), and cardinality analysis for 12-month load.
- -For a SaaS with 500 organizations, task and comment tables hit millions of rows within a year — composite indexes on (project_id, status) and (task_id, created_at) are required from day one.
Most database architecture mistakes happen in the first modeling session — before any code is written. A missing many-to-many relationship, weak normalization, no indexes on hot queries. Catching these after launch costs an order of magnitude more than catching them at the modeling stage.
AI changes the economics here. Not because it generates perfect schemas automatically — it doesn’t. But it compresses the “hypothesis → validation → correction” cycle from hours to minutes. PRD goes in, a validated PostgreSQL schema comes out. 30 minutes instead of two days.
Here are four steps that turn a requirements document into a production-ready migration: entity extraction, relationship building, SQL generation, and validation.
Step 1: Extract Entities from the PRD
A PRD describes a product in business language. A database works with entities, attributes, and relationships. The first step is crossing that boundary.
A typical PRD has User Stories, Functional Requirements, Non-Functional Requirements. Entities are hiding in the nouns — in User Stories and feature descriptions.
Prompt for Entity Extraction
You are a senior database architect. Analyze the PRD and extract all domain entities.
For each entity, define:
1. Name (snake_case, singular)
2. Business description (one sentence)
3. Key attributes with data types
4. Whether the entity is core or auxiliary (lookup/junction)
PRD:
"""
{paste PRD text here}
"""
Format the response as a table. Do not add entities that don't clearly follow from the requirements.
Example: Task Tracker PRD
Suppose the PRD describes a project management system. The User Stories section contains:
As a project manager, I want to create projects, add members, and assign tasks. As a member, I want to see tasks assigned to me, leave comments, and attach files. As an admin, I want to manage roles and permissions within an organization.
AI extracts the entities:
| Entity | Type | Description |
|---|---|---|
organization | core | Organization that owns projects |
user | core | System user |
project | core | Project within an organization |
task | core | Task within a project |
comment | core | Comment on a task |
attachment | core | Attached file |
role | lookup | User role (admin, manager, member) |
organization_member | junction | Links user to organization with a role |
project_member | junction | Links user to a project |
Nine entities from three User Stories. Notice that AI surfaced the junction tables organization_member and project_member — neither is explicitly mentioned in the PRD, but both fall out of the phrase “add members.”
What to Check at This Stage
Three mistakes AI makes consistently during entity extraction:
Too many entities. It creates a separate table for every mentioned concept. “Task priority” becomes a priority table when an enum field would do fine. The rule: if a concept has no attributes of its own and won’t grow over time, it’s a field, not a table.
Missing audit fields. PRDs almost never say “record who changed a task and when.” But in production you’ll want created_at, updated_at, created_by on nearly every table. Add it explicitly to the prompt: “Include standard audit fields for every entity.”
Mixing domain and infrastructure. Tables like session, audit_log, notification_queue are infrastructure concerns. Don’t let them into the domain model — they’re a distraction at this stage.
Step 2: Build Relationships Between Entities
Entities without relationships are just a list of tables. Relationships define how data connects and how queries will be written. Mistakes here cause the worst kind of pain: impossible queries, or silent data duplication that you won’t notice until production.
Prompt for Building Relationships
Based on the entity list, identify all relationships between them.
For each relationship, specify:
1. Type: one-to-one, one-to-many, many-to-many
2. Nullability (nullable or NOT NULL)
3. Cascade delete rule (CASCADE, SET NULL, RESTRICT)
4. Ownership direction (which side "owns" the relationship)
Entities:
"""
{entity list from step 1}
"""
Format: relationship table. For many-to-many, specify the junction table.
Result: Relationship Graph
organization 1──N project (CASCADE)
organization 1──N organization_member (CASCADE)
user 1──N organization_member (CASCADE)
role 1──N organization_member (RESTRICT)
project 1──N project_member (CASCADE)
user 1──N project_member (CASCADE)
project 1──N task (CASCADE)
task N──1 user (SET NULL, nullable — assignee)
task N──1 user (RESTRICT — creator)
task 1──N comment (CASCADE)
user 1──N comment (RESTRICT)
task 1──N attachment (CASCADE)
user 1──N attachment (RESTRICT)
The key decisions are right there in the graph. Delete an organization — projects and members go with it. Delete a user assigned to a task — assignee_id becomes NULL, the task survives. Delete a role that members hold — RESTRICT blocks it entirely.
The Trap: Hidden Many-to-Many
The PRD says “a user is assigned to a task.” Sounds like one-to-many — one assignee per task. Then two sprints in, someone says “we need multiple assignees.” Now the one-to-many becomes many-to-many: a junction table, a data migration, rewritten queries. Fun week.
Design for many-to-many upfront if there’s any chance the business logic expands “one → many.” An empty junction table costs nothing. A production migration costs days.
Ask AI directly:
For each one-to-many relationship, assess the likelihood
that it becomes many-to-many within the next 6 months.
Justify based on typical evolution of similar products.
Step 3: Generate the PostgreSQL Schema
Entities defined, relationships mapped. Time for the SQL migration. This is where AI saves the most time — picking correct data types, adding indexes on every FK, writing constraints you’d otherwise forget at 11pm.
Prompt for SQL Generation
Generate a PostgreSQL migration based on the entity and relationship list.
Requirements:
- UUID v4 for primary keys (gen_random_uuid())
- created_at/updated_at TIMESTAMPTZ on every table
- Indexes on all foreign keys
- Composite unique constraints where logically required
- CHECK constraints for enum-like fields
- COMMENT ON TABLE/COLUMN for documentation
Entities and relationships:
"""
{results from steps 1 and 2}
"""
Format: one SQL file, tables in dependency order (referenced before referencing).
Result: Migration
-- Migration: 001_initial_schema.sql
-- Generated from PRD: Project Management System
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
-- ============================================================
-- Lookup tables
-- ============================================================
CREATE TABLE role (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(50) NOT NULL UNIQUE,
description TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
COMMENT ON TABLE role IS 'User roles within the organization';
INSERT INTO role (name, description) VALUES
('admin', 'Full access to the organization'),
('manager', 'Manage projects and members'),
('member', 'Basic access to assigned projects');
-- ============================================================
-- Core tables
-- ============================================================
CREATE TABLE organization (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
slug VARCHAR(100) NOT NULL UNIQUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
COMMENT ON TABLE organization IS 'Organization — top level of the hierarchy';
CREATE INDEX idx_organization_slug ON organization (slug);
CREATE TABLE "user" (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(320) NOT NULL UNIQUE,
full_name VARCHAR(255) NOT NULL,
avatar_url TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
COMMENT ON TABLE "user" IS 'System user';
CREATE INDEX idx_user_email ON "user" (email);
CREATE TABLE organization_member (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL REFERENCES organization (id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES "user" (id) ON DELETE CASCADE,
role_id UUID NOT NULL REFERENCES role (id) ON DELETE RESTRICT,
joined_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (organization_id, user_id)
);
COMMENT ON TABLE organization_member IS 'User membership in an organization';
CREATE INDEX idx_org_member_org ON organization_member (organization_id);
CREATE INDEX idx_org_member_user ON organization_member (user_id);
CREATE TABLE project (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL REFERENCES organization (id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
description TEXT,
status VARCHAR(20) NOT NULL DEFAULT 'active'
CHECK (status IN ('active', 'archived', 'completed')),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
COMMENT ON TABLE project IS 'Project within an organization';
CREATE INDEX idx_project_org ON project (organization_id);
CREATE INDEX idx_project_status ON project (status);
CREATE TABLE project_member (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID NOT NULL REFERENCES project (id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES "user" (id) ON DELETE CASCADE,
added_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (project_id, user_id)
);
COMMENT ON TABLE project_member IS 'User participation in a project';
CREATE INDEX idx_proj_member_project ON project_member (project_id);
CREATE INDEX idx_proj_member_user ON project_member (user_id);
CREATE TABLE task (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID NOT NULL REFERENCES project (id) ON DELETE CASCADE,
created_by UUID NOT NULL REFERENCES "user" (id) ON DELETE RESTRICT,
assignee_id UUID REFERENCES "user" (id) ON DELETE SET NULL,
title VARCHAR(500) NOT NULL,
description TEXT,
status VARCHAR(20) NOT NULL DEFAULT 'todo'
CHECK (status IN ('todo', 'in_progress', 'review', 'done')),
priority SMALLINT NOT NULL DEFAULT 0
CHECK (priority BETWEEN 0 AND 3),
due_date DATE,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
COMMENT ON TABLE task IS 'Task within a project';
COMMENT ON COLUMN task.priority IS '0=none, 1=low, 2=medium, 3=high';
CREATE INDEX idx_task_project ON task (project_id);
CREATE INDEX idx_task_assignee ON task (assignee_id);
CREATE INDEX idx_task_status ON task (status);
CREATE INDEX idx_task_created_by ON task (created_by);
CREATE TABLE comment (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
task_id UUID NOT NULL REFERENCES task (id) ON DELETE CASCADE,
author_id UUID NOT NULL REFERENCES "user" (id) ON DELETE RESTRICT,
body TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
COMMENT ON TABLE comment IS 'Comment on a task';
CREATE INDEX idx_comment_task ON comment (task_id);
CREATE INDEX idx_comment_author ON comment (author_id);
CREATE TABLE attachment (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
task_id UUID NOT NULL REFERENCES task (id) ON DELETE CASCADE,
uploaded_by UUID NOT NULL REFERENCES "user" (id) ON DELETE RESTRICT,
file_name VARCHAR(500) NOT NULL,
file_url TEXT NOT NULL,
file_size BIGINT NOT NULL CHECK (file_size > 0),
mime_type VARCHAR(255) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
COMMENT ON TABLE attachment IS 'File attached to a task';
CREATE INDEX idx_attachment_task ON attachment (task_id);
-- ============================================================
-- Trigger: auto-update updated_at
-- ============================================================
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = now();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_organization_updated BEFORE UPDATE ON organization
FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_user_updated BEFORE UPDATE ON "user"
FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_project_updated BEFORE UPDATE ON project
FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_task_updated BEFORE UPDATE ON task
FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_comment_updated BEFORE UPDATE ON comment
FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_role_updated BEFORE UPDATE ON role
FOR EACH ROW EXECUTE FUNCTION update_updated_at();
Breaking Down the Key Decisions
UUID instead of SERIAL. In distributed systems and with Supabase, UUID is the right choice. No collisions when merging data across environments, no record count leakage.
CHECK constraints instead of separate tables. status and priority use CHECK constraints here. If the set of values changes once a year, CHECK is enough. If users define their own statuses — you’ll need a separate table.
The updated_at trigger. One function, one pattern across all tables. Without it, updated_at requires an explicit SET in every UPDATE query — and someone will eventually forget.
Step 4: Validate the PostgreSQL Schema
Don’t commit a generated schema without three levels of review.
Level 1: Formal Correctness
Run the migration on a clean database. Do the tables create without errors? Do all foreign keys point at tables that actually exist?
psql -d test_db -f 001_initial_schema.sql
If it runs clean, formal correctness is done.
Level 2: Requirements Coverage
For each User Story, write the SQL query that implements it. If a query is impossible, or needs five JOINs to answer a basic question, the schema is wrong.
For each User Story in the PRD, write a SQL query
that implements the primary scenario.
Flag queries that require more than two JOINs
or cannot be executed against the current schema.
Example check for “manager sees all project tasks with assignee names”:
SELECT
t.id,
t.title,
t.status,
t.priority,
u.full_name AS assignee_name,
t.due_date
FROM task t
LEFT JOIN "user" u ON u.id = t.assignee_id
WHERE t.project_id = :project_id
ORDER BY t.priority DESC, t.created_at;
One JOIN, and there’s already an index on project_id. That’s what a good schema looks like.
Level 3: Load Analysis
Estimate data volume per table at 12 months. That determines which indexes you actually need — and where you’ll hit walls first.
Estimate the cardinality of each table after 12 months
for a SaaS product with 500 organizations,
average organization size — 20 users.
Based on those estimates, check:
1. Are the current indexes sufficient for typical queries?
2. Are composite indexes needed?
3. Is partitioning needed for large tables?
Typical result for a task tracker:
| Table | Rows after 12 months | Indexes sufficient |
|---|---|---|
organization | 500 | Yes |
user | 10,000 | Yes |
task | 500,000 | Add composite (project_id, status) |
comment | 2,000,000 | Add composite (task_id, created_at) |
attachment | 200,000 | Yes |
So add two composite indexes:
CREATE INDEX idx_task_project_status ON task (project_id, status);
CREATE INDEX idx_comment_task_created ON comment (task_id, created_at);
Full Workflow: 30 Minutes from PRD to Schema
Time breakdown:
| Step | Time | Result |
|---|---|---|
| Entity extraction | 5 min | List of 8–15 entities with types |
| Relationship building | 5 min | Relationship graph with cascade rules |
| SQL generation | 10 min | Migration with indexes and constraints |
| Validation | 10 min | Verified schema with test queries |
The principle: AI generates, you validate. Every step ends with a review. Don’t pipe the output of one prompt straight into the next — that’s how you get confidently wrong schemas.
Common AI Mistakes in Schema Modeling
Over-normalization. AI will build an address table with country, city, street, zip. For an MVP, a JSONB field is usually fine. Normalize when you actually query individual address components.
Missing soft delete. If deletion needs to be reversible — tasks, comments, projects — you need deleted_at TIMESTAMPTZ. AI won’t add it without being told.
Ignoring RLS. With Supabase, Row Level Security isn’t a nice-to-have. Add it to the prompt: “Generate RLS policies for each table based on the role model in the PRD.” More on data protection patterns in the circuit breaker in edge functions article.
Missing partial indexes. For low-cardinality fields like status or is_active, a regular B-tree index is wasteful. A partial index WHERE status = 'active' is smaller and faster for the queries that actually run.
Extension: From Schema to Type-Safe Code
The schema isn’t the end. The next step is type safety at the application layer. For TypeScript + Supabase:
npx supabase gen types typescript --local > src/types/database.ts
This turns the PostgreSQL schema into TypeScript types. Each table becomes an interface, each column a typed field. Schema changes propagate to types automatically — the compiler catches the mismatches before you do.
For more advanced prompt engineering and context management when generating code, see the context engineering guide.
Checklist: What to Verify Before Committing a Migration
- All tables are created in dependency order (no forward references)
- Every foreign key has an index
ON DELETErules match the business logicupdated_atis updated by a trigger, not manually- Enum-like fields use CHECK or a separate table (a conscious choice)
- UUID instead of SERIAL for primary keys in distributed systems
- Composite unique constraints on junction tables
- Comments on tables and non-obvious columns
- Partial indexes for low-cardinality fields
- RLS policies (for Supabase)
This process scales without changing shape. A PRD with 3 User Stories gives you 9 tables. A PRD with 30 gives you 40–60. Same prompt structure, same step order. Only validation takes longer — more entities means more relationships to check. AI handles the generation. You handle the decisions that will still matter in two years.
Need help with AI-powered domain modeling? I help startups build AI products and automate processes — belov.works.
FAQ
Can I use this workflow with ORMs like Prisma or SQLAlchemy instead of raw SQL?
Yes. Run the four-step process as described to produce the validated SQL migration first, then ask the model to convert it into your ORM’s schema format. Starting from validated SQL is safer than generating ORM schemas directly — the SQL step forces explicit decisions about cascade rules and indexes that ORMs often leave as defaults.
How should I handle schema changes when the PRD evolves mid-development?
Treat the PRD diff as a new input. Feed the original entity list, the current schema, and the changed requirements into the relationship-building prompt and ask the model to identify what changes. Generate only the delta migration, not a full rewrite. Always run the Level 2 validation (User Story coverage) again after any schema change — new requirements often break queries that previously worked.
What is the practical limit on schema complexity before this workflow breaks down?
The workflow handles 40–60 entities reliably with a single context window. Beyond that, the model starts producing inconsistent FK references and missing indexes. The fix is to split the domain into bounded contexts, run the four-step process per context, then add a final cross-context relationship step to wire the boundaries together.