Structured Interview Kit: AI Generates Questions, Scorecards, and Debrief Agendas
What is a structured interview kit?
A structured interview kit is a standardized hiring package that includes a competency matrix, behavioral and situational questions tied to each competency, a scorecard with defined rating levels, and a debrief agenda. It replaces improvised interviewing with a consistent, bias-reducing evaluation process.
TL;DR
- 68% of interviews at companies under 200 people happen without prepared questions — structured kits fix this
- A complete kit has 4 parts: competency matrix, behavioral questions, scorecard (1–4 scale), debrief agenda
- Each scorecard level must use concrete observable indicators — not abstract phrases like "meets expectations"
- Debrief agenda: everyone reads scores first, then discusses discrepancies — eliminates anchoring bias
- AI builds the full kit in 30 minutes using 3 targeted prompts: matrix → questions → scorecard
68% of interviews at companies under 200 people happen without prepared questions. The interviewer improvises, the candidate gets an unpredictable experience, and the hiring decision is made based on gut feeling. A structured interview kit fixes this: fixed questions, a shared rating scale, a formalized debrief. AI can build that kit in 30 minutes instead of days.
This article covers how to create a complete structured interview kit using prompts — from a competency matrix to a debrief agenda. All templates are ready to use.
What Goes into a Structured Interview Kit
A structured interview kit has four components:
Competency matrix. A list of skills and qualities to evaluate. Each competency is tied to a specific interview stage and a specific interviewer. Without the matrix, two interviewers might ask the same questions while skipping critical areas.
Competency-based questions. Behavioral and situational questions linked to the matrix. Behavioral questions test past experience (“Tell me about a time when…”). Situational questions test thinking (“What would you do if…”). Each question comes with follow-up questions and indicators of a strong answer.
Scorecard. A standard evaluation form with clear rating levels: from 1 (does not meet expectations) to 4 (exceeds expectations). Each level is defined by concrete observable indicators, not abstract phrases. Each interviewer fills it out right after their session, before talking to anyone else.
Debrief agenda. A structured meeting where interviewers discuss the candidate. Fixed order: each person reads their scores first, then the group discusses discrepancies, then votes. This eliminates the anchoring effect, where the first speaker’s opinion skews everyone else.
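As a reference point, here is a minimal sketch of how these four components might be modeled in code; every class and field name below is illustrative, not taken from any particular ATS or library.

```python
from dataclasses import dataclass, field

@dataclass
class Competency:
    name: str         # e.g. "Prioritization and roadmap"
    kind: str         # "hard" or "soft"
    weight: str       # "critical", "important", or "nice-to-have"
    stage: str        # the single interview stage that owns this competency
    interviewer: str  # the single interviewer who evaluates it

@dataclass
class Question:
    competency: str
    behavioral: str   # "Tell me about a time when..."
    situational: str  # "What would you do if..."
    follow_ups: list[str] = field(default_factory=list)
    strong_signals: list[str] = field(default_factory=list)
    red_flags: list[str] = field(default_factory=list)

@dataclass
class ScorecardEntry:
    competency: str
    # rating level (1-4) mapped to concrete observable indicators
    indicators: dict[int, list[str]] = field(default_factory=dict)

@dataclass
class InterviewKit:
    matrix: list[Competency]
    questions: list[Question]
    scorecard: list[ScorecardEntry]
    debrief_agenda: list[str]  # timeline blocks, e.g. "[0:00 - 0:02] Opening"
```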
Competency Matrix: Prompt and Template
The first step is defining what to evaluate. The competency matrix distributes skills across interview stages so each interviewer owns their zone.
Prompt for generating the matrix:
Role: Senior Talent Acquisition Partner with structured hiring experience.
Task: Create a competency matrix for a [POSITION] role at the [LEVEL] level.
Company context:
- Industry: [INDUSTRY]
- Team size: [SIZE]
- Stack/tools: [STACK]
- Key responsibilities for this role: [TASKS]
Matrix requirements:
1. 6-8 competencies split into hard skills and soft skills
2. Each competency tied to one interview stage
3. For each competency — weight (critical / important / nice-to-have)
4. No more than 3 competencies per stage
Output format — table:
| Competency | Type | Weight | Stage | Interviewer |
Example output for a Product Manager (mid-level):
| Competency | Type | Weight | Stage | Interviewer |
|---|---|---|---|---|
| Prioritization and roadmap | Hard | Critical | Product case | Head of Product |
| Working with metrics | Hard | Critical | Product case | Head of Product |
| Cross-functional communication | Soft | Critical | Behavioral | Engineering Lead |
| User research | Hard | Important | Technical | UX Researcher |
| Strategic thinking | Soft | Important | Behavioral | Engineering Lead |
| Stakeholder management | Soft | Important | Final | CEO/CTO |
| SQL / analytics tools | Hard | Nice-to-have | Technical | UX Researcher |
The matrix should have no overlaps: if “cross-functional communication” is evaluated in the behavioral stage, there’s no need to repeat it in the technical stage. One skill — one stage — one owner.
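The no-overlap rule and the three-per-stage limit are easy to check mechanically once the matrix exists as data. A minimal sketch, assuming each matrix row is a plain dict with the same columns as the table above:

```python
from collections import Counter

def validate_matrix(rows: list[dict]) -> list[str]:
    """Check the structural rules: no duplicate competencies,
    and no more than 3 competencies per interview stage."""
    problems = []
    names = Counter(row["competency"] for row in rows)
    for name, count in names.items():
        if count > 1:
            problems.append(f"'{name}' appears in {count} rows; one skill, one stage")
    per_stage = Counter(row["stage"] for row in rows)
    for stage, count in per_stage.items():
        if count > 3:
            problems.append(f"Stage '{stage}' has {count} competencies; max is 3")
    return problems

matrix = [
    {"competency": "Prioritization and roadmap", "stage": "Product case"},
    {"competency": "Working with metrics", "stage": "Product case"},
    {"competency": "Cross-functional communication", "stage": "Behavioral"},
]
print(validate_matrix(matrix) or "Matrix is consistent")
```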
Generating Interview Questions with AI
Once the matrix is ready, generate the questions. The key rule: each question tests exactly one competency. If a question tests two, it’s hard to score on the scorecard.
Prompt for generating questions:
Based on the competency matrix below, generate questions for a
structured interview.
[INSERT MATRIX]
For each competency, create:
1. One main behavioral question (STAR format: Situation-Task-Action-Result)
2. One situational question
3. Two follow-up questions to go deeper
4. Description of a strong answer (3-4 points)
5. Description of a weak answer (2-3 red flags)
Rules:
- Questions are specific and grounded in real work situations
- No questions with an obvious "right" answer
- Follow-ups dig deeper: numbers, details, the candidate's specific role
- Avoid questions that discriminate by age, gender, or origin
Example generated questions for the “Prioritization and roadmap” competency:
Main behavioral question: “Tell me about a time when you had more stakeholder requests than your team had capacity for. How did you decide what would make it into the next release and what wouldn’t?”
Situational question: “You have three requests: the CEO wants a feature for an enterprise client, data shows high churn at onboarding, and engineers are asking for a week to address tech debt. One sprint. How do you decide?”
Follow-up questions:
- “What framework did you use for prioritization? Why that one?”
- “How did you explain to stakeholders why their request didn’t make the release?”
Strong answer:
- Names a specific framework (RICE, ICE, MoSCoW) and explains the choice
- References data as the basis for the decision, not opinions
- Describes how they communicated the rejection to affected stakeholders
- Includes a result: metric improved, team stayed focused
Red flags:
- “I did whatever the CEO asked” (no autonomy)
- Can’t name the criteria used for the decision
- Doesn’t mention communicating rejections
Scorecard: Rating Scale and Observable Indicators
A scorecard turns a subjective impression into a measurable evaluation. Without a scorecard, interviewers use phrases like “seemed okay” or “something felt off” — and there’s nothing useful to discuss in the debrief.
Prompt for generating a scorecard:
Create a scorecard for evaluating candidates for the [POSITION] role.
Competencies from the matrix:
[INSERT MATRIX]
For each competency, create a 4-level scale:
- 1 (Does not meet expectations): specific failure indicators
- 2 (Partially meets expectations): what exactly falls short
- 3 (Meets expectations): what an ideal candidate should demonstrate
- 4 (Exceeds expectations): what sets an outstanding candidate apart
Rules:
- Indicators describe observable behavior, not abstract qualities
- "Good communication skills" is a bad indicator
- "Structured their answer, gave specific numbers, asked a clarifying question" is a good indicator
- Each level has 2-3 specific indicators
Example scorecard for “Working with metrics”:
WORKING WITH METRICS
Weight: Critical | Stage: Product case | Evaluator: Head of Product
[1] Does not meet expectations
- Cannot name the product's key metrics
- Confuses vanity metrics with actionable metrics
- Cannot connect metrics to business goals
[2] Partially meets expectations
- Names standard metrics (DAU, retention) but can't explain the choice
- Knows analytics tools at a basic level
- Struggles to interpret data without prompting
[3] Meets expectations
- Builds a metric hierarchy: North Star → key → supporting
- Explains how metrics drive product decisions
- Provides an example where data changed product direction
[4] Exceeds expectations
- Designs a metric system with counter-metrics in mind
- Describes a statistically correct approach to A/B tests
- Has automated dashboards or built self-serve analytics
Each interviewer fills out the scorecard before the debrief meeting. If an interviewer hears others’ evaluations before recording their own, the results will be biased.
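If your team wants a single summary number per candidate alongside the per-competency scores, the weights from the matrix can be folded in. A minimal sketch; the numeric weight mapping is an assumption to tune, and many teams deliberately skip aggregation and discuss competencies one by one:

```python
# Hypothetical mapping of matrix weights to numbers; adjust to your process.
WEIGHTS = {"critical": 3, "important": 2, "nice-to-have": 1}

def weighted_score(scores: dict[str, int], weights: dict[str, str]) -> float:
    """Weighted average of 1-4 ratings; each competency's weight comes from the matrix."""
    total = sum(scores[c] * WEIGHTS[weights[c]] for c in scores)
    return total / sum(WEIGHTS[weights[c]] for c in scores)

scores = {"Working with metrics": 3, "Prioritization and roadmap": 4, "SQL / analytics tools": 2}
weights = {"Working with metrics": "critical",
           "Prioritization and roadmap": "critical",
           "SQL / analytics tools": "nice-to-have"}
print(f"{weighted_score(scores, weights):.2f}")  # 3.29
```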
Debrief Agenda: Structuring the Candidate Discussion
The debrief is where interviewers make the hiring decision. Without structure, it becomes 40 minutes of retelling interviews with the conclusion “seems fine, let’s hire.” A formalized agenda saves time and improves decision quality.
Prompt for generating a debrief agenda:
Create a debrief agenda for discussing a candidate for [POSITION].
Number of interviewers: [N]
Interview stages: [LIST OF STAGES]
Debrief time: [MINUTES]
Requirements:
1. Independent score readout (no discussion until everyone has shared)
2. Focus on score discrepancies (2+ point difference)
3. Separate block for discussing red flags
4. Final vote: Strong Yes / Yes / No / Strong No
5. Rule: one Strong No = mandatory discussion, even if others voted Yes
Format: timeline with minutes and the person responsible for each block
Example debrief agenda for 30 minutes with 4 interviewers:
DEBRIEF AGENDA — [Candidate Name] for [Position]
Date: _____ | Moderator: Hiring Manager
[0:00 - 0:02] Opening
- Moderator reminds everyone of the format
- Confirm: all scorecards were completed before the meeting
[0:02 - 0:14] Score round (3 min per interviewer)
- Each person reads: competency scores + overall verdict
- Others do NOT comment until the round is complete
- Order: junior interviewers first, then senior (reduces authority bias)
[0:14 - 0:22] Discuss discrepancies
- Moderator surfaces competencies with 2+ point differences
- Each side provides specific examples from the interview
- Goal: not to persuade, but to understand why scores differ
[0:22 - 0:26] Red flags and risks
- Are there patterns that multiple interviewers noticed?
- Discuss mitigation: can the weakness be compensated for?
[0:26 - 0:30] Decision
- Each person votes: Strong Yes / Yes / No / Strong No
- Strong No rule: one Strong No = detailed discussion required
- Moderator records the decision and next steps
The order of speakers matters. If the CTO speaks first and gives a “Strong Yes,” others will unconsciously shift their scores upward. So the order runs from least influential to most.
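The 2+ point rule from the agenda is simple to automate once scorecards are collected, for example from a spreadsheet or ATS export. A sketch, with illustrative data:

```python
def find_discrepancies(scorecards: dict[str, dict[str, int]], threshold: int = 2) -> list[str]:
    """Return competencies where interviewers' ratings differ by `threshold` or more.

    scorecards maps interviewer name -> {competency: rating 1-4}.
    """
    competencies = {c for ratings in scorecards.values() for c in ratings}
    flagged = []
    for comp in sorted(competencies):
        ratings = [r[comp] for r in scorecards.values() if comp in r]
        if len(ratings) > 1 and max(ratings) - min(ratings) >= threshold:
            flagged.append(f"{comp}: scores range {min(ratings)}-{max(ratings)}")
    return flagged

scorecards = {
    "Engineering Lead": {"Cross-functional communication": 4, "Strategic thinking": 2},
    "Head of Product":  {"Cross-functional communication": 2, "Strategic thinking": 3},
}
print(find_discrepancies(scorecards))
# ['Cross-functional communication: scores range 2-4']
```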
Complete Prompt: Generate an Interview Kit in One Request
For those who want the entire kit in a single pass, here's a combined prompt. It works best with long-context models (Claude, GPT-4o).
Role: Senior Talent Acquisition Partner. You are designing a
structured interview process.
Task: Create a complete structured interview kit for this role:
- Position: [POSITION]
- Level: [JUNIOR / MID / SENIOR / LEAD]
- Company: [INDUSTRY, SIZE, STACK]
- Key responsibilities: [3-5 KEY TASKS FOR THIS ROLE]
- Number of interview stages: [N]
- Interviewers: [INTERVIEWER ROLES]
Create the following in sequence:
1. COMPETENCY MATRIX
- 6-8 competencies (hard + soft), distributed across stages
- Weight: critical / important / nice-to-have
- No more than 3 competencies per stage
2. QUESTIONS (for each competency)
- Behavioral question (STAR)
- Situational question
- 2 follow-up questions
- Strong answer indicators (3-4 points)
- Red flags (2-3 points)
3. SCORECARD
- 4-level scale (1-4) for each competency
- Concrete observable indicators at each level
- Notes field for the interviewer
4. DEBRIEF AGENDA
- Timeline for [MINUTES] minutes
- Independent score readout round
- Discrepancy discussion
- Red flags
- Vote and decision
This prompt creates a working kit that you’ll need to adapt to your company. Typical adjustments: replacing examples with industry-relevant ones, adding specific technical questions, adjusting competency weights.
Adapting the Kit for Different Roles
The same framework works across positions, but the emphasis shifts:
Engineering roles. More hard skills in the matrix (60–70%). Situational questions are replaced with live coding or system design. The scorecard includes code quality and architectural thinking.
Product roles. Balanced hard and soft skills (50/50). A product case is a mandatory standalone stage. The scorecard focuses on the quality of thinking, not on a “correct” answer.
Leadership roles. More soft skills (60–70%). Behavioral questions about conflicts, letting people go, crisis situations. The scorecard includes self-awareness: can the candidate name their own mistakes?
Startup (generalist roles). Shorter competency matrix (4–5 points). More situational questions, because past experience may not be directly relevant. The scorecard adds a “learning speed” criterion.
Automation: From Prompts to a System
After 5–10 hires, the kit turns into a system. Prompts become standardized and embedded in the workflow. The approach is similar to AI-generated SOP documentation: the same principles of formalizing unstructured processes.
Three levels of automation:
Level 1: templates. Save the prompts in your team’s knowledge base. When a new role opens, the hiring manager runs the prompt with the role’s parameters. The kit is generated in 15–20 minutes. This already saves 3–4 hours per opening.
Level 2: question library. After 10+ hires, you’ll have a library of proven questions with real examples of strong and weak answers. AI selects questions from the library rather than generating from scratch.
Level 3: ATS integration. The scorecard is embedded in the Applicant Tracking System. Interviewers fill in scores in the system, and the debrief automatically pulls all scorecards and highlights discrepancies. At this level, AI helps analyze patterns: which questions best predict successful hires.
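Level 1 requires nothing more than string templating. A minimal sketch of a reusable prompt template; the parameters mirror the combined prompt above, and the example values are invented:

```python
from string import Template

KIT_PROMPT = Template("""Role: Senior Talent Acquisition Partner. You are designing a
structured interview process.

Task: Create a complete structured interview kit for this role:
- Position: $position
- Level: $level
- Company: $company
- Key responsibilities: $tasks
- Number of interview stages: $stages
- Interviewers: $interviewers
""")

prompt = KIT_PROMPT.substitute(
    position="Product Manager",
    level="MID",
    company="B2B SaaS, 120 people, analytics stack: SQL + Amplitude",
    tasks="roadmap ownership, discovery, metrics",
    stages=4,
    interviewers="Head of Product, Engineering Lead, UX Researcher, CEO",
)
print(prompt)  # paste into your model of choice, or send via API as shown earlier
```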
The quality of prompts depends directly on how precisely you describe the context. The principles of context engineering fully apply here: the more relevant context you give the model (role description, company culture, real examples), the more accurate the result.
Where to Start
If your company doesn’t have a structured interview process yet, don’t roll out everything at once. A practical sequence:
1. Pick one open role. Use the combined prompt to generate a full kit. Spend 30 minutes adapting it to your specifics.
2. Run one full interview cycle in the new format. Ask interviewers to fill out scorecards and run the debrief using the agenda. Collect feedback: which questions worked, which scoring criteria felt vague.
3. Iterate. Refine questions and the scorecard based on feedback. Typically after 2–3 iterations the kit stabilizes and needs minimal changes.
4. Scale. Once the format proves effective for one role, build kits for others. The competency matrix will differ, but the scorecard format and debrief agenda will be reusable across the board.
The core principle: structured interviews aren’t about bureaucracy — they’re about reproducibility. Every candidate gets the same set of questions, every interviewer uses the same scale, every decision is based on data rather than impressions.