Structured Interview Kit: AI Generates Questions, Scorecards, and Debrief Agendas
What is a structured interview kit?
A structured interview kit is a standardized hiring package that includes a competency matrix, behavioral and situational questions tied to each competency, a scorecard with defined rating levels, and a debrief agenda. It replaces improvised interviewing with a consistent, bias-reducing evaluation process.
TL;DR
- 68% of interviews at companies under 200 people happen without prepared questions — structured kits fix this
- A complete kit has 4 parts: competency matrix, behavioral questions, scorecard (1–4 scale), debrief agenda
- Each scorecard level must use concrete observable indicators — not abstract phrases like "meets expectations"
- Debrief agenda: everyone reads scores first, then discusses discrepancies — eliminates anchoring bias
- AI builds the full kit in 30 minutes using 3 targeted prompts: matrix → questions → scorecard
68% of interviews at companies under 200 people happen without prepared questions. The interviewer improvises, the candidate gets an unpredictable experience, and the hiring decision is made based on gut feeling. A structured interview kit fixes this: fixed questions, a shared rating scale, a formalized debrief. AI can build that kit in 30 minutes instead of days.
This article covers how to create a complete structured interview kit using prompts — from a competency matrix to a debrief agenda. All templates are ready to use.
What Goes into a Structured Interview Kit
A structured interview kit has four components:
Competency matrix. A list of skills and qualities to evaluate. Each competency is tied to a specific interview stage and a specific interviewer. Without the matrix, two interviewers might ask the same questions while skipping critical areas.
Competency-based questions. Behavioral and situational questions linked to the matrix. Behavioral questions test past experience (“Tell me about a time when…”). Situational questions test thinking (“What would you do if…”). Each question comes with follow-up questions and indicators of a strong answer.
Scorecard. A standard evaluation form with clear rating levels: from 1 (does not meet expectations) to 4 (exceeds expectations). Each level is defined by concrete observable indicators, not abstract phrases. Each interviewer fills it out right after their session, before talking to anyone else.
Debrief agenda. A structured meeting where interviewers discuss the candidate. Fixed order: each person reads their scores first, then the group discusses discrepancies, then votes. This eliminates the anchoring effect, where the first speaker’s opinion skews everyone else.
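As a reference point, here is a minimal sketch of how these four components might be modeled in code; every class and field name below is illustrative, not taken from any particular ATS or library.

```python
from dataclasses import dataclass, field

@dataclass
class Competency:
    name: str         # e.g. "Prioritization and roadmap"
    kind: str         # "hard" or "soft"
    weight: str       # "critical", "important", or "nice-to-have"
    stage: str        # the single interview stage that owns this competency
    interviewer: str  # the single interviewer who evaluates it

@dataclass
class Question:
    competency: str
    behavioral: str   # "Tell me about a time when..."
    situational: str  # "What would you do if..."
    follow_ups: list[str] = field(default_factory=list)
    strong_signals: list[str] = field(default_factory=list)
    red_flags: list[str] = field(default_factory=list)

@dataclass
class ScorecardEntry:
    competency: str
    # rating level (1-4) mapped to concrete observable indicators
    indicators: dict[int, list[str]] = field(default_factory=dict)

@dataclass
class InterviewKit:
    matrix: list[Competency]
    questions: list[Question]
    scorecard: list[ScorecardEntry]
    debrief_agenda: list[str]  # timeline blocks, e.g. "[0:00 - 0:02] Opening"
```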
Competency Matrix: Prompt and Template
The first step is defining what to evaluate. The competency matrix distributes skills across interview stages so each interviewer owns their zone.
Prompt for generating the matrix:
Role: Senior Talent Acquisition Partner with structured hiring experience.
Task: Create a competency matrix for a [POSITION] role at the [LEVEL] level.
Company context:
- Industry: [INDUSTRY]
- Team size: [SIZE]
- Stack/tools: [STACK]
- Key responsibilities for this role: [TASKS]
Matrix requirements:
1. 6-8 competencies split into hard skills and soft skills
2. Each competency tied to one interview stage
3. For each competency — weight (critical / important / nice-to-have)
4. No more than 3 competencies per stage
Output format — table:
| Competency | Type | Weight | Stage | Interviewer |
Example output for a Product Manager (mid-level):
| Competency | Type | Weight | Stage | Interviewer |
|---|---|---|---|---|
| Prioritization and roadmap | Hard | Critical | Product case | Head of Product |
| Working with metrics | Hard | Critical | Product case | Head of Product |
| Cross-functional communication | Soft | Critical | Behavioral | Engineering Lead |
| User research | Hard | Important | Technical | UX Researcher |
| Strategic thinking | Soft | Important | Behavioral | Engineering Lead |
| Stakeholder management | Soft | Important | Final | CEO/CTO |
| SQL / analytics tools | Hard | Nice-to-have | Technical | UX Researcher |
The matrix should have no overlaps: if “cross-functional communication” is evaluated in the behavioral stage, there’s no need to repeat it in the technical stage. One skill — one stage — one owner.
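The no-overlap rule and the three-per-stage limit are easy to check mechanically once the matrix exists as data. A minimal sketch, assuming each matrix row is a plain dict with the same columns as the table above:

```python
from collections import Counter

def validate_matrix(rows: list[dict]) -> list[str]:
    """Check the structural rules: no duplicate competencies,
    and no more than 3 competencies per interview stage."""
    problems = []
    names = Counter(row["competency"] for row in rows)
    for name, count in names.items():
        if count > 1:
            problems.append(f"'{name}' appears in {count} rows; one skill, one stage")
    per_stage = Counter(row["stage"] for row in rows)
    for stage, count in per_stage.items():
        if count > 3:
            problems.append(f"Stage '{stage}' has {count} competencies; max is 3")
    return problems

matrix = [
    {"competency": "Prioritization and roadmap", "stage": "Product case"},
    {"competency": "Working with metrics", "stage": "Product case"},
    {"competency": "Cross-functional communication", "stage": "Behavioral"},
]
print(validate_matrix(matrix) or "Matrix is consistent")
```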
Generating Interview Questions with AI
Once the matrix is ready, generate the questions. The key rule: each question tests exactly one competency. If a question tests two, it’s hard to score on the scorecard.
Prompt for generating questions:
Based on the competency matrix below, generate questions for a
structured interview.
[INSERT MATRIX]
For each competency, create:
1. One main behavioral question (STAR format: Situation-Task-Action-Result)
2. One situational question
3. Two follow-up questions to go deeper
4. Description of a strong answer (3-4 points)
5. Description of a weak answer (2-3 red flags)
Rules:
- Questions are specific and grounded in real work situations
- No questions with an obvious "right" answer
- Follow-ups dig deeper: numbers, details, the candidate's specific role
- Avoid questions that discriminate by age, gender, or origin
Example generated questions for the “Prioritization and roadmap” competency:
Main behavioral question: “Tell me about a time when you had more stakeholder requests than your team had capacity for. How did you decide what would make it into the next release and what wouldn’t?”
Situational question: “You have three requests: the CEO wants a feature for an enterprise client, data shows high churn at onboarding, and engineers are asking for a week to address tech debt. One sprint. How do you decide?”
Follow-up questions:
- “What framework did you use for prioritization? Why that one?”
- “How did you explain to stakeholders why their request didn’t make the release?”
Strong answer:
- Names a specific framework (RICE, ICE, MoSCoW) and explains the choice
- References data as the basis for the decision, not opinions
- Describes how they communicated the rejection to affected stakeholders
- Includes a result: metric improved, team stayed focused
Red flags:
- “I did whatever the CEO asked” (no autonomy)
- Can’t name the criteria used for the decision
- Doesn’t mention communicating rejections
Scorecard: Rating Scale and Observable Indicators
A scorecard turns a subjective impression into a measurable evaluation. Without a scorecard, interviewers use phrases like “seemed okay” or “something felt off” — and there’s nothing useful to discuss in the debrief.
Prompt for generating a scorecard:
Create a scorecard for evaluating candidates for the [POSITION] role.
Competencies from the matrix:
[INSERT MATRIX]
For each competency, create a 4-level scale:
- 1 (Does not meet expectations): specific failure indicators
- 2 (Partially meets expectations): what exactly falls short
- 3 (Meets expectations): what an ideal candidate should demonstrate
- 4 (Exceeds expectations): what sets an outstanding candidate apart
Rules:
- Indicators describe observable behavior, not abstract qualities
- "Good communication skills" is a bad indicator
- "Structured their answer, gave specific numbers, asked a clarifying question" is a good indicator
- Each level has 2-3 specific indicators
Example scorecard for “Working with metrics”:
WORKING WITH METRICS
Weight: Critical | Stage: Product case | Evaluator: Head of Product
[1] Does not meet expectations
- Cannot name the product's key metrics
- Confuses vanity metrics with actionable metrics
- Cannot connect metrics to business goals
[2] Partially meets expectations
- Names standard metrics (DAU, retention) but can't explain the choice
- Knows analytics tools at a basic level
- Struggles to interpret data without prompting
[3] Meets expectations
- Builds a metric hierarchy: North Star → key → supporting
- Explains how metrics drive product decisions
- Provides an example where data changed product direction
[4] Exceeds expectations
- Designs a metric system with counter-metrics in mind
- Describes a statistically correct approach to A/B tests
- Has automated dashboards or built self-serve analytics
Each interviewer fills out the scorecard before the debrief meeting. If an interviewer hears others’ evaluations before recording their own, the results will be biased.
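If your team wants a single summary number per candidate alongside the per-competency scores, the weights from the matrix can be folded in. A minimal sketch; the numeric weight mapping is an assumption to tune, and many teams deliberately skip aggregation and discuss competencies one by one:

```python
# Hypothetical mapping of matrix weights to numbers; adjust to your process.
WEIGHTS = {"critical": 3, "important": 2, "nice-to-have": 1}

def weighted_score(scores: dict[str, int], weights: dict[str, str]) -> float:
    """Weighted average of 1-4 ratings; each competency's weight comes from the matrix."""
    total = sum(scores[c] * WEIGHTS[weights[c]] for c in scores)
    return total / sum(WEIGHTS[weights[c]] for c in scores)

scores = {"Working with metrics": 3, "Prioritization and roadmap": 4, "SQL / analytics tools": 2}
weights = {"Working with metrics": "critical",
           "Prioritization and roadmap": "critical",
           "SQL / analytics tools": "nice-to-have"}
print(f"{weighted_score(scores, weights):.2f}")  # 3.29
```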
Debrief Agenda: Structuring the Candidate Discussion
The debrief is where interviewers make the hiring decision. Without structure, it becomes 40 minutes of retelling interviews with the conclusion “seems fine, let’s hire.” A formalized agenda saves time and improves decision quality.
Prompt for generating a debrief agenda:
Create a debrief agenda for discussing a candidate for [POSITION].
Number of interviewers: [N]
Interview stages: [LIST OF STAGES]
Debrief time: [MINUTES]
Requirements:
1. Independent score readout (no discussion until everyone has shared)
2. Focus on score discrepancies (2+ point difference)
3. Separate block for discussing red flags
4. Final vote: Strong Yes / Yes / No / Strong No
5. Rule: one Strong No = mandatory discussion, even if others voted Yes
Format: timeline with minutes and the person responsible for each block
Example debrief agenda for 30 minutes with 4 interviewers:
DEBRIEF AGENDA — [Candidate Name] for [Position]
Date: _____ | Moderator: Hiring Manager
[0:00 - 0:02] Opening
- Moderator reminds everyone of the format
- Confirm: all scorecards were completed before the meeting
[0:02 - 0:14] Score round (3 min per interviewer)
- Each person reads: competency scores + overall verdict
- Others do NOT comment until the round is complete
- Order: junior interviewers first, then senior (reduces authority bias)
[0:14 - 0:22] Discuss discrepancies
- Moderator surfaces competencies with 2+ point differences
- Each side provides specific examples from the interview
- Goal: not to persuade, but to understand why scores differ
[0:22 - 0:26] Red flags and risks
- Are there patterns that multiple interviewers noticed?
- Discuss mitigation: can the weakness be compensated for?
[0:26 - 0:30] Decision
- Each person votes: Strong Yes / Yes / No / Strong No
- Strong No rule: one Strong No = detailed discussion required
- Moderator records the decision and next steps
The order of speakers matters. If the CTO speaks first and gives a “Strong Yes,” others will unconsciously shift their scores upward. So the order runs from least influential to most.
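The 2+ point rule from the agenda is simple to automate once scorecards are collected, for example from a spreadsheet or ATS export. A sketch, with illustrative data:

```python
def find_discrepancies(scorecards: dict[str, dict[str, int]], threshold: int = 2) -> list[str]:
    """Return competencies where interviewers' ratings differ by `threshold` or more.

    scorecards maps interviewer name -> {competency: rating 1-4}.
    """
    competencies = {c for ratings in scorecards.values() for c in ratings}
    flagged = []
    for comp in sorted(competencies):
        ratings = [r[comp] for r in scorecards.values() if comp in r]
        if len(ratings) > 1 and max(ratings) - min(ratings) >= threshold:
            flagged.append(f"{comp}: scores range {min(ratings)}-{max(ratings)}")
    return flagged

scorecards = {
    "Engineering Lead": {"Cross-functional communication": 4, "Strategic thinking": 2},
    "Head of Product":  {"Cross-functional communication": 2, "Strategic thinking": 3},
}
print(find_discrepancies(scorecards))
# ['Cross-functional communication: scores range 2-4']
```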
Complete Prompt: Generate an Interview Kit in One Request
For those who want the entire kit in a single pass, here's a combined prompt. It works best with long-context models (Claude, GPT-4o).
Role: Senior Talent Acquisition Partner. You are designing a
structured interview process.
Task: Create a complete structured interview kit for this role:
- Position: [POSITION]
- Level: [JUNIOR / MID / SENIOR / LEAD]
- Company: [INDUSTRY, SIZE, STACK]
- Key responsibilities: [3-5 KEY TASKS FOR THIS ROLE]
- Number of interview stages: [N]
- Interviewers: [INTERVIEWER ROLES]
Create the following in sequence:
1. COMPETENCY MATRIX
- 6-8 competencies (hard + soft), distributed across stages
- Weight: critical / important / nice-to-have
- No more than 3 competencies per stage
2. QUESTIONS (for each competency)
- Behavioral question (STAR)
- Situational question
- 2 follow-up questions
- Strong answer indicators (3-4 points)
- Red flags (2-3 points)
3. SCORECARD
- 4-level scale (1-4) for each competency
- Concrete observable indicators at each level
- Notes field for the interviewer
4. DEBRIEF AGENDA
- Timeline for [MINUTES] minutes
- Independent score readout round
- Discrepancy discussion
- Red flags
- Vote and decision
This prompt creates a working kit that you’ll need to adapt to your company. Typical adjustments: replacing examples with industry-relevant ones, adding specific technical questions, adjusting competency weights.
Adapting the Kit for Different Roles
The same framework works across positions, but the emphasis shifts:
Engineering roles. More hard skills in the matrix (60–70%). Situational questions are replaced with live coding or system design. The scorecard includes code quality and architectural thinking.
Product roles. Balanced hard and soft skills (50/50). A product case is a mandatory standalone stage. The scorecard focuses on the quality of thinking, not on a “correct” answer.
Leadership roles. More soft skills (60–70%). Behavioral questions about conflicts, letting people go, crisis situations. The scorecard includes self-awareness: can the candidate name their own mistakes?
Startup (generalist roles). Shorter competency matrix (4–5 points). More situational questions, because past experience may not be directly relevant. The scorecard adds a “learning speed” criterion.
Automation: From Prompts to a System
After 5–10 hires, the kit turns into a system. Prompts become standardized and embedded in the workflow. The approach is similar to AI-generated SOP documentation: the same principles of formalizing unstructured processes.
Three levels of automation:
Level 1: templates. Save the prompts in your team’s knowledge base. When a new role opens, the hiring manager runs the prompt with the role’s parameters. The kit is generated in 15–20 minutes. This already saves 3–4 hours per opening.
Level 2: question library. After 10+ hires, you’ll have a library of proven questions with real examples of strong and weak answers. AI selects questions from the library rather than generating from scratch.
Level 3: ATS integration. The scorecard is embedded in the Applicant Tracking System. Interviewers fill in scores in the system, and the debrief automatically pulls all scorecards and highlights discrepancies. At this level, AI helps analyze patterns: which questions best predict successful hires.
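Level 1 requires nothing more than string templating. A minimal sketch of a reusable prompt template; the parameters mirror the combined prompt above, and the example values are invented:

```python
from string import Template

KIT_PROMPT = Template("""Role: Senior Talent Acquisition Partner. You are designing a
structured interview process.

Task: Create a complete structured interview kit for this role:
- Position: $position
- Level: $level
- Company: $company
- Key responsibilities: $tasks
- Number of interview stages: $stages
- Interviewers: $interviewers
""")

prompt = KIT_PROMPT.substitute(
    position="Product Manager",
    level="MID",
    company="B2B SaaS, 120 people, analytics stack: SQL + Amplitude",
    tasks="roadmap ownership, discovery, metrics",
    stages=4,
    interviewers="Head of Product, Engineering Lead, UX Researcher, CEO",
)
print(prompt)  # paste into your model of choice, or send via API as shown earlier
```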
The quality of prompts depends directly on how precisely you describe the context. The principles of context engineering fully apply here: the more relevant context you give the model (role description, company culture, real examples), the more accurate the result.
Where to Start
If your company doesn’t have a structured interview process yet, don’t roll out everything at once. A practical sequence:
1. Pick one open role. Use the combined prompt to generate a full kit. Spend 30 minutes adapting it to your specifics.
2. Run one full interview cycle in the new format. Ask interviewers to fill out scorecards and run the debrief using the agenda. Collect feedback: which questions worked, which scoring criteria felt vague.
3. Iterate. Refine questions and the scorecard based on feedback. Typically after 2–3 iterations the kit stabilizes and needs minimal changes.
4. Scale. Once the format proves effective for one role, build kits for others. The competency matrix will differ, but the scorecard format and debrief agenda will be reusable across the board.
The core principle: structured interviews aren’t about bureaucracy — they’re about reproducibility. Every candidate gets the same set of questions, every interviewer uses the same scale, every decision is based on data rather than impressions.