PLG Playbook: Building an AI-Powered PQL Scoring Model
What is a Product Qualified Lead (PQL)?
A Product Qualified Lead is a free-tier user who has demonstrated purchase intent through product behavior — completing activation events, reaching usage limits, or inviting teammates — rather than through marketing interactions like form fills or webinar attendance. PQL scoring quantifies this behavioral signal as a numeric score that tells sales who to contact and why.
TL;DR
- MQL scoring fails in PLG because demographic and marketing signals are weak predictors; a user can register with a Gmail address, ignore all emails, and still convert to paid.
- A PQL score combines three components: activation score (40%), engagement score (35%), and firmographic score (25%), each sourced from product events, usage analytics, and enrichment data.
- Activation events must be discovered per product through cohort analysis (comparing first-14-day behavior of converted vs. churned users), not borrowed from case studies.
- LLM enrichment turns a numeric score into a sales briefing: use case hypothesis, expansion signal, recommended opening, and risk factor, all derived from actual usage data.
- Calibrate the model monthly; fixed thresholds decay within 2–3 months as activation patterns shift with new features and seasonal behavior.
PLG companies convert free-to-paid more effectively than sales-led ones. But only when sales gets the right leads at the right time. Most teams still dump every registered user into their CRM — including the ones who signed up, poked around once, and disappeared.
A Product Qualified Lead is fundamentally different from an MQL. An MQL filled out a form. A PQL has used the product and shown behavior that predicts purchase. This article walks through building a PQL scoring model from scratch: activation event definition, SQL-based scoring, LLM classification, and CRM sync.
What PQL Scoring Is and Why MQL Scoring Fails in PLG
MQL scoring works on demographics and marketing interactions: job title, company size, downloaded a whitepaper, attended a webinar. In a PLG model, these signals are nearly useless. A user might register with a Gmail address, never touch a marketing email, and still open the product every day.
PQL scoring uses product usage as the primary signal. Three categories matter:
Activation signals. The user has completed key actions that correlate with long-term retention. For Slack, that’s a team exchanging 2,000 messages. For Dropbox, uploading from one device and accessing from another. Activation events are different for every product — you can’t borrow them from a case study.
Engagement depth. Frequency, feature breadth, time in product. Not “logged in 5 times” but “used 3+ core features in the last 7 days.”
Expansion signals. Invites teammates, creates a team, hits free-plan limits, exports data. These actions show the product is delivering value at the organizational level — which is what B2B buyers care about.
Identifying Activation Events with LLMs
The first step is finding activation events for your specific product. The standard approach: retrospective cohort analysis. Take users who converted to paid and compare their first-14-days behavior against users who churned. This presupposes a clean event taxonomy — without consistent event names and properties, cohort comparisons return noise.
SQL query for a basic cohort breakdown (if you’re new to SQL, the SQL for product managers guide covers the patterns used below):
WITH converted_users AS (
SELECT user_id, MIN(subscription_start) AS conversion_date
FROM subscriptions
WHERE plan != 'free'
GROUP BY user_id
),
user_events AS (
SELECT
e.user_id,
e.event_name,
COUNT(*) AS event_count,
COUNT(DISTINCT DATE(e.created_at)) AS active_days,
CASE WHEN cu.user_id IS NOT NULL THEN 'converted' ELSE 'churned' END AS cohort
FROM events e
LEFT JOIN converted_users cu ON e.user_id = cu.user_id
WHERE e.created_at <= COALESCE(cu.conversion_date, e.created_at + INTERVAL '30 days')
AND e.created_at >= e.user_created_at
AND e.created_at <= e.user_created_at + INTERVAL '14 days'
GROUP BY e.user_id, e.event_name, cohort
)
SELECT
event_name,
cohort,
AVG(event_count) AS avg_count,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY event_count) AS median_count,
COUNT(DISTINCT user_id) AS users
FROM user_events
GROUP BY event_name, cohort
ORDER BY event_name, cohort;
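This query compares converted vs. churned users across their first 14 days. The output is a table of average and median event counts per event and cohort. It shows where the cohorts differ, but it does not test whether a difference is statistically significant. Averages cover only users who fired the event at least once, so read the users column alongside them.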
The catch: the table has dozens of events, and correlation doesn’t imply causation. An LLM helps cut through the noise — interpreting the data and generating hypotheses worth testing.
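Before pasting the table into a prompt, it can help to pre-rank events by the gap between cohorts so the LLM starts from the largest differences. A minimal sketch, assuming the query result has been loaded as tuples in the same column order as the SELECT. The function name `rank_events_by_gap`, the `min_users` cutoff of 30, and the sample rows are illustrative assumptions, not part of the original pipeline.

```python
from collections import defaultdict


def rank_events_by_gap(rows, min_users=30):
    """Rank events by the ratio of converted to churned average counts.

    rows: iterable of (event_name, cohort, avg_count, median_count, users),
    matching the column order of the cohort query.
    min_users: hypothetical cutoff that drops events seen by too few users.
    """
    by_event = defaultdict(dict)
    for event_name, cohort, avg_count, _median_count, users in rows:
        by_event[event_name][cohort] = (float(avg_count), int(users))

    ranked = []
    for event_name, cohorts in by_event.items():
        if "converted" not in cohorts or "churned" not in cohorts:
            continue  # event appears in only one cohort; inspect it manually
        conv_avg, conv_users = cohorts["converted"]
        churn_avg, churn_users = cohorts["churned"]
        if min(conv_users, churn_users) < min_users or churn_avg == 0:
            continue
        ranked.append((event_name, round(conv_avg / churn_avg, 2)))

    # Largest converted/churned ratio first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up numbers, purely to show the input shape.
sample = [
    ("project_created", "converted", 4.0, 3, 120),
    ("project_created", "churned", 2.0, 1, 900),
    ("team_invite_sent", "converted", 3.0, 2, 95),
    ("team_invite_sent", "churned", 0.5, 0, 310),
    ("settings_opened", "converted", 1.0, 1, 12),
    ("settings_opened", "churned", 1.0, 1, 40),
]
print(rank_events_by_gap(sample))
```

A ratio is only a ranking heuristic. It says nothing about causation, which is why the shortlisted events still need validation.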
Prompt for analyzing activation events:
You are a product analyst. Analyze cohort analysis data for a SaaS product.
Product context: [product description, main use cases, key features]
Data (event_name | cohort | avg_count | median_count | users):
[paste SQL query result here]
Tasks:
1. Identify 3-5 events with the greatest difference between the converted and churned cohorts
2. For each event, suggest a threshold value at which conversion probability significantly increases
3. Exclude events that are a consequence of conversion, not a predictor
4. Suggest event combinations (activation milestones) that form an "aha moment"
Response format: JSON with fields event_name, threshold, confidence, reasoning
LLMs don’t replace statistical analysis. They speed up interpretation and surface hypotheses you’d then validate through A/B tests. The output of this prompt is a working set of activation events with threshold values — a starting point, not a final answer.
PQL Scoring Model Architecture
A PQL score is a number from 0 to 100. It’s composed of three components with different weights:
| Component | Weight | Data Source |
|---|---|---|
| Activation score | 40% | Product events |
| Engagement score | 35% | Usage analytics |
| Firmographic score | 25% | Enrichment data |
Activation Score
Binary check: did the user complete an activation event or not? Each event carries its own weight within the component.
WITH activation_checks AS (
SELECT
u.user_id,
u.email,
u.company_domain,
-- Activation event 1: created a project
MAX(CASE WHEN e.event_name = 'project_created' THEN 1 ELSE 0 END) AS created_project,
-- Activation event 2: invited a teammate
MAX(CASE WHEN e.event_name = 'team_invite_sent' THEN 1 ELSE 0 END) AS invited_teammate,
-- Activation event 3: used core feature 3+ times
CASE WHEN COUNT(CASE WHEN e.event_name = 'core_feature_used' THEN 1 END) >= 3
THEN 1 ELSE 0 END AS used_core_feature,
-- Activation event 4: connected an integration
MAX(CASE WHEN e.event_name = 'integration_connected' THEN 1 ELSE 0 END) AS connected_integration
FROM users u
LEFT JOIN events e ON u.user_id = e.user_id
AND e.created_at >= u.created_at
AND e.created_at <= u.created_at + INTERVAL '14 days'
GROUP BY u.user_id, u.email, u.company_domain
)
SELECT
user_id,
email,
company_domain,
ROUND(
(created_project * 30 +
invited_teammate * 30 +
used_core_feature * 25 +
connected_integration * 15)
) AS activation_score
FROM activation_checks;
Weights come from the cohort analysis correlation. invited_teammate tends to carry a high weight because bringing in colleagues is one of the strongest conversion predictors in B2B SaaS — it means the product is spreading inside an organization.
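The weighting logic is easier to unit-test outside the warehouse. A small sketch that mirrors the SQL above, assuming the same four flags and the example weights; the dictionary and function names are hypothetical.

```python
# Weights copied from the SQL example; replace them with your own cohort findings.
ACTIVATION_WEIGHTS = {
    "created_project": 30,
    "invited_teammate": 30,
    "used_core_feature": 25,
    "connected_integration": 15,
}


def activation_score(completed: set) -> int:
    """Sum the weights of the activation events a user has completed."""
    unknown = set(completed) - set(ACTIVATION_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown activation events: {sorted(unknown)}")
    return sum(ACTIVATION_WEIGHTS[name] for name in completed)


# The component should top out at 100 so it fits the 0-100 composite score.
assert sum(ACTIVATION_WEIGHTS.values()) == 100

print(activation_score({"created_project", "invited_teammate"}))
```

Keeping the weights in one mapping makes the "sums to 100" invariant easy to check whenever calibration changes a weight.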
Engagement Score
Measures depth and frequency of use. Unlike the activation score, this one’s continuous — recalculated every day as behavior changes.
WITH daily_usage AS (
SELECT
user_id,
COUNT(DISTINCT DATE(created_at)) AS active_days_last_14,
COUNT(DISTINCT event_name) AS unique_features_used,
COUNT(*) AS total_events,
MAX(created_at) AS last_active_at
FROM events
WHERE created_at >= CURRENT_DATE - INTERVAL '14 days'
GROUP BY user_id
),
engagement_scored AS (
SELECT
user_id,
-- Frequency: active days out of 14
LEAST(active_days_last_14 / 14.0 * 100, 100) AS frequency_score,
-- Breadth: unique features used (normalized to total feature count)
LEAST(unique_features_used / 8.0 * 100, 100) AS breadth_score,
-- Recency: penalty for inactivity
CASE
WHEN last_active_at >= CURRENT_DATE - INTERVAL '1 day' THEN 100
WHEN last_active_at >= CURRENT_DATE - INTERVAL '3 days' THEN 75
WHEN last_active_at >= CURRENT_DATE - INTERVAL '7 days' THEN 40
ELSE 10
END AS recency_score
FROM daily_usage
)
SELECT
user_id,
ROUND(frequency_score * 0.4 + breadth_score * 0.35 + recency_score * 0.25) AS engagement_score
FROM engagement_scored;
The breadth_score normalization depends on how many core features you track. Replace the 8 in the example with your actual feature count.
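To sanity-check the formula on a single user, the same arithmetic can be written in a few lines of Python. This is a sketch under two assumptions: recency is approximated in whole days since the last event rather than by timestamp comparison, and Python's `round` rounds exact halves to the nearest even number, so a result may differ by one point from the database on exact .5 values.

```python
def engagement_score(active_days: int, features_used: int,
                     days_since_last_active: int,
                     total_features: int = 8) -> int:
    """Single-user mirror of the engagement SQL (0-100)."""
    # Frequency: active days out of 14, capped at 100.
    frequency = min(active_days / 14 * 100, 100)
    # Breadth: distinct features used, normalized to the tracked feature count.
    breadth = min(features_used / total_features * 100, 100)
    # Recency: stepwise penalty for inactivity.
    if days_since_last_active <= 1:
        recency = 100
    elif days_since_last_active <= 3:
        recency = 75
    elif days_since_last_active <= 7:
        recency = 40
    else:
        recency = 10
    return round(frequency * 0.4 + breadth * 0.35 + recency * 0.25)


# 7 active days, 4 of 8 features, last seen 2 days ago:
# 50 * 0.4 + 50 * 0.35 + 75 * 0.25 = 56.25
print(engagement_score(active_days=7, features_used=4, days_since_last_active=2))
```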
Firmographic Score
Product usage carries the weight. The remaining 25% comes from company-level data: size, industry, tech stack. Pull it from enrichment services — Clearbit, Apollo, Clay.
SELECT
u.user_id,
CASE
WHEN c.employee_count > 500 THEN 30
WHEN c.employee_count > 100 THEN 25
WHEN c.employee_count > 20 THEN 20
WHEN c.employee_count > 5 THEN 15
ELSE 5
END +
CASE
WHEN c.industry IN ('technology', 'saas', 'fintech') THEN 25
WHEN c.industry IN ('ecommerce', 'media', 'education') THEN 20
WHEN c.industry IN ('healthcare', 'manufacturing') THEN 15
ELSE 10
END +
CASE
WHEN c.estimated_revenue > 10000000 THEN 25
WHEN c.estimated_revenue > 1000000 THEN 20
WHEN c.estimated_revenue > 100000 THEN 15
ELSE 5
END +
CASE
WHEN u.email NOT LIKE '%@gmail.com'
AND u.email NOT LIKE '%@yahoo.com'
AND u.email NOT LIKE '%@hotmail.com' THEN 20
ELSE 0
END AS firmographic_score
FROM users u
LEFT JOIN companies c ON u.company_domain = c.domain;
A corporate email gets +20 points. It’s blunt, but it works: work-domain users convert significantly more often than free-email ones.
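If the check also runs in application code, comparing the exact domain is safer than suffix matching: a pattern such as `'%gmail.com'` would also match any domain that merely ends in that string. A minimal sketch; the three-entry domain list is illustrative and a real deployment would likely need a longer one.

```python
# Illustrative, deliberately short list of free email providers.
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}


def is_work_email(email: str) -> bool:
    """Return True when the address does not belong to a known free provider."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        return False  # not a usable address
    return domain not in FREE_EMAIL_DOMAINS


print(is_work_email("ann@acme.com"), is_work_email("ann@gmail.com"))
```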
Final PQL Score and LLM Classification
Composite score:
SELECT
a.user_id,
a.email,
ROUND(
a.activation_score * 0.40 +
e.engagement_score * 0.35 +
f.firmographic_score * 0.25
) AS pql_score,
CASE
WHEN ROUND(a.activation_score * 0.40 + e.engagement_score * 0.35 + f.firmographic_score * 0.25) >= 75 THEN 'hot'
WHEN ROUND(a.activation_score * 0.40 + e.engagement_score * 0.35 + f.firmographic_score * 0.25) >= 50 THEN 'warm'
WHEN ROUND(a.activation_score * 0.40 + e.engagement_score * 0.35 + f.firmographic_score * 0.25) >= 25 THEN 'nurture'
ELSE 'monitor'
END AS pql_tier
FROM activation_scores a
JOIN engagement_scores e ON a.user_id = e.user_id
JOIN firmographic_scores f ON a.user_id = f.user_id;
Four tiers:
- Hot (75+): Pass to sales immediately. High conversion probability.
- Warm (50–74): In-app upsell triggers. Automatic hints about premium features.
- Nurture (25–49): Onboarding drip campaigns. Nudge toward activation events.
- Monitor (0–24): Observe. Don’t spend sales resources.
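The composite score and tier boundaries above can be expressed as two small functions, which is handy for testing threshold changes before touching the SQL. A sketch only: the names are hypothetical, weights are kept as integer percentages to avoid floating-point surprises, and halves are rounded up.

```python
import math

# Component weights in percent, as in the architecture table.
WEIGHTS = {"activation": 40, "engagement": 35, "firmographic": 25}
# Lower bound of each tier, highest first.
TIERS = [(75, "hot"), (50, "warm"), (25, "nurture"), (0, "monitor")]


def pql_score(activation: float, engagement: float, firmographic: float) -> int:
    """Weighted 0-100 composite of the three component scores."""
    total = (activation * WEIGHTS["activation"]
             + engagement * WEIGHTS["engagement"]
             + firmographic * WEIGHTS["firmographic"]) / 100
    return math.floor(total + 0.5)  # round halves up


def pql_tier(score: int) -> str:
    """Map a composite score to its tier name."""
    for lower_bound, name in TIERS:
        if score >= lower_bound:
            return name
    return "monitor"


score = pql_score(activation=90, engagement=80, firmographic=60)
print(score, pql_tier(score))  # (3600 + 2800 + 1500) / 100 = 79, which is hot
```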
The tier distribution is itself a metric worth tracking — a PLG dashboard showing the hot/warm/nurture mix over time reveals whether activation is improving or regressing before it shows up in revenue.
A number handles prioritization. But sales reps need context: why is this user hot, what are they actually doing in the product, what do you lead with on the first call. That’s where LLMs earn their place.
Prompt for generating sales context:
You are a sales intelligence assistant. Based on product usage data, generate a briefing for a sales manager.
User data:
- Email: {email}
- Company: {company_name} ({industry}, {employee_count} employees)
- PQL score: {pql_score} (tier: {pql_tier})
- Completed activation events: {completed_activations}
- Missing activation events: {missing_activations}
- Most used features: {top_features}
- Active days in the last 14 days: {active_days}
- Number of invited teammates: {invited_count}
- Current plan: {plan}
- Has hit plan limits: {hit_limits}
Generate:
1. Use case hypothesis (1 sentence): what problem the user is solving
2. Expansion signal (1 sentence): why they're ready to upgrade
3. Recommended opening (1 sentence): how to start the conversation
4. Risk factor (1 sentence): what might prevent conversion
Format: JSON. No generic phrases. Only specifics based on data.
The output is structured JSON written to the CRM contact record. Instead of “score: 82,” the sales manager sees: “User runs feature X daily for task Y, has invited 4 teammates, and is hitting API request limits. Pitch the Enterprise plan around team collaboration.”
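Because the briefing ends up in front of a sales rep, it is worth validating the model's reply before it is stored. A hedged sketch, assuming the prompt is extended to request these exact key names. The first three match the fields the CRM sync reads later in this article, while `risk_factor` is an assumed name for the fourth item.

```python
import json

# Keys the briefing is expected to contain ("risk_factor" is an assumed name).
REQUIRED_KEYS = ("use_case_hypothesis", "expansion_signal",
                 "recommended_opening", "risk_factor")


def parse_briefing(raw: str) -> dict:
    """Parse the model's reply and reject briefings with missing or empty fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("briefing must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if not str(data.get(key, "")).strip()]
    if missing:
        raise ValueError(f"briefing is missing fields: {missing}")
    # Keep only the expected keys, as trimmed strings.
    return {key: str(data[key]).strip() for key in REQUIRED_KEYS}


reply = ('{"use_case_hypothesis": "a", "expansion_signal": "b", '
         '"recommended_opening": "c", "risk_factor": "d"}')
print(parse_briefing(reply))
```

Rejected briefings can be retried or skipped; either is better than syncing an empty field to the contact record.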
Automated Pipeline: From Events to CRM
The pipeline architecture has four stages:
Product Events → Event Store → Score Calculator → CRM Sync
      │                               │              │
  Segment/                      Scheduled job     HubSpot/
  Amplitude/                    (hourly/daily)    Salesforce
  PostHog                                         API
Stage 1: Event Collection
Product analytics (Segment, Amplitude, PostHog) pipes events into the warehouse. Minimum required fields per event:
{
"user_id": "usr_abc123",
"event_name": "core_feature_used",
"properties": {
"feature": "report_builder",
"duration_seconds": 340
},
"timestamp": "2026-03-25T14:22:00Z",
"context": {
"company_domain": "acme.com"
}
}
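A lightweight check at ingestion keeps malformed events out of the scoring queries. The sketch below treats `user_id`, `event_name`, `timestamp`, and `context.company_domain` as mandatory, which is an assumption based on the example payload; adjust the list to your own taxonomy.

```python
from datetime import datetime

REQUIRED_FIELDS = ("user_id", "event_name", "timestamp")


def validate_event(event: dict) -> list:
    """Return a list of problems; an empty list means the event is usable."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not event.get(name)]
    timestamp = event.get("timestamp")
    if timestamp:
        try:
            # fromisoformat() accepts a trailing "Z" only from Python 3.11,
            # so normalize it to an explicit UTC offset first.
            datetime.fromisoformat(str(timestamp).replace("Z", "+00:00"))
        except ValueError:
            problems.append("timestamp is not ISO 8601")
    if not (event.get("context") or {}).get("company_domain"):
        problems.append("missing context.company_domain")
    return problems


example = {
    "user_id": "usr_abc123",
    "event_name": "core_feature_used",
    "properties": {"feature": "report_builder", "duration_seconds": 340},
    "timestamp": "2026-03-25T14:22:00Z",
    "context": {"company_domain": "acme.com"},
}
print(validate_event(example))
```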
Stage 2: Score Calculation
A scheduled job (cron, Airflow, dbt) runs the SQL from the previous sections. Output is a pql_scores table: user_id, pql_score, pql_tier, activation_score, engagement_score, firmographic_score, calculated_at.
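-- Creates the table on the first run. Later scheduled runs should append with
-- INSERT INTO pql_scores SELECT ... so that calculated_at history is preserved.
CREATE TABLE pql_scores AS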
SELECT
a.user_id,
a.email,
a.company_domain,
a.activation_score,
e.engagement_score,
f.firmographic_score,
ROUND(a.activation_score * 0.40 + e.engagement_score * 0.35 + f.firmographic_score * 0.25) AS pql_score,
CASE
WHEN ROUND(a.activation_score * 0.40 + e.engagement_score * 0.35 + f.firmographic_score * 0.25) >= 75 THEN 'hot'
WHEN ROUND(a.activation_score * 0.40 + e.engagement_score * 0.35 + f.firmographic_score * 0.25) >= 50 THEN 'warm'
WHEN ROUND(a.activation_score * 0.40 + e.engagement_score * 0.35 + f.firmographic_score * 0.25) >= 25 THEN 'nurture'
ELSE 'monitor'
END AS pql_tier,
CURRENT_TIMESTAMP AS calculated_at
FROM activation_scores a
JOIN engagement_scores e ON a.user_id = e.user_id
JOIN firmographic_scores f ON a.user_id = f.user_id;
Stage 3: LLM Enrichment
LLM enrichment runs for pql_tier = 'hot' users and for anyone transitioning from warm to hot. Calls happen in batches, not real-time. Cost: ~$0.01–0.03 per user with GPT-5.4-mini (current pricing at platform.openai.com).
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_sales_context(user_data: dict) -> dict:
    # Placeholder: substitute the full prompt from the previous section,
    # filled in with the fields of user_data.
    prompt = f"""You are a sales intelligence assistant...
[prompt from the previous section with substituted data]"""
    response = client.chat.completions.create(
        model="gpt-5.4-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0.3
    )
    # JSON mode returns the object as a string in the message content.
    return json.loads(response.choices[0].message.content)
Low temperature (0.3) keeps outputs consistent. The JSON response format makes the reply parse as JSON, but it does not guarantee that every expected field is present, so check the keys before writing to the CRM. If you’re running a multi-provider setup with LiteLLM, route calls through a unified proxy with model fallbacks. Track prompt quality with Langfuse.
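Selecting who gets enriched, and in what batch size, is plain bookkeeping that can live next to the scheduled job. A sketch under one assumption beyond the text: users who were already hot at the previous run are skipped so the same briefing is not paid for twice. Both helper names are hypothetical.

```python
from itertools import islice


def newly_hot_users(previous_tiers: dict, current_tiers: dict) -> list:
    """Users who are hot now and were not hot at the previous run.

    Both arguments map user_id -> tier; users missing from previous_tiers
    are treated as new and therefore included when hot.
    """
    return sorted(
        user_id for user_id, tier in current_tiers.items()
        if tier == "hot" and previous_tiers.get(user_id) != "hot"
    )


def batched(items, size: int):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


previous = {"u1": "warm", "u2": "hot", "u3": "nurture"}
current = {"u1": "hot", "u2": "hot", "u3": "warm", "u4": "hot"}
for batch in batched(newly_hot_users(previous, current), size=50):
    print(batch)  # each batch would be passed to the enrichment call
```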
Stage 4: CRM Sync
HubSpot and Salesforce support custom properties via API. Minimum fields to sync:
import hubspot
from hubspot.crm.contacts import SimplePublicObjectInput

client = hubspot.Client.create(access_token="your_token")


def sync_pql_to_hubspot(user_email: str, pql_data: dict, sales_context: dict):
    # The custom pql_* properties must already exist on the contact object.
    properties = {
        "pql_score": str(pql_data["pql_score"]),
        "pql_tier": pql_data["pql_tier"],
        "pql_activation_score": str(pql_data["activation_score"]),
        "pql_engagement_score": str(pql_data["engagement_score"]),
        "pql_use_case": sales_context.get("use_case_hypothesis", ""),
        "pql_expansion_signal": sales_context.get("expansion_signal", ""),
        "pql_recommended_opening": sales_context.get("recommended_opening", ""),
        "pql_last_calculated": pql_data["calculated_at"]
    }
    contact = SimplePublicObjectInput(properties=properties)
    # get_contact_id_by_email is a helper you need to supply; it is not defined here.
    client.crm.contacts.basic_api.update(
        contact_id=get_contact_id_by_email(user_email),
        simple_public_object_input=contact
    )
When a user hits hot, the CRM auto-creates a task for the sales manager. Speed matters here: outreach velocity after a PQL signal is one of the most reliable win-rate drivers in B2B sales. Minutes beat hours.
Calibrating the Model and Thresholds
The model needs regular calibration. Two numbers to watch:
Precision. What share of hot PQLs actually convert. Target: >30%. If it drops below 20%, your thresholds are too low or the component weights are off.
Recall. What share of actual conversions the model predicted as hot or warm. Target: >70%. If it falls below 50%, the model is missing behavioral patterns — often because activation events are outdated.
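The two definitions translate directly into code, which helps when checking warehouse numbers against a small hand-labelled sample. A minimal sketch, assuming each row is a `(pql_tier, converted)` pair; the function name and the sample rows are illustrative.

```python
def precision_recall(rows, hot_tiers=("hot",), recall_tiers=("hot", "warm")):
    """Compute precision and recall as defined above.

    Precision: share of hot-tier users who converted.
    Recall: share of all converters who had been scored hot or warm.
    Returns (precision, recall) as fractions, or None where undefined.
    """
    hot_total = hot_converted = caught = converted_total = 0
    for tier, converted in rows:
        if tier in hot_tiers:
            hot_total += 1
            hot_converted += bool(converted)
        if converted:
            converted_total += 1
            caught += tier in recall_tiers
    precision = hot_converted / hot_total if hot_total else None
    recall = caught / converted_total if converted_total else None
    return precision, recall


# Made-up sample: 4 hot users (2 converted), plus one converter each in warm and nurture.
rows = [("hot", True), ("hot", False), ("hot", False), ("hot", True),
        ("warm", True), ("nurture", True), ("monitor", False)]
print(precision_recall(rows))
```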
SQL query for precision and recall evaluation. It reports both numbers per tier, so recall as defined above is the sum of the hot and warm rows:
WITH predictions AS (
SELECT
p.user_id,
p.pql_tier,
CASE WHEN s.user_id IS NOT NULL THEN 1 ELSE 0 END AS actually_converted
FROM pql_scores p
LEFT JOIN subscriptions s ON p.user_id = s.user_id
AND s.plan != 'free'
AND s.subscription_start > p.calculated_at
AND s.subscription_start <= p.calculated_at + INTERVAL '30 days'
WHERE p.calculated_at >= CURRENT_DATE - INTERVAL '90 days'
)
SELECT
pql_tier,
COUNT(*) AS total_users,
SUM(actually_converted) AS converted,
ROUND(SUM(actually_converted)::NUMERIC / COUNT(*) * 100, 1) AS precision_pct,
ROUND(SUM(actually_converted)::NUMERIC /
(SELECT COUNT(DISTINCT user_id) FROM subscriptions
WHERE plan != 'free'
AND subscription_start >= CURRENT_DATE - INTERVAL '90 days') * 100, 1
) AS recall_pct
FROM predictions
GROUP BY pql_tier
ORDER BY pql_tier;
Calibrate monthly. User behavior drifts: new features change activation patterns, seasonality hits engagement. Fixed thresholds decay within 2–3 months — you’ll start missing leads you should be catching.
You can automate calibration: feed metric drift to the same LLM and have it suggest weight adjustments. The product team still owns the final call.
PQL Scoring Economics
Implementation cost depends on your infrastructure.
| Component | Cost (per month) |
|---|---|
| Event tracking (Segment/PostHog) | $0–300 (free tier covers up to 10K MAU) |
| Data warehouse (BigQuery/Snowflake) | $50–200 |
| LLM API (GPT-5.4-mini for hot leads) | ~$10–50 (for ~1,000 hot PQLs/month) |
| CRM (HubSpot/Salesforce) | Existing subscription |
| Orchestration (Airflow/cron) | $0–50 |
Total: $60–600/month for companies under 50K MAU. Off-the-shelf alternatives — Pocus, Correlated — start at $500/month at comparable scale. Building your own costs less, but you’re also taking on maintenance. Worth it if you have engineering bandwidth and want full control over the model.
Common Mistakes When Implementing PQL Scoring
Too many activation events. Three to five is enough. A model with 15+ events overfits to noise and loses predictive power. More events aren’t more signal.
Skipping account-level aggregation. In B2B, the company buys — not the individual user. If five employees from the same domain are active but each has an individual score of 30, the model misses a hot account entirely. Aggregate at company_domain.
Static thresholds. Tier boundaries should match what the sales team can actually process. If hot PQLs are coming in at 500/week and you have three reps, raise the threshold. The scoring model is only as useful as the speed of follow-up.
No feedback loop. Reps should log PQL quality in the CRM: converted, not relevant, bad timing. Without that feedback, the model never gets better. Minimum viable: a binary “useful / not useful” flag after each outreach.
Score without context. “82” tells a sales rep nothing. LLM-generated briefings turn a number into an action. Don’t treat this as optional — it’s what makes the difference between leads that get worked and leads that sit in the queue.
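The static-thresholds point above can be made concrete: pick the hot boundary from sales capacity instead of fixing it at 75. A rough sketch, assuming you have one week of PQL scores and a weekly number of leads the team can realistically work; the function name and sample values are hypothetical.

```python
def hot_threshold_for_capacity(weekly_scores, weekly_capacity, floor=75):
    """Lowest threshold at or above `floor` that keeps hot volume within capacity."""
    for threshold in range(floor, 101):
        hot_count = sum(1 for score in weekly_scores if score >= threshold)
        if hot_count <= weekly_capacity:
            return threshold
    # Even a perfect score yields more leads than the team can work;
    # cap at 100 and handle the overflow some other way (e.g. round-robin).
    return 100


scores = [95, 90, 88, 80, 78, 76, 60]
print(hot_threshold_for_capacity(scores, weekly_capacity=3))
```

Re-running this alongside the monthly calibration keeps the hot queue sized to what reps can follow up on quickly.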
What’s Next
PQL scoring is the foundation. Once it’s running, natural next steps:
- Predictive model. Replace rule-based scoring with logistic regression or gradient boosting trained on historical conversions. Rules work at launch; ML generalizes better as volume grows.
- Real-time scoring. Move from batch (daily) to stream processing. Score updates on every event; sales gets a notification the moment a user crosses into hot.
- In-product actions. Let the score drive UX, not just CRM: paywall triggers, premium feature hints, personalized onboarding flows.
- Multi-touch attribution. Layer PQL score on top of marketing touchpoints for a fuller picture of how deals actually close.
The whole system takes 2–3 sprints to ship. Results show up within 30 days: sales works fewer leads, but each one converts at a higher rate. That’s the point of Product-Led Growth — the product does the qualification work, so your team doesn’t have to.
Need help building a PQL scoring model? I help startups build AI products and automate processes — belov.works.
FAQ
How do you handle PQL scoring for a B2B product where multiple users from the same company are active on the free plan?
Aggregate at the company domain level, not the individual user level. Sum or average scores across all users from the same domain, and add a multiplier for team size — five active users from acme.com is a stronger signal than one user with a higher individual score. The SQL pattern is to GROUP BY company_domain and treat the account as the scoring unit. Individual user scores still matter for personalizing the sales briefing, but the tier assignment should reflect account-level behavior.
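One way to sketch this aggregation in code is to average the user scores per domain and apply a team-size multiplier. The multiplier formula and the 0.25 bonus per additional active user are illustrative assumptions, not values from this article; tune them against your own conversions, and exclude free-email domains before grouping.

```python
from collections import defaultdict
from statistics import mean


def account_scores(user_scores, bonus_per_extra_user=0.25, cap=100):
    """Aggregate user-level PQL scores into account-level scores.

    user_scores: iterable of (company_domain, pql_score) for active users.
    The multiplier 1 + bonus_per_extra_user * (active_users - 1) is an
    illustrative choice for rewarding team adoption.
    """
    by_domain = defaultdict(list)
    for domain, score in user_scores:
        by_domain[domain].append(score)

    result = {}
    for domain, scores in by_domain.items():
        multiplier = 1 + bonus_per_extra_user * (len(scores) - 1)
        result[domain] = min(cap, round(mean(scores) * multiplier))
    return result


# Five users at 30 each would sit in "nurture" individually.
users = [("acme.com", 30)] * 5 + [("solo.io", 55)]
print(account_scores(users))
```

With these example settings, five users at 30 produce an account score of 60, which lands the account in the warm tier even though no individual would.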
What’s the minimum viable version of PQL scoring a two-person startup can ship in a week?
Start with a single binary rule: if a user has completed at least two specific activation events within 14 days AND used a work-domain email, flag them as a PQL. No weights, no tiers, no LLM enrichment. Export the list weekly to a spreadsheet and have the founder call each person. This produces 80% of the value of a full scoring model and takes a day to implement. Layer in the full SQL-based scoring once you’ve validated that activation events actually predict conversion in your product.
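That rule fits in a dozen lines. A sketch assuming you can export, per user, the signup time and the first completion time of each activation event; the function name, the input shape, and the short free-domain list are assumptions for illustration.

```python
from datetime import datetime, timedelta

FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}  # illustrative list


def is_pql(signup_at, activation_times, email, window_days=14, min_events=2):
    """Binary PQL flag: enough activation events in the window AND a work email.

    activation_times: {event_name: datetime of first completion}.
    """
    deadline = signup_at + timedelta(days=window_days)
    completed = [t for t in activation_times.values() if signup_at <= t <= deadline]
    local, sep, domain = email.strip().lower().rpartition("@")
    has_work_email = bool(sep and local and domain) and domain not in FREE_EMAIL_DOMAINS
    return len(completed) >= min_events and has_work_email


signup = datetime(2026, 3, 1)
events = {
    "project_created": datetime(2026, 3, 2),
    "team_invite_sent": datetime(2026, 3, 10),
}
print(is_pql(signup, events, "ann@acme.com"))
```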
How does PQL scoring interact with self-serve and product-led sales motions — should it apply to both?
Yes, but with different outputs. For self-serve (user converts without sales contact), PQL scoring drives in-product triggers: which upsell message to show, when to surface the pricing page, what feature to unlock in a trial. For product-led sales (a human reaches out), PQL scoring tells the rep who to contact and what to say. The same underlying score powers both motions — the difference is in what action fires when a user crosses a threshold.