Rigor Dial - a volume knob for AI assistant criticism
What is the Rigor Dial?
The Rigor Dial is a slash skill for Claude Code that controls how critically the AI assistant evaluates your decisions, on a scale from 0 (silent executor) to 10 (challenges every assumption). It lets developers switch between a fast-execution mode and a devil's advocate mode without loading separate prompts.
TL;DR
- /rigor is a slash skill with a 0–10 scale: 0 = silent executor, 10 = challenges every assumption
- At level 0, Claude just writes the code. At level 10, it interrogates the business case before touching a line
- AI assistants default to sycophancy — they execute rather than challenge, even on bad architecture decisions
- The skill is a single markdown file that injects behavior rules into Claude's context — no system prompt hacks
- Install takes 30 seconds: git clone + symlink into ~/.claude/skills/
Claude Code operates in one mode. You ask a question - you get an answer. You ask it to write code - it writes code. Sometimes it asks a clarifying question, sometimes it doesn’t. There’s no way to adjust how critical it is.
Fixing a typo in a README and designing a new service with payment integration - Claude treats these the same way. Somewhere between “doing it silently” and “let’s discuss.” Too many questions for simple tasks. Too few for complex ones.
The single-mode problem
AI assistants tend toward sycophancy - agreeing with the user. The model would rather execute a request than challenge it. Reddit threads pop up regularly: “Claude gives performative pushback on trivial decisions but agrees with actually bad architecture.”
Solutions exist - devil’s advocate commands for specific scenarios. /devils-advocate for code review before a PR. Adversarial reasoning protocol for architecture decisions. But these are all one-shot tools: turn on, get criticism, turn off.
The thing is, assistant behavior isn’t a toggle. It’s a spectrum. Sometimes you need a silent executor, sometimes a colleague who asks uncomfortable questions. And you want to switch between modes fast, without loading separate prompts.
Rigor Dial: scale from 0 to 10
/rigor - a slash skill for Claude Code. One parameter: a number from 0 to 10.
| Level | Name | Behavior |
|---|---|---|
| 0 | Bro | Zero questions. Does what you ask. “Done.” is a valid response |
| 1–2 | Chill | Only warns if code will break prod right now |
| 3–4 | Gentle | Soft suggestions, mentions tradeoffs |
| 5 | Balanced | Default mode - speed and quality in balance |
| 6–7 | Critical | Challenges architecture, demands justification |
| 8–9 | Paranoid | Edge cases, worst-case scenarios, “what’s the rollback plan?” |
| 10 | Dushnila | “Why are we building this at all? Show me the data.” |
Level ten is named “dushnila” - a word from Russian dev slang for someone who suffocates you with questions and won’t let anything slide. Every team has one. Sometimes annoying. Sometimes saves you from expensive mistakes.
Same prompt, three levels
Task: “Let’s add Redis to cache AI chat responses.”
/rigor 0
Adding Redis client and caching middleware to ai-chat-send-message. Done.
/rigor 5
A couple questions before I start:
- What TTL for the cache? AI responses depend on conversation context - same prompts with different history produce different answers.
- Redis managed (Upstash?) or self-hosted? We don’t have Redis in the stack right now - this is new infrastructure.
/rigor 10
Hold on. Several questions:
- What metric are we improving? Latency? Cost? How much do AI calls cost now, what percentage of requests are potentially cacheable?
- Cache hit rate will be near zero. Chat is a contextual dialogue. Two identical messages with different history = different responses.
- Redis is +$30/month, a new service to monitor, a new failure point. Who’s on call when it goes down at 3am?
- Alternatives without new infra: Anthropic prompt caching (built into the API), response streaming for perceived latency.
- Do users even complain about chat speed? Any data from analytics? Or are we optimizing something that doesn’t hurt?
Same task. Same knowledge in the assistant. The only difference - how aggressively it questions your decisions.
How it works
Technically, /rigor is a markdown file describing behavior for each level. When you call /rigor 7, Claude Code injects instructions into the conversation context. The model gets rules: ask clarifying questions, challenge architecture, demand justification.
No system prompt hacks, no fine-tuning, no complex infrastructure. One file. It works because Claude follows behavioral instructions well - especially when they’re specific and structured.
The level affects how Claude communicates, not what it knows. At zero, the code is just as correct - just no questions. At ten - same code, but only after an interrogation.
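For a sense of what that one file looks like: a Claude Code skill is a markdown file with a short frontmatter block followed by free-form instructions. The sketch below is an illustration of the shape, not the actual contents of the repo — the frontmatter fields are the standard skill metadata, but the wording of the level rules is invented here:

```markdown
---
name: rigor
description: Set how critically Claude challenges decisions, on a 0–10 scale
---

When invoked as /rigor <N>, apply the matching behavior for the rest of
the session:

- 0: Execute silently. No questions. "Done." is a valid response.
- 5: Ask clarifying questions when requirements are ambiguous; mention
  tradeoffs, but don't block progress.
- 10: Interrogate the business case before writing code: what metric
  improves, what it costs, what the alternatives are, what the rollback
  plan is.
```

Because the rules are specific and structured, the model can treat them as behavioral constraints rather than vague vibes — which is what makes a single file enough.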
```shell
# Install — 30 seconds
git clone https://github.com/spyrae/rigor-dushno.git ~/.claude/rigor-dushno
ln -s ~/.claude/rigor-dushno/skills/rigor ~/.claude/skills/rigor

# Usage
/rigor 0    # for typo fixes
/rigor 5    # for regular features
/rigor 10   # for architecture decisions
```
When to use which level
0–2 - mechanical work. Rename a variable, fix a config, bump a dependency. Questions just slow you down.
3–5 - regular development. New widget, method refactor, writing a test. Claude mentions tradeoffs but doesn’t block progress.
6–7 - features with non-obvious architecture. API design, auth flow changes, new data models. You need someone asking “did you think about edge case X?”
8–10 - decisions that are expensive to reverse. Database choice, payment flow structure, auth architecture changes. Better to spend 10 minutes justifying than a week rolling back.
Limitations
Context isn’t infinite. Skill instructions take up ~800 tokens in the context window. Unnoticeable for most tasks, but in long sessions with a large codebase every token counts.
Level doesn’t persist across sessions. New conversation - default behavior. You need to call /rigor again. Workaround: set a default level in your project’s CLAUDE.md.
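CLAUDE.md is free-form instructions, so the workaround is just a line of prose. Something like this (the exact wording is an assumption, not a documented convention):

```markdown
# CLAUDE.md
Default to /rigor 5 behavior unless I set a different level.
For architecture discussions, behave as /rigor 8.
```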
The scale is subjective. The difference between levels 6 and 7 isn’t always clear. The model interprets instructions, it doesn’t execute an algorithm. Sometimes at level 8, Claude asks a question that feels more like a 6. It’s a guideline, not a switch.
Doesn’t replace code review. Even at ten, Claude doesn’t see the project the way a colleague who’s been in the codebase for six months does. It’s a self-check tool, not a replacement for humans.
Why regulate pushback at all
The cost of a mistake depends on context. Typo in a README - zero. Payment service architecture - weeks of rework.
An AI assistant with the same level of criticism for everything either slows down simple tasks with unnecessary questions, or lets complex ones through without proper scrutiny. /rigor removes friction on small stuff and adds scrutiny where it matters.
Playing devil’s advocate on demand is a habit, and habits fade. A habit turned into a repeatable process scales better than good intentions.
Source code: github.com/spyrae/rigor-dushno. Also ships with /dushno - the Russian-language variant for full immersion.