Ryan Winkler

AI Production Readiness Checklist

A systematic checklist for evaluating whether an AI feature is ready for production deployment. Covers safety, trust, monitoring, and user impact.

Before shipping any AI feature to production, evaluate it against these dimensions.

1. Safety & Trust

Harm potential assessed — What happens when the AI is wrong? (High/Medium/Low)
Escalation path defined — Users can always reach a human
Confidence thresholds set — AI only acts when confidence is above threshold
Failure mode documented — Team knows what failure looks like and how to detect it
Content guardrails tested — Injection, prompt leaking, harmful outputs tested
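
One way the confidence-threshold and escalation items could fit together is sketched below in Python. The threshold values, the `gate` function, and the action names are illustrative assumptions, not part of the checklist; real thresholds should come from your own evaluation data.

```python
from dataclasses import dataclass

# Hypothetical thresholds keyed by harm potential (High/Medium/Low).
# These numbers are placeholders for illustration only.
THRESHOLDS = {"low": 0.60, "medium": 0.80, "high": 0.95}


@dataclass
class Decision:
    action: str  # "auto_respond" or "escalate_to_human"
    reason: str  # short explanation, suitable for an audit log


def gate(confidence: float, harm_level: str) -> Decision:
    """Let the AI act only when confidence clears the bar for this harm level."""
    if harm_level not in THRESHOLDS:
        # Unknown harm level: fail safe and hand off to a human.
        return Decision("escalate_to_human", f"unknown harm level {harm_level!r}")
    threshold = THRESHOLDS[harm_level]
    if confidence >= threshold:
        return Decision("auto_respond", f"confidence {confidence:.2f} >= {threshold:.2f}")
    return Decision("escalate_to_human", f"confidence {confidence:.2f} < {threshold:.2f}")
```

With these placeholder values, a confidence of 0.90 is enough to act on a medium-harm request but not on a high-harm one, which goes to a human instead.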

2. Quality Evaluation

Real conversations reviewed — Not just synthetic test cases
Edge cases tagged — Unusual requests, multilingual, sarcasm, etc.
Quality scoring defined — What does “good” look like? Rubric exists.
Baseline measured — Current non-AI performance documented
A/B test plan ready — Statistical significance, rollback criteria
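
For the A/B test plan, a minimal sketch of a significance check with a rollback rule might look like the following. It assumes the quality metric is a simple success rate (for example, resolved conversations out of total) and uses a standard two-proportion z-test; the function names, the `alpha` default, and the decision labels are assumptions made for illustration.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two success rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups all-success or all-failure: no detectable difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def decide(control, treatment, alpha=0.05):
    """control and treatment are (successes, total) tuples; treatment is the AI variant."""
    z, p_value = two_proportion_z_test(*control, *treatment)
    if p_value >= alpha:
        return "keep_testing"
    return "ship" if z > 0 else "roll_back"
```

A call such as `decide((400, 1000), (460, 1000))` compares a 40% baseline against a 46% AI variant. Checking the result repeatedly while the test is still running inflates the false-positive rate, so the sample size is best fixed in the plan up front.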

3. Monitoring & Observability

Metrics pipeline live — Latency, error rate, confidence distribution
Alert thresholds set — Team gets paged when quality degrades
Audit trail exists — Every AI decision is logged for review
Feedback loop active — Users can flag bad outputs
Dashboard accessible — Non-technical stakeholders can check health
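
As a rough illustration of the metrics and alert-threshold items, the sketch below keeps a rolling window of recent requests and reports which thresholds are breached. The `QualityMonitor` class, its default limits, and the alert names are hypothetical; in practice this logic usually lives in an existing metrics and paging system rather than in application code.

```python
import math
from collections import deque


class QualityMonitor:
    """Rolling window over recent requests; all limits are placeholder values."""

    def __init__(self, window=100, max_error_rate=0.05,
                 min_mean_confidence=0.70, max_p95_latency_ms=2000):
        self.samples = deque(maxlen=window)  # oldest samples drop off automatically
        self.max_error_rate = max_error_rate
        self.min_mean_confidence = min_mean_confidence
        self.max_p95_latency_ms = max_p95_latency_ms

    def record(self, latency_ms, confidence, error):
        self.samples.append((latency_ms, confidence, bool(error)))

    def alerts(self):
        """Return the names of all thresholds currently breached."""
        n = len(self.samples)
        if n == 0:
            return []
        latencies = sorted(s[0] for s in self.samples)
        p95 = latencies[math.ceil(0.95 * n) - 1]  # nearest-rank percentile
        error_rate = sum(1 for s in self.samples if s[2]) / n
        mean_confidence = sum(s[1] for s in self.samples) / n

        breached = []
        if error_rate > self.max_error_rate:
            breached.append("error_rate")
        if mean_confidence < self.min_mean_confidence:
            breached.append("mean_confidence")
        if p95 > self.max_p95_latency_ms:
            breached.append("p95_latency")
        return breached
```

A non-empty result from `alerts()` is the point at which the team would be paged.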

4. User Experience

Expectations set — Users know they’re interacting with AI
Transparency built in — Users can see why the AI made a decision
Opt-out available — Users can choose the non-AI path
Loading states handled — AI latency doesn’t break the UX
Graceful degradation — Feature works when AI service is down
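
Graceful degradation and latency handling can be sketched as a wrapper that tries the AI path with a time budget and falls back to the non-AI path on timeout or error. The `with_fallback` helper, the two-second default, and the pool size are assumptions for illustration; `ai_call` and `fallback` stand in for whatever your feature actually invokes.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool for AI calls; the size is an arbitrary placeholder.
_executor = ThreadPoolExecutor(max_workers=4)


def with_fallback(ai_call, fallback, timeout_s=2.0):
    """Return (result, source), where source is "ai" or "fallback"."""
    future = _executor.submit(ai_call)
    try:
        return future.result(timeout=timeout_s), "ai"
    except FutureTimeout:
        # Too slow: stop waiting and serve the non-AI path.
        # cancel() only prevents a call that has not started yet.
        future.cancel()
        return fallback(), "fallback"
    except Exception:
        # The AI service raised an error: degrade instead of failing the request.
        return fallback(), "fallback"
```

Returning the source alongside the result lets the UI and the metrics pipeline tell how often users are actually getting the degraded path.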

5. Operational Readiness

Rollback plan documented — Can the release be reverted in under 5 minutes?
Cost model understood — Per-request cost, monthly projection
Rate limits in place — Abuse protection, cost ceiling
On-call knows the feature — Runbook exists, team briefed
Legal review complete — Data handling, privacy, compliance checked
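
The cost-model and rate-limit items lend themselves to a small worked example. The sketch below assumes token-based pricing and a token-bucket limiter; the function and class names, the parameters, and any prices you pass in are hypothetical and should be replaced with your provider's actual figures.

```python
import time


def monthly_cost(requests_per_day, tokens_in, tokens_out,
                 price_in_per_1k, price_out_per_1k, days=30):
    """Return (per_request_cost, monthly_projection) under token-based pricing."""
    per_request = (tokens_in / 1000 * price_in_per_1k
                   + tokens_out / 1000 * price_out_per_1k)
    return per_request, per_request * requests_per_day * days


class TokenBucket:
    """Simple rate limiter: allows bursts up to `capacity`, refills at `rate_per_s`."""

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Multiplying the per-request cost by the limiter's maximum sustained rate gives a rough upper bound on spend, which is one way to express the cost ceiling.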

When to Use This

Run this checklist:

Before any AI feature reaches GA
Before expanding AI to new user segments
After any significant model or prompt change
Quarterly as part of AI feature health review

Signal Scorecard — For evaluating customer signal quality
RICE/DRICE — For prioritising which AI features to build