AI Quality Characteristics, Risk, and Acceptance Criteria
Teach learners to turn AI risk, trustworthiness, and quality characteristics into measurable release criteria.
AI Quality Characteristics, Risk, and Acceptance Criteria video briefing
A focused explanation of chapter 2, turning the AI testing theory into concrete validation checks.
Briefing focus
Module opening
This is a structured lesson briefing. Real video/audio can be added later as a media source.
Estimated time
9 min
1. Module opening
2. Learning objectives
3. Mind map
4. Scenario evidence breakdown
Transcript brief
Teach learners to turn AI risk, trustworthiness, and quality characteristics into measurable release criteria. The briefing explains why the topic matters, walks through a failure scenario, and identifies the artefacts a tester should produce for evidence and auditability.
Key takeaways
- Connect the AI risk to a measurable test or monitor.
- Document the evidence needed for reproducibility and audit.
- Use the lab or scenario to practise the validation workflow.
Module opening
Teach learners to turn AI risk, trustworthiness, and quality characteristics into measurable release criteria.
Audience. QA professionals who need to challenge AI acceptance criteria and make quality measurable.
Why this matters. AI quality is multi-dimensional. A model can be accurate yet unfair, fast yet unsafe, useful yet impossible to explain, or impressive in demos but fragile in production.
ISTQB CT-AI mapping. CT-AI 2.1-2.8, 8.1-8.4
Trainer note
Start with the scenario before the theory. Ask learners what evidence would make them confident, then use the module to build that evidence step by step.
Learning path
Start Here
5 min · Outcome, CT-AI exam relevance, and the hiring-screener scenario.
Learn
20 min · Quality characteristics, risk, acceptance criteria, stakeholder evidence, and release gates.
See It
8 min · Scenario evidence breakdown for subgroup false rejection risk.
Try It
14 min · Build a risk-to-acceptance matrix for a realistic AI release decision.
Recall and Apply
10 min · Exam traps, active recall, and the portfolio artifact.
Learning objectives
- Explain the core quality risks covered by AI quality characteristics, risk, and acceptance criteria.
- Select practical test evidence that supports an AI release decision.
- Apply the module concepts to a realistic QA scenario.
- Produce a portfolio artifact that can be reused in a professional AI testing context.
Mind map
Real-life scenario · Recruitment technology
The hiring screener with a great average score
Situation. A ranking model prioritises CVs for recruiter review. Overall accuracy improved, but false rejection was higher for career changers and candidates with employment gaps.
Lesson. AI testing is strongest when risks, examples, evidence, and release decisions are connected.
Scenario evidence breakdown
| Scenario element | Detail |
|---|---|
| Product/System | Candidate screening platform |
| AI feature | A ranking model prioritises CVs for recruiter review. |
| Failure or risk | Overall accuracy improved, but false rejection was higher for career changers and candidates with employment gaps. |
| Testing challenge | No one had defined acceptable subgroup performance, appeal process, explanation need, or audit evidence. |
| Tester response | The tester built a risk register linking harm, affected users, quality characteristic, metric, threshold, mitigation, and owner. |
| Evidence required | Risk register, acceptance criteria matrix, subgroup metric report, review workflow, and release sign-off record. |
| Business decision | Block release until high-impact false rejection criteria and human review controls are agreed. |
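The subgroup risk in this scenario can be made concrete with a slice-level metric check. The sketch below is a minimal, hypothetical example: the records, segment names, and label conventions are illustrative, not from a real dataset, and a real screener would compute these rates from the validation set named in the acceptance matrix.

```python
def false_rejection_rate(records):
    """Share of qualified candidates (label=1) the model rejected (pred=0)."""
    qualified = [r for r in records if r["label"] == 1]
    if not qualified:
        return None  # no evidence for this slice at all
    rejected = sum(1 for r in qualified if r["pred"] == 0)
    return rejected / len(qualified)

def per_segment_rates(records):
    """Group records by segment and compute the rate for each slice."""
    segments = {}
    for r in records:
        segments.setdefault(r["segment"], []).append(r)
    return {name: false_rejection_rate(rows) for name, rows in segments.items()}

# Illustrative labelled outcomes: 1 = qualified, pred 0 = rejected by the model.
sample = [
    {"segment": "standard", "label": 1, "pred": 1},
    {"segment": "standard", "label": 1, "pred": 1},
    {"segment": "standard", "label": 1, "pred": 0},
    {"segment": "standard", "label": 1, "pred": 1},
    {"segment": "career_changer", "label": 1, "pred": 0},
    {"segment": "career_changer", "label": 1, "pred": 0},
    {"segment": "career_changer", "label": 1, "pred": 1},
    {"segment": "career_changer", "label": 1, "pred": 0},
]

rates = per_segment_rates(sample)
# The aggregate rate looks moderate while the career-changer slice is far worse,
# which is exactly the failure mode the scenario describes.
```

This is the kind of reproducible check that turns "accuracy improved" into evidence a release approver can inspect per affected group.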
Visual flow
Topic-by-topic teaching guide
1. AI Quality Characteristics
AI quality includes functional performance, robustness, fairness, transparency, safety, security, privacy, usability, and maintainability.
| Teaching lens | Practical detail |
|---|---|
| Real QA example | A loan model needs accuracy, but also fair treatment, explanation, monitoring, and secure access to decision data. |
| What can go wrong | Optimising only one characteristic and accidentally weakening another. |
| How a tester should think | Discuss quality as trade-offs tied to product risk. |
| Evidence to collect | Quality characteristic matrix and stakeholder priorities. |
Quality characteristics as trade-offs
AI quality is not one number. A system can improve aggregate performance while weakening fairness, explainability, robustness, privacy, or user control.
Example
A loan model has higher overall accuracy after retraining, but a protected customer segment now receives fewer positive decisions and weaker explanations.
Mistake
Treating accuracy as the only acceptance signal.
Evidence
Quality characteristic matrix, stakeholder priorities, slice metrics, explanation checks, and agreed trade-offs.
2. Risk-Based Testing
Risk combines likelihood, impact, detectability, and the organisation's tolerance for harm.
| Teaching lens | Practical detail |
|---|---|
| Real QA example | A wrong music recommendation is low impact; a wrong medical triage decision can be severe. |
| What can go wrong | Spending equal effort on low-risk and high-risk behaviours. |
| How a tester should think | Rank tests by business decision and possible harm. |
| Evidence to collect | AI risk register with severity, controls, and evidence. |
Risk-based release criteria
Risk-based AI testing starts from possible harm, then decides which evidence is required for launch, shadow mode, limited rollout, rollback, or rejection.
Example
The hiring screener's average score improved, but false rejection increased for career changers and candidates with employment gaps.
Mistake
Giving every behaviour equal test effort or approving a high-impact release from an average metric alone.
Evidence
AI risk register with affected groups, severity, likelihood, detectability, mitigation, owner, and release action.
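A risk register is easiest to audit when each entry is structured data rather than free text. The sketch below is a hypothetical shape for one entry; the field names and example values are assumptions chosen to match the evidence list above, not a prescribed CT-AI format.

```python
from dataclasses import dataclass, asdict, field

@dataclass
class RiskEntry:
    """One row of an AI risk register, with every reviewable field explicit."""
    risk: str
    affected_groups: list = field(default_factory=list)
    quality_characteristic: str = ""
    severity: str = ""        # e.g. "high", "medium", "low"
    likelihood: str = ""
    detectability: str = ""
    mitigation: str = ""
    owner: str = ""
    release_action: str = ""  # e.g. "block", "shadow", "limited", "monitor"

entry = RiskEntry(
    risk="Higher false rejection for career changers",
    affected_groups=["career changers", "candidates with employment gaps"],
    quality_characteristic="fairness",
    severity="high",
    likelihood="likely",
    detectability="low without slice metrics",
    mitigation="subgroup thresholds plus human review queue",
    owner="QA lead",
    release_action="block",
)

# Export-ready rows for the evidence pack (e.g. a table in the risk register).
register = [asdict(entry)]
```

Keeping the register as data also means a release gate can read it mechanically instead of relying on informal confidence.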
3. Acceptance Criteria
Good AI acceptance criteria are measurable and reviewable. They name metric, slice, data source, threshold, and release action.
| Teaching lens | Practical detail |
|---|---|
| Real QA example | Urgent-ticket recall must be at least 95% on the latest labelled validation set and no lower than 92% for any priority customer segment. |
| What can go wrong | Writing vague criteria such as 'the model should be fair and accurate'. |
| How a tester should think | Turn values into observable checks and gates. |
| Evidence to collect | Acceptance matrix and metric definitions. |
Measurable acceptance criteria
A useful AI acceptance criterion names the metric, population slice, dataset or traffic source, threshold, time window, and release action.
Example
Career-changer false rejection must not exceed the agreed threshold on the latest labelled validation set before the model can influence recruiter queues.
Mistake
Writing vague criteria such as "the model should be fair and accurate."
Evidence
Acceptance matrix, metric definitions, validation dataset version, threshold rationale, and sign-off owner.
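A criterion that names metric, slice, dataset, threshold, and release action can be evaluated automatically. The sketch below is a minimal, assumed encoding: the metric names, dataset label, and thresholds are illustrative placeholders, and a missing measurement is deliberately treated as a failure because absent evidence should never pass a gate.

```python
# Each criterion names metric, slice, data source, threshold, and release action.
criteria = [
    {"metric": "false_rejection_rate", "slice": "career_changer",
     "dataset": "validation-latest", "max": 0.10, "on_fail": "block"},
    {"metric": "false_rejection_rate", "slice": "overall",
     "dataset": "validation-latest", "max": 0.08, "on_fail": "block"},
]

def evaluate(criteria, measurements):
    """Return failing criteria; measurements maps (metric, slice) -> value.

    A missing measurement counts as a failure: no evidence is not a pass.
    """
    failures = []
    for c in criteria:
        value = measurements.get((c["metric"], c["slice"]))
        if value is None or value > c["max"]:
            failures.append({**c, "observed": value})
    return failures

# Illustrative measured values from the subgroup metric report.
measured = {
    ("false_rejection_rate", "career_changer"): 0.21,
    ("false_rejection_rate", "overall"): 0.06,
}

failures = evaluate(criteria, measured)
# The overall rate passes, but the career-changer slice breaches its threshold,
# so the matrix yields a "block" release action for that criterion.
```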
4. Stakeholder Involvement
AI release criteria need product, data science, QA, operations, legal, security, and affected user representation where appropriate.
| Teaching lens | Practical detail |
|---|---|
| Real QA example | Security may care about prompt injection while support managers care about escalation failures. |
| What can go wrong | Letting one team define trustworthiness alone. |
| How a tester should think | Facilitate evidence conversations across roles. |
| Evidence to collect | RACI, approval workflow, and documented trade-offs. |
Stakeholder-owned evidence
AI quality decisions cross team boundaries, so evidence needs named owners and visible trade-offs rather than informal confidence.
Example
Product accepts some false positives, legal needs bias evidence, support needs escalation controls, and security needs misuse checks.
Mistake
Letting one team define trustworthiness alone.
Evidence
RACI, approval workflow, documented trade-offs, reviewer notes, and sign-off record.
5. Release Gates
Release gates decide what evidence is required before launch, shadow mode, limited rollout, or rollback.
| Teaching lens | Practical detail |
|---|---|
| Real QA example | A model can enter shadow mode with incomplete confidence but cannot affect customers until guardrails pass. |
| What can go wrong | Treating a gate as paperwork after the release decision is already made. |
| How a tester should think | Make gates explicit before testing starts. |
| Evidence to collect | Gate checklist, owner sign-off, and rollback trigger. |
Release gates
Release gates turn risk appetite into practical decisions: block, approve for shadow mode, approve for limited rollout, or release with monitoring.
Example
The hiring screener can enter shadow mode while reviewers compare ranked and unranked queues, but it cannot affect candidate outcomes until subgroup criteria pass.
Mistake
Treating a gate as paperwork after the release decision is already made.
Evidence
Gate checklist, owner sign-off, monitoring condition, rollback trigger, and post-release review date.
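The gate logic in this topic can be sketched as a small decision function. The evidence keys below are assumptions invented for illustration; a real gate would read them from the signed-off checklist, and the shadow-mode condition mirrors the scenario rule that candidates must not be affected.

```python
# Evidence items a named owner must confirm before production influence.
REQUIRED_FOR_PRODUCTION = {
    "subgroup_criteria_pass",
    "human_review_controls",
    "audit_logging",
    "rollback_ready",
}

def gate_decision(evidence):
    """Map the confirmed evidence set to one of the module's release actions."""
    missing = REQUIRED_FOR_PRODUCTION - evidence
    if not missing:
        return "release_with_monitoring"
    # Shadow mode is acceptable with incomplete evidence only if the model
    # cannot affect candidate outcomes while reviewers compare queues.
    if "candidates_unaffected_in_shadow" in evidence:
        return "shadow_mode"
    return "block"

decision = gate_decision({"audit_logging", "candidates_unaffected_in_shadow"})
# Incomplete evidence, but a safe shadow setup: the gate returns shadow mode
# rather than blocking outright or releasing to production.
```

Making the gate executable before testing starts keeps it from becoming paperwork after the decision is already made.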
Practical QA workflow
- Start from the user or business decision affected by the AI system.
- Name the AI asset under test: data, feature pipeline, model, prompt, retrieval index, tool, or full workflow.
- Convert the main risk into observable quality signals and release gates.
- Choose the right oracle: deterministic assertion, metric threshold, metamorphic relation, reviewer rubric, comparison, or production monitor.
- Test important slices, edge cases, misuse cases, and change scenarios.
- Record versions, data sources, thresholds, reviewer notes, and decision rationale.
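One oracle from the workflow above, the metamorphic relation, deserves a concrete shape. The sketch below is hypothetical: `score` is a toy stand-in for the real ranking model, and the relation (renaming an employer must not move the score) is an assumed fairness property chosen for illustration.

```python
def score(cv):
    """Toy stand-in for the ranking model: scores skills only, so it should
    be indifferent to employer names."""
    return 0.1 * len(cv["skills"])

def metamorphic_check(model, cv, tolerance=1e-9):
    """Relation: changing an irrelevant field must not change the score."""
    mutated = {**cv, "employer": "Renamed Ltd"}  # hypothetical perturbation
    return abs(model(cv) - model(mutated)) <= tolerance

cv = {"skills": ["python", "sql", "testing"], "employer": "Acme"}
passed = metamorphic_check(score, cv)
# Metamorphic oracles are useful when no single "correct" score exists:
# the test asserts a relation between outputs, not an exact expected value.
```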
Test design checklist
- What harm could happen if this AI behaviour is wrong?
- Which users, groups, products, regions, or workflows need separate evidence?
- Which metric or observation would reveal the failure early?
- What is the minimum evidence needed for release, shadow mode, rollback, or rejection?
- Who owns the evidence after the model, prompt, or data changes?
Worked QA example
A tester receives a release request for the module scenario. Instead of asking only whether tests pass, the tester writes three release questions: what changed, who could be harmed, and what evidence proves the change is controlled. The answer becomes a small evidence pack: one risk table, one set of representative examples, one automated or reviewable check, and one release recommendation.
Worked example: Blocking an accuracy-only release
Scenario. The hiring screener's retrained ranking model shows a stronger average validation score, and the product owner asks to release because the headline metric is above target.
Reasoning. Average validation score does not prove acceptable quality for high-impact slices. The tester checks false rejection by candidate segment, explanation quality, appeal workflow, reviewer override, audit logging, and rollback readiness.
Model answer. Block production influence until subgroup false rejection, explanation coverage, human review controls, and audit evidence meet the agreed acceptance matrix. Allow shadow mode only if candidates are not affected.
Common mistakes
- Treating AI output as a normal deterministic response when the real risk is behavioural.
- Reporting one impressive metric without slices, uncertainty, or business context.
- Forgetting that data, prompts, model versions, and monitoring are part of the test surface.
- Writing governance language that cannot be checked by a tester.
Guided exercise
Use the scenario above and create a one-page evidence plan. Include the decision being influenced, the main risk, the test oracle, the data or examples required, the release gate, and the owner.
Try it: Build the risk-to-acceptance matrix
Prompt. Use the hiring-screener scenario to define release criteria that would be acceptable to product, QA, data science, legal, and operations.
Learner action. Map each risk to a quality characteristic, metric or review signal, population slice, threshold, evidence source, owner, and release action.
Expected output. `ai-risk-acceptance-matrix.md` with at least five risks, measurable acceptance criteria, owners, and a release recommendation.
Discussion prompt
Which AI quality characteristic is easiest for your team to measure today, and which is most likely to be ignored?
Exam trap
Objective
CT-AI 2.1-2.8
Common trap
Choosing the answer with the best average performance metric while ignoring impact, slices, explainability, monitoring, or stakeholder acceptance.
Wording clue
Look for answers that turn quality attributes into measurable criteria and decision gates.
Portfolio checkpoint
Create the module portfolio deliverable and use it to support your release decision.
Artifact structure
ai-risk-acceptance-matrix.md
Hands-on lab mapping
- Lab: CourseMaterials/sandboxes/introduction/
- Task: Create a risk-to-acceptance matrix for an AI feature and identify missing release evidence.
- Why this lab matters: it turns the module theory into visible evidence that a release approver can inspect.
Decision simulation
A stakeholder wants to launch because accuracy is above target, but fairness and explanation evidence is missing. Decide the release action and conditions.
Key terms
- Trustworthiness: The degree to which stakeholders can rely on the AI system for its intended context.
- Acceptance criterion: A measurable condition that must be satisfied for release.
- Quality characteristic: A dimension of quality such as fairness, robustness, privacy, or transparency.
- Release gate: A decision point that requires specific evidence before deployment.
Revision prompts
- Explain the module scenario in two minutes to a product owner.
- Name three pieces of evidence you would require before release.
- Identify one automated check and one human-review check.
- Describe how this topic changes after deployment.
Recall check
- Why is aggregate accuracy insufficient for the hiring screener?
- It can hide high-impact subgroup failures such as false rejection for career changers or candidates with employment gaps.
- What makes an AI acceptance criterion testable?
- It names a metric or review signal, slice, data source, threshold, release action, and owner.
- When should a release be blocked rather than monitored?
- When high-impact acceptance criteria, human controls, audit evidence, or rollback readiness are missing.
- What portfolio artifact does this module produce?
- ai-risk-acceptance-matrix.md, a risk-to-evidence matrix for AI release decisions.