Risk Taxonomy: Classify Your AI Systems Correctly
2h30
Apply the AI Act's four-tier risk model to real system architectures — generative AI, RAG pipelines, voice agents — and produce a defensible classification with documented evidence.
By the end of this module you will: correctly classify a generative AI chatbot, a RAG pipeline, and a voice agent under the AI Act; identify which classification triggers high-risk obligations; and produce a classification evidence document that satisfies an audit.
The EU AI Act classifies systems by risk, not by technology. The same underlying model — say, a Claude API call — can be minimal risk in one context (summarising internal documents) and high risk in another (scoring job candidates). Classification errors in either direction are costly: misclassify downward and you face fines of up to EUR 15 million or 3% of global annual turnover; misclassify upward and you impose unnecessary compliance overhead on your teams. The classification decision must be documented, reviewed annually, and remain defensible to a market surveillance authority.
The Four Risk Tiers with Real Deployment Examples
- Unacceptable Risk (PROHIBITED from February 2025): Real-time remote biometric identification in public spaces without judicial authorisation; social scoring systems that assign citizens a reputation score affecting access to services; AI that exploits psychological vulnerabilities to manipulate behaviour. Example prohibited system: a retail analytics platform that uses facial recognition to match shoppers against a police database and alert security staff in real time.
- High Risk (full conformity assessment required before August 2026): Automated CV screening and candidate ranking (Annex III point 4); AI-assisted credit scoring used in lending decisions (Annex III point 5(b)); medical device software that influences diagnostic or treatment decisions (high risk via Article 6(1) and Annex I, as a safety component of a regulated medical device); AI used in critical infrastructure management — energy grids, water systems (Annex III point 2). Example high-risk system: a recruitment SaaS that takes CVs and outputs a ranked shortlist, even if a human makes the final hire decision.
- Limited Risk (transparency obligations only): Chatbots interacting with natural persons must disclose they are AI. Deepfake content must be labelled. AI-generated text in journalism or marketing must be disclosed. Example: a customer support chatbot built on Claude must tell users they are speaking with an AI — but the system itself requires no conformity assessment.
- Minimal Risk (no mandatory obligations): Product recommendation engines, spam filters, AI-powered search, content translation, grammar correction. Example: an internal document summarisation tool using a RAG pipeline over company knowledge base — minimal risk if it does not make consequential decisions about individuals.
Classifying Generative AI Systems (GPAI)
The AI Act adds a separate track for General Purpose AI (GPAI) models — models trained on broad data and usable for many tasks. If you deploy a GPAI model (Claude, GPT-4, Llama 3) or build on top of one, classification works differently: the model itself falls under GPAI obligations (provider responsibility), but your application layer falls under the four-tier risk model. A GPAI model with systemic risk (training compute above 10^25 FLOPs) has additional requirements: adversarial testing, incident reporting, and energy consumption disclosure.
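A rough way to see the two-level split: the model provider's obligations depend on the model itself (including the compute threshold), while your application's tier is assessed separately under the four-tier model. The sketch below is simplified and its names are hypothetical; it is not an exhaustive statement of GPAI obligations.

```python
# Simplified, illustrative sketch only -- not an exhaustive list of GPAI obligations.
SYSTEMIC_RISK_FLOP_THRESHOLD = 1e25  # training compute above this presumes systemic risk

def gpai_model_obligations(training_compute_flops: float) -> list[str]:
    """Model-level (provider) obligations; your application tier is assessed separately."""
    obligations = [
        "technical documentation for the model",
        "information for downstream providers building on the model",
        "copyright policy and training-content summary",
    ]
    if training_compute_flops > SYSTEMIC_RISK_FLOP_THRESHOLD:
        obligations += [
            "adversarial testing",
            "serious incident reporting",
            "energy consumption disclosure",
        ]
    return obligations

# A frontier model trained above the threshold picks up the systemic-risk duties:
print(gpai_model_obligations(3e25))
```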
RAG Pipeline Classification Decision Tree
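The course's interactive decision tree is not reproduced here, but its core logic follows from the purpose-based rules above: classify the pipeline by what its output is used for, not by the retrieval machinery underneath it. The sketch below is a minimal, simplified rendering of that logic; the field names and tier messages are hypothetical.

```python
# Minimal, simplified sketch of a purpose-based decision tree for a RAG pipeline.
# Field names and tier messages are hypothetical; the tiers follow the four-tier model above.
from typing import Optional

ANNEX_III_DOMAINS = {"employment", "credit", "health", "critical_infrastructure"}

def classify_rag_pipeline(
    influences_decisions_about_individuals: bool,
    decision_domain: Optional[str],
    interacts_with_natural_persons: bool,
) -> str:
    if influences_decisions_about_individuals and decision_domain in ANNEX_III_DOMAINS:
        return "HIGH RISK: full conformity assessment before deployment"
    if interacts_with_natural_persons:
        return "LIMITED RISK: disclose that users are interacting with an AI"
    return "MINIMAL RISK: no mandatory obligations"

# The internal summarisation tool from the tier list stays minimal risk...
print(classify_rag_pipeline(False, None, False))
# ...but the same pipeline repurposed to rank job candidates becomes high risk.
print(classify_rag_pipeline(True, "employment", False))
```

The same pipeline flips from minimal to high risk the moment its output starts informing decisions about individuals in an Annex III domain.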
Voice Agent Classification: A Worked Example
Voice agents present a multi-layered classification problem because they combine three distinct AI components: speech recognition (STT), language model inference (Claude/GPT), and text-to-speech synthesis (ElevenLabs/Azure). Each layer can have different risk levels. The key is to classify the system as a whole based on its purpose and outputs — not its components individually. A voice agent that cold-calls customers to offer loan restructuring is high-risk (credit domain, automated outbound). A voice agent that answers FAQ calls for a software company is limited risk (chatbot transparency obligation only).
Classification rule of thumb for voice agents: if the voice agent's output influences a financial, employment, or health decision — even indirectly — treat it as high-risk and run a full conformity assessment. The cost of over-classification is documentation overhead. The cost of under-classification is EUR 15 million.
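Expressed as code, the rule of thumb is a single check on the domain the agent's output can influence; the component list is deliberately ignored. This is an illustrative sketch with hypothetical names, not the workbench from Exercise 1.

```python
# Illustrative sketch of the voice agent rule of thumb; names are hypothetical.
from typing import Optional

HIGH_RISK_OUTCOME_DOMAINS = {"financial", "employment", "health"}

def classify_voice_agent(influenced_domain: Optional[str]) -> str:
    # The STT / LLM / TTS components do not drive the tier; the output's effect does.
    if influenced_domain in HIGH_RISK_OUTCOME_DOMAINS:
        return "HIGH RISK: run a full conformity assessment"
    return "LIMITED RISK: disclose to callers that they are speaking with an AI"

print(classify_voice_agent("financial"))  # loan-restructuring cold calls
print(classify_voice_agent(None))         # FAQ support line
```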
🛠️ Exercise 1: Risk Classification Workbench
Run this code against three pre-loaded real-world scenarios, then add your own system as Scenario 4. The classifier mirrors the logic used by compliance teams in actual EU AI Act audits. Pay attention to how a single parameter change shifts the risk tier — and the resulting obligations.
Run the code and examine each classification. Then: (1) In Scenario 1, change `uses_biometric_data=False` — does risk stay high? Why? (2) In Scenario 2, change `decision_domain` to `None` — count how many obligations disappear. (3) Add your own real or hypothetical system as Scenario 4 and defend your classification in writing.
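The workbench notebook itself is not shown on this page. As a minimal stand-in, the sketch below wires the two parameters the questions reference (`uses_biometric_data`, `decision_domain`) into a simplified classifier; the real exercise code and its obligation lists may differ.

```python
# Minimal stand-in for the exercise's classifier -- not the actual workbench code.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Scenario:
    name: str
    uses_biometric_data: bool = False
    matches_against_watchlist: bool = False   # real-time identification use case
    decision_domain: Optional[str] = None     # e.g. "employment", "credit", "health"
    interacts_with_natural_persons: bool = True

HIGH_RISK_DOMAINS = {"employment", "credit", "health", "critical_infrastructure"}

def classify(s: Scenario) -> tuple[str, list[str]]:
    if s.uses_biometric_data and s.matches_against_watchlist:
        return "UNACCEPTABLE", ["prohibited practice: do not deploy"]
    if s.decision_domain in HIGH_RISK_DOMAINS:
        return "HIGH", [
            "risk management system", "data governance", "technical documentation (Art. 11)",
            "logging", "human oversight", "conformity assessment", "registration",
        ]
    if s.interacts_with_natural_persons:
        return "LIMITED", ["disclose AI interaction to users"]
    return "MINIMAL", []

# The module's own examples reproduce the expected tiers:
print(classify(Scenario("recruitment SaaS", decision_domain="employment")))
print(classify(Scenario("customer support chatbot")))
```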
🛠️ Exercise 2: Generate Your Article 11 Technical Documentation
High-risk AI systems must have their technical documentation drawn up before they are placed on the market or put into service; in practice, start it before production code ships. Market surveillance authorities request this document during audits, and teams that cannot produce it face immediate non-compliance findings. Fill in the fields below for a real or hypothetical CV screening tool.
Fill in all TODO fields for a real or hypothetical CV screening tool. Pay particular attention to §4 performance metrics — these must be disaggregated by demographic group. After completing, count how many fields you needed to research vs. already knew. The fields you had to research are your compliance blind spots.
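If you prefer to keep the template in code, a skeleton like the one below makes the blind-spot count trivial to compute. The section headings are paraphrased from Annex IV and the field names are hypothetical; they are not the exercise's official template.

```python
# Hypothetical documentation skeleton; section headings paraphrased from Annex IV.
ARTICLE_11_TEMPLATE = {
    "1. General description": {"intended purpose": "TODO", "provider": "TODO", "versions": "TODO"},
    "2. Detailed description": {"architecture": "TODO", "training data sources": "TODO",
                                "human oversight measures": "TODO"},
    "3. Monitoring and control": {"logging capabilities": "TODO", "post-market monitoring plan": "TODO"},
    "4. Performance metrics": {"overall accuracy": "TODO",
                               "accuracy disaggregated by demographic group": "TODO",
                               "known limitations": "TODO"},
    "5. Risk management": {"identified risks": "TODO", "mitigations": "TODO", "residual risk": "TODO"},
}

def missing_fields(template: dict) -> list[str]:
    """Every field still marked TODO is a current compliance blind spot."""
    return [f"{section} / {name}"
            for section, fields in template.items()
            for name, value in fields.items() if value == "TODO"]

print(f"{len(missing_fields(ARTICLE_11_TEMPLATE))} fields still to complete")
```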
Quiz available
Finish reading this module, then check your knowledge with the quiz.