Implementing AI Governance Technically
1. What does Demographic Parity Difference (DPD) measure?
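DPD = |P(Ŷ=1|A=0) - P(Ŷ=1|A=1)|: the absolute difference in positive-prediction (selection) rates between two groups. A value of 0 means both groups receive positive predictions at the same rate.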
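A minimal sketch of the DPD formula in plain Python, assuming a binary sensitive attribute coded as 0/1 and binary predictions. The function names and the toy data are illustrative and not taken from any library.

```python
def selection_rate(y_pred, group, value):
    """Share of positive predictions among rows whose group equals `value`."""
    preds = [p for p, g in zip(y_pred, group) if g == value]
    if not preds:
        raise ValueError(f"no samples for group {value!r}")
    return sum(preds) / len(preds)


def demographic_parity_difference(y_pred, group):
    """|P(Y_hat=1 | A=0) - P(Y_hat=1 | A=1)| for a binary sensitive attribute."""
    return abs(selection_rate(y_pred, group, 0) - selection_rate(y_pred, group, 1))


# Toy data (made up for illustration): 1 = approved, 0 = rejected.
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

dpd = demographic_parity_difference(y_pred, group)
print(f"DPD = {dpd:.2f}")  # group 0: 3/4, group 1: 1/4
```

Note that the true labels do not appear anywhere: DPD looks only at predictions and group membership.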
2. Which Python library provides MetricFrame for group fairness metrics? 2 pts
fairlearn.metrics.MetricFrame: Fairlearn's class for computing a metric separately per group and comparing the results.
3. Why can Demographic Parity and Equalized Odds not be fulfilled simultaneously?
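With unequal base rates, the two criteria are mathematically incompatible for any classifier that is better than random: equal TPR and FPR across groups force different selection rates. Chouldechova (2017) proves a closely related impossibility result for predictive parity versus equal error rates.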
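The incompatibility can be checked with a short derivation. The notation is an assumption added here: p_a = P(Y=1|A=a) is the base rate of group a, and TPR and FPR are the error rates that Equalized Odds requires to be identical in both groups.

```latex
\begin{aligned}
P(\hat{Y}=1 \mid A=a)
  &= P(\hat{Y}=1 \mid Y=1, A=a)\,P(Y=1 \mid A=a)
   + P(\hat{Y}=1 \mid Y=0, A=a)\,P(Y=0 \mid A=a) \\
  &= \mathrm{TPR}\cdot p_a + \mathrm{FPR}\cdot(1-p_a) \\[4pt]
P(\hat{Y}=1 \mid A=0) - P(\hat{Y}=1 \mid A=1)
  &= (\mathrm{TPR}-\mathrm{FPR})\,(p_0 - p_1)
\end{aligned}
```

Demographic Parity needs the last line to be zero. With p_0 ≠ p_1 that only happens when TPR = FPR, i.e. when the prediction carries no information about the true label.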
4. What does SHAP compute for a single prediction?
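SHAP explains a single prediction: it assigns each feature a contribution (Shapley value), and the contributions add up to the difference between this prediction and the model's baseline output. It answers: why did the model decide exactly that?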
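To illustrate what such per-feature contributions look like, here is a sketch that computes exact Shapley values by enumerating all feature subsets. This is not the shap library's API; it assumes a single baseline input stands in for "feature absent", and the scoring function is a hypothetical toy model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input; absent features take the baseline value."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` keep their real value, all others use the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy scoring function with an interaction term (not a real credit model).
def predict(features):
    income, debt, age = features
    return 0.5 * income - 0.3 * debt + 0.1 * income * age


x = [4.0, 2.0, 3.0]         # the applicant being explained
baseline = [1.0, 1.0, 1.0]  # reference input, e.g. a dataset average

phi = shapley_values(predict, x, baseline)
base_value = predict(baseline)
print("contributions:", [round(p, 3) for p in phi])
print("base value + contributions:", round(base_value + sum(phi), 3))
print("model prediction:          ", round(predict(x), 3))
```

The last two printed numbers agree, which is the additivity property that makes the contributions readable as "why this prediction". Exact enumeration scales exponentially with the number of features, so it only works for tiny examples like this one.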
5. When is LIME better than SHAP?
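LIME (Local Interpretable Model-agnostic Explanations) fits a simple surrogate model around a single prediction and only needs the model's prediction function, so it works with any model. It is a reasonable choice when a quick, approximate local explanation of a black-box model is enough.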
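6. A model card for a high-risk credit model reports fairness metrics with DPD = 0.07. What does this mean? 2 pts
DPD = 0.07 means the positive-decision rates of the two groups differ by 7 percentage points. The EU AI Act itself sets no numeric DPD threshold; 0.05 is the recommended internal threshold used here. 0.07 exceeds it, so the result triggers a review, not an immediate stop.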
7. What is logged according to EU AI Act Art. 12 — and what is NOT logged?
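Art. 12 + GDPR: Art. 12 requires automatic event logging (an audit trail) for high-risk systems; GDPR data minimisation means no raw PII in those logs. Log a hash or pseudonymous ID instead of the raw data.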
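A sketch of what such a log entry could look like, using only the Python standard library. The field names, the key handling and the choice of HMAC-SHA256 are assumptions made for illustration; neither Art. 12 nor GDPR prescribes a record format.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Assumption: in practice the key comes from a secrets manager, never from source code.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str) -> str:
    """Keyed hash (HMAC-SHA256) so raw identifiers never reach the log."""
    return hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def audit_record(applicant_id: str, model_version: str, decision: str, score: float) -> str:
    """Build one JSON log line for a single automated decision."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "subject": pseudonymize(applicant_id),  # no name, no e-mail, no raw ID
        "model_version": model_version,
        "decision": decision,
        "score": round(score, 4),
    }
    return json.dumps(record, sort_keys=True)


line = audit_record("applicant-4711", "credit-v1.3.0", "review", 0.4123)
print(line)
```

A keyed hash is used instead of a bare SHA-256 because low-entropy identifiers can otherwise be recovered by brute force. Hashed identifiers are generally treated as pseudonymised data, which still counts as personal data, so access control and retention rules for the log remain necessary.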
8. Which tool is used for data drift detection in production operations?
Evidently AI: a widely used open-source tool for drift detection and model monitoring in production.
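To make "data drift" concrete, the sketch below computes the Population Stability Index (PSI) between a reference sample and a production sample. This is a hand-rolled illustration of one common drift statistic, not Evidently's API; the binning scheme, the floor for empty bins and the toy data are assumptions.

```python
import math


def psi(reference, current, bins=5):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def shares(sample):
        counts = [0] * bins
        for v in sample:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Toy feature values (made up): applicant age at training time vs. in production.
reference = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
shifted = [v + 20 for v in reference]

print("no drift:", psi(reference, reference))
print("shifted: ", round(psi(reference, shifted), 2))
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift; those cut-offs are conventions and should be validated per feature.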
9. What does MLflow track in the context of AI governance?
MLflow = experiment tracking + audit trail: it records parameters, metrics, artifacts and model versions per run. Fairness metrics can be logged with mlflow.log_metrics().
10. What does EU AI Act Annex IV (Technical Documentation) prescribe for high-risk systems? 2 pts
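Annex IV lists the mandatory contents of the technical documentation (nine numbered points in the final Regulation (EU) 2024/1689; the 2021 proposal listed eight). It must be drawn up before the system is placed on the market or put into service.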
11. How often must the Technical Documentation be updated according to Art. 11?
Art. 11: Documentation must be kept up to date. In practice this means updating it with every model version.
12. A credit model shows a TPR of 0.68 for applicants < 25 years compared to 0.91 overall. What is the correct response?
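Systematic underperformance for one group is a bias signal: at a TPR of 0.68 versus 0.91, the model misses true positives far more often for applicants under 25. Find the root cause first, then mitigate.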
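A per-group TPR check of this kind can be sketched in a few lines of plain Python. The helper names and the toy data are illustrative assumptions; the numbers are not the 0.68 and 0.91 from the question.

```python
def true_positive_rate(y_true, y_pred):
    """TPR = TP / (TP + FN): share of actual positives the model catches."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not positives:
        raise ValueError("no positive labels in this slice")
    return sum(positives) / len(positives)


def tpr_by_group(y_true, y_pred, group):
    """TPR computed separately for every value of the grouping column."""
    result = {}
    for g in sorted(set(group)):
        rows = [(t, p) for t, p, a in zip(y_true, y_pred, group) if a == g]
        result[g] = true_positive_rate([t for t, _ in rows], [p for _, p in rows])
    return result


# Toy data (made up): y_true = 1 marks an actual positive.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
group = ["<25"] * 5 + ["25+"] * 5

rates = tpr_by_group(y_true, y_pred, group)
gap = max(rates.values()) - min(rates.values())
print(rates, "gap:", gap)
```

Small groups make the estimate noisy, so a root-cause analysis would usually start by checking the sample size and label quality of the underperforming slice.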
13. What is the difference between SHAP for classical ML models and LLMs?
LLMs are stochastic, have very many parameters, and their attention weights are not feature importance (Attention ≠ Importance). Explainability is fundamentally more difficult than for classical ML models.
14. Which RAGAS metric measures whether a RAG response is covered by the retrieved documents?
faithfulness = Grounding Score. Indicates how much of the response is verifiable in the context.
15. What does the Microsoft Responsible AI Toolbox offer beyond Fairlearn?
RAI Toolbox = Fairlearn + Error Analysis + Explainability + Causal + Counterfactuals.
16. Which tool is the best choice for Production Drift Detection?
Evidently: specialized in Data Drift, Model Drift, Data Quality — in Production.
17. An agent has: CRM access (PII), web search (untrusted), email sending. What is the risk? 2 pts
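Lethal Trifecta: private data + untrusted content + an outbound channel. An attacker injects instructions via web search results → the agent reads CRM data → the agent sends it out via email.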
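One way to catch this combination at configuration time is a static check over the agent's tool list. The sketch below is an assumption about how tools might be annotated; the Tool class, its three flags and the tool names are hypothetical and not part of any agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool descriptor with one flag per risk property."""
    name: str
    reads_private_data: bool = False
    ingests_untrusted_content: bool = False
    can_exfiltrate: bool = False


def has_lethal_trifecta(tools):
    """True if the combined tool set covers all three risk properties."""
    return (
        any(t.reads_private_data for t in tools)
        and any(t.ingests_untrusted_content for t in tools)
        and any(t.can_exfiltrate for t in tools)
    )


crm = Tool("crm_lookup", reads_private_data=True)
web = Tool("web_search", ingests_untrusted_content=True)
mail = Tool("send_email", can_exfiltrate=True)

print(has_lethal_trifecta([crm, web, mail]))  # all three properties present
print(has_lethal_trifecta([crm, web]))        # no outbound channel
```

Removing any one of the three legs breaks the attack chain, which is why the check is on the combination and not on individual tools.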
18. What does the Principle of Least Privilege mean for AI agents?
PoLP: Minimal Capability Scope. Trust Level 'low' = no Write, no External API, no Email.
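A capability scope like this can be expressed as a deny-by-default lookup table. In the sketch below only the 'low' row follows the text above; the 'medium' and 'high' rows, the Capability names and the is_allowed helper are assumptions for illustration.

```python
from enum import Enum


class Capability(Enum):
    READ = "read"
    WRITE = "write"
    EXTERNAL_API = "external_api"
    SEND_EMAIL = "send_email"


# Hypothetical policy table: only 'low' is described in the text, the rest is assumed.
ALLOWED = {
    "low": {Capability.READ},
    "medium": {Capability.READ, Capability.WRITE},
    "high": set(Capability),
}


def is_allowed(trust_level: str, capability: Capability) -> bool:
    """Deny by default: unknown trust levels get no capabilities at all."""
    return capability in ALLOWED.get(trust_level, set())


print(is_allowed("low", Capability.READ))
print(is_allowed("low", Capability.SEND_EMAIL))
print(is_allowed("unknown", Capability.READ))
```

The important design choice is the empty-set default: a misspelled or missing trust level results in no access, not full access.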
19. An agent waits 5 minutes for HITL approval. No human responds. What happens?
Fail-closed: Timeout is not an implicit okay. Block in case of uncertainty.
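The fail-closed rule can be sketched with a queue that a human reviewer would write to. The function name, the "approve" keyword and the shortened timeout are assumptions for illustration; a real approval channel would sit behind a UI or ticketing system.

```python
import queue


def wait_for_approval(responses, timeout_s):
    """Return 'approved' only on an explicit approval; everything else blocks the action."""
    try:
        answer = responses.get(timeout=timeout_s)
    except queue.Empty:
        return "blocked"  # timeout: no human answered, so fail closed
    return "approved" if answer == "approve" else "blocked"


# Case 1: nobody responds within the timeout (0.1 s here stands in for the 5 minutes).
silent = queue.Queue()
print(wait_for_approval(silent, timeout_s=0.1))

# Case 2: a reviewer has explicitly approved.
answered = queue.Queue()
answered.put("approve")
print(wait_for_approval(answered, timeout_s=0.1))
```

Anything other than the explicit approval, including an unclear reply, is also treated as a block, which matches "block in case of uncertainty".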
20. You are developing a credit scoring system. Which stack is fully compliant with the EU AI Act high-risk requirements? 2 pts
High-risk requires: Fairness measurement + Explainability + Audit trail + Drift monitoring + Technical documentation + Human oversight.