1. What does Demographic Parity Difference (DPD) measure?

A. The difference in model accuracy between groups
B. The difference in the rate of positive predictions between protected groups
C. The difference in training time
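For reference, a minimal sketch of how a demographic parity difference can be computed by hand, assuming binary predictions and a single protected attribute. The function name and the toy data are illustrative assumptions and are not taken from any library.

```python
from collections import defaultdict


def demographic_parity_difference(y_pred, groups):
    """Return (gap, rates): the largest gap in positive-prediction rate
    between any two groups, plus the per-group rates."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    # Selection rate = share of positive predictions within each group.
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


# Toy data (made up for illustration): group "a" is approved 3 of 4 times,
# group "b" only 1 of 4 times.
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dpd, rates = demographic_parity_difference(y_pred, groups)
print(rates)  # per-group selection rates
print(dpd)    # gap between the highest and lowest rate
```

Note that the true labels do not appear anywhere in the computation: the metric looks only at predictions and group membership.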

2. Which Python library provides MetricFrame for group fairness metrics? (2 pts)

A. scikit-learn
B. Fairlearn (Microsoft)
C. TensorFlow

3. Why can Demographic Parity and Equalized Odds generally not be satisfied simultaneously?

A. Because they require different libraries
B. Impossibility theorem: unless base rates are equal across groups (or the classifier is uninformative), the two criteria cannot both hold
C. Because Equalized Odds requires more data

4. What does SHAP compute for a single prediction?

A. The overall accuracy of the model
B. The contribution of each feature to the specific prediction (Shapley values)
C. The feature importance across the entire dataset

5. When is LIME better than SHAP?

A. For tree models: LIME is faster for Random Forests
B. If a model-agnostic, local explanation method is needed
C. For large datasets: LIME scales better

6. The model card for a high-risk credit model reports a fairness metric of DPD = 0.07. What does this mean? (2 pts)

A. The model is compliant: 0.07 is below 0.1
B. Review required: 0.07 exceeds the recommended threshold of 0.05
C. The model must be shut down immediately

7. According to EU AI Act Art. 12, what is logged, and what is NOT logged?

A. All raw data including PII for complete traceability
B. Input hash (no PII), prediction, decision, model version, timestamp
C. Only the final decision without details
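As an illustration of logging a decision without storing raw inputs, the following sketch builds one audit record containing a hash of the input instead of the input itself. The field names, the `build_audit_record` helper, and the sample values are assumptions made for this example; the regulation does not prescribe this schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(features, prediction, decision, model_version):
    """Build one log entry that references the input only by its hash."""
    # Canonical JSON (sorted keys, fixed separators) so that the same input
    # always produces the same hash, regardless of dict ordering.
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    input_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {
        "input_hash": input_hash,
        "prediction": prediction,
        "decision": decision,
        "model_version": model_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical applicant data; none of it ends up in the record itself.
applicant = {"name": "Jane Doe", "income": 42000, "age": 31}
record = build_audit_record(applicant, prediction=0.83,
                            decision="approved", model_version="1.4.2")
print(json.dumps(record, indent=2))
```

A plain hash of low-entropy inputs can be reversed by brute force, so a keyed hash (for example HMAC with a secret key) may be a safer choice in practice.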

8. Which tool is used for data drift detection in production operations?

A. Pandas
B. Evidently
C. LIME

9. What does MLflow track in the context of AI governance?

A. Only the model accuracy
B. Experiment parameters, metrics (including fairness), and artifacts, forming a complete audit trail
C. Only deployment configurations

10. What does EU AI Act Annex IV (Technical Documentation) prescribe for high-risk systems? (2 pts)

A. Only a brief description of the model type
B. 8 mandatory sections: purpose, development process, monitoring, accuracy, fairness, declaration of conformity, etc.
C. A certification by an accredited auditor

11. How often must the Technical Documentation be updated according to Art. 11?

A. Annually
B. For each significant system change
C. Only in the first version

12. A credit model shows a TPR of 0.68 for applicants under 25, compared to 0.91 overall. What is the correct response?

A. Acceptable: young applicants often have less credit history
B. Remove the model from scoring, run a root cause analysis, and mitigate the bias before redeployment
C. Adjust the threshold for this group
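A per-group true positive rate comparison like the one in this question can be sketched in a few lines. The helper names, group labels, and toy data below are assumptions for illustration only.

```python
def true_positive_rate(y_true, y_pred):
    """TPR = correctly predicted positives / all actual positives."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predictions_for_positives:
        return None  # TPR is undefined without actual positives
    return sum(predictions_for_positives) / len(predictions_for_positives)


def tpr_by_group(y_true, y_pred, groups):
    """Compute the TPR separately for each group label."""
    result = {}
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        result[g] = true_positive_rate([y_true[i] for i in idx],
                                       [y_pred[i] for i in idx])
    return result


# Toy data: the model misses one of three creditworthy applicants under 25.
y_true = [1, 1, 1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
groups = ["under_25"] * 4 + ["25_plus"] * 4

rates = tpr_by_group(y_true, y_pred, groups)
gap = rates["25_plus"] - rates["under_25"]
print(rates, gap)
```

Unlike demographic parity, this comparison needs the true labels, because it asks how often actually creditworthy applicants in each group are recognised as such.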

13. How does applying SHAP to LLMs differ from applying it to classical ML models?

A. SHAP works with LLMs just like with tree models
B. In LLMs, attention weights provide only limited explanations; SHAP is complex and less reliable
C. LLMs do not need explainability because they output text

14. Which RAGAS metric measures whether a RAG response is covered by the retrieved documents?

A. answer_relevancy
B. context_precision
C. faithfulness

15. What does the Microsoft Responsible AI Toolbox offer beyond Fairlearn?

A. Only a better UI for Fairlearn metrics
B. Error analysis, causal inference, what-if scenarios, and counterfactuals in a dashboard
C. Production monitoring and alerting

16. Which tool is the best choice for Production Drift Detection?

A. SHAP
B. Evidently AI
C. IBM watsonx.governance

17. An agent has CRM access (PII), web search (untrusted content), and email sending. What is the risk? (2 pts)

A. Minimal risk: these are normal business functions
B. Lethal Trifecta: allowing all three components simultaneously enables data exfiltration via prompt injection
C. Medium risk: only if the agent is poorly trained
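A configuration check for this combination could look like the sketch below. The capability labels and the tool-to-capability mapping are hypothetical and chosen only to mirror the three tools named in the question.

```python
# Hypothetical capability labels for the three risky ingredients.
TRIFECTA = {"private_data", "untrusted_content", "external_communication"}

# Assumed mapping from tool names to the capability each one grants.
TOOL_CAPABILITIES = {
    "crm_read": "private_data",
    "web_search": "untrusted_content",
    "send_email": "external_communication",
    "calculator": "none",
}


def has_lethal_trifecta(tools):
    """True if the tool set grants all three capabilities at once."""
    granted = {TOOL_CAPABILITIES[t] for t in tools}
    return TRIFECTA <= granted


print(has_lethal_trifecta(["crm_read", "web_search", "send_email"]))  # True
print(has_lethal_trifecta(["crm_read", "web_search", "calculator"]))  # False
```

Removing any one of the three capabilities breaks the exfiltration path, which is why the check tests for the full set rather than for individual tools.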

18. What does the Principle of Least Privilege mean for AI agents?

A. The agent receives the least computational resources
B. The agent receives only the minimum capabilities necessary for the specific task
C. The agent may only perform simple tasks

19. An agent waits 5 minutes for HITL approval. No human responds. What happens?

A. The agent performs the action with the lowest priority
B. The agent continues to wait: human oversight takes precedence
C. Timeout = rejection (fail-closed): the action is not executed
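Fail-closed approval handling can be sketched with a blocking queue read that has a timeout. The `await_approval` function and the use of `queue.Queue` as the approval channel are assumptions for this example; a real system would likely use a ticketing or messaging service instead.

```python
import queue


def await_approval(approvals, timeout_seconds):
    """Wait for a human decision; treat silence as rejection (fail-closed)."""
    try:
        decision = approvals.get(timeout=timeout_seconds)
    except queue.Empty:
        # Nobody answered in time: the action must not be executed.
        return False
    # Only an explicit True counts as approval; anything else is a rejection.
    return decision is True


# Nobody responds: a short timeout stands in for the 5 minutes.
silent_channel = queue.Queue()
print(await_approval(silent_channel, timeout_seconds=0.05))  # False

# A reviewer approves explicitly.
answered_channel = queue.Queue()
answered_channel.put(True)
print(await_approval(answered_channel, timeout_seconds=0.05))  # True
```

The default outcome is rejection, so a crashed notification service or an absent reviewer can never lead to an unapproved action.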

20. You are developing a credit scoring system. Which stack is fully compliant with the EU AI Act high-risk requirements? (2 pts)

A. XGBoost + good accuracy + GDPR-compliant logging
B. XGBoost + Fairlearn (bias < 0.05) + SHAP + MLflow (audit) + Evidently (drift) + Technical Documentation (Annex IV) + HITL override
C. XGBoost + IBM watsonx.governance license