1. Core Activities & Roles (Monitoring & Optimisation)
Purpose
Overview of core activities and role assignments during the Monitoring & Optimisation phase, from operational monitoring to drift detection and cost control.
1. Core Activities
Operational Monitoring & MLOps
We monitor the 'heartbeat' of the system.
- Real-time Performance Tracking: Dashboards for critical metrics (latency, error rate, uptime, throughput).
- Performance Degradation Monitoring: Statistically monitoring whether production input data deviates from training data (Data Drift) or whether the relationship between data and outcomes changes (Concept Drift).
- Data Loop Integration: Feeding production data and outcomes back into the development environment for analysis (Feedback Loop).
- Automated Triggers: Setting alerts for drops below thresholds (e.g. accuracy < 85%).
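The drift check and the threshold alert above can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed implementation: it uses the Population Stability Index (PSI), one common drift statistic among several, and the function names `psi` and `check_alerts`, the bin count and the 0.2 PSI alert level (a widely used rule of thumb) are all assumptions. The 85% accuracy threshold is the example from the list above.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a training and a production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all training values are equal

    def shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Clamp so production values outside the training range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def check_alerts(psi_value: float, accuracy: float,
                 psi_threshold: float = 0.2,    # assumed rule-of-thumb level
                 min_accuracy: float = 0.85) -> list:
    """Return the names of all alerts that should fire for this monitoring window."""
    alerts = []
    if psi_value > psi_threshold:
        alerts.append("data_drift")
    if accuracy < min_accuracy:
        alerts.append("accuracy_below_threshold")
    return alerts
```

PSI only compares input distributions, so it covers Data Drift. Concept Drift needs labelled outcomes, which is why the accuracy check is kept as a separate alert.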
Continuous Improvement & Retraining
Standing still means falling behind.
- Retraining Strategy: Define the retraining triggers in advance (fixed schedule, drift alert, or arrival of new data).
- Experiment Loops: Use production insights to test new hypotheses in short sprints (A/B testing, Canary releases).
- Backlog Management: Maintain a living list of bugs, improvements and feature requests from users.
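A retraining strategy that combines the three triggers (periodic, drift alert, new data) could look like the sketch below. The class `RetrainPolicy`, the function `retrain_reason`, the 90-day schedule and the 10,000-sample volume are hypothetical values chosen for illustration; set them per use case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetrainPolicy:
    max_age_days: int = 90          # assumed periodic schedule
    min_new_samples: int = 10_000   # assumed volume of new labelled data


def retrain_reason(age_days: int, drift_alert: bool, new_samples: int,
                   policy: RetrainPolicy = RetrainPolicy()) -> Optional[str]:
    """Return why retraining is due, or None if no trigger applies."""
    if drift_alert:                              # drift takes priority over the schedule
        return "drift_alert"
    if age_days >= policy.max_age_days:
        return "periodic"
    if new_samples >= policy.min_new_samples:
        return "new_data"
    return None
```

Returning the reason rather than a bare boolean makes the trigger easy to record in the backlog or the monitoring log.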
Cost Control & Energy Efficiency
Sustainability in euros and CO2.
- Cloud & API Optimisation (Cost Overview): Monthly review of compute (GPU/CPU) and token costs. Optimise through model compression (quantisation) or caching.
- Sustainability Measurement (ESG): Monitoring energy consumption (inference footprint) and reporting for ESG goals.
- Resource Allocation: Set up autoscaling to adjust infrastructure to actual demand.
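As an illustration of the caching option mentioned above, the sketch below uses Python's standard `functools.lru_cache` to avoid paying twice for identical requests. `call_model` is a hypothetical stand-in for a paid model or API call, and the cache size is an assumption.

```python
from functools import lru_cache

calls = {"model": 0}  # counts how often the (paid) model is actually invoked


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a paid model/API call."""
    calls["model"] += 1
    return prompt.upper()


@lru_cache(maxsize=1024)  # assumed cache size
def cached_answer(prompt: str) -> str:
    """Serve repeated, identical prompts from memory instead of the model."""
    return call_model(prompt)
```

This only helps for exact repeats of the same prompt. The hit rate reported by `cached_answer.cache_info()` is a useful input for the monthly cost review.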
Ethical Oversight & Compliance Monitoring
Ongoing legal conformity.
- Post-Market Surveillance (EU AI Act requirement): Continuous scanning for unforeseen bias, discrimination or safety risks.
- Audit-ready Logging: Retaining logs of decisions and human interventions for auditors.
- Transparency Reports: Periodic reporting to stakeholders and CAIO on safety and performance.
- Fairness Audit (Bias Audit): Regular sampling by the Ethicist of the 'tone' and quality of outputs.
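For audit-ready logging, one possible record shape is a structured JSON line per decision that captures both the model output and any human intervention. The function `audit_record` and all field names are assumptions for illustration, not a mandated schema; retention period and storage location follow the organisation's own policy.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(request_id: str, model_version: str, decision: str,
                 human_override: Optional[str] = None) -> str:
    """Build one JSON log line describing a decision and any human intervention."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "model_version": model_version,
        "decision": decision,               # what the model proposed
        "human_override": human_override,   # None if nobody intervened
        "final_decision": human_override if human_override is not None else decision,
    }
    return json.dumps(record, sort_keys=True)
```

Keeping the model's proposal and the final decision as separate fields lets an auditor see how often, and in which cases, humans intervened.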
Decommissioning
An AI system has a finite lifespan. Define in advance when shutdown is justified.
Decommissioning triggers:
| Category | Trigger | Action |
|---|---|---|
| Technical | Drift exceeds threshold and retraining does not improve performance | System offline, root cause analysis |
| Economic | Cost per Productive Outcome rises > 50% above baseline after 2 quarters | CAIO review: stop or re-architect |
| Ethical/Legal | Critical fairness audit finding or new legislation renders system non-compliant | Immediate stop, Guardian review mandatory |
| Strategic | Use case disappears due to organisational change or better alternative available | Controlled wind-down per handover plan |
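The economic trigger in the table can be checked mechanically. The sketch below assumes one reading of the rule: the trigger fires when Cost per Productive Outcome has been more than 50% above baseline in each of the last 2 quarters. The function name `economic_trigger` and its parameters are hypothetical.

```python
from typing import Sequence


def economic_trigger(baseline: float, quarterly_costs: Sequence[float],
                     max_rise: float = 0.50, quarters: int = 2) -> bool:
    """True if cost per productive outcome exceeded the limit in each recent quarter."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if len(quarterly_costs) < quarters:
        return False  # not enough history to judge
    limit = baseline * (1 + max_rise)
    # Assumption: every one of the last `quarters` values must exceed the limit.
    return all(cost > limit for cost in quarterly_costs[-quarters:])
```

A `True` result does not stop the system by itself; per the table it leads to a CAIO review to decide between stopping and re-architecting.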
Decommissioning process:
- Announcement: Inform users and stakeholders in advance (minimum 4 weeks).
- Archiving: Retain the technical dossier, validation reports and Kaizen Log per retention policy.
- Knowledge transfer: Document lessons learned in the Lessons Learned register.
- Data deletion: Delete or anonymise production data in accordance with GDPR [so-49].
- Infrastructure: Shut down compute, API keys and monitoring pipelines.
- Guardian sign-off: Guardian confirms all Hard Boundaries obligations have been fulfilled.
2. Team & Roles
| Role | Responsibility in Monitoring & Optimisation |
|---|---|
| MLOps Engineer | Responsible: Owner of monitoring pipelines, infrastructure and stability. |
| AI Product Manager | Accountable: Guards Business KPIs, manages backlog and user feedback. |
| Chief AI Officer (CAIO) | Consulted: Evaluates long-term ROI and strategic impact. |
| Data Scientist | Responsible: Analyses Performance Degradation, performs retraining and improves models. |
| Guardian (Ethicist) | Consulted: Performs ethical reviews and post-market surveillance. |
5. Related Modules
Further reading:
See also: Phase 5 Overview · Deliverables
Next step: Set drift thresholds and schedule the first quarterly review (Gate 4). Use the Gate 4 Checklist as your starting point.
See also: Continuous Improvement | Evidence Standards