Performance Evaluation — Or, How to Prove Your AI Management System Was Doing Anything

Welcome back. We have arrived at the part of the series where we stop talking about building the AI management system and start talking about proving it does anything. Clause 9 is the great winnower of management systems. By the time you reach it, every organisation has policies, an objective or two, and a risk register someone is “still finalising.” What Clause 9 asks — and it asks with the same faintly tired expression every auditor I have ever met has been wearing for fifteen years — is: yes, but how do you know?

This is the chapter where ISO/IEC 42001:2023 quietly reminds you that “we had a workshop” is not evidence of effectiveness. Strap in.

What Clause 9 Actually Says

Clause 9 — “Performance evaluation” — is the standard’s instruction manual for self-assessment. It contains three subclauses: 9.1 Monitoring, measurement, analysis and evaluation; 9.2 Internal audit; and 9.3 Management review. They are, between them, the difference between a binder and a system.

9.1 — Monitoring, Measurement, Analysis and Evaluation

Clause 9.1 requires the organisation to determine what needs to be monitored and measured, the methods used (chosen so that valid results are produced), when monitoring shall be performed, and when results shall be analysed. It also requires you to evaluate AI performance and the effectiveness of the AIMS itself, and to retain documented information as evidence. None of which is exciting until you realise the standard is asking AI teams to do something they almost categorically refuse to do: commit, in writing, to the metrics they will be judged by.

You are expected to monitor at least two layers. The first is the AIMS itself: are the controls operating, are policies being followed, are objectives being met? The second is the AI systems within scope: are the models behaving the way you said they would, in the conditions you said they would? The standard is mercifully unprescriptive about which metrics to use — it just insists that you pick some, defend them, and apply them with the tedious regularity that distinguishes a management system from a press release.
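What "pick some metrics, defend them, and apply them regularly" can look like in practice is sketched below. This is a hypothetical illustration, not anything the standard prescribes: the `Metric` registry, the field names, and the thresholds are all invented for the example; the point is only that each commitment is written down, dated, and retained.

```python
from dataclasses import dataclass, asdict
from datetime import date
import json

@dataclass
class Metric:
    """One documented monitoring commitment: what, how, and how often."""
    name: str
    layer: str        # "aims" (the management system) or "ai_system" (the models)
    method: str       # how the value is produced, chosen so results are valid
    frequency: str    # when monitoring shall be performed
    threshold: float  # the number you agreed, in writing, to be judged by

def evaluate(metric: Metric, observed: float) -> dict:
    """Produce a dated record -- the 'documented information' Clause 9.1 asks you to retain."""
    return {
        **asdict(metric),
        "observed": observed,
        "within_threshold": observed >= metric.threshold,
        "evaluated_on": date.today().isoformat(),
    }

# The two layers: the AIMS itself, and the AI systems within scope.
metrics = [
    Metric("policy_attestation_rate", "aims", "HR training records", "quarterly", 0.95),
    Metric("model_precision_live", "ai_system", "holdout replay on production traffic", "monthly", 0.90),
]

for m in metrics:
    print(json.dumps(evaluate(m, observed=0.97)))  # in practice: write to an evidence store
```

The shape matters more than the tooling: a spreadsheet achieves the same thing, as long as the metric, method, frequency, and threshold were committed to before the numbers came in.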

9.2 — Internal Audit

Clause 9.2 mandates an internal audit programme. This will surprise precisely no one who has ever read another ISO management system standard, because Clause 9.2 in 42001 is, with the lightest possible coat of AI-flavoured paint, the same internal audit clause that has appeared in ISO 9001, ISO 27001, ISO 14001, and every other standard whose number ends in “001” since approximately the Carter administration.

The clause splits cleanly: 9.2.1 sets the purpose (audits at planned intervals to determine whether the AIMS conforms to the organisation’s own requirements and to ISO 42001, and is effectively implemented and maintained), and 9.2.2 sets the programme (plan and maintain audit programmes; define criteria and scope; select auditors and ensure objectivity and impartiality; report results to relevant management; retain documented information; take action on findings). If you have not sat through an ISO 27001 audit, the short version is: someone competent and not personally invested in the result will read your evidence and tell you, in a tactful but unmistakable way, where you are kidding yourselves.

9.3 — Management Review

Clause 9.3 has three sub-sub-clauses. (Yes, sub-sub. Welcome to ISO drafting, where nesting is a virtue.) 9.3.1 establishes that top management shall review the AIMS at planned intervals. 9.3.2 lists the inputs that must be considered: status of actions from previous reviews; changes in external and internal issues relevant to the AIMS; information on AIMS performance and effectiveness, including nonconformities and corrective actions, monitoring and measurement results, audit results, fulfilment of AI objectives, and feedback from interested parties; opportunities for continual improvement. 9.3.3 lists the outputs: decisions and actions related to continual improvement and any need for changes to the AIMS, with documented information retained as evidence.

In practical terms: top management — the actual ones, not their delegates — must sit in a room (or, fine, a Teams meeting) at planned intervals and look at the evidence the rest of Clause 9 produced. They must then make decisions about what to do. They must record those decisions. If “the executive team is too busy” was your plan, the standard would like a word.

What It Means in Practice

Practically, an organisation passing through Clause 9 needs to build, at minimum: a defined set of AIMS and AI-system performance indicators with documented monitoring frequencies; an internal audit programme with a written charter, a schedule, qualified auditors, audit reports, and a corrective-action loop; a management-review cadence (most organisations land on quarterly or semi-annually) with a fixed agenda mapped to 9.3.2 inputs and minutes that map cleanly to 9.3.3 outputs. None of this is conceptually hard. All of it requires the kind of discipline that organisations typically discover three weeks before a certification audit.
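The "fixed agenda mapped to 9.3.2 inputs" part lends itself to a trivially small check. A hedged sketch, assuming a hypothetical pre-meeting validation step (the item names are shorthand paraphrases of the 9.3.2 input list, not the standard's wording):

```python
# The Clause 9.3.2 inputs, paraphrased as agenda-item identifiers.
REQUIRED_INPUTS = {
    "previous_actions_status",
    "changes_in_external_internal_issues",
    "nonconformities_and_corrective_actions",
    "monitoring_and_measurement_results",
    "audit_results",
    "ai_objectives_fulfilment",
    "interested_party_feedback",
    "improvement_opportunities",
}

def missing_inputs(agenda_items: set) -> set:
    """Return the 9.3.2 inputs a proposed management-review agenda fails to cover."""
    return REQUIRED_INPUTS - agenda_items

# A typical first draft of an agenda, and what it forgot.
draft_agenda = {
    "previous_actions_status",
    "audit_results",
    "monitoring_and_measurement_results",
}
print(sorted(missing_inputs(draft_agenda)))
```

Running the check before the meeting, rather than discovering the gap in a certification audit, is the entire value of mapping the agenda to the clause.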

What’s New, or At Least Mildly Surprising

Most of Clause 9 will look comfortingly familiar to anyone who has worked an ISO 27001 implementation. The structure is identical, the language is nearly identical, and the cadence — monitor, audit, review — is unchanged. This is by design: ISO management system standards share a common Annex SL high-level structure, and 42001 is a faithful citizen of that empire.

The novel bits, such as they are, sit in what you must monitor. Clause 9.1 of 42001 quietly assumes that you are measuring AI-specific performance — model accuracy, fairness metrics, drift, incident rates, the effectiveness of impact assessments under Clause 6.1.4 and Annex A — and not just the operating health of the management system. Most organisations will already have ad-hoc model dashboards somewhere; Clause 9.1 will require those dashboards to be elevated from “engineering curiosity” to “documented monitoring artefact subject to retention requirements.”
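Drift, one of those AI-specific quantities, reduces to a small calculation once you commit to a method. A minimal sketch using the population stability index over binned feature values — the thresholds in the docstring are conventional practitioner rules of thumb, not anything ISO 42001 specifies:

```python
import math

def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a live sample.
    Rule of thumb: < 0.1 stable, 0.1-0.25 worth watching, > 0.25 investigate."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0
    def proportions(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)
            counts[max(i, 0)] += 1
        # Laplace smoothing so an empty bin does not blow up the log.
        total = len(xs) + bins
        return [(c + 1) / total for c in counts]
    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [0.1 * i for i in range(100)]        # training-time feature sample
live = [0.1 * i + 2.0 for i in range(100)]      # shifted production sample
print(round(psi(baseline, live), 3))
```

Whatever metric you choose, the Clause 9.1 point stands: the method, the cadence, and the threshold for acting all have to be written down before the drift arrives.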

The internal audit clause adds an obligation, often underappreciated, that auditors must be objective and impartial with respect to the AIMS. In practice, that means the data scientist who built the model cannot audit the model’s controls. Many organisations are about to discover that “objective and impartial” is a more expensive HR problem than they budgeted for.

And finally — the bit nobody will warn you about until it’s too late — the management review minutes. They are documented information. They are retained. They are readable by certification auditors. If your executive team enjoys speaking expansively about how AI risk is being “managed holistically across the enterprise,” they should be aware that future auditors will want the receipts.

Closing

Clause 9 is, in the end, the part where the AIMS earns the right to call itself a management system rather than a particularly elaborate set of intentions. Build the metrics. Schedule the audits. Hold the reviews. Take the minutes. Retain the evidence. None of it is glamorous, and almost all of it is exactly the kind of work that AI teams insist they “don’t have time for” until the moment they are asked to demonstrate compliance to a regulator, a customer, or a slightly disappointed certification body.

Next time, mercifully, we move into Clause 10 — Improvement. Which is the standard’s official acknowledgement that whatever you put in place under Clause 9 will surface things that need fixing. And so the cycle continues. Wearily.
