Are commanders obliged to consider, verify and rely on AI DSS output?




Artificial Intelligence Decision Support Systems (AI DSS) increasingly sit between the commander and the evidence on which he acts. Picture one in use before an attack: it reports a target’s status, with a confidence figure attached. What does the law require the commander to do with that AI DSS output? May he rely on it? Must he? And if he relies on it, has he verified the target — or merely deferred to a machine?

This post answers with a single thesis. Consulting and relying on AI DSS output is itself an exercise of the duty to take feasible precautions to verify the target under Article 57(2)(a)(i) of Additional Protocol I to the Geneva Conventions of 1949 (AP I). Reliance is a verification act. It is neither a shortcut around the verification duty nor a substitute for it.

The thesis has a second edge, which the post develops in stages. If reliance is a verification act, it inherits the standard of one. The commander must take feasible care that the output he relies on is itself sound — not vitiated by systemic error or other shortcomings of the system. Verifying the target through the tool is the first layer. Verifying the tool’s output is the second. The second layer is where most of the work lies, and Section II takes it up.

Two earlier posts supply the ground I build on, and I do not re-argue them here. Post #19 developed the reasonable-commander standard: the law tests the quality of the commander’s assessment on the information reasonably available, not the outcome with hindsight. Post #20 set out what the doubt rules in Articles 50(1) and 52(3) AP I require when doubt survives a reasonably executed verification. Both posts already found that a reasonable commander must consider the information reasonably available to him, AI DSS output included. I treat that as settled and start from it.

I. Reliance as a verification act, not a substitute for one

The duty to consider, taken as given

I state the starting point rather than prove it again. A reasonable commander must consider the information reasonably available to him before he engages a target, and that information includes AI DSS output. Posts #19 and #20 reached that conclusion, and this post does not reopen it. What those posts reserved for separate treatment is the harder question: once the output is in front of him, what must the commander do with it?

Reliance as an exercise of the Article 57(2)(a)(i) duty

Reliance on AI DSS output is best understood as a way of discharging the verification duty, not as an alternative to it. Article 57(2)(a)(i) AP I requires those who plan or decide upon an attack to “do everything feasible to verify” that the objectives are military objectives and not civilians or civilian objects. The duty attaches to the verification of the target. It does not prescribe the means. Consulting a reliable AI DSS is one such means, and interrogating its output is part of the verification itself.

Renato Wolf’s analysis supports reading the duty this way. He breaks the use of AI DSS into stages — legal qualification, classification, and identification or location — and shows that the obligation to do everything feasible to verify reaches every one of them (Wolf, at p. 291). The duty does not switch off because a machine performed part of the analysis; it runs through the tool. Wolf then makes his distinctive move: precautions to verify can be specific to the AI DSS and can operate outside the human operator’s role altogether (Wolf, at p. 304). Verification is not confined to an operator’s final glance at a screen.

These two points converge on the thesis. If the verification duty reaches every stage of AI DSS use, and if it is discharged partly through the system rather than only at the operator’s checkpoint, then consulting and relying on the output is part of how the commander verifies. Reliance is a verification act.

The duty is bounded, not absolute. Article 57(2)(a)(i) AP I does not demand everything possible, regardless of military cost. It demands everything feasible — what is practicable or practically possible, weighing humanitarian benefit against military cost (Wolf, at pp. 291–292). Wolf draws the consequence for our subject directly: the precautions to verify a commander must take when using AI DSS cannot be settled without accounting for the military costs those precautions carry. Feasibility, not perfection, is the measure. I adopt that assessment, and it governs both layers of the verification duty that follow.

Is non-reliance defensible?

Here I press the harder edge. If reliance is a way of verifying, then refusing to consider reliable, available output is, in my view, a failure to verify. The commander who ignores sound AI DSS output without good reason has not done everything feasible to confirm his target. Post #20 put the general principle in place: a commander who declines to consider relevant and reliable information reasonably available to him has not reasonably executed the verification process. AI DSS output is such information. Disregarding it without justification therefore undermines the ex post defensibility of the decision, and in a serious case it can expose the commander to liability for an attack that feasible verification would have prevented.

In operational terms, the duty to rely is not utopian. Reasonable reliance on AI DSS output is achievable in trained hands, as the empirical work in post #18 indicates. The point here is narrow. If reasonable reliance is achievable, then unjustified non-reliance is not the cautious choice it appears to be; it is itself a verification failure.

II. The reliability a verification act presupposes

Verifying the output, not only through it

Relying on AI DSS output discharges the verification duty only if the commander also attends to the soundness of the output itself. This is the second layer of the duty. The first layer asks whether the target is a military objective, and the commander answers it partly through the tool. A second question then arises: is the output he relies on itself reliable? An output that confirms his intended course of action does not settle that question; it raises it.

The point follows from what the output is. AI DSS output is evidence, and the duty to do everything feasible to verify the target reaches the evidence on which the verification rests. Wolf locates the human operator’s role precisely here: the operator’s task is to detect the false positives the system produces (Wolf, at p. 300). The IAPS policy memo makes the operational difficulty plain — AI DSS generate recommendations that are hard to verify independently under time pressure (IAPS memo, introduction). A commander who treats a confirming output as self-executing has done the first layer and skipped the second.

A confirming output is not self-certifying. AI DSS can err in ways a glance at the result will not reveal: through bias inherited from the training data or the algorithm (Wolf, at p. 302), and through systemic failure modes that sit beneath the output. Those failure modes do not condemn AI DSS as a class. They are simply the reason the commander’s engagement with an output must be more than a nod.

The feasibility bound

The second layer does not demand the impossible. The commander need not recreate the system’s reasoning step by step, nor follow every internal process the system runs. Wolf is realistic about this: recreating the entire classification process is unlikely to be possible within the time constraints of military operations (Wolf, at p. 300). What is feasible is a plausibility check. The operator looks at the proposed target and the data behind it and asks, in Wolf’s phrase, “Does this look right?” (Wolf, at p. 300). That is the realistic shape of the second layer.

So the reliability assessment the commander owes is bounded by feasibility, as the verification duty is throughout. He must make a reasonable assessment of the output in the circumstances, measured against the data, the situation, and the time and means available. He does not owe an exhaustive audit.

Here I part company with a dominant strand of the literature. The demand that a human supervise every processing step of an AI DSS asks more than the law requires and more than operations allow. Article 57(2)(a)(i) AP I asks for feasible verification, not perfect oversight (Wolf, at pp. 291–292). A standard that required the commander to follow each internal computation would make lawful reliance impossible, and a duty framed in terms of feasibility cannot demand that.

Automation bias is real. It is a challenge to be managed — through who is selected to operate the system, how they are trained, and the conditions under which they work (Wolf, at pp. 302, 303) — rather than a defeater of reliance. I developed the empirical side of this in post #18: trained military users calibrate their reliance instead of deferring blindly. The legal consequence is narrow but firm. Automation bias bears on whether reliance was reasonable in the circumstances; it does not convert reasonable reliance into a breach.

Where the assessment sits: the distributed duty

Most of the work of guarding against systemic error happens before the commander ever sees an output. It sits upstream — with those who develop, test, validate, procure and accredit the system. The commander inherits that work; he does not reconstruct it. Wolf shows part of the mechanism. Many precautions against AI DSS error are taken upstream, by modifying the software and hardware or by choosing the classification method and features (Wolf, at p. 301). Those steps belong to the developers and to the operators who run the AI DSS, not to the commander at the moment of attack. NATO’s Principles of Responsible Use (PRU) point the same way. An AI application is to have a well-defined use case. Its safety, security and robustness are established by testing and assurance across the life cycle, including through certification (NATO PRUs, Reliability principle). Reliability, on this picture, is an accredited property the commander draws on, not one he builds from scratch.

The degree of scrutiny owed also varies by role. Klonowska and Kwik observe that the optimal level of trust — how far a user questions and intervenes in a system’s recommendations — may differ depending on whether the user is an engineer, an operator, or a legal adviser (Klonowska & Kwik, ch. 9, at p. 109). Each interrogates a different thing: the engineer the model, the operator the output, the legal adviser the targeting decision. Reliability assessment is distributed across these roles. It is not a single audit the commander performs alone.

Where does accountability sit in this distributed structure? On an identified human, and the law already puts it there. Proposals to designate a specific individual in the chain of command as accountable for each AI-involved engagement (IAPS memo, Recommendation 1), and NATO’s principle that clear human responsibility must apply (NATO PRUs, Responsibility and Accountability principle), both locate answerability in people. In my view, such designation adds organisational clarity; it does not create the accountability. A natural person is already answerable at law for the harm his decisions cause, including where a user deploys an AI DSS outside the use case for which it was accredited. Calls for a new senior accountable officer respond to a need for clarity about who answers, when the answer at law is already settled. I developed the underlying point in post #19: accountability rests with the human and is not transferable to the system.

The commander’s own share of the second layer is therefore specific. He owes a feasible assessment of the output in the circumstances — its fit with the system’s stated use case, its known limitations, and any red flag actually before him. He does not owe what the procurement and accreditation layer already owes. In operational terms, he may rely on an AI DSS within its accredited use case unless something gives him reason to doubt it. What he may not do is treat the output as beyond question, or treat its confirmation as the end of his duty rather than one input to it.

That leaves the harder case. What happens when the commander’s feasible reliability check does not resolve the doubt — when the output’s soundness, or the target’s status, remains genuinely uncertain? That question belongs to the doubt rules, and Section III takes it up.

III. When the verification act leaves doubt unresolved

The output does not retire the doubt rule

A confident AI DSS output does not retire the doubt rule, because the output is a probability, not a resolution of doubt. When an AI DSS classifies a person or object by proxy features, the correlation between those features and what the law actually requires is never perfect; the most the system can yield is that the target belongs to a class with some probability, never with absolute certainty (Wolf, at p. 295). A confidence figure of 90 per cent expresses how likely, not whether. It measures the residual uncertainty; it does not eliminate it.

This probabilistic character is not a defect of the AI DSS. Uncertainty is inherent to armed conflict; the fog of war predates the tool and survives it, and an AI DSS that reports a probability is measuring that uncertainty rather than creating it. Properly used, AI-enabled intelligence, surveillance and reconnaissance (ISR) helps lift that fog; it is no source of new fog. The output’s numerical form does not change the legal test. Whether doubt and certainty can meaningfully be quantified is a question I leave open here; what matters for now is that a confidence figure is the system’s estimate of residual uncertainty, not a legal threshold. As post #20 concluded, the law measures the commander’s assessment by reasonableness in the circumstances, not against a fixed numerical threshold of certainty — so the figure is an input the commander must still weigh, never the rule that decides.

The probabilistic character of the output has a direct doctrinal consequence. The doubt rules engage on the underlying uncertainty (whether a person is a civilian under Article 50(1) AP I, whether a normally civilian object is being used militarily under Article 52(3) AP I) and not on the AI DSS output as such. An output that appears to have resolved the matter is therefore not what the rules engage on, and it does not displace them. Post #20 set out what those two rules direct, in their two deliberately different verbs, and I do not restate that here. The point for present purposes is narrower: a high-confidence output is the system’s estimate of the very uncertainty the doubt rules address, so it cannot be the thing that switches them off.

Doubt engages on the underlying facts

When the commander’s feasible reliability check leaves real uncertainty, the doubt rules engage on the facts. Post #20 established the structure I rely on: the rules engage on doubt that survives a reasonably executed verification, and the duty to take feasible precautions under Article 57(2)(a)(i) AP I runs ahead of them. Applied to AI DSS, the structure is unchanged. If, after the commander has made his feasible assessment of the output, genuine doubt remains about the person’s status or the object’s use, that residual doubt is the trigger. Article 50(1) AP I then directs the result for persons and Article 52(3) AP I the result for objects. The output does not change which rule applies or what it requires.

The converse case matters just as much, and it links back to Section I. Where the commander could have dispelled the doubt by feasibly relying on or verifying available AI DSS output, but did not, the doubt that persists is not a clean trigger for the protective rules. It is evidence that the verification process failed. Post #20 made the general point: doubt the commander could have dispelled reflects a failure of the precautions duty rather than the occasion for the doubt rules. AI DSS output is one of the things that could have dispelled it. Unjustified non-reliance, in other words, does not earn the protection of a rule designed for doubt that survives a genuine effort to resolve it.

So the practitioner reading of the third question is this. An AI DSS changes what the commander must do to dispel doubt: it adds a means of verification he must feasibly use and feasibly assess. It does not change when the doubt rules engage, or what they engage on. They engage on the underlying facts — status and use — when uncertainty survives a reasonable inquiry, whatever the output reports. An AI DSS that appears to resolve doubt the underlying facts leave open has not moved the rule; it has only offered an estimate the commander must still weigh. Reliance can form part of the feasible verification the law demands, but it is not automatically the discharge of that duty — and this is where that distinction bites. Where reliance leaves doubt genuinely unresolved, the commander is handed back to the doubt rules, and they meet him on the facts.

That returns the analysis to where post #20 left it: an ex post tribunal asking whether the verification was reasonably executed and the assessment reasonable in the circumstances. The Conclusion draws the two layers and the doubt-engagement point into a single statement of the commander’s position.

Conclusion

The commander is obliged to consider AI DSS output and, within limits, to rely on it — because consulting and interrogating that output is itself a way of doing what Article 57(2)(a)(i) AP I requires: everything feasible to verify the target. Reliance is a verification act. That single characterisation answers the three questions this post set out to address.

On the first, the duty to consider available output is settled, and reasonable reliance on it can form part of feasible verification. Reliance is not automatically the discharge of that duty. But unjustified non-reliance is no safe harbour either: ignoring sound, available output without reason is a failure to verify, and an ex post tribunal can reach it.

On the second, the duty has a second layer. Because the output is the evidence, the commander must take feasible care that the output itself is sound. That care is bounded — he owes a reasonable plausibility assessment in the circumstances, not an audit of the model or oversight of every internal step. It is also distributed: reliability is built, tested and accredited upstream, and the commander inherits that work rather than rebuilding it. Accountability, throughout, rests on identified humans and does not pass to the system.

On the third, an AI DSS changes what the commander must do to dispel doubt, not when the doubt rules engage or what they engage on. A confident output estimates the residual uncertainty; it does not resolve it. Where genuine doubt about a person’s status or an object’s use survives a reasonable inquiry, Articles 50(1) and 52(3) AP I engage on those facts, whatever the output reports. And where the commander could have dispelled the doubt by feasibly using available output but did not, the doubt that persists marks a failure of verification rather than an occasion for the protective rules.

The optimistic-realist reading is straightforward. AI DSS extend the commander’s reach without displacing his legal judgement; they give that judgement more to work with, and the law governing it is well able to absorb them. The reasonable-commander standard and the doubt rules already supply the test. What AI DSS alter is the quality of the inputs the commander weighs; the standard he is held to stays the same.

This post has taken one specific precaution — the duty to verify in Article 57(2)(a)(i) AP I — as its anchor. My next post steps back to the obligation that sits above all the specific precautions: the duty of constant care in Article 57(1) AP I to spare the civilian population in the conduct of military operations. It is the most assumed and least examined obligation in targeting law — and the quiet foundation on which much of this series has rested.

About the author

With more than 25 years of experience, Andreas Leupold is a lawyer trusted by German, European, US and UK clients.

He specializes in intellectual property (IP) and IT law and the law of armed conflict (LOAC). Andreas advises clients in the industrial and defense sectors on how to address the unique legal challenges posed by artificial intelligence and emerging technologies.

A recognized thought leader, he has edited and co-authored several handbooks on IT law and the legal dimensions of 3D printing/Additive Manufacturing, which he also examined in a landmark study for NATO/NSPA.
