The reasonable-commander standard in the age of AI

The reasonable-commander standard is the benchmark against which international humanitarian law (IHL) judges a proportionality decision. It asks not whether an attack turned out well, but whether the assessment behind it was one a reasonable commander could have reached on the information available at the time. My argument here is that this standard holds in the age of Artificial Intelligence Decision Support Systems (AI DSS). Properly designed and properly used, AI DSS can strengthen a commander’s capacity to meet the standard rather than erode it.

That claim is conditional, and I want to name the conditions at the outset. Everything turns on system design, on the AI literacy and training of the people using these tools, on sound institutional procedure, and on preserving room for genuine human deliberation. The claim is also about support, not substitution. Just as Autonomous Weapon Systems (AWS) engage targets that a human has already selected through the targeting cycle, AI DSS do not make the proportionality decision; they inform the commander who does.

In my previous post, Where the human belongs, I examined the evidentiary mechanics of the ex ante test: the operational records, legal-adviser memoranda and documentation infrastructure on which any later review of a commander’s judgement depends. Building on that foundation, this post turns to what the AI DSS context changes. My focus is the attacker’s obligation under Article 57(2)(a)(iii) of Additional Protocol I to the Geneva Conventions of 1949 (AP I), with a distinct and shorter treatment of the defender’s position under Article 58.

I. What the reasonable-commander standard actually asks

The standard tests the assessment, not the outcome

Proportionality is codified in Article 51(5)(b) and Article 57(2)(a)(iii) AP I. It prohibits an attack expected to cause incidental civilian harm that would be “excessive in relation to the concrete and direct military advantage anticipated”. The treaty fixes the rule. It does not tell us how to judge a particular commander’s application of that rule to messy facts.

Notably, AP I never mentions a “reasonable military commander”. As Henderson and Reece observe, the phrase does not describe who bears the obligation; it describes the standard against which the decision is judged (Henderson and Reece, at p. 840). The term owes its origin to the ICTY Final Report on the NATO bombing campaign, which made a deliberate choice of a military-commander benchmark over a lay “reasonable person” (ICTY Final Report, at paras 49–50).

The temporal vantage point matters most. The standard looks at the decision ex ante. In Galić, the Tribunal framed the test around “a reasonably well-informed person in the circumstances of the actual perpetrator” (Galić Trial Judgment, at para. 58), making reasonable use of the information then available. The question therefore fixes on the quality of the assessment given what was knowable, not on the outcome seen in hindsight.

Objective but qualified

What kind of assessment is this? Henderson and Reece distinguish three possibilities: a subjective belief, an unqualified objective reasonableness, and an objective reasonableness qualified by a defined role (Henderson and Reece, at p. 840). The reasonable-commander standard is the third kind. It measures the decision against a person carrying the training, experience and operational understanding of a military commander (Henderson and Reece, at p. 841).

Gotovina shows both the reviewability of the standard and its practical limits. The Trial Chamber had inferred unlawful shelling from a 200-metre margin of error around identified targets. The Appeals Chamber set that figure aside for want of a reasoned evidentiary basis (Gotovina Appeal Judgment, at paras 61, 64–65). In dissent, Judge Agius objected that, with no margin of error substituted in its place, almost no attack could be classed as “indiscriminate on the basis of evidence regarding impact sites” — leaving the legal category, on his reading, all but unworkable (Gotovina Appeal Judgment, Dissenting Opinion of Judge Agius, at para. 21). The episode shows a tribunal grappling with where the boundary of reasonable assessment lies — and how reviewability suffers when no boundary is fixed at all.

A zone of reasonableness — and the hinge for AI DSS

Read together, these sources point to something the dominant “IHL cannot be quantified” narrative tends to blur. The Final Report accepted that answers “may differ depending on the background and values of the decision maker”, and that commanders will not always agree in close cases, while many cases will still draw clear agreement that harm was disproportionate (ICTY Final Report, at para. 50). Schmitt and Schauss reach the same place from the other direction, rejecting the idea that IHL holds the definitive thresholds found in domestic criminal or civil law (Schmitt and Schauss, at p. 192). The reasonable-person literature echoes it: even criminal law has declined to reduce “reasonable doubt” to a number (Rane, at p. 7).

The convergence across these treatments suggests a precise way to state what the standard is. It defines a zone of reasonableness: a bounded range of assessments a reasonable commander could reach, rather than a single correct answer. A decision inside the zone is lawful even where another commander would have chosen differently; a decision outside it is not. That boundary is what makes the standard reviewable. A court asks whether the assessment fell outside the range that a reasonable commander could have reached, and does not substitute its own preferred figure.

The decisive question for AI DSS is therefore narrow and answerable: does the system improve the information reasonably available to the commander, and the quality of the weighing, without displacing the judgement itself? Where it does, it strengthens the commander’s claim to have acted reasonably.

II. Where AI DSS meets the standard: the attacker through the targeting cycle

AI DSS are already inside the joint targeting cycle (JTC) — the structured process through which a commander’s targeting staff (the multidisciplinary element that develops, validates and recommends targets to the commander) turns intelligence into a lawful attack and reviews the result. So the useful question is not whether these systems belong there. It is whether a given system is built to widen the commander’s reasoning or to narrow it. My argument is that the reasonable-commander standard supplies the test for telling the two apart, and that a well-designed system clears it. I walk through four stations of the cycle where the standard bites.

Situational awareness and the information set

Start with what the standard actually rewards. Galić fixes reasonableness to the information available to the decision-maker at the time (Galić Trial Judgment, at para. 58). The commander who reaches a sound judgement on a thin or stale picture has still met the standard; the commander who reaches a poor one on a rich picture has not. The decisive contribution of AI DSS, therefore, is to the picture itself.

Here a well-designed system does something a human under time pressure cannot. It draws together the relevant streams of intelligence, surveillance and reconnaissance and fuses them, so that the commander is freed from assembling the picture and can spend scarce time interrogating it. This is the inversion the critics miss. The commander’s task is not to gather from many sources; it is to verify what the system has gathered.

Lewis and Ilachinski, working from an analysis of several thousand real-world civilian-harm incidents, give this concrete shape (Lewis and Ilachinski, at p. ii). The functions they identify are not generic optimism; they map to documented failure patterns. One alerts forces to transient civilians who move into a target area. A second compares the imagery behind a collateral-damage estimate with fresh imagery to surface unanticipated civilian presence. A third flags a possible miscorrelation when a tracked vehicle is no longer the one first tracked — a swap between a threat vehicle and a civilian one (Lewis and Ilachinski, at p. 50). Each function does verification work the commander would otherwise have to do unaided, or not at all.

This also disposes of a persistent objection: that an AI DSS may hand the commander a single recommendation drawn from one intelligence stream, narrowing rather than widening the view. That describes a system designed badly, not a property of AI. Verifying a target across all available sources is not a new duty conjured up to compensate for unreliable algorithms. It is the existing law. Article 57(2)(a)(i) AP I requires those who plan or decide an attack to do everything feasible to verify the target. The UK Manual states the corollary plainly: commanders must assess the “information from all sources which is available to them at the relevant time” when deciding upon an attack (JSP 383, at para. 5.3.4, p. 54), and may have regard to intelligence reports, aerial or satellite reconnaissance and any other information in their possession (JSP 383, at para. 5.32.2, p. 82). A system that confines the commander to one stream fails a standard the law already set. A system that fuses many streams serves it.

Distinction and verification

At the verification station, AI changes the factual basis of the judgement without touching the rule. Geairon captures this exactly: these systems structure what decision-makers can “foresee, compare and justify” ex ante, and so recalibrate the factual basis of legal judgement while leaving the legal rules untouched (Geairon, lead section). Distinction still asks the same question. AI changes the evidence the commander brings to it.

The risks Geairon names — data gaps, opacity and over-reliance on technical outputs (Geairon, lead section) — are real, but they are engineering and training problems, not an inevitable slide into deference. We have empirical reason to doubt the assumption that trained military operators will simply defer to whatever an AI presents them. Lopez and colleagues, interviewing fourteen United States Air Force F-35A pilots — instructors, flight leads, wingmen — about trust, ethics and autonomous teaming, found that trust was not a default. The pilots conditioned it on being able to understand the system’s processing, and they framed the question inside the disciplinary architecture they already operate in, the Rules of Engagement and the Uniform Code of Military Justice (Lopez et al., section 5.2). Over-reliance is a failure mode to be designed and trained against, not a feature of human nature.

Proportionality: stating the erosion thesis and answering it

Proportionality is where the strongest objection lives, and it deserves its strongest statement. Call it the erosion thesis: the claim that AI DSS, by quantifying and accelerating the assessment, gradually displace the qualitative human judgement that proportionality requires. Dorsey makes this case carefully. She identifies three mechanisms. Automation bias is the tendency to over-trust a machine’s output. Anchoring is the pull of an early figure — a casualty estimate, a policy threshold — on the judgement that follows. Cognitive offloading is the handing of mental work to the system until the human stops doing it. At machine speed, she argues, the room for moral and contextual reasoning contracts, and human oversight risks becoming “little more than a procedural rubber stamp” (Dorsey, at p. 1068).

I take this seriously. But notice where Dorsey locates the harm. Her remedy is not deliberation at any price. She objects specifically to designs that accelerate the decision “to the point where opportunities for critical assessment are reduced or eliminated” (Dorsey, at p. 1071), and asks that the deliberative space — what she calls cognitive friction, meaning the moments of effort that make a human actually weigh a recommendation rather than wave it through — be preserved rather than engineered out. Her objection is to a threshold being crossed.

That is the whole argument, and it cuts toward my conclusion rather than against it. If the harm arises when a particular threshold is crossed, then the harm is a property of the design, not of the technology. Klonowska reaches the matching conclusion from the other side: systems that offer the commander several courses of action promote the contestation and reflection that reasonable outcomes require, while systems that produce a single output tend to suppress them (Klonowska, concluding section). The erosion thesis, taken to its own logical end, is an argument about how to build these systems, not a reason to reject them.

Three further objections — and why they collapse into the same answer

Three lines of argument associated with the ICRC’s observations on Arthur Holland Michel’s external report sharpen the erosion thesis and deserve direct engagement. Each, on examination, makes the same misstep.

The first is Holland Michel’s claim that fusing multiple imperfect ISR sources expands the total uncertainty of the output (Holland Michel, at p. 33). The claim has the wrong sign. Combining independent sources that point in the same direction reduces the uncertainty of the conclusion they jointly support; this is the elementary logic of corroboration on which all-source intelligence has always depended. Holland Michel’s own annex concedes the point by defining intelligence fusion as a process that correlates information from disparate sources because battlefield data are only meaningful when read against other sources. What he has actually identified is the data-quality problem familiar to every IT system (garbage in, garbage out), which is pre-existing and domain-general. It is not a property of fusion, and it is not a reason against AI.
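The corroboration point can be made concrete with a small worked calculation. What follows is only an illustrative sketch: the prior, the likelihood ratios and the function name are hypothetical, chosen for arithmetic convenience, and are not drawn from Holland Michel’s report or from any fielded system. It also assumes the sources are conditionally independent, which is precisely the assumption that separates genuine corroboration from one error repeated several times.

```python
def posterior(prior: float, likelihood_ratios: list[float]) -> float:
    """Probability that a hypothesis is true after a set of source reports.

    Each likelihood ratio is
        P(report | hypothesis true) / P(report | hypothesis false).
    ASSUMPTION: the reports are conditionally independent, which is what
    allows the ratios to be multiplied together.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical figures: an even prior and imperfect sources, each only
# three times likelier to report "military objective" when that is true.
prior = 0.5
one_source = posterior(prior, [3.0])                 # a single stream
three_agreeing = posterior(prior, [3.0, 3.0, 3.0])   # three corroborating streams
uninformative = posterior(prior, [1.0, 1.0, 1.0])    # "garbage in": no signal
conflicting = posterior(prior, [3.0, 1.0 / 3.0])     # two streams that disagree

print(one_source, three_agreeing, uninformative, conflicting)
```

On these assumed figures, three agreeing sources leave the commander more certain than any one of them alone, sources that carry no signal leave him exactly where he started, and sources that conflict cancel out. Fusion narrows uncertainty where the inputs are independent and informative; it cannot manufacture certainty from inputs that are neither, which is the data-quality problem rather than a defect of fusion.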

The second is that a complex DSS consolidates several previously human-involved steps into a single output, so that the user’s role is “reduced to either approving or negating a proposed plan for the use of force” (ICRC, Observations on the External Report, at p. 6, quoting Holland Michel). The unstated premise is that legal reasonableness requires human judgement at every step. It does not. The law attaches the obligation to the decision — the Article 57 judgements of verification, proportionality and precaution — not to each computational sub-task feeding it. A commander has never personally performed every step that produces his picture: weaponeering, image analysis, CDE inputs, source weighting and translation have always been distributed across staff and tools. What the law requires is genuine judgement at the legally decisive points on a picture the commander can interrogate. Consolidation is a problem only where it eliminates that interrogation, which is again a function of design and use.

The third, more fundamental, is that “by transferring an assumption that is a fundamental aspect of a human decision from the human to a machine, human responsibility for that decision is potentially diminished” (Holland Michel, at p. 39). This conflates making an assumption with bearing responsibility for the decision built on it. Responsibility under IHL attaches to the person who decides to attack, not to whoever — or whatever — supplied each component of the underlying picture. If the principle were otherwise, every staffed military decision in history would already be suspect, since commanders have always relied on assumptions absorbed by their J2, J3 and TARGCO. They are not relieved of responsibility thereby; they are required to interrogate that work to the standard a reasonable commander would. The reasonable-commander standard already has the doctrinal apparatus for this. The migration of assumptions to a machine makes the duty harder to discharge where the machine is opaque. It does not transfer the duty off the commander.

Read together, these three objections converge on the same answer that the ICRC’s observations themselves contain. The user, the ICRC writes, must be able to scrutinise the available information independently, take account of the system’s capabilities and limitations, and retain the operational capacity to override its outputs (ICRC, Observations on the External Report, at p. 6). That is not an argument against AI DSS. It is a description of what a system used reasonably looks like — and it is what the standard already requires.

Feasible precautions and the decision space

The same reasoning governs precautions, and here I want to meet the commander’s real concern head-on. Article 57(2)(a)(ii) AP I requires all feasible precautions in the choice of means and methods to avoid or minimise civilian harm; “feasible” is what is practicable or practically possible, taking into account all circumstances ruling at the time, including humanitarian and military considerations (JSP 383, at para. 5.32, n. 191, p. 81). The commander does not want a system designed to slow him down; he wants to act inside the enemy’s decision cycle, not behind it. The objection to “preserving deliberation” is that it sounds like a brake.

It is not, and this is the heart of the operational case. A well-designed AI DSS does not buy deliberation by adding delay. It buys deliberation by removing cognitive load — by fusing the picture, vetting the inputs, and presenting accurate, actionable options — so that the commander reaches a better-informed decision in less time, not more. The Lewis and Ilachinski functions are precautions of exactly this kind: each lets the commander take more care without taking longer. The decisive edge that AI DSS promise is real, and it is fully compatible with the standard, because a commander who decides faster on a verified picture is deciding more reasonably, not less.

That reframes the “decision space” debate. Whether AI DSS widen the commander’s room for judgement or shrink it turns on design and use, not on the technology. And it disposes of the inferential leap at the centre of the erosion narrative. Even if the reports on Gospel and Lavender could be trusted, they would not support the sweeping conclusion that all AI DSS inevitably drive commanders to rubber-stamp recommendations in rapid succession. A system that produces that behaviour suffers from a serious but avoidable design fault. The remedy is to build it correctly, and the reasonable-commander standard tells us what correct looks like.

III. The defender’s side: Article 58

Section II argued the attacker’s case under Article 57. The defender owes a different duty, and it is worth treating in its own terms — not least because the affirmative case for AI DSS is at its strongest here, and at its most under-developed.

A different obligation on a different benchmark

Article 58 AP I requires the parties to a conflict, to the maximum extent feasible, to remove civilians from the vicinity of military objectives, to avoid locating military objectives within or near densely populated areas, and to take the other precautions necessary to protect the civilian population, civilians and civilian objects under their control against the dangers resulting from military operations. Galić states the duty plainly (Galić Trial Judgment, at para. 61), and the ICTY Final Report on the NATO Bombing Campaign restates it in the same terms (at para. 51). The text and the case law are stable and uncontroversial.

What is less often noticed in the AI DSS literature is that Article 58 operates on a different feasibility benchmark from Article 57. Schmitt and Schauss capture the structural point: Article 57 imposes “active precautions” — what those who plan or decide an attack must do — while Article 58 imposes “passive precautions” on a defender to “the maximum extent feasible” (Schmitt and Schauss, at p. 177, n. 124). The two regimes share a common root in the concept of feasibility but apply at different points and to different actors. The Article 58 duty is also independent: a defender’s failure to discharge it does not relieve the attacker of distinction or proportionality, and civilians improperly located within or near military objectives still count in the proportionality equation (Galić Trial Judgment, at para. 61; ICTY Final Report, at para. 51).

Where the affirmative case for AI DSS is strongest

The Article 58 assessment differs from the Article 57 assessment in three operationally important ways. The decision-maker is assessing his own civilians and his own infrastructure, not an adversary’s. The time horizons are typically longer — planning, mapping, evacuation, sheltering. And the legal benchmark is more demanding. These are exactly the conditions in which AI DSS can carry the defender’s argument furthest.

Greipl makes the operational case concrete. AI can be used to map critical civilian infrastructure — hospitals, power plants, water sources — and the interconnectedness between them, a step she describes as crucial to reducing the impact on civilians and their livelihoods (Greipl, section “AI to Enhance Civilian Harm Mitigation Measures”). The defender’s planner who can see where a hospital draws its power, which substation feeds the water-pumping station, and which neighbourhoods depend on each, has a far better grip on what “the maximum extent feasible” actually requires than the planner without such tools.

Geairon, writing from a humanitarian-law perspective, lands on the matching point. Protection of critical civilian infrastructure, she argues, depends less on the technical performance of AI than on how these tools are embedded in rigorous, well-documented legal decision-making processes (Geairon, section “Human judgment under algorithmic influence”). The defender-side affirmative case is therefore not a claim about the technology in the abstract. It is a claim about institutional design — the same conditional structure that runs through this post — applied to a different obligation on a different benchmark.

Greipl makes one further observation that bears repeating. Governments have, she writes, shown a preference for “faster and more precise targeting decisions rather than the protection of civilians more broadly” (Greipl, opening section). That is a diagnosis worth heeding. The Article 58 use case is the one the dominant erosion narrative has least to say against, and the one democratic states have the strongest doctrinal and political claim to develop. It is where AI DSS could most clearly strengthen the reasonable commander’s capacity to meet his legal duty, by giving him the picture of his own population and infrastructure that the duty already assumes he must have.

IV. Closing the loop and keeping the commander reasonable

The reasonable-commander standard is not a one-shot test. Each engagement produces a record that ought to feed the next. The standard rewards an institution that learns — not an algorithm that learns, but a force whose people, procedures and tools improve from one operation to the next. AI DSS belong inside that loop, and the loop is what keeps the standard meaningful as use accumulates.

The ex post loop

Kwik’s account of “iterative assessment” gives the loop its sharpest articulation. He frames AI-related uncertainty as dynamic rather than static — many failure modes only become knowable ex post — and proposes that decision-makers systematically integrate post-deployment insight into the next assessment, “repeatedly refining decisions or actions” on the basis of accumulated experience and evolving understanding, with continuous improvement as the aim (Kwik, at p. 209).

The crucial point is that this is not novel. Kwik himself observes that the ex post discipline he describes is already embedded in established operational planning and targeting practice — the NATO targeting cycle ends in an “Assessment” phase, and the broader doctrine of “lessons learned” treats learning from prior iterations as a fundamental element of operational art (Kwik, at p. 222). The AI case does not invent a new institutional habit; it extends one that operational doctrine has insisted on for decades.

Dorsey reaches the same place from the proportionality side. She calls for aligning the metrics used in ex ante civilian-harm estimates with those used in ex post battle-damage assessment and after-action reporting, so that the lessons from one strike inform the next (Dorsey, at p. 1047). Read together, these treatments converge on a structural claim about AI DSS and the standard: the ex post record sustains ex ante reasonableness across iterations. The institutional infrastructure I examined in my previous post (the operational records, legal-adviser memoranda and documentation that make the ex ante test reviewable) is the same infrastructure that closes this loop forward.

The conditions

The affirmative case I have made in this post is conditional, and the conditions are worth naming plainly. System design must expose its assumptions and support contestation, so the commander can interrogate rather than merely approve. The operator must possess the AI literacy and training to do that interrogation. The institution must run the ex post/ex ante loop, not treat lawfulness review as a one-off. And the decision moment must preserve deliberation — not at any price, but to the point where critical assessment remains possible.

These conditions are not technology-determinist. They are how design and use either meet the reasonable-commander standard or do not. Where they fail, the standard will judge the assessment as one no reasonable commander could have made; the zone of reasonableness is not infinitely elastic, and a poorly designed or poorly used AI DSS will not stretch it. Where they hold, the standard will recognise the commander as having done what a reasonable commander could.

The accountability anchor

This is what keeps the standard enforceable as AI DSS use accumulates. IHL applies to individuals, not to systems. As Barrett-Taylor and Karner observe — drawing on the March 2025 ICRC submission to the UN Secretary-General — it is humans, not the technologies they use, who plan, decide upon and execute attacks, and on whom accountability for those determinations rests (Barrett-Taylor and Karner, at p. 31). The duty does not migrate to the machine, however much the picture or the recommendation does.

Barrett-Taylor and Karner state the corollary directly: “the burden of responsibility remains on the commander”, who is expected to make decisions that are defensible and grounded in military necessity (Barrett-Taylor and Karner, at p. 36). Boutin reinforces the same allocation from the State-responsibility side: where an AI failure can be traced to negligent human conduct attributable to the State, the State bears responsibility — responsibility tracks human decision-making, not the machine (Boutin, at p. 328).

Read together, these three propositions close the argument. The duty does not transfer from the commander to the system; the ex post record sustains reviewability across iterations; and the conditions of design, training and procedure determine whether a particular use falls inside or outside the zone of reasonableness. The reasonable-commander standard therefore remains both meaningful and enforceable in the AI DSS context — not by surviving despite AI, but by doing in the AI context what it has always done: testing the quality of the human assessment against the information reasonably available, and holding the human accountable for it.

Conclusion

The reasonable-commander standard was built to do something the law rarely asks of itself: to test a human judgement made under uncertainty, on the information available at the time, against the weight of military advantage and civilian harm. It has tested that judgement through four decades of armed conflict, across tribunals and military manuals, without ever supplying a single correct answer. It defines a zone of reasonableness — bounded, reviewable, and not collapsible into a value the law itself fixes on a scale. AI DSS do not threaten that test. They enter a structure the standard has always accommodated.

Properly designed and properly used, AI DSS strengthen the commander’s capacity to meet the standard. They widen the information set the standard already requires him to assemble. They support the verification that distinction already demands. They free his attention from gathering the picture so that he can interrogate it. They give the defender, in particular, the picture of his own population and infrastructure that Article 58 has always required him to construct. The erosion thesis, taken to its own logical end, names a set of design failures the standard already forbids. None of this is new doctrine. It is the existing doctrine, applied to a new generation of tools.

This case is conditional, and the conditions matter. System design must expose its assumptions and support contestation. The operator must possess the AI literacy to interrogate what the system presents. The institution must run the ex post/ex ante loop, not treat lawfulness review as a one-off. And the decision moment must preserve deliberation enough that critical assessment remains possible. Where these conditions hold, the zone of reasonableness accommodates AI-assisted judgement. Where they fail, it does not — and a poorly designed or poorly used system will not stretch the zone to fit. The standard does the work it always did: it tests the assessment against what a reasonable commander could have made.

The duty does not migrate. Whatever the system fuses, whatever it recommends, whatever it consolidates, responsibility for the attack remains with the commander. That is what makes the standard enforceable as AI DSS use accumulates. The commander cannot offload his decision to the machine because the law does not allow him to — and the ex post record that documents his assessment is the same record that sustains the standard’s reviewability into the next operation, and the one after that.

Having asked how the reasonable commander reaches a sound assessment, my next post turns to what the law requires when his information runs out: the presumption of civilian status under Article 50(1) of Additional Protocol I, and the commander’s burden when doubt cannot be resolved.

About the author

With more than 25 years of experience, Andreas Leupold is a lawyer trusted by German, European, US and UK clients.

He specializes in intellectual property (IP) and IT law and the law of armed conflict (LOAC). Andreas advises clients in the industrial and defense sectors on how to address the unique legal challenges posed by artificial intelligence and emerging technologies.

A recognized thought leader, he has edited and co-authored several handbooks on IT law and the legal dimensions of 3D printing/Additive Manufacturing, which he also examined in a landmark study for NATO/NSPA.
