Black Box, Burden of Proof: Who Carries the Risk of AI Opacity?

“It is putting a very high price on one’s conjectures to have someone burned for them.”

…Montaigne

A system labels someone high risk, a facial-recognition tool produces a match, a ranking system lowers a worker’s visibility, a predictive model recommends heightened scrutiny. What follows is often treated as ordinary procedure, and the affected individual is expected to challenge the outcome. Yet in many cases, they are asked to do so without meaningful access to the logic, weighting, assumptions, or methodological choices that produced it. In most areas of litigation, this would appear unusual. A party seeking to rely on an instrument, methodology, dataset, or expert opinion is ordinarily expected to disclose enough information to permit meaningful scrutiny of its reliability. Opaque AI systems, however, increasingly operate on different terms. Their outputs are sometimes presented with the authority of technical objectivity, while the basis upon which those outputs are generated may remain inaccessible.

The issue is not simply one of transparency. In practice, such systems may also alter how evidential uncertainty and procedural burden are distributed between parties. European courts have already confronted versions of the problem in concrete terms. In the SyRI case, the Dutch government used an algorithmic system to identify potential welfare fraud. Individuals could be flagged as high risk without any meaningful insight into how that assessment had been produced: what data were relied upon, how indicators were weighted, or how conclusions were drawn. The system was ultimately struck down on human rights grounds, with the District Court of The Hague holding in 2020 that the legislation was incompatible with Article 8 of the European Convention on Human Rights. But the deeper concern raised by the case extended beyond privacy or data protection. It was also procedural. The state was able to rely on algorithmically generated suspicion, while the individuals affected were left attempting to contest conclusions whose basis remained substantially obscured.

Similar concerns appear in the Italian administrative algorithm cases, where the administrative courts emphasised that algorithmic decision-making affecting rights or legal interests must remain sufficiently transparent and intelligible to permit meaningful challenge. Importantly, transparency was not treated as an abstract virtue. Rather, it was tied to the broader requirement that individuals retain a genuine ability to contest decisions that materially affect them.

A related logic can also be seen in the British facial-recognition litigation. In R (Bridges) v Chief Constable of South Wales Police, the Court of Appeal held that the legal framework governing the deployment of facial-recognition technology was inadequate because it afforded excessive discretion regarding who could be placed on watchlists and where the technology could be used. In Thomson and Carlo v Commissioner of Police of the Metropolis, by contrast, a revised Metropolitan Police policy survived scrutiny largely because tighter safeguards had been introduced, including more clearly defined watchlist criteria, authorisation requirements, and proportionality checks.

What is notable across these cases is that courts appear increasingly sensitive to the institutional risks associated with opaque and highly discretionary systems. Their responses, however, have largely focused on governance safeguards, transparency obligations, or proportionality review. They have been less willing to confront the evidential dimension directly: namely, who should bear the consequences of opacity when such systems are relied upon in legal or quasi-legal decision-making.

Existing evidentiary doctrine already points toward a possible answer. In U.S. federal courts, Federal Rule of Evidence 702 requires the proponent of expert evidence to demonstrate that the opinion rests on sufficient facts or data, is the product of reliable principles and methods, and reflects a reliable application of those methods to the case. In England and Wales, the Criminal Practice Directions 2023 (as amended) similarly treat reliability as dependent upon matters such as data quality, methodological validity, uncertainty, peer review, and the disclosure of information necessary to evaluate the opinion. These requirements are not peripheral technicalities. They reflect a deeper evidential principle: evidence should ordinarily be capable of meaningful interrogation by the opposing party.

That principle arguably becomes more important, rather than less, when AI systems enter the picture.

The EU AI Act reflects a similar intuition from a regulatory perspective. High-risk AI systems, particularly in areas such as law enforcement and public administration, are subject to obligations relating to documentation, logging, monitoring, and human oversight. From August 2026, affected individuals will also obtain rights to meaningful explanations regarding the role played by certain high-risk systems in decisions producing legal or similarly significant effects. Likewise, the Council of Europe’s 2024 Framework Convention on AI emphasises transparency, accountability, and effective remedies as central rule-of-law safeguards in the context of AI deployment.

These developments are significant because they increasingly recognise that opacity creates institutional and procedural risks. Yet they do not fully resolve the question of how those risks should be allocated once disputes reach adjudicative settings.

The tension is illustrated clearly by the American decision in State v Loomis. There, the Wisconsin Supreme Court permitted the use of a proprietary risk assessment tool, COMPAS, during sentencing while simultaneously acknowledging that the defendant's ability to challenge its scientific validity was constrained by the system's opacity. Significantly, however, the case also demonstrates that the difficulty is not simply the absence of evidential safeguards. U.S. evidence law already contains mechanisms, such as the gatekeeping logic associated with Daubert and reflected in Rule 702, through which courts assess methodological reliability and validity. Yet algorithmic systems are not always treated as triggering this level of scrutiny. In Loomis, COMPAS functioned less as expert scientific evidence subjected to rigorous methodological interrogation and more as a sentencing support tool accompanied by cautionary warnings. This suggests that existing evidential doctrines may be partially circumvented where algorithmic outputs are framed as advisory, administrative, or supplementary rather than as expert evidence requiring full adversarial testing. The court's response was accordingly cautionary: judges were warned against placing undue reliance on the tool. Yet caution alone does not resolve the underlying allocation of evidential risk. The asymmetry remains: one party continues to rely on a system whose operation cannot be meaningfully interrogated by the other.

But evidence law is not unfamiliar with asymmetry. In discrimination law, both within the EU and elsewhere, burdens of proof are already adjusted where one party controls access to the relevant information. Under Article 10 of the EU Directive on Equal Treatment in Employment and Occupation (Directive 2000/78/EC), once a claimant establishes facts from which discrimination may be presumed, the burden shifts to the respondent to prove that no breach of the principle of equal treatment occurred. The rationale is relatively straightforward: where proof of internal processes or motivations lies overwhelmingly within the control of one party, fairness may require evidential burdens to be recalibrated accordingly.

The same logic appears capable of extension to opaque AI systems. Where a party seeks to rely on an AI-assisted output that cannot be meaningfully tested, the practical burden should not rest entirely on the individual affected by it. That is particularly so where the relevant asymmetry results from choices relating to system design, contractual restrictions, disclosure practices, or institutional control over the relevant information. In such circumstances, it becomes difficult to characterise opacity as a neutral evidential condition.

The technical literature increasingly reinforces this point. Cynthia Rudin has argued that in many high-stakes settings, black-box systems are adopted despite the existence of more interpretable alternatives. Finale Doshi-Velez and Been Kim similarly emphasise that interpretability becomes especially important where AI systems affect questions of fairness, safety, or legal consequence.

Taken together, these developments increasingly undermine the idea that opacity is simply an inevitable by-product of technological sophistication. In many contexts, it is also the result of institutional and design choices with procedural implications. Hence, the allocation of evidential risk matters. Courts sometimes speak as though black-box systems merely create unfortunate evidential difficulties. But where one party selects the system, negotiates the contractual constraints, controls access to the documentation, and determines the conditions of disclosure, the resulting asymmetry is not naturally occurring. It is institutionally produced, and the law should respond accordingly.

Three implications follow. First, courts should recognise a structured duty of disclosure where AI-assisted evidence is materially relied upon. At a minimum, this should include information relating to the system's purpose, known limitations, validation studies, error rates, data characteristics, version history, and the role of human oversight in the specific case. The objective is not unlimited transparency, but ensuring that contestation remains practically meaningful rather than merely formal. Second, courts should be more willing to scrutinise the use of opaque systems in high-stakes adjudicative settings where reasonably interpretable alternatives are available. If opacity is avoidable, the decision to rely on it should carry evidential consequences. Third, where a party with superior access refuses meaningful disclosure, courts should be more prepared to draw adverse inferences (see the Deliveroo Italy case), reduce evidential weight, grant procedural accommodations for independent expert review, or, where necessary, exclude the evidence altogether.

Black-box AI does not merely complicate factual assessment. It may also redistribute legal and evidential risk in ways that place affected individuals at a structural disadvantage. If courts are serious about procedural fairness, the relevant question is no longer whether opacity is regrettable, but whether the costs of that opacity should continue to be borne primarily by those least able to penetrate it.