Abstract
Algorithmic risk assessment tools, such as COMPAS, are increasingly used in criminal justice systems to predict the risk of defendants to reoffend in the future. This paper argues that these tools may not only predict recidivism, but may themselves causally induce recidivism through self-fulfilling predictions. We argue that such “performative” effects can yield severe harms both to individuals and to society at large, which raise epistemic-ethical responsibilities on the part of developers and users of risk assessment tools. To meet these responsibilities, we present a novel desideratum on algorithmic tools, called explainability-in-context, which requires clarifying how these tools causally interact with the social, technological, and institutional environments they are embedded in. Risk assessment practices are thus subject to high epistemic standards, which haven't been sufficiently appreciated to date. Explainability-in-context, we contend, is a crucial goal to pursue in addressing the ethical challenges surrounding risk assessment tools.
1. Introduction
Algorithmic risk assessment tools such as Equivant's COMPAS (Equivant 2019) are increasingly used in criminal justice systems around the world to help judicial actors evaluate defendants’ risk of reoffending in the future and to inform sentencing decisions. There has been extensive debate in recent years about whether these tools and other artificial intelligence systems exhibit problematic kinds of biases, including racial bias, and whether they achieve relevant fairness properties (e.g., Angwin et al. 2016; Arnold et al. 2021; Berk et al. 2023; Biddle 2020; Dieterich et al. 2016; Flores et al. 2016; Freeman 2016; Hedden 2021; Hellman 2020; Washington 2019; Zezulka and Genin 2023). In this paper, we highlight a related but distinct problem surrounding the use of risk assessment tools, which has to date been relatively neglected in the ethics of AI literature: performativity.
A risk assessment tool (RAT) is performative when its predictions about a defendant's future behaviors causally affect those behaviors. Performativity can arise in different forms, such as when a prediction is self-defeating or self-fulfilling (Buck 1963; van Basshuysen et al. 2021; Khosrowi 2023; van Basshuysen 2023; King and Mertens 2023). We focus on a particularly problematic version, where (1) a defendant's risk to reoffend is predicted as high, (2) the defendant is subsequently incarcerated, (3) the defendant engages in future criminal behaviors in line with the prediction, but (4) incarceration itself plays a central role in inducing these behaviors. In such a case, the RAT correctly classifies the defendant as high-risk, but it does so for the wrong reasons because the prediction is self-fulfilling in virtue of the criminogenic effects that incarceration can have (e.g., Lambie and Randell 2013; Stevenson 2017).1
While we are not the first to note the potentially self-fulfilling nature of risk assessments (e.g., Morgan et al. 1996; Hamilton 2015a; Sidhu 2015), we argue that this type of performativity creates previously underappreciated epistemic-ethical challenges, for two reasons. First, it imposes significant injustices on defendants and creates harms for society at large. Second, such injustices and harms are difficult to recognize empirically: the predictive performance of risk assessment tools is routinely validated against observational data, but this approach fails to unpack why predictions and actual courses of events agree.
The potential performativity of RATs hence points to serious deficiencies in the methodology used to evaluate these tools. To address these problems, we argue that RATs need to be complemented with sincere efforts aimed at explainability. Explainable artificial intelligence (XAI) seeks to help stakeholders understand how and why algorithmic systems come to make certain predictions, classifications, decisions, or recommendations and continues to attract significant attention from AI researchers, ethicists, civil rights scholars, and many others (e.g., Burrell 2016; Fleisher 2021; Selbst and Barocas 2018; Nyrup and Robinson 2022; Winter, Hollman, and Manheim 2023). Here, we propose a wider explainability desideratum we call explainability-in-context (EIC). In a nutshell, EIC aims to elucidate how an AI system interacts with the wider social, technological, and institutional landscape it is embedded in. In particular, we advocate that validating RATs requires taking a causal perspective that aims to understand whether and how these tools causally interact with a target to co-shape the very outcomes they serve to predict. In the case of RATs, an ideal version of this strategy would seek to distinguish between prediction-dependent and prediction-independent effects on recidivism, and ensure that both are included in risk assessments.
More broadly, an EIC perspective grounds the claim that performativity yields previously underappreciated epistemic-ethical responsibilities on the part of developers and users of RATs (cf. King and Mertens 2023): These tools aren't merely epistemic tools that provide one-way access to decision-relevant features of the world, but are performative tools that have the capacity to change the world, for better or worse. Developers and users must hence (1) epistemically, ensure that RATs provide information about potential performative effects and consider such information in decision-making, and (2) morally, be accountable for harms induced through departures from this constraint.2 While additional work is needed to study the challenges raised by performativity, to develop ways to alert institutional actors to its significance and the epistemic-ethical responsibilities that arise from it, and to design better strategies to mitigate harmful forms of performativity, EIC marks a first important step in guiding such efforts.
The discussion is organized as follows. Section 2 elaborates on the problem of performative predictions in the context of RATs in the criminal justice system. We identify the two main causal pathways that enable harmful performativity: (1) RATs can causally influence the type and severity of sentences, and (2) sentencing decisions can in turn causally affect downstream criminal behavior. Against this background, we highlight how harmful performative effects may arise but go unnoticed when validating the predictive performance of RATs using observational data. Section 3 outlines EIC as a general desideratum and elaborates the concrete demands it imposes for dealing with performative predictions responsibly. Section 4 sketches epistemic-ethical responsibilities arising on the part of developers and users to promote EIC for RATs. Section 5 concludes.
2. Can Risk Assessments Become Self-fulfilling Prophecies?
Consider the following example:
Tyler, an 18-year-old male from a disadvantaged neighborhood, is arrested for the possession of methamphetamine with intent to distribute. Even though he has no prior criminal record, Tyler's risk of recidivism, including violent recidivism, is assessed as high by a RAT. Based on the assessment, the judge rules out probation and passes the maximum sentence of five years in state prison. During his first year in prison, a cell mate blackmails Tyler and extorts money from him. Desperate to escape his predicament, Tyler joins a gang that promises protection in exchange for executing jobs for the gang later on. Within six months of his release, in a robbery gone wrong, Tyler fatally shoots a man.
In our vignette, the risk assessment becomes a self-fulfilling prophecy, because it triggers a chain of events that leads to Tyler shooting the man. Had the judge not based the verdict on the high estimated recidivism risk, Tyler may have been released on probation. He wouldn't have been blackmailed, wouldn't have joined the gang, and subsequently, he wouldn't have shot the victim. The example is stylized in order to clearly trace a causal chain of events, from the high risk assessment to the harsh sentence, and from Tyler's stint in prison to the shooting. In any real case, however, identifying similar causal links may be difficult and contestable. So, may real-world risk assessments actually be self-fulfilling in ways similar to our vignette? Rather than seeking to identify concrete cases where this might have happened, our aim here is to provide a proof-of-concept by identifying two causal pathways which, taken together, establish a possible way in which risk assessments can be self-fulfilling: first, risk assessments may affect judicial decisions regarding the type and severity of sentences; and second, time spent in prison may have criminogenic effects, that is, it can change individuals’ propensity to reoffend. Let's consider these causal pathways in turn.
2.1 Do Risk Assessments Affect Sentences?
Whether risk assessments may be used to inform sentencing is a controversial issue in ethics, law, and public policy. When determining whether someone will be released on parole or sent to prison, or for how long, the permissibility of drawing on risk assessments will depend on the theory of sentencing one endorses. According to some theories, sentences ought to be exclusively determined by what should be regarded as the fair retribution for a crime, which rules out the use of forward-looking risk assessments in sentencing decisions (see Monahan and Skeem 2014; 2016). Other theories, in contrast, base sentencing on the consequences for the defendant and society. These theories thus allow for more forward-looking sentencing, including the use of risk assessment, as do hybrid theories that combine retributivist and forward-looking considerations in sentencing (see Monahan and Skeem 2014; 2016; Hamilton 2015b). But even when we find forward-looking punishment permissible, as many influential theories of sentencing (e.g., Morris 1974) in fact do, this doesn't imply that the use of specific tools, such as COMPAS, is permissible for sentencing, since, for instance, these tools might not be designed or fit for such purposes (cf. State v. Loomis 2016, 100).
Notwithstanding prevailing disagreement, actual legal practice in the US commonly includes, and sometimes mandates (in the pretrial context; see Garrett and Monahan 2020), information from RATs in pre-sentence investigation reports (PSIs) that are provided to sentencing courts (Casey et al. 2011). Such PSIs are widely used in helping judges arrive at sentencing decisions (though there is also substantial resistance, see e.g., Pruss 2023). This practice is fully licensed in many US states; as Monahan and Skeem (2014) point out, a number of states have explicitly “[. . .] incorporated risk assessment into sentencing guidelines as one factor that judges may consider in determining the appropriate sentence within the limits established by law” (2014, 495, emphasis original). While judges are widely permitted to use information from RATs, whether they in fact draw on such information or ignore recommendations made by RATs is a different issue. Existing studies regarding whether information from RATs is indeed impactful in shaping sentencing decisions document substantial variation between jurisdictions and individual judges; but they also indicate that, at least by their own reports, judges do in fact use information from RATs; and they do so not only to pass milder, non-custodial sentences on low-risk defendants, but also, albeit to a lesser extent, harsher sentences on high-risk defendants (Garrett and Monahan 2020; Stevenson and Doleac 2022).
Let us turn to a specific, much-discussed case as an example for how information from RATs may shape a judicial decision for high-risk defendants: That of Eric L. Loomis.3 Loomis was accused of being the driver in a drive-by shooting. His PSI featured COMPAS risk scores, according to which Loomis presented a high risk of recidivism, including of violent recidivism. Even though it was emphasized in the PSI that these scores shouldn't be used for determining probation or the severity of the sentence, it appears that the responsible circuit court, following an argument by the state attorney, did both. First, the court made reference to the COMPAS scores when ruling out probation:
You're identified, through the COMPAS assessment, as an individual who is at high risk to the community. In terms of weighing the various factors, I'm ruling out probation because of the seriousness of the crime and because your history, your history on supervision, and the risk assessment tools that have been utilized, suggest that you're extremely high risk to re-offend. (State v. Loomis 2016, 19, emphasis added)
Based on the court's language, we think it is plausible that information from COMPAS was at least among the factors that causally contributed to ruling out probation for Loomis.
Whether the court also used the scores for determining the length of the sentence is less clear. After receiving the maximum penalty on two counts (six years of imprisonment and five years of extended supervision), Loomis filed a motion soliciting a new hearing, arguing that the court's consideration of the COMPAS risk scores at sentencing violated his due process rights (State v. Loomis 2016, 23). Perhaps surprisingly, the court denied the motion not by arguing, against Loomis, that its use of the risk scores at sentencing didn't violate his due process rights. Rather, the court denied that the risk scores had contributed to determining Loomis's sentence in the first place, explaining it had only “used the COMPAS risk assessment to corroborate its findings and that it would have imposed the same sentence regardless of whether it considered the COMPAS risk scores” (State v. Loomis 2016, 28, emphasis added). The court's reasoning here is unconvincing, however. To rule out that the scores contributed to determining the sentence, the relevant counterfactual to consider is not only whether the court would have imposed the identical sentence regardless of whether it considered the COMPAS scores, as the court contended, but also whether it would have imposed the identical sentence had the scores been different. Since the court remained silent about these counterfactuals, its reasoning fell short of fully responding to Loomis's due process challenges. Furthermore, the Wisconsin Supreme Court indirectly confirmed the circuit court's use of COMPAS at sentencing, while however denying that this had violated Loomis's due process rights: “Ultimately, we disagree with Loomis because consideration of a COMPAS risk assessment at sentencing along with other supporting factors is helpful in providing the sentencing court with as much information as possible in order to arrive at an individualized sentence” (State v. Loomis 2016, 765). 
The Supreme Court's ruling thus licensed the use of COMPAS at sentencing, only circumscribing its use by requiring that “[a] COMPAS risk assessment is only one of many factors that may be considered and weighed at sentencing” (State v. Loomis 2016, 769). Summing up, the circuit court, in this case, made use of COMPAS scores in ruling out probation and, arguably, in determining the severity of the sentence; and the Wisconsin Supreme Court licensed this use.
Moving beyond State v. Loomis, risk assessments are frequently referenced when determining probation or the severity of sentences. Large-scale comparative and survey studies document that RATs frequently fail to achieve their envisioned effects on the criminal justice system (Stevenson 2018), in particular reducing incarceration rates and decreasing recidivism while improving public safety. They also show that judges frequently pass sentences that deviate significantly from RAT recommendations (Stevenson and Doleac 2022). Nevertheless, these studies demonstrate that RATs play recognizable roles in judicial decision-making (Garrett and Monahan 2020), including in determining sentence type and severity or duration, and that there is ample scope for their use to become more frequent and entrenched.
2.2 Does Incarceration Have Criminogenic Effects?
Let us turn to the second causal pathway, which concerns the effects of incarceration on recidivism. The criminogenic effects of incarceration have been investigated by researchers for decades, and there is now a substantial empirical and theoretical literature that documents and analyses such effects. For instance, a host of empirical studies have made progress on clarifying the criminogenic effects of imprisonment by estimating differences in recidivism between otherwise similar defendants who were incarcerated, received a suspended sentence, or were sentenced to community service (standalone or as probation) (Cid 2009; Clear 2008; Spohn and Holleran 2002; Vieraitis et al. 2007). Relatedly, other studies find similar effects for shorter pre-trial detention periods (Lowenkamp et al. 2013; Gupta et al. 2016; Heaton et al. 2017), suggesting that even short durations of incarceration can yield recognizable differences in defendants’ future propensity to (re-)offend.
These empirical efforts are complemented with a wide range of attempts to explain the relationship between imprisonment and (future) criminal behavior. These range from atheoretical, common sense causal narratives to theory-driven attempts to explain how different factors bear on criminal behavior and how imprisonment, in turn, can intervene with these in negative ways. Broadly, imprisonment is often argued to be conducive to recidivism by disrupting family ties and offenders’ social networks; worsening offenders’ mental health, especially when imprisonment does not involve rehabilitative efforts; and by making it more difficult for offenders to secure housing and employment upon release. Moreover, especially for young offenders, imprisonment is often believed to foreclose opportunities to build social capital relative to their peers. Additional worries focus on the effects of being exposed to other inmates with high criminal potential, which may draw susceptible offenders towards criminal behaviors in the future. All of these factors, plausibly, can make it difficult for offenders to re-enter society in a way that protects them from being drawn towards criminal behaviors, such as in our initial vignette. Some of these concerns have been put on theoretical grounding, too. For instance, labeling theory (Paternoster and Iovanni 1989) seeks to describe the ways in which ex-prisoners’ being labeled as such can drastically change not just formal but also informal interactions with society on re-entry, worsening offenders’ prospects at successful re-integration in ways that are difficult to detect, for example through subtle stereotyping and associated prejudice. While these attempts at explaining empirically diagnosed relationships between incarceration and criminal behavior seem largely compelling, we do not commit here to any of the more specific mechanisms offered. 
Our arguments remain effective even if the true causal mechanisms by which incarceration influences criminal behavior remain unknown.
In sum, there are both empirical and theoretical reasons to believe that incarceration can causally affect individuals’ propensity to reoffend in the future. Because of this, using RATs to inform pre-trial judicial decisions and sentencing can sometimes yield self-fulfilling performative effects. To be sure, we are not the first to emphasize this as a problem: The performative potentials of risk assessment have been recognized in the existing literature, both in the general context of judicial decision-making as well as in the specific context of using RATs (Hamilton 2015; Barabas et al. 2018). Our arguments differ from existing contributions, however, in that we specifically focus on the epistemic-ethical problems that arise when RATs are presented and used as putatively predictive tools. On our account, RATs’ functioning can significantly extend beyond this role and they are hence better understood as performative instruments that do not only provide one-way epistemic access to relevant features of the world (e.g., defendants’ robust propensities to reoffend), but also have the capacity to actively shape the world, including in epistemic-ethically problematic ways. This, in turn, yields underappreciated epistemic-ethical obligations on the part of developers and users of RATs, as we argue below.
2.3 Linking the Pathways
We are now in a better position to understand the problems performativity can pose for the responsible use of RATs. The first causal pathway captures how RATs can causally contribute to whether defendants are incarcerated and for how long. The second causal pathway, in turn, makes clear how the criminogenic effects of incarceration enable risk assessments to yield self-fulfilling performative effects, inducing rather than just predicting future criminal behavior. Both pathways are necessary for harmful performative effects, but not yet sufficient for such effects actually obtaining. There are both conditions and choices that can affect whether performative effects in fact materialize.
To better understand how such effects can come about, let us consider some stylized cases, which illustrate how a RAT implemented in a criminal justice system may come to induce harmful performative effects. Before RATs can be used in judicial decision making, they must be tested and validated to show that they can provide accurate assessments of recidivism risk. Without such validation, decision-makers would likely not, and should not, use RATs at all. But validating the predictive performance of RATs against available observational data is susceptible to turning features of defendants that are otherwise causally and probabilistically unrelated to recidivism into good predictors of it, though only as a result of self-fulfilling performative effects. Let us consider two cases in turn to illustrate how this might happen.
Assume that judicial decisions about pre-trial detention, parole and imprisonment in some past period (say, the 1950s–1980s) were widely based on a range of individual characteristics (or variables) that judges believed to be relevant to defendants’ recidivism risk. Let us assume that at least some of these variables were neither causally nor statistically relevant for recidivism but were still frequently included to make informal risk assessments. Let us take one variable, G, which tracks a defendant's gender and assume that some fraction of decision-makers believed that males were more likely to reoffend than females, all else equal. Let us also imagine that this assumption was wrong, perhaps induced by confirmation bias, cherry-picking of notable cases, and so on. So, gender may have been a tie-breaker between defendants being incarcerated or not in a substantial number of cases, but wrongly so. Why is this a problem? First, if gender does not have incarceration-independent statistical or causal relationships to recidivism, then it is simply wrong, epistemically (as well as morally and legally), for judicial decision-makers to have used it in decision-making. But, more pertinently for our present arguments concerning RATs, using gender in this way can also establish a statistical relationship between gender and recidivism that becomes important when developing and validating RATs later on. Let's fast forward in our stylized mini-world to the 2000s. If a software developer building a RAT draws on observational data of defendant characteristics and post-trial criminal history, they will see that gender and recidivism are correlated, perhaps even when controlling for other factors. And this is not just a statistical fluke. If gender played an important role in prior sentencing decisions, and incarceration has criminogenic effects, then gender is indeed a good predictor of recidivism because it is now causally relevant for recidivism. 
But, problematically, it is the wrong variable to focus on. In an ideal setting, with varied and large enough datasets, ‘controlling for’ imprisonment status or sentence type could, at least in principle, shed some light on whether the correlation between gender and recidivism is due to imprisonment(-type) or not. But unfortunately, no risk assessment score or tool we are aware of has been developed to take this into account. Pertinently, the development and validation of Equivant's controversial COMPAS tool were “strongly influenced” (Equivant 2019, 13) by the Offender Group Reconviction Scale (OGRS) offered by Copas and Marshall (1998). While Copas and Marshall explicitly consider criminogenic effects of incarceration, they note that their approach “[. . .] does not condition on the sentence given and so if a sentence has an effect on reconviction then this effect is not taken into account” (1998, 170). Equivant (2019) do not comment at all on whether the estimation strategy underlying COMPAS improves on Copas and Marshall's approach, so there are reasons to worry that it does not.
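The stylized mini-world can be made concrete with a small simulation. The following is a minimal sketch, not an account of how any actual RAT is built or validated: the function names and all probabilities are illustrative assumptions. Gender influences only the custody decision, custody alone influences recidivism, and yet gender appears predictive in the raw data until one conditions on custody.

```python
import random

def simulate(n=100_000, seed=0):
    """Simulate a stylized world in which gender affects recidivism only via custody.

    All probabilities are illustrative assumptions, not empirical estimates.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        male = rng.random() < 0.5
        # Judges wrongly treat being male as a risk factor when deciding on custody.
        incarcerated = rng.random() < (0.6 if male else 0.3)
        # Recidivism depends on incarceration only, never directly on gender.
        reoffended = rng.random() < (0.5 if incarcerated else 0.2)
        rows.append((male, incarcerated, reoffended))
    return rows

def recidivism_rate(rows, male, incarcerated=None):
    """Share of reoffenders among rows matching gender and, optionally, custody status."""
    outcomes = [r for m, i, r in rows
                if m == male and (incarcerated is None or i == incarcerated)]
    return sum(outcomes) / len(outcomes)

rows = simulate()
# Marginal gap: gender looks predictive in the raw observational data.
marginal_gap = recidivism_rate(rows, True) - recidivism_rate(rows, False)
# Gap among incarcerated defendants only: the association should largely vanish.
custody_gap = recidivism_rate(rows, True, True) - recidivism_rate(rows, False, True)
print(f"marginal gap: {marginal_gap:.3f}, gap given custody: {custody_gap:.3f}")
```

Under these assumed parameters, the expected marginal gap is 0.38 - 0.29 = 0.09, while the expected gap among incarcerated defendants is zero. A validation exercise that looks only at the marginal association would therefore credit gender with predictive value that stems entirely from past sentencing practice.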
Importantly, the case described here is one where a RAT itself is faultless at least insofar as it does not itself induce the relationship between gender and recidivism. Its malfunctioning is only parasitic on earlier, human decision-making, which ‘baked’ this relationship into the observational data a RAT is calibrated on and tested against. But we can easily imagine more severe cases where a RAT itself induces such statistical relationships. Social science methodologists and statisticians have known for more than a century that spurious correlations are a threat to sound predictive (and causal) inference, so all it takes is a variable Z that a developer considers a candidate for predicting recidivism, and once Z is included in a RAT that is deployed, this may cement or intensify Z's predictive, and ultimately causal, relevance by virtue of performative effects.
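How a deployed tool might cement such a relationship can be illustrated with a deliberately simple feedback loop. This is a toy sketch under strong assumptions: the function names, the parameters, and the threshold rule by which judges respond to the tool are all hypothetical, and the calculation uses expected rates rather than sampled data.

```python
def deployment_round(estimated_gap, base_custody=0.3, custody_boost=0.3,
                     base_reoffend=0.2, criminogenic=0.3):
    """Return the recidivism gap between Z=1 and Z=0 after one round of RAT use.

    All parameters are illustrative assumptions. Z has no effect on recidivism
    except through the custody decisions that the tool's scores influence.
    """
    # The tool scores Z=1 as riskier whenever past data show a positive gap,
    # and judges then incarcerate Z=1 defendants more often.
    flagged = estimated_gap > 0
    custody_z1 = base_custody + (custody_boost if flagged else 0.0)
    custody_z0 = base_custody
    # Expected recidivism rates: a baseline plus the criminogenic effect of custody.
    reoffend_z1 = base_reoffend + criminogenic * custody_z1
    reoffend_z0 = base_reoffend + criminogenic * custody_z0
    return reoffend_z1 - reoffend_z0

def run(initial_gap, rounds=3):
    """Track the Z-recidivism gap as the tool is re-validated on data it helped shape."""
    gaps = [initial_gap]
    for _ in range(rounds):
        gaps.append(deployment_round(gaps[-1]))
    return gaps

print(run(initial_gap=0.01))  # a small spurious gap is picked up and amplified
print(run(initial_gap=0.0))   # without the initial fluke, no gap ever emerges
```

In this toy setting, a spurious initial gap of 0.01 grows to roughly 0.09 after the first round and then persists, so later validation against observational data would confirm Z as a predictor even though its relevance was produced by the tool's own use.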
In sum, our concern is that even variables that are, in principle, causally and statistically unrelated to recidivism can end up being statistically relevant for predicting recidivism if prior decision-making has considered them relevant for decisions about sentence type, and if sentence type, in particular imprisonment, can be causally relevant to increasing recidivism. In such a case, a RAT may successfully predict recidivism, but for the wrong reasons. Performativity is hence a crucial concern for validating the predictive accuracy of RATs. Existing methods for RAT validation do not, to our knowledge, consider performative effects, but unless they do, we may reasonably be suspicious about whether the use of RATs in the criminal justice system is warranted. Let us now consider more systematically what principled approach could help make progress. As we argue, (1) it is empirically challenging, but not impossible, to explore whether RATs are performative, (2) developers and users must meet these challenges by pursuing the ideal of explainability-in-context (EIC), and (3) if they don't, they may violate important epistemic-ethical obligations that undermine the legitimacy of using RATs.
3. Explainability-in-Context
The central problem posed by self-fulfilling risk assessments is now clearly in view. They can induce, cement, and intensify statistical and causal relationships between individuals’ characteristics and recidivism, leading them to make predictions about recidivism that may be accurate, but for the wrong reasons. If unnoticed and unaddressed, the performativity of RATs can hence impose substantial, but possibly avoidable, harm on defendants and society at large. So, what should we do to address this challenge?
Here, we propose a novel epistemic-ethical desideratum, which we call explainability-in-context (EIC). The EIC desideratum aims at enabling stakeholders to explore, understand, explain, and change how algorithmic systems interact with the broader environment they are embedded in.4 In doing so, EIC draws on two established movements. One is the socio-technical systems approach popular in science and technology studies, which aims to understand how technologies interact with the social and institutional settings they are implemented in and seeks to identify ways to co-ordinate their interaction and integration in beneficial ways (Pitt, Schaumeier, and Artikis 2012; Chopra and Singh 2016; Chopra and Singh 2018; Selbst et al. 2019). The second is the recent cross-disciplinary XAI movement calling for explainability of artificial intelligence systems (Burrell 2016; Atkinson, Bench-Capon, and Bollegala 2020; Deeks 2020; Nyrup and Robinson 2022; Selbst and Barocas 2018; Winter, Hollman, and Manheim 2023). These systems are often argued to be inscrutable black boxes that do not readily reveal information about their internal functioning, making it difficult for stakeholders to assess whether they function in acceptable or problematic ways. In the case of RATs, for instance, critics stress that it often remains unknown what information about defendants is used to make an assessment and how different characteristics are weighted, as these aspects are partly veiled as trade secrets relating to proprietary technologies (Washington 2019). Some take this inscrutability to be threatening defendants’ right to due process, as without knowing what information is used and in what ways, it is not possible for defendants to challenge risk assessments (Freeman 2016; Washington 2019; Biddle 2022). 
Extending the scope of explainability, and in the spirit of the sociotechnical systems approach, EIC insists that we must be able to understand how RATs like COMPAS are likely to causally affect defendants, not just which variables inform an assessment and how they are weighted. Centrally, pursuing EIC amounts to understanding how systems like COMPAS interact with the criminal justice system, starting from what data are used to estimate risk models, extending to how risk scores influence decision-making, and elucidating how the use of these scores causally affects the outcomes to be predicted.
How, then, can we make progress towards EIC for RATs? The main problem we see with RATs in a performative world is that existing methodology to develop and validate RATs does not, so far, distinguish between prediction-dependent and prediction-independent effects on the outcomes to be predicted. Specifically, it does not distinguish between a defendant who would have recidivated regardless of sentence type and a defendant who would recidivate if imprisoned but wouldn't if sentenced to probation; or, put simply, between an individual's robust propensity to reoffend and their contingent propensity to reoffend that varies with sentence type and duration. This distinction is crucial for understanding whether RATs that predict accurately do so for the wrong reasons, and induce rather than merely predict outcomes. Ideally, RATs should be able to make assessments that discern between factors that bear on recidivism only through sentence type and those that influence recidivism regardless of sentence type. For instance, membership in a white supremacist terrorist organization might be a good predictor of recidivism if it encodes that individuals are likely to reintegrate into their gang upon release and continue to engage in joint criminal endeavors. In this stylized case, let us assume, it doesn't matter for someone's recidivism risk whether they receive a probationary or a custodial sentence for their racist attack on a Middle Eastern store clerk: their propensity to reoffend in similar ways will remain the same. Contrast this with a case where, due to a statistical fluke, defendants with blue hair have been especially likely to be incarcerated and incarceration is criminogenic. In such a case, blue hair will correlate with recidivism and hence be valuable for predicting it, but is, let us stipulate, unrelated to recidivism other than through imprisonment. 
What distinguishes both cases is that when ‘controlling for’ or ‘conditioning on’ imprisonment (or sentence type more generally), blue hair would be unrelated to recidivism, whereas membership in a white supremacist terrorist organization would not be. The latter may remain predictively relevant even if we control for the effects that imprisonment may have on recidivism, whereas the former's statistical relationship to recidivism would be filtered out. A useful way of teasing apart prediction-dependent and prediction-independent effects is to consider some causal diagrams and their associated probabilistic dependencies and independencies. Figure 1 captures causal diagrams for the two cases.
Figure 1. Causal diagrams for the blue hair case (I) and the supremacist case (II).
Here, blue hair (B) is causally relevant for recidivism (R), but only through imprisonment (I). By contrast, being a member of a white supremacist organization (W) is directly causally relevant for R, for example because it encodes attitudes that directly cause violent attacks against minority individuals. It is also correlated (dashed arc) with crime-conducive causal factors (Z), for example gun ownership or being in contact with other individuals prone to commit violent offenses, which are in turn directly relevant to R.5 Importantly, both B and W are probabilistically dependent on R unconditionally, so they are indistinguishable through this narrow lens. However, once we condition on I, B is rendered independent of R, whereas W remains dependent on R, suggesting they play different causal roles. This difference helps us recognize that using blue hair to predict recidivism may problematically involve prediction-dependent effects, whereas white supremacist status does not. Understanding how a RAT latches onto different kinds of correlates and causes of recidivism is hence crucial for understanding whether its predictive performance rests on problematic reasons.
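The contrast between the two diagrams can be made concrete with a small simulation. The following Python sketch is purely illustrative: all probabilities are invented, the dependence of I on W in case II is an assumption that the text does not fix, and, following the convenience assumption that there is no I → R arrow for the white supremacist, imprisonment has no effect on R in case II. It compares the difference in recidivism rates between individuals with and without the attribute, first unconditionally and then within each stratum of I.

```python
import random


def simulate(case, n=100_000, seed=0):
    """Draw (x, i, r) triples for one diagram.

    case "I":  x is blue hair (B);  B -> I -> R, no direct B -> R path.
    case "II": x is membership (W); W -> R directly, W -> I (assumed), no I -> R.
    All probabilities below are invented for illustration only.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.3
        i = rng.random() < 0.2 + 0.5 * x      # attribute raises imprisonment rate
        if case == "I":
            r = rng.random() < 0.2 + 0.4 * i  # recidivism depends on I only
        else:
            r = rng.random() < 0.2 + 0.4 * x  # recidivism depends on W only
        rows.append((x, i, r))
    return rows


def risk_difference(rows, stratum=None):
    """P(R | x present) - P(R | x absent), optionally within one stratum of I."""
    if stratum is not None:
        rows = [row for row in rows if row[1] == stratum]
    with_x = [r for x, _, r in rows if x]
    without_x = [r for x, _, r in rows if not x]
    return sum(with_x) / len(with_x) - sum(without_x) / len(without_x)


for label, case in (("B (blue hair)", "I"), ("W (membership)", "II")):
    rows = simulate(case)
    print(label)
    print("  unconditional:", round(risk_difference(rows), 3))
    print("  given I = 1:  ", round(risk_difference(rows, stratum=True), 3))
    print("  given I = 0:  ", round(risk_difference(rows, stratum=False), 3))
```

Under these stipulated parameters, the unconditional difference is positive in both cases, but within each stratum of I it should be close to zero for B while remaining substantial for W.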
Our illustrations here are naturally limited: they do not suggest a definitive method for teasing apart prediction-dependent and prediction-independent effects, and doing so in practice may be extremely difficult. It is widely understood that casually interrogating probabilistic dependencies is a useful but nevertheless crude way of understanding what goes on causally. We have taken this approach here simply for ease of illustration and note that more advanced methods are available to disambiguate causal relationships from observational data (e.g., Pearl 2009). Ultimately, such efforts should pursue a common objective. Our EIC desideratum demands that RATs should be able to make (accurate) counterfactual predictions. For each case, they should be able to predict R for at least two scenarios: one telling us what an individual's outcome would be if they were imprisoned, and the other telling us what would happen if they were subjected to a different sentence type, for example probation or community service. If a decision-maker recognizes risk scores to be robust over the decision-space, they may feel more confident that a RAT tracks prediction-independent causes or correlates of R. However, if the scores change over the decision-space, this can alert decision-makers to the performative nature of the assessment. Likewise, defendants (and their legal representatives) should be able to interrogate risk scores on these aspects, and challenge decisions based on them if performative effects cannot be ruled out. To our knowledge, existing RATs cannot make such counterfactual predictions, and developing this capability seems an important first step towards helping relevant stakeholders better understand how RATs function and whether they do so in epistemically, morally, and legally acceptable or problematic ways.
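One way to picture the kind of output this would require is sketched below in Python. The sketch is a toy under strong assumptions: the records are invented, the profile labels and the helper names fit_conditional_rates and conditional_report are hypothetical, and simple conditional frequencies stand in for counterfactual predictions, which they approximate only if sentencing is unconfounded given the profile. It reports one score per sentence type and flags profiles whose score varies over the decision-space.

```python
from collections import defaultdict


def fit_conditional_rates(records):
    """Estimate P(R = 1 | profile, sentence) by simple frequencies.

    records: iterable of (profile, sentence, reoffended) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # (profile, sentence) -> [reoffences, total]
    for profile, sentence, reoffended in records:
        cell = counts[(profile, sentence)]
        cell[0] += int(reoffended)
        cell[1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


def conditional_report(rates, profile, sentences, tolerance=0.05):
    """Return one score per sentence type and flag sentence-sensitive scores."""
    scores = {s: rates[(profile, s)] for s in sentences}
    gap = max(scores.values()) - min(scores.values())
    return {"scores": scores, "gap": gap, "sentence_sensitive": gap > tolerance}


# Invented toy records: (profile, sentence, reoffended).
records = (
    [("blue_hair", "prison", True)] * 6 + [("blue_hair", "prison", False)] * 4
    + [("blue_hair", "probation", True)] * 2 + [("blue_hair", "probation", False)] * 8
    + [("member", "prison", True)] * 6 + [("member", "prison", False)] * 4
    + [("member", "probation", True)] * 6 + [("member", "probation", False)] * 4
)

rates = fit_conditional_rates(records)
sentences = ["prison", "probation"]
report_b = conditional_report(rates, "blue_hair", sentences)
report_w = conditional_report(rates, "member", sentences)
print("blue_hair:", report_b)
print("member:   ", report_w)
```

In this toy data the blue hair profile is flagged because its score depends on the sentence, whereas the membership profile is not. A real tool would need a defensible causal model rather than raw frequencies to license a counterfactual reading of such scores.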
In sum, looking at performativity through a causal lens reveals a major deficiency of existing methods for developing and validating RATs. To tell whether a prediction might include prediction-dependent effects, we should make counterfactual conditional predictions for different sentence types, but unfortunately this is not done in practice. Importantly, if these predictions differ, this suggests that there are performative effects that we may wish to exclude in furnishing our risk assessment and in making judgments based on it.
4. Epistemic-Ethical Duties of Developers and Users of RATs
Let us explain how EIC's emphasis on understanding how RATs causally interact with the criminal justice system raises novel epistemic-ethical duties on the part of developers and users of RATs at various stages of the development, deployment, and post-deployment validation cycle of these tools.
First, developers should ensure that RATs allow for conditional predictions that depend on sentence type and duration, so that stakeholders can gain understanding of the potential performative effects of different kinds of sentences. This is particularly important for anticipating the effects that a high risk assessment may have on individuals by means of harsher sentences. Performativity is a familiar problem encountered in other areas, too. For instance, in the machine learning literature, computer scientists have proposed strategies to incorporate the causal effects of a prediction into the prediction itself (Perdomo et al. 2021; Hardt et al. 2022; Mendler-Dünner et al. 2023; Kim and Hardt 2023). The aim here is mainly to deal with self-defeating predictions, where a model's performativity undermines its predictive accuracy, for example because agents respond to a prediction. This is different from the cases we highlight, where performativity is self-fulfilling. Here, the aim is not to calibrate a RAT to incorporate its own crime-inducing effects; the problem is rather that these effects might already be part of RAT predictions. The aim is to tease prediction-dependent and prediction-independent effects apart and to issue different conditional predictions for different scenarios in a way that reflects performative effects and makes them accessible to decision-makers. So, while these approaches are not yet attuned to the needs that arise in managing the performativity of RATs, we believe that future work drawing on them can make important steps towards EIC. In sum, since it seems in principle possible to provide conditional predictions comparing the effects of different kinds of sentences, we contend that developers of RATs have a duty to follow this strategy (and to develop techniques that help with doing so), and, conversely, should be held accountable for any harms induced through departures from this constraint.
Second, users of RATs have the duty to use these tools in responsible ways. In particular, pursuing EIC, for example by demanding suitable performativity audits of RATs before use, enables them to understand whether using the system might have undesirable or unintended effects, which they ought to reflect upon and account for in their decision-making. There are two main user groups. The first consists of judges and jury members. These users ought to reflect on the conditional predictions demanded by EIC to better understand the causal pathways their sentencing decisions may influence, possibly in detrimental ways. In particular, by providing predictions about how their sentencing decisions might influence a defendant's rehabilitation and recidivism outcomes, EIC may ultimately prevent them from unjustly incarcerating some defendants. The second group consists of defendants and their attorneys. For them, pursuing EIC promises an understanding of how predictions and the sentences they inform came about, and may help them challenge sentences that rest on epistemically or ethically inappropriate grounds. For instance, with the help of conditional predictions, it might convincingly be argued that a prediction may be self-fulfilling, which may in turn constitute a violation of due process. Here, EIC complements existing efforts to increase the transparency of algorithmic decision-making systems (e.g., Rudin et al. 2020; Nyrup and Robinson 2022; Fleisher 2022) and to help stakeholders interrogate these systems for relevant properties, such as fairness. Our EIC desideratum adds to this by widening the scope to consider not only how systems like COMPAS work internally, but also how they interact with the environments they are deployed in (cf. Selbst et al. 2019; Mendler-Dünner et al. 2023; Zezulka and Genin 2023).
At this point, some objections may be raised concerning the epistemic duties that EIC imposes on developers and users of RATs. First, we have focused here on how EIC may help prevent harmful cases where high risk assessments unjustly lead to higher rates of imprisonment and this goes unnoticed as the assessments constitute self-fulfilling prophecies. But other, beneficial cases seem possible, too: for example, without RATs, a defendant might be incarcerated and subjected to the criminogenic effects of imprisonment, whereas a low risk assessment would, performatively, prevent these effects from obtaining and thus lead to more favorable outcomes. Indeed, while not cast in terms of performativity, one of the central promises of RATs is that they can help reduce incarceration rates and recidivism, without negative effects on public safety (see e.g., Kleinberg et al. 2018 in the pretrial detention context). So, given that RATs may also have beneficial performative effects, wouldn't EIC seem to prevent such benefits from obtaining? To clarify, first, EIC is committed to the project of helping mitigate performative effects that various stakeholders are likely to robustly identify as negative, that is, self-fulfilling predictions of recidivism. However, EIC does not thereby involve a definitive stance on whether there are also beneficial forms of performativity, how to identify them, and whether to tolerate or even to promote them (cf. van Basshuysen et al. 2021; Khosrowi 2023). These issues must be the subject of a larger debate informed by substantive ethical, legal, and social theory, including about the functions of the criminal justice system and its role in society. Second, aside from some ameliorative goals, EIC is mainly intended to figure as an ends-neutral epistemic-ethical desideratum to help govern how RATs are developed and used, that is, with a view towards helping relevant stakeholders explore and understand performative effects. As such, beyond the somewhat uncontroversial cases outlined here, EIC does not involve larger commitments regarding whether certain cases of performativity are problematic or unproblematic, and whether they should be mitigated.
A second, related concern is that EIC sets a very high epistemic bar that must be met before RATs can justifiably be implemented. Meeting this epistemic bar will raise costs and risks for companies developing these tools, and for the criminal justice system more generally. But if these costs and risks are too high, this might prevent RATs from being developed and used in the first place, which would mean forgoing their possible benefits, too. Surely, when evaluating and implementing these tools, we should consider both their risks and their benefits (see van Wijck 2013), but the epistemic duties implied by EIC may skew the calculation towards a very conservative use of RATs. In response, we do not argue that the high epistemic bar must be met before RATs may permissibly be implemented, or that we need a moratorium on RATs. As we have stressed before, risk assessments, with or without RATs, can have performative effects. But while informal assessments may be performative in obscure ways and might encode subjective preferences and biases, RATs, when they are combined with an EIC approach, hold significant promise to provide a better understanding of such performative effects and to allow judicial actors to both mitigate their risks and reap their benefits. As RATs are increasingly used in the criminal justice system, sincere efforts should be made towards achieving EIC, and since it might be significantly easier to trace out the performative effects of RAT predictions than to do so for human decision-makers, the status quo is not a good benchmark: there is an opportunity to do better, and EIC insists that we take it.
5. Conclusion
We have argued that risk assessment tools (RATs) used in the criminal justice system to predict defendants' risk of reoffending should be viewed not as merely predictive but as performative tools that have the potential to influence offenders’ future lives, for better or worse. In particular, high risk assessments may constitute self-fulfilling predictions because (1) they causally affect sentence type and severity, and (2) imprisonment can be criminogenic. Such cases impose severe burdens and injustices both on offenders and society at large and raise serious epistemic-ethical concerns about the use of RATs. Our proposed solution, explainability-in-context (EIC), is to take a causal perspective on RATs that distinguishes prediction-dependent from prediction-independent effects so as to put stakeholders in a better position to identify problematic performative effects. We further argued that developers and users of RATs have epistemic-ethical duties to assist in the project of preventing harmful performative effects from materializing by working towards EIC. There are reasons to think that these duties are currently being violated.
Notes
We would like to thank Markus Ahlers, Jannik Zeiser, and Sebastian Zezulka for their very helpful comments and suggestions on various drafts of this paper. We would also like to thank the audiences at EPSA 2023 in Belgrade, the Visiting Speaker Forum at the Frankfurt School of Finance & Management, and the Philosophy Research Seminar at Wageningen University & Research for stimulating discussions of the ideas developed here. Finally, we wish to thank two anonymous reviewers of this journal for their helpful and speedy reviews.
Van Basshuysen's contribution is funded by the European Union (ERC, MAPS, 101115973). Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
Notably, this focus excludes a significant number of cases where RATs are used to inform judicial decision-making, and, more broadly, does not engage with one of the central motivations behind using RATs: they promise to recommend lighter ‘alternative’ sentences for low-risk defendants, thus contributing to an overall reduction in incarceration rates at little cost to public safety. While we do not deny that RATs may hold such promise, we also note that their performance in delivering on it continues to be called into question (e.g., Sloan et al. 2018; Stevenson and Doleac 2022). Moreover, the issues we focus on here may offer independent reasons against the use of RATs even if they did overall contribute to decreased incarceration rates and recidivism.
Note that our argument doesn't amount to a principled objection to the use of RATs in the criminal justice system. Harmful performative effects may also materialize through informal risk assessments made by judicial decision-makers. But while performative effects following subjective assessments may incorporate various kinds of biases that will be hard to scrutinize and properly account for, RATs, in contrast, promise to make it in principle possible to understand their own causal effects on decisions and thus, to rule out such harmful effects. To deliver on this promise, however, RATs should be made explainable-in-context, as we argue below.
The term ‘explainability in context’ (without dashes) already appears in the literature, where it is used to express that different kinds of explanations may be required in different contexts (e.g., Wolf and Blomberg 2019). We use the term ‘explainability-in-context’ (with dashes) differently, that is, as the ability to explain how a system causally interacts with the broader environment it is embedded in. Since different environments might call for different kinds of explanations, our concept is fully consistent with ‘explainability in context’. We thank an anonymous reviewer for bringing this to our attention.
Note that we assume that there is no I → R arrow for the white supremacist, encoding that they are criminally ‘saturated’, so imprisonment does not further increase their post-sentence recidivism probability. This is an assumption for convenience only.