Auditing bias of recourse in classifiers (part I)



Fairness is a fundamental principle reflecting our innate sense of justice and equity. The essence of fairness lies in the equitable, unbiased and just treatment of all individuals. Nevertheless, translating this principle to specific rules, people, and systems can adhere to is highly context specific, with context meaning, social and cultural circumstances, as well as use case scenarios and applications.  

The legal world instantiates fairness through non-discrimination laws, which demand the equitable treatment of protected individuals or groups of individuals defined by sensitive attributes, including racial and ethnic origin, sex, religion and belief, disability, age, and sexual orientation [1]. An important distinction to make is between individual and group fairness. The former demands that any two individuals, differing only on a sensitive attribute should receive the same outcomes, while the latter demands that groups of individuals defined by different values of a sensitive attribute receive similar outcomes, according to a formally defined fairness notion. In this post, we focus on auditing algorithmic group fairness, as discussed next. 

The algorithmic world usually examines a system’s output and employs different statistical formulas on the prediction outcomes of the system to measure variants of fairness, according to the different requirements of a use case. These formulas might consider only the model’s predictions on a test set (independence); the model’s prediction in conjunction with the ground truth (separation or sufficiency); or causal relationships between dependent and independent variables of the system upon which counterfactual worlds, based on changes of sensitive attribute values are compared, outcome-wise (counterfactual fairness). Most of the literature revolves around these concepts: group fairness measured on the outcomes of a system, usually a binary classifier. For an overview of the aforementioned fairness concepts, you may refer to [2,3,4]. 

Nevertheless, other, more implicit types of bias exist, including the difficulty or burden to achieve recourse, which we focus on in this post. In this case, the focus is on the individuals that have received an unfavorable outcome (e.g., a reject decision). Thus, fairness auditing consists in measuring and comparing how difficult it is for different (groups of) individuals to change their profiles in order to change the system’s decision, i.e., receive a favorable outcome. A useful instrument to quantify this difficulty is counterfactual explanations. Given an individual (the factual), its counterfactual is an imaginary individual as close, or as similar, as possible to the factual, that receives the opposite decision from the classifier. A counterfactual explanation is then comprised by the set of changes required to transform the factual to the counterfactual. In general, a counterfactual is considered good if it is close to its factual according to a specific distance function; the best or nearest counterfactual is the one that minimizes the distance to the factual.  

In a local setting, i.e., when examining recourse for an individual, a counterfactual straightforwardly defines a counterfactual action, comprising the set of changes that the individual needs to perform. However, moving from individuals to groups (or populations) of individuals, things change. Now, each individual might have their own best counterfactual, each defining a potentially different counterfactual action (set of changes). Thus, when we move from a local to a “global” setting, examining recourse gets trickier. In this setting, we need to derive one counterfactual explanation per (sub)group of individuals, so as to measure, via this explanation, an aggregate difficulty of recourse for the whole group. What then makes more sense is to identify a counterfactual action that effectively achieves recourse for the group’s individuals. On a more global scale, one needs to identify a small set of counterfactual actions that effectively achieve recourse for the majority of the population and on the same time are cost efficient with respect to the number and magnitude of changes they represent. But why is this important for auditing fairness of recourse? Let’s see an example next. 

Bias of Recourse: A Motivating Example 

Consider a company that supports its promotion decisions with an AI system that classifies employees as good candidates for promotion, the favorable positive class, based on various performance metrics, including their cycle time efficiency (CTE) and the annual contract value (ACV) for the projects they lead. Figure 1 draws ten employees from the negative predicted class as points in the CTE-ACV plane, and also depicts the decision boundary of the classifier. Race is the protected attribute, and there are two protected subgroups with five employees each, depicted as circles and triangles. For each employee, the arrow depicts the best action to achieve recourse, i.e., to cross the decision boundary, and the number indicates the cost of the action, here simply computed as the distance to the boundary. For example, x1 may increase their chances for promotion mostly by acquiring more high-valued projects, while x2 mostly by increasing their efficiency. 

Burden can be defined as the mean cost for a protected subgroup [5]. For the protected race 0, the burden is 2, while it is 2.2 for race 1, indicating thus unfairness of recourse against race 1. In contrast, assuming there is an equal number of employees of each race in the company, the classifier satisfies fairness of prediction in terms of statistical parity (equal positive rate in the subgroups). 

Figure 1: (a) An example of affected individuals, the decision boundary, actions, and a subpopulation (in the shaded region), depicted in the feature space; (b) Cumulative distribution of cost of recourse for the individuals in (a); (c) Comparison of two actions to achieve recourse for two individuals 

While fairness of recourse is an important concept that captures a distinct notion of algorithmic bias, we argue that it is much more nuanced than the mean cost of recourse (aka burden) considered in prior work [5,6], and raise three issues. 

First, the mean cost, which comprises a micro viewpoint of the problem, does not provide the complete picture of how the cost of recourse varies among individuals and may lead to contrasting conclusions. Consider Figure 1a and observe that for race 1 all but one employee can achieve recourse with cost at most 2, while an outlier achieves recourse with cost 6. It is this outlier that raises the mean cost for race 1 above that of race 0. For the two race subgroups, Figure 1b shows the cumulative distribution of cost, termed the effectiveness-cost distribution (ecd). These distributions allow the fairness auditor to inspect the tradeoff between cost and recourse, and define the appropriate fairness of recourse notion. For example, they may consider that actions with a cost of more than 2 are unrealistic (e.g., because they cannot be realized within some timeframe), and thus investigate how many employees can achieve recourse under this constraint; we refer to this as equal effectiveness within budget [9]. Under this notion, there is unfairness against race 0, as only 60% of race 0 employees (compared to 80% of race 1) can realistically achieve recourse. 

There are several options to go beyond the mean cost. One is to consider fairness of recourse at the individual level and compare an individual with their counterfactual counterpart had their protected attribute changed value [7]. However, this approach is impractical as, similar to other causal-based definitions of fairness, it requires strong assumptions about the causal structure in the domain. In contrast, we argue that it’s preferable to investigate fairness in subpopulations and inspect the trade-off between cost and recourse. 

Second, there are many cases where the aforementioned micro-level aggregation of individuals’ costs is not meaningful in auditing real-world systems. To account for this, we introduce a macro viewpoint where a group of individuals is considered as a whole, and an action is applied to and assessed collectively for all individuals in the group. An action represents an external horizontal intervention, such as an affirmative action in society, or focused measures in an organization that would change the attributes of some subpopulation (e.g., decrease tax or loan interest rates, increase productivity skills). In the macro viewpoint, the cost of recourse does not burden the individuals, but the external third party, e.g., the society or an organization. Moreover, the macro viewpoint offers a more intuitive way to audit a system for fairness of recourse, as it seeks to uncover systemic biases that apply to a large number of individuals. 

To illustrate the macro viewpoint, consider the group within the shaded region in Figure 1a. In the micro viewpoint, each employee seeks recourse individually, and both race subgroups have the same distribution of costs. However, we can observe that race 0 employees, like x1 achieve recourse by actions in the ACV direction, while race 1 employees, like x2 in the orthogonal CTE direction. This becomes apparent when we take the macro viewpoint, and investigate the effect of action a1, depicted on the border of the shaded region, discovering that it is disproportionally effective on the race subgroups (leads to recourse for two-thirds of one subgroup but for none in the other). In this example, action a1 might represent the effect of a training program to enhance productivity skills, and the macro viewpoint finds that it would perpetuate the existing burden of race 0 employees. 

Third, existing notions of fairness of recourse have an important practical limitation: they require a cost function that captures one’s ability to modify one’s attributes, whose definition may involve a learning process, or an adaptation of off-the-shelf functions by practitioners. Even the idea of which attributes are actionable, in the sense that one can change (e.g., one cannot get younger), or “ethical” to suggest as actionable (e.g., change of marital status from married to divorced) hides many complications [8].

Conclusions drawn about fairness of recourse crucially depend on the cost definition. Consider individuals x1, x2, and actions a1, a2, shown in Figure 1c. Observe that it is hard to say which action is cheaper, as one needs to compare changes within and across very dissimilar attributes. Suppose the cost function indicates action a1 is cheaper; both individuals achieve recourse with the same cost. However, if action a2 is cheaper, only x2 achieves recourse. Is the classifier fair? 

To address this limitation, we propose definitions that are oblivious to the cost function. The idea is to compare the effectiveness of actions to the protected subgroups, rather than the cost of recourse for the subgroups. One way to define a cost-oblivious notion of fairness of recourse is the equal choice for recourse, which we illustrate using the example in Figure 1c. According to it, the classifier is unfair against x1, as x1 has only one option, while x2 has two options to achieve recourse. 


Fairness of recourse is an important notion of fairness, yet still understudied. In this post (part I), we provided an introduction to the problem. In our next post (part II), we will describe our framework for defining and auditing bias of recourse on subgroups comprising a series of fairness definitions as well as an efficient auditing mechanism utilizing them to audit fairness of recourse. You may find a detailed description in our published work [9], as well as additional discussion on the topic in related works [5,6,7,10]. 


[1] European Commission, Directorate-General for Justice and Consumers, Gerards, J., Xenidis, R., Algorithmic discrimination in Europe – Challenges and opportunities for gender equality and non-discrimination law, Publications Office, 2021,
[2] Marc P Hauer, Johannes Kevekordes, and Maryam Amir Haeri. Legal perspective on possible fairness measures–a legal discussion using the example of hiring decisions. Computer Law & Security Review, 42:105583, 2021. 
[3]  Barocas, Solon, Moritz Hardt, and Arvind Narayanan. Fairness and machine learning: Limitations and opportunities. MIT Press, 2023. 
[4] Sahil Verma and Julia Rubin. Fairness definitions explained. In Proceedings of the International Workshop on Software Fairness, FairWare ’18, page 1–7, New York, NY, USA, 2018. Association for Computing Machinery. 
[5] Shubham Sharma, Jette Henderson, and Joydeep Ghosh. CERTIFAI: A common framework to provide explanations and analyse the fairness and robustness of black-box models. In AIES, pages 166–172. ACM, 2020. 
[6] Alejandro Kuratomi, Evaggelia Pitoura, Panagiotis Papapetrou, Tony Lindgren, and Panayiotis Tsaparas. Measuring the burden of (un) fairness using counterfactuals. In Machine Learning and Principles and Practice of Knowledge Discovery in Databases: International Workshops of ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part I, pages 402–417. Springer, 2023. 
[7] Julius von Kügelgen, Amir-Hossein Karimi, Umang Bhatt, Isabel Valera, Adrian Weller, and Bernhard Schölkopf. On the fairness of causal algorithmic recourse. In AAAI, pages 9584–9594. AAAI Press, 2022. 
[8] Suresh Venkatasubramanian and Mark Alfano. The philosophical basis of algorithmic recourse. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 284–293, 2020. 
[9] Loukas Kavouras, Konstantinos Tsopelas, Giorgos Giannopoulos, Dimitris Sacharidis, Eleni Psaroudaki, Nikolaos Theologitis, Dimitrios Rontogiannis, Dimitris Fotakis, Ioannis Z. Emiris: Fairness Aware Counterfactuals for Subgroups. In NeurIPS 2023. 
[10] Bell, Andrew, Joao Fonseca, Carlo Abrate, Francesco Bonchi, and Julia Stoyanovich. “Fairness in Algorithmic Recourse Through the Lens of Substantive Equality of Opportunity.” arXiv preprint arXiv:2401.16088 (2024). 

Bio Notes

Dimitris Sacharidis is an assistant professor at the Data Science and Engineering Lab of the Université Libre de Bruxelles. Prior to that he was an assistant professor at the Technical University of Vienna, and a Marie Skłodowska Curie fellow at the “Athena” Research Center and at the Hong Kong University of Science and Technology. He finished his PhD and undergraduate studies on Computer Engineering at the National Technical University of Athens, while in between he obtained an MSc in Computer Science from the University of Southern California. His research interests include data science, data engineering, and responsible AI. He has served as a PC member and has been involved in the organization of top related conferences, and has acted as a reviewer and editorial member of associated journals. 

Giorgos Giannopoulos is a Scientific Associate at Athena Research Center, Greece and at the National Observatory of Athens. He has performed research on the fields of Information Retrieval, Machine Learning, Data Integration, Geospatial Data Management and Analysis and Semantic Web, while he has been involved in the specification, design and development of cataloguing services for European Research Infrastructures. His current research interests focus on the adaptation and extension of Machine/Deep Learning methods in various disciplines, including: fairness and explainability of ML algorithms and pipelines; ML-driven next day prediction of forest fires; pattern recognition on RES timeseries; explanation and prediction of Solar events; ML-driven data integration and annotation; and fact checking. 

Loukas Kavouras is a Scientific Associate at Athena Research Center, Greece. He has performed research on the fields of Machine Learning, Online and Approximation Algorithms, and Resource Allocation Problems. He finished his PhD and his MSc in Computer Science at the School of Electrical and Computer Engineering, at the National Techinal University of Athens (NTUA) and his undergraduate studies in Applied Mathematical and Physical Sciences at the NTUA. His current research interests include data science, data engineering and fairness/explainability of ML algorithms.