Auditing bias of recourse in classifiers (part II) 


Introduction 

Fairness is a fundamental principle reflecting our innate sense of justice and equity. The essence of fairness lies in the equitable, unbiased, and just treatment of all individuals. In our previous post (part I), we introduced the problem of bias of recourse. In this post (part II), we describe our framework for defining and auditing bias of recourse on subgroups: a series of fairness definitions, together with an efficient auditing mechanism that uses them to audit fairness of recourse.

Fairness-Aware Counterfactuals for Subgroups 

Preliminaries 

We start by providing a set of formal definitions capturing different aspects of fairness of recourse.  

We consider a feature space X = X1 × … × Xn, where Xn denotes the protected feature, which, for ease of presentation, takes two protected values {0, 1}. For an instance x ∈ X, we use the notation x.Xi to refer to its value in feature Xi.

We consider a binary classifier h : X → {−1, 1}, where the positive outcome is the favorable one. For a given h, we are concerned with a dataset D of adversely affected individuals, i.e., those who receive the unfavorable outcome. We prefer the term instance to refer to any point in the feature space X, and the term individual to refer to an instance from the dataset D.

We define an action a as a set of changes to feature values, e.g., a = {country ⟶ US, education-num ⟶ 12}. We denote as A the set of possible actions. An action a applied to an individual (a factual instance) x returns a counterfactual instance x’ = a(x). If the individual x was adversely affected (h(x) = −1) and the action results in a counterfactual that receives the desired outcome (h(a(x)) = 1), we say that action a offers recourse to the individual x and is thus effective. In line with the literature, we also refer to an effective action as a counterfactual explanation for individual x.

An action a incurs a cost to an individual x, which we denote as cost(a, x). The cost function captures both how feasible the action a is for the individual x, and how plausible the counterfactual a(x) is. 

Given a set of actions A, we define the recourse cost rc(A, x) of an individual x as the minimum cost among effective actions, if one exists, or otherwise some maximum cost, denoted c_max.
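In symbols, one possible formalization consistent with the above (the fallback-cost symbol $c_{\max}$ is our notation):

$$
\mathit{rc}(A, x) =
\begin{cases}
\min\limits_{a \in A,\ h(a(x)) = 1} \mathit{cost}(a, x) & \text{if } \exists\, a \in A : h(a(x)) = 1,\\[4pt]
c_{\max} & \text{otherwise.}
\end{cases}
$$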

An effective action of minimum cost is also called a nearest counterfactual explanation. We define a subspace Xp ⊆ X using a predicate p, which is a conjunction of feature-level predicates of the form “feature operator value”, e.g., the predicate p = (country = US) ∧ (education-num > 9) defines instances from the US that have more than 9 years of education.

Given a predicate p, we define the subpopulation group Gp ⊆ D as the set of affected individuals that satisfy p, i.e., Gp = {x ∈ D | p(x)}. We further distinguish between the protected subgroups Gp,1 = {x ∈ D | p(x) ∧ x.Xn = 1} and Gp,0 = {x ∈ D | p(x) ∧ x.Xn = 0}. When the predicate p is understood, we may omit it in the designation of a group to simplify notation.

For a specific action a, we naturally define its effectiveness (eff) for a group G as the proportion of individuals from G that achieve recourse through a.
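Spelled out in our notation, consistent with the verbal definition:

$$
\mathit{eff}(a, G) = \frac{|\{x \in G : h(a(x)) = 1\}|}{|G|}.
$$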

We want to examine how recourse is achieved for group G through a set of possible actions A. We define the aggregate effectiveness (aeff) of A for G in two distinct ways. 

In the micro viewpoint, the individuals in the group are considered independently, and each may choose the action that benefits them the most. Concretely, we define the micro-effectiveness of a set of actions A for group G as the proportion of individuals in G that can achieve recourse through some action in A, i.e.:

$$
\mathit{aeff}^{\mu}(A, G) = \frac{|\{x \in G : \exists\, a \in A,\ h(a(x)) = 1\}|}{|G|}.
$$

In the macro viewpoint, the group is considered as a whole, and an action is applied collectively to all individuals in the group. Concretely, we define the macro-effectiveness of a set of actions A for group G as the largest proportion of individuals in G that can achieve recourse through the same action in A, i.e.:

$$
\mathit{aeff}^{M}(A, G) = \max_{a \in A} \mathit{eff}(a, G).
$$
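To make the micro/macro distinction concrete, here is a minimal Python sketch of eff, aeffµ, and aeffM; the representation of instances and actions as dictionaries and the classifier callable h are illustrative assumptions, not the FACTS implementation.

```python
from typing import Callable, Dict, List

Instance = Dict[str, object]   # feature name -> value
Action = Dict[str, object]     # feature name -> new value

def apply_action(x: Instance, a: Action) -> Instance:
    """Return the counterfactual a(x): x with the features in a overwritten."""
    return {**x, **a}

def eff(a: Action, G: List[Instance], h: Callable[[Instance], int]) -> float:
    """Proportion of individuals in G that achieve recourse through action a."""
    return sum(h(apply_action(x, a)) == 1 for x in G) / len(G)

def aeff_micro(A: List[Action], G: List[Instance], h: Callable[[Instance], int]) -> float:
    """Micro view: each individual may use any action in A that works for them."""
    return sum(any(h(apply_action(x, a)) == 1 for a in A) for x in G) / len(G)

def aeff_macro(A: List[Action], G: List[Instance], h: Callable[[Instance], int]) -> float:
    """Macro view: the single action in A that helps the largest share of G."""
    return max(eff(a, G, h) for a in A) if A else 0.0
```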

For a group G, actions A, and a cost budget c, we define the in-budget actions as the set of actions that cost at most c for every individual in G.

We define the effectiveness-cost distribution (ecd) as the function that, for a cost budget c, returns the aggregate effectiveness achievable with in-budget actions:

$$
\mathit{ecd}(c; A, G) = \mathit{aeff}\bigl(\{a \in A : \mathit{cost}(a, x) \le c,\ \forall x \in G\},\ G\bigr).
$$

We use ecdµ, ecdM to refer to the micro and macro viewpoints of aggregate effectiveness. The value ecd(c; A, G) is the proportion of individuals in G that can achieve recourse through actions A with cost at most c. Therefore, the ecd function has an intuitive probabilistic interpretation. Consider the subspace Xp determined by predicate p, and define the random variable C as the cost required by an instance x ∈ Xp to achieve recourse. The function ecd(c; A, Gp) is the empirical cumulative distribution function of C using sample Gp.

The inverse effectiveness-cost distribution function ecd−1(ϕ; A, G) takes as input an effectiveness level ϕ ∈ [0, 1] and returns the minimum cost required so that ϕ|G| individuals achieve recourse.
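Since ecd(·; A, G) is the empirical CDF of the recourse cost, both it and its inverse can be sketched directly from the per-individual recourse costs. In the sketch below, the input array holds rc(A, x) for each individual, with infinity standing in for individuals without recourse; these representation choices are ours, not the FACTS implementation.

```python
import numpy as np

C_MAX = float("inf")  # fallback cost for individuals with no effective action

def ecd(c: float, recourse_costs: np.ndarray) -> float:
    """Fraction of the group whose (minimum) recourse cost is at most the budget c."""
    return float(np.mean(recourse_costs <= c))

def ecd_inverse(phi: float, recourse_costs: np.ndarray) -> float:
    """Minimum budget c such that at least a phi fraction of the group achieves
    recourse; C_MAX if no budget suffices."""
    achievable = np.sort(recourse_costs[np.isfinite(recourse_costs)])
    k = int(np.ceil(phi * len(recourse_costs)))
    if k == 0:
        return 0.0
    if k > len(achievable):
        return C_MAX
    return float(achievable[k - 1])
```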

Definitions for fairness of recourse 

We define recourse fairness of classifier h for a group G by comparing the ecd functions of the protected subgroups G0, G1 in different ways. 

The first two definitions are cost-oblivious, and apply whenever we can ignore the cost function. Specifically, given a set of actions A and a group G, we assume that all actions in A are considered equivalent, and that the cost of any action is the same for all individuals in the group; i.e., cost(a, x) = cost(a′, x′), ∀ a, a′ ∈ A and ∀ x, x′ ∈ G. The definitions simply compare the aggregate effectiveness of a set of actions on the protected subgroups.

Equal Effectiveness This definition has a micro and a macro interpretation, and says that the classifier is fair if the same proportion of individuals in the protected subgroups can achieve recourse:

$$
\mathit{aeff}(A, G_0) = \mathit{aeff}(A, G_1).
$$

Equal Choice for Recourse This definition has only a macro interpretation and says that the classifier is fair if the protected subgroups can choose among the same number of sufficiently effective actions to achieve recourse, where sufficiently effective means that an action works for at least 100ϕ% (for ϕ ∈ [0, 1]) of the subgroup:

$$
|\{a \in A : \mathit{eff}(a, G_0) \ge \phi\}| = |\{a \in A : \mathit{eff}(a, G_1) \ge \phi\}|.
$$

The next three definitions assume the cost function is specified, and have both a micro and a macro interpretation.

Equal Effectiveness within Budget The classifier is fair if the same proportion of individuals in the protected subgroups can achieve recourse with a cost at most c:

$$
\mathit{ecd}(c; A, G_0) = \mathit{ecd}(c; A, G_1).
$$

Equal Cost of Effectiveness The classifier is fair if the minimum cost to achieve an aggregate effectiveness of ϕ ∈ [0, 1] is the same for both protected subgroups:

$$
\mathit{ecd}^{-1}(\phi; A, G_0) = \mathit{ecd}^{-1}(\phi; A, G_1).
$$

Fair Effectiveness-Cost Trade-Off The classifier is fair if the protected subgroups have the same effectiveness-cost distribution, or equivalently, for each cost budget c, their aggregate effectiveness is equal:

$$
\max_{c} \bigl| \mathit{ecd}(c; A, G_0) - \mathit{ecd}(c; A, G_1) \bigr| = 0.
$$

The left-hand side represents the two-sample Kolmogorov-Smirnov statistic for the empirical cumulative distributions (ecd) of the protected subgroups. We say that the classifier is fair with confidence α if this statistic is less than:

$$
\sqrt{-\ln\!\left(\tfrac{\alpha}{2}\right) \cdot \frac{|G_0| + |G_1|}{2\,|G_0|\,|G_1|}}.
$$
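As a rough illustration of this criterion, one could feed the per-individual recourse costs of the two protected subgroups to SciPy's two-sample Kolmogorov-Smirnov test; this is only a sketch of the idea, with made-up cost values, not the FACTS implementation.

```python
import numpy as np
from scipy.stats import ks_2samp

# Per-individual recourse costs for the two protected subgroups (illustrative values).
costs_g0 = np.array([2.0, 3.5, 5.0, 7.0, 11.0])
costs_g1 = np.array([1.0, 2.0, 2.5, 4.0, 6.0])

# The KS statistic is exactly sup_c |ecd(c; A, G0) - ecd(c; A, G1)|.
result = ks_2samp(costs_g0, costs_g1)
alpha = 0.05
print(f"KS statistic: {result.statistic:.3f}, p-value: {result.pvalue:.3f}")
print("fair (cannot reject equal distributions)" if result.pvalue > alpha
      else "unfair (distributions differ)")
```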

The last definition takes a micro viewpoint and extends the notion of burden [5] from the literature to the case where not all individuals may achieve recourse. The mean recourse cost of a group is:

$$
\overline{\mathit{rc}}(A, G) = \frac{1}{|G|} \sum_{x \in G} \mathit{rc}(A, x).
$$

This mean also counts individuals that cannot achieve recourse through A, whose recourse cost is c_max. To exclude them, we denote as G* the set of individuals of G that can achieve recourse through some action in A, i.e., G* = {x ∈ G | ∃ a ∈ A, h(a(x)) = 1}. The conditional mean recourse cost is then the mean recourse cost among those that can achieve recourse:

$$
\overline{\mathit{rc}}^{*}(A, G) = \frac{1}{|G^{*}|} \sum_{x \in G^{*}} \mathit{rc}(A, x).
$$

If G* = G, both definitions coincide with the burden.

Equal (Conditional) Mean Recourse The classifier is fair if the (conditional) mean recourse cost is the same for the protected subgroups:

$$
\overline{\mathit{rc}}(A, G_0) = \overline{\mathit{rc}}(A, G_1), \qquad \overline{\mathit{rc}}^{*}(A, G_0) = \overline{\mathit{rc}}^{*}(A, G_1).
$$

Note that when group G is the entire dataset of affected individuals, and all individuals can achieve recourse through A, this fairness notion coincides with fairness of burden [5].
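A minimal sketch of the two quantities, again assuming an array of per-individual recourse costs in which individuals without recourse carry an infinite cost (our representation choice):

```python
import numpy as np

def mean_recourse(recourse_costs: np.ndarray, c_max: float) -> float:
    """Mean recourse cost: individuals without recourse count with cost c_max."""
    costs = np.where(np.isfinite(recourse_costs), recourse_costs, c_max)
    return float(np.mean(costs))

def conditional_mean_recourse(recourse_costs: np.ndarray) -> float:
    """Mean recourse cost over the individuals that can achieve recourse (G*)."""
    achievable = recourse_costs[np.isfinite(recourse_costs)]
    return float(np.mean(achievable)) if len(achievable) else float("nan")
```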

Auditing Fairness of Recourse

Let’s now see how these definitions can be exploited in our framework, FACTS (Fairness-aware Counterfactuals for Subgroups), to audit the difficulty of achieving recourse in subgroups. The output of FACTS comprises population groups that are assigned (a) an unfairness score that captures the disparity between protected subgroups according to different fairness definitions and allows us to rank groups, and (b) a user-intuitive, easily explainable counterfactual summary, which we term Comparative Subgroup Counterfactuals (CSC).

Figure 2 presents an example result of FACTS derived from the Adult dataset. The “if clause” represents the subgroup Gp, which contains all the affected individuals that satisfy the predicate:

p = (hours-per-week = FullTime) ∧ (marital-status = Married-civ-spouse) ∧ (occupation = Adm-clerical)

The information below the predicate refers to the protected subgroups Gp,0 and Gp,1, which are the female and male individuals of Gp respectively. In blue, we highlight the percentage cov(Gp,i) = |Gp,i| / |Di|, which serves as an indicator of the size of the protected subgroup. The most important part of the representation is the set of actions that appear below each protected subgroup and are evaluated in terms of fairness metrics. In this example, the metric is Equal Cost of Effectiveness with effectiveness threshold ϕ = 0.7. For Gp,0 there is no action surpassing the threshold ϕ = 0.7, so we display a message accordingly. In contrast, the action:

a = {hours-per-week ⟶ Overtime, occupation ⟶ Exec-managerial}

has effectiveness 0.72 > ϕ for Gp,1, thus allowing 72% of the male individuals of Gp to achieve recourse. The unfairness score is “inf”, since no recourse is achieved for the female subgroup.

Figure 2: CSC for a highly biased subgroup in terms of Equal Cost of Effectiveness with ϕ = 0.7.

Method overview Deploying FACTS comprises three main steps: (a) subgroup and action space generation, which creates the set of subgroups 𝒢 and the set of actions A to examine; (b) counterfactual summaries generation, which applies appropriate actions to each group G ∈ 𝒢; and (c) CSC construction and fairness ranking, which applies the fairness definitions presented above. Next, we describe these steps in detail.

(a) Subgroup and action space generation. Subgroups are generated by executing the fp-growth algorithm and keeping the frequent subgroups common to both protected populations. The set of all actions A is generated by executing fp-growth on the unaffected population, which increases the chance of identifying effective actions and reduces the computational complexity. The above process is parameterizable with respect to the selection of the protected attribute(s) and the minimum frequency threshold for obtaining candidate subgroups. We emphasize that alternative mechanisms to generate the sets A and 𝒢 are possible.
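Below is a minimal sketch of this step using the mlxtend implementation of fp-growth; the dataframe names (affected_g0, affected_g1, unaffected), the “feature=value” encoding, and the way the results are combined are illustrative assumptions and may differ from the FACTS repository.

```python
import pandas as pd
from mlxtend.frequent_patterns import fpgrowth
from mlxtend.preprocessing import TransactionEncoder

def frequent_itemsets(df: pd.DataFrame, min_support: float = 0.01) -> pd.DataFrame:
    """Mine frequent 'feature=value' itemsets, which serve as candidate
    subgroup predicates (or candidate actions)."""
    transactions = [
        [f"{col}={val}" for col, val in row.items()] for _, row in df.iterrows()
    ]
    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                          columns=te.columns_)
    return fpgrowth(onehot, min_support=min_support, use_colnames=True)

# Candidate subgroups: itemsets frequent in *both* protected populations of the
# affected individuals (affected_g0, affected_g1 are illustrative dataframes).
# subgroups = set(map(frozenset, frequent_itemsets(affected_g0)["itemsets"])) & \
#             set(map(frozenset, frequent_itemsets(affected_g1)["itemsets"]))
# Candidate actions: itemsets mined on the unaffected population.
# actions = frequent_itemsets(unaffected)["itemsets"]
```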

(b) Counterfactual summaries generation. For each subgroup Gp, the following steps are performed: (i) find the valid actions, i.e., the actions in A that involve a subset of the features appearing in p and assign a different value to at least one of these features; (ii) for each valid action a, compute eff(a, Gp,0) and eff(a, Gp,1). This process extracts, for each subgroup Gp, a subset Vp of the actions A, with each action having exactly the same cost for all individuals of Gp. Therefore, the individuals of Gp,0 and Gp,1 are evaluated in terms of subgroup-level actions, with a fixed cost for all individuals of the subgroup, in contrast to methods that rely on aggregating the cost of individual counterfactuals. This gives our method a key advantage in cases where the exact cost of an action is difficult or ambiguous to define: a misguided, or even completely erroneous, attribution of a cost to an action affects all individuals of the subgroup equally, and only to the extent that the respective fairness definition allows. In a setting that aggregates individual counterfactual costs, changes in the same set of features could lead to highly varying action costs for different individuals within the same subgroup.
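Continuing the sketch, the following illustrates step (b) for a single subgroup: filtering the valid actions and computing their effectiveness on each protected subgroup. The dictionary representation of predicates and actions, and the predict callable, are our assumptions rather than the FACTS code.

```python
from typing import Callable, Dict, List

Predicate = Dict[str, object]   # e.g. {"occupation": "Adm-clerical", ...}
Action = Dict[str, object]      # e.g. {"occupation": "Exec-managerial"}

def is_valid(action: Action, predicate: Predicate) -> bool:
    """A valid action touches only features that appear in the subgroup
    predicate and changes the value of at least one of them."""
    if not set(action) <= set(predicate):
        return False
    return any(predicate[f] != v for f, v in action.items())

def subgroup_effectiveness(action: Action,
                           subgroup: List[Dict[str, object]],
                           predict: Callable[[Dict[str, object]], int]) -> float:
    """Fraction of a protected subgroup flipped to the favorable outcome."""
    if not subgroup:
        return 0.0
    return sum(predict({**x, **action}) == 1 for x in subgroup) / len(subgroup)

# For one subgroup Gp with protected parts gp0 and gp1 (illustrative names):
# valid_actions = [a for a in actions if is_valid(a, predicate)]
# summary = [(a, subgroup_effectiveness(a, gp0, predict),
#                subgroup_effectiveness(a, gp1, predict)) for a in valid_actions]
```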

(c) CSC construction and fairness ranking. FACTS evaluates all presented definitions on all subgroups, producing an unfairness score per definition, per subgroup. In particular, each definition quantifies a different aspect of the difficulty a protected subgroup faces in achieving recourse. This quantification directly translates into difficulty scores for the protected subgroups Gp,0 and Gp,1 of each subgroup Gp, which we compare (by computing the absolute difference between them) to arrive at the unfairness score of Gp under a specific fairness metric.
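The ranking step then amounts to taking, per definition, the absolute difference of the two protected-subgroup scores and sorting; a minimal illustration with a hypothetical data structure:

```python
def unfairness_score(score_g0: float, score_g1: float) -> float:
    """Disparity between the protected subgroups under one fairness definition."""
    return abs(score_g0 - score_g1)

# scores: {subgroup_predicate: (difficulty for Gp,0, difficulty for Gp,1)}
def rank_subgroups(scores: dict) -> list:
    """Return subgroups sorted by decreasing unfairness."""
    return sorted(scores.items(),
                  key=lambda kv: unfairness_score(*kv[1]),
                  reverse=True)
```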

The outcome of this process is the generation, for each fairness definition, of a ranked list of CSC representations, in decreasing order of their unfairness score. Apart from the unfairness ranking, the CSC representations allow users to intuitively understand unfairness by directly comparing the differences in actions between the protected populations within a subgroup.

Experimental Findings

We now present some indicative findings from our evaluation of FACTS, focusing on the Adult dataset; more information about the datasets, experimental setting, and additional results can be found in the appendix of [9]. The code is available at: https://github.com/AutoFairAthenaRC/FACTS. First, we briefly describe the experimental setting, and then present and discuss Comparative Subgroup Counterfactuals for subgroups ranked as the most unfair according to some of the presented definitions.

Experimental Setting. The first step was dataset cleanup (e.g., removing missing values and duplicate features, and creating bins for continuous features like age). The resulting dataset was split randomly with a 70:30 ratio and used to train and test, respectively, a logistic regression model (subsequently used as the black-box model to audit). For the generation of the subgroups and the set of actions, we used fp-growth with a 1% support threshold on the test set. We also implemented various cost functions, depending on the type of feature, i.e., categorical, ordinal, or numerical.
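The setup can be reproduced along the following lines with scikit-learn; the file path, column names, binning choices, and encoding details are illustrative rather than the exact ones used in our experiments.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load and clean the Adult dataset (path and cleanup steps are illustrative).
df = pd.read_csv("adult.csv").dropna().drop_duplicates()
df["age"] = pd.cut(df["age"], bins=[0, 25, 45, 65, 100],
                   labels=["young", "adult", "middle-aged", "senior"])

X = pd.get_dummies(df.drop(columns=["income"]))
y = (df["income"] == ">50K").astype(int)

# 70:30 split; the logistic regression model is then audited as a black box.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Affected individuals: test-set instances receiving the unfavorable outcome.
affected = X_test[model.predict(X_test) == 0]
# Subgroups and actions are then mined on the test set with fp-growth
# (min_support=0.01), as sketched in step (a) above.
```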

Unfair subgroups. The following table presents three subgroups that were ranked at position 1 according to three different definitions: Equal Cost of Effectiveness (ϕ = 0.7), Equal Choice for Recourse (ϕ = 0.7), and Equal Cost of Effectiveness (ϕ = 0.3), meaning that these subgroups were detected to have the highest unfairness according to the respective definitions. For each subgroup, its rank, the group it is biased against, and its unfairness score are provided for all definitions listed in the left-most column. When the unfairness score is 0, we display the value “Fair” in the rank column. Note that subgroups with exactly the same score w.r.t. a definition receive the same rank. The CSC representations for the fairness metrics that ranked the three subgroups of Table 1 at the first position are shown in Figure 3.

| Definition | Subgroup 1 (rank, bias against, unfairness score) | Subgroup 2 (rank, bias against, unfairness score) | Subgroup 3 (rank, bias against, unfairness score) |
|---|---|---|---|
| Equal Effectiveness | 2950, Male, 0.11 | 10063, Female, 0.0004 | 275, Female, 0.32 |
| Equal Choice for Recourse (ϕ = 0.3) | Fair, –, 0 | 12, Female, 2 | Fair, –, 0 |
| Equal Choice for Recourse (ϕ = 0.7) | 6, Male, 1 | 1, Female, 6 | Fair, –, 0 |
| Equal Effectiveness within Budget (c = 5) | Fair, –, 0 | 2806, Female, 0.05 | 670, Female, 0.3 |
| Equal Effectiveness within Budget (c = 10) | 2350, Male, 0.11 | 8518, Female, 0.0004 | 226, Female, 0.3 |
| Equal Effectiveness within Budget (c = 18) | 2675, Male, 0.11 | 9222, Female, 0.0004 | 272, Female, 0.3 |
| Equal Cost of Effectiveness (ϕ = 0.3) | Fair, –, 0 | Fair, –, 0 | 1, Female, inf |
| Equal Cost of Effectiveness (ϕ = 0.7) | 1, Male, inf | 12, Female, 2 | Fair, –, 0 |
| Fair Effectiveness-Cost Trade-Off | 4065, Male, 0.11 | 3579, Female, 0.13 | 306, Female, 0.32 |
| Equal (Conditional) Mean Recourse | Fair, –, 0 | 3145, Female, 0.35 | Fair, –, 0 |
Table 1: Unfair subgroups identified in the Adult dataset.

Subgroup 1 is ranked first (highly unfair) based on Equal Cost of Effectiveness with ϕ = 0.7, while it is ranked much lower, or even considered fair, according to most of the remaining definitions (see the values of the “rank” column for Subgroup 1). The same pattern is observed for Subgroup 2 and Subgroup 3: they are ranked first based on Equal Choice for Recourse with ϕ = 0.7 and Equal Cost of Effectiveness with ϕ = 0.3, respectively, but much lower according to the remaining definitions. This finding provides a strong indication of the utility of the different fairness definitions, i.e., the fact that they are able to capture different aspects of the difficulty in achieving recourse.

Figure 3: Comparative Subgroup Counterfactuals for the subgroups of Table 1.

What is more, these different aspects can easily be motivated by real-world auditing needs. For example, out of the aforementioned definitions, Equal Cost of Effectiveness with ϕ = 0.7 would be suitable in a scenario where a horizontal intervention to support a subpopulation needs to be performed, but only a limited number of actions is affordable. In this case, the macro viewpoint demonstrated in the CSC of Subgroup 1 (top result in Figure 3) serves exactly this purpose: one can easily derive single, group-level actions that effectively achieve recourse for a desired percentage of the unfavored protected subgroup.

Recap

Fairness of recourse is an important notion of fairness, yet it remains understudied. In this series of posts, we presented a framework comprising a set of fairness definitions as well as an efficient auditing mechanism that utilizes them to audit fairness of recourse. You can find a detailed description in our published work [9], as well as additional discussion on the topic in related works [5, 6, 7, 10].

Acknowledgments

This work has been performed by: Loukas Kavouras, Konstantinos Tsopelas, Giorgos Giannopoulos, Dimitris Sacharidis, Eleni Psaroudaki, Nikolaos Theologitis, Dimitrios Rontogiannis, Dimitris Fotakis, Ioannis Z. Emiris and has been funded by the European Union’s Horizon Europe research and innovation programme under Grant Agreement No. 101070568 (AutoFair).

[1] European Commission, Directorate-General for Justice and Consumers, J. Gerards, and R. Xenidis. Algorithmic discrimination in Europe: Challenges and opportunities for gender equality and non-discrimination law. Publications Office, 2021. https://data.europa.eu/doi/10.2838/544956.
[2] Marc P. Hauer, Johannes Kevekordes, and Maryam Amir Haeri. Legal perspective on possible fairness measures – a legal discussion using the example of hiring decisions. Computer Law & Security Review, 42:105583, 2021.
[3] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning: Limitations and Opportunities. MIT Press, 2023.
[4] Sahil Verma and Julia Rubin. Fairness definitions explained. In Proceedings of the International Workshop on Software Fairness, FairWare ’18, pages 1–7, New York, NY, USA, 2018. Association for Computing Machinery.
[5] Shubham Sharma, Jette Henderson, and Joydeep Ghosh. CERTIFAI: A common framework to provide explanations and analyse the fairness and robustness of black-box models. In AIES, pages 166–172. ACM, 2020.
[6] Alejandro Kuratomi, Evaggelia Pitoura, Panagiotis Papapetrou, Tony Lindgren, and Panayiotis Tsaparas. Measuring the burden of (un)fairness using counterfactuals. In Machine Learning and Principles and Practice of Knowledge Discovery in Databases: International Workshops of ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part I, pages 402–417. Springer, 2023.
[7] Julius von Kügelgen, Amir-Hossein Karimi, Umang Bhatt, Isabel Valera, Adrian Weller, and Bernhard Schölkopf. On the fairness of causal algorithmic recourse. In AAAI, pages 9584–9594. AAAI Press, 2022.
[8] Suresh Venkatasubramanian and Mark Alfano. The philosophical basis of algorithmic recourse. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 284–293, 2020.
[9] Loukas Kavouras, Konstantinos Tsopelas, Giorgos Giannopoulos, Dimitris Sacharidis, Eleni Psaroudaki, Nikolaos Theologitis, Dimitrios Rontogiannis, Dimitris Fotakis, and Ioannis Z. Emiris. Fairness aware counterfactuals for subgroups. In NeurIPS, 2023.
[10] Andrew Bell, Joao Fonseca, Carlo Abrate, Francesco Bonchi, and Julia Stoyanovich. Fairness in algorithmic recourse through the lens of substantive equality of opportunity. arXiv preprint arXiv:2401.16088, 2024.

Bio Notes

Dimitris Sacharidis is an assistant professor at the Data Science and Engineering Lab of the Université Libre de Bruxelles. Prior to that he was an assistant professor at the Technical University of Vienna, and a Marie Skłodowska-Curie fellow at the “Athena” Research Center and at the Hong Kong University of Science and Technology. He completed his PhD and undergraduate studies in Computer Engineering at the National Technical University of Athens, and in between obtained an MSc in Computer Science from the University of Southern California. His research interests include data science, data engineering, and responsible AI. He has served as a PC member and has been involved in the organization of top related conferences, and has acted as a reviewer and editorial member of associated journals.

Giorgos Giannopoulos is a Scientific Associate at Athena Research Center, Greece and at the National Observatory of Athens. He has performed research in the fields of Information Retrieval, Machine Learning, Data Integration, Geospatial Data Management and Analysis, and Semantic Web, and has been involved in the specification, design, and development of cataloguing services for European Research Infrastructures. His current research interests focus on the adaptation and extension of Machine/Deep Learning methods in various disciplines, including: fairness and explainability of ML algorithms and pipelines; ML-driven next-day prediction of forest fires; pattern recognition on RES timeseries; explanation and prediction of solar events; ML-driven data integration and annotation; and fact checking.

Loukas Kavouras is a Scientific Associate at Athena Research Center, Greece. He has performed research in the fields of Machine Learning, Online and Approximation Algorithms, and Resource Allocation Problems. He completed his PhD and his MSc in Computer Science at the School of Electrical and Computer Engineering of the National Technical University of Athens (NTUA), and his undergraduate studies in Applied Mathematical and Physical Sciences at the NTUA. His current research interests include data science, data engineering, and fairness/explainability of ML algorithms.
