September 25, 2024

Fairness

Fairness is a fundamental principle reflecting our innate sense of justice and equity. The essence of fairness lies in the equitable, unbiased and just treatment of all individuals. In our previous post (part I), we provided an introduction to the *bias of recourse* problem. In this post (part II), we describe our framework for defining and auditing bias of recourse on subgroups, comprising a series of fairness definitions as well as an efficient auditing mechanism that utilizes them to audit fairness of recourse.

We start by providing a set of formal definitions capturing different aspects of fairness of recourse.

We consider a **feature space** *X* = *X*_{1} × ⋯ × *X _{n}*, where each *X _{i}* denotes the value domain of the *i*-th feature.

We consider a binary **classifier** *h* : *X* ⟶ {–1, 1}, where the positive outcome is the favorable one. For a given *h*, we are concerned with a dataset *D* of **adversely affected individuals**, i.e., those who receive the unfavorable outcome. We use the term *instance* to refer to any point in the feature space *X*, and the term *individual* to refer to an instance from the dataset *D*.

We define an **action** *a* as a set of changes to feature values, e.g., *a* = {*country* ⟶ US, *education-num* ⟶ 12}. We denote as *A* the set of possible actions. An action *a* applied to an individual (a factual instance) *x* returns a **counterfactual** instance *x*′ = *a*(*x*). If the individual *x* was adversely affected (*h*(*x*) = –1) and the action results in a counterfactual that receives the desired outcome (*h*(*a*(*x*)) = 1), we say that action *a* offers recourse to the individual *x* and is thus **effective**. In line with the literature, we also refer to an effective action as a **counterfactual explanation** for individual *x*.

An action *a* incurs a **cost** to an individual *x*, which we denote as cost(*a*, *x*). The cost function captures both how *feasible* the action *a* is for the individual *x*, and how *plausible* the counterfactual *a*(*x*) is.

Given a set of actions *A*, we define the **recourse cost** rc(*A*, *x*) of an individual *x* as the minimum cost among effective actions if one exists, or otherwise some maximum cost represented as *c*_{∞}:

rc(*A*, *x*) = min {cost(*a*, *x*) : *a* ∈ *A*, *h*(*a*(*x*)) = 1} if an effective action exists in *A*; otherwise rc(*A*, *x*) = *c*_{∞}.
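To make the definition concrete, here is a minimal Python sketch of rc(*A*, *x*). The classifier, the dict-based action encoding, and the additive cost function are illustrative stand-ins, not the FACTS implementation:

```python
# Minimal sketch of the recourse cost rc(A, x). The classifier h, the
# dict-based action encoding, and the cost function are illustrative
# stand-ins, not the FACTS implementation.
C_INF = float("inf")  # stands in for the maximum cost c_inf

def apply_action(x, a):
    """Return the counterfactual x' = a(x) by overwriting feature values."""
    return {**x, **a}

def recourse_cost(A, x, h, cost):
    """Minimum cost among effective actions in A, or C_INF if none exists."""
    effective = [cost(a, x) for a in A if h(apply_action(x, a)) == 1]
    return min(effective) if effective else C_INF

# Toy example: recourse is achieved when education reaches 12.
h = lambda x: 1 if x["education"] >= 12 else -1
cost = lambda a, x: sum(abs(v - x[k]) for k, v in a.items())
x = {"education": 10}
A = [{"education": 12}, {"education": 14}]
print(recourse_cost(A, x, h, cost))  # prints 2 (the cheaper effective action)
```

With an empty action set, the individual has no recourse and the sentinel cost is returned.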

An effective action of minimum cost is also called a **nearest counterfactual explanation**. We define a **subspace** *X _{p}* ⊆ *X* as the set of instances that satisfy a predicate *p*.

Given a predicate *p*, we define the subpopulation **group** *G _{p}* ⊆ *D* as the set of affected individuals that satisfy *p*.

For a specific action *a*, we naturally define its **effectiveness** (eff) for a group *G* as the proportion of individuals from *G* that achieve recourse through *a*:

eff(*a*; *G*) = |{*x* ∈ *G* : *h*(*a*(*x*)) = 1}| / |*G*|.

We want to examine how recourse is achieved for group *G *through a set of possible actions *A*. We define the **aggregate effectiveness **(aeff) of *A *for *G *in two distinct ways.

In the *micro viewpoint*, the individuals in the group are considered independently, and each may choose the action that benefits them the most. Concretely, we define the **micro-effectiveness** of a set of actions *A* for a group *G* as the proportion of individuals in *G* that can achieve recourse through *some* action in *A*, i.e.:

aeff_{µ}(*A*; *G*) = |{*x* ∈ *G* : ∃*a* ∈ *A*, *h*(*a*(*x*)) = 1}| / |*G*|.

In the *macro viewpoint*, the group is considered as a whole, and an action is applied collectively to all individuals in the group. Concretely, we define the **macro-effectiveness** of a set of actions *A* for a group *G* as the largest proportion of individuals in *G* that can achieve recourse through *the same* action in *A*, i.e.:

aeff_{M}(*A*; *G*) = max_{*a* ∈ *A*} eff(*a*; *G*).
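The contrast between the two viewpoints can be sketched with a toy example in which each individual is helped by a different action, so micro-effectiveness is 1.0 while macro-effectiveness is only 0.5. The classifier and features are hypothetical:

```python
# Toy contrast between micro- and macro-effectiveness. Each individual can
# be helped by a different action, so micro = 1.0 while macro = 0.5.
def apply_action(x, a):
    return {**x, **a}

def effectiveness(a, G, h):
    """eff(a; G): proportion of G for which action a flips the outcome."""
    return sum(h(apply_action(x, a)) == 1 for x in G) / len(G)

def micro_effectiveness(A, G, h):
    """Proportion of G achieving recourse through SOME action in A."""
    return sum(any(h(apply_action(x, a)) == 1 for a in A) for x in G) / len(G)

def macro_effectiveness(A, G, h):
    """Largest proportion of G achieving recourse through the SAME action."""
    return max(effectiveness(a, G, h) for a in A)

h = lambda x: 1 if x["a"] + x["b"] >= 10 else -1  # hypothetical classifier
G = [{"a": 2, "b": 7}, {"a": 7, "b": 2}]
A = [{"a": 3}, {"b": 3}]
print(micro_effectiveness(A, G, h))  # 1.0: everyone has some helpful action
print(macro_effectiveness(A, G, h))  # 0.5: no single action helps everyone
```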

For a group *G*, actions *A*, and a cost budget *c*, we define the **in-budget actions** as the set of actions that cost at most *c* for every individual in *G*:

*A*[*c*; *G*] = {*a* ∈ *A* : cost(*a*, *x*) ≤ *c*, ∀*x* ∈ *G*}.

We define the **effectiveness-cost distribution** (ecd) as the function that, for a cost budget *c*, returns the aggregate effectiveness possible with in-budget actions:

ecd(*c*; *A*, *G*) = aeff(*A*[*c*; *G*]; *G*).

We use ecd_{µ} and ecd_{M} to denote the effectiveness-cost distribution under the micro and macro viewpoints, respectively.

The **inverse effectiveness-cost distribution** function ecd^{−1}(*ϕ*; *A*, *G*) takes as input an effectiveness level *ϕ* ∈ [0, 1] and returns the minimum cost required so that *ϕ*|*G*| individuals achieve recourse:

ecd^{−1}(*ϕ*; *A*, *G*) = min {*c* : ecd(*c*; *A*, *G*) ≥ *ϕ*}.
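The three notions above can be sketched together, assuming actions are encoded as dicts of feature overrides, an illustrative additive cost function, and a finite grid of budgets standing in for the continuous cost axis (we use the micro viewpoint for aggregation):

```python
# Sketch of in-budget actions, ecd, and its inverse under the micro
# viewpoint. The dict-based action encoding and the additive cost function
# are illustrative stand-ins.
def apply_action(x, a):
    return {**x, **a}

def micro_eff(A, G, h):
    return sum(any(h(apply_action(x, a)) == 1 for a in A) for x in G) / len(G)

def in_budget(A, G, c, cost):
    """A[c; G]: actions costing at most c for every individual in G."""
    return [a for a in A if all(cost(a, x) <= c for x in G)]

def ecd(c, A, G, h, cost):
    """Aggregate (micro) effectiveness achievable with budget c."""
    B = in_budget(A, G, c, cost)
    return micro_eff(B, G, h) if B else 0.0

def ecd_inv(phi, A, G, h, cost, budgets):
    """Minimum budget on a finite grid reaching effectiveness phi."""
    for c in sorted(budgets):
        if ecd(c, A, G, h, cost) >= phi:
            return c
    return float("inf")

h = lambda x: 1 if x["a"] + x["b"] >= 10 else -1
cost = lambda a, x: sum(abs(v - x[k]) for k, v in a.items())
G = [{"a": 2, "b": 7}, {"a": 2, "b": 6}]
A = [{"a": 3}, {"a": 5}]
print(ecd(1, A, G, h, cost), ecd(3, A, G, h, cost))  # 0.5 1.0
print(ecd_inv(1.0, A, G, h, cost, [1, 2, 3]))        # 3
```

Only the cheap action fits a budget of 1, helping half the group; full effectiveness requires a budget of 3.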

We define recourse fairness of classifier *h *for a group *G *by comparing the ecd functions of the protected subgroups *G*_{0}, *G*_{1} in different ways.

The first two definitions are *cost-oblivious*, and apply whenever we can ignore the cost function. Specifically, given a set of actions *A* and a group *G*, we assume that all actions in *A* are considered equivalent, and that the cost of any action is the same for all individuals in the group, i.e., cost(*a*, *x*) = cost(*a*′, *x*′) for all *a*, *a*′ ∈ *A* and all *x*, *x*′ ∈ *G*. The definitions simply compare the aggregate effectiveness of a set of actions on the protected subgroups.

**Equal Effectiveness** This definition has a micro and a macro interpretation, and says that the classifier is fair if the same proportion of individuals in the protected subgroups can achieve recourse:

aeff(*A*; *G*_{0}) = aeff(*A*; *G*_{1}).

**Equal Choice for Recourse** This definition has only a macro interpretation and states that the classifier is fair if the protected subgroups can choose among the same number of sufficiently effective actions to achieve recourse, where sufficiently effective means the actions should work for at least 100*ϕ*% (for *ϕ* ∈ [0, 1]) of the subgroup:

|{*a* ∈ *A* : eff(*a*; *G*_{0}) ≥ *ϕ*}| = |{*a* ∈ *A* : eff(*a*; *G*_{1}) ≥ *ϕ*}|.

The next three definitions assume the cost function is specified, and have both a micro and a macro interpretation.

**Equal Effectiveness within Budget** The classifier is fair if the same proportion of individuals in the protected subgroups can achieve recourse with a cost of at most *c*:

ecd(*c*; *A*, *G*_{0}) = ecd(*c*; *A*, *G*_{1}).

**Equal Cost of Effectiveness** The classifier is fair if the minimum cost to achieve aggregate effectiveness of *ϕ* ∈ [0, 1] in the protected subgroups is equal:

ecd^{−1}(*ϕ*; *A*, *G*_{0}) = ecd^{−1}(*ϕ*; *A*, *G*_{1}).

**Fair Effectiveness-Cost Trade-Off** The classifier is fair if the protected subgroups have the same effectiveness-cost distribution, or equivalently, for each cost budget *c*, their aggregate effectiveness is equal:

sup_{*c*} |ecd(*c*; *A*, *G*_{0}) − ecd(*c*; *A*, *G*_{1})| = 0.

The left-hand side represents the two-sample Kolmogorov-Smirnov statistic for the effectiveness-cost distributions (ecd) of the protected subgroups, viewed as empirical cumulative distribution functions. We say that the classifier is fair with confidence *α* if this statistic is less than:

√( −ln(*α*/2) · (|*G*_{0}| + |*G*_{1}|) / (2 |*G*_{0}| |*G*_{1}|) ).
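Assuming the standard asymptotic critical value of the two-sample Kolmogorov-Smirnov test and a finite grid of cost budgets in place of the continuous cost axis, the check can be sketched as:

```python
# Sketch of the Fair Effectiveness-Cost Trade-Off check, assuming the
# standard asymptotic two-sample KS critical value and a finite budget grid.
import math

def ks_statistic(ecd0, ecd1, budgets):
    """Largest gap between the two ecd curves over the budget grid."""
    return max(abs(ecd0(c) - ecd1(c)) for c in budgets)

def ks_threshold(n0, n1, alpha=0.05):
    """Asymptotic two-sample KS critical value for group sizes n0, n1."""
    return math.sqrt(-math.log(alpha / 2) * (n0 + n1) / (2 * n0 * n1))

# Hypothetical ecd curves: one subgroup needs twice the budget of the other
# for the same effectiveness, giving a maximum gap of 0.5 at budget 10.
ecd0 = lambda c: min(c / 10, 1.0)
ecd1 = lambda c: min(c / 20, 1.0)
stat = ks_statistic(ecd0, ecd1, range(21))
print(stat, stat < ks_threshold(100, 100))  # gap of 0.5: not fair
```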

The last definition takes a micro viewpoint and extends the notion of *burden* [5] from the literature to the case where not all individuals may achieve recourse. The **mean recourse cost** of a group:

rc(*A*, *G*) = (1/|*G*|) Σ_{*x* ∈ *G*} rc(*A*, *x*)

considers individuals that cannot achieve recourse through *A* and have a recourse cost of *c*_{∞}. To exclude them, we denote as *G*^{∗} the set of individuals of *G* that can achieve recourse through an action in *A*, i.e., *G*^{∗} = {*x* ∈ *G* | ∃*a* ∈ *A*, *h*(*a*(*x*)) = 1}. Then the **conditional mean recourse cost** is the mean recourse cost among those that can achieve recourse:

rc(*A*, *G*^{∗}) = (1/|*G*^{∗}|) Σ_{*x* ∈ *G*^{∗}} rc(*A*, *x*).

If *G *= *G*^{∗}, the definitions coincide with burden.
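Both means can be sketched as follows; a finite sentinel `C_MAX` stands in for *c*_{∞} so that the plain mean stays finite:

```python
# Sketch of mean and conditional mean recourse cost. A finite sentinel
# C_MAX stands in for c_inf.
C_MAX = 100.0  # illustrative maximum cost c_inf

def mean_rc(rcs):
    """Mean recourse cost; individuals without recourse contribute C_MAX."""
    return sum(min(r, C_MAX) for r in rcs) / len(rcs)

def conditional_mean_rc(rcs):
    """Mean recourse cost among individuals that achieve recourse (G*)."""
    achieved = [r for r in rcs if r < C_MAX]
    return sum(achieved) / len(achieved) if achieved else C_MAX

rcs = [2.0, 4.0, C_MAX]  # one individual cannot achieve recourse
print(conditional_mean_rc(rcs))  # 3.0
```

When every individual achieves recourse, the two functions return the same value, matching the remark above.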

**Equal (Conditional) Mean Recourse** The classifier is fair if the (conditional) mean recourse cost for the protected subgroups is the same:

rc(*A*, *G*_{0}) = rc(*A*, *G*_{1}) (resp. rc(*A*, *G*_{0}^{∗}) = rc(*A*, *G*_{1}^{∗})).

Note that when group *G *is the entire dataset of affected individuals, and all individuals can achieve recourse through *A*, this fairness notion coincides with fairness of burden [5].

Let’s now see how these definitions can be exploited in our framework, FACTS (Fairness-aware Counterfactuals for Subgroups), to audit the difficulty of achieving recourse in subgroups. The output of FACTS comprises population groups that are assigned (a) an unfairness score that captures the disparity between protected subgroups according to different fairness definitions and allows us to rank groups, and (b) a user-intuitive, easily explainable counterfactual summary, which we term *Comparative Subgroup Counterfactuals *(CSC).

Figure 2 presents an example result of FACTS derived from the Adult dataset. The “if clause” represents the subgroup *G _{p}*, which contains all the affected individuals that satisfy the predicate:

*p* = (*hours-per-week* = FullTime) ∧ (*marital-status* = Married-civ-spouse) ∧ (*occupation* = Adm-clerical)

The information below the predicate refers to the protected subgroups within *G _{p}*, defined by the protected attribute (here, sex). For instance, the action

*a* = {*hours-per-week* ⟶ Overtime, *occupation* ⟶ Exec-managerial}

has effectiveness 0.72 > *ϕ* for one of the protected subgroups.

**Method overview** Deploying FACTS comprises three main steps: (a) *Subgroup and action space generation*, which creates the sets of subgroups and actions *A* to examine; (b) *Counterfactual summaries generation*, which applies appropriate actions to each group *G*; and (c) *CSC construction and fairness ranking*, which applies the fairness definitions presented above. Next, we describe these steps in detail.

**(a)** **Subgroup and action space generation**. Subgroups are generated by executing the fp-growth algorithm on the protected populations and keeping their common frequent subgroups. The set of all actions *A* is generated by executing fp-growth on the unaffected population, which increases the chance of identifying effective actions and reduces the computational complexity. The above process is parameterizable with respect to the selection of the protected attribute(s) and the minimum frequency threshold for obtaining candidate subgroups. We emphasize that alternative mechanisms to generate the sets of actions and subgroups are possible.
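As a rough illustration of step (a), the sketch below brute-forces frequent predicates instead of running fp-growth; the dict-based encoding and function names are ours, not the FACTS code:

```python
# Rough stand-in for step (a): brute-force enumeration of frequent
# predicates instead of fp-growth. Encoding and names are illustrative.
from itertools import combinations

def frequent_predicates(pop, min_support, max_len=2):
    """(feature, value) sets holding in at least min_support of pop."""
    items = {(k, v) for x in pop for k, v in x.items()}
    out = set()
    for r in range(1, max_len + 1):
        for combo in combinations(sorted(items), r):
            sup = sum(all(x.get(k) == v for k, v in combo) for x in pop) / len(pop)
            if sup >= min_support:
                out.add(frozenset(combo))
    return out

def candidate_subgroups(pop0, pop1, min_support):
    """Keep only the frequent predicates common to both protected populations."""
    return frequent_predicates(pop0, min_support) & frequent_predicates(pop1, min_support)

pop0 = [{"job": "clerk"}, {"job": "clerk"}, {"job": "exec"}]
pop1 = [{"job": "clerk"}, {"job": "exec"}, {"job": "exec"}]
print(len(candidate_subgroups(pop0, pop1, 0.3)))  # 2 common subgroups
```

Raising the support threshold prunes predicates that are frequent in only one of the two populations, which is exactly why the intersection is taken.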

**(b) Counterfactual summaries generation.** For each subgroup *G _{p}*, the following steps are performed: (i) find valid actions, i.e., the actions in *A* that are applicable to the individuals of *G _{p}*; (ii) apply each valid action to the individuals of the protected subgroups and record its effectiveness and cost.

**(c) CSC construction and fairness ranking.** FACTS evaluates all presented definitions on all subgroups, producing an unfairness score per definition, per subgroup. In particular, each definition quantifies a different aspect of the difficulty for a protected subgroup to achieve recourse. This quantification directly translates to difficulty scores for the protected subgroups, whose comparison yields the unfairness score of the subgroup.

The outcome of this process is the generation, for each fairness definition, of a ranked list of CSC representations, in decreasing order of their unfairness score. Apart from unfairness ranking, the CSC representations will allow users to intuitively understand unfairness by directly comparing differences in actions between the protected populations within a subgroup.

We now present some indicative findings from our evaluation of FACTS, focusing on the Adult dataset; more information about the datasets, experimental setting, and additional results can be found in the appendix of [9]. The code is available at: https://github.com/AutoFairAthenaRC/FACTS. First, we briefly describe the experimental setting and then present and discuss Comparative Subgroup Counterfactuals for subgroups ranked as the most unfair according to some of the presented definitions.

**Experimental Setting.** The first step was the dataset cleanup (e.g., removing missing values and duplicate features, creating bins for continuous features like age). The resulting dataset was split randomly with a 70:30 split ratio and was used to train and test, respectively, a logistic regression model (subsequently used as the black-box model to audit). For the generation of the subgroups and the set of actions, we used fp-growth with a 1% support threshold on the test set. We also implemented various cost functions, depending on the type of feature, i.e., categorical, ordinal, or numerical.
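A minimal stand-in for this pipeline, using synthetic data in place of the preprocessed Adult dataset, might look like:

```python
# Illustrative stand-in for the experimental pipeline: a 70:30 split and a
# logistic regression black box, with synthetic data in place of the
# preprocessed Adult dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # 1 = favorable outcome

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# The affected individuals D are the test points predicted unfavorably;
# these would feed the subgroup generation and auditing steps.
affected = X_test[model.predict(X_test) == 0]
print(len(affected), "affected individuals out of", len(X_test))
```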

**Unfair subgroups.** The following table presents three subgroups that were ranked at position 1 according to three different definitions: *Equal Cost of Effectiveness* (*ϕ* = 0.7), *Equal Choice for Recourse* (*ϕ* = 0.7), and *Equal Cost of Effectiveness* (*ϕ* = 0.3), meaning that these subgroups were detected to have the highest unfairness according to the respective definitions. For each subgroup, its *rank*, *bias against*, and *unfairness score* are provided for all definitions listed in the left-most column. When the unfairness score is 0, we display the value “Fair” in the rank column. Note that subgroups with exactly the same score w.r.t. a definition receive the same rank. The CSC representations for the fairness metrics that ranked the three subgroups of Table 1 at the first position are shown in Figure 3.

| Definition | Subgroup 1: rank | bias against | unfairness score | Subgroup 2: rank | bias against | unfairness score | Subgroup 3: rank | bias against | unfairness score |
|---|---|---|---|---|---|---|---|---|---|
| Equal Effectiveness | 2950 | Male | 0.11 | 10063 | Female | 0.0004 | 275 | Female | 0.32 |
| Equal Choice for Recourse (ϕ = 0.3) | Fair | – | 0 | 12 | Female | 2 | Fair | – | 0 |
| Equal Choice for Recourse (ϕ = 0.7) | 6 | Male | 1 | 1 | Female | 6 | Fair | – | 0 |
| Equal Effectiveness within Budget (c = 5) | Fair | – | 0 | 2806 | Female | 0.056 | 70 | Female | 0.3 |
| Equal Effectiveness within Budget (c = 10) | 2350 | Male | 0.11 | 8518 | Female | 0.0004 | 226 | Female | 0.3 |
| Equal Effectiveness within Budget (c = 18) | 2675 | Male | 0.11 | 9222 | Female | 0.0004 | 272 | Female | 0.3 |
| Equal Cost of Effectiveness (ϕ = 0.3) | Fair | – | 0 | Fair | – | 0 | 1 | Female | inf |
| Equal Cost of Effectiveness (ϕ = 0.7) | 1 | Male | inf | 12 | Female | 2 | Fair | – | 0 |
| Fair Effectiveness-Cost Trade-Off | 4065 | Male | 0.11 | 3579 | Female | 0.13 | 306 | Female | 0.32 |
| Equal (Conditional) Mean Recourse | Fair | – | 0 | 3145 | Female | 0.35 | Fair | – | 0 |

Subgroup 1 is ranked first (highly unfair) based on *Equal Cost of Effectiveness* with *ϕ* = 0.7, while it is ranked much lower, or even considered fair, according to most of the remaining definitions (see the values of the “rank” column for Subgroup 1). The same pattern is observed for Subgroups 2 and 3: they are ranked first based on *Equal Choice for Recourse* with *ϕ* = 0.7 and *Equal Cost of Effectiveness* with *ϕ* = 0.3, respectively, but much lower according to the remaining definitions. This finding provides a strong indication of the utility of the different fairness definitions, i.e., that they capture different aspects of the difficulty in achieving recourse.

What is more, these different aspects can easily be motivated by real-world auditing needs. For example, out of the aforementioned definitions, *Equal Cost of Effectiveness* with *ϕ* = 0.7 would be suitable in a scenario where a horizontal intervention to support a subpopulation needs to be performed, but only a limited number of actions is affordable. In this case, the macro viewpoint demonstrated in the CSC for Subgroup 1 (top result in Figure 3) serves exactly this purpose: one can easily derive single, group-level actions that can effectively achieve recourse for a desired percentage of the unfavored protected subgroup.

Fairness of recourse is an important notion of fairness, yet still understudied. In this series of posts, we presented a framework comprising a set of fairness definitions as well as an efficient auditing mechanism utilizing them to audit fairness of recourse. You may find a detailed description in our published work [9] as well as additional discussion on the topic in related works [5,6,7,10].

This work has been performed by: *Loukas Kavouras, Konstantinos Tsopelas, Giorgos Giannopoulos,* *Dimitris Sacharidis, Eleni Psaroudaki, Nikolaos Theologitis, Dimitrios Rontogiannis, Dimitris Fotakis, Ioannis Z. Emiris* and has been funded by the European Union’s Horizon Europe research and innovation programme under Grant Agreement No. 101070568 (AutoFair).

[1] European Commission, Directorate-General for Justice and Consumers, Gerards, J., Xenidis, R. Algorithmic discrimination in Europe – Challenges and opportunities for gender equality and non-discrimination law. Publications Office, 2021. https://data.europa.eu/doi/10.2838/544956

[2] Marc P. Hauer, Johannes Kevekordes, and Maryam Amir Haeri. Legal perspective on possible fairness measures – a legal discussion using the example of hiring decisions. Computer Law & Security Review, 42:105583, 2021.

[3] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning: Limitations and Opportunities. MIT Press, 2023.

[4] Sahil Verma and Julia Rubin. Fairness definitions explained. In Proceedings of the International Workshop on Software Fairness, FairWare ’18, pages 1–7, New York, NY, USA, 2018. Association for Computing Machinery.

[5] Shubham Sharma, Jette Henderson, and Joydeep Ghosh. CERTIFAI: A common framework to provide explanations and analyse the fairness and robustness of black-box models. In AIES, pages 166–172. ACM, 2020.

[6] Alejandro Kuratomi, Evaggelia Pitoura, Panagiotis Papapetrou, Tony Lindgren, and Panayiotis Tsaparas. Measuring the burden of (un)fairness using counterfactuals. In Machine Learning and Principles and Practice of Knowledge Discovery in Databases: International Workshops of ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part I, pages 402–417. Springer, 2023.

[7] Julius von Kügelgen, Amir-Hossein Karimi, Umang Bhatt, Isabel Valera, Adrian Weller, and Bernhard Schölkopf. On the fairness of causal algorithmic recourse. In AAAI, pages 9584–9594. AAAI Press, 2022.

[8] Suresh Venkatasubramanian and Mark Alfano. The philosophical basis of algorithmic recourse. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 284–293, 2020.

[9] Loukas Kavouras, Konstantinos Tsopelas, Giorgos Giannopoulos, Dimitris Sacharidis, Eleni Psaroudaki, Nikolaos Theologitis, Dimitrios Rontogiannis, Dimitris Fotakis, and Ioannis Z. Emiris. Fairness Aware Counterfactuals for Subgroups. In NeurIPS 2023.

[10] Andrew Bell, Joao Fonseca, Carlo Abrate, Francesco Bonchi, and Julia Stoyanovich. Fairness in Algorithmic Recourse Through the Lens of Substantive Equality of Opportunity. arXiv preprint arXiv:2401.16088, 2024.

Dimitris Sacharidis is an assistant professor at the Data Science and Engineering Lab of the Université Libre de Bruxelles. Prior to that he was an assistant professor at the Technical University of Vienna, and a Marie Skłodowska Curie fellow at the “Athena” Research Center and at the Hong Kong University of Science and Technology. He finished his PhD and undergraduate studies on Computer Engineering at the National Technical University of Athens, while in between he obtained an MSc in Computer Science from the University of Southern California. His research interests include data science, data engineering, and responsible AI. He has served as a PC member and has been involved in the organization of top related conferences, and has acted as a reviewer and editorial member of associated journals.

Giorgos Giannopoulos is a Scientific Associate at Athena Research Center, Greece and at the National Observatory of Athens. He has performed research on the fields of Information Retrieval, Machine Learning, Data Integration, Geospatial Data Management and Analysis and Semantic Web, while he has been involved in the specification, design and development of cataloguing services for European Research Infrastructures. His current research interests focus on the adaptation and extension of Machine/Deep Learning methods in various disciplines, including: fairness and explainability of ML algorithms and pipelines; ML-driven next day prediction of forest fires; pattern recognition on RES timeseries; explanation and prediction of Solar events; ML-driven data integration and annotation; and fact checking.

Loukas Kavouras is a Scientific Associate at Athena Research Center, Greece. He has performed research in the fields of Machine Learning, Online and Approximation Algorithms, and Resource Allocation Problems. He received his PhD and MSc in Computer Science from the School of Electrical and Computer Engineering at the National Technical University of Athens (NTUA), and completed his undergraduate studies in Applied Mathematical and Physical Sciences at the NTUA. His current research interests include data science, data engineering, and fairness/explainability of ML algorithms.