Does a Fair Model Produce Fair Explanations? Relating Distributive and Procedural Fairness

Date

2024-01-03

Starting Page

6868

Abstract

We consider interactions between fairness and explanations in neural networks. Fair machine learning aims to achieve equitable allocation of resources (distributive fairness) by balancing accuracy and error rates across protected groups or among similar individuals. Methods shown to improve distributive fairness can induce different model behavior between majority and minority groups. This divergence in behavior can be perceived as disparate treatment, undermining acceptance of the system. In this paper, we use feature attribution methods to measure the average explanations for a protected group, and show that explanations can differ between groups even when the model is fair. We prove a surprising relationship between explanations (via feature attribution) and fairness (in a regression setting), demonstrating that, under moderate assumptions, there are circumstances in which controlling one can influence the other. We then study this relationship experimentally by designing a novel loss term for explanations called GroupWise Attribution Divergence (GWAD) and comparing its effects with those of an existing family of loss terms for (distributive) fairness. We show that controlling explanation loss tends to preserve accuracy. We also find empirically that controlling distributive fairness loss tends to reduce explanation loss, even though it is not guaranteed to do so theoretically. Including both loss terms yields additive improvements. We conclude by considering the implications for trust and policy of reasoning about fairness as manipulations of explanations.
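The idea of comparing average feature attributions across groups can be illustrated with a minimal sketch. It is not the paper's GWAD definition, which the abstract does not specify. The sketch assumes a linear regression model with zero bias, gradient-times-input attributions, equal mean predictions as a simplistic stand-in for group fairness, and an L1 distance between group means. All function names and data are hypothetical.

```python
from statistics import fmean

def predict(weights, x):
    """Linear model f(x) = w . x (bias assumed to be zero for this sketch)."""
    return sum(w * xi for w, xi in zip(weights, x))

def linear_attributions(weights, x):
    """Gradient-times-input attribution: feature j receives w_j * x_j."""
    return [w * xi for w, xi in zip(weights, x)]

def group_mean_attribution(weights, rows):
    """Average the per-feature attributions over all rows of one group."""
    attrs = [linear_attributions(weights, x) for x in rows]
    return [fmean(col) for col in zip(*attrs)]

def attribution_gap(weights, rows, groups):
    """L1 distance between the mean attribution vectors of groups 0 and 1.

    This is an illustrative divergence chosen for the sketch, not GWAD itself.
    """
    g0 = [x for x, g in zip(rows, groups) if g == 0]
    g1 = [x for x, g in zip(rows, groups) if g == 1]
    m0 = group_mean_attribution(weights, g0)
    m1 = group_mean_attribution(weights, g1)
    return sum(abs(a - b) for a, b in zip(m0, m1))

# Hypothetical toy data: two features, two rows per group.
weights = [2.0, -1.0]
rows = [[1.0, 0.0], [3.0, 2.0], [2.0, 2.0], [4.0, 4.0]]
groups = [0, 0, 1, 1]

mean_pred_g0 = fmean(predict(weights, x) for x, g in zip(rows, groups) if g == 0)
mean_pred_g1 = fmean(predict(weights, x) for x, g in zip(rows, groups) if g == 1)
gap = attribution_gap(weights, rows, groups)
```

In this toy data both groups receive the same mean prediction, yet their mean attribution vectors differ, so the gap is nonzero. This mirrors, in a very simplified way, the abstract's observation that explanations can diverge across groups even when outcomes are balanced.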

Keywords

Artificial Intelligence and Digital Discrimination, explainable ai, fairness, interpretability, neural networks

Extent

10 pages

Related To

Proceedings of the 57th Hawaii International Conference on System Sciences

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International
