Does a Fair Model Produce Fair Explanations? Relating Distributive and Procedural Fairness
Date
2024-01-03
Starting Page
6868
Abstract
We consider interactions between fairness and explanations in neural networks. Fair machine learning aims to achieve equitable allocation of resources --- distributive fairness --- by balancing accuracy and error rates across protected groups or among similar individuals. Methods shown to improve distributive fairness can induce different model behavior between majority and minority groups. This divergence in behavior can be perceived as disparate treatment, undermining acceptance of the system. In this paper, we use feature attribution methods to measure the average explanations for a protected group, and show that differences can occur even when the model is fair. We prove a surprising relationship between explanations (via feature attribution) and fairness (in a regression setting), demonstrating that under moderate assumptions, there are circumstances when controlling one can influence the other. We then study this relationship experimentally by designing a novel loss term for explanations called GroupWise Attribution Divergence (GWAD) and comparing its effects with an existing family of loss terms for (distributive) fairness. We show that controlling explanation loss tends to preserve accuracy. We also find that controlling distributive fairness loss empirically tends to reduce explanation loss as well, even though it is not guaranteed to do so theoretically. Including both loss terms yields additive improvements. We conclude by considering the implications for trust and policy of reasoning about fairness as manipulations of explanations.
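The abstract describes GWAD as a loss term that compares groupwise feature attributions. A minimal sketch of that idea, assuming (since the paper's exact formulation is not reproduced here) that the penalty is the distance between each group's average attribution vector:

```python
import torch

def gwad_loss(attributions: torch.Tensor, group_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of a GroupWise Attribution Divergence penalty.

    attributions: (n_samples, n_features) feature attributions for a batch
    group_mask:   (n_samples,) boolean, True for members of the protected group

    Assumption: the loss penalizes divergence between the average
    explanation of the protected group and that of the rest.
    """
    mean_protected = attributions[group_mask].mean(dim=0)
    mean_rest = attributions[~group_mask].mean(dim=0)
    # L2 distance between the two groups' average explanations;
    # zero when both groups receive the same explanation on average.
    return torch.linalg.vector_norm(mean_protected - mean_rest)
```

Added to a standard training objective alongside a distributive-fairness loss, a term like this would let gradient descent push the model toward similar average explanations across groups, matching the abstract's description of combining both loss families.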
Keywords
Artificial Intelligence and Digital Discrimination, explainable AI, fairness, interpretability, neural networks
Extent
10 pages
Related To
Proceedings of the 57th Hawaii International Conference on System Sciences
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International