Does a Fair Model Produce Fair Explanations? Relating Distributive and Procedural Fairness

dc.contributor.author: Yang, Yiwei
dc.contributor.author: Howe, Bill
dc.date.accessioned: 2023-12-26T18:51:37Z
dc.date.available: 2023-12-26T18:51:37Z
dc.date.issued: 2024-01-03
dc.identifier.doi: 10.24251/HICSS.2024.823
dc.identifier.isbn: 978-0-9981331-7-1
dc.identifier.other: a9dbcd07-bd49-4e76-bbbf-2d522524fea2
dc.identifier.uri: https://hdl.handle.net/10125/107209
dc.language.iso: eng
dc.relation.ispartof: Proceedings of the 57th Hawaii International Conference on System Sciences
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Artificial Intelligence and Digital Discrimination
dc.subject: explainable ai
dc.subject: fairness
dc.subject: interpretability
dc.subject: neural networks
dc.title: Does a Fair Model Produce Fair Explanations? Relating Distributive and Procedural Fairness
dc.type: Conference Paper
dc.type.dcmi: Text
dcterms.abstract: We consider interactions between fairness and explanations in neural networks. Fair machine learning aims to achieve equitable allocation of resources (distributive fairness) by balancing accuracy and error rates across protected groups or among similar individuals. Methods shown to improve distributive fairness can induce different model behavior between majority and minority groups. This divergence in behavior can be perceived as disparate treatment, undermining acceptance of the system. In this paper, we use feature attribution methods to measure the average explanations for a protected group, and show that differences can occur even when the model is fair. We prove a surprising relationship between explanations (via feature attribution) and fairness (in a regression setting), demonstrating that under moderate assumptions, there are circumstances in which controlling one can influence the other. We then study this relationship experimentally by designing a novel loss term for explanations called GroupWise Attribution Divergence (GWAD) and comparing its effects with an existing family of loss terms for (distributive) fairness. We show that controlling explanation loss tends to preserve accuracy. We also find that, empirically, controlling distributive fairness loss tends to reduce explanation loss as well, even though it is not guaranteed to do so theoretically. We further show that including both loss terms yields additive improvements. We conclude by considering the implications for trust and policy of reasoning about fairness as manipulations of explanations.
dcterms.extent: 10 pages
prism.startingpage: 6868
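
The abstract above names a GroupWise Attribution Divergence (GWAD) loss but does not define how it is computed in this record. As a hedged illustration only, and not the authors' definition, the sketch below shows one way a groupwise attribution divergence could be formed: average a feature-attribution map over each protected group and penalize the distance between the group means. The choice of input-gradient attributions, the binary group mask, and the L2 distance are all assumptions made for this sketch.

# Hypothetical sketch of a groupwise attribution divergence loss term.
# Assumptions (not from the paper): input-gradient attributions, two
# protected groups given by a boolean mask, and an L2 distance between
# the groups' mean attribution vectors.
import torch

def input_gradient_attributions(model, x):
    # Per-feature attributions as the gradient of the model output
    # with respect to the input features.
    x = x.clone().requires_grad_(True)
    out = model(x).sum()
    grads, = torch.autograd.grad(out, x, create_graph=True)
    return grads  # shape: (batch, num_features)

def groupwise_attribution_divergence(model, x, group_mask):
    # Distance between the mean attribution vectors of the two groups.
    attr = input_gradient_attributions(model, x)
    mean_a = attr[group_mask].mean(dim=0)
    mean_b = attr[~group_mask].mean(dim=0)
    return torch.norm(mean_a - mean_b, p=2)

# In training, such a term would be added to the task loss with some
# weight, e.g.:
#   loss = task_loss + lambda_gwad * groupwise_attribution_divergence(model, x, g)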

Files

Original bundle
Name: 0672.pdf
Size: 847.64 KB
Format: Adobe Portable Document Format