Notebooks

## Ethical and Political Issues in Data Mining, Especially Unfairness in Automated Decision Making

31 Aug 2022 17:58

I won't be explaining data mining here. But I will say that I think "ethical and political issues in data mining" is a much more accurate and reasonable name for what people are really worried about than "algorithmic fairness". I hold this opinion partly because I don't think algorithms are really at the core of a lot of the justified and widely-shared concerns. The formal notions of "algorithmic fairness" could also be applied to human decision-makers. (It would be very interesting to see whether, say, unaided human loan officers are closer to, or further from, false positive parity than credit-scoring algorithms; maybe someone's done this experiment.) Indeed, if those formal notions are good ones, we probably ought to be applying them to human decision-makers. That doesn't mean those who design automated decision-making systems shouldn't pay attention, but it does tell me that the real issue here isn't the use of algorithms.
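
These formal notions are easy to operationalize for any source of decisions. Here is a minimal sketch, with invented column names and data, of computing per-group false positive rates from a record of decisions and outcomes; nothing in it cares whether the decisions came from a loan officer or a scoring algorithm.

```python
import pandas as pd

def false_positive_rates(df, group_col, decision_col, outcome_col):
    """False positive rate per group: P(decision = 1 | outcome = 0, group)."""
    actual_negatives = df[df[outcome_col] == 0]
    return actual_negatives.groupby(group_col)[decision_col].mean()

# Invented illustration: flagged = 1 means predicted high-risk,
# defaulted = 1 means the borrower actually defaulted.
loans = pd.DataFrame({
    "group":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "flagged":   [1, 0, 0, 1, 1, 1, 0, 1],
    "defaulted": [0, 0, 1, 1, 0, 0, 1, 1],
})
fpr = false_positive_rates(loans, "group", "flagged", "defaulted")
print(fpr)                    # per-group false positive rates
print(fpr.max() - fpr.min())  # gap; exactly 0 would be false positive parity
```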

Or, again: it's (probably!) a fact that in contemporary English, the word "doctor" is more likely to refer to a man than to a woman, and vice versa for "nurse". If a text-mining model picks up this actual correlation and uses it (for instance in an analogy-completion task), it is accurately reflecting facts about how English is used in our society. It seems obvious to me that those facts are explained by untold generations of sexism. Whether and when we want language models to exploit such facts would seem to depend on the uses we're putting those algorithms to, as well as on contested ethical and political choices about what kind of world we'd like to see. (There are, after all, plenty of people who approve of a world where doctors are more likely to be men and nurses women.) It would also seem to require sociological knowledge, or at least theories, about how modifying the output of text-mining systems might, or might not, contribute to changing society. If the combination of political and ethical contestation with reliance on necessarily-speculative theories about the remote, cumulative impacts of technical choices on social structure seems like a recipe for disputes, well, you wouldn't be wrong. I wouldn't even blame you for wanting to ignore the issue and get back to making the damn things work. But the issue will not ignore you.
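
For concreteness, here is a toy sketch of the analogy-completion task, using the standard vector-offset method. The two-dimensional "embeddings" below are invented for the example; a real system learns hundreds of dimensions from corpus co-occurrence statistics, which is exactly how the regularity described above gets in.

```python
import numpy as np

# Invented 2-d vectors, rigged so that one axis tracks a gendered
# usage pattern, as real corpus-trained embeddings do.
vecs = {
    "man":      np.array([ 1.0, 0.2]),
    "woman":    np.array([-1.0, 0.2]),
    "doctor":   np.array([ 0.8, 1.0]),
    "nurse":    np.array([-0.8, 1.0]),
    "engineer": np.array([ 0.9, 0.9]),
    "teacher":  np.array([-0.5, 0.8]),
}

def analogy(a, b, c, vocab):
    """Complete "a is to b as c is to ?" by vector arithmetic."""
    target = vocab[b] - vocab[a] + vocab[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("man", "doctor", "woman", vecs))  # prints "nurse"
```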

(I also dislike talk of "regulating artificial intelligence", not least because artificial intelligence, in the sense people like to think of it, "is the technology of the future, and always will be".)

Do my homework for me: A lot of the work in this area is done by people who more or less presuppose secular, egalitarian values, with some variation in how they feel about liberalism. As a secular, egalitarian liberal socialist, I share these values, but this is also a very narrow range of opinion. Is there no serious work being done by conservatives? Is there no work on algorithmic fairness informed by Catholic teaching, or by Islamic law? No neo-Confucians? If anyone could send me pointers, I'd appreciate it.

Disclaimer: I'm not an active researcher in this area, but many of my friends and colleagues are, I sit on thesis committees, etc., and so my recommended readings below are, no doubt, more CMU-centric than an impartial survey of the literature would warrant. I wouldn't bother to mention this, except that some readers appear to confuse "a personal notebook I put online in case others might find it useful" with "a reference work which makes claims to authority".

Recommended, close-ups (very misc. for such a huge topic):
• danah boyd and Kate Crawford, "Six Provocations for Big Data" (2011) [ssrn/1926431]
• Henry Farrell and Marion Fourcade, "The Moral Economy of High Tech Modernism", Daedalus forthcoming [PDF preprint via Prof. Fourcade]
• Henry Farrell, Abraham Newman and Jeremy Wallace, "Spirals of Delusion: How AI Distorts Decision-Making and Makes Dictators More Dangerous", Foreign Affairs September-October 2022
• Alison Gopnik, "What AI Still Doesn't Know How to Do; Artificial intelligence programs that learn to write and speak can sound almost human—but they can't think creatively like a small child can", Wall Street Journal 15 July 2022 [What is more interesting here than the headline stuff is the suggestion that the right way to think about large language models is as an information-retrieval technology, a way of interacting with the corpus of texts fed into it --- for better or worse...]
• Cynthia Rudin, "Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead", arxiv:1811.10154
• Zeynep Tufekci, "Engineering the Public: Big Data, Surveillance, and Computational Politics", First Monday 19:7 (2014)
Recommended, close-ups, algorithmic fairness etc.:
• Ali Alkhatib and Michael Bernstein, "Street-Level Algorithms: A Theory at the Gaps Between Policy and Decisions", paper 530 in CHI Conference on Human Factors in Computing Systems Proceedings [CHI 2019] [PDF reprint via the Stanford HCI group. My comments.]
• Richard A. Berk and Ayya A. Elzarka, "Almost Politically Acceptable Criminal Justice Risk Assessment", Criminology and Public Policy 19 (2020): 1231--1257, arxiv:1910.11410
• Alexandra Chouldechova, "Fair prediction with disparate impact: A study of bias in recidivism prediction instruments", arxiv:1610.07524
• Amanda Coston, Neel Guha, Derek Ouyang, Lisa Lu, Alexandra Chouldechova, Daniel E. Ho, "Leveraging Administrative Data for Bias Audits: Assessing Disparate Coverage with Mobility Data for COVID-19 Policy", arxiv:2011.07194 [I am convinced by their points about differential measurement error across groups, but equally struck by this: "the estimates at the individual polling place location level are quite noisy: root mean squared error is 1375 voters". This seems excessively imprecise to base any decisions on!]
• Amanda Coston, Alan Mishler, Edward H. Kennedy, Alexandra Chouldechova, "Counterfactual Risk Assessments, Evaluation, and Fairness", FAT* '20, arxiv:1909.00066
• Kate Crawford, "The Hidden Biases in Big Data", Harvard Business Review 1 April 2013 [but serious, despite the date!]
• Simon DeDeo, "Wrong side of the tracks: Big Data and Protected Categories", pp. 31--42 in Cassidy R. Sugimoto, Hamid R. Ekbia and Michael Mattioli, Big Data Is Not a Monolith (MIT Press, 2016), arxiv:1412.4643 [This is the idea that, when Simon and I were batting it around, we called "prediction without racism". Basically: you don't want to be racist (or sexist, etc.), so obviously you don't directly base your predictions/decisions on race. But there are lots of other innocuous-seeming features which might be relevant to what you're trying to predict, or what course of action you should recommend, but are also really correlated, especially in bulk, with race. (For instance, your race and sex can be predicted with reasonable accuracy from the websites you visit.) So how can we use the features without just slipping in the racism through the back door? Simon's very ingenious solution was to use information theory to find the distribution which is closest to the real distribution of the data, but where the variable we're trying to predict is independent of the protected variable(s). This sets up an optimization problem which can actually be solved in closed form, and basically tells you how much you have to re-weight each data point in your model fitting. It's a really clever idea, and I wish it were more widely used. (A minimal code sketch of the reweighting appears after this list.)]
• Julia Dressel and Hany Farid, "The accuracy, fairness, and limits of predicting recidivism", Science Advances 4 (2018): eaao5580 [Demonstrating that you can reproduce the error rates of the proprietary COMPAS score (at least on one data set...) using a logistic regression on age and number of priors. This doesn't surprise me, because (before reading this paper!) I'd set that as an exercise in my undergraduate data mining class. The Kids also convinced me that very small classification trees, using those two features, do only very slightly worse. (The optimal tree for predicting violent recidivism has just four leaves.) Now, this doesn't necessarily mean that algorithmic risk prediction tools are a bad idea --- we don't have error rates for judges! --- but it does blow up the justification for using complex, proprietary models. (How proprietary models can possibly have any place in a supposedly adversarial legal system, I cannot understand.) (A code sketch of this exercise appears after this list.)]
• Michael Feldman, Sorelle Friedler, John Moeller, Carlos Scheidegger and Suresh Venkatasubramanian, "Certifying and removing disparate impact", arxiv:1412.3756
• Ira Globus-Harris, Michael Kearns, Aaron Roth, "An Algorithmic Framework for Bias Bounties", arxiv:2201.10408 [The basic idea here is (appropriately!) telegraphed by the abstract. If we're using a model $f$ to make predictions, and someone or something can point to a (measurable) group of cases $g$ where another model $h$ does better by the agreed-upon loss function, switch to a new model, which follows $h$ on group $g$, and otherwise still follows $f$. There is more to it than that, because groups might overlap and so they introduce some machinery to try to keep overlapping patches from interfering with each other, and there are interesting learning-theoretic aspects to making sure that we're not data-mining in the bad sense, i.e., over-fitting accidents of the sample data. But this idea --- when we find a group where another model does better, use that model instead on that group --- is the core (sketched in code after this list). It's a very good paper and I will certainly teach it going forward, but there are some limitations which I wish they'd addressed. (1) At the basic level of procedural fairness (i.e., broadly-liberal ideas of justice), this is a recipe for treating different groups according to different criteria. This is the literal definition of "privilege" (or at least its etymology); it might nonetheless be ethically acceptable for liberals, but there's a tension there which needs at least to be explored. (2) Relatedly, I am not a lawyer, but because this constructs a patchwork of different rules and criteria for different groups, it'd seem very easy to attack this under American anti-discrimination law: these are algorithms for coding disparate treatment into the ultimate model! (3) The algorithms only care about reducing expected loss / "risk" conditional on group membership. They do not try to equalize anything. It is entirely possible that the minimum possible conditional risk for different demographic groups is just different. But this would mean that the predictor which minimizes the conditional risk for every group might violate demographic parity, error rate parity, etc., indeed all the usual notions of algorithmic fairness. So we seem to be back to the trade-offs which the paper sought to escape at its beginning: "Be as accurate as possible for everyone" is, at least potentially, in tension with "Be equally accurate for everyone". My impulse, at least at the time I write this, is to say that it isn't really fair to use a model which is deliberately less accurate than possible for some groups, just because it's equally accurate for all groups. (That is, my gut sides with this paper.) But the contrary position, that equal accuracy across groups (in some form) is what justice and/or political prudence demands, isn't self-evidently absurd. At the least there needs to be an ethical and/or political argument here. (4) Since I happened to re-read chapter 2 of Kearns and Roth's The Ethical Algorithm, prior to teaching it, right after reading this paper, I'd point out that my (1)--(3) are basically all issues they raise in that chapter, which makes it a bit weirder that they're not handled here...]
• Sara Hooker, "Moving beyond 'algorithmic bias is a data problem'", Patterns 2 (2021): 100241 [Commentary]
• Abigail Z. Jacobs, Hanna Wallach, "Measurement and Fairness", arxiv:1912.05511
• Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, Aram Galstyan, "A Survey on Bias and Fairness in Machine Learning", arxiv:1908.09635
• Alan Mishler, Auditing and Achieving Counterfactual Fairness [Ph.D. thesis, CMU Statistics Dept., 2021]
• Arvind Narayanan, "Translation tutorial: 21 fairness definitions and their politics" [PDF]
• Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, Kilian Q. Weinberger, "On Fairness and Calibration", arxiv:1709.02012
• Sonja B. Starr, "Evidence-Based Sentencing and the Scientific Rationalization of Discrimination", Stanford Law Review 66 (2014): 803--872 [The strongest part of this, to my mind, is the causal-inference critique: predicting the risk that someone will re-offend within $k$ years, under current conditions, is not at all the same as predicting the risk of their committing another crime as a function of the sentence they receive. I am also very sympathetic to the points about the very modest predictive power of the existing algorithms, the possibility of great unmeasured heterogeneity within groups, and the ethical dubiousness of punishing someone more because of demographic groups they belong to. About the legal-constitutional issues I'm not fit to comment. One point to which Starr doesn't, I think, give enough weight is that even if risk-prediction formulas aren't any fairer or more accurate than what judges do now, they are however more explicit and public, and so both more subject to democratic control and to improvement over time. (Comment written in 2014.)]
• Megha Srivastava, Hoda Heidari, Andreas Krause, "Mathematical Notions vs. Human Perception of Fairness: A Descriptive Approach to Fairness for Machine Learning", arxiv:1902.04783 [The headline is that the simplest possible notion of fairness, namely "demographic parity" (equal rates of positive decisions across groups) best captures lay people's notions of "fairness".]
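
As promised above, a minimal sketch of the reweighting idea in DeDeo's paper, as I've summarized it: weight each data point by $p(y)p(z)/p(y,z)$, so that under the reweighted distribution the predicted variable $y$ is independent of the protected attribute $z$, while everything else is disturbed as little as possible. This sketch assumes discrete $y$ and $z$ and plugs in empirical frequencies; see the paper for the actual derivation.

```python
import numpy as np
import pandas as pd

def independence_weights(y, z):
    """Weight each record by p(y) p(z) / p(y, z), with all probabilities
    estimated as empirical frequencies (y and z assumed discrete)."""
    df = pd.DataFrame({"y": y, "z": z})
    joint = df.value_counts(normalize=True)     # estimates p(y, z)
    p_y = df["y"].value_counts(normalize=True)  # estimates p(y)
    p_z = df["z"].value_counts(normalize=True)  # estimates p(z)
    return np.array([p_y[yi] * p_z[zi] / joint[(yi, zi)]
                     for yi, zi in zip(y, z)])

# Hypothetical usage: X holds the innocuous-seeming features (which may
# proxy for z); the weights go into any fitter that accepts them, e.g.
#   w = independence_weights(y, z)
#   LogisticRegression().fit(X, y, sample_weight=w)
```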
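
Also as promised, a sketch of the Dressel-and-Farid-style exercise. The file and column names here are hypothetical stand-ins for ProPublica's published COMPAS data, but two features and four leaves really are all the model complexity involved.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical file and column names, standing in for ProPublica's
# COMPAS release (which has fields like these).
df = pd.read_csv("compas.csv")
X = df[["age", "priors_count"]]
y = df["two_year_recid"]

two_feature_logit = LogisticRegression()
four_leaf_tree = DecisionTreeClassifier(max_leaf_nodes=4)

# Cross-validated accuracy for each; the paper's finding is that
# models this simple match the proprietary score's error rates.
print(cross_val_score(two_feature_logit, X, y, cv=10).mean())
print(cross_val_score(four_leaf_tree, X, y, cv=10).mean())
```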
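
Finally, the core bias-bounty update from Globus-Harris, Kearns and Roth, stripped of the paper's machinery for overlapping groups and for guarding against over-fitting; `f`, `g` and `h` here are just hypothetical callables.

```python
def patch(f, g, h):
    """One bias-bounty round: given the current predictor f, a
    group-membership indicator g, and a challenger h already verified
    to have lower loss than f on group g, return the patched predictor."""
    def patched(x):
        return h(x) if g(x) else f(x)
    return patched

# Hypothetical usage: successful bounties are folded in one by one.
#   model = initial_model
#   for g, h in accepted_bounties:  # each verified on held-out data
#       model = patch(model, g, h)
```
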
Recommended, now-historical close-ups:
• Kling, Scherson and Allen, "Parallel Computing and Information Capitalism," in Metropolis and Rota (eds.), A New Era in Computation (1992) [A batch of UC Irvine comp. sci. professors who write like sociologists. "'Information capitalism' refers to forms of organization in which data-intensive techniques and computerization are key strategic resources for corporate production."]
• Erik Larson, The Naked Consumer: How Our Private Lives Become Public Commodities
Modesty forbids me to recommend:
• The lecture notes on algorithmic fairness from my data mining class (latest iteration)