Notebooks

Ethical and Political Issues in Data Mining, Especially Unfairness in Automated Decision Making

03 Aug 2024 21:50

Attention conservation notice: I'm not an active researcher in this area, but many of my friends and colleagues are, I sit on thesis committees, etc., and so my recommended readings below are, no doubt, more CMU-centric than an impartial survey of the literature would warrant. I wouldn't bother to mention this, except that some readers appear to be confused between "a personal notebook I put online in case others might find it useful" and "a reference work which makes claims to authority".

I won't be explaining data mining here. But I will say that I think "ethical and political issues in data mining" is a much more accurate and reasonable name for what people are really worried about than "algorithmic fairness". This opinion is partly because I don't think algorithms are really at the core of a lot of the justified and widely-shared concerns. The formal notions of "algorithmic fairness" could also be applied to human decision makers. (It would be very interesting to see whether, say, unaided human loan officers are closer to, or further from, false positive parity than credit-scoring algorithms; maybe someone's done this experiment.) Indeed, if those formal notions are good ones, we probably ought to be applying them to human decision-makers. That doesn't mean those who design automated decision-making systems shouldn't pay attention, but it does tell me that the real issue here isn't the use of algorithms.
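To make the point about human decision makers concrete, here is a minimal sketch of how one might check false positive parity. Nothing in it cares whether the decisions came from a person or a program. The function names and the toy data are hypothetical, made up purely for illustration; I assume decisions and outcomes are coded 0/1, with 1 meaning "flagged" and "actually positive" respectively.

```python
from collections import defaultdict


def false_positive_rates(decisions, outcomes, groups):
    """Per-group false positive rate: the share of actual negatives
    (outcome == 0) that the decision maker flagged (decision == 1)."""
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for d, y, g in zip(decisions, outcomes, groups):
        if y == 0:
            negatives[g] += 1
            if d == 1:
                false_pos[g] += 1
    # Groups with no actual negatives are omitted (their rate is undefined).
    return {g: false_pos[g] / n for g, n in negatives.items()}


def parity_gap(rates):
    """Largest difference in false positive rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: ten cases, two groups, two decision makers.
outcomes = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
groups = ["a"] * 5 + ["b"] * 5
human = [1, 0, 0, 0, 1, 1, 1, 0, 0, 1]  # e.g. a loan officer's calls
model = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]  # e.g. a credit score's calls

human_rates = false_positive_rates(human, outcomes, groups)
model_rates = false_positive_rates(model, outcomes, groups)

print("human:", human_rates, "gap:", parity_gap(human_rates))
print("model:", model_rates, "gap:", parity_gap(model_rates))
```

The same two functions score both columns of decisions, which is the whole point: the criterion is a property of a set of decisions and outcomes, not of whatever produced them. Whether the toy numbers resemble any real loan office is, of course, not something this sketch can tell you.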

Or, again: it's (probably!) a fact that in contemporary English, the word "doctor" is more likely to refer to a man than to a woman, and vice versa for "nurse". If a text-mining model picks up this actual correlation and uses it (for instance in an analogy-completion task), it is accurately reflecting facts about how English is used in our society. It seems obvious to me that those facts are explained by untold generations of sexism. Whether and when we want language models to exploit such facts would seem to depend on the uses we're putting those algorithms to, as well as on contested ethical and political choices about what kind of world we'd like to see. (There are, after all, plenty of people who approve of a world where doctors are more likely to be men and nurses women.) It would also seem to require sociological knowledge, or at least theories, about how modifying the output of text-mining systems might, or might not, contribute to changing society. If the combination of political and ethical contestation with reliance on necessarily-speculative theories about the remote, cumulative impacts of technical choices on social structure seems like a recipe for disputes, well, you wouldn't be wrong. I wouldn't even blame you for wanting to ignore the issue and get back to making the damn things work. But the issue will not ignore you.

(I also dislike talk of "regulating artificial intelligence", not least because artificial intelligence, in the sense people like to think of it, "is the technology of the future, and always will be".)

Do my homework for me: A lot of the work in this area is done by people who more or less presuppose secular, egalitarian values, with some variation in how they feel about liberalism vs. socialism. As a secular, egalitarian liberal socialist, I share these values, but this is also a very narrow range of opinion. Is there no serious work being done by conservatives? Is there no work on algorithmic fairness informed by Catholic social teaching, or by Islamic law? No neo-Confucians? If anyone could send me pointers, I'd appreciate it.

A straightforward, if labor-intensive, project in the sociology of science / science-and-technology-studies: Go over the first, say, five years of conference proceedings in algorithmic fairness. Grab the CVs of all the contributors. How many of them had received any formal training in ethics, political theory, or even any social science? (I have a guess!) Now apply Abbott on the "system of professions", and in particular on claims of jurisdiction by would-be professions. (To be clear, I have no formal training in any of those areas.)

A personal point of incredulity: Randomized decision-making algorithms as a way of achieving fairness. I understand the technical reasons why people write papers about these, but I just can't swallow it. The line I used to use at thesis defenses was to imagine that your brother's case is being decided by such a procedure, and the judge / loan officer / etc. is rolling the dice right in front of you --- would you really feel your brother had been fairly treated? (I no longer use this line at thesis defenses because (a) everyone at CMU's heard it too many times, and (b) it's not fair to take this out on graduate students who are just going along with the literature.)

