Course Announcement: "Statistics of Inequality and Discrimination" (36-313)
Attention conservation notice: Advertisement for a course you won't take, at a university you don't attend, in which very human and passionately contentious topics deliberately have all the life sucked from them, leaving only the husk of abstractions and the dry bones of methodology.
In the fall I will, again, be teaching my class on inequality:
36-313, Statistics of Inequality and Discrimination
9 units
Time and place: Tuesdays and Thursdays, 1:25 -- 2:45 pm, in Wean Hall (WEH) 6403 (tentatively)
Description: Many social questions about inequality, injustice and unfairness are, in part, questions about evidence, data, and statistics. This class lays out the statistical methods which let us answer questions like "Does this employer discriminate against members of that group?", "Is this standardized test biased against that group?", "Is this decision-making algorithm biased, and what does that even mean?" and "Did this policy which was supposed to reduce this inequality actually help?" We will also look at inequality within groups, and at different ideas about how to explain inequalities between groups. The class will interweave discussion of concrete social issues with the relevant statistical concepts.
Prerequisites: 36-202 ("Methods for Statistics and Data Science") (and so also 36-200, "Reasoning with Data"), or similar with permission of the instructor
Last year was the first time I got to teach it, and it was a mixed
experience. The students who stuck with it were, gratifyingly, uniformly very
happy with it (and I am pretty sure they learned a lot!). But it also had the
biggest "melt" of any class I've taught, with fully half of those who initially
signed up for it eventually dropping it. The most consistent reason why --- at
least, the one they felt comfortable telling me! --- was that they were
expecting something with a lot more arguing about politics, and a lot less math
and data analysis. I have taken this feedback to heart, and decided to do
even more math and data analysis.
Tentative topic schedule
Slightly more than one week per. A more detailed listing, with related readings, can be
found on the class homepage.
- "Recall": Reminders about probability and statistics: populations, distribution
within a population, distribution functions, joint and conditional probability;
samples and inference from samples.
- Income and wealth inequality: What does the distribution of income
and wealth look like within a population? How do we describe population
distributions, especially when there is an extreme range of values (a big
difference between the rich and poor)? Where does the idea of "the 1%"
wealthy elite come from? How has income inequality changed over recent
decades?
Statistical tools: measures of central tendency (median, mode, mean),
of dispersion (standard deviation etc.), and of skew; measures of concentration and inequality (ratios between percentiles, the Lorenz curve, Gini coefficient); the concept of "heavy tails" (the largest
values being orders of magnitude larger than typical values); log-normal
and power law distributions; fitting distributions to existing data;
positive feedback, multiplicative growth and "cumulative advantage" processes.
- Speed-run through social and economic stratification: Reminders (?) about social concepts:
ascriptive and attained social statuses, and qualitative/categorical vs. more-or-less dimensions of differentiation. Important forms of differentiation, including (but not necessarily limited to): sex, gender, income, wealth, consumption, caste, race, ethnicity, citizenship, class, order, education. The legal notion of "protected categories".
- Income disparities: How does income (and wealth) differ across groups? How do we compare average or typical values? How do we compare entire
distributions? How have income inequalities by race and sex changed over recent decades?
Statistical tools: permutation tests for differences in mean (and other
measures of the average); two-sample tests for differences in distribution;
bootstrapping;
inverting tests to find the range of differences compatible with the data;
the "analysis of variance" method of comparing populations;
the "relative distribution" method of comparing populations
- Explaining, or explaining away, inequality: To what
extent can differences in outcomes between groups be explained by differences
in their attributes (e.g., explaining differences in incomes by differences
in marketable skills)? How should we go about making such adjustments? Is
it appropriate to treat discrimination as the "residual" left unexplained?
When does adjusting or controlling for a variable contribute to an
explanation, and when is it "explaining away" discrimination? What would it
mean to control for race, sex or gender?
Statistical tools: Observational causal inference; using
regression to "control for" multiple variables at once, with both linear
models and nonparametrically (by means of matching or nearest-neighbors);
using graphical models to represent causal relations between variables; how
to use graphical models to decide what should and what should not be
controlled for; the causal model implicit in decisions about controls.
- Detecting discrimination in hiring, admissions, etc.: Do employers discriminate in
hiring (or schools in admission, etc.)? How can we tell? When are
differences in hiring rates evidence for discrimination? How do statistical
perspectives on this question line up with legal criteria for "disparate
treatment" and "disparate impact"?
Statistical tools: tests for differences in proportions or
probabilities; adjusting for applicant characteristics (again)
- Inequalities in health, disease and mortality: Quantifying differences in the incidence of diseases, in death rates, and in life expectancy. The "deaths of despair" controversy.
Statistical tools: differences in proportions and probabilities again; survival analysis and survival curves; some of the elements of demography.
- Mobility and Transmission of Inequality: What does it mean to talk about social mobility? Conversely, what does it mean to say inequality can be transmitted from one generation to the next? What are the mechanisms this happens through? What are the large-scale patterns about mobility and transmission, over the last few decades?
Statistical tools: correlations; conditional probability modeling;
Markov models.
- Measuring segregation: What do we mean by "segregation"? Segregation in law ("de jure") and
segregation in fact ("de facto"). Different ways of measuring de facto
segregation. Trends in de facto racial segregation since the end of de jure
racial segregation. Why different measures of segregation give different
results. Segregation by income. Segregation by political partisanship.
Consequences of segregation. Inter-generational transmission again.
Statistical tools: Standard measures of segregation; more
recent measures of segregation based on information theory; spatial correlation; how do we make adjustments for changing distributions?
- Algorithmic bias and/or fairness: Can predictive or decision-making algorithms be
biased? What would that even mean? Do algorithms trained on existing data
necessarily inherit the biases of the world? What notions of fairness or
unbiasedness can we actually implement for algorithms? What trade-offs are
involved in enforcing different notions of fairness? Are "risk-prediction
instruments" fair?
Statistical tools: Methods for evaluating the accuracy of predictions;
differential error rates across groups; decision trees; optimization and multi-objective
optimization.
- Standardized tests: Are standardized tests for school
admission biased against certain racial groups? What does it mean to
measure qualifications, and how would we know whether tests really are
measuring qualifications? What does it mean for a measurement to be biased?
When do differences across groups indicate biases? (Disparate impact
again.) Why correlating outcomes with test scores among admitted
students may not make sense. The "compared to what?" question.
Statistical tools: Predictive validity; differential
prediction; "conditioning on a collider"
- Intelligence tests: Are intelligence tests biased? How do
we measure latent attributes? How do we know the latent attributes even
exist? What would it mean for there to be such a thing as "general
intelligence", that could be measured by tests? What, if anything, do
intelligence tests measure? What do rising intelligence test results (the
Flynn Effect) tell us?
Statistical tools: correlation between test scores; factor
models as an explanation of correlations; estimating factor values from
tests; measurement invariance; alternatives to factor models; item
response theory
- Measuring attitude and prejudice: How do we measure people's feelings about different groups? Why do different measures give different results? Do "implicit association tests" measure
unconscious biases? What, if anything, do implicit
association tests measure?
Statistical tools: More on measurement; the distinction between
reliability and validity; why it's much easier to quantify reliability than
validity; approaches to "construct validity".
- Evaluating inequality-reducing interventions: If we try to do something to reduce inequality, how do we know whether or not it worked? How do we design a good study of an intervention? How do we pool information from
multiple studies? What can we do if only bad studies are available? Do implicit bias interventions
change behavior? Does having a chief diversity officer increase faculty
diversity? What does, in fact, seem to work?
Statistical tools: Design and analysis of studies; experimental design: selecting measurements
of outcomes, and the importance of randomized studies; meta-analytic
methods for combining information
- Policing and crime: When do differences in traffic stops, arrests, or police-caused deaths
indicate discrimination? How do we know how many traffic stops, arrests and
police-caused deaths there are to begin with? Does "profiling" or "statistical
discrimination" make sense for the police, whether or not it's socially
desirable? How can the same group be simultaneously
over- and under- policed?
Statistical tools: tests for differences in proportions; signal
detection theory; adjusting for systematically missing data; self-reinforcing equilibria
- Self-organizing inequalities and "structural" or "systematic"
inequalities: Models of how inequalities can perpetuate themselves
even when nobody is biased. Models of how inequalities can appear
even when nobody is biased. The Schelling model of spatial segregation as a
"paradigm". How relevant are Schelling-type models to actual, present-day
inequalities?
Statistical tools: Agent-based models; models of social
learning and game theory.
- Statistics and its history: The development of statistics
in the 19th and early 20th centuries was intimately tied to the eugenics
movement, which was deeply racist and even more deeply classist (but also
often anti-sexist). The last part of the course will cover this history, and
explain how many of the intellectual tools we have gone over to document, and
perhaps to help combat, inequality and discrimination were invented by people
who wanted to use them for quite different purposes. The twin learning
objectives for this section are for students to grasp something of this
history, and to grasp why the "genetic fallacy", of judging ideas by where
they come from (their "genesis") is, indeed, foolish and wrong.
Statistical tools: N/A.
- How do we know what we do about inequalities?
Social data-collection systems and institutions. Measurement again, and
measurement as a social process. Difficulties in reducing social reality to
data; the case of race in the US census as an example. What systematic data
collection leaves out.
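To give a flavor of two of the tools named in the schedule above (the Gini coefficient, and a permutation test for a difference in means), here is a minimal sketch in plain Python. It is an illustration only, not course material: the function names `gini` and `permutation_test_mean_diff`, the default of 10,000 shuffles, and the incomes in the example are all made up for this purpose.

```python
import random
from statistics import fmean


def gini(values):
    """Gini coefficient of non-negative values.

    0 means everyone has the same amount; the maximum, (n - 1) / n,
    is reached when a single unit holds everything.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted form of the Gini coefficient for a finite sample.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def permutation_test_mean_diff(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference, approximate p-value). Group labels are
    shuffled at random, so the p-value is a Monte Carlo estimate.
    """
    rng = random.Random(seed)
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        # Small tolerance so ties with the observed value are counted.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction: the observed labeling counts as one permutation.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up incomes (in thousands), purely for illustration; not real data.
    group_a = [22, 25, 31, 38, 45, 52, 70, 250]
    group_b = [18, 21, 24, 30, 33, 41, 48, 95]
    print("Gini, pooled:", round(gini(group_a + group_b), 3))
    diff, p = permutation_test_mean_diff(group_a, group_b)
    print("difference in means:", diff, "permutation p-value:", round(p, 3))
```

The add-one correction in the p-value keeps it from ever being exactly zero, which is a common convention for Monte Carlo permutation tests; other conventions exist, and this sketch does not claim to be the one used in class.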
Evaluation
There will be one problem set per week; each of these homeworks will involve
some combination of (very basic) statistical theory, (possibly less basic)
calculations using the theory we've gone over, and analysis of real data sets
using the methods discussed in class. There will also be readings for each
class session, and a short-answer quiz after each session will combine
questions based on lecture content with questions based on the readings.
There will be no exams.
My usual policy is to drop a certain number of homeworks, and a certain
number of lecture/reading questions, no questions asked. The number of
automatic drops isn't something I'll commit to here and now (similarly, I won't
make any promises here about the relative weight of homework
vs. lecture-related questions).
Textbook, Lecture Notes
There is, unfortunately, no one textbook which covers the material we'll go
over at the required level. You will, instead, get very detailed lecture notes
after each lecture. There will also be a lot of readings from various books
and articles. (I will not agree with every reading I assign.)
Teaching: Statistics of Inequality and Discrimination;
Corrupting the Young;
Enigmas of Chance;
Commit a Social Science
Posted at June 21, 2022 13:45