Course Announcement: "Statistics of Inequality and Discrimination" (36-313)
Attention conservation notice: Advertisement for a course you won't take, at a university you don't attend.  Even if the subject is of some tangential interest, why not check back in a few months to see if the teacher has managed to get himself canceled, and/or produced anything worthwhile?
In the fall I will, again, be teaching something new:
36-313, Statistics of Inequality and Discrimination
9 units
Time and place: Tuesdays and Thursdays, 1:25 -- 2:45 pm, location TBA
Description: Many social questions about inequality, injustice and unfairness are, in part, questions about evidence, data, and statistics. This class lays out the statistical methods which let us answer questions like "Does this employer discriminate against members of that group?", "Is this standardized test biased against that group?", "Is this decision-making algorithm biased, and what does that even mean?" and "Did this policy, which was supposed to reduce this inequality, actually help?" We will also look at inequality within groups, and at different ideas about how to explain inequalities between groups. The class will interweave discussion of concrete social issues with the relevant statistical concepts.
Prerequisites: 36-202 ("Methods for Statistics and Data Science") (and so also 36-200, "Reasoning with Data")
This is a class I've been wanting to teach for some years now, and I'm very
happy to finally get the chance to ~~feel my well-intentioned but
laughably inadequate efforts crushed beneath massive and justified opprobrium
evoked from all sides~~ ~~bore and perplex some undergrads who
thought they were going to learn something interesting in stats class for a
change~~ try it out.
Tentative topic schedule
About one week per.
-  "Recall": Reminders about probability and statistics: populations, distribution
within a population, distribution functions, joint and conditional probability;
samples and inference from samples.  Reminders (?) about social concepts:
ascriptive and attained social categories; status, class, race, caste, sex,
gender, income, wealth.
 - Income and wealth inequality:  What does the distribution of income
  and wealth look like within a population?  How do we describe population
  distributions, especially when there is an extreme range of values (a big
  difference between the rich and poor)?  Where does the idea of "the 1%"
  wealthy elite come from? How has income inequality changed over recent
  decades?
Statistical tools: measures of central tendency (median, mode, mean),
    of dispersion, and of skew; the concept of "heavy tails" (the largest
    values being orders of magnitude larger than typical values); log-normal
    and power law distributions; fitting distributions to existing data;
    positive feedback, multiplicative growth and "cumulative advantage" processes.
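
As a rough illustration of how multiplicative growth can produce heavy tails, here is a minimal simulation sketch. It is not taken from the course materials: the function names, the number of people and periods, and the size of the random shocks are all assumptions chosen for illustration. Everyone starts with the same wealth, and each period every person's wealth is multiplied by an independent random factor, which gives an approximately log-normal distribution.

```python
import math
import random
import statistics


def simulate_multiplicative_growth(n_people=10_000, n_periods=50, sigma=0.2, seed=0):
    """Everyone starts with wealth 1; each period, wealth is multiplied by an
    independent random factor exp(N(0, sigma^2)). All values are illustrative."""
    rng = random.Random(seed)
    wealth = [1.0] * n_people
    for _ in range(n_periods):
        wealth = [w * math.exp(rng.gauss(0.0, sigma)) for w in wealth]
    return wealth


def top_share(values, fraction=0.01):
    """Share of the total held by the top `fraction` of the population."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


wealth = simulate_multiplicative_growth()
mean_wealth = statistics.mean(wealth)
median_wealth = statistics.median(wealth)
share_top_1pct = top_share(wealth, 0.01)

print(f"mean   = {mean_wealth:.2f}")
print(f"median = {median_wealth:.2f}")
print(f"share of total held by the top 1% = {share_top_1pct:.1%}")
```

In a heavy-tailed distribution like this one, the mean sits well above the median, and the top 1% hold far more than 1% of the total, even though nobody started out ahead.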
 -  Income disparities: How does income (and wealth) differ across groups?  How do we compare average or typical values?  How do we compare entire
distributions?  How have income inequalities by race and sex changed over recent decades?
Statistical tools: permutation tests for differences in mean (and other
    measures of the average); two-sample tests for differences in distribution;
    inverting tests to find the range of differences compatible with the data;
    the "analysis of variance" method of comparing populations;
    the "relative distribution" method of comparing populations
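
To make the idea of a permutation test for a difference in means concrete, here is a minimal sketch. The function name and the two small income samples are hypothetical, made up purely for illustration. The logic is that, if group membership were irrelevant, shuffling the group labels should produce differences as large as the observed one fairly often.

```python
import random


def permutation_test_mean_diff(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference, approximate p-value)."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Adding 1 to numerator and denominator counts the observed labelling itself.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical incomes (in thousands) for two groups; not real data.
incomes_a = [52, 61, 48, 75, 66, 58, 70, 63]
incomes_b = [45, 50, 39, 57, 49, 44, 53, 41]

observed_diff, p_value = permutation_test_mean_diff(incomes_a, incomes_b)
print(f"observed difference in means = {observed_diff:.3f}, p-value = {p_value:.4f}")
```

The same function works for other measures of the average if the two `sum(...) / n` expressions are swapped for, say, a median.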
 -  Detecting discrimination in hiring: Do employers discriminate in
  hiring (or schools in admission, etc.)?  How can we tell? When are
  differences in hiring rates evidence for discrimination? How do statistical
  perspectives on this question line up with legal criteria for "disparate
  treatment" and "disparate impact"?
  
Statistical tools: tests for differences in proportions or
     probabilities; adjusting for applicant characteristics; deciding what to
     adjust for
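
A test for a difference in hiring proportions can be sketched as follows. This is one standard textbook version (a pooled two-proportion z-test with a normal approximation), not necessarily the form used in the course, and the function name and applicant counts are hypothetical. Note that it makes no adjustment for applicant characteristics, so a small p-value here shows only a difference in rates, not its cause.

```python
import math


def two_proportion_z_test(hired_a, applicants_a, hired_b, applicants_b):
    """Pooled two-sided z-test for equal hiring rates in two groups.

    Returns (z statistic, two-sided p-value under the normal approximation)."""
    rate_a = hired_a / applicants_a
    rate_b = hired_b / applicants_b
    pooled = (hired_a + hired_b) / (applicants_a + applicants_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / applicants_a + 1 / applicants_b))
    z = (rate_a - rate_b) / std_err
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 30 of 100 applicants hired in one group, 15 of 100 in the other.
z_stat, p_val = two_proportion_z_test(30, 100, 15, 100)
print(f"z = {z_stat:.3f}, two-sided p-value = {p_val:.4f}")
```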
 - Detecting discrimination in policing: Do the police discriminate
   against members of particular racial groups?  When do differences in
   traffic stops, arrests, or police-caused deaths indicate discrimination?
   Does profiling or "statistical discrimination" make sense for the police?
   Can groups simultaneously be over- and under-policed?
   
 Statistical tools: tests for differences in proportions; signal
     detection theory; adjusting for systematically missing data; self-reinforcing equilibria
 -  Algorithmic bias: Can predictive or decision-making algorithms be
  biased?  What would that even mean?  Do algorithms trained on existing data
  necessarily inherit the biases of the world?  What notions of fairness or
  unbiasedness can we actually implement for algorithms? What trade-offs are
  involved in enforcing different notions of fairness?  Are "risk-prediction
  instruments" fair?
  
Statistical tools: Methods for evaluating the accuracy of predictions;
    differential error rates across groups; decision trees; optimization and multi-objective
    optimization.
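
"Differential error rates across groups" can be made concrete with a small sketch that tabulates false positive and false negative rates separately for each group. The function name, the record format, and the toy records are assumptions made for illustration; they are not real data or any particular risk-prediction instrument.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """records: iterable of (group, actual, predicted), with actual/predicted in {0, 1}.

    Returns {group: {"false_positive_rate": ..., "false_negative_rate": ...}}."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for group, actual, predicted in records:
        if actual == 1:
            key = "tp" if predicted == 1 else "fn"
        else:
            key = "fp" if predicted == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # cases whose true label is 0
        positives = c["tp"] + c["fn"]  # cases whose true label is 1
        rates[group] = {
            "false_positive_rate": c["fp"] / negatives if negatives else float("nan"),
            "false_negative_rate": c["fn"] / positives if positives else float("nan"),
        }
    return rates


# Toy records, invented for illustration: (group, actual outcome, predicted outcome).
toy_records = (
    [("A", 0, 1)] + [("A", 0, 0)] * 3 + [("A", 1, 1)] * 3 + [("A", 1, 0)]
    + [("B", 0, 1)] * 2 + [("B", 0, 0)] * 2 + [("B", 1, 1)] * 4
)

rates = error_rates_by_group(toy_records)
for group, r in sorted(rates.items()):
    print(group, r)
```

In this toy example the predictor is right for 6 of 8 people in each group, yet the two groups face different kinds of errors, which is one reason a single notion of "fairness" is hard to pin down.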
 -  Standardized tests: Are standardized tests for school
   admission biased against certain racial groups?  What does it mean to
   measure qualifications, and how would we know whether tests really are
   measuring qualifications?  What does it mean for a measurement to be biased?
   When do differences across groups indicate biases?  (Disparate impact
   again.)  Why correlating outcomes with test scores among admitted
   students may not make sense.  The "compared to what?" question.
   
Statistical tools: Predictive validity; differential
   prediction; "conditioning on a collider"
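
The problem with correlating outcomes and test scores only among admitted students can be seen in a toy simulation. Everything here is an assumption made for illustration: test scores and "other qualifications" are independent standard normal variables, the outcome is their sum plus noise, and students are admitted when the sum of the two qualifications clears a cutoff. Admission is then a collider, and conditioning on it distorts the correlations.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_selection(n=20_000, cutoff=1.5, seed=0):
    """Toy admissions model; every modelling choice here is an assumption."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        test = rng.gauss(0, 1)                     # test score
        other = rng.gauss(0, 1)                    # other qualifications, independent of test
        outcome = test + other + rng.gauss(0, 1)   # later performance
        people.append((test, other, outcome))
    admitted = [p for p in people if p[0] + p[1] > cutoff]

    def corr(rows, i, j):
        return pearson([r[i] for r in rows], [r[j] for r in rows])

    return {
        "n_admitted": len(admitted),
        "test_vs_outcome_all": corr(people, 0, 2),
        "test_vs_outcome_admitted": corr(admitted, 0, 2),
        "test_vs_other_all": corr(people, 0, 1),
        "test_vs_other_admitted": corr(admitted, 0, 1),
    }


result = simulate_selection()
for name, value in result.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

In this toy model the test predicts outcomes reasonably well in the whole applicant pool but looks much weaker among the admitted, and the two qualifications, independent by construction, become negatively correlated once we look only at admitted students.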
 -  Intelligence tests: Are intelligence tests biased? How do
   we measure latent attributes?  How do we know the latent attributes even
   exist?  What would it mean for there to be such a thing as "general
   intelligence", that could be measured by tests?  What, if anything, do
   intelligence tests measure?  What do rising intelligence test results (the
   Flynn Effect) tell us?
   
Statistical tools: correlation between test scores; factor
     models as an explanation of correlations; estimating factor values from
     tests; measurement invariance; alternatives to factor models
 -  Implicit bias: Do "implicit association tests" measure
   unconscious biases?  Again on measurement, as well as what it would mean for
   a bias to be "implicit" or "unconscious".  What, if anything, do implicit
   association tests measure?
   
Statistical tools: Approaches to "construct validity".
 -  Interventions on implicit bias: Can trainings or other
  interventions reduce implicit bias?  How do we investigate the effectiveness
  of interventions?  How do we design a good study of an intervention?  How do we
  pool information from multiple studies?  Do implicit bias interventions
  change behavior?  Does having a chief diversity officer increase faculty
  diversity?
  
 Statistical tools: Experimental design: selecting measurements
    of outcomes, and the importance of randomized studies; meta-analytic
    methods for combining information.
 -  Explaining, or explaining away, inequality: To what
  extent can differences in outcomes between groups be explained by differences
  in their attributes (e.g., explaining differences in incomes by differences
  in marketable skills)?  How should we go about making such adjustments?  Is
  it appropriate to treat discrimination as the "residual" left unexplained?
  When does adjusting or controlling for a variable contribute to an
  explanation, and when is it "explaining away" discrimination?  What would it
  mean to control for race, sex or gender?
  
 Statistical tools: Observational causal inference; using regression to
    "control for" multiple variables at once; using graphical models to
    represent causal relations between variables; how to use graphical models
    to decide what should and what should not be controlled for; the causal
    model implicit in decisions about controls.
 -  Self-organizing inequalities and "structural" or "systematic"
  inequalities: Models of how inequalities can perpetuate themselves
  even when nobody is biased.  Models of how inequalities can appear
  even when nobody is biased.  The Schelling model of spatial segregation as a
  "paradigm".  How relevant are Schelling-type models to actual, present-day
  inequalities?
  
 Statistical tools: Agent-based models; models of social
  learning and game theory.
 -  Statistics and its history: The development of statistics
  in the 19th and early 20th century was intimately tied to the eugenics
  movement, which was deeply racist and even more deeply classist, but also
  often anti-sexist.  The last part of the course will cover this history, and
  explain how many of the intellectual tools we have gone over to document, and
  perhaps to help combat, inequality and discrimination were invented by people
  who wanted to use them for quite different purposes.  The twin learning
  objectives for this section are for students to grasp something of this
  history, and to grasp why the "genetic fallacy", of judging ideas by where
  they come from (their "genesis") is, indeed, foolish and wrong.
  
 Statistical tools: N/A.
 
Evaluation
There will be one problem set per week; each of these homeworks will involve
some combination of (very basic) statistical theory, (possibly less basic)
calculations using the theory we've gone over, and analysis of real data sets
using the methods discussed in class.  There will also be readings for each
class session, and a short-answer quiz after each session will combine
questions based on lecture content with questions based on the readings.
There will not be any exams.
My usual policy is to drop a certain number of homeworks, and a certain
number of lecture/reading questions, no questions asked.  The number of
automatic drops isn't something I'll commit to here and now (similarly, I won't
make any promises here about the relative weight of homework
vs. lecture-related questions).
Textbook, Lecture Notes
There is, unfortunately, no one textbook which covers the material we'll go
over at the required level.  You will, instead, get very detailed lecture notes
after each lecture.  There will also be a lot of readings from various books
and articles.  (I will not agree with every reading I assign.)
Teaching: Statistics of Inequality and Discrimination;
Corrupting the Young;
Enigmas of Chance;
Commit a Social Science
 
Posted at June 03, 2021 23:59