Notebooks

27 Feb 2017 16:30

Value-added measurement of teachers is an in-principle simple idea which rests on some very dubious premises: basically, that every teacher adds (or subtracts) some expected amount to all of their students' standardized test scores, and so one can figure out who the good teachers are by appropriate averaging.

The core math isn't so hard, so I'll sketch it. Start by supposing that everything which matters about what students learn in school is captured by standardized test scores. If we give them the same test at the beginning and end of a school year (or at the end of each year, etc.), we can then measure how much they've learned, or forgotten, by the change in their score. Call the change for student $i$, $Y_i$. Inspired by ideas from regression, we can hope to write this as a function of various other variables plus noise: $Y_i = a + m(C_i) + \epsilon_i$, where $a$ is the average gain over all students, the $C_i$ are the "covariates" associated with student $i$, and the function $m$ is supposed to be the same across students. (Additive noise $\epsilon_i$ is an additional assumption at this stage.)

We expect that some of those covariates are attributes of the students, like their demographic characteristics, previous test scores, etc.; call them collectively $X_i$. But we might also think that some teachers help their students learn more than others. Say that student $i$ is taught by teacher $t(i)$, and that teachers have multiple students. People then make the assumption that $m(C_i) = f(X_i) + V_{t(i)}$, where $V_j$ is the "value" of teacher $j$, which we assume they add to (or subtract from) each of their students' gains. (If we make this assumption, we can also assume that the $V_j$ average out to 0, since if they averaged out to anything else we could just fold that part into the global mean $a$, without changing anything observable.) If you assume this sort of adding up, so that the final model is $Y_i = f(X_i) + V_{t(i)} + \epsilon_i$ (absorbing the constant $a$ into $f$), and you assume that there is no correlation between the $V_j$ and the attributes of the students $X_i$, then you could estimate the $V$'s. Here'd be one way to do it:

1. Estimate $f(X)$ by grouping together all the students with the same value of the attributes $X$ and averaging their $Y$'s. (Because we've assumed no correlation, the $V_{t(i)}$ terms just act like extra noise.)
2. Calculate the residual $Y - f(X)$ for every student.
3. Average the residuals $Y - f(X)$ over all the students of teacher $j$ to get an estimate of $V_j$.

Actual estimation procedures are more statistically sophisticated (e.g., they often assume a Gaussian distribution for the $V_j$, because reasons), or try to include not just teacher-effect terms but also ones for the school, etc.
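The naive three-step recipe can be sketched in code. Everything here — the simulated covariate groups, the teacher assignments, and all the parameter values — is hypothetical, made up purely to illustrate the averaging logic when the model's own assumptions (including independence of teacher assignment from $X$) actually hold:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from the assumed model Y_i = f(X_i) + V_{t(i)} + epsilon_i.
# All sizes and parameter values below are illustrative assumptions.
n_students, n_teachers, n_groups = 5000, 50, 10

X = rng.integers(0, n_groups, size=n_students)    # covariate group of each student
t = rng.integers(0, n_teachers, size=n_students)  # teacher assignment, independent of X
f_true = rng.normal(0.0, 1.0, size=n_groups)      # covariate effect f(X)
V_true = rng.normal(0.0, 0.5, size=n_teachers)    # teacher "values"
V_true -= V_true.mean()                           # normalized to average out to 0
Y = f_true[X] + V_true[t] + rng.normal(0.0, 1.0, size=n_students)

# Step 1: estimate f(X) by averaging Y within each covariate group.
f_hat = np.array([Y[X == g].mean() for g in range(n_groups)])

# Step 2: residuals Y - f(X) for every student.
resid = Y - f_hat[X]

# Step 3: average the residuals over each teacher's students.
V_hat = np.array([resid[t == j].mean() for j in range(n_teachers)])

# When the assumptions hold, the estimates track the true values closely.
print(np.corrcoef(V_true, V_hat)[0, 1])
```

With roughly 100 students per teacher, the estimated values are noisy but strongly correlated with the true $V_j$; break the no-correlation assumption (say, by sorting students to teachers by prior ability) and the very same recipe quietly returns garbage.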

I have nothing, in principle, against such hierarchical statistical models, but they rest on a whole pile of assumptions, and, to the extent that they're wrong, the results range from the meaningless to the actively misleading. (Even allowing all of the additivity assumptions, etc., just suppose that the best teachers got assigned to the hardest students.) Before relying on them for public policy, hiring and firing decisions, etc., I'd very much want to see strong evidence that the assumptions held, or came close to holding, and that (e.g.) the estimated value of a student's teacher this year didn't predict their test scores in the past. This sort of model-checking seems to be conspicuously lacking in the literature, and so my not-quite-gut (chest?) reaction to these methods is that they are more an abuse of statistical reason than an application, but I really, really ought to read much more before having a firm opinion.
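To make the suggested check concrete, here is a sketch — with entirely made-up simulated data, not any estimator actually used in the literature — of the kind of falsification test I have in mind: correlate each student's *current* teacher's estimated value with that student's *past* score, once under random assignment and once when the "best" teachers get the strongest students:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: an unobserved student trait drives past scores.
n_students, n_teachers = 5000, 50
ability = rng.normal(0.0, 1.0, size=n_students)
past_score = ability + rng.normal(0.0, 1.0, size=n_students)

# Scenario A: teachers assigned at random (the model's assumption).
t_random = rng.integers(0, n_teachers, size=n_students)
# Scenario B: the "best" teachers get the highest-ability students.
rank = np.argsort(np.argsort(ability))
t_sorted = (rank * n_teachers) // n_students

# Hypothetical estimated teacher values, one per teacher.
V = np.linspace(-1.0, 1.0, n_teachers)

def corr_with_past(assignment):
    # Correlation of the current teacher's value with the student's past score.
    return np.corrcoef(V[assignment], past_score)[0, 1]

print(corr_with_past(t_random))  # should be near 0 under random assignment
print(corr_with_past(t_sorted))  # clearly nonzero when sorting violates the assumptions
```

A clearly nonzero correlation here is a red flag: this year's teacher cannot retroactively cause last year's scores, so any such "effect" must reflect non-random assignment leaking into the estimated values.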

To read [needs more pro-VA references]:
• Eva L. Baker, Paul E. Barton, Linda Darling-Hammond, Edward Haertel, Helen F. Ladd, Robert L. Linn, Diane Ravitch, Richard Rothstein, Richard J. Shavelson, and Lorrie A. Shepard, "Problems with the Use of Student Test Scores to Evaluate Teachers", Economic Policy Institute briefing paper 278 (2010) [PDF preprint]
• Douglas N. Harris, Value-Added Measures in Education: What Every Educator Needs to Know (Harvard Education Press, 2011)