Instrumental Variables

Last update: 13 Dec 2024 15:35
First version: 28 May 2021

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \]

(I'll just talk graphical causal models here, because they make more sense to me than alternatives.)

This is a technique for causal inference. The basic logic is as follows. We want to estimate (or test, etc.) the effect of one observable variable \( X \) on another, \( Y \). That is, we want to find \( \Expect{Y|do(X)} \). Unfortunately, we are pretty sure that this effect is confounded; there is some third variable \( U \) which is a causal ancestor of both \( X \) and \( Y \). The "instrument" is a fourth, observable variable, say \( W \), which is (i) an ancestor of \( X \) and (ii) has no (unblocked) paths to \( Y \) except through \( X \). The no-unblocked-paths bit, together with \( W \) itself being exogenous, means that \( \Expect{Y|do(W)} \) and \( \Expect{X|do(W)} \) are just the ordinary regressions \( \Expect{Y|W} \) and \( \Expect{X|W} \), so both are easy to estimate. The trick is then to "back out" or "factor out" \( \Expect{Y|do(X)} \) from these two observationally-identified functions.
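In code, the assumed causal graph is just this parent map (a trivial sketch, with the variable names as above):

    # Parent sets in the assumed DAG.  The exclusion restriction is that W is a
    # parent (ancestor) of X but not of Y, and has no other route to Y.
    parents = {
        "W": [],          # instrument: exogenous
        "U": [],          # unobserved confounder: exogenous
        "X": ["W", "U"],  # treatment
        "Y": ["X", "U"],  # outcome; W is deliberately absent here
    }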

If everything's linear, this is pretty straightforward in principle. First, write out the "structural" equations showing how each variable depends on its parents (taking the exogenous terms \( W \), \( U \), \( \eta \) and \( \epsilon \) to be mutually uncorrelated): \[ \begin{eqnarray} X & \leftarrow & \alpha_1 W + \alpha_2 U + \eta\\ Y & \leftarrow & \gamma_1 X + \gamma_2 U + \epsilon \end{eqnarray} \] Substituting the first into the second, we get that \[ Y = \alpha_1 \gamma_1 W + (\alpha_2 \gamma_1 + \gamma_2) U + \gamma_1 \eta + \epsilon \] so the true regression coefficient of \( Y \) on \( W \) will be \( \alpha_1 \gamma_1 \). But the true regression coefficient of \( X \) on \( W \) will be \( \alpha_1 \). So just taking the ratio is one way to back out the coefficient we want, which is \( \gamma_1 \). Notice by the way that \[ \Cov{X, Y} = \gamma_1 \Var{X} + \alpha_2 \gamma_2 \Var{U} \] so just regressing \( Y \) on \( X \) will yield a coefficient which we might call \[ \beta = \gamma_1 + \alpha_2 \gamma_2 \frac{\Var{U}}{\Var{X}} ~, \] which can be arbitrarily different from \( \gamma_1 \). (Remember that the optimal linear coefficient for predicting any \( Z \) from any \( V \) is \( \frac{\Cov{Z,V}}{\Var{V}} \), regardless of whether the true regression function is linear, of the direction [if any] of the causal relation, etc.)
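Here is a minimal simulation of this linear set-up (the parameter values are arbitrary, purely for illustration): the naive regression slope of \( Y \) on \( X \) comes out badly biased, while the ratio of the two \( W \)-coefficients recovers \( \gamma_1 \).

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    alpha1, alpha2 = 2.0, 1.5    # effects of W and U on X
    gamma1, gamma2 = 0.7, -3.0   # effects of X and U on Y; gamma1 is the target

    W = rng.normal(size=n)       # instrument, exogenous
    U = rng.normal(size=n)       # unobserved confounder
    X = alpha1 * W + alpha2 * U + rng.normal(size=n)
    Y = gamma1 * X + gamma2 * U + rng.normal(size=n)

    def slope(z, v):
        """Least-squares slope of z on v, i.e. Cov(z, v) / Var(v)."""
        return np.cov(z, v)[0, 1] / np.var(v, ddof=1)

    beta_naive = slope(Y, X)               # = gamma1 + alpha2*gamma2*Var(U)/Var(X), about 0.08 here
    gamma1_iv = slope(Y, W) / slope(X, W)  # ratio of the two W-coefficients, about 0.7
    print(beta_naive, gamma1_iv)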

Alternately, we can do "two-stage least squares". This is where we regress \( Y \) not on \( X \), but on what we'd predict \( X \) to be based on \( W \), namely \( \alpha_1 W \) (in practice, the fitted values from a first-stage regression of \( X \) on \( W \)). This, again, will yield the coefficient \( \gamma_1 \), since \( \Cov{Y, \alpha_1 W} = \alpha_1 \Cov{Y, W} = \alpha_1^2 \gamma_1 \Var{W} \), while \( \Var{\alpha_1 W} = \alpha_1^2 \Var{W} \).
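And a correspondingly minimal two-stage-least-squares sketch, on the same sort of simulated data as above (again, all parameter values are invented):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    W, U = rng.normal(size=n), rng.normal(size=n)
    X = 2.0 * W + 1.5 * U + rng.normal(size=n)   # alpha1 = 2.0, alpha2 = 1.5
    Y = 0.7 * X - 3.0 * U + rng.normal(size=n)   # gamma1 = 0.7, gamma2 = -3.0

    def slope(z, v):
        return np.cov(z, v)[0, 1] / np.var(v, ddof=1)

    X_hat = slope(X, W) * W    # stage 1: predict X from the instrument alone
    print(slope(Y, X_hat))     # stage 2: regress Y on that prediction; about 0.7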

The ratio estimator and two-stage least squares both assume everything is linear, but the basic logic doesn't. That logic is: we know \( W \) only affects \( Y \) by first affecting \( X \); we can identify how \( W \) affects \( X \) and how \( W \) affects \( Y \); and this has to tell us how the impulse is transmitted through \( X \). What I am particularly interested in are nonparametric methods for instrumental-variable inference, which do not assume linearity.

There is a classic derivation here, which ends up expressing what we want as the solution to an integral equation. (I believe this formulation is due to Darolles et al. but I am writing from memory so I might be off.) Let's abbreviate \( \Expect{Y|do(X=x)} \) as \( f(x) \). The trick is to show that a certain integral transformation of \( f \) can be expressed in terms of observably-identified quantities.

Say that \( p(x,w) \) is the joint pdf of \( X \) and \( W \). (Similarly for the related conditional and marginal pdfs, hopefully kept clear by their arguments.) This is an observationally identified quantity. We can thus define \[ t(x,z) \equiv \int{p_{XW}(x, w) p_{XW}(z, w) dw} = \int{p(x|w) p(z|w) p^2(w) dw} \] as a sort of kernel (in the machine-learning sense), expressing something like "how similar are the events \( X=x \) and \( X=z \), as potential consequences of \( W \)?" We can in fact make this into the kernel of an integral operator on functions of \( x \), \[ (T\psi)(x) = \int{t(z,x) \psi(z) dz} \] Now the claim is that \[ \Expect{\Expect{Y|W} p_{XW}(x, W)} = (Tf)(x) \] This helps us if the operator \( T \) has an inverse, \( T^{-1} \), because then \[ f(x) = \Expect{\Expect{Y|W} (T^{-1} p_{XW})(x, W)} \] (To see this, apply \( T \) to both sides of the last equation above, and remember that \( T \) is by construction a linear operator.)

To verify the claim, start by noticing that we can write \[ Y = f(X) + U + \epsilon \] where \( U \) now stands for the confounder's contribution to \( Y \) (so we are assuming the confounding enters additively); without loss of generality \( \Expect{U} = 0 \), but \( \Expect{U|X} \neq 0 \). On the other hand, \( \Expect{U|W} = 0 \), because (in the graphical model we're assuming) \( U \) and \( W \) are both exogenous, hence independent. So \[ \begin{eqnarray} \Expect{Y|W=w} & = & \Expect{f(X) + U+\epsilon|W=w}\\ & = & \Expect{f(X)|W=w}\\ & = & \int{p(x|w) f(x) dx}\\ & = & \frac{\int{p(x, w) f(x) dx}}{p(w)} \end{eqnarray} \] Thus \[ \begin{eqnarray} \Expect{\Expect{Y|W} p(x,W)} & = & \int{p(w) \Expect{Y|W=w} p(x,w) dw}\\ & = & \int{p(w) p(x,w) \frac{\int{p(z,w) f(z) dz}}{p(w)} dw}\\ & = & \int{\int{f(z) p(z,w) p(x,w) dw dz}}\\ & = & \int{dz f(z) \int{p(z,w) p(x,w) dw}}\\ & = & \int{dz f(z) t(x,z)} \end{eqnarray} \] as desired.
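One way to make this concrete: if \( X \) and \( W \) are discretized onto finite grids, the integrals become sums, \( p(x,w) \) becomes a joint probability table \( P \), the kernel \( t \) becomes the matrix \( P P^{T} \), and the claim turns into a finite linear system. Here is a small numerical check along those lines; the grids, the structural \( f \), and the confounding mechanism are all invented for illustration.

    import numpy as np

    x_grid = np.array([-1.5, -0.5, 0.5, 1.5])       # support of (discretized) X
    w_grid = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # support of the instrument W
    p_w = np.full(len(w_grid), 1 / len(w_grid))     # W uniform, independent of U
    u_vals, p_u = np.array([-1.0, 1.0]), np.array([0.5, 0.5])  # confounder U, mean zero

    def f_true(x):              # the structural function f(x) = E[Y | do(X=x)]
        return x ** 2

    def p_x_given(w, u):        # pmf of X given W = w and U = u (arbitrary choice)
        e = np.exp(-(x_grid - (0.8 * w + 0.9 * u)) ** 2)
        return e / e.sum()

    # Joint pmf P[i, j] = P(X = x_i, W = w_j), and E[Y | W = w_j] for Y = f(X) + U.
    P = np.zeros((len(x_grid), len(w_grid)))
    EY_w = np.zeros(len(w_grid))
    for j, w in enumerate(w_grid):
        for u, pu in zip(u_vals, p_u):
            cond = p_x_given(w, u)
            P[:, j] += p_w[j] * pu * cond
            EY_w[j] += pu * np.sum(cond * (f_true(x_grid) + u))

    T = P @ P.T                    # discrete analogue of the kernel t(x, z)
    h = P @ (p_w * EY_w)           # h(x_i) = E[ E[Y|W] p(x_i, W) ], observationally identified
    f_hat = np.linalg.solve(T, h)  # invert the operator equation T f = h

    print(f_hat)                   # agrees with f_true(x_grid) = [2.25, 0.25, 0.25, 2.25]

With densities estimated from data, rather than an exactly-known probability table, \( T \) is badly ill-conditioned and the inversion has to be regularized; as I understand it, that is where much of the real work in this literature goes.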

This is one of the places where I follow the math and can use it, but there is something missing from my grasp of it, because it would never occur to me on my own to go through this set of manipulations. In fact I have to look at my notes to remember it right now. (In fact, when I wrote the section of ADAfaEPoV about instrumental variables and integral equations, I worked from memory / trying to derive everything from first principles, and came up with a much simpler approach --- which was quite wrong.) So one thing I would like to do is find some story which makes all this natural. If nothing else, it would help me to teach it!

