## Instrumental Variables

*28 May 2021 10:42*

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \]

(I'll just talk graphical causal models here, because they make more sense to me than alternatives.)

This is a technique of causal inference.
The basic logic is as follows. We want to estimate (or test, etc.) the effect
of one observable variable \( X \) on another, \( Y \). That is, we want to
find \( \Expect{Y|do(X)} \). Unfortunately, we are pretty sure that this
effect is confounded; there is some third variable \( U \) which is a causal
ancestor of both \( X \) and \( Y \). The "instrument" is a fourth, observable
variable, say \( W \), which is (i) an ancestor of \( X \) and (ii) has no
(unblocked) paths to \( Y \) *except* through \( X \). The
no-unblocked-paths bit makes it easy for us to estimate both \(
\Expect{Y|do(W)} \) and \( \Expect{X|do(W)} \). The trick is then to "back
out" or "factor out" \( \Expect{Y|do(X)} \) from these two
observationally-identified functions.

If everything's linear, this is pretty straightforward in principle.
First, write out the "structural" equations showing how each variable depends
on its parents:
\[
\begin{eqnarray}
X & \leftarrow & \alpha_1 W + \alpha_2 U + \eta\\
Y & \leftarrow & \gamma_1 X + \gamma_2 U + \epsilon
\end{eqnarray}
\]
Substituting the first into the second, we get that
\[
Y = \alpha_1 \gamma_1 W + (\alpha_2 \gamma_1 + \gamma_2) U + \gamma_1 \eta + \epsilon
\]
so the true coefficient of \( Y \) on \( W \) will be \( \alpha_1 \gamma_1 \).
But the true coefficient of \( X \) on \( W \) will be \( \alpha_1 \). So just
taking the ratio is one way to back out the coefficient we *want*, which
is \( \gamma_1 \). Notice by the way that
\[
\Cov{X, Y} = \gamma_1 \Var{X} + \alpha_2 \gamma_2 \Var{U}
\]
so just regressing \( Y \) on \( X \) will yield a coefficient which we might
call
\[
\beta = \gamma_1 + \alpha_2 \gamma_2 \frac{\Var{U}}{\Var{X}} ~,
\]
which
can be arbitrarily different from \( \gamma_1 \). (Remember that the optimal
linear coefficient for predicting *any* \( Z \) from *any* \( W
\) is \( \frac{\Cov{Z,W}}{\Var{W}} \), whether or not the true regression
function is linear, the direction [if any] of causal relation, etc.)
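A quick numerical check of the last two formulas, as a sketch rather than anything definitive. The coefficient values, the unit-variance Gaussian noise, and the helper names `simulate_linear_iv` and `slope` are all assumptions made up for illustration. With these values \( \Var{X} = 3 \) and \( \Var{U} = 1 \), so the formula for \( \beta \) gives 3, while \( \gamma_1 = 2 \).

```python
import numpy as np

def simulate_linear_iv(n, alpha1, alpha2, gamma1, gamma2, seed=0):
    """Draw (W, X, Y) from the linear structural equations; U stays hidden."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)    # instrument
    u = rng.normal(size=n)    # unobserved confounder, independent of W
    eta = rng.normal(size=n)  # noise in X
    eps = rng.normal(size=n)  # noise in Y
    x = alpha1 * w + alpha2 * u + eta
    y = gamma1 * x + gamma2 * u + eps
    return w, x, y

def slope(target, predictor):
    """Optimal linear coefficient Cov[target, predictor] / Var[predictor]."""
    c = np.cov(target, predictor)
    return c[0, 1] / c[1, 1]

# Illustrative (assumed) coefficient values.
alpha1, alpha2, gamma1, gamma2 = 1.0, 1.0, 2.0, 3.0
w, x, y = simulate_linear_iv(200_000, alpha1, alpha2, gamma1, gamma2)

ols = slope(y, x)                   # confounded regression of Y on X
ratio = slope(y, w) / slope(x, w)   # (Y-on-W coefficient) / (X-on-W coefficient)

# beta = gamma1 + alpha2 * gamma2 * Var[U] / Var[X], with unit-variance inputs.
var_x = alpha1**2 + alpha2**2 + 1.0
beta_predicted = gamma1 + alpha2 * gamma2 * 1.0 / var_x

print("OLS slope:       ", ols)
print("predicted beta:  ", beta_predicted)
print("ratio estimate:  ", ratio)
print("true gamma1:     ", gamma1)
```

The OLS slope should land near the predicted \( \beta \), not near \( \gamma_1 \), while the ratio of the two instrument regressions should land near \( \gamma_1 \).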

Alternately, we can do "two-stage least squares". This is where we regress \(
Y \) not on \( X \), but on what we'd *predict* \( X \) to be based on
\( W \), namely \( \alpha_1 W \). This, again, will plainly yield the
coefficient \( \gamma_1 \).
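Two-stage least squares can be sketched in a few lines. This is a minimal illustration under assumed coefficient values and Gaussian noise, with a hypothetical helper `fit_line`; in practice \( \alpha_1 \) is unknown, so the first stage estimates it by regressing \( X \) on \( W \).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
# Illustrative (assumed) structural coefficients.
alpha1, alpha2, gamma1, gamma2 = 1.0, 1.0, 2.0, 3.0

w = rng.normal(size=n)                            # instrument
u = rng.normal(size=n)                            # unobserved confounder
x = alpha1 * w + alpha2 * u + rng.normal(size=n)
y = gamma1 * x + gamma2 * u + rng.normal(size=n)

def fit_line(target, predictor):
    """Least-squares [intercept, slope] of target on predictor."""
    design = np.column_stack([np.ones_like(predictor), predictor])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

# Stage 1: predict X from W.
a0_hat, a1_hat = fit_line(x, w)
x_hat = a0_hat + a1_hat * w

# Stage 2: regress Y on the predicted X instead of on X itself.
_, gamma1_2sls = fit_line(y, x_hat)

# For comparison: the confounded regression of Y on X.
_, beta_ols = fit_line(y, x)

print("2SLS estimate of gamma1:", gamma1_2sls)
print("OLS slope of Y on X:    ", beta_ols)
```

With a single instrument, the second-stage slope is algebraically the same number as the ratio of the two instrument regressions. Note that the standard errors a naive second-stage regression reports are not the right ones, since they are computed from residuals around the predicted \( X \) rather than the actual \( X \).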

The last two paragraphs assume everything is linear, but the basic logic
doesn't. That logic is: we know \( W \) only affects \( Y \) by first
affecting \( X \); we can identify how \( W \) affects \( X \) and how \( W \)
affects \( Y \); this has to tell us how the impulse is transmitted through \(
X \). What I am particularly interested in are nonparametric methods for
instrumental-variable inference, which do *not* assume linearity.

There is a classic derivation here, which ends up expressing what we want as the solution to an integral equation. (I believe this formulation is due to Darolles et al. but I am writing from memory so I might be off.) Let's abbreviate \( \Expect{Y|do(X=x)} \) as \( f(x) \). The trick is to show that a certain integral transformation of \( f \) can be expressed in terms of observationally-identified quantities.

Say that \( p(x,w) \) is the joint pdf of \( X \) and \( W \). (Similarly for
the related conditional and marginal pdfs, hopefully kept clear by their
arguments.) This is an observationally identified quantity. We can thus
define
\[ t(x,z) \equiv \int{p_{XW}(x, w) p_{XW}(z, w) dw} = \int{p(x|w) p(z|w)
p^2(w) dw}
\]
as a sort of kernel (in the machine-learning sense), expressing something like
"how similar are the events \( X=x \) and \( X=z \), as potential consequences
of \( W \)?" We can in fact make this into the kernel of an integral operator
on functions of \( x \),
\[
(T\psi)(x) = \int{t(z,x) \psi(z) dz}
\]
Now the claim is that
\[
\Expect{\Expect{Y|W} p_{XW}(x, W)} = (Tf)(x)
\]
This helps us *if* the operator \( T \) has an inverse, \( T^{-1} \), because
then
\[
f(x) = \Expect{\Expect{Y|W} (T^{-1} p_{XW})(x, W)}
\]
(To see this, apply \( T \) to both sides of the last equation above, and
remember that \( T \) is by construction a linear operator.)

To verify the claim, start by noticing that we can write
\[
Y = f(X) + U + \epsilon
\]
where without loss of generality \( \Expect{U} = 0 \), but \( \Expect{U|X} \neq 0 \). On the other hand, \( \Expect{U|W} = 0 \), because (in the graphical model we're assuming) \( U \) and \( W \) are both exogenous, hence independent (and likewise \( \Expect{\epsilon|W} = 0 \)). So
\[
\begin{eqnarray}
\Expect{Y|W=w} & = & \Expect{f(X) + U+\epsilon|W=w}\\
& = & \Expect{f(X)|W=w}\\
& = & \int{p(x|w) f(x) dx}\\
& = & \frac{\int{p(x, w) f(x) dx}}{p(w)}
\end{eqnarray}
\]
Thus
\[
\begin{eqnarray}
\Expect{\Expect{Y|W} p(x,W)} & = & \int{p(w) \Expect{Y|W=w} p(x,w) dw}\\
& = & \int{p(w) p(x,w) \frac{\int{p(z,w) f(z) dz}}{p(w)} dw}\\
& = & \int{\int{f(z) p(z,w) p(x,w) dw dz}}\\
& = & \int{dz f(z) \int{p(z,w) p(x,w) dw}}\\
& = & \int{dz f(z) t(x,z)}
\end{eqnarray}
\]
which is \( (Tf)(x) \), since \( t \) is symmetric in its arguments, as desired.
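A discrete sanity check of the claim, offered as a sketch. If \( X \) and \( W \) each take finitely many values, the integrals become sums, \( p(x,w) \) becomes a matrix, the kernel \( t \) becomes that matrix times its transpose, and \( T \) is ordinary matrix multiplication. The joint pmf `P`, the function `f`, and the sizes `k` and `m` below are all made up for illustration, and \( \Expect{Y|W} \) is computed from the known \( f \) rather than estimated from data. This is a toy analogue, not the estimator from any of the papers listed below.

```python
import numpy as np

rng = np.random.default_rng(2)
k, m = 4, 6   # number of X values and W values (assumed; need m >= k)

# Hypothetical joint pmf p(x, w): rows indexed by x, columns by w.
P = rng.random((k, m))
P /= P.sum()
p_w = P.sum(axis=0)   # marginal pmf of W

# Hypothetical structural function f(x) = E[Y | do(X=x)].
f = np.array([0.5, -1.0, 2.0, 0.0])

# E[Y | W=w] = sum_x p(x|w) f(x), using E[U|W] = E[eps|W] = 0.
m_w = (f @ P) / p_w

# Left-hand side of the claim: E[ E[Y|W] p(x, W) ] = sum_w p(w) E[Y|W=w] p(x, w).
lhs = P @ (p_w * m_w)

# Kernel t(x, z) = sum_w p(x, w) p(z, w), and the operator T applied to f.
T = P @ P.T
rhs = T @ f

# If T is invertible, f is recovered from quantities defined via (W, X, Y) alone.
f_recovered = np.linalg.solve(T, lhs)

print("claim holds:  ", np.allclose(lhs, rhs))
print("f recovered:  ", np.allclose(f_recovered, f))
```

Here \( T \) is invertible because the matrix `P` has full row rank, which needs the instrument to take at least as many values as \( X \). In the continuous case inverting \( T \) is an ill-posed problem that needs some form of regularization, which this toy ignores.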

This is one of the places where I *follow* the math and can use it, but
there is something missing from my *grasp* of it, because it would never
occur to me on my own to go through *this* set of manipulations. In
fact I have to look at my notes to remember it right now. (In fact,
when I wrote the section of ADAfaEPoV about instrumental variables and integral
equations, I worked from memory / trying to derive everything from first
principles, and came up with a much simpler approach --- which was quite wrong.)
So one thing I would like to do is find some story which makes all this natural. If nothing else,
it would help me to teach it!

- Recommended, close ups about nonparametrics:
- S. Darolles, Y. Fan, J. P. Florens and E. Renault, "Nonparametric Instrumental Regression", Econometrica **79** (2011): 1541--1565 [Preprint version, 2002. While I haven't done a line by line comparison between the preprint and the published version, remarkably little seems to have changed over those 9 years. There is a story there and I'd be curious to learn it.]
- Peter Hall, Joel L. Horowitz, "Nonparametric methods for inference in the presence of instrumental variables", Annals of Statistics **33** (2005): 2904--2929, arxiv:math/0603130
- Whitney K. Newey and James L. Powell, "Instrumental Variable Estimation of Nonparametric Models", Econometrica **71** (2003): 1565--1578
- Rahul Singh, Maneesh Sahani, Arthur Gretton, "Kernel Instrumental Variable Regression", NeurIPS 2019, arxiv:1906.00232

- Recommended, close ups about methodology:
- Stephen G. Hall, P. A. V. B. Swamy and George S. Tavlas, "On the Interpretation of Instrumental Variables in the Presence of Specification Errors", working paper 14/19, Department of Economics, University of Leicester [PDF preprint. I actually find myself in the odd position of thinking that while this is technically correct, it's a bit unfair to instrumental variables. Some of the issues here seem like they could be sensibly resolved using Pearl's graphical definition of IVs, perhaps in combination with nonparametric regressions.]
- Jonathan Mellon, "Rain, Rain, Go Away: 176 Potential Exclusion-Restriction Violations for Studies Using Weather as an Instrumental Variable", ssrn/3715610 [This well-written paper makes the interesting point that using the same instrument \( W \) to study the effect of *many* different causes \( X \) weakens the credibility of all the studies, because each such \( X^{\prime} \) provides *another* pathway by which \( W \) could be an ancestor of \( Y \), *without* going through the \( X \) of interest.]
- Judea Pearl, "On a Class of Bias-Amplifying Covariates that Endanger Effect Estimates", UAI 2010, arxiv:1203.3503
- Tom Pepinsky, "OMFG Exogenous Variation! Or, Can You Find Good Nails When You Find an Indonesian Politics Hammer?" [Admittedly, less formal in presentation than many of the rest of these links]
- Alwyn Young, "Consistency without Inference: Instrumental Variables in Practical Application" [2017 preprint, LSE. To summarize very roughly, this is an argument that in *published* IV regressions, the problems due to a handful of data points having very high leverage/influence, and non-IID noise, are much more important than the bias reduction from using IV rather than OLS. PDF via Dr. Young.]

- Modesty forbids me to recommend:
- CRS, Advanced Data Analysis from an Elementary Point of View [The discussion of instrumental variables is spread out over the chapters on identification and estimation of causal effects. Right now (May 2021) there are some unfortunate errors there about the nonlinear case, which I need to fix.]

- To read:
- Isaiah Andrews, James H. Stock, and Liyang Sun, "Weak Instruments in Instrumental Variables Regression: Theory and Practice", Annual Review of Economics **11** (2019): 727--753
- Andrii Babii, "Honest Confidence Sets in Nonparametric IV Regression and Other Ill-Posed Models", Econometric Theory **36** (2020): 658--706, arxiv:1611.03015
- Christoph Breunig, Jan Johannes, "Adaptive estimation of functionals in nonparametric instrumental regression", arxiv:1109.0961
- Xiaohong Chen, Markus Reiss, "On rate optimality for ill-posed inverse problems in econometrics", arxiv:0709.2003
- Ben Deaner, "Nonparametric Instrumental Variables Estimation Under Misspecification", arxiv:1901.01241
- Fabian Dunker, Jean-Pierre Florens, Thorsten Hohage, Jan Johannes, Enno Mammen, "Iterative Estimation of Solutions to Noisy Nonlinear Operator Equations in Nonparametric Instrumental Regression", Journal of Econometrics **178** (2014): 444--455, arxiv:1307.6701
- Alex Dytso, Martina Cardone, "A General Derivative Identity for the Conditional Expectation with Focus on the Exponential Family", arxiv:2105.05106 [Actually, I'm not sure this applies to IV, but from the abstract it might]
- Markus Frölich, "A Note on Parametric and Nonparametric Regression in the Presence of Endogenous Control Variables", University of St. Gallen Economics Discussion Paper No. 2006-11
- David Gold, Johannes Lederer, Jing Tao, "Inference for high-dimensional instrumental variables regression", arxiv:1708.05499
- Florian Gunsilius
  - "A path-sampling method to partially identify causal effects in instrumental variable models", arxiv:1910.09502
  - "Non-testability of instrument validity under continuous endogenous variables", arxiv:1806.09517
- Joel L. Horowitz, "Applied Nonparametric Instrumental Variables Estimation", Econometrica **79** (2011): 347--394
- Shoya Ishimaru, "Empirical Decomposition of the IV-OLS Gap with Heterogeneous and Nonlinear Effects", arxiv:2101.04346
- Edward H. Kennedy, Jacqueline A. Mauro, Michael J. Daniels, Natalie Burns, Dylan S. Small, "Handling Missing Data in Instrumental Variable Methods for Causal Inference", Annual Review of Statistics and Its Application **6** (2019): 125--148
- Magne Mogstad and Alexander Torgovitsky, "Identification and Extrapolation of Causal Effects with Instrumental Variables", Annual Review of Economics **10** (2018): 577--613
- Aviv Nevo, Adam M. Rosen, "Identification with Imperfect Instruments", The Review of Economics and Statistics **94** (2012): 659--671
- Allison J. Sovey and Donald P. Green, "Instrumental Variables Estimation in Political Science: A Readers' Guide", American Journal of Political Science **55** (2011): 188--200 [PDF preprint]
- Wing Hung Wong, "A calculus for causal inference with instrumental variables", arxiv:2104.10633