March 26, 2021

Regression, Thermostats, Causal Inference: Some Finger Exercises

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \]

Attention conservation notice: An 800-word, literally academic exercise about an issue in causal inference. Its point is familiar to those in the field, and deservedly obscure to everyone else. Also, too cutesy and pleased with itself by at least half.
I wrote the first version of this for the class where we do causal inference long enough ago that I actually don't remember when --- 2011? 2013? (In retrospect I had probably read Milton Friedman's thermostat analogy but didn't consciously remember it at the time.) Posted now because I've gone over the point with two different people in the last month.

The temperature outside \( (X) \) is a direct cause of the temperature inside my house \( (Y) \). But every morning I measure the temperature, and adjust my heating/cooling system \( (C) \) to try to maintain a constant temperature \( y_0 \). For simplicity, we'll say that all the relations are linear, so \[ \begin{eqnarray} X & \sim & \mathrm{whatever}\\ C|X & \leftarrow & a+bX + \epsilon_1\\ Y|X,C & \leftarrow & X-C + \epsilon_2 \end{eqnarray} \] where \( \epsilon_1 \) and \( \epsilon_2 \) are exogenous, independent, mean-zero noise terms. We can think of \( \epsilon_1 \) as a combination of my sloppiness in measuring the temperature and in tuning the heating/cooling system; \( \epsilon_2 \) is sheer fluctuations.

Exercise: Draw the DAG.

To ensure that the expectation of \( Y \) remains at \( y_0 \), no matter the external temperature, we need \[ \begin{eqnarray} y_0 & = & \Expect{Y|X=x}\\ & = & \Expect{X - (a + bX + \epsilon_1) + \epsilon_2|X=x}\\ & = & (1-b)x -a \end{eqnarray} \] Since this must hold for all \( x \), we need \( b=1, a=-y_0 \).

What follows from this?

Exercise: Build your character by doing the algebra.
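For checking your work: with \( b = 1 \) and \( a = -y_0 \), and taking the exogenous noise terms to be independent of each other and of \( X \), substituting the control equation into the equation for \( Y \) gives \[ \begin{eqnarray} C & = & -y_0 + X + \epsilon_1\\ Y & = & X - C + \epsilon_2 = y_0 - \epsilon_1 + \epsilon_2\\ \Cov{X,Y} & = & 0\\ \Cov{X,C} & = & \Var{X}\\ \Cov{C,Y} & = & -\Var{\epsilon_1} \end{eqnarray} \] So \( Y \) is a function of the noise terms alone, and hence independent of \( X \), even though \( X \) is a direct cause of \( Y \); meanwhile \( C \) and \( Y \) are dependent whenever \( \Var{\epsilon_1} > 0 \).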

So, as long as control isn't perfect (that is, \( \epsilon_1 \) has positive variance, so \( C \) is not an exact linear function of \( X \)), the naive statistician (or experienced econometrician...) who just does a kitchen-sink regression of \( Y \) on \( X \) and \( C \) will actually get the relationship between \( Y \), \( X \) and \( C \) right, concluding that external temperature and the climate control have equal and opposite effects on internal temperature. Sure, there will be sampling noise, but with enough data they'll approach the truth.

Exercise: What do you get if you regress \( C \) on \( X \) and \( Y \)?
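To check an answer to this exercise numerically, here is a sketch along the same lines, with the same hypothetical choices (Normal(15, 10) for \( X \), Gaussian noise, \( y_0 = 20 \), an illustrative least-squares helper `ols2`). It fits \( C \) on \( X \) and \( Y \).

```python
import random
import statistics

def ols2(x1, x2, y):
    """Least-squares fit of y on an intercept, x1 and x2 (2x2 normal equations)."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

rng = random.Random(2)
y0, sd_e1, sd_e2 = 20.0, 1.0, 1.0  # hypothetical target and noise levels
xs, cs, ys = [], [], []
for _ in range(50_000):
    x = rng.gauss(15.0, 10.0)            # X: hypothetical Normal(15, 10)
    c = -y0 + x + rng.gauss(0.0, sd_e1)  # C = a + bX + eps1 with a = -y0, b = 1
    y = x - c + rng.gauss(0.0, sd_e2)    # Y = X - C + eps2
    xs.append(x)
    cs.append(c)
    ys.append(y)

_, coef_x, coef_y = ols2(xs, ys, cs)  # regress C on X and Y
print(f"C ~ X + Y: coefficient on X {coef_x:.3f}, on Y {coef_y:.3f}")
```

Try varying `sd_e1` and `sd_e2`, and compare the fitted coefficient on \( Y \) with the structural equation for \( C \), in which \( Y \) does not appear at all.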

I have implicitly assumed that I know the exact linear relationship between \( X \) and \( Y \), since I used it in deriving how the control signal should respond to \( X \). If I mis-calibrate the control signal, say \( C = -y_0 +0.999X + \epsilon_1 \), then the cancellation is no longer exact, \( Y \) keeps a small dependence on \( X \), and everything works as usual.
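Substituting the mis-calibrated control into the equation for \( Y \) shows where the leftover dependence comes from (again taking the noise terms to be independent of \( X \)): \[ \begin{eqnarray} Y & = & X - (-y_0 + 0.999X + \epsilon_1) + \epsilon_2\\ & = & y_0 + 0.001X - \epsilon_1 + \epsilon_2\\ \Cov{X,Y} & = & 0.001\Var{X} \end{eqnarray} \] The dependence is genuine but small relative to the noise, so detecting it may take a lot of data.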

Exercise: Suppose that instead of measuring the external temperature \( X \) directly, I can only measure yesterday's temperature \( U \), again with noise. Supposing there is a linear relationship between \( U \) and \( X \), replicate this analysis. Does it matter if \( U \) is the parent of \( X \) or vice versa?

Exercise: "Feedback is a mechanism for persistently violating faithfulness"; discuss.

Exercise: "The greatest skill seems like clumsiness" (Laozi); discuss.

Enigmas of Chance; Constant Conjunction Necessary Connexion

Posted at March 26, 2021 09:08

Three-Toed Sloth