July 11, 2011

Of the Identification of Parameters

Attention conservation notice: 1700-word Q-and-A on technical points of statistical theory, prompted by a tenuous connection to recent academic controversies.

Q: What is a statistical parameter?

A: The fundamental objects in statistical modeling are probability distributions, or random processes. A parameter is a (measurable) function of a probability distribution; if you want to be old-fashioned, a "functional" of the distribution. For instance, the magnitudes of various causal influences ("effects") are parameters of causal models.

Think of these distributions as being like geometrical figures, and the parameters as various aspects of the figures: their volume, or area in some cross-section, or a certain linear dimension.

Q: So I'm guessing that whether a parameter is "identifiable" has something to do with whether it actually makes a difference to the distribution?

A: Yes, specifically whether it makes a difference to the observable part of the distribution.

Q: How can a probability distribution have observable and unobservable parts?

A: We specify models involving the variables we think are physically (biologically, psychologically, socially...) important. We don't get to measure all of these. Fixing what we can observe, each underlying distribution induces a distribution on the variables we do measure, the observables. In the analogy, we might only get to see the shadows cast by the geometric figures, or see what volume they displace when submerged in water.

Q: And how does this relate to identifiability?

A: Every (measurable) functional of the observable distribution is identifiable, because, in principle, what we can observe gives us enough information to work it out, or identify it. Every parameter of the underlying distribution which is not also a parameter of the observable distribution is unidentifiable, or unidentified.

In the analogy, if we know all the figures are boxes (i.e., rectangular prisms), but we only get to see their displacement, then volume is identifiable, but breadth, height and width are not. It is not a matter of not having enough data (not measuring the displacement precisely enough); even knowing a box's volume exactly would not, by itself, tell us the height of the box.
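To make the box analogy concrete, here is a minimal sketch. The function name and the particular dimensions are made up for illustration; the only point is that quite different boxes cast the same "observable".

```python
def displaced_volume(breadth, height, width):
    """The only thing we get to observe about a box: the volume it displaces."""
    return breadth * height * width

# Three hypothetical boxes, given as (breadth, height, width).
boxes = [(2, 3, 4), (1, 2, 12), (4, 6, 1)]

# Every box yields the same observation...
observations = [displaced_volume(*box) for box in boxes]

# ...even though the heights all differ.
heights = [height for (_, height, _) in boxes]

print(observations)  # the same volume three times over
print(heights)       # three different heights
```

Measuring the displacement more precisely would not help: the three boxes agree on the observable exactly, not just approximately.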

Q: Are all identifiable parameters equally easy to estimate?

A: Not at all. For real-valued parameters, the natural quantification of identifiability is the Fisher information, i.e., the negative of the expected second derivative of the log-likelihood with respect to the parameter. (The expected first derivative is, in general, zero.) But this seems like, precisely, a second-order issue after identifiability as such. Of course, if a parameter is unidentifiable, the derivative of the log-likelihood with respect to it is zero. But at this point we are leaving the clear path of identifiability for the thickets of estimation theory, and had better get back on track.
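For reference, the standard definitions can be written out as follows. The notation is mine, not anything fixed above: \ell(\theta) is the log-likelihood of a scalar parameter \theta, expectations are taken under the distribution indexed by \theta, and the usual regularity conditions (which let us swap differentiation and integration) are assumed.

```latex
\[
\mathbb{E}_{\theta}\!\left[\frac{\partial \ell(\theta)}{\partial \theta}\right] = 0,
\qquad
I(\theta)
  = -\,\mathbb{E}_{\theta}\!\left[\frac{\partial^{2} \ell(\theta)}{\partial \theta^{2}}\right]
  = \mathbb{E}_{\theta}\!\left[\left(\frac{\partial \ell(\theta)}{\partial \theta}\right)^{2}\right].
\]
```

If the likelihood does not depend on \theta at all, both derivatives vanish identically and I(\theta) = 0, which is the unidentifiable case in this notation.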

Q: So is identifiability solely a function of what's observable?

A: No, it depends on the combination of what we can measure and what models we're willing to entertain. If we observe more, then we can identify more. Thus if we can measure the volume of a box and its area in horizontal cross-section, then we can identify its height (but not its breadth or width). But likewise, if we can rule out some possibilities a priori, then we can identify more. If we can only measure volume, but know the box is a cube, then we can find height (and all its other dimensions). Of course we could also identify height from volume and the assumption that the proportions are 1:4:9, like the monolith in 2001.
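Both routes to identification can be sketched in a few lines. The function names and the numbers are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def height_from_cross_section(volume, horizontal_area):
    """More observables: volume plus horizontal cross-section pins down height."""
    return volume / horizontal_area

def dimensions_from_proportions(volume, proportions):
    """More assumptions: with fixed proportions, volume pins down every side.

    If the sides are k*p1, k*p2, k*p3, then volume = k**3 * p1 * p2 * p3,
    so k is the cube root of volume / (p1 * p2 * p3).
    """
    scale = (volume / math.prod(proportions)) ** (1 / 3)
    return tuple(scale * p for p in proportions)

height = height_from_cross_section(24, 8)                # height only
cube = dimensions_from_proportions(27, (1, 1, 1))        # sides of about 3
monolith = dimensions_from_proportions(288, (1, 4, 9))   # sides of about 2, 8, 18
```

Note that a volume of 288 is just as compatible with `(1, 1, 1)` as with `(1, 4, 9)`; the function happily returns an answer under either assumption, and nothing in the volume says which one to believe.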

Q: I get why expanding the observables lets you identify more parameters, but restricting the set of models to get identification seems to have "all the benefits of theft over honest toil". Do people really report such results with a straight face?

A: Identifying parameters by restricting the models we entertain is just as secure as those restrictions. If we have good actual reasons for the restrictions, then it would be silly not to take advantage of that. On the other hand, restricting models simply to get identifiability seems quite contrary to the goals of science, since it is as important to admit what we do not yet know as to mark out what we do. At the very least, these are the sorts of hypotheses which need to be checked, and which must be checked with other or different data, since, by non-identifiability, the data in question are silent about them. (If you are going to assume all boxes are cubes, you should check that; but looking at their volumes won't tell you whether or not they are cubes. Those data are indifferent between your sensible cubical hypothesis and the idle fancies of the monolith-maniac.)

Q: Couldn't we get around non-identifiability by Bayesian methods?

A: Expressing "soft" restrictions by a prior distribution about the unidentified parameters doesn't actually make those parameters identified. Suppose, for instance, that you have a prior distribution over the dimensions of boxes, p(B,H,W). The three parameters B,H,W completely characterize boxes, and in this are equivalent to the three parameters of volume V = BHW and the two proportions or ratios h = H/B and w = W/B. Thus the prior p(B,H,W) is equivalent to an unconditional prior on volume multiplied by a conditional prior on the proportions, p(V) p(h, w|V). Since the likelihood is a function of V alone, Bayesian updating will change the distribution over volumes, but leave the (volume-conditional) distribution over proportions alone. This reasoning applies more generally: the prior can be divided into one part which refers to the identifiable parameters, and another which refers to the purely unidentified parameters, and learning only updates the former. (If a Bayesian agent's prior prejudices happen to link the identified parameters to the unidentified ones, its convictions about the latter will change, but strictly through those prior prejudices.) The prior over the identifiable parameters can and should be tested; that over the unidentified ones cannot. (Not with that data, anyway.)
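The factorization argument can be checked numerically on a toy example. Everything specific here is an assumption made for illustration: the three-point grid of side lengths, the unequal prior weights, and the made-up likelihood, whose only important feature is that it depends on a box solely through its volume. Exact fractions are used so the comparison is not blurred by rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical discrete prior over (breadth, height, width), deliberately not flat.
sides = [1, 2, 4]
weight = {1: Fraction(1, 2), 2: Fraction(1, 3), 4: Fraction(1, 6)}
prior = {(b, h, w): weight[b] * weight[h] * weight[w]
         for b, h, w in product(sides, repeat=3)}

def volume(box):
    b, h, w = box
    return b * h * w

def likelihood(box, observed=8):
    """Assumed noisy displacement measurement: a function of volume alone."""
    return Fraction(1, 1 + abs(volume(box) - observed))

def normalize(dist):
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}

def marginal_over_volume(dist):
    out = defaultdict(Fraction)
    for box, p in dist.items():
        out[volume(box)] += p
    return dict(out)

def shape_given_volume(dist):
    """Probability of each box within its own volume class, p(shape | V)."""
    by_volume = marginal_over_volume(dist)
    return {box: p / by_volume[volume(box)] for box, p in dist.items()}

# Bayes's rule: posterior is proportional to prior times likelihood.
posterior = normalize({box: p * likelihood(box) for box, p in prior.items()})

volume_beliefs_changed = marginal_over_volume(prior) != marginal_over_volume(posterior)
shape_beliefs_unchanged = shape_given_volume(prior) == shape_given_volume(posterior)
```

In this sketch the data move the marginal distribution over volumes, while the distribution over shapes within each volume class comes out exactly as the prior had it.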

Q: If a parameter is unidentified, why bother with it at all? Why not just use Occam's Razor to shave them away?

A: That seems like an excess of positivism. (And I say this as someone who is sympathetic to positivism.) After all, which parameters are identifiable depends on what we can observe. It seems excessive to regard boxes as one-dimensional when we can only measure displaced volume, but then three-dimensional when we figure out how to use a ruler.

Q: Still, shouldn't there be a presumption against the existence or importance of unidentifiable parameters?

A: Not at all. It is very common in politics to simultaneously assert that the electorate leans towards certain parties in certain years; that people born in certain years have certain inclinations; and that people's political inclinations go through a certain sequence as they age. If we admit all three kinds of processes, we have to try to separate the effects on political opinions of people's age, the year they were born (their cohort), and the year in which they are observed (the period). The problem is that (e.g.) everyone who will be 45 years old in 2012 was born in 1967, so there is no way to separate the effects of being 45 years old in 2012 (age+period) from being born in 1967 (cohort). Any two of the effects of age, period and cohort are identifiable if we rule out the third a priori; if we allow that all three might matter, we are not able to identify their effects.
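The age-period-cohort problem is easiest to see in a deliberately over-simple linear version, where each of the three has a constant per-year effect. That linearity, the coefficient values, and the names below are all assumptions of this sketch, not part of the argument above, which holds for effects of any form.

```python
def predicted_opinion(age, period, cohort, effects):
    """Toy linear model: opinion = a*age + p*period + c*cohort."""
    a, p, c = effects
    return a * age + p * period + c * cohort

# Hypothetical respondents: age is forced to equal period minus cohort.
respondents = [(period - cohort, period, cohort)
               for period in (2004, 2008, 2012)
               for cohort in (1947, 1967, 1987)]

# One set of (age, period, cohort) effects...
effects_1 = (3, 1, 0)

# ...and a very different one, shifted by an arbitrary amount t.
t = 5
effects_2 = (3 + t, 1 - t, 0 + t)

predictions_1 = [predicted_opinion(*r, effects_1) for r in respondents]
predictions_2 = [predicted_opinion(*r, effects_2) for r in respondents]

same_predictions = predictions_1 == predictions_2
```

Because age = period - cohort, the extra terms t*age - t*period + t*cohort cancel for every respondent, so the two sets of effects are observationally equivalent for any value of t. Fixing one of the three effects a priori (say, the cohort effect at zero) removes that freedom, which is the sense in which any two are identifiable once the third is ruled out.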

Q: I fail to see how this isn't actually an example in favor of my position — people think these are three different effects, but they're just wrong.

A: We can break this sort of impasse by specifying more detailed mechanisms (and hoping we get more data). For instance, suppose that people tend to become more politically conservative as they age, but that this is because they accumulate more property as they grow older. Then, with data on property holdings, we could separate the effects of cohort (were you born in 1967?) and age (are you 45?) from period (are you voting in 2012?), because aging influences political opinions not through a mysterious black box but through an observable mechanism. Or again, there are presumably mechanisms which lead to period effects, as in Hibbs's "Bread and Peace" election model. (Even if that model is wrong, it illustrates the kind of way a more elaborate theory can bring evidence to bear on otherwise-unidentifiable questions.) Of course these more elaborated, mechanistic theories need to be checked themselves, but that's science.

Q: So, what does all this have to do with the social-contagion debate?

A: What Andrew Thomas and I showed is that the distinction between the effects of homophily and those of social influence or contagion is unidentifiable in observational (as opposed to experimental) data. This, to my way of thinking, is a much more consequential problem for claims that such-and-such a trait is socially contagious than doubts about whether this-or-the-other significance test was really appropriate; it says that the observational data was all irrelevant to begin with. Instead, trying to attribute shares of the similarity between social-network neighbors to influence vs. pre-existing similarity is just like trying to say how much of the volume of a box is due to its height as opposed to its width — it's not really a question data could answer. It could be that we could use other evidence to show that most boxes are cubes, but that's a separate question. No amount of empirical evidence about the degree of similarity between network neighbors can tell us anything about whether the similarity comes from homophily or influence, just as no amount of measuring the volume of boxes can tell us about their proportions.

Q: Mightn't there be assumptions about how social influence works, or how social networks form, which let us estimate the relative strengths of social contagion and homophily?

A: There might be indeed; we hope to find them; and to find external checks on such assumptions. Discovering such cross-checks would be like finding ways of measuring the volume of a geometrical body and its horizontal cross-section. Andrew and I talk about some possibilities towards the end of our paper, and we're working on them. So, I'm sure, are others.

Q: I find your ideas intriguing; how may I subscribe to your newsletter?

A: For more, see Partial Identification of Parametric Statistical Models; my review of Manski on identification for prediction and decision; and Manski's book itself.

Update, 23 December: It might seem odd that I talk about elaborating mechanisms to identify age, period and cohort effects, without mentioning Winship and Harding's "A Mechanism-Based Approach to the Identification of Age-Period-Cohort Models" (Sociological Methods and Research 36 (2008): 362-401, free PDF reprint). It is odd; but I didn't know about that paper until today.

Enigmas of Chance; Networks; Dialogues

Posted at July 11, 2011 13:27

Three-Toed Sloth