Attention conservation notice: 1700-word Q-and-A on technical points of statistical theory, prompted by a tenuous connection to recent academic controversies.

**Q**: What is a statistical parameter?

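**A**: The fundamental objects in statistical modeling are
probability distributions, or random processes. A parameter is a (measurable) functional of such a distribution: a rule that assigns a number, or some other mathematical object, to each distribution.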

Think of these distributions as being like geometrical figures, and the parameters as various aspects of the figures: their volume, or area in some cross-section, or a certain linear dimension.

**Q**: So I'm guessing that whether a parameter is
"identifiable" has something to do with whether it actually makes a difference
to the distribution?

**A**: Yes, specifically whether it makes a difference
to the *observable* part of the distribution.

**Q**: How can a probability distribution have
observable and unobservable parts?

**A**: We specify models involving the variables we think are
physically (biologically, psychologically, socially...) important. We don't
get to measure all of these. Fixing what we can observe, each underlying
distribution induces a distribution on the variables we do measure, the
observables. In the analogy, we might only get to see the shadows cast by the
geometric figures, or see what volume they displace when submerged in water.

**Q**: And how does this relate to identifiability?

**A**: Every (measurable) functional of the observable
distribution is a parameter we could, in principle, learn from data; we can *identify* it.
Every parameter of the underlying distribution which is not also a parameter of
the observable distribution is unidentifiable.

In the analogy, if we know all the figures are boxes (i.e., rectangular prisms), but we only get to see their displacement, then volume is identifiable, but breadth, height and width are not. It is not a matter of not having enough data (not measuring the displacement precisely enough); even knowing a box's volume exactly would not, by itself, tell us the height of the box.
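To make the box analogy concrete, here is a minimal sketch in Python. The two boxes and the function name `displaced_volume` are made up for illustration; the only point is that different dimensions can produce exactly the same observable.

```python
from math import isclose

def displaced_volume(breadth, height, width):
    """The only observable: the volume of water the box displaces."""
    return breadth * height * width

# Two hypothetical boxes with different dimensions but the same volume.
box_a = (2.0, 3.0, 4.0)   # breadth, height, width
box_b = (1.0, 6.0, 4.0)

same_observable = isclose(displaced_volume(*box_a), displaced_volume(*box_b))
same_height = isclose(box_a[1], box_b[1])

print(same_observable)  # the displacement cannot tell the boxes apart
print(same_height)      # yet their heights differ
```

However precisely we measure the displacement, both boxes remain compatible with it, which is what it means for height to be unidentifiable here.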

**Q**: Are all identifiable parameters equally easy to
estimate?

**A**: Not at all. For real-valued parameters, the natural
quantification of identifiability is
the Fisher
information, i.e., minus the expected value of the second derivative of the
log-likelihood with respect to the parameter. (The expected first derivative
is, in general, zero at the true parameter value.) But this seems like, precisely, a second-order issue after
identifiability as such. Of course, if a parameter is unidentifiable, the
derivative of the log-likelihood with respect to it is zero. But at this point
we are leaving the clear path of identifiability for the thickets
of estimation theory,
and had better get back on track.
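For readers who want to see the Fisher information as a number, here is a rough sketch under assumptions of my own choosing: a single Bernoulli observation with success probability `p`, and a finite-difference approximation to the second derivative. With the usual sign convention, the Fisher information is minus the expected second derivative of the log-likelihood. None of the function names come from any library.

```python
from math import log, isclose

def log_lik(x, p):
    """Log-likelihood of one Bernoulli observation x (0 or 1) with success probability p."""
    return x * log(p) + (1 - x) * log(1 - p)

def second_derivative(f, p, h=1e-4):
    """Central finite-difference approximation to f''(p)."""
    return (f(p + h) - 2.0 * f(p) + f(p - h)) / h**2

def fisher_information(p):
    """Minus the expected second derivative of the log-likelihood, averaging over x."""
    expected = sum(
        prob * second_derivative(lambda q: log_lik(x, q), p)
        for x, prob in ((1, p), (0, 1 - p))
    )
    return -expected

info_fair = fisher_information(0.5)
info_skewed = fisher_information(0.9)
print(info_fair, info_skewed)
```

For this model the exact answer is 1/(p(1-p)), so the two values should come out near 4 and 11.1: both parameters are identifiable, but each observation is more informative about `p` in the second case than in the first.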

**Q**: So is identifiability solely a function of what's
observable?

**A**: No, it depends on the combination of what we can measure
and what models we're willing to entertain. If we observe more, then we can
identify more. Thus if we can measure the volume of a box and its area in
horizontal cross-section, then we can identify its height (but not its breadth
or width). But likewise, if we can rule out some possibilities *a
priori*, then we can identify more. If we can only measure volume, but
know the box is a cube, then we can find height (and all its other dimensions).
Of course we could also identify height from volume and the assumption that the
proportions are 1:4:9, like the monolith in *2001*.
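The three routes to height can be sketched in a few lines of Python. The function names are invented for this illustration, and the last one assumes that "height" means the longest side of the 1:4:9 box, which the text above does not specify.

```python
from math import isclose

def height_from_volume_and_area(volume, cross_section_area):
    """More observables: volume and horizontal cross-section give height directly."""
    return volume / cross_section_area

def height_if_cube(volume):
    """Model restriction: all three sides equal, so height is the cube root of volume."""
    return volume ** (1.0 / 3.0)

def height_if_monolith(volume):
    """Model restriction: sides in ratio 1:4:9, taking height as the longest side
    (an assumption made here for illustration)."""
    shortest = (volume / 36.0) ** (1.0 / 3.0)   # volume = 1 * 4 * 9 * shortest**3
    return 9.0 * shortest

volume = 36.0
print(height_from_volume_and_area(volume, 12.0))
print(height_if_cube(volume))
print(height_if_monolith(volume))
```

The same measured volume gives different heights under the cube and monolith restrictions, and nothing in the volume itself says which restriction is right.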

**Q**: I get why expanding the observables lets you identify
more parameters, but restricting the set of models to get identification seems
to have "all the benefits of theft over honest toil". Do people really report
such results with a straight face?

**A**: Identifying parameters by restricting the models we
entertain is just as secure as those restrictions. If we have good actual
reasons for the restrictions, then it would be silly not to take advantage of
that. On the other hand, restricting models *simply* to get
identifiability seems quite contrary to the goals of science, since it is as
important to admit what we do not *yet* know as to mark out what we do.
At the very least, these are the sorts of hypotheses which need to be checked
— and which must be checked with other or different data, since, by
non-identifiability, the data in question are silent about them. (If you are
going to assume all boxes are cubes, you should check that; but looking at
their volumes won't tell you whether or not they are cubes. That data is
indifferent between your sensible cubical hypothesis and the idle fancies of
the monolith-maniac.)

**Q**: Couldn't we get around non-identifiability by Bayesian methods?

**A**: Expressing "soft" restrictions by a prior distribution
about the unidentified parameters doesn't actually make those parameters
identified. Suppose, for instance, that you have a prior distribution over the
dimensions of boxes, *p*(*B*,*H*,*W*). The three
parameters *B*,*H*,*W* completely characterize boxes, and in
this they are equivalent to the three parameters made up of the volume *V* = *BHW* and
the two proportions or ratios *h* = *H*/*B* and *w*
= *W*/*B*. Thus the prior *p*(*B*,*H*,*W*) is
equivalent to an unconditional prior on volume multiplied by a conditional
prior on the
proportions, *p*(*V*) *p*(*h*, *w*|*V*). Since
the likelihood is a function of *V* alone, Bayesian updating will change
the posterior distribution over volumes, but leave the (volume-conditional)
distribution over proportions alone. This reasoning applies more generally:
the prior can be divided into one part which refers to the identifiable
parameters, and another which refers to the purely unidentified parameters, and
learning only updates the former. (If a Bayesian agent's prior prejudices
happen to link the identified parameters to the unidentified ones, its
convictions about the latter will change, but strictly through those prior
prejudices.) The prior over the identifiable parameters can and should
be tested; that over the unidentified ones cannot. (Not
with that data, anyway.)
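Here is a small numerical sketch of that argument, with everything in it made up for illustration: a three-by-three grid of volumes and of one proportion *h*, an arbitrary joint prior that links the two, and Gaussian measurement noise on the displacement. The likelihood depends on the volume alone.

```python
from math import exp, isclose

volumes = [1.0, 2.0, 3.0]
proportions = [0.5, 1.0, 2.0]      # hypothetical values of h = H/B

# Hypothetical joint prior p(V, h); rows are volumes, columns are proportions.
# It deliberately links V and h, so it is not a product of marginals.
prior = [
    [0.20, 0.10, 0.05],
    [0.05, 0.20, 0.05],
    [0.05, 0.10, 0.20],
]

def likelihood(observed, volume, noise_sd=0.5):
    """Gaussian measurement noise on the displacement; depends on V only."""
    return exp(-0.5 * ((observed - volume) / noise_sd) ** 2)

def posterior(joint, observed):
    """Bayes's rule on the grid: multiply by the likelihood, then renormalise."""
    unnorm = [[p * likelihood(observed, v) for p in row]
              for row, v in zip(joint, volumes)]
    total = sum(sum(row) for row in unnorm)
    return [[u / total for u in row] for row in unnorm]

def conditional_on_volume(joint):
    """p(h | V) for each V: normalise each row."""
    return [[p / sum(row) for p in row] for row in joint]

def volume_marginal(joint):
    """p(V): sum each row over the proportions."""
    return [sum(row) for row in joint]

def proportion_marginal(joint):
    """p(h): sum each column over the volumes."""
    return [sum(row[j] for row in joint) for j in range(len(proportions))]

post = posterior(prior, observed=2.9)
print(volume_marginal(prior), volume_marginal(post))
print(conditional_on_volume(prior) == conditional_on_volume(post) or "equal up to rounding")
print(proportion_marginal(prior), proportion_marginal(post))
```

The distribution over volumes moves, and the volume-conditional distribution over the proportion stays where the prior put it. The marginal distribution over the proportion does shift, but only because this particular prior ties proportion to volume.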

**Q**: If a parameter is unidentified, why bother with it at
all? Why not just use Occam's Razor to shave it away?

**A**: That seems like an excess
of positivism. (And I say this as
someone who is sympathetic to positivism.) After all, which parameters are
identifiable depends on what we can observe. It seems excessive to regard
boxes as one-dimensional when we can only measure displaced volume, but then
three-dimensional when we figure out how to use a ruler.

**Q**: Still, shouldn't there be a presumption against the
existence or importance of unidentifiable parameters?

**A**: Not at all. It is very common in politics to
simultaneously assert that the electorate leans towards certain parties in
certain years; that people *born* in certain years have certain
inclinations; and that people's political inclinations go through a certain
sequence as they age. If we admit all three kinds of processes, we have to try
to separate the effects on political opinions of people's age, the year they
were born (their cohort), and the year in which their opinions are measured
(the period). But age is just period minus cohort, so the three cannot vary
independently. Any *two* of the
effects of age, period and cohort are identifiable if we rule out the
third *a priori*; if we allow that all three might matter, we are not
able to identify their effects.
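The age-period-cohort problem is easy to exhibit numerically. The sketch below assumes, purely for illustration, a linear model in which opinion is a weighted sum of age, period and cohort; the respondents and effect sizes are invented. Because age equals period minus cohort, shifting the three effects in a coordinated way changes nothing observable.

```python
def predicted_opinion(age_effect, period_effect, cohort_effect, period, cohort):
    """Hypothetical linear age-period-cohort model; age is fixed by period and cohort."""
    age = period - cohort
    return age_effect * age + period_effect * period + cohort_effect * cohort

# (survey year, birth year) for a few made-up respondents
respondents = [(2012, 1967), (2012, 1990), (2000, 1950), (1996, 1967)]

effects_a = (3, 1, 2)                          # age, period, cohort effects
shift = 5
effects_b = (3 + shift, 1 - shift, 2 + shift)  # a very different story

predictions_a = [predicted_opinion(*effects_a, period=p, cohort=c) for p, c in respondents]
predictions_b = [predicted_opinion(*effects_b, period=p, cohort=c) for p, c in respondents]

print(predictions_a == predictions_b)  # the two stories predict identical opinions
```

Setting any one of the three effects to zero in advance pins down `shift`, and with it the other two effects, which is the sense in which ruling out one effect identifies the rest.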

**Q**: I fail to see how this isn't actually an example in
favor of my position — people think these are three different effects, but
they're just wrong.

**A**: We can break this sort of impasse by specifying more
detailed mechanisms (and hoping we get more data). For instance, suppose that
people tend to become more politically conservative as they age, but that this
is because they accumulate more property as they grow older. Then, with data
on property holdings, we could separate the effects of cohort (were you born in
1967?) and age (are you 45?) from period (are you voting in 2012?), because
aging influences political opinions not through a mysterious black box but
through an observable mechanism. Or again, there are
presumably *mechanisms* which lead to period effects, as
in Hibbs's "Bread
and Peace" election model. (Even if that model is wrong, it illustrates
the *kind* of way a more elaborate theory can bring evidence to bear on
otherwise-unidentifiable questions.) Of course these more elaborated,
mechanistic theories need to be checked themselves, but that's science.

**Q**: So, what does all this have to do with the
social-contagion debate?

**A**: What Andrew Thomas and I showed
is that the distinction between the effects of homophily and those of social
influence or contagion is unidentifiable in observational (as opposed to
experimental) data. This, to my way of thinking, is a
much more consequential problem for claims that such-and-such a trait is
socially contagious than doubts about whether this-or-the-other significance
test was really appropriate; it says that *the observational data was all
irrelevant* to begin with. Instead, trying to attribute shares of the
similarity between social-network neighbors to influence vs. pre-existing
similarity is just like trying to say how much of the volume of a box is due to
its height as opposed to its width — it's not really a question
data *could* answer. It *could* be that we could use other
evidence to show that most boxes are cubes, but that's a separate question. No
amount of empirical evidence about the degree of similarity between network
neighbors can tell us anything about whether the similarity comes from
homophily or influence, just as no amount of measuring the volume of boxes can
tell us about their proportions.

**Q**: Mightn't there be assumptions about how social influence works, or how social networks form, which let us estimate the relative strengths of social contagion and homophily?

**A**: There might be indeed; we hope to find them; and to find
external checks on such assumptions. Discovering such cross-checks would be
like finding ways of measuring the volume of a geometrical body *and*
its horizontal cross-section. Andrew and I talk about some possibilities
towards the end of our paper, and we're working on them. So, I'm sure, are
others.

**Q**: I find your ideas intriguing; how may I subscribe
to your newsletter?

**A**: For more,
see Partial Identification
of Parametric Statistical Models;
my review of Manski on
identification for prediction and decision; and Manski's book itself.

**Update**, 23 December: It might seem odd that I talk about
elaborating mechanisms to identify age, period and cohort effects, without
mentioning Winship and Harding's "A Mechanism-Based Approach to the
Identification of Age-Period-Cohort Models" (Sociological Methods and Research **36** (2008): 362--401, free PDF reprint). It *is* odd; but I didn't know
about that paper until today.

Posted at July 11, 2011 13:27