This short book reprints three papers, all originally published in 1970 or
just before, by the three contributors, with an introduction and a conclusion
by Salmon. I turned to this because it's *almost* the first source for
Salmon's notion of a "statistical relevance basis". Briefly, and not quite
following the notation here, the notion is this. Suppose we are interested in
some outcome variable $Y$, and consider a set of (possibly) predictive
variables $X=(X_1, X_2, \ldots, X_d)$. Let us say that two points, $x = (x_1,
x_2, \ldots, x_d)$ and $x^{\prime}=(x^{\prime}_1, x^{\prime}_2, \ldots,
x^{\prime}_d)$, are equivalent when $P(Y|X=x) = P(Y|X=x^{\prime})$. [*] This
defines an equivalence relation, since it's plainly reflexive, symmetric, and
transitive. Every equivalence relation defines a partition, so this one does
as well. The cells of this partition are configurations of the predictive
variables which are "homogeneous" (as Salmon puts it) with respect to $Y$.
Those cells are the elements of the statistical relevance basis. A difference
between two configurations $x$ and $x^{\prime}$ is relevant to $Y$ if, but only
if, $x$ and $x^{\prime}$ are not equivalent. In particular: if, given the
value of *some* of the $X$ variables, we can adjust the values of others
without moving from one cell of the partition to another, then those latter
variables are irrelevant to $Y$ (either absolutely, or in certain
configurations of the others).
Thus drumming is irrelevant to the
presence of tigers in North America, taking birth control pills is
irrelevant to men failing to get pregnant, etc.
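To make the construction concrete, here is a toy sketch of my own (not from the book): with discrete predictors, the statistical relevance basis is just the grouping of predictor configurations by their conditional distribution for $Y$. The variable names and the particular probabilities below are made up for illustration; the second predictor is irrelevant by design.

```python
from collections import defaultdict

# Hypothetical P(Y=1 | X=x) for each configuration x = (x1, x2).
# x2 never changes the conditional distribution, so it is irrelevant.
cond_prob = {
    (0, 0): 0.2, (0, 1): 0.2,   # same distribution -> same cell
    (1, 0): 0.7, (1, 1): 0.7,   # ditto
}

def relevance_basis(cond_prob):
    """Partition configurations into cells with equal P(Y|X=x)."""
    cells = defaultdict(list)
    for x, p in cond_prob.items():
        cells[p].append(x)
    return list(cells.values())

print(relevance_basis(cond_prob))
# two cells: {(0,0), (0,1)} and {(1,0), (1,1)}
```

Since the two cells are indexed by $x_1$ alone, adjusting $x_2$ never moves us between cells, which is exactly Salmon's criterion for (absolute) irrelevance of $X_2$.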

Salmon's notion here is that a statistical explanation of the event $Y=y$
consists in laying out the statistical relevance basis, and stating the
conditional distribution for each cell. It is *not* necessary, in his
view, that the explanation give the event high probability, or even that it
increase the probability.

Reinforcing this is the paper by Jeffrey, which argues forcefully that often a statistical explanation of an event just consists in laying out the stochastic process which generates it, and not adding "and furthermore that process gives the event $Y=y$ high probability". Thus, for example, Jeffrey argues that "Why did this sequence of coin tosses come out HTHTHTHTTH?" is perfectly adequately explained by saying "The coin tosses followed a Bernoulli(0.5) process". (Those aren't direct quotes, but they are pretty literal paraphrases.)

The paper by Greeno complements Salmon's view of what constitutes an
explanation, essentially by arguing that the strength of the explanation is
given, information-theoretically, by $I[X;Y] = H[Y] - H[Y|X]$, the reduction in
entropy of $Y$ from conditioning on $X$. (As the authors do *not* note,
that is going to be equal to $I[S;Y]$, where $S=s(X)$ is the random variable
saying which cell of the statistical relevance basis $X$ is located in,
because $S$ is a sufficient statistic.)
Greeno supplements his plausibility arguments by proving a simple
form of Fano's inequality,
relating the probability of mis-classifying a binary $Y$ to $H[Y|X]$. (Greeno
does not appear to have heard of Fano's inequality.) --- Incidentally, I think
Greeno would have to accept that in Jeffrey's example, of explaining a sequence
of coin tosses by pointing to the generating process, the strength of the
explanation is actually 0, but also that that's the strongest explanation
possible.
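The claim that $I[X;Y] = I[S;Y]$ can be checked numerically on a toy joint distribution of my own devising (not from the book): $X_2$ is irrelevant given $X_1$, so the cell label $S$ is just $X_1$, and coarsening $X$ down to $S$ loses no information about $Y$.

```python
import math

# Hypothetical joint P(x1, x2, y) over binary variables; within each
# value of x1 the conditional distribution of Y is the same, so the
# relevance-basis cells are indexed by x1 alone.
joint = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.05,
    (0, 1, 0): 0.20, (0, 1, 1): 0.05,
    (1, 0, 0): 0.10, (1, 0, 1): 0.15,
    (1, 1, 0): 0.10, (1, 1, 1): 0.15,
}

def mutual_info(joint, key):
    """I[key(X); Y] in bits, where key maps (x1, x2) to a coarser label."""
    pxy, px, py = {}, {}, {}
    for (x1, x2, y), p in joint.items():
        s = key((x1, x2))
        pxy[s, y] = pxy.get((s, y), 0.0) + p
        px[s] = px.get(s, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[s] * py[y]))
               for (s, y), p in pxy.items() if p > 0)

i_full = mutual_info(joint, key=lambda x: x)      # S = X itself
i_cell = mutual_info(joint, key=lambda x: x[0])   # S = cell label
assert abs(i_full - i_cell) < 1e-9                # no information lost
```

This is just sufficiency in miniature: $H[Y|X] = H[Y|S]$ because the conditional entropy of $Y$ is constant within each cell.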

I was led to this book a long time ago. Back in May 1998, when I was launched on my [thesis research] about [complexity, information theory, sufficient statistics and partitions of predictors], I happened upon a copy of Salmon's Scientific Explanation and the Causal Structure of the World [Princeton University Press, 1984] in a Madison used bookstore. Browsing through it, I had the unpleasant realization that Salmon's construction of the statistical relevance basis was, in different words, the same as the construction of "causal states" (per Crutchfield and Young, 1989) that I was investigating. Needless to say, I read the book, learned a lot from it, and even eventually got a paper out of the connection. But it was a nasty shock; it made me paranoid about trying to read everything, and it alerted me to a whole past and continuing history of rediscovering this circle of ideas.

The encounter also made me curious about Salmon's prior work on the
subject, which he referred to in the 1984 book, but was, then, hard for me to
track down. By the time I found a copy of this book, decades later and after
moving to Pittsburgh, I was no longer working on those subjects, so it wasn't
until this month that I actually read it. Reading it made it clear that while this
book was published in 1971, the central paper by Salmon had appeared in an
edited volume in 1970, and earlier versions had circulated in manuscript for
some time in the 1960s, since Greeno cites it in that form. This isn't quite
definite enough for me to say *exactly* when Salmon introduced the statistical
relevance basis, but "no later than 1970" for sure.

Whatever interest this book might have is now, I think, entirely
historical. Salmon's *ideas* remain valuable, but there's nothing
important here which isn't also in his 1984 book, better expressed and more
fully worked-out. So while I'm glad I read this, I'm not sure I
can *recommend* it, unless you happen to be doing research into the
history of these topics. Scientific Explanation and the Causal Structure
of the World, however, I can and do recommend.

--- One point I cannot resist making before closing. Salmon does not
distinguish, in his formalism, between $P(Y|X=x)$ and what those of us who've
read Pearl would write $P(Y|do(X=x))$. (Of course,
Spirtes, Glymour and
Scheines [ch. 3] offered an alternative and equivalent notation --- and
Glymour was Salmon's student.) He is, however, perfectly well aware of the
difference between these, and appeals to the possibility of experimental
manipulations to achieve what (following Reichenbach) he calls "screening off",
i.e., remote causes being irrelevant given proximate causes, and effects being
irrelevant given causes. But his *formalism* doesn't allow for this,
either here or in the 1984 book. The natural thing to want, then, is to say
that two configurations of variables $x$ and $x^{\prime}$ are equivalent, with
respect to $Y$, when $P(Y|do(X=x)) = P(Y|do(X=x^{\prime}))$. The people who
have actually worked this out are Chalupka, Perona
and Eberhardt
(2015, 2016).
Gratifyingly, the causal version of the theory goes through almost exactly the
same way as the one using ordinary conditioning, and they even work out some
nice results on the relationship between the two partitions (e.g., the causal
partition is usually a coarsening of the merely-probabilistic one).
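The gap between the two partitions is easy to exhibit in a toy structural model of my own construction (not from Chalupka et al.): a confounder $Z$ drives both $X$ and $Y$, while $X$ has no causal effect on $Y$. Then $P(Y|do(X=x))$ is the same for every $x$ (one causal cell), even though $P(Y|X=x)$ varies (two observational cells), so here the causal partition strictly coarsens the observational one.

```python
# Hypothetical structural model: Z ~ Bernoulli(0.5); X = Z;
# Y ~ Bernoulli(0.8) if Z = 1, else Bernoulli(0.2). X does nothing to Y.
pz = {0: 0.5, 1: 0.5}
py1_given_z = {0: 0.2, 1: 0.8}   # P(Y=1 | Z=z)

# Observational: since X = Z deterministically, P(Y=1|X=x) = P(Y=1|Z=x).
obs = {x: py1_given_z[x] for x in (0, 1)}

# Interventional: do(X=x) severs the Z -> X edge; Y still depends only
# on Z, so P(Y=1|do(X=x)) = sum_z P(z) P(Y=1|z), the same for every x.
do = {x: sum(pz[z] * py1_given_z[z] for z in (0, 1)) for x in (0, 1)}

assert obs[0] != obs[1]   # two observational cells
assert do[0] == do[1]     # one causal cell: a coarsening
```

Which is just "screening off" again: once we cut the back-door path by intervening, the merely-statistical relevance of $X$ to $Y$ evaporates.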

*: There are issues here about distinct "versions" of conditional probabilities, which disagree only on subsets of $x$ of measure 0. Salmon consigns them to a dismissive footnote, and I follow his wise example.

Philosophy of Science / Probability and Statistics

Drafted 25 February 2022, posted 6 March 2022