March 31, 2021

Books to Read While the Algae Grow in Your Fur, March 2021

(I didn't finish a lot of books this month, since I'm not counting re-reading bits and pieces of arcane tomes on golem-making as needed for my own shambling creation.)

K. C. Constantine, Sunshine Enemies
Mind candy from 1990: the nth in a series of mystery novels set in the fictional western Pennsylvania town of Rocksburg, PA, somewhere in the environs of Pittsburgh (what I've heard called the yinzerlands). It's a good mystery novel, but what really sets it apart is the dialogue. Constantine has an incredible ear for the way locals of that generation spoke, and turns it into riveting exchanges on the page. The depiction of the life-ways of these communities also feels authentic, but that's harder for me to judge. Strongly recommended if you like well-written detective novels, or are interested in fiction set around here.
Gabriel Rossman, Climbing the Charts: What Radio Airplay Tells Us about the Diffusion of Innovation
This is a short sociological treatise about, primarily, how songs become hits on commercial American radio, or fail to do so. It's well written (not just "well written for sociology"), and has a number of very interesting points to make about topics like the diffusion of innovation, corruption, the role of genres in popular culture, and more besides. The points which most interest me are the diffusion ones.
Rossman's starting point is to look at curves of cumulative adoption over time: how many radio stations have, by a given date, ever played such-and-such a song? His main methodological tool is to distinguish between two types of adoption curves. One is the classic elongated-S curve, looking roughly like $\frac{e^{t\lambda}}{1+e^{t\lambda}}$, which one would expect to be produced by contagion, whether mediated by a network or by some more mean-field-ish process (like a best-seller list). The other ideal type of curve is "concave", indicating a constant probability of adoption per unit time among those who have not yet adopted, so looking like $1-e^{-t\lambda}$. The latter he interprets as indicating some shared external forcing. Most songs which become hits follow the latter pattern (though he has illuminating things to say about the exceptional endogenous hits). The obvious question is the identity of the external force. Rossman makes a compelling case that this is, in fact, the record companies, and not (e.g.) radio station chains; on this basis he goes on to an examination of the history and theory of payola. (Basically: radio "moves product" for the record companies, so you don't want to be the only record company which is not bribing radio stations to play your music.) He also has a less compelling but still fairly persuasive analysis showing that radio stations don't really decide what to play by imitating other radio stations (at least for one "format" of radio station, during one time period). I could go on (Rossman packs a lot into only ~200 pages) but forbear.
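To make the two ideal types concrete, here is a minimal numerical sketch. It is not Rossman's code: the rate `lam`, the time grids, and the function names are illustrative assumptions. It evaluates both curves, uses second differences to check their shapes, and confirms that the concave curve has a constant hazard (adoption rate among those who have not yet adopted).

```python
import numpy as np

def contagion_curve(t, lam):
    """Logistic (elongated-S) cumulative adoption: e^{lam t} / (1 + e^{lam t})."""
    return 1.0 / (1.0 + np.exp(-lam * t))

def external_curve(t, lam):
    """Constant-hazard cumulative adoption: 1 - e^{-lam t}, for t >= 0."""
    return 1.0 - np.exp(-lam * t)

lam = 1.0                            # illustrative rate, not estimated from any data
t_s = np.linspace(-6.0, 6.0, 121)    # the logistic as written is centered at t = 0
t_c = np.linspace(0.0, 6.0, 61)

# Second differences show the shape: the S-curve is convex early and
# concave late, while the external-forcing curve is concave throughout.
d2_s = np.diff(contagion_curve(t_s, lam), n=2)
d2_c = np.diff(external_curve(t_c, lam), n=2)
s_is_convex_then_concave = bool(d2_s[0] > 0 and d2_s[-1] < 0)
c_is_concave_throughout = bool(np.all(d2_c < 0))

# Hazard = -d/dt log(1 - F(t)); for the concave curve it is constant at lam.
hazard_c = -np.diff(np.log(1.0 - external_curve(t_c, lam))) / np.diff(t_c)

print(s_is_convex_then_concave, c_is_concave_throughout, np.allclose(hazard_c, lam))
```

With real airplay data one would fit both forms to the observed cumulative counts and compare them; the sketch only shows what the two shapes imply.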
The central distinction here, between curves due to external forcing and curves due to endogenous contagion, is one that's persuasive in context, but isn't necessarily either airtight or generalizable. That promotional efforts by a record company would translate into a constant hazard for adoption seems plausible enough, but one could imagine a record company whose promotional efforts start small, ramp up rapidly when one song or another takes off, and taper off when it becomes clear that the pool of new adopters is almost exhausted, imitating a logistic, "endogenous" diffusion curve. (It doesn't seem like good business strategy, and I take Rossman's word for it that that's not, in fact, how record promotion works.) My efforts to come up with a "just so" story in which contagion produces a constant hazard are less convincing even to me, but I only gave five minutes to the effort. Returning to my perpetual hobbyhorse of the difficulty of establishing social contagion, I would say that this is an example of using subject-matter knowledge (i.e., actual science) to rule out alternatives, which couldn't be done on purely statistical grounds.
Recommended if you have any interest in the diffusion of innovations, or in social contagion. (Probably good if you're interested in the sociology of music, too.) Finally finished, 8 years (!) after I started it, because of reading a more recent paper by the author.
Chernobyl
Fukushima 50
Pandora's Promise
Quo Vadis, Aida?
Hotel Rwanda
Watchers of the Sky
Human Flow
The Rest
This is what happens when you live with a historian writing a chapter about 1980--2020... Chernobyl is very well done; some scenes which I thought were imitations of Soviet science fiction movies were in fact imitations of archival footage. Fukushima is a much lower level of art, but still decent. (There is a whole essay to be written about the role of America in that movie, which I am utterly incompetent to do.) Quo Vadis, Aida? is almost unbearably sad. Hotel Rwanda is somehow more purely horrifying than sad. Watchers of the Sky was comparatively optimistic, but having a sincere and committed campaigner against genocide as our UN ambassador did less to improve things than one might wish. Human Flow is the most beautiful movie of the lot. The Rest is fine on its own terms, but diminished by the comparison to the previous movie (not as visually striking, not as thematically wide-ranging, and with too little of Ai Weiwei in the role of the planet's eccentric cat-guy uncle).
Pandora's Promise calls for special comment. I am, by temperament and training, receptive to nuclear power having more of a role than many on the left want it to. But this movie, if anything, pushed me away from that position, purely by reaction. The people it chose to showcase as advocates were, for the most part, completely unqualified, both in their earlier opposition and in their later advocacy. Shellenberger in fact seems like someone whose only real principle is attracting attention by outraging liberal piety, a well-trodden path. (Perhaps he's a lovely person and the movie showed him in a bad light.)
Turning from personalities to substance, the arguments here are just tissue thin. If the problem with solar and wind power is intermittency, the obvious solutions are (1) storage, (2) non-intermittent renewable power sources (like hydro power), and (3) a limited role for natural gas or other fossil fuels. (Humanity's carbon budget is not zero.) To listen to the movie, you'd think all of this was impossible, rather than well-studied. (Yes, there are technical challenges, but that'd lead to a serious comparison of alternatives, which the movie avoids at all costs.) Claims that Chernobyl was responsible for millions of deaths are absurd, and anti-nuclear campaigners who repeat them discredit themselves. But it's also absurd to claim that Chernobyl killed basically nobody. (Why oh why might Soviet successor states want to minimize the consequences, it is a mystery, and why might the UN and WHO fail to challenge even obviously falsified official figures, who can say? A village priest squatting in the exclusion zone insists none of his flock gets sick, obviously he's telling the truth.) Concerns about the safe disposal of waste for hundreds to tens of thousands of years, and about nuclear proliferation (particularly with the breeder reactors favored by the movie-makers) are dismissed remarkably glibly. (ObRecOfAnInfinitelyBetterMovie: Containment.) That there's a correlation between a country's energy usage and its average lifespan is perfectly true, but that's because countries which use a lot of energy are also ones with sanitation, adequate food, etc., etc. (Obviously it takes energy to provide these goods.) In any case the argument isn't about whether to use lots of energy (*), but how to supply it. I can't tell whether the poverty-porn shots of children in third world slums arise from a clumsy-but-sincere concern for the kids' well-being, from a calculation that "why do you hate brown kids?" is an easy way to morally blackmail the intended audience, or from a feeling that this'd be an amusing way to own the libs.
The only thing which gives me any pause about saying the movie is unmitigated dreck is that Stewart Brand and Richard Rhodes, who I otherwise find to be thoughtful and serious authors from whom I've learned much, agreed to participate. But by the end this had the effect of lowering them a bit in my estimation, which is sad.
After watching, I found this review, which seems very fair, because the movie is, in fact, very bad.
*: Of course there are people who wish humanity would plunge back to pre-industrial levels of energy usage, motivated by some combination of nostalgia for the idiocy of rural life and mis-guided Malthusianism. They are few in number and, thankfully, completely without influence, which will continue to be the case. (Any country where they might, incredibly, manage to impose their views would quickly be stomped by rivals whose madmen in authority were not quite that crazy, assuming their own people didn't do it first.)

March 26, 2021

Sub-Re-Intermediation

Because he hates me and wants to make sure that I never get back to any (other) friend or collaborator, Simon made me read Jack Dorsey endorsing an idea of Stephen Wolfram's. Much as it pains me to say, Wolfram has the germ of an interesting idea here, which is to start separating out different aspects of the business of running a social network, as that's currently understood. I am going to ignore the stuff about computational contracts (nonsense on stilts, IMHO), and focus just on the idea that users could have a choice about the ranking / content recommendation algorithms which determine what they see in their feeds. (For short I'll call them "recommendation engines" or "recommenders".) There are still difficulties, though.

"Editors. You've re-invented editors."

Or, more exactly, a choice of editorial lines, as we might have with different, competing newspapers and magazines. Well, fine; doing it automatically and at the volume and rate of the Web is something which you can't achieve just by hiring people to edit.

Back in the dreamtime, before the present was widely distributed, Vannevar Bush imagined the emergence of people who'd make their livings by pointing out what, in the vast store of the Memex, would be worth others' time: "there is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record." Or, again, there's Paul Ginsparg's vision of new journals erecting themselves as front ends to arxiv. Appealing though such visions are, it's just not happened in any sustained, substantial way. (All respect to Maria Popova for Brain Pickings, but how many like her are there, who can do it as a job and keep doing it?) Maybe the obstacles here are ones of scale, and making content-recommendation a separate, algorithmic business could help fulfill the vision. Maybe.

Monsters Respond to Incentives

"Presumably", Wolfram says, "the content platform would give a commission to the final ranking provider". So the recommender is still in the selling-ads business, just as Facebook, Twitter, etc. are now. I don't see how this improves the incentives at all. Indeed, it'd presumably mean the recommender is a "publisher" in the digital-advertizing sense, and Facebook's and Twitter's core business situation is preserved. (Perhaps this is why Dorsey endorses it?) But the concerns about the bad and/or perverse effects of those incentives (e.g.) are not in the least alleviated by having many smaller entities channeled in the same direction.

On the other hand, I imagine it's possible that people would pay for recommendations, which would at least give the recommenders a direct financial incentive to please the users. This might still not be good for the users, but at least it would align them more with users' desires, and diversity of those desires could push towards a diversity of recommendations. Of course, there would be the usual difficulty of fee-based services competing against free-to-user-ad-supported services.

Imprimatur

To the extent there are concerns about certain content being banned by private companies, those are still there: the network operator, Facebook or Twitter or whatever, retains a veto over content. The recommenders are able to impose further vetoes, but not over-ride the operator.

Further: as Wolfram proposes it, the features used to represent content are already calculated by the operator. This can of course impose all sorts of biases and "editorial" decisions centrally, ones which the recommenders would have difficulty over-riding, if they could do so at all.

Increasing returns rule everything around me

Wolfram invokes "competition", but doesn't think about whether it will be effective. There are (at least) two grounds for thinking it wouldn't be, both based on increasing returns to scale.
1. Costs of providing the service: If I am going to provide a recommendation engine to a significant fraction of Facebook's audience, in a timely manner, I require a truly massive computational infrastructure, which will have huge fixed costs, though the marginal costs of each additional recommendation will be trivial. It's literally Econ 101 that this is a situation where competition doesn't work very well, and the market tends to either segment into monopolistic competition or into oligopoly (if not outright monopoly). As a counter-argument, I guess I could imagine someone saying "Cloud computing will take care of that", i.e., as long as we tolerate oligopoly among hardware operators, software companies will face constant scale costs for computing. (How could that possibly go wrong, technically or socially?)
2. Quality of the service: Machine learning methods work better with more data. This will mean more data about each user, and more data about more users. (In the very first paper on recommendation engines, back in 1995, Shardanand and Maes observed that the more users' data went into each prediction, the smaller the error.) Result: the same algorithm used by company A, with $n$ users, will be less effective than if used by company B, with data on $2n$ users. Even when the recommendation engine doesn't explicitly use the social network, this will create a network externality for recommendation providers (*). And thus again we get increasing returns and throttled competition.

Normally I'd say there'd also be switching costs to lock users in to the first recommender they seriously use, but I could imagine the network operators imposing data formats and input-output requirements to make it easy to switch from one recommender to another without losing history.

Not quite so long ago as "As We May Think", but still well before the present was widely distributed, Carl Shapiro and Hal Varian wrote a quietly brilliant book on the strategies firms in information businesses should follow to actually make money. The four keys were economies of scale, network externalities, lock-in of users, and control of standards. The point of all of these is to reduce competition. These principles work (it is no accident that Varian is now the chief economist of Google) and they will apply here.

Prior art

Someone else must have proposed this already. This conclusion is an example of induction by simple enumeration, which is always hazardous, but compelling with this subject. I would be interested to read about those earlier proposals, since I suspect they'll have thought about how it actually could work.

*: Back of the envelope, say the prediction error is $O(n^{-1/2})$, as it often is. The question is then how utility to the user scales with error. If it was simply inversely proportional, we'd get utility scaling like $O(n^{1/2})$, which is a lot less than the $O(n)$ claimed for classic network externalities by the Metcalfe's-law rule of thumb. On the other hand it feels more sensible to say that going from an error of $\pm 1$ on a 5 point scale to $\pm 0.1$ is a lot more valuable to users than going from $\pm 0.1$ to $\pm 0.01$, not (as inverse proportionality would imply) much less valuable. Indeed we might expect that even perfect prediction would have only finite utility to users, so the utility would be something like $c-O(n^{-1/2})$. This suggests that we could have multiple very large services, especially if there is a cost to switch between recommenders. But it also suggests that there'd be a minimum viable size for a service, since if it's too small a customer would be paying the switching cost to get worse recommendations.
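The back-of-the-envelope comparison can be spelled out numerically. The sketch below is purely illustrative: the constant `k`, the ceiling `c`, and the two utility functions are assumptions standing in for "inversely proportional" and "$c-O(n^{-1/2})$", not anything estimated from a real recommender.

```python
import math

def error_from_users(n, k=1.0):
    """Assumed prediction error with n users: k * n^{-1/2}."""
    return k / math.sqrt(n)

def utility_inverse(err):
    """Utility inversely proportional to error (grows like n^{1/2})."""
    return 1.0 / err

def utility_saturating(err, c=5.0):
    """Utility capped at c even for perfect prediction: c - err."""
    return c - err

# Value of shrinking the error from 1 to 0.1, versus from 0.1 to 0.01.
for name, u in (("inverse", utility_inverse), ("saturating", utility_saturating)):
    first_step = u(0.1) - u(1.0)
    second_step = u(0.01) - u(0.1)
    print(f"{name:>10}: 1 -> 0.1 gains {first_step:.2f}; 0.1 -> 0.01 gains {second_step:.2f}")

# Value of doubling the user base at different scales.
for n in (10**4, 10**6, 10**8):
    e1, e2 = error_from_users(n), error_from_users(2 * n)
    gain_inv = utility_inverse(e2) - utility_inverse(e1)
    gain_sat = utility_saturating(e2) - utility_saturating(e1)
    print(f"n = {n:>9}: doubling gains {gain_inv:.1f} (inverse), {gain_sat:.6f} (saturating)")
```

Under the inverse form the second tenfold improvement in error is worth more than the first, and doubling the user base keeps paying off more as $n$ grows. Under the saturating form both gains shrink, which is what leaves room for several large services.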

(I can't remember if Henry Farrell came up with this phrase, or I did, as the title for a possible joint project.)

An Appeal to the Hive Mind (Ironically Enough)

I have a vivid memory of reading, in the 1990s, an online discussion (maybe just two people, maybe as many as four) about what online fora, search engines, the Web, "agents", etc., were doing to the way people acquire and use knowledge, and indeed to what we mean by "knowledge". My very strong impression is that one of the participants was linked somehow with the MIT Media Lab, and taking a very strong social-constructionist line (unsurprisingly, given that affiliation). At some point the discussion turned to her experiences with an online forum related to a hobby of hers (tropical fish? terraria?). The person I'm thinking of said something like, the consensus of that forum just were knowledge about \$HOBBY. One of her interlocutors made an objection on the order of, why do you trust those random people on the Internet to have any idea what they're talking about? To which the reply was, basically, come on, who'd just make stuff up about \$HOBBY?

I have (genuinely!) thought of this exchange often in the 20-plus years since I read it. But when I recently tried to find it again, to check my memory and to cite it in a work-in-glacial-progress, I've been unable to locate it. (The fact that I don't recall any names of the participants, or the venue, doesn't help.) I am prepared to learn that, because this is something I've thought of often, my mind has re-shaped it into a memorable anecdote, but I'd still like to see what this started from. Any leads readers could provide would be appreciated.

Update, the next day

The hive mind Lucy Keer (with an assist from Mike Traven) delivers:

Specifically, the seed around which this story nucleated in my memory may have been a January 1996 piece by Prof. Bruckman in Technology Review: it has the right content (sci.aquaria!), the right date, my father subscribed to TR and I'd even have been visiting my parents when that issue was current. Only it's not a conversation between multiple people but a solo-author essay, it's not primarily about the social aspects of knowledge but about how to find congenial on-line communities and make (or re-make) ones that don't suck (the lost wisdom of the Internet's early Bronze Age), and contains nothing like "who'd just make stuff up about \$HOBBY?" (In short: Bartlett (1932) meets Radio Yerevan.)

More positively, I very much look forward to reading Bruckman's book (there's an excerpt/precis available on her website).

Posted at March 26, 2021 12:32 | permanent link

Regression, Thermostats, Causal Inference: Some Finger Exercises


I wrote the first version of this for the class where we do causal inference long enough ago that I actually don't remember when --- 2011? 2013? (In retrospect I had probably read Milton Friedman's thermostat analogy but didn't consciously remember it at the time.) Posted now because I've gone over the point with two different people in the last month.

The temperature outside $(X)$ is a direct cause of the temperature inside my house $(Y)$. But every morning I measure the temperature, and adjust my heating/cooling system $(C)$ to try to maintain a constant temperature $y_0$. For simplicity, we'll say that all the relations are linear, so $\begin{eqnarray} X & \sim & \mathrm{whatever}\\ C|X & \leftarrow & a+bX + \epsilon_1\\ Y|X,C & \leftarrow & X-C + \epsilon_2 \end{eqnarray}$ where $\epsilon_1$ and $\epsilon_2$ are exogenous, independent, mean-zero noise terms. We can think of $\epsilon_1$ as a combination of my sloppiness in measuring the temperature and in tuning the heating/cooling system; $\epsilon_2$ is sheer fluctuations.

Exercise: Draw the DAG.

To ensure that the expectation of $Y$ remains at $y_0$, no matter the external temperature, we need $\begin{eqnarray} y_0 & = & \Expect{Y|X=x}\\ & = & \Expect{X - (a + bX + \epsilon_1) + \epsilon_2|X=x}\\ & = & (1-b)x -a \end{eqnarray}$ Since this must hold for all $x$, we need $b=1, a=-y_0$.

What follows from this?

• Internal temperature $Y$ is uncorrelated with external temperature $X$: $\begin{eqnarray} \Cov{X,Y} & = & \Expect{XY} - \Expect{X}\Expect{Y}\\ & = & \Expect{X\Expect{Y|X}} - \Expect{X}\Expect{Y}\\ & = & \Expect{X}y_0 - \Expect{X}y_0 = 0 \end{eqnarray}$ The internal temperature will fluctuate around the set-point $y_0$, but those fluctuations will not correlate with the external temperature.
• Internal temperature $Y$ is correlated with the control signal $C$ only through my sloppiness: $\begin{eqnarray} \Cov{C,Y} & = & \Expect{CY} - \Expect{C}\Expect{Y}\\ & = & \Expect{(-y_0 + X + \epsilon_1)(X+y_0-X-\epsilon_1+\epsilon_2)} - (\Expect{X}-y_0)y_0\\ & = & -y_0^2 - \Expect{\epsilon_1^2} + \Expect{X}y_0 -\Expect{X \epsilon_1} + \Expect{X\epsilon_2} + \Expect{\epsilon_1 \epsilon_2} - \Expect{X}y_0 + y_0^2\\ & = & -\Var{\epsilon_1} \end{eqnarray}$ since all the cross-expectations are zero, and $\Expect{\epsilon_1}=0$.
• The control signal $C$ is correlated with the external temperature: $\begin{eqnarray} \Cov{C,X} & = & \Expect{CX} - \Expect{C}\Expect{X}\\ & = & \Expect{(-y_0 + X+\epsilon_1)X} - (-y_0 +\Expect{X})\Expect{X}\\ & = & \Expect{X^2} - \left(\Expect{X}\right)^2\\ & = & \Var{X} \end{eqnarray}$
• A linear regression of $Y$ on $X$ and $C$ will consistently recover the correct coefficients, namely $+1$ and $-1$. To see this, recall (e.g., from here) that the OLS estimates will tend towards the coefficients of the optimal linear predictor. Those coefficients, in turn, are the solution to $\beta = {\left[ \begin{array}{cc} \Var{X} & \Cov{C,X}\\ \Cov{X,C} & \Var{C} \end{array}\right]}^{-1} \left[ \begin{array}{c} \Cov{Y,X}\\ \Cov{Y,C} \end{array}\right]$ Plugging in our previous results, $\beta = {\left[ \begin{array}{cc} \Var{X} & \Var{X}\\ \Var{X} & \Var{X}+\Var{\epsilon_1} \end{array}\right]}^{-1} \left[ \begin{array}{c} 0\\ -\Var{\epsilon_1} \end{array}\right]$ After some character-building algebra, you can confirm that the covariance matrix is invertible as long as $\Var{\epsilon_1} > 0$ (and $\Var{X} > 0$), and then, as promised, $\beta = (1,-1)$.

Exercise: Build your character by doing the algebra.

So, as long as control isn't perfect, the naive statistician (or experienced econometrician...) who just does a kitchen-sink regression will actually get the relationship between $Y$, $X$ and $C$ right, concluding that external temperature and the climate control have equal and opposite effects on internal temperature. Sure, there will be sampling noise, but with enough data they'll approach the truth.
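A quick simulation can check these claims numerically. This is only a sketch under stated assumptions: $X$ is taken to be Gaussian (the model just says "whatever"), and the set-point, noise scales, sample size and seed are all arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)        # arbitrary seed
n = 200_000                            # arbitrary sample size
y0 = 20.0                              # arbitrary set-point

x = rng.normal(10.0, 8.0, size=n)      # external temperature; Gaussian is an assumption
eps1 = rng.normal(0.0, 1.0, size=n)    # sloppiness in measurement and control
eps2 = rng.normal(0.0, 0.5, size=n)    # sheer fluctuations

c = -y0 + x + eps1                     # control with a = -y0, b = 1
y = x - c + eps2                       # internal temperature

cov = np.cov(np.vstack([x, c, y]))     # rows are the variables X, C, Y
cov_xy, cov_cy, cov_cx = cov[0, 2], cov[1, 2], cov[0, 1]

# Kitchen-sink regression of Y on an intercept, X and C.
design = np.column_stack([np.ones(n), x, c])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

print(f"Cov[X,Y] = {cov_xy:+.3f}  (theory: 0)")
print(f"Cov[C,Y] = {cov_cy:+.3f}  (theory: -Var[eps1] = -1)")
print(f"Cov[C,X] = {cov_cx:+.3f}  (theory: Var[X] = 64)")
print(f"OLS coefficients on X and C: {beta[1]:+.3f}, {beta[2]:+.3f}  (theory: +1, -1)")
```

The marginal covariance between $X$ and $Y$ should hover around zero even though the regression coefficient on $X$ comes out near $+1$. Only sampling noise separates these estimates from the theoretical values.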

Exercise: What do you get if you regress $C$ on $X$ and $Y$?

I have implicitly assumed that I know the exact linear relationship between $X$ and $Y$, since I used that in deriving how the control signal should respond to $X$. If I mis-calibrate the control signal, say if $C = -y_0 +0.999X + \epsilon_1$, then there is not an exact cancellation and everything works as usual.
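The effect of mis-calibration can be seen in the same kind of simulation. With $C = -y_0 + bX + \epsilon_1$ the model gives $Y = (1-b)X + y_0 - \epsilon_1 + \epsilon_2$, so $\Cov{X,Y} = (1-b)\Var{X}$, which vanishes only at $b=1$. The sketch below again assumes Gaussian $X$ and arbitrary noise scales, and the function name is hypothetical.

```python
import numpy as np

def cov_xy_under_gain(b, n=1_000_000, y0=20.0, seed=0):
    """Sample Cov[X, Y] when the controller uses C = -y0 + b*X + eps1."""
    rng = np.random.default_rng(seed)      # same draws for every b
    x = rng.normal(10.0, 8.0, size=n)      # Gaussian X is an assumption
    eps1 = rng.normal(0.0, 1.0, size=n)
    eps2 = rng.normal(0.0, 0.5, size=n)
    c = -y0 + b * x + eps1
    y = x - c + eps2
    return np.cov(x, y)[0, 1]

for b in (1.0, 0.999, 0.9):
    theory = (1.0 - b) * 8.0**2            # (1 - b) * Var[X]
    print(f"b = {b}: sample Cov[X,Y] = {cov_xy_under_gain(b):+.3f}, theory = {theory:+.3f}")
```

For $b = 0.999$ the covariance is small, so it takes a large sample to tell it apart from zero. It is nonetheless not zero, and the exact cancellation is gone.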

Exercise: Suppose that instead of measuring the external temperature $X$ directly, I can only measure yesterday's temperature $U$, again with noise. Supposing there is a linear relationship between $U$ and $X$, replicate this analysis. Does it matter if $U$ is the parent of $X$ or vice versa?

Exercise: "Feedback is a mechanism for persistently violating faithfulness"; discuss.

Exercise: "The greatest skill seems like clumsiness" (Laozi); discuss.

