
## Gygax Tests, Gygax Confidence Sets

21 Sep 2022 16:47

A Gygax test of a statistical hypothesis is one which rejects the null hypothesis with the specified false-positive rate $\alpha$, but rejects in a completely random manner, independent of the data and of whether the hypothesized parameter value is true or false. If $\alpha = 0.05$, the conventional level in many fields, this amounts to rejecting the null when we roll a 1 on a twenty-sided die, hence the name.
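As a minimal sketch of the definition (the function names `gygax_test` and `d20_test` are my own, purely for illustration), the test takes the data as an argument and then pointedly ignores it:

```python
import random

def gygax_test(data, alpha=0.05, rng=random):
    """Return True to reject the null. The data are deliberately ignored."""
    return rng.random() < alpha

def d20_test(data, rng=random):
    """The alpha = 0.05 case: reject exactly when a twenty-sided die shows a 1."""
    return rng.randint(1, 20) == 1
```

Both functions reject with probability 0.05 whatever `data` contains; the second just makes the die explicit.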

Given a Gygax test, it's easy to construct a Gygax confidence set. If the parameter space is countable, perform a separate Gygax test for each possible parameter value. (Roll a d20 for every parameter value.) Continuous parameter spaces are slightly more complicated, but we can nonetheless construct Gygax confidence sets (if not necessarily confidence intervals), as I shall explain in parentheses below.
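For the countable case, a rough sketch might look like the following. It assumes the candidate parameter values can be enumerated as a finite collection (a genuinely infinite countable space would need the rolls made lazily, on demand), and `gygax_confidence_set` is a hypothetical name of mine:

```python
import random

def gygax_confidence_set(candidates, alpha=0.05, rng=random):
    """Keep each candidate value unless its own independent roll rejects it."""
    return {theta for theta in candidates if rng.random() >= alpha}

# Example: parameter values 0..9; the data never enter into it.
print(sorted(gygax_confidence_set(range(10))))
```

Each value, including the true one, is dropped with probability $\alpha$, independently of all the others.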

(If the parameter space is the real line, we need a continuous-time Markov chain where the two states are "reject" and "accept", and where the invariant distribution puts probability $\alpha$ on "reject". Pick one point on the line, arbitrarily, as the origin, and draw its state from the invariant distribution. Then, conditional on that starting value, move to the right and mark out regions of alternating acceptance and rejection, following the chain. Similarly, go to the left from the same starting value, independently of what we do to the right of the origin. Because the chain starts in its invariant distribution, its state at every point has that same distribution, so we have ensured that every parameter value on the line, including the true value, is rejected with probability $\alpha$. Extending the construction to higher-dimensional parameter spaces is left as an exercise in random fields.)
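The construction on the line can be sketched in code, under some assumptions of mine: the origin is taken to be 0, only a bounded window `[lo, hi]` containing it is simulated, and the overall switching speed `rate` is an arbitrary free parameter. With exit rate `rate * (1 - alpha)` from "reject" and `rate * alpha` from "accept", the invariant distribution puts probability $\alpha$ on "reject". The names `gygax_regions` and `is_rejected` are hypothetical.

```python
import random

def gygax_regions(lo, hi, alpha=0.05, rate=1.0, rng=random):
    """Return contiguous (start, end, state) segments covering [lo, hi].

    Assumes lo <= 0 <= hi and 0 < alpha < 1.
    """
    # Exit rates chosen so the invariant probability of "reject" is alpha.
    leave = {"reject": rate * (1 - alpha), "accept": rate * alpha}
    # Draw the state at the origin from the invariant distribution.
    origin_state = "reject" if rng.random() < alpha else "accept"

    def walk(length):
        # Follow the chain outward from the origin for a distance `length`.
        segs, pos, state = [], 0.0, origin_state
        while pos < length:
            end = min(length, pos + rng.expovariate(leave[state]))
            segs.append((pos, end, state))
            pos, state = end, ("accept" if state == "reject" else "reject")
        return segs

    right = walk(hi)    # to the right of the origin
    left = walk(-lo)    # to the left, independently given the origin's state
    return [(-e, -s, st) for (s, e, st) in reversed(left)] + right

def is_rejected(theta, segments):
    """Look up whether the parameter value theta falls in a rejection region."""
    for start, end, state in segments:
        if start <= theta <= end:
            return state == "reject"
    raise ValueError("theta lies outside the simulated range")
```

The confidence set is then the union of the "accept" segments; it is generally a scattering of intervals rather than a single interval.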

Notice that a Gygax test has exactly the promised "size", or probability of falsely rejecting the null, viz., $\alpha$, but the test is also completely uninformative. (The "power", or probability of rejecting the null when the alternative is true, is also $\alpha$.) Similarly, a Gygax confidence set will contain the true value of the parameter with probability (exactly) $1-\alpha$, i.e., it has correct coverage. Notice also, however, that this confidence set will not shrink as we get more data: it's not consistent. This, I think, tells us something interesting about the relative importance of a statistical procedure's getting the error probabilities right versus its converging to the truth. It also tells us how little we've accomplished when we've merely shown that our test has the right size, or that our confidence set has the right coverage.
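A small simulation can illustrate the point about size and power. This is only a sketch under assumptions of my own choosing: the data are Gaussian with unit variance, the null is that the mean is 0, and `rejection_rate` is a hypothetical helper.

```python
import random

def gygax_test(data, alpha=0.05, rng=random):
    """Reject with probability alpha, ignoring the data entirely."""
    return rng.random() < alpha

def rejection_rate(true_mean, n, alpha=0.05, trials=20000, seed=0):
    """Fraction of simulated samples of size n on which the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        data = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        rejections += gygax_test(data, alpha, rng)
    return rejections / trials

size = rejection_rate(true_mean=0.0, n=10)    # null is true
power = rejection_rate(true_mean=3.0, n=10)   # null is false
print(size, power)
```

Both estimates should come out close to $\alpha$, and increasing `n` changes nothing, since the test never looks at the sample.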

A word on the name, which is really why I wrote this up. I have been using the expression "Gygax test" in my teaching for many years, but was sure I'd borrowed it from someone, probably some teacher or other, and forgotten who in my usual way. But I cannot find any appearance of it before my 2012 comment on VanderWeele et al. This raises the uncomfortable possibility that I just made up the name. If anyone can point me to an earlier source, so that I can give credit, I would very much appreciate it. If not, I am prepared to take responsibility.