Sequential Decision-Making Under Stochastic Uncertainty

14 Aug 2023 13:44

Yet Another Inadequate Placeholder.

That said... I'm interested in the theory of optimal decision-making, when you need to make multiple decisions over time, and there is non-trivial stochastic uncertainty, either because the effects of your actions are somewhat random, or because you can only coarsely and noisily measure the state of the system you're acting on. I am particularly interested in the extent to which optimal strategies can be learned, in the usual "probably approximately correct" sense of computational learning theory. Here it seems that there is a potentially very important difference between trying to learn an optimal strategy on the basis of merely historical, haphazard data, versus actually performing experiments. In fact, in some sense the best way to learn about the optimal policy may be to experiment with a totally random policy, because the data you gather from such an experiment is totally free of outside, confounding factors. (Similarly, one way to learn about the properties of a nonlinear system is to measure its response to white noise; this is the Wiener method for transducers.)

Finding the optimal strategy turns out to be a very hard problem, both computationally and statistically, and it seems staggeringly unlikely that most human beings, when faced with such situations, respond in anything like the optimal manner. (This is part of the reason things like DSGE models in macroeconomics are crazy.) Or, rather, if we do act optimally, it's with respect to a non-obvious criterion.

Related or subsidiary topics which will also show up here: Partially-observable Markov decision processes, reinforcement learning, etc., etc.

People sometimes distinguish between "risk", which can be represented stochastically, i.e., as a probability distribution, and "uncertainty", where there is simply no basis for assessing frequencies or the like. Decision-making under such strong or genuine uncertainty is a very different problem, better addressed using the tools of low-regret learning.