Notebooks

Inference for Stochastic Differential Equations

31 Oct 2023 20:47

Spun off from stochastic differential equations, and/or inference for Markov models.

To be clear, I'm considering situations in which we observe a trajectory \( x(t) \) of a stochastic process \( X \) that obeys an SDE \[ dx = a(x;\theta) dt + b(x;\theta) dW \] and want to do inference on the parameter \( \theta \). (The "parameter" here might be a whole function.)

The "easy" case is discrete-time, equally-spaced data, without loss of generality \( x(0), x(h), x(2h), \ldots x(nh) \). Because \( X \) is (by hypothesis) a Markov process, there is a conditional probability kernel \( P_h(y|x;\theta) \), which one could find by integrating the generator of the SDE, and the log-likelihood is just \[ L(\theta) = \sum_{k=0}^{n-1}{\log{P_h(x((k+1)h)|x(kh); \theta)}} \] (As usual with Markov processes, this is really a conditional likelihood, conditioning on the first observation \( x(0) \).) Of course, "just" integrating the generator is not necessarily an easy matter...
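For a concrete instance where the transition kernel is available in closed form, take the Ornstein-Uhlenbeck process \( dX = -\theta X dt + \sigma dW \), whose transition density is Gaussian: \( X(t+h)|X(t)=x \sim \mathcal{N}(x e^{-\theta h}, \sigma^2 (1 - e^{-2\theta h})/(2\theta)) \). A minimal sketch of the discrete-time log-likelihood (the example and function names are mine, not anything standard):

```python
import numpy as np

def ou_loglik(x, h, theta, sigma):
    """Exact conditional log-likelihood of equally spaced OU observations x[0..n],
    conditioning on x[0], using the known Gaussian transition kernel."""
    mean = x[:-1] * np.exp(-theta * h)
    var = sigma**2 * (1.0 - np.exp(-2.0 * theta * h)) / (2.0 * theta)
    resid = x[1:] - mean
    return np.sum(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * resid**2 / var)

# Simulate a path by exact sampling from the transition kernel, then
# evaluate the log-likelihood at the true parameters.
rng = np.random.default_rng(0)
theta, sigma, h, n = 1.0, 0.5, 0.1, 1000
sd = np.sqrt(sigma**2 * (1.0 - np.exp(-2.0 * theta * h)) / (2.0 * theta))
x = np.empty(n + 1)
x[0] = 0.0
for k in range(n):
    x[k + 1] = x[k] * np.exp(-theta * h) + sd * rng.standard_normal()
print(ou_loglik(x, h, theta, sigma))
```

With a long enough sample, the log-likelihood at the true \( \theta \) will (with overwhelming probability) exceed that at a badly wrong \( \theta \), which is what makes likelihood-based inference work here.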

Let me give a somewhat heuristic example, though. Say that \( X(t) \) is one-dimensional, that the driving noise process \( W \) is a standard Wiener process, and the time interval \( h \) is very small. (Vector-valued processes just mean more notation here.) Then we can say that \[ X(t+h)|X(t)=x(t) \sim \mathcal{N}(x(t) + ha(x(t);\theta), h b^2(x(t);\theta)) \] and so write out the log-likelihood explicitly: \[ L(\theta) = -\frac{n}{2}\log{2\pi h} + \sum_{k=0}^{n-1}{-\log{b(x(kh); \theta)} - \frac{1}{2}\frac{(x((k+1)h) - x(kh) - ha(x(kh); \theta))^2}{hb^2(x(kh); \theta)}} \] Of course this relies on \( h \) being small, and funny things are clearly going to happen as \( h \rightarrow 0 \) --- the parameter-and-data independent term out front is going to blow up, but the number of terms in the sum will become infinite...
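This Gaussian (Euler-Maruyama) pseudo-log-likelihood is easy to compute for generic drift and diffusion functions. A minimal sketch, assuming user-supplied \( a(x;\theta) \) and \( b(x;\theta) \) (the function names are my own, illustrative choices):

```python
import numpy as np

def euler_loglik(x, h, a, b, theta):
    """Euler-Maruyama pseudo-log-likelihood for equally spaced data x[0..n],
    implementing the explicit Gaussian formula term by term."""
    xk = x[:-1]            # left endpoints x(kh)
    dx = np.diff(x)        # increments x((k+1)h) - x(kh)
    bk = b(xk, theta)
    n = len(dx)
    return (-0.5 * n * np.log(2.0 * np.pi * h)
            + np.sum(-np.log(bk)
                     - 0.5 * (dx - h * a(xk, theta))**2 / (h * bk**2)))

# Example drift and diffusion: a(x; theta) = -theta * x, b = 0.5 constant,
# so theta enters only through the drift.
a = lambda x, th: -th * x
b = lambda x, th: 0.5
rng = np.random.default_rng(1)
h, n = 0.01, 2000
x = np.empty(n + 1)
x[0] = 0.0
for k in range(n):  # Euler-Maruyama simulation at the true theta = 1
    x[k + 1] = x[k] + h * a(x[k], 1.0) + b(x[k], 1.0) * np.sqrt(h) * rng.standard_normal()
print(euler_loglik(x, h, a, b, 1.0))
```

Note this is a pseudo-likelihood: its accuracy as an approximation to the true discrete-time likelihood degrades as \( h \) grows.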

In fact, in many ways, the most natural sort of data to want to use here would be a whole function, a sample path or trajectory over an interval of time, say \( x[0,T] \). One would naturally hope that the log-likelihood sum above, for discrete times, would pass over to an integral as \( h \rightarrow 0 \). But this raises some technical difficulties. When we talk about the likelihood of parameter value \( \theta \) on data \( x \), say \( \mathcal{L}(\theta; x) \), what we really mean is \( \mathcal{L}(\theta; x) = \frac{dP_{\theta}}{dM}(x) \), where \( P_{\theta} \) is the probability measure induced by the model when the parameter is \( \theta \), and \( \frac{dP_{\theta}}{dM} \) is the Radon-Nikodym derivative* of this measure w.r.t. some reference measure \( M \) which dominates \( P_{\theta} \) for all \( \theta \). In the usual baby-stats problems, we silently take \( M = \) counting measure for discrete sample spaces, or \( M = \) Lebesgue measure when the data live in \( \mathbb{R}^n \). Coming up with a good reference measure for infinite-dimensional data is non-trivial: it has to dominate all the different measures on trajectories the SDE might generate with different \( \theta \)'s, and ideally we should actually be able to calculate the likelihood! You might hope we could get away with, so to speak, hanging a copy of the Lebesgue measure from every point on the time interval \( [0,T] \) and taking the uncountably-infinite product, but, very annoyingly, that turns out not to work --- there is no sensible analog of Lebesgue measure on an infinite-dimensional space. What can work, however, is to use the measure of the driving noise process that I wrote as \( W \) above; this turns out to be especially nice when \( W \) is, as the label suggests, a standard Wiener process. (There are some rather nice formulas for this Radon-Nikodym derivative in the Wiener case, via Girsanov's theorem.)
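To make this concrete: when the diffusion coefficient is identically 1, Girsanov's theorem gives \[ \log{\frac{dP_{\theta}}{dW}(x)} = \int_{0}^{T}{a(x(t);\theta) dx(t)} - \frac{1}{2}\int_{0}^{T}{a^2(x(t);\theta) dt} \] and on a finely sampled path one can approximate both integrals by left-endpoint (Ito-style) sums. A minimal sketch, with a hypothetical linear drift of my own choosing:

```python
import numpy as np

def girsanov_loglik(x, h, a, theta):
    """Discretized Girsanov log-likelihood, assuming unit diffusion (b = 1):
    sum a(x_k) * dx_k  -  (1/2) * sum a(x_k)^2 * h."""
    xk = x[:-1]          # left endpoints, as the Ito integral requires
    dx = np.diff(x)
    ak = a(xk, theta)
    return np.sum(ak * dx) - 0.5 * np.sum(ak**2) * h

# Hypothetical example drift a(x; theta) = -theta * x, simulated at theta = 1
a = lambda x, th: -th * x
rng = np.random.default_rng(2)
h, n = 0.001, 10000
x = np.empty(n + 1)
x[0] = 0.0
for k in range(n):  # Euler-Maruyama with unit diffusion
    x[k + 1] = x[k] + h * a(x[k], 1.0) + np.sqrt(h) * rng.standard_normal()
print(girsanov_loglik(x, h, a, 1.0))
```

Note that, unlike the Euler pseudo-likelihood, nothing here blows up as \( h \rightarrow 0 \): the problematic \( h \)-dependent terms have been absorbed into the reference (Wiener) measure.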

*: As you know, Babur, the Radon-Nikodym derivative of one measure, \( Q \), with respect to another measure, \( M \), written \( \frac{dQ}{dM} \), is a function \( f \) such that \( Q(A) = \int_{A}{f(x) dM(x) } \). Morally speaking, \( f(x) \) says what the ratio of the two measures' densities is at the point \( x \). This only works if \( M(A) = 0 \) implies \( Q(A) = 0 \), in which case we say \( M \) dominates \( Q \), or that \( Q \) is absolutely continuous w.r.t. \( M \). Neither measure has to be a probability measure, though both can be. Also, a function \( f \) like this is only a "version" of the R-N derivative; if \( g(x) = f(x) \) except on a set \( B \) where \( M(B)=0 \), then \( g \) is also a perfectly acceptable version of the R-N derivative.
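A concrete finite-dimensional example (mine, not from the text): take \( Q = \mathcal{N}(\mu, 1) \) and \( M = \mathcal{N}(0,1) \) on the real line; then \( \frac{dQ}{dM}(x) = e^{\mu x - \mu^2/2} \), the ratio of the two Gaussian densities. We can check \( Q(A) = \int_{A}{f \, dM} \) numerically:

```python
import math
import numpy as np

mu = 1.0
f = lambda x: np.exp(mu * x - mu**2 / 2.0)                      # dQ/dM
phi = lambda x: np.exp(-x**2 / 2.0) / math.sqrt(2 * math.pi)    # density of M

# Compare Q(A) computed two ways on the set A = [0, 2],
# via left-endpoint Riemann sums on a fine grid.
xs = np.linspace(0.0, 2.0, 200001)
dx = xs[1] - xs[0]
q_direct = np.sum(phi(xs - mu)) * dx        # integrate Q's own density
q_via_rn = np.sum(f(xs) * phi(xs)) * dx     # integrate f against M
print(q_direct, q_via_rn)                   # the two agree
```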
