Inference for Stochastic Differential Equations
Last update: 08 Dec 2024 00:30. First version: 28 June 2021
Spun off from stochastic differential equations, and/or inference for Markov models.
To be clear, I'm considering situations in which we observe a trajectory \( x(t) \) of a stochastic process \( X \) that obeys an SDE \[ dx = a(x;\theta) dt + b(x;\theta) dW \] and want to do inference on the parameter \( \theta \). (The "parameter" here might be a whole function.)
The "easy" case is discrete-time, equally-spaced data, without loss of generality \( x(0), x(h), x(2h), \ldots x(nh) \). Because \( X \) is (by hypothesis) a Markov process, there is a conditional probability kernel \( P_h(y|x;\theta) \), which one could find by integrating the generator of the SDE, and the log-likelihood is just \[ L(\theta) = \sum_{k=0}^{n-1}{\log{P_h(x((k+1)h)|x(kh); \theta)}} \] (As usual with Markov processes, this is really a conditional likelihood, conditioning on the first observation \( x(0) \).) Of course, "just" integrating the generator is not necessarily an easy matter...
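For concreteness, here is a minimal sketch (mine, not from any of the references; all the function and variable names are made up) of this scheme in the one case where no integration of the generator is needed, because the transition kernel is available in closed form: the Ornstein-Uhlenbeck process \( dX = -\theta X dt + \sigma dW \), whose kernel is \( X(t+h)|X(t)=x \sim \mathcal{N}(x e^{-\theta h}, \sigma^2(1-e^{-2\theta h})/(2\theta)) \).

```python
import numpy as np

rng = np.random.default_rng(42)

def ou_loglik(theta, sigma, x, h):
    """Exact conditional log-likelihood of an Ornstein-Uhlenbeck path
    observed at spacing h, using the closed-form Gaussian kernel."""
    mean = x[:-1] * np.exp(-theta * h)
    var = sigma**2 * (1 - np.exp(-2 * theta * h)) / (2 * theta)
    resid = x[1:] - mean
    return np.sum(-0.5 * np.log(2 * np.pi * var) - 0.5 * resid**2 / var)

# Simulate a path by drawing from the exact kernel (no discretization error)
theta_true, sigma, h, n = 1.5, 0.5, 0.1, 5000
x = np.empty(n + 1)
x[0] = 0.0
for k in range(n):
    m = x[k] * np.exp(-theta_true * h)
    v = sigma**2 * (1 - np.exp(-2 * theta_true * h)) / (2 * theta_true)
    x[k + 1] = m + np.sqrt(v) * rng.standard_normal()

# Crude maximum likelihood for theta (sigma known) by grid search
grid = np.linspace(0.1, 5.0, 200)
theta_hat = grid[np.argmax([ou_loglik(t, sigma, x, h) for t in grid])]
```

With a few thousand observations, `theta_hat` lands close to the true value; the point of the exercise is just that once \( P_h \) is in hand, maximizing \( L(\theta) \) is ordinary likelihood inference.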
Let me give a somewhat heuristic example, though. Say that \( X(t) \) is one-dimensional, that the driving noise process \( W \) is a standard Wiener process, and the time interval \( h \) is very small. (Vector-valued processes just mean more notation here.) Then we can say that \[ X(t+h)|X(t)=x(t) \sim \mathcal{N}(x(t) + ha(x(t); \theta), h b^2(x(t); \theta)) \] and so write out the log-likelihood explicitly: \[ L(\theta) = -\frac{n}{2}\log{2\pi h} + \sum_{k=0}^{n-1}{-\log{b(x(kh); \theta)} - \frac{1}{2}\frac{(x((k+1)h) - x(kh) - ha(x(kh); \theta))^2}{hb^2(x(kh); \theta)}} \] Of course this relies on \( h \) being small, and funny things are clearly going to happen as \( h \rightarrow 0 \) --- the parameter-and-data independent term out front is going to blow up, but the number of terms in the sum will become infinite...
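In code, this small-\( h \) Gaussian (Euler-Maruyama) approximation to the log-likelihood is a one-liner per term. A sketch (again mine; the function name and calling convention are assumptions, with drift and diffusion passed as callables of \( (x, \theta) \)):

```python
import numpy as np

def euler_loglik(x, h, a, b, theta):
    """Small-h Gaussian approximation to the log-likelihood of an SDE
    dx = a(x;theta) dt + b(x;theta) dW, from observations
    x(0), x(h), ..., x(nh) in the array x."""
    x0, x1 = x[:-1], x[1:]
    resid = x1 - x0 - h * a(x0, theta)       # observed minus predicted increment
    diff2 = b(x0, theta) ** 2                # conditional variance is h * diff2
    n = len(x0)
    return (-0.5 * n * np.log(2 * np.pi * h)
            - np.sum(np.log(b(x0, theta)))
            - 0.5 * np.sum(resid**2 / (h * diff2)))
```

For instance, `euler_loglik(x, h, lambda x, th: -th * x, lambda x, th: 0.5 + 0.0 * x, 1.5)` evaluates the approximate log-likelihood for the Ornstein-Uhlenbeck drift at \( \theta = 1.5 \); plugging this into any generic optimizer gives the approximate MLE, with the quality of the approximation degrading as \( h \) grows.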
In fact, in many ways, the most natural sort of data to want to use here would be a whole function, a sample path or trajectory over an interval of time, say \( x[0,T] \). One would naturally hope that the log-likelihood sum above, for discrete times, would pass over to an integral as \( h \rightarrow 0 \). But this raises some technical difficulties. When we talk about the likelihood of parameter value \( \theta \) on data \( x \), say \( \mathcal{L}(\theta; x) \), what we really mean is \( \mathcal{L}(\theta; x) = \frac{dP_{\theta}}{dM}(x) \), where \( P_{\theta} \) is the probability measure induced by the model when the parameter is \( \theta \), and \( \frac{dP_{\theta}}{dM} \) is the Radon-Nikodym derivative* of this measure w.r.t. some reference measure \( M \) which dominates \( P_{\theta} \) for all \( \theta \). In the usual baby-stats problems, we silently take \( M = \) counting measure for discrete sample spaces, or \( M = \) Lebesgue measure for when the data live in \( \mathbb{R}^n \). Coming up with a good reference measure for infinite-dimensional data is non-trivial: it has to dominate all the different measures on trajectories the SDE might generate with different \( \theta \)'s, and ideally we should actually be able to calculate the likelihood! You might hope we could get away with, so to speak, hanging a copy of the Lebesgue measure from every point on the time interval \( [0,T] \) and taking the uncountably-infinite product, but, very annoyingly, that turns out not to work. What can work, however, is to use the measure of the driving noise process that I wrote as \( W \) above; this turns out to be especially nice when \( W \) is, as the label suggests, a standard Wiener process. (There are some rather nice formulas for this Radon-Nikodym derivative in that case, going under the name of Girsanov's theorem.)
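To make "especially nice" concrete, here is the standard formula for the simplest case, where the diffusion coefficient is constant and equal to 1 (laws of diffusions with different diffusion coefficients are mutually singular, so \( b \) has to match that of the reference measure). For \( dX = a(X;\theta) dt + dW \) with \( W \) a standard Wiener process, Girsanov's theorem gives the Radon-Nikodym derivative of \( P_{\theta} \) with respect to Wiener measure as
\[ \frac{dP_{\theta}}{dW}(x[0,T]) = \exp{\left\{ \int_{0}^{T}{a(x(t);\theta) \, dx(t)} - \frac{1}{2}\int_{0}^{T}{a^2(x(t);\theta) \, dt} \right\}} \]
where the first integral is an Ito stochastic integral. The log of this is precisely the \( h \rightarrow 0 \) limit of the \( \theta \)-dependent part of the discrete-time sum above. (See Liptser and Shiryaev, cited below, for a careful treatment.)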
*: As you know, Babur, the Radon-Nikodym derivative of one measure, \( Q \), with respect to another measure, \( M \), written \( \frac{dQ}{dM} \), is a function \( f \) such that \( Q(A) = \int_{A}{f(x) dM(x) } \). Morally speaking, \( f(x) \) says what the ratio of the two measures' densities is at the point \( x \). This only works if \( M(A) = 0 \) implies \( Q(A) = 0 \), in which case we say \( M \) dominates \( Q \), or that \( Q \) is absolutely continuous w.r.t. \( M \). Neither measure has to be a probability measure, though they can be. Also, a function \( f \) like this is only a "version" of the R-N derivative; if \( g(x) = f(x) \) except on a set \( B \) where \( M(B)=0 \), then \( g \) is also a perfectly acceptable version of the R-N derivative.
- See also:
- Equations of Motion from a Time series [for hints towards nonparametric approaches]
- Recommended, big picture:
- Stefano M. Iacus, Simulation and Inference for Stochastic Differential Equations: With R Examples
- Recommended, close-ups:
- David R. Brillinger, Brent S. Stewart, Charles L. Littnan, "Three months journeying of a Hawaiian monk seal", pp. 246--264 of Deborah Nolan and Terry Speed (eds.), Probability and Statistics: Essays in Honor of David A. Freedman (2008), arxiv:0805.3019 [A pretty application, dealt with mostly non-parametrically, rather than by a parametric likelihood approach]
- Luca Capriotti
- "A Closed-Form Approximation of Likelihood Functions for Discretely Sampled Diffusions: the Exponent Expansion", physics/0703180
- "The Exponent Expansion: An Effective Approximation of Transition Probabilities of Diffusion Processes and Pricing Kernels of Financial Derivatives", International Journal of Theoretical and Applied Finance 9 (2006): 1179--1199, physics/0602107
- Christopher C. Heyde, Quasi-Likelihood and Its Applications: A General Approach to Optimal Parameter Estimation
- Robert S. Liptser and Albert N. Shiryaev, Statistics of Random Processes [Ch. 17, in vol. II, is devoted to a concise but rigorous treatment of this issue. Very roughly speaking, the likelihood approach here is to take the sum I wrote above and let it become an integral as \( h \rightarrow 0 \), and the data becomes an entire continuously-observed trajectory. A careful treatment of this requires at the least something like Liptser and Shiryaev's ch. 7 in vol. I ("Absolute Continuity"), and that needs a bunch of earlier material as well...]
- To read:
- Jose Bento, Morteza Ibrahimi and Andrea Montanari
- "Learning Networks of Stochastic Differential Equations", NIPS 23 (2010), arxiv:1011.0415
- "Information Theoretic Limits on Learning Stochastic Differential Equations", arxiv:1103.1689
- Xi Chen, Ilya Timofeyev, "Non-parametric estimation of Stochastic Differential Equations from stationary time-series", arxiv:2007.08054
- Daan Crommelin, "Estimation of Space-Dependent Diffusions and Potential Landscapes from Non-equilibrium Data", Journal of Statistical Physics 149 (2012): 220--233
- Serguei Dachian, Yury A. Kutoyants, "On the Goodness-of-Fit Tests for Some Continuous Time Processes", arxiv:0903.4642 ["We present a review of several results concerning the construction of the Cramer-von Mises and Kolmogorov-Smirnov type goodness-of-fit tests for continuous time processes. As the models we take a stochastic differential equation with small noise, ergodic diffusion process, Poisson process and self-exciting point processes"]
- Arnak Dalalyan and Markus Reiss, "Asymptotic statistical equivalence for ergodic diffusions: the multidimensional case", math.ST/0505053
- A. De Gregorio and S. M. Iacus, "Adaptive Lasso-type estimation for ergodic diffusion processes", arxiv:1002.1312
- D. Dehay and Yu. A. Kutoyants, "On confidence intervals for distribution function and density of ergodic diffusion process", Journal of Statistical Planning and Inference 124 (2004): 63--73
- D. Florens and H. Pham, "Large Deviations in Estimation of an Ornstein-Uhlenbeck Model," Journal of Applied Probability 36 (1999): 60--77
- Christiane Fuchs, Inference for Diffusion Processes: With Applications in Life Sciences
- Shota Gugushvili, Peter Spreij, "Parametric inference for stochastic differential equations: a smooth and match approach", arxiv:1111.1120
- Stefano M. Iacus
- "Statistical analysis of stochastic resonance with ergodic diffusion noise," math.PR/0111153
- "On Lasso-type estimation for dynamical systems with small noise", arxiv:0912.5078
- D. Kleinhans, R. Friedrich, A. Nawroth and J. Peinke, "An iterative procedure for the estimation of drift and diffusion coefficients of Langevin processes", Physics Letters A 346 (2005): 42--46, physics/0502152
- Yury A. Kutoyants
- Statistical Inference for Ergodic Diffusion Processes
- "On the Goodness-of-Fit Testing for Ergodic Diffusion Processes", arxiv:0903.4550
- "Goodness-of-Fit Tests for Perturbed Dynamical Systems", arxiv:0903.4612
- Chenxu Li, "Maximum-likelihood estimation for diffusion processes via closed-form density expansions", Annals of Statistics 41 (2013): 1350--1380
- Martin Lysy, Natesh S. Pillai, "Statistical Inference for Stochastic Differential Equations with Memory", arxiv:1307.1164
- Javier R. Movellan, Paul Mineiro, and R. J. Williams, "A Monte Carlo EM Approach for Partially Observable Diffusion Processes: Theory and Applications to Neural Networks," Neural Computation 14 (2002): 1507--1544
- Ilia Negri, "Efficiency of a class of unbiased estimators for the invariant distribution function of a diffusion process", math.ST/0609590
- Ilia Negri and Yoichi Nishiyama, "Goodness of fit test for ergodic diffusions by tick time sample scheme", Statistical Inference for Stochastic Processes 13 (2010): 81--95
- Jun Ohkubo, "Nonparametric model reconstruction for stochastic differential equations from discretely observed time-series data", Physical Review E 84 (2011): 066702
- B. L. S. Prakasa Rao, Statistical Inference for Diffusion-Type Processes
- E. Racca and A. Porporato, "Langevin equations from time series", Physical Review E 71 (2005): 027101
- Aad van der Vaart and Harry van Zanten, "Donsker theorems for diffusions: Necessary and sufficient conditions", Annals of Probability 33 (2005): 1422--1451, math.PR/0507412
- Harry van Zanten, "On Uniform Laws of Large Numbers for Ergodic Diffusions and Consistency of Estimators", Statistical Inference for Stochastic Processes 6 (2003): 199--213 ["In contrast with uniform laws of large numbers for i.i.d. random variables, we do not need conditions on the 'size' of the class [of functions] in terms of bracketing or covering numbers. The result is a consequence of a number of asymptotic properties of diffusion local time that we derive."]
- J. H. van Zanten, "On the Uniform Convergence of the Empirical Density of an Ergodic Diffusion", Statistical Inference for Stochastic Processes 3 (2000): 251--262