Notebooks

Stochastic Differential Equations

Last update: 08 Dec 2024 12:22
First version: 27 September 2007

Non-stochastic differential equations are models of dynamical systems where the state evolves continuously in time. If they are autonomous, then the state's future values depend only on the present state; if they are non-autonomous, the state is allowed to depend on an exogenous "driving" term as well. (This may not be the standard way of putting it, but I think it's both correct and more illuminating than the more analytical viewpoints, and anyway is the line taken by V. I. Arnol'd in his excellent book on differential equations.) Stochastic differential equations (SDEs) are, conceptually, ones where the exogenous driving term is a stochastic process. --- While "differential equation", unmodified, covers both ordinary differential equations, containing only time derivatives, and partial differential equations, containing both time and space derivatives, "stochastic differential equation", unmodified, refers only to the ordinary case. Stochastic partial differential equations are just what you'd think.

The solution of an SDE is, itself, a stochastic process. To understand how this comes about, it helps to start by thinking through what an ordinary differential equation is, what constitutes a solution, and how to find a solution. The canonical sort of autonomous ordinary differential equation looks like \[ \frac{dx}{dt} = f(x) \] or \[ \frac{dx}{dt} = f(x,t) \] if it's non-autonomous. (I won't keep noting all the modifications for non-autonomous equations when they're clear.) Here \( f \) is the vector field which gives the rate of change of the system's variables, either as a function of their current value (autonomous), or their current value and the time (non-autonomous). A solution to the equation is a function \( x(t) \) whose time derivative matches the field: \[ \frac{dx}{dt}(t) = f(x(t)) \] Now, you'll recall from baby calculus that \( x(t) \) and \( x(t)+c \) have the same time derivative, for any constant \( c \). This means that the solution to an ODE isn't unique. We typically add on an extra requirement, such as an initial condition, \[ x(0) = x_0 \] to make the solution unique (and to represent the initial state of the system).
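For concreteness, the simplest non-trivial example is exponential decay, \( f(x) = -\gamma x \) for some constant \( \gamma > 0 \): \[ \frac{dx}{dt} = -\gamma x \] Every function of the form \( x(t) = c e^{-\gamma t} \) solves this, one for each constant \( c \), and imposing the initial condition \( x(0) = x_0 \) forces \( c = x_0 \), leaving the unique solution \( x(t) = x_0 e^{-\gamma t} \).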

Finding a solution to the ODE means finding such a function, which in principle we can do by integrating. This is easiest to see if the equation isn't just non-autonomous, but doesn't depend on \( x \) at all, \[ \frac{dx}{dt} = f(t) \] Then we integrate both sides over time, from 0 to our time of interest \( t \); I'll write the variable of integration as \( s \) to keep it distinct: \[ \begin{eqnarray*} \int_{0}^{t}{\frac{dx}{ds} ds} & = & \int_{0}^{t}{f(s) ds}\\ x(t) - x(0) & = & \int_{0}^{t}{f(s) ds}\\ x(t) & = & x_0 + \int_{0}^{t}{f(s) ds} \end{eqnarray*} \] using the fundamental theorem of calculus, and the initial condition. Even if the equation isn't completely externally driven, we can still in principle do the same thing: \[ x(t) = x_0 + \int_{0}^{t}{f(x(s), s) ds} \] though notice that the unknown function \( x \) now appears inside the integral, so this is an implicit equation for the solution rather than an explicit formula for it.
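To take the simplest, fully-driven case: if \( f(t) = \cos{t} \), then \( x(t) = x_0 + \int_{0}^{t}{\cos{s}\, ds} = x_0 + \sin{t} \).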

We can actually calculate such a thing by using many different numerical schemes. One which is particularly helpful for what follows is Euler's method. Pick a small increment of time \( h \). We start with \( x(0) = x_0 \). Then we say \[ x(t+h) = x(t) + h f(x(t)) \] for the points \( t=0 \), \( t=h \), \( t=2h \), etc. In between those points we interpolate linearly. This gives us a function which doesn't quite obey the differential equation, but one can show that it comes closer and closer to doing so as \( h \rightarrow 0 \). (As I recall, Arnol'd's textbook, mentioned above, contains a pretty proof.)
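Here is a minimal sketch of the scheme in Python with NumPy (the function name euler and all the parameter values are mine, purely for illustration), applied to the exponential-decay example above, where the exact solution is known:

    import numpy as np

    def euler(f, x0, t_max, h):
        # Euler's method: repeatedly step along the vector field f
        n = int(round(t_max / h))
        ts = np.linspace(0.0, n * h, n + 1)
        xs = np.empty(n + 1)
        xs[0] = x0
        for i in range(n):
            xs[i + 1] = xs[i] + h * f(xs[i])
        return ts, xs

    # Exponential decay dx/dt = -x, exact solution x0 * exp(-t)
    ts, xs = euler(lambda x: -x, x0=1.0, t_max=5.0, h=0.01)
    print(np.abs(xs - np.exp(-ts)).max())  # worst-case error; shrinks with h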

Now let's think about making all this stochastic. The easiest thing to do is to add some sort of stochastic noise on the right-hand side of the differential equation: \[ \frac{dX}{dt}(t) = f(X(t)) + Z(t) \] A solution will, once again, be a function \( X \) which satisfies this equation. Since \( Z \) is a random function of time, \( X \) will also be random. But if we could somehow fix \( Z \), we'd just be back to the kind of equation we know how to solve: \[ X(t) = x_0 + \int_{0}^{t}{f(X(s)) ds} + \int_{0}^{t}{Z(s) ds} \]

To make sense of such an expression in general, we need to know how to integrate stochastic processes. In particular, we need to understand when one process, say \( \zeta \), is the time-integral of another, say \( Z \), and vice versa. If we know that, then we can actually use Euler's method to find solutions. We would, once again, start with \( X(0) = x_0 \), set \[ X(t+h) = X(t) + h f(X(t)) + \zeta(t+h) - \zeta(t) \] and linearly interpolate between the points \( t=0 \), \( t=h \), \( t=2h \), etc. We then let \( h\rightarrow 0 \) to recover an exact solution.
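Anticipating the next paragraph, where \( \zeta \) will be the Wiener process and the increments \( \zeta(t+h) - \zeta(t) \) are independent Gaussians with mean 0 and variance \( h \), this scheme (usually called the Euler-Maruyama method in that setting) is easy to sketch in Python; the code is mine, for illustration only:

    import numpy as np

    def euler_maruyama(f, x0, t_max, h, seed=None):
        # Euler-Maruyama: a deterministic Euler step plus a noise increment.
        # Assumes the integrated noise zeta is a standard Wiener process, so
        # zeta(t+h) - zeta(t) ~ N(0, h), independently across steps.
        rng = np.random.default_rng(seed)
        n = int(round(t_max / h))
        xs = np.empty(n + 1)
        xs[0] = x0
        dW = rng.normal(0.0, np.sqrt(h), size=n)  # Wiener increments
        for i in range(n):
            xs[i + 1] = xs[i] + h * f(xs[i]) + dW[i]
        return np.linspace(0.0, n * h, n + 1), xs

    # One sample path of dX = -X dt + dW
    ts, xs = euler_maruyama(lambda x: -x, x0=0.0, t_max=10.0, h=0.001)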

In fact, so long as expressions like this last one make sense, we could use them to define what it means for one stochastic process, say \( Z \), to be the time-derivative of another, \( \zeta \). This may seem crazy or, more politely, of merely mathematical interest, but there are lots of situations where \( \zeta \) is a much better behaved process than is \( Z \). The premier example is when \( \zeta \) is the Gaussian process called the Wiener process or (slightly inaccurately) Brownian motion [1], defined by \( W(0) = 0 \), \( W(t) \sim \mathcal{N}(0, t) \) (i.e., a Gaussian or "Normal" distribution with mean 0 and variance \( t \)), and the increment \( W(t) - W(s) \) being statistically independent of the increment over any other (non-overlapping) interval of time. If we try to compute the time-derivative of \( W \), we find that it is almost-surely ill-defined [2]. You might say "Well, then maybe we shouldn't use the Wiener process as something giving us noise in a differential equation", but look at what it means in the Euler scheme: over the time interval from \( t \) to \( t+h \), the trajectory moves from \( X(t) \) to \( X(t) + hf(X(t)) + W(t+h) - W(t) \). That is to say, the process follows the deterministic differential equation, plus a little Gaussian random kick --- that seems very natural! It's even natural that the variance of the Gaussian perturbation should scale with \( h \), since that's what we'd see in, say, a random walk...
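The defining properties are easy to check by simulation, since a discretized Wiener path is just a running sum of independent Gaussian increments; a quick sketch in Python (all the numbers here are arbitrary choices of mine):

    import numpy as np

    rng = np.random.default_rng(42)
    h, n_paths = 0.01, 10_000
    n = int(round(1.0 / h))
    # Each row is one sample path: the running sum of N(0, h) increments
    W = np.cumsum(rng.normal(0.0, np.sqrt(h), size=(n_paths, n)), axis=1)
    print(W[:, -1].var())          # ~ 1.0, since W(1) ~ N(0, 1)
    print(W[:, n // 2 - 1].var())  # ~ 0.5, since W(0.5) ~ N(0, 0.5)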

So there should be some way of making sense of seeing a Wiener process as an integral. There is, in fact, a whole theory of stochastic integrals, developed from the 1940s onwards by M. Loeve, K. Ito, and R. Stratonovich (all building on earlier work by, among others, N. Wiener). The theory of SDEs proper is largely due to Ito and Stratonovich, in two slightly different forms, corresponding to subtly different Euler schemes: Ito's integral amounts to evaluating the integrand at the start of each little time-step, Stratonovich's to evaluating it at the midpoint.

Most of what one encounters, in applications, as the theory of SDEs assumes that the driving noise is in fact white noise, i.e., Gaussian and uncorrelated over time. On the one hand, this is less of a restriction than it might seem, because many other natural sorts of noise process can be represented as stochastic integrals of white noise. On the other hand, the same mathematical structure can be used directly to define stochastic integrals and stochastic DEs driven by a far broader class of stochastic processes; on this topic Kallenberg is a very good introduction.

[1]: "Slightly inaccurately", because the physical process known as "Brownian motion" is actually rather more accurately described by an Ornstein-Ulhenbeck process, which is the solution of the Langevin SDE for the momentum \( \vec{P}(t) \), \( d\vec{P} = -\gamma \vec{P} dt + \delta \mathbf{I} dW \), the constants \( \gamma \) and \( \delta \) telling us about how effective friction and diffusion are (respectively). One can actually recover the Wiener process as the appropriate limit from this, so the Wiener process approximates Brownian motion, but just identifying the two is a lie told to children, or rather to probability theorists. See Selmeczi et al. 2006, arxiv:physics/0603142, and sec. 18.1 of Almost None of the Theory of Stochastic Processes. ^

[2]: The argument is so pretty that I can't resist repeating it. The time-derivative is of course \( \lim_{h\rightarrow 0}{\frac{W(t+h) - W(t)}{h}} \). By the defining properties of the Wiener process, the numerator inside the limit is a Gaussian random variable with expectation 0 and variance \( h \). The ratio in the limit is therefore a Gaussian random variable with expectation 0 and variance \( 1/h \), so the variance will blow up to infinity as \( h \) shrinks. Moreover, for any other time \( s \), no matter how close \( s \) might be to \( t \), eventually the intervals \( (s, s+h) \) and \( (t, t+h) \) will not overlap, so the increments will be statistically independent; the would-be derivatives at two arbitrarily close times would thus have to be completely unrelated.
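The blow-up is easy to watch numerically, using only the distributional fact that the increment \( W(t+h) - W(t) \) is \( \mathcal{N}(0, h) \); a sketch in Python:

    import numpy as np

    rng = np.random.default_rng(0)
    for h in (0.1, 0.01, 0.001):
        # Difference quotients (W(t+h) - W(t)) / h; each increment is N(0, h)
        quotients = rng.normal(0.0, np.sqrt(h), size=100_000) / h
        print(h, quotients.var())  # ~ 1/h, diverging as h -> 0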

