Half-Truths (at Best) about Calculus of Variations and Optimal Control

27 Jun 2023 21:11

\[ \DeclareMathOperator*{\argmin}{argmin} \]

What follows is an attempt at sketching out deliberately bad (sloppy, purely heuristic) derivations of Euler-Lagrange, the Hamiltonian, and a step towards the maximum principle. At some point I want to expand it towards including more of the maximum principle and the HJB equations. The goal here is to show no concern for rigor, but present something which a stats. undergrad familiar with calculus could grasp, and use as orientation when learning a more rigorous version. (The aspiration, in other words, is to be what Terry Pratchett called a "lie told to children".)

See the main notebook for references which do all this correctly.

From optimizing time integrals to the Euler-Lagrange equations

We'd like to optimize a function \( x \) of time \( t \) over some interval, \( x: [0,T] \mapsto \mathbb{R}^d \). More exactly, we define a function of this function by integrating: \[ J[x] \equiv \int_{0}^{T}{g(t, x(t), \dot{x}(t)) dt} \] where \( g: \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^d \mapsto \mathbb{R} \) is some sort of instantaneous cost (or benefit) per unit time. When I want to talk about the arguments to this somewhat generically, I'll write \( g(t, q, v) \), with the understanding that (usually) we're only interested in \( q = x(t) \) and \( v = \dot{x}(t) \). To start with, we want to pick the function \( x \) so as to make \( J[x] \) as small, or as large, as possible.

(Notice at this point that if the cost of a trajectory doesn't depend on the time derivative, so \( g(t, q, v) = g(t,q) \), this is easy: set \( q^*(t) = \argmin_{q\in\mathbb{R}^d}{g(t,q)} \) and use that trajectory. This will even be a smooth function of \( t \), if the extrema of \( g(t, \cdot) \) change smoothly with \( t \). So this sort of problem will generally be interesting only when the cost depends, at least partially, on the rate of change of the trajectory.)
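To make that observation concrete, here's a tiny numerical illustration with a cost function I've made up for the purpose: \( g(t, q) = (q - \sin{t})^2 \), so the pointwise argmin is \( q^*(t) = \sin{t} \), which is indeed smooth in \( t \), and tracking it minimizes the whole time-integral:

```python
import math

# Made-up velocity-free cost: g(t, q) = (q - sin t)^2; its pointwise
# argmin over q is q*(t) = sin(t).  Tracking that argmin at every
# instant minimizes the whole time-integral J (here it drives J to zero).
n = 1000
dt = 1.0 / n
ts = [i * dt for i in range(n + 1)]

def J(q):
    # Riemann-sum approximation of the integral of g(t, q(t)) dt over [0, 1]
    return sum((q(t) - math.sin(t))**2 for t in ts) * dt

J_star = J(math.sin)        # the pointwise-argmin trajectory
J_other = J(lambda t: t)    # an arbitrary competing trajectory
print(J_star == 0.0, J_star <= J_other)
```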

The classic step here is to imitate maximizing a function in \( \mathbb{R}^d \): take the derivative and set it to zero. Let's back up to remember why.

If we're minimizing \( \phi(q) \), then we're interested in points \( q^* \) where \( \phi(q^* + \epsilon r) \geq \phi(q^*) \) for all sufficiently small \( \epsilon \) and every direction \( r \), because these points will be the local minima. (More exactly, the local interior minima. Pretend there are no boundary minima for now.) So, considering small positive \( \epsilon > 0 \), \begin{eqnarray} \phi(q^* + \epsilon r) & \geq & \phi(q^*)\\ \phi(q^* + \epsilon r) - \phi(q^*) & \geq & 0\\ \frac{ \phi(q^* + \epsilon r) - \phi(q^*)}{\epsilon} & \geq & 0\\ \lim_{\epsilon \rightarrow 0}{\frac{ \phi(q^* + \epsilon r) - \phi(q^*)}{\epsilon}} & \geq & 0\\ \end{eqnarray} Now \( q^* - \epsilon r = q^* + \epsilon(-r) \) so we equally have \[ \lim_{\epsilon \rightarrow 0}{\frac{ \phi(q^* + \epsilon r) - \phi(q^*)}{\epsilon}} \leq 0 \] hence \[ \lim_{\epsilon \rightarrow 0}{\frac{ \phi(q^* + \epsilon r) - \phi(q^*)}{\epsilon}} = 0 \] The quantity on the left-hand side is the directional derivative of \( \phi \) in the direction \( r \). What we've concluded is that at an (interior) minimum (or maximum), the directional derivative needs to be zero in every direction.
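If you'd like to see this numerically (on a toy quadratic of my own choosing, not anything from the references), symmetric finite differences estimate the directional derivatives, and they all vanish at the interior minimum:

```python
import math
import random

# Toy check: phi(q) = |q - a|^2 has its interior minimum at q* = a, and
# the directional derivative there, estimated by a symmetric finite
# difference, vanishes in every (random) direction r.
a = [1.0, -2.0, 0.5]

def phi(q):
    return sum((qi - ai)**2 for qi, ai in zip(q, a))

eps = 1e-6
random.seed(0)
worst = 0.0
for _ in range(5):
    r = [random.gauss(0.0, 1.0) for _ in a]
    plus = phi([ai + eps * ri for ai, ri in zip(a, r)])
    minus = phi([ai - eps * ri for ai, ri in zip(a, r)])
    worst = max(worst, abs((plus - minus) / (2 * eps)))
print(worst < 1e-8)
```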

For functions-of-trajectories, the equivalent of a small perturbation in the direction \( r \) is a small perturbation by another trajectory, so we go from the optimal trajectory \( x \) to \( x + \epsilon r \). The equivalent of the directional derivative is the Gateaux derivative, \[ D_{r}J[x] \equiv \lim_{\epsilon\rightarrow 0}{\frac{J[x+\epsilon r] - J[x]}{\epsilon}} \]

Imitating what we just did for functions of vectors, we should have that, if \( x^* \) is a local minimum, then \[ D_{r}J[x^*] = 0 \] for all perturbing functions \( r \).

Let's look at the change in \( J \) when we move from \( x^* \) to \( x^* + \epsilon r \) for small \( \epsilon \): \begin{eqnarray} J[x^*+\epsilon r] - J[x^*] & = & \int_{0}^{T}{\left( g(t, x^*(t) + \epsilon r(t), \dot{x}^*(t) + \epsilon \dot{r}(t)) - g(t, x^*(t), \dot{x}^*(t)) \right) dt}\\ & \approx & \int_{0}^{T}{\left( \epsilon r(t) \cdot \nabla_q g(t, x^*(t), \dot{x}^*(t)) + \epsilon \dot{r}(t) \cdot \nabla_v g(t, x^*(t), \dot{x}^*(t)) \right) dt}\\ \end{eqnarray} where the approximation comes from taking a first-order Taylor series and ignoring higher-order terms, which are all (presumably) \( O(\epsilon^2) \) at most. (Remember we're interested in the limit \( \epsilon \rightarrow 0 \).)

Notation notes at this point: \( \nabla_q g \) means the gradient of \( g \) with respect to its second argument slot (the "position"), and \( \nabla_v g \) the gradient with respect to its third argument slot (the "velocity"). To save space, I'll write \( \nabla_q g^*(t) \) for \( \nabla_q g(t, x^*(t), \dot{x}^*(t)) \), i.e., the gradient evaluated along the optimal trajectory, and similarly for \( \nabla_v g^*(t) \).

So, with this abbreviated notation, \[ J[x^*+\epsilon r] - J[x^*] \approx \int_{0}^{T}{\left( \epsilon r(t) \cdot \nabla_q g^*(t) + \epsilon \dot{r}(t) \cdot \nabla_v g^*(t) \right) dt} \] which is much less of a mouthful.

Dividing through by \( \epsilon \) we get \[ D_{r}J[x^*] = \int_{0}^{T}{\left( r(t) \cdot \nabla_q g^*(t) + \dot{r}(t) \cdot \nabla_v g^*(t) \right) dt} \] This is what should equal zero, for all perturbing functions \( r \). It's a bit annoying to have the two derivative-of-\( g \) terms multiplied by different factors, \( r \) and \( \dot{r} \). The classic trick for making this go away is to use "integration by parts". \begin{eqnarray} \frac{d (ab)}{dt} & = & a\frac{db}{dt} + b\frac{da}{dt}\\ \int_0^T{\frac{d (ab)}{dt} dt} & = & \int_{0}^{T}{a \frac{db}{dt} dt} + \int_{0}^{T}{b\frac{da}{dt} dt}\\ a(T)b(T) - a(0)b(0) & = & \int_{0}^{T}{a \frac{db}{dt} dt} + \int_{0}^{T}{b \frac{da}{dt} dt}\\ \int_{0}^{T}{b \frac{da}{dt} dt} & = & a(T)b(T) - a(0)b(0) - \int_{0}^{T}{a \frac{db}{dt} dt} \end{eqnarray} Now let's use this, with \( r \) taking the role of \( a \), and \( \nabla_v g \) taking the role of \( b \): \begin{eqnarray} D_r J[x^*] & = & \int_{0}^{T}{r(t) \cdot \left( \nabla_q g^*(t) - \frac{d}{dt}\nabla_v g^*(t) \right) dt}\\ & & + r(T) \cdot \nabla_v g^*(T) - r(0) \cdot \nabla_v g^*(0) \end{eqnarray} We've gotten rid of the \( \dot{r} \) in the integral, at the cost of some terms about what happens at \( t=0 \) and \( t=T \). Now we reason as follows: we want \( D_r J[x^*] = 0 \) for all reasonable \( r \), which should definitely include perturbations which go to zero at the initial and final time, \( r(0) = r(T) = 0 \). For those perturbations, we're left with the integral: \[ D_r J[x^*] = \int_{0}^{T}{r(t) \cdot \left( \nabla_q g^*(t) - \frac{d}{dt}\nabla_v g^*(t) \right) dt} \] Guaranteeing that this is zero for all such perturbations requires that the integrand be zero: \[ \nabla_q g(t, x^*(t), \dot{x}^*(t)) - \frac{d}{dt}\nabla_v g(t, x^*(t), \dot{x}^*(t)) = 0 \] where I'm being explicit again about the arguments of \( g \).

(Suppose this last expression was instead equal to some vector \( c \neq 0 \) at a particular time \( t_{bad} \). By continuity it will be close to \( c \) over at least a small interval around \( t_{bad} \). Now consider a perturbation where \( r(t) = c \) in that interval, but \( r(t) = 0 \) outside it. [If we only want to deal with smooth perturbations, we can approximate this by having \( r \) change steeply, but smoothly, at either end of the time interval.] Clearly the integral for \( D_r J[x^*] \) with this \( r \) will be \( > 0 \), which is not allowed.)

(If you think the previous parenthesized paragraph leaves a loophole, and all we've shown is that the integrand must be zero for [Lebesgue] almost all times, you are correct. But I am trying not to expose the mysteries of measure theory before non-initiates.)

Written out coordinate by coordinate, \( i \in 1:d \), \[ \frac{\partial g}{\partial q_i}(t, x^*(t), \dot{x}^*(t)) - \frac{d}{dt}\frac{\partial g}{\partial v_i}(t, x^*(t), \dot{x}^*(t)) = 0 \] which are the Euler-Lagrange equations.

Notice that these are conditions on the optimal trajectory, so the fact that we got them by considering the limited set of perturbations where \( r(0) = r(T) = 0 \) is fine. We know those are allowed perturbations, so the optimal trajectory must obey these equations. Considering a larger class of perturbations might lead us to additional restrictions on the optimum (see below), but it can't erase this one.
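To see the Euler-Lagrange equations at work in a case where everything can be checked by hand (my own toy problem, not one from the references): take \( g(t, q, v) = v^2/2 \) on \( [0,1] \) with fixed endpoints. Then \( \nabla_q g = 0 \) and \( \nabla_v g = v \), so Euler-Lagrange reduces to \( \ddot{x} = 0 \), and the optimum is a straight line. Discretizing \( J \) and setting its gradient to zero in the interior grid points recovers exactly that:

```python
import numpy as np

# Toy Euler-Lagrange check: g(t, q, v) = v^2/2 on [0, 1] with x(0) = 0,
# x(1) = 1.  Euler-Lagrange says d/dt (dg/dv) = x'' = 0, so the optimum
# is the straight line x*(t) = t.  Stationarity of the discretized J in
# the interior grid points gives x[i-1] - 2 x[i] + x[i+1] = 0, a
# tridiagonal linear system whose solution is exactly that straight line.
N = 100                      # number of time steps
A = np.zeros((N - 1, N - 1))
b = np.zeros(N - 1)
for i in range(N - 1):
    A[i, i] = -2.0
    if i > 0:
        A[i, i - 1] = 1.0
    if i < N - 2:
        A[i, i + 1] = 1.0
b[-1] = -1.0                 # the boundary value x(1) = 1, moved to the RHS
x = np.concatenate(([0.0], np.linalg.solve(A, b), [1.0]))
t = np.linspace(0.0, 1.0, N + 1)
err = np.max(np.abs(x - t))  # distance from the straight line
print(err < 1e-8)
```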

What about the boundary terms?
We've seen that the optimal \( x^* \) has to obey the Euler-Lagrange equations, so the integral term in the expression for \( D_r J[x^*] \) must be zero. But if we pick perturbations where \( r(0) \neq 0 \) or \( r(T) \neq 0 \), the boundary terms might not be zero. Now in some cases this doesn't matter. There are plenty of problems where we are only interested in trajectories which themselves meet some boundary conditions, like \( x(0) = x_{initial} \) and \( x(T) = x_{final} \). Then the only admissible perturbations will indeed have \( r(0) = r(T) = 0 \). But if one or the other of those boundary conditions is "free", unrestricted, then the corresponding boundary term must vanish for arbitrary \( r(0) \) or \( r(T) \), and we can reason our way to saying that \begin{eqnarray} \nabla_v g(0, x^*(0), \dot{x}^*(0)) & = & 0 ~ \text{(free initial conditions)}\\ \nabla_v g(T, x^*(T), \dot{x}^*(T)) & = & 0 ~ \text{(free final conditions)} \end{eqnarray}
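Here's a numerical sanity check on the free-final-condition claim, using a problem I've cooked up for the purpose: minimize \( \int_{0}^{1}{(\dot{x}^2/2 + x^2/2) dt} \) with \( x(0) = 1 \) and \( x(1) \) free. Euler-Lagrange gives \( \ddot{x} = x \), and the free-final condition \( \nabla_v g(1) = \dot{x}(1) = 0 \) then picks out \( x^*(t) = \cosh(1-t)/\cosh(1) \):

```python
import numpy as np

# Free-endpoint toy problem: minimize the integral of (x'^2/2 + x^2/2)
# over [0, 1] with x(0) = 1 fixed and x(1) free.  Setting the gradient
# of the discretized cost to zero gives a linear system in the unknowns
# x_1, ..., x_N (x_0 = 1 is fixed); stationarity in the free endpoint
# x_N is the discrete version of the condition x'(1) = 0.
N = 1000
dt = 1.0 / N
A = np.zeros((N, N))
b = np.zeros(N)
for j in range(N - 1):
    # interior stationarity: -x_j + (2 + dt^2) x_{j+1} - x_{j+2} = 0
    A[j, j] = 2.0 + dt**2
    if j > 0:
        A[j, j - 1] = -1.0
    A[j, j + 1] = -1.0
b[0] = 1.0                     # the fixed boundary value x_0 = 1
A[N - 1, N - 1] = 1.0          # free-endpoint stationarity: x_N - x_{N-1} = 0,
A[N - 1, N - 2] = -1.0         # i.e. a one-sided discrete x'(1) = 0
x = np.concatenate(([1.0], np.linalg.solve(A, b)))
t = np.linspace(0.0, 1.0, N + 1)
err = np.max(np.abs(x - np.cosh(1.0 - t) / np.cosh(1.0)))
print(err < 1e-2)   # first-order-accurate boundary condition, so O(dt) error
```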

From Euler-Lagrange to the Hamiltonian

I will now introduce a \( k \)-dimensional control input \( u \), which works by influencing the time-evolution of the state \( x \), but which also shows up in the cost or gain: \[ \dot{x}(t) = f(x(t), u(t)) \] and \[ J[x, u] \equiv \int_{0}^{T}{g(t, x(t), u(t)) dt} \] with \( x(0) = x_{initial} \) fixed. (You might ask about time-dependent dynamics, \( \dot{x}(t) = f(t, x(t), u(t)) \), but that can be accommodated by adding an extra coordinate to the state, say \( x_{d+1} \), where \( x_{d+1}(0) = 0 \) and \( \dot{x}_{d+1} = 1 \) always.)

Trusting in the magic of Lagrange multipliers, let's try to optimize \[ L[x, u, p] = \int_{0}^{T}{\left( g(t, x(t), u(t)) - p(t) \cdot (f(x(t), u(t)) - \dot{x}(t)) \right) dt} \] introducing a vector of \( d \) time-varying Lagrange multipliers \( p(t) \). (The multiplier term vanishes whenever the dynamics are obeyed, so on feasible trajectories \( L = J \).)

Now what happens if we move from the optimal control signal \( u^* \) to \( u^* + \epsilon r \)? Well, we'll move from \( x^* \) to, say, \( x^* + \delta \). We can treat \( \delta \) as independent of \( r \), not because it really is, but because the Lagrange multiplier is enforcing the constraint that links them. We can limit ourselves, as above, to situations where \( r(0) = r(T) = 0 \), and \( \delta(0) = \delta(T) = 0 \).

Taylor-expanding to first order, and writing \( g^*(t) \) and \( f^*(t) \) as above, \begin{eqnarray} L[x^*+\delta, u^*+\epsilon r, p^*] - L[x^*, u^*, p^*] & \approx & \int_{0}^{T}{\left( \delta(t) \cdot \nabla_q g^*(t) + \epsilon r (t) \cdot \nabla_v g^*(t) - p^*(t) \cdot \left( \nabla_q f^*(t) \delta(t) + \epsilon \nabla_v f^*(t) r(t)- \dot{\delta}(t) \right) \right) dt} \end{eqnarray}

(I am abusing notation even more by writing things like \( \nabla_q f \): since \( f \) is a \( \mathbb{R}^d \)-valued vector, this should be understood as a \( d \times d \) matrix. Similarly, \( \nabla_v f \) is a \( d \times k \) matrix.)

Now let's use integration by parts again: \[ \int_{0}^{T}{p(t) \cdot \dot{\delta}(t) dt} = p(T) \cdot \delta(T) - p(0) \cdot \delta(0) - \int_{0}^{T}{\delta(t) \cdot \dot{p}(t) dt} \] But remember we're (temporarily) limiting ourselves to perturbations where \( \delta(0) = \delta(T) = 0 \). So we can get rid of the annoying \( \dot{\delta}(t) \) term, at the cost of a \( \dot{p}(t) \) term: \begin{eqnarray} L[x^*+\delta, u^*+\epsilon r, p^*] - L[x^*, u^*, p^*] & \approx & \int_{0}^{T}{\left( \delta(t) \cdot \nabla_q g^*(t) + \epsilon r (t) \cdot \nabla_v g^*(t) - p^*(t) \cdot \left( \nabla_q f^*(t) \delta(t) + \epsilon \nabla_v f^*(t) r(t) \right) - \dot{p}^*(t) \cdot \delta(t) \right) dt}\\ & = & \int_{0}^{T}{\left( \delta(t) \cdot \left[ \nabla_q g^*(t) - (\nabla_q f^*(t))^T p^*(t) - \dot{p}^*(t) \right] + \epsilon r(t) \cdot \left[ \nabla_v g^*(t) - (\nabla_v f^*(t))^T p^*(t)\right] \right) dt} \end{eqnarray}

For this integral to be zero, we should have both the terms in square brackets be zero: \begin{eqnarray} \nabla_q g^*(t) - (\nabla_q f^*(t))^T p^*(t) & = & \dot{p}^*(t)\\ \nabla_v g^*(t) - (\nabla_v f^*(t))^T p^*(t) & = & 0 \end{eqnarray}

Matters actually become more illuminating if we introduce the at-first-mysterious-seeming Hamiltonian \[ H(x, u, p) \equiv -g(t,x, u) + p \cdot f(x, u) \]

This gives us \[ \int_{0}^{T}{\left(\delta(t) \cdot \left(\nabla_q H^*(t) + \dot{p}^*(t)\right) + \epsilon r(t) \cdot \nabla_v H^*(t) \right) dt} = 0 \] If we pick the Lagrange multipliers to satisfy \[ \dot{p} = - \nabla_q H \] then the first part of the integral is zero for every perturbation \( \delta(t) \), and the over-all integral simplifies to \[ \int_{0}^{T}{\epsilon r(t) \cdot \nabla_v H^*(t) dt} = 0 \] Making this zero for all perturbations \( r(t) \) requires \[ \nabla_v H(x^*(t), u^*(t), p^*(t)) = 0 \] at all times. That is, we need to pick the control \( u^*(t) \) so as to maximize the Hamiltonian at time \( t \) (or at least render it stationary).

Let me also remark that \[ \dot{x} = \nabla_p H = f(x, u) \] from the definition.

So, to sum up, \begin{eqnarray} H(x, u, p) & \equiv & -g(t,x, u) + p \cdot f(x, u)\\ 0 & = & \nabla_u H(x^*(t), u^*(t), p^*(t))\\ \dot{x}^*(t) & = & \nabla_p H(x^*(t), u^*(t), p^*(t))\\ \dot{p}^*(t) & = & - \nabla_q H(x^*(t), u^*(t), p^*(t))\\ \end{eqnarray}

Given the current state \( x^*(t) \) and the current Lagrange multipliers \( p^*(t) \), we pick the control \( u^*(t) \) to maximize the Hamiltonian. Once we know the control, the gradients of the Hamiltonian tell us the rates of change in both the state variables and the Lagrange multipliers. Remarkably enough, the Hamiltonian which tells us how to move along the optimal trajectory is a function of the current variables alone.
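To make this concrete, here's a small worked example of my own (a scalar linear-quadratic problem, not something from the references): minimize \( \int_{0}^{1}{(x^2+u^2)/2 \, dt} \) with \( \dot{x} = u \) and \( x(0) = 1 \). Then \( H = -(x^2+u^2)/2 + pu \), maximizing over \( u \) gives \( u^* = p \), and the canonical equations are \( \dot{x} = p \), \( \dot{p} = x \). For the free endpoint I'm assuming the transversality condition \( p(1) = 0 \), which comes from a boundary-term argument like the one in the Euler-Lagrange section; shooting on \( p(0) \) then pins down the optimal trajectory:

```python
import math

# Scalar linear-quadratic sketch: minimize the integral of (x^2 + u^2)/2
# over [0, 1] with dx/dt = u, x(0) = 1, x(1) free.
# H = -(x^2 + u^2)/2 + p*u, so dH/du = 0 gives u* = p, and the canonical
# equations are dx/dt = dH/dp = p, dp/dt = -dH/dx = x.  Assumed here:
# the transversality condition p(1) = 0 for the free endpoint.

def integrate(p0, n=1000):
    """RK4 for (dx/dt, dp/dt) = (p, x) from (1, p0); returns (x(1), p(1))."""
    h = 1.0 / n
    x, p = 1.0, p0
    for _ in range(n):
        k1x, k1p = p, x
        k2x, k2p = p + h/2*k1p, x + h/2*k1x
        k3x, k3p = p + h/2*k2p, x + h/2*k2x
        k4x, k4p = p + h*k3p, x + h*k3x
        x += h/6*(k1x + 2*k2x + 2*k3x + k4x)
        p += h/6*(k1p + 2*k2p + 2*k3p + k4p)
    return x, p

# Shooting: bisect on the unknown initial costate p(0) until p(1) = 0.
lo, hi = -2.0, 0.0
for _ in range(60):
    mid = (lo + hi) / 2
    if integrate(mid)[1] > 0:
        hi = mid
    else:
        lo = mid
p0 = (lo + hi) / 2
x1, p1 = integrate(p0)
# Closed form for this problem: p(0) = -tanh(1), x(1) = 1/cosh(1).
print(abs(p0 + math.tanh(1.0)) < 1e-8,
      abs(x1 - 1.0 / math.cosh(1.0)) < 1e-8)
```

Notice that the control at each instant really is read off from the current \( (x, p) \) pair alone, just as the summary above promises.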

Rate of change of the Hamiltonian

How does the Hamiltonian change along the optimal trajectory? \begin{eqnarray} \frac{dH^*}{dt} & = & \frac{\partial H^*}{\partial t} + \nabla_q H^* \cdot \dot{x}^* + \nabla_u H^* \cdot \dot{u}^* + \nabla_p H^* \cdot \dot{p}^*\\ & = & - \frac{\partial g(t, x^*(t), u^*(t))}{\partial t} + (\nabla_q H^*) \cdot \nabla_p H^* + 0 - (\nabla_p H^*)\cdot \nabla_q H^*\\ & = & - \frac{\partial g(t, x^*(t), u^*(t))}{\partial t} \end{eqnarray} The rate of change for the Hamiltonian just comes from the explicit time-dependence of the instantaneous cost/benefit function \( g \). If that's time-independent, then the Hamiltonian is constant along the optimal trajectory. If \( g(t, x, u) = e^{-\rho t} \gamma(x, u) \), so we're looking at some sort of net present value with a discount rate of \( \rho \), then \( \partial g/\partial t = -\rho g \), and \begin{eqnarray} \frac{dH^*}{dt} & = & (-\rho)( - g^*(t))\\ & = & (-\rho) (H^*(t) - p^*(t) \cdot f(x^*(t), u^*(t)))\\ H^*(t) &= & H^*(0) - \rho\int_{0}^{t}{H^*(s) ds} + \rho \int_{0}^{t}{p^*(s) \cdot f(x^*(s), u^*(s)) ds} \end{eqnarray} If the second integral wasn't there, this'd just be exponential decay (or growth), so it's exponential decay with an input term...
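A quick numerical corroboration on a made-up scalar example: take \( g = (x^2+u^2)/2 \) and \( f(x, u) = u \). Maximizing \( H \) gives \( u^* = p \), so \( H^* = (p^2 - x^2)/2 \), and the canonical equations are \( \dot{x} = p \), \( \dot{p} = x \). Since this \( g \) has no explicit time-dependence, \( H^* \) should be constant along the trajectory:

```python
import math

# g = (x^2 + u^2)/2, f(x, u) = u: maximizing H gives u* = p, so
# H* = -(x^2 + p^2)/2 + p^2 = (p^2 - x^2)/2, with canonical equations
# dx/dt = p, dp/dt = x.  With no explicit time-dependence in g, H*
# should stay constant along the solution; we check this under RK4.
h, n = 1e-3, 1000
x, p = 1.0, -math.tanh(1.0)   # closed-form initial costate making p(1) = 0
H0 = (p**2 - x**2) / 2
drift = 0.0
for _ in range(n):
    # one RK4 step of (dx/dt, dp/dt) = (p, x)
    k1x, k1p = p, x
    k2x, k2p = p + h/2*k1p, x + h/2*k1x
    k3x, k3p = p + h/2*k2p, x + h/2*k2x
    k4x, k4p = p + h*k3p, x + h*k3x
    x += h/6*(k1x + 2*k2x + 2*k3x + k4x)
    p += h/6*(k1p + 2*k2p + 2*k3p + k4p)
    drift = max(drift, abs((p**2 - x**2) / 2 - H0))
print(drift < 1e-9)
```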