## Half-Truths (at Best) about Calculus of Variations and Optimal Control

*27 Jun 2023 21:11*

\[ \DeclareMathOperator*{\argmin}{argmin} \]

What follows is an attempt at sketching
out *deliberately bad* (sloppy, purely heuristic) derivations of
Euler-Lagrange, the Hamiltonian, and a step towards the maximum principle. At
some point I want to expand it towards including more of the maximum
principle and the HJB equations. The goal here is to show no concern for
rigor, but present something which a stats. undergrad familiar with
calculus could grasp, and use as orientation when learning a more
rigorous version. (The aspiration, in other words, is to be what Terry
Pratchett called a "lie told to children".)

See the main notebook for references which do all this *correctly*.

- From optimizing time integrals to the Euler-Lagrange equations
- From Euler-Lagrange to the Hamiltonian
- Rate of change of the Hamiltonian

#### From optimizing time integrals to the Euler-Lagrange equations

We'd like to optimize a function \( x \) of time \( t \) over some interval, \( x: [0,T] \to \mathbb{R}^d \). More exactly, we define a function of this function by integrating: \[ J[x] \equiv \int_{0}^{T}{g(t, x(t), \dot{x}(t)) dt} \] where \( g: \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R} \) is some sort of instantaneous cost (or benefit) per unit time. When I want to talk about the arguments of \( g \) somewhat generically, I'll write \( g(t, q, v) \), with the understanding that (usually) we're only interested in \( q = x(t) \) and \( v = \dot{x}(t) \). To start with, we want to pick the function \( x \) so as to make \( J[x] \) as small, or as large, as possible. (Notice at this point that if the cost of a trajectory doesn't depend on the time derivative, so \( g(t, q, v) = g(t,q) \), this is easy: set \( q^*(t) = \argmin_{q\in\mathbb{R}^d}{g(t,q)} \) and use that trajectory. This will even be a smooth function of \( t \), if the extrema of \( g(t, \cdot) \) change smoothly with \( t \). So this sort of problem will generally be interesting only when the cost depends, at least partially, on the rate of change of the trajectory.)
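To make this concrete, here's a quick numerical sketch (my own illustration, with a made-up cost and trajectory, not anything from the references): approximate \( J[x] \) by a Riemann sum, estimating \( \dot{x}(t) \) with a finite difference.

```python
# Approximate J[x] = int_0^T g(t, x(t), x'(t)) dt by a Riemann sum,
# using a forward difference for the velocity x'(t).
def J(g, x, T, n=100_000):
    h = T / n
    total = 0.0
    for i in range(n):
        t = i * h
        v = (x(t + h) - x(t)) / h       # finite-difference velocity
        total += g(t, x(t), v) * h
    return total

# Toy example: g(t, q, v) = v^2 and x(t) = t^2 on [0, 1],
# so J = int_0^1 (2t)^2 dt = 4/3.
print(J(lambda t, q, v: v * v, lambda t: t * t, 1.0))  # ≈ 1.3333
```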

The classic step here is to imitate maximizing a function in \( \mathbb{R}^d \): take the derivative and set it to zero. Let's back up to remember why.

If we're minimizing \( \phi(q) \), then we're interested in points \( q^* \)
where \( \phi(q^* + \epsilon r) \geq \phi(q^*) \) for all sufficiently small \(
\epsilon \) and every direction \( r \), because these points will be the local
minima. (More exactly, the local *interior* minima. Pretend there are
no boundary minima for now.) So, considering small positive \( \epsilon > 0 \),
\begin{eqnarray}
\phi(q^* + \epsilon r) & \geq & \phi(q^*)\\
\phi(q^* + \epsilon r) - \phi(q^*) & \geq & 0\\
\frac{ \phi(q^* + \epsilon r) - \phi(q^*)}{\epsilon} & \geq & 0\\
\lim_{\epsilon \rightarrow 0}{\frac{ \phi(q^* + \epsilon r) - \phi(q^*)}{\epsilon}} & \geq & 0\\
\end{eqnarray}
Now \( q^* - \epsilon r = q^* + \epsilon(-r) \) so we equally have
\[
\lim_{\epsilon \rightarrow 0}{\frac{ \phi(q^* + \epsilon r) - \phi(q^*)}{\epsilon}} \leq 0
\]
hence
\[
\lim_{\epsilon \rightarrow 0}{\frac{ \phi(q^* + \epsilon r) - \phi(q^*)}{\epsilon}} = 0
\]
The quantity on the left-hand side is the directional derivative of \( \phi \) in the direction \( r \). What we've concluded is that at an (interior) minimum (or maximum), the directional derivative needs to be zero in every direction.
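This is easy to check numerically, with a toy function of my own choosing: at an interior minimum, finite-difference directional derivatives come out (approximately) zero in every direction.

```python
# phi(q) = (q_1 - 1)^2 + q_2^2 has its interior minimum at q* = (1, 0).
def dir_deriv(phi, q, r, eps=1e-6):
    """Finite-difference directional derivative of phi at q in direction r."""
    q_shifted = [qi + eps * ri for qi, ri in zip(q, r)]
    return (phi(q_shifted) - phi(q)) / eps

phi = lambda q: (q[0] - 1.0) ** 2 + q[1] ** 2
for r in [(1, 0), (0, 1), (1, 1), (-2, 3)]:
    print(dir_deriv(phi, [1.0, 0.0], list(r)))  # all ≈ 0
```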

For functions-of-trajectories, the equivalent of a small perturbation in the direction \( r \) is a small perturbation by another trajectory, so we go from the optimal trajectory \( x \) to \( x + \epsilon r \). The equivalent of the directional derivative is the Frechet derivative, \[ D_{r}J[x] \equiv \lim_{\epsilon\rightarrow 0}{\frac{J[x+\epsilon r] - J[x]}{\epsilon}} \]

Imitating what we just did for functions of vectors, we should have that, if \( x^* \) is a local minimum, then \[ D_{r}J[x^*] = 0 \] for all perturbing functions \( r \).
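We can also compute a Frechet derivative numerically, on a trajectory that is *not* optimal, where \( D_r J \) needn't vanish. Toy setup of my own: \( g(t, q, v) = q^2 + v^2 \), \( x(t) = t \), \( r(t) = \sin(\pi t) \) on \( [0,1] \), for which a little calculus gives \( D_r J[x] = \int_0^1 \left(2 t \sin \pi t + 2\pi \cos \pi t\right) dt = 2/\pi \).

```python
import math

def J(x, n=20_000, T=1.0):
    """Riemann-sum approximation of int_0^T (x(t)^2 + x'(t)^2) dt."""
    h = T / n
    total = 0.0
    for i in range(n):
        t = i * h
        v = (x(t + h) - x(t)) / h
        total += (x(t) ** 2 + v ** 2) * h
    return total

x = lambda t: t
r = lambda t: math.sin(math.pi * t)
eps = 1e-4
# Frechet derivative in direction r, by finite differences:
DrJ = (J(lambda t: x(t) + eps * r(t)) - J(x)) / eps
print(DrJ, 2 / math.pi)   # both ≈ 0.6366
```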

Let's look at the change in \( J \) when we move from \( x^* \) to \( x^* + \epsilon r \) for small \( \epsilon \): \begin{eqnarray} J[x^*+\epsilon r] - J[x^*] & = & \int_{0}^{T}{\left( g(t, x^*(t) + \epsilon r(t), \dot{x}^*(t) + \epsilon \dot{r}(t)) - g(t, x^*(t), \dot{x}^*(t)) \right) dt}\\ & \approx & \int_{0}^{T}{\left( \epsilon r(t) \cdot \nabla_q g(t, x^*(t), \dot{x}^*(t)) + \epsilon \dot{r}(t) \cdot \nabla_v g(t, x^*(t), \dot{x}^*(t)) \right) dt} \end{eqnarray} where the approximation comes from taking a first-order Taylor series and ignoring higher-order terms, which are all (presumably) \( O(\epsilon^2) \) at most. (Remember we're interested in the limit \( \epsilon \rightarrow 0 \).)

Notation notes at this point:

- I'm using \( \nabla_q g(t, q, v) \) to mean the vector of \( d \) partial
derivatives of \( g \) with respect to the \( d \) coordinates of \( q \)
- Similarly for \( \nabla_v g(t, q, v) \).
- From here on, when I write \( g^*(t) \), it'll be short-hand for \( g(t, x^*(t), \dot{x}^*(t)) \), and similarly for the gradients.

So, with this abbreviated notation, \[ J[x^*+\epsilon r] - J[x^*] \approx \int_{0}^{T}{\left( \epsilon r(t) \cdot \nabla_q g^*(t) + \epsilon \dot{r}(t) \cdot \nabla_v g^*(t) \right) dt} \] which is much less of a mouthful.

Dividing through by \( \epsilon \) we get \[ D_{r}J[x^*] = \int_{0}^{T}{\left( r(t) \cdot \nabla_q g^*(t) + \dot{r}(t) \cdot \nabla_v g^*(t) \right) dt} \] This is what should equal zero, for all perturbing functions \( r \). It's a bit annoying to have the two derivative-of-\( g \) terms multiplied by different factors. The classic trick for making this go away is to use "integration by parts": \begin{eqnarray} \frac{d (ab)}{dt} & = & a\frac{db}{dt} + b\frac{da}{dt}\\ \int_0^T{\frac{d (ab)}{dt} dt} & = & \int_{0}^{T}{a \frac{db}{dt} dt} + \int_{0}^{T}{b\frac{da}{dt} dt}\\ a(T)b(T) - a(0)b(0) & = & \int_{0}^{T}{a \frac{db}{dt} dt} + \int_{0}^{T}{b \frac{da}{dt} dt}\\ \int_{0}^{T}{b \frac{da}{dt} dt} & = & a(T)b(T) - a(0)b(0) - \int_{0}^{T}{a \frac{db}{dt} dt} \end{eqnarray} Now let's use this, with \( r \) taking the role of \( a \), and \( \nabla_v g^* \) taking the role of \( b \): \begin{eqnarray} D_r J[x^*] & = & \int_{0}^{T}{r(t) \cdot \left( \nabla_q g^*(t) - \frac{d}{dt}\nabla_v g^*(t) \right) dt}\\ & & + r(T) \cdot \nabla_v g^*(T) - r(0) \cdot \nabla_v g^*(0) \end{eqnarray} We've gotten rid of the \( \dot{r} \) in the integral, at the cost of some terms about what happens at \( t=0 \) and \( t=T \). Now we reason as follows: we want \( D_r J[x^*] = 0 \) for all reasonable \( r \), which should definitely include perturbations which go to zero at the initial and final time, \( r(0) = r(T) = 0 \). For those perturbations, we're left with the integral: \[ D_r J[x^*] = \int_{0}^{T}{r(t) \cdot \left( \nabla_q g^*(t) - \frac{d}{dt}\nabla_v g^*(t) \right) dt} \] Guaranteeing that this is zero for all such perturbations requires that the integrand be zero: \[ \nabla_q g(t, x^*(t), \dot{x}^*(t)) - \frac{d}{dt}\nabla_v g(t, x^*(t), \dot{x}^*(t)) = 0 \] where I'm being explicit again about the arguments of \( g \).

(Suppose this last expression was instead equal to some vector \( c \neq 0 \) at
a particular time \( t_{bad} \). By continuity it will be *close* to \(
c \) over at least a small interval around \( t_{bad} \). Now consider a
perturbation where \( r(t) = c \) in that interval, but \( r(t) = 0 \) outside
it. [If we only want to deal with smooth perturbations, we can approximate
this by having \( r \) change steeply, but smoothly, at either end of the time
interval.] Clearly the integral for \( D_r J[x^*] \) with this \( r \) will be
\( > 0 \), which is not allowed.)

(If you think the previous parenthesized paragraph leaves a loophole, and
all we've shown is that the integrand must be zero for [Lebesgue] *almost
all* times, you are correct. But I am trying not to expose the mysteries
of measure theory before non-initiates.)

Written out coordinate by coordinate, \( i \in 1:d \),
\[
\frac{\partial g}{\partial q_i}(t, x^*(t), \dot{x}^*(t)) - \frac{d}{dt}\frac{\partial g}{\partial v_i}(t, x^*(t), \dot{x}^*(t)) = 0
\]
which are the **Euler-Lagrange equations**.
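As a sanity check, here's a toy problem of my own devising: take \( g(t, q, v) = v^2 \) with \( x(0) = 0 \), \( x(1) = 1 \). The Euler-Lagrange equations say \( \frac{d}{dt}(2\dot{x}) = 0 \), so the optimum is the straight line \( x(t) = t \). Minimizing a discretized \( J \) by coordinate descent (repeatedly setting each interior grid value to minimize \( J \) with the others held fixed) recovers exactly that.

```python
# Discretize J[x] = int_0^1 x'(t)^2 dt on a grid; the minimizer over a
# single coordinate x[i] (others fixed) is the average of its neighbours.
n = 50
h = 1.0 / n
x = [0.0] * (n + 1)
x[n] = 1.0                      # boundary conditions x(0) = 0, x(1) = 1
for _ in range(5_000):          # coordinate-descent sweeps
    for i in range(1, n):
        x[i] = 0.5 * (x[i - 1] + x[i + 1])
err = max(abs(x[i] - i * h) for i in range(n + 1))
print(err)   # ≈ 0: the discrete minimizer is the straight line x(t) = t
```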

Notice that these are conditions on the optimal trajectory, so the fact that
we got them by considering the limited set of perturbations where \( r(0) =
r(T) = 0 \) is fine. We know those are allowed perturbations, so the optimal
trajectory *must* obey these equations. Considering a *larger*
class of perturbations might lead us to additional restrictions on the optimum
(see below), but it can't *erase* this one.

##### What about the boundary terms?

We've seen that the optimal \( x^* \) has to obey the Euler-Lagrange equations, so the integral term in the expression for \( D_r J[x^*] \) must be zero. But if we pick perturbations where \( r(0) \neq 0 \) or \( r(T) \neq 0 \), the boundary terms might not be zero. Now in some cases this doesn't matter. There are plenty of problems where we are only interested in trajectories which themselves meet some boundary conditions, like \( x(0) = x_{initial} \) and \( x(T) = x_{final} \). Then the only admissible perturbations will indeed have \( r(0) = r(T) = 0 \). But if one or the other of those boundary conditions is "free", unrestricted, then the corresponding boundary term must vanish for *arbitrary* \( r(0) \) or \( r(T) \), which forces \begin{eqnarray} \nabla_v g(0, x^*(0), \dot{x}^*(0)) & = & 0 ~ \text{(free initial conditions)}\\ \nabla_v g(T, x^*(T), \dot{x}^*(T)) & = & 0 ~ \text{(free final conditions)} \end{eqnarray}

#### From Euler-Lagrange to the Hamiltonian

I will now introduce a \( k \)-dimensional control input \( u \), which works by influencing the time-evolution of the state \( x \), but which also shows up in the cost or gain: \[ \dot{x}(t) = f(x(t), u(t)) \] and \[ J[x, u] \equiv \int_{0}^{T}{g(t, x(t), u(t)) dt} \] with \( x(0) = x_{initial} \) fixed. (You might ask about time-dependent dynamics, \( \dot{x}(t) = f(t, x(t), u(t)) \), but that can be accommodated by adding an extra coordinate to the state, say \( x_{d+1} \), where \( x_{d+1}(0) = 0 \) and \( \dot{x}_{d+1} = 1 \) always.) Trusting in the magic of Lagrange multipliers, let's try to optimize \[ L[x, u, p] = \int_{0}^{T}{\left( g(t, x(t), u(t)) - p(t) \cdot (f(x(t), u(t)) - \dot{x}(t)) \right) dt} \] introducing a vector of \( d \) time-varying Lagrange multipliers \( p(t) \).

Now what happens if we move from the optimal control signal \( u^* \) to \( u^* + \epsilon r \)? Well, we'll move from \( x^* \) to, say, \( x^* + \delta \). We can treat \( \delta \) as independent of \( r \), not because it really is, but because the Lagrange multiplier is enforcing the constraint that links them. We can limit ourselves, as above, to situations where \( r(0) = r(T) = 0 \), and \( \delta(0) = \delta(T) = 0 \).

Taylor-expanding to first order, and writing \( g^*(t) \) and \( f^*(t) \) as above, \begin{eqnarray} L[x^*+\delta, u^*+\epsilon r, p^*] - L[x^*, u^*, p^*] & \approx & \int_{0}^{T}{\left( \delta(t) \cdot \nabla_q g^*(t) + \epsilon r (t) \cdot \nabla_v g^*(t) - p^*(t) \cdot \left( \nabla_q f^*(t) \delta(t) + \epsilon \nabla_v f^*(t) r(t)- \dot{\delta}(t) \right) \right) dt} \end{eqnarray}

(I am abusing notation *even more* by writing things like \( \nabla_q f \): since \( f \) is an \( \mathbb{R}^d \)-valued function, this should be understood as a \( d \times d \) matrix. Similarly, \( \nabla_v f \) is a \( d \times k \) matrix.)

Now let's use integration by parts again: \[ \int_{0}^{T}{p(t) \cdot \dot{\delta}(t) dt} = p(T) \cdot \delta(T) - p(0) \cdot \delta(0) - \int_{0}^{T}{\delta(t) \cdot \dot{p}(t) dt} \] But remember we're (temporarily) limiting ourselves to perturbations where \( \delta(0) = \delta(T) = 0 \). So we can get rid of the annoying \( \dot{\delta}(t) \) term, at the cost of a \( \dot{p}(t) \) term: \begin{eqnarray} L[x^*+\delta, u^*+\epsilon r, p^*] - L[x^*, u^*, p^*] & \approx & \int_{0}^{T}{\left( \delta(t) \cdot \nabla_q g^*(t) + \epsilon r (t) \cdot \nabla_v g^*(t) - p^*(t) \cdot \left( \nabla_q f^*(t) \delta(t) + \epsilon \nabla_v f^*(t) r(t) \right) - \delta(t) \cdot \dot{p}^*(t) \right) dt}\\ & = & \int_{0}^{T}{\left( \delta(t) \cdot \left[ \nabla_q g^*(t) - (\nabla_q f^*(t))^T p^*(t) - \dot{p}^*(t) \right] + \epsilon r(t) \cdot \left[ \nabla_v g^*(t) - (\nabla_v f^*(t))^T p^*(t)\right] \right) dt} \end{eqnarray}

For this integral to be zero, we should have both the terms in square brackets be zero: \begin{eqnarray} \nabla_q g^*(t) - (\nabla_q f^*(t))^T p^*(t) & = & \dot{p}^*(t)\\ \nabla_v g^*(t) - (\nabla_v f^*(t))^T p^*(t) & = & 0 \end{eqnarray}

Matters actually become more illuminating if we introduce the at-first-mysterious-seeming **Hamiltonian**
\[
H(x, u, p) \equiv -g(t, x, u) + p \cdot f(x, u)
\]
(Strictly speaking, \( H \) also depends on \( t \), through \( g \); I'll mostly suppress that argument.)

This gives us \[ \int_{0}^{T}{\left(\delta(t) \cdot \left(\nabla_q H^*(t) + \dot{p}^*(t)\right) + \epsilon r(t) \cdot \nabla_v H^*(t) \right) dt} = 0 \] If we pick the Lagrange multipliers to satisfy \[ \dot{p} = - \nabla_q H \] then the first part of the integral is zero for every perturbation \( \delta(t) \), and the over-all integral simplifies to \[ \int_{0}^{T}{\epsilon r(t) \cdot \nabla_v H^*(t) dt} = 0 \] Making this zero for all perturbations \( r(t) \) requires \[ \nabla_v H(x^*(t), u^*(t), p^*(t)) = 0 \] at all times. That is, we need to pick the control \( u^*(t) \) so as to maximize the Hamiltonian at time \( t \) (or at least render it stationary).

Let me also remark that \[ \dot{x} = \nabla_p H = f(x, u) \] from the definition.

So, to sum up, \begin{eqnarray} H(x, u, p) & \equiv & -g(t, x, u) + p \cdot f(x, u)\\ 0 & = & \nabla_u H(x^*(t), u^*(t), p^*(t))\\ \dot{x}^*(t) & = & \nabla_p H(x^*(t), u^*(t), p^*(t))\\ \dot{p}^*(t) & = & - \nabla_q H(x^*(t), u^*(t), p^*(t)) \end{eqnarray}

Given the current state \( x^*(t) \) and the current Lagrange multipliers \(
p^*(t) \), we pick the control \( u^*(t) \) to maximize the Hamiltonian. Once
we know the control, the gradients of the Hamiltonian tell us the rates of
change in both the state variables and the Lagrange multipliers. Remarkably
enough, the Hamiltonian which tells us how to move along the optimal trajectory
is a function of the *current* variables alone.
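Here's a toy instance (my own, not from any reference): scalar state and control, \( g(t, x, u) = \tfrac{1}{2}(x^2 + u^2) \), dynamics \( \dot{x} = u \). Then \( H = -\tfrac{1}{2}(x^2 + u^2) + pu \); the stationarity condition \( \nabla_u H = -u + p = 0 \) gives \( u = p \), and the Hamiltonian equations reduce to \( \dot{x} = p \), \( \dot{p} = x \), i.e. \( \ddot{x} = x \), with exact solutions \( x(t) = Ae^t + Be^{-t} \), \( p(t) = Ae^t - Be^{-t} \). A quick numerical integration confirms this.

```python
import math

def rk4_step(x, p, dt):
    """One Runge-Kutta 4 step for the Hamiltonian system (x', p') = (p, x)."""
    def f(x, p):
        return p, x
    k1 = f(x, p)
    k2 = f(x + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1])
    k3 = f(x + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1])
    k4 = f(x + dt * k3[0], p + dt * k3[1])
    x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    p += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, p

A, B = 0.25, 0.75
x, p = A + B, A - B             # exact values at t = 0
dt = 1e-3
for _ in range(1000):           # integrate out to T = 1
    x, p = rk4_step(x, p, dt)
print(x - (A * math.e + B / math.e))   # ≈ 0: matches x(1) = A e + B/e
```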

#### Rate of change of the Hamiltonian

How does the Hamiltonian change along the optimal trajectory?
\begin{eqnarray}
\frac{dH^*}{dt} & = & \frac{\partial H^*}{\partial t} + \nabla_q H^* \cdot \dot{x}^* + \nabla_u H^* \cdot \dot{u}^* + \nabla_p H^* \cdot \dot{p}^*\\
& = & - \frac{\partial g(t, x^*(t), u^*(t))}{\partial t} + (\nabla_q H^*) \cdot \nabla_p H^* + 0 - (\nabla_p H^*)\cdot \nabla_q H^*\\
& = & - \frac{\partial g(t, x^*(t), u^*(t))}{\partial t}
\end{eqnarray}
The rate of change for the Hamiltonian just comes from the *explicit* time-dependence of the instantaneous cost/benefit function \( g \). If that's time-independent, then the Hamiltonian is constant along the optimal trajectory. If \( g(t, x, u) = e^{-\rho t} \gamma(x, u) \), so we're looking at some sort of net present value with a discount rate of \( \rho \), then \( \partial g / \partial t = -\rho g \), and \begin{eqnarray} \frac{dH^*}{dt} & = & (-\rho)( - g^*(t))\\ & = & (-\rho) (H^*(t) - p^*(t) \cdot f(x^*(t), u^*(t)))\\ H^*(t) &= & H^*(0) - \rho\int_{0}^{t}{H^*(s) ds} + \rho \int_{0}^{t}{p^*(s) \cdot f(x^*(s), u^*(s)) ds} \end{eqnarray} If the last term wasn't there, this'd just be exponential decay (or growth), so it's exponential decay with an input term...
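The constancy claim is easy to check on a toy problem of my own (not from the references): \( g = \tfrac{1}{2}(x^2 + u^2) \) with \( \dot{x} = u \), so \( g \) has no explicit time-dependence. The Hamiltonian conditions give \( u = p = \dot{x} \) and \( \ddot{x} = x \), so the optimal trajectories are \( x(t) = Ae^t + Be^{-t} \), \( p(t) = Ae^t - Be^{-t} \), and along them \( H = -\tfrac{1}{2}(x^2 + u^2) + pu = \tfrac{1}{2}(p^2 - x^2) \). Sure enough, this is constant in \( t \):

```python
import math

A, B = 0.25, 0.75               # arbitrary constants in x(t) = A e^t + B e^{-t}

def H(t):
    """Hamiltonian (p^2 - x^2)/2 along the optimal trajectory; works out to -2AB."""
    x = A * math.exp(t) + B * math.exp(-t)
    p = A * math.exp(t) - B * math.exp(-t)
    return 0.5 * (p * p - x * x)

print([round(H(t), 12) for t in (0.0, 0.3, 0.7, 1.0)])   # all -0.375 = -2AB
```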