The Role of Hamilton–Jacobi–Bellman Equations in Optimal Control Problems

Understanding Optimal Control Problems

Optimal control theory is a branch of mathematics and engineering that deals with finding control policies for dynamic systems over time. The goal is to steer a system from an initial state to a desired target while optimizing a performance criterion—typically minimizing cost or maximizing reward. These problems arise across domains: in aerospace, to design fuel-efficient trajectories; in finance, to manage investment portfolios; in robotics, to plan dexterous movements; and in economics, to allocate resources efficiently.

Every optimal control problem consists of three core elements: a state variable describing the system at any instant, a control input chosen by the decision maker, and a cost function that quantifies performance over time. The system evolves according to a differential equation dx/dt = f(x(t), u(t), t), where x is the state and u is the control. The objective is to find a control law u*(t) that minimizes a total cost, often written as an integral of running costs plus a terminal cost.
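Written out, the problem described above takes the following standard (Bolza) form. This is a sketch under assumptions: the symbols t_0, T, x_0, Φ (terminal cost), and U (set of admissible controls) are introduced here for illustration and do not appear in the text above, and L denotes the running cost.

```latex
\begin{aligned}
\min_{u(\cdot)} \quad & J[u] = \int_{t_0}^{T} L\bigl(x(t), u(t), t\bigr)\,dt + \Phi\bigl(x(T)\bigr) \\
\text{subject to} \quad & \dot{x}(t) = f\bigl(x(t), u(t), t\bigr), \qquad x(t_0) = x_0, \qquad u(t) \in U .
\end{aligned}
```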

The Hamilton–Jacobi–Bellman Equation

The Hamilton–Jacobi–Bellman (HJB) equation is a partial differential equation (PDE) that characterizes the value function V(x, t), the optimal cost-to-go starting from state x at time t, in continuous-time control problems. When V is sufficiently smooth, satisfying the HJB equation is both necessary and sufficient for optimality.

The HJB equation emerges from Bellman’s principle of optimality: “An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.” In continuous time, this principle leads to a PDE for the value function that is solved backward in time from a terminal condition.

The general form of the HJB equation for a deterministic system is:

0 = min_u { L(x, u, t) + ∂V/∂t + ∇V · f(x, u, t) }

where:

V(x, t) is the value function (optimal cost from state x at time t)
L(x, u, t) is the running cost (e.g., energy, fuel, or deviation)
f(x, u, t) describes the system dynamics
∇V is the spatial gradient of the value function
∂V/∂t accounts for time dependence

The minimization is taken over all admissible controls u. Once the value function is known, the optimal control can be recovered as the argument that achieves the minimum: u*(x, t) = argmin_u { L + ∇V · f }.
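The following sketch shows how the principle of optimality leads to the equation above. It assumes that V is smooth enough to be Taylor-expanded, and it introduces a small time step h, a final time T, and a terminal cost Φ that are not named in the text.

```latex
\begin{aligned}
V(x,t) &= \min_{u(\cdot)} \Bigl\{ \int_t^{t+h} L\bigl(x(s),u(s),s\bigr)\,ds + V\bigl(x(t+h),\,t+h\bigr) \Bigr\}
  && \text{(principle of optimality)} \\
V\bigl(x(t+h),\,t+h\bigr) &= V(x,t) + h\Bigl[\tfrac{\partial V}{\partial t} + \nabla V \cdot f(x,u,t)\Bigr] + o(h)
  && \text{(Taylor expansion)} \\
0 &= \min_{u} \Bigl\{ L(x,u,t) + \tfrac{\partial V}{\partial t} + \nabla V \cdot f(x,u,t) \Bigr\}
  && \text{(subtract } V,\ \text{divide by } h,\ h \to 0\text{)} \\
V(x,T) &= \Phi(x)
  && \text{(terminal condition)}
\end{aligned}
```

The terminal condition is what makes the problem backward in time: V is known at t = T and the PDE propagates it toward earlier times.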

Interpretation and Relation to the Hamiltonian

The HJB equation closely mirrors the Hamiltonian formulation of classical mechanics. The Hamiltonian of the control problem is defined as H(x, p, u, t) = L(x, u, t) + p · f(x, u, t), where p is the costate (adjoint) variable. The HJB equation then becomes ∂V/∂t + min_u H(x, ∇V, u, t) = 0. Where V is differentiable, its gradient along an optimal trajectory plays the role of the costate, linking dynamic programming to Pontryagin’s maximum principle (a minimum principle in this cost-minimizing sign convention).
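In some problem classes the minimization over u can be carried out by hand. The worked case below is a sketch under assumptions that are not in the text: time-invariant, control-affine dynamics f = f_0(x) + G(x) u, a running cost that is quadratic in u with a positive definite weight R, and no control constraints. The symbols f_0, G, ℓ, and R are illustrative.

```latex
\begin{aligned}
H(x,p,u) &= \ell(x) + \tfrac12\, u^{\top} R\, u + p^{\top}\bigl(f_0(x) + G(x)\,u\bigr), \qquad R \succ 0, \\
\frac{\partial H}{\partial u} = 0 \;&\Longrightarrow\; u^{*} = -R^{-1} G(x)^{\top} p , \\
\min_{u} H(x,p,u) &= \ell(x) + p^{\top} f_0(x) - \tfrac12\, p^{\top} G(x)\, R^{-1} G(x)^{\top} p , \\
0 &= \frac{\partial V}{\partial t} + \ell(x) + \nabla V^{\top} f_0(x)
     - \tfrac12\, \nabla V^{\top} G(x)\, R^{-1} G(x)^{\top} \nabla V .
\end{aligned}
```

The last line is the HJB equation with p = ∇V substituted. The quadratic term in ∇V is what makes the PDE nonlinear even after the minimization has been eliminated.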

Applications of the HJB Equation

The HJB equation is not merely a theoretical construct—it is used to solve real-world problems across many fields. Below are several key application areas.

Finance and Portfolio Optimization

In mathematical finance, optimal portfolio selection is a classic stochastic control problem. The HJB equation (with an added diffusion term for stochasticity) yields the optimal allocation between risky and risk-free assets. For example, in the Merton problem, an investor maximizes expected utility of consumption over a lifetime. The HJB equation reduces to a nonlinear PDE whose solution gives the optimal consumption rate and portfolio weight. The Black–Scholes PDE for option pricing has the same backward-in-time parabolic structure and can be viewed as the special case in which there is no control to optimize.
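For the textbook version of the Merton problem, the optimal portfolio weight has a well-known closed form. The sketch below assumes one risky asset following geometric Brownian motion, a constant risk-free rate, and CRRA utility with relative risk aversion gamma; the function name and the market parameters are made up for illustration.

```python
def merton_fraction(mu: float, r: float, sigma: float, gamma: float) -> float:
    """Constant fraction of wealth held in the risky asset.

    Assumes the textbook Merton setting: one risky asset with drift mu and
    volatility sigma (geometric Brownian motion), a risk-free rate r, and
    CRRA utility with relative risk aversion gamma > 0.
    """
    if sigma <= 0 or gamma <= 0:
        raise ValueError("sigma and gamma must be positive")
    return (mu - r) / (gamma * sigma ** 2)


# Illustrative (made-up) market parameters.
weight = merton_fraction(mu=0.08, r=0.02, sigma=0.2, gamma=3.0)
print(f"risky-asset weight: {weight:.3f}")
```

With these numbers the investor holds about half of total wealth in the risky asset. The weight depends on neither wealth nor time, which is a feature of the CRRA assumption rather than of the HJB equation in general.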

Autonomous Vehicles and Robotics

Path planning for self-driving cars and drones often involves minimizing time, fuel, or risk while respecting obstacles and dynamics. The HJB equation can be used to compute the optimal value function over the entire state space, yielding a feedback policy that is robust to disturbances. For instance, in motion planning, a vehicle’s dynamics are encoded in f, and obstacles are incorporated as state constraints or high cost regions. Solving the HJB equation provides a global optimal solution, though it is computationally intensive for high-dimensional state spaces.

Economics and Resource Allocation

In economics, optimal control models describe growth, resource extraction, and capital accumulation. The Ramsey–Cass–Koopmans model uses a simplified HJB-like equation to determine the optimal path of savings and consumption. More complex models with multiple sectors or stochastic shocks require solving the HJB equation numerically. Central banks also use stochastic control to set interest rates based on inflation and output, though simplifications are often employed for tractability.

Robotics and Automation

Beyond path planning, the HJB equation appears in optimal feedback control for robotic manipulators. When the dynamics are linear and the cost is quadratic, the value function is quadratic in the state and the HJB equation reduces to a Riccati equation (the algebraic Riccati equation in the infinite-horizon case). For nonlinear dynamics, it must generally be solved numerically. Recent work uses HJB-based control for walking robots, quadcopters, and even soft robotics.
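For a linear system with quadratic cost, the reduction to a Riccati equation can be checked by hand in one dimension. The sketch below assumes a scalar, infinite-horizon, undiscounted problem; the function name and the parameter values are illustrative.

```python
import math


def scalar_lqr(a: float, b: float, q: float, r: float) -> tuple[float, float]:
    """Solve the scalar infinite-horizon LQR problem.

    Dynamics: dx/dt = a*x + b*u.  Cost: integral of q*x**2 + r*u**2.
    Substituting V(x) = p*x**2 into the HJB equation gives the scalar
    algebraic Riccati equation  2*a*p - (b**2 / r)*p**2 + q = 0.
    Returns (p, k): the positive root p and the gain k in u = -k*x.
    """
    if b == 0 or q <= 0 or r <= 0:
        raise ValueError("need b != 0, q > 0, r > 0")
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return p, k


p, k = scalar_lqr(a=1.0, b=1.0, q=1.0, r=1.0)
residual = 2 * 1.0 * p - p ** 2 + 1.0  # Riccati residual for a=b=q=r=1
print(f"p={p:.6f}  k={k:.6f}  residual={residual:.2e}")
```

The open-loop system in this example is unstable (a > 0), yet the closed-loop coefficient a - b*k is negative, so the optimal feedback stabilizes it.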

Challenges and Numerical Methods

Despite its elegance, the HJB equation is notoriously difficult to solve, especially for high-dimensional systems. The main obstacles are:

Curse of dimensionality: The value function is defined over the whole state space, so the number of grid points grows exponentially with the state dimension. For a 10-dimensional system, a grid with 100 points per dimension would have 100¹⁰ = 10²⁰ points, far beyond computational reach.
Nonlinearity: The HJB equation is a first-order nonlinear PDE (or second-order for stochastic problems). Standard linear PDE solvers do not apply.
Discontinuities: The value function can become nonsmooth, especially in problems with state constraints or discontinuous costs, requiring the notion of viscosity solutions.
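The grid-size arithmetic behind the curse of dimensionality is easy to reproduce. The snippet below assumes one 8-byte floating-point value per grid node, which is an illustrative storage choice rather than a property of any particular solver.

```python
def grid_points(points_per_dim: int, dim: int) -> int:
    """Number of nodes in a full tensor-product grid."""
    return points_per_dim ** dim


BYTES_PER_VALUE = 8  # assumption: one float64 per grid node

for dim in (2, 4, 6, 10):
    n = grid_points(100, dim)
    gigabytes = n * BYTES_PER_VALUE / 1e9
    print(f"dim={dim:2d}  nodes={n:.1e}  memory={gigabytes:.1e} GB")
```

Storing the value function alone for the 10-dimensional case would take on the order of 10¹¹ GB under this assumption, before any computation is done.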

Viscosity Solutions

Classical solutions of the HJB equation (continuously differentiable for the first-order equation, twice differentiable in the state for the second-order stochastic case) may not exist, even for well-posed control problems. The theory of viscosity solutions, introduced by Crandall and Lions in the early 1980s, provides a weak solution framework that is both well-posed and amenable to numerical approximation. Viscosity solutions allow the value function to have kinks while still guaranteeing uniqueness and stability under suitable assumptions.
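A standard one-dimensional illustration is the eikonal equation below, which is the HJB equation of a minimum-time exit problem with unit speed. It is included here as a worked example and is not taken from the text above.

```latex
\begin{aligned}
&|u'(x)| - 1 = 0 \quad \text{on } (-1, 1), \qquad u(-1) = u(1) = 0 . \\
&\text{No } C^{1} \text{ solution exists: } u(-1) = u(1) \text{ forces } u'(\xi) = 0 \text{ for some } \xi \in (-1,1). \\
&\text{Infinitely many Lipschitz functions satisfy the equation almost everywhere (any sawtooth with slopes } \pm 1\text{).} \\
&\text{The unique viscosity solution is } u(x) = 1 - |x| , \text{ the distance from } x \text{ to the boundary.}
\end{aligned}
```

The viscosity solution is the one with a control interpretation: it is the minimum time needed to reach the boundary, and its kink at x = 0 marks the point where leaving to the left or to the right is equally good.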

For further reading, see the foundational paper: “Viscosity solutions of Hamilton-Jacobi equations” by Crandall and Lions (1983).

Grid-Based Methods

Classical approaches discretize the state space onto a grid and solve a discrete dynamic programming problem. Methods include finite difference schemes (e.g., upwind differences), semi-Lagrangian schemes, and level-set methods. These work well for up to 3–4 dimensions but quickly become infeasible beyond that.
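As a minimal sketch of a grid-based scheme, the code below applies a semi-Lagrangian value iteration to a one-dimensional minimum-time problem. The toy problem (unit-speed dynamics, target at the origin, domain [-1, 1]) and the function name are assumptions chosen so that the exact answer, V(x) = |x|, is known.

```python
def min_time_to_origin(n_half: int = 10) -> tuple[list[float], list[float]]:
    """Value iteration for a 1D minimum-time problem on a uniform grid.

    Assumed toy problem: dx/dt = u with u in {-1, +1}, running cost 1,
    target x = 0, domain [-1, 1].  The exact value function is V(x) = |x|.
    With a time step equal to the grid spacing h, the semi-Lagrangian update
    V(x) = min_u { h + V(x + u*h) } lands exactly on neighbouring nodes.
    """
    h = 1.0 / n_half
    n = 2 * n_half + 1
    xs = [(i - n_half) * h for i in range(n)]
    big = float("inf")
    V = [0.0 if i == n_half else big for i in range(n)]
    while True:
        V_new = V[:]
        for i in range(n):
            if i == n_half:
                continue  # boundary condition: zero cost at the target
            left = V[i - 1] if i > 0 else big       # value after u = -1
            right = V[i + 1] if i < n - 1 else big  # value after u = +1
            V_new[i] = min(V[i], h + min(left, right))
        if V_new == V:  # fixed point of the discrete Bellman operator
            return xs, V
        V = V_new


xs, V = min_time_to_origin()
max_error = max(abs(v - abs(x)) for x, v in zip(xs, V))
print(f"max error against |x|: {max_error:.2e}")
```

The same loop structure carries over to higher dimensions, but the number of nodes visited per sweep grows exponentially with dimension, which is why such schemes are limited to low-dimensional state spaces.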

Machine Learning and Neural Networks

Recent advances use neural networks to approximate the value function, bypassing the grid. Key approaches include:

Deep Galerkin Method (DGM): A meshfree method that trains a neural network to satisfy the PDE at random collocation points. It uses automatic differentiation to compute gradients and enforces the HJB equation as a loss term.
Physics-Informed Neural Networks (PINNs): Similar to DGM, PINNs embed the PDE and boundary conditions into the loss function. They can handle irregular domains and high dimensions.
Reinforcement Learning: In the spirit of dynamic programming, actor-critic algorithms approximate the value function (critic) and policy (actor) iteratively, effectively solving the HJB equation without explicitly constructing the PDE.
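The residual-minimization idea shared by DGM and PINNs can be shown on a deliberately tiny example. The sketch below is not a neural network: it replaces the network with a one-parameter ansatz V(x) = theta * x**2 and uses a hand-derived gradient instead of automatic differentiation. The scalar linear-quadratic test problem, the function name, and all hyperparameters are assumptions made for illustration.

```python
import random


def fit_value_coefficient(a=-1.0, b=1.0, q=1.0, r=1.0,
                          n_points=200, lr=0.2, steps=200, seed=0):
    """Fit V(x) = theta * x**2 by minimizing the mean squared HJB residual.

    Toy stand-in for a neural network (one parameter, manual gradient).
    Assumed problem: dx/dt = a*x + b*u, running cost q*x**2 + r*u**2.
    After minimizing over u, the stationary HJB residual at x is
        res(x) = (q + 2*a*theta - b**2 * theta**2 / r) * x**2.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]  # collocation points
    m4 = sum(x ** 4 for x in xs) / n_points  # loss = m4 * c**2
    theta = 0.0
    for _ in range(steps):
        c = q + 2 * a * theta - b * b * theta * theta / r  # residual coefficient
        dc = 2 * a - 2 * b * b * theta / r                 # d c / d theta
        theta -= lr * 2 * m4 * c * dc                      # gradient descent step
    return theta


theta = fit_value_coefficient()
print(f"fitted theta = {theta:.6f}")
```

For the default parameters the exact coefficient is sqrt(2) - 1, the positive root of the scalar Riccati equation. The residual also vanishes at a negative root that does not correspond to a cost, so the result depends on initialization; this non-uniqueness is one reason residual-based methods need care.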

For a survey, see “Solving high-dimensional Hamilton-Jacobi-Bellman equations with neural networks” by Darbon and Osher (2020).

Approximate Dynamic Programming

When the exact HJB solution is intractable, approximate dynamic programming (ADP) methods are used. These include:

Linear programming approaches that relax the HJB inequality.
Policy iteration and value iteration with function approximation (e.g., radial basis functions, neural networks).
Model predictive control (MPC), which solves a finite-horizon open-loop problem at each step and applies only the first control, implicitly approximating the solution of the HJB equation over a short horizon.
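Policy iteration is easiest to see on a problem where both of its steps have closed forms. The sketch below assumes a scalar linear-quadratic problem with linear policies u = -k*x, so no function approximation is needed; the function name and parameter values are illustrative.

```python
def policy_iteration(a: float, b: float, q: float, r: float,
                     k0: float, iterations: int = 20) -> tuple[float, float]:
    """Policy iteration for a scalar linear-quadratic problem.

    Dynamics dx/dt = a*x + b*u, cost integral of q*x**2 + r*u**2,
    policy u = -k*x.  The initial gain must be stabilizing: a - b*k0 < 0.
    Returns (p, k) where V(x) = p*x**2 is the cost of the final policy.
    """
    k = k0
    p = float("nan")
    for _ in range(iterations):
        closed_loop = a - b * k
        if closed_loop >= 0:
            raise ValueError("policy is not stabilizing")
        # Policy evaluation: the cost of u = -k*x is p*x**2 with
        # 2*(a - b*k)*p + q + r*k**2 = 0.
        p = (q + r * k * k) / (-2.0 * closed_loop)
        # Policy improvement: minimize r*u**2 + 2*p*x*b*u over u.
        k = b * p / r
    return p, k


p_pi, k_pi = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
print(f"p={p_pi:.6f}  k={k_pi:.6f}")
```

Each pass evaluates the current policy exactly and then improves it greedily. In approximate dynamic programming the evaluation step is replaced by a fit with basis functions or a neural network, which is where approximation error enters.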

Recent Advances and Open Problems

Research on the HJB equation remains active, especially in high-dimensional and stochastic settings. Key directions include:

Tensor train decompositions to represent the value function in a low-rank format, making high-dimensional problems tractable.
Dual dynamic programming for stochastic control in energy systems and inventory management.
Hamilton-Jacobi reachability analysis for safety-critical systems (e.g., collision avoidance) where the HJB equation characterizes the set of states from which a system can avoid failure.
Stochastic HJB equations with jumps driven by Lévy processes, relevant to finance and insurance.

Despite progress, many open problems remain. For example, rigorously solving HJB equations for 100-dimensional systems with non-smooth dynamics is still an active frontier. The combination of reinforcement learning and PDE theory offers promising avenues.

Conclusion

The Hamilton–Jacobi–Bellman equation provides a unifying mathematical framework for optimal control, linking dynamic programming, PDE theory, and practical algorithms. From finance to robotics, it enables the design of optimal strategies under complex dynamics and constraints. Although solving the HJB equation exactly is often challenging, modern numerical methods—especially those leveraging machine learning—are pushing the boundaries of what is computationally feasible. As research continues, the HJB equation will remain a cornerstone of optimal control, guiding both theory and application.

For a deeper dive into the mathematical theory, refer to the classic textbook: “Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations” by Bardi and Capuzzo-Dolcetta (1997).