We obtain the optimal feedback and the associated present value by solving the Bellman equation of stochastic dynamic programming (Mangel and Clark 1988, Clark and Mangel 2000). If we replace the continuous variables P and M by sets of values defined on meshes, then the continuous dynamic model becomes a discrete one that is amenable to computation. In contrast to Mangel and Clark, we seek solutions that are independent of time. These solutions may be obtained as limits of the usual time-dependent solutions as the time horizon recedes to infinity.
We solve the Bellman equation by a combination of so-called value iterations and policy iterations. We begin with an initial guess for the policy and present value that is analogous to the results in Carpenter et al. (1999). Value iteration uses the initial guess for present value as the final payoff after T years, and the value at earlier years is obtained by the usual backwards iteration of dynamic programming, using a fixed policy. As T approaches infinity, the value at the initial time approaches a steady state, which is the result of the value iteration. This steady state is the basis of a policy iteration: a new policy is determined to maximize the value obtained by the preceding value iteration. The value of this new policy is then computed by value iteration, and then the policy is updated by successive applications of this process until there are no significant changes in policy or value.
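The alternation described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' code: the two-state problem and the names `evaluate_policy`, `improve_policy`, `solve`, `reward`, and `transition` are hypothetical and stand in for the lake model on its (P, M) mesh.

```python
def evaluate_policy(policy, v_init, reward, transition, discount, tol=1e-10):
    """Value iteration: backward iteration under a fixed policy until the value settles."""
    v = list(v_init)
    while True:
        v_new = [
            reward[s][policy[s]]
            + discount * sum(p * v[s2] for s2, p in transition[s][policy[s]])
            for s in range(len(v))
        ]
        if max(abs(a - b) for a, b in zip(v_new, v)) < tol:
            return v_new
        v = v_new


def improve_policy(v, reward, transition, discount):
    """Policy iteration step: in each state pick the action maximizing the Bellman right-hand side."""
    policy = []
    for s in range(len(v)):
        q = [
            reward[s][a] + discount * sum(p * v[s2] for s2, p in transition[s][a])
            for a in range(len(reward[s]))
        ]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy


def solve(policy, v, reward, transition, discount):
    """Alternate value iteration and policy improvement until the policy stops changing."""
    while True:
        v = evaluate_policy(policy, v, reward, transition, discount)
        new_policy = improve_policy(v, reward, transition, discount)
        if new_policy == policy:
            return policy, v
        policy = new_policy


# Hypothetical toy problem: two states, two actions (0 = stay, 1 = move to the other state).
reward = [[1.0, 0.0], [2.0, 0.0]]               # reward[state][action]
transition = [
    [[(0, 1.0)], [(1, 1.0)]],                   # transition[state][action] = [(next_state, prob), ...]
    [[(1, 1.0)], [(0, 1.0)]],
]
best_policy, best_value = solve([0, 0], [0.0, 0.0], reward, transition, discount=0.9)
```

In this toy case the initial guess "always stay" is replaced after one improvement step by "move from state 0, stay in state 1", and the second evaluation confirms that no further change is needed.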
Although such a procedure might seem cumbersome when compared with simulation results, it should be kept in mind that comparable results using simulations would require many simulations (perhaps hundreds) corresponding to each combination of P and M. We have used simulations to check the present method.
We approximate the function vi+1 in Eq. B.6 by a linear interpolation of its values at mesh points. Since z in Eq. B.2 is unbounded, values of vi+1 beyond the last mesh point also enter in Eq. B.6. We extrapolate vi+1 to decay quadratically beyond that point, proportional to z. This ensures that P concentrations higher than the largest mesh point are penalized. The time variable appears as a superscript in all of the following equations. This change is made to avoid confusion with the mesh index, which is denoted by a subscript. Let
[Equations B.1-B.5: images not reproduced]
If $P^{t+1}$ lies between $P_k$ and $P_{k+1}$, then the linear interpolation of $v^{t+1}$ (denoted by $v$ henceforth) is

[Equations B.6-B.8: images not reproduced]

provided that [...]. The definition of $C_k$ is for later convenience.
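For readers who want to see the interpolation step concretely, the following is a minimal sketch of generic linear interpolation on a mesh. It is the textbook formula, not a reconstruction of the equations above; the function name `interpolate` and the sample mesh are hypothetical, and points outside the mesh are rejected rather than extrapolated.

```python
from bisect import bisect_right


def interpolate(mesh, values, p):
    """Linearly interpolate mesh values at a point p inside the mesh (mesh must be increasing)."""
    if not mesh[0] <= p <= mesh[-1]:
        raise ValueError("p lies outside the mesh; extrapolation is not handled here")
    # Find k such that mesh[k] <= p <= mesh[k + 1].
    k = min(bisect_right(mesh, p) - 1, len(mesh) - 2)
    weight = (p - mesh[k]) / (mesh[k + 1] - mesh[k])
    return (1.0 - weight) * values[k] + weight * values[k + 1]


# Hypothetical mesh of P values and the value function sampled on it.
mesh = [0.0, 1.0, 2.0, 4.0]
values = [0.0, 10.0, 20.0, 0.0]
midpoint_value = interpolate(mesh, values, 0.5)
```

The weight on the upper mesh point grows linearly from 0 at `mesh[k]` to 1 at `mesh[k + 1]`, so the interpolant agrees with the tabulated values at every mesh point.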
We extrapolate $v$ for $P^{t+1} > P_N$ as follows:

[Equations B.12, B.13, and B.17-B.19: images not reproduced]

These equations determine $v^t$ for a specified $L_{i,j}$ and $v^{t+1}$.
At this point we must face the fact that [...] actually depends upon [...] according to Eq. B.1. In general [...] lies between two mesh points; [...] is not an integer, but [...] can be obtained by linear interpolation between mesh points.
If the value $v^T$ is specified at a final time T, then Eq. B.6 may be applied successively to obtain the values at earlier times, down to $v^0$. If T approaches infinity with the end value $v^T$ fixed, then discounting ensures that the sequence of the corresponding $v^0$ converges. This limit is the solution of the time-independent Bellman equation obtained from Eq. B.6 by deleting the superscripts.
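The role of discounting in this limit can be seen in a scalar caricature. The sketch below assumes a constant yearly reward and a single state, which is far simpler than the lake model; `value_at_time_zero` and its parameters are hypothetical names chosen for illustration.

```python
def value_at_time_zero(terminal_value, horizon, reward=1.0, discount=0.95):
    """Iterate v^t = reward + discount * v^(t+1) backwards from v^T to v^0."""
    v = terminal_value
    for _ in range(horizon):
        v = reward + discount * v
    return v


# Two very different end values give nearly the same v^0 once the horizon is long.
v0_from_zero = value_at_time_zero(0.0, horizon=500)
v0_from_hundred = value_at_time_zero(100.0, horizon=500)
steady_state = 1.0 / (1.0 - 0.95)   # fixed point of v = reward + discount * v
```

The influence of the end value shrinks by the discount factor at every backward step, so after T steps it is multiplied by the discount factor raised to the power T.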
In the case where parameter values are given by a posterior distribution, the preceding equations must be modified. We perform value iterations for each point of the posterior: the value function is then given by
$$v = \sum_p w_p v_p \tag{B.20}$$

where $v_p$ denotes the limit of the preceding value iterations for a single point p in the posterior and $w_p$ is the weight associated with the p-th point of the posterior.
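As a small illustration of averaging over the posterior, the sketch below combines per-point value functions with their weights. It assumes that each value function is stored as a list over mesh points and that the weights may need normalizing; both are assumptions, and `posterior_value` is a hypothetical name.

```python
def posterior_value(values_by_point, weights):
    """Weighted average of value functions, one per posterior point, at each mesh point."""
    total = sum(weights)                      # normalization is an assumption
    n_mesh = len(values_by_point[0])
    return [
        sum(w * v[i] for w, v in zip(weights, values_by_point)) / total
        for i in range(n_mesh)
    ]


# Two hypothetical posterior points, each with a value function on a two-point mesh.
combined = posterior_value([[1.0, 2.0], [3.0, 4.0]], weights=[0.25, 0.75])
```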
So far the feedback policy $L_{i,j}$ was arbitrarily prescribed. We can use the Bellman equation to improve a given policy. Given a policy and the corresponding value, we can apply Eq. B.6 to each term to obtain

[Equations B.21 and B.22: images not reproduced]
This maximization is carried out separately for each point $(P_i, M_j)$. Note that the complete sum over the posterior is required for each maximization. We used Brent's method, first evaluating the function on a mesh of 50 points in order to try to find all local maxima. The simpler method of looking for a single local maximum fails because the right-hand side of Eq. B.22 may have several local maxima.
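The scan-then-refine strategy can be sketched as follows. To keep the example free of external libraries, a golden-section search stands in for Brent's method, so this is not the routine used in the paper. The function names and the two-bump test function are hypothetical.

```python
import math


def golden_section_max(f, a, b, tol=1e-8):
    """Locate a maximum of f on [a, b], assuming f has a single peak there."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def maximize_on_interval(f, lo, hi, n_mesh=50):
    """Scan f on a mesh, refine every local peak found, and return the best maximizer."""
    xs = [lo + (hi - lo) * i / (n_mesh - 1) for i in range(n_mesh)]
    fs = [f(x) for x in xs]
    candidates = [xs[0], xs[-1]]         # endpoints may also be maximizers
    for i in range(1, n_mesh - 1):
        if fs[i] >= fs[i - 1] and fs[i] >= fs[i + 1]:
            candidates.append(golden_section_max(f, xs[i - 1], xs[i + 1]))
    return max(candidates, key=f)


def two_bumps(x):
    """Test function with a lower peak near x = 1 and a higher peak near x = 3."""
    return math.exp(-10.0 * (x - 1.0) ** 2) + 2.0 * math.exp(-10.0 * (x - 3.0) ** 2)


best_x = maximize_on_interval(two_bumps, 0.0, 4.0)
```

A search started from a single bracket around x = 1 would stop at the lower peak; the preliminary scan is what makes the higher peak near x = 3 visible.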
Literature cited
Carpenter, S. R., D. Ludwig, and W. A. Brock. 1999. Management of eutrophication for lakes subject to potentially irreversible change. Ecological Applications 9:751-771.
Clark, C. W., and M. Mangel. 2000. Dynamic state variable models in ecology. Oxford University Press, Oxford, UK.
Mangel, M., and C. W. Clark. 1988. Dynamic modeling in behavioral ecology. Princeton University Press, Princeton, New Jersey, USA.