Ecological Archives A013-019-A2

Donald Ludwig, Steve Carpenter, and W. Brock. 2003. Optimal phosphorus loading for a potentially eutrophic lake. Ecological Applications 13:1135–1152.

Appendix B. An algorithm for the calculation of the expected discounted present value (PV).

We obtain the optimal feedback and the associated present value by solving the Bellman equation of stochastic dynamic programming (Mangel and Clark 1988, Clark and Mangel 2000). If we replace the continuous variables P and M by sets of values defined on meshes, then the continuous dynamic model becomes a discrete one that is amenable to computation. In contrast to Mangel and Clark, we seek solutions that are independent of time. These solutions may be obtained as limits of the usual time-dependent solutions as the time horizon recedes to infinity.

We solve the Bellman equation by a combination of so-called value iterations and policy iterations. We begin with an initial guess for the policy and present value that is analogous to the results in Carpenter et al. (1999). Value iteration uses the initial guess for present value as the final payoff after T years, and the value at earlier years is obtained by the usual backwards iteration of dynamic programming, using a fixed policy. As T approaches infinity, the value at the initial time approaches a steady state, which is the result of the value iteration. This steady state is the basis of a policy iteration: a new policy is determined to maximize the value obtained by the preceding value iteration. The value of this new policy is then computed by value iteration, and then the policy is updated by successive applications of this process until there are no significant changes in policy or value.

Although such a procedure might seem cumbersome when compared with simulation results, it should be kept in mind that comparable results using simulations would require many simulations (perhaps hundreds) corresponding to each combination of P and M. We have used simulations to check the present method.

 

Approximation of the expectation of future values

We approximate the function vi+1 in Eq. B.6 by a linear interpolation of its values at mesh points. Since z in Eq. B.2 is unbounded, values of vi+1 beyond the last mesh point also enter in Eq. B.6. We extrapolate vi+1 to decay quadratically beyond that point, proportional to z. This ensures that P concentrations higher than the largest mesh point are penalized. The time variable appears as a superscript in the all of the following equations. This change is made to avoid confusion with the mesh index, which is denoted by a subscript. Let

(B.1)

This quantity may be interpreted as the P level that would be expected in the absence of loading. Since loading is non-negative, all accessible mesh points must lie at or above P*t. Let zk correspond to the random factor required to bring Pt+1 to the k-th mesh point. If Pk > P*t, then Eq. A.6 implies that
$\displaystyle z_k = \frac{1}{{\sigma}} \log \left( \frac{P_k -P^{*t})}{L^
*_{i,j}} \right),$ (B.2)

where
$\displaystyle L^*_{i,j} = L_{i,j}\frac{1 - e^{-s-h}}{s+h}.$ (B.3)

It is convenient to extend the definition of zk to cover the first interval that contains P*t and the interval beyond the last mesh point: let
  $\displaystyle z_{k^*} = -\infty$   if $ P_{k^*} \le P^{*t} < P_{k^* +1}$$\displaystyle ,$ (B.4)
  $\displaystyle z_{N+1} = \infty.$ (B.5)

 


Linear interpolation

If Pt+1 is between Pk and Pk+1, then the linear interpolation of vt+1 (denoted by v henceforth) is

$\displaystyle v(P^{t+1}) \approx \frac{v_k P_{k+1} - v_{k+1} P_k + P^{t+1
}(v_{k+1} - v_k)}{P_{k+1} - P_k}.$ (B.6)

From Eq. A.6 we have
$\displaystyle P^{t+1} = P^{*t} + L^*_{i,j} e^z.$ (B.7)

Hence
$\displaystyle v(P^{t+1}) \approx A_k + B_k e^z + C_k e^{2z},$ for $\displaystyle \quad k^* \le k \le N,$ (B.8)

where
$\displaystyle A_k$ $\displaystyle = \frac{v_k P_{k+1} - v_{k+1} P_k + P^{*t}(v_{k+1} - v_k) }
{P_{k+1} - P_k},$ (B.9)
$\displaystyle B_k$ $\displaystyle = L^*_{i,j}\frac{v_{k+1} - v_k}{P_{k+1} - P_k},$ (B.10)
$\displaystyle C_k$ $\displaystyle = 0,$ (B.11)


provided that $ k^* \le k <
N$. The definition of Ck is for later convenience.

 

Extrapolation

We extrapolate v for Pt+1 > PN as follows :

$\displaystyle v(P^{t+1})= v_N - \frac{{\beta}_2}{{\delta}}\left[ {(P^{t+1
})}^2 - P_N^2 \right].$ (B.12)

This leads to
$\displaystyle v(P^{t+1})= A_N + B_N e^z + C_N e^{2z},$ (B.13)

where
$\displaystyle A_N$ $\displaystyle = v_n - \frac{{\beta}_2}{{\delta}}\left[ {(P^{*t})}^2 - P_N
^2 \right],$ (B.14)
$\displaystyle B_N$ $\displaystyle = - \frac{2{\beta}_2}{ {\delta}}L^*_{i,j} P^{*t},$ (B.15)
$\displaystyle C_N$ $\displaystyle = -\frac{{\beta}_2}{{\delta}}{L^*_{i,j}}^2.$ (B.16)

With these definitions and approximations, it follows that
E$\displaystyle [ v(P^{t+1})] \approx \sum_{k=k^*}^{N} \frac {1}{\sqrt {2\p
i}} \i...
...z_k}^{z_{k+1}} \exp(-\frac{1}{2}z^2)\left( A_k + B_ke^z + C_k e^{2z}\right) d
z.$ (B.17)

The integrals may be evaluated in terms of the cumulative distribution function for the normal distribution:
$\displaystyle \Phi(u) =\frac {1}{\sqrt {2\pi}} \int_{-\infty}^u \exp(-\fr
ac{1}{2}z^2) dz.$ (B.18)

This yields
\begin{displaymath}\begin{split}\text{E}[ v(P^{t+1})] & \approx \sum_{
k=k^*}^{N}...
...i(z_{k+1}-2{\sigma}) - \Phi(z_k-2{\sigma}) \right]. \end{split}\end{displayma
th} (B.19)


These equations determine vt for a specified Li,j and vt+1.

 

Interpolation in M

At this point we must face the fact that $ M_\ell$ actually depends upon $ (P_i, \, M_j)$ according to Eq. B.1. In general $ M_\ell$ lies between two mesh points; $ \ell$ is not an integer, but $ can be obtained by linear interpolation between mesh points.

 

Value (backwards) iteration

If the value VT is specified at a final time T, then Eq. B.6 may be applied successively to obtain $ v^{T-1}, \ldots,
v^0$. If T approaches infinity with the end value vT fixed, then discounting ensures that the sequence of the corresponding v0 converges. This limit is the solution of the time-independent Bellman equation obtained from Eq. B.6 by deleting the superscripts.

 

Consideration of the posterior distribution

In the case where parameter values are given by a posterior distribution, the preceding equations must be modified. We perform value iterations for each point of the posterior: the value function is then given by

$\displaystyle v(P_i,M_j) = \sum_p w_p v^p(P_i,M_j),$ (B.20)


where vp denotes the limit of the preceding value iterations f or a single point p in the posterior and wp is the weight associated with the p-th point of the posterior.

 

Policy Iteration

So far the feedback policy Li,j was arbitrarily prescribed. We can use the Bellman equation to improve a given policy. Given a policy $ L^\tau$, and the corresponding value $ v^{t,p,\tau}(P_k,M_l)$, we can apply Eq. B.6 to each term to obtain

$\displaystyle v^{t+1}(P_i,M_j,L) = \sum_p w_p \left( n^p(P_i,L) + (1-{\de
lta}) \text{E}\left[ v^{t,p,\tau}(P_k(L),M_l)\right] \right).$ (B.21)

Here L stands for a matrix of target loadings. Policy iterations consist in starting with a guess L0 for the loading and calculating the associated value $ v^{t+1}(P_i,M_j,L)$. We define the next approximate loading by maximizing the right-hand side of Eq. B.21. Thus the next loading $ L^{\tau+1}$ satisfies
$\displaystyle L^{\tau+1}_{i,j} = \hbox{arg max}_L \sum_p w_p \left( n^p(P
_i,L) + (1-{\delta}) \text{E}\left[ v^{t,p,\tau}(P_k(L),M_l)\right] \right).$ (B.22)


This maximization is carried out separately for each point Pi, Mj. Note that the complete sum over the posterior is required for each maximization. We used Brent's method, first evaluating the function on a mesh of 50 points in order to try to find all local maxima. The simpler method of looking for a single local maximum fails because the right-hand side of Eq. B.22 may have several local maxima.

Literature cited

Carpenter, S. R., D. Ludwig, and W. A. Brock. 1999. Management of eutrophication for lakes subject to potentially irreversible change. Ecological Applications 9:751–771.

Clark, C. W., and M. Mangel. 2000. Dynamic state variable models in ecology. Oxford University Press, Oxford, UK.

Mangel, M., and C. W. Clark. 1988. Dynamic modeling in behavioral ecology. Princeton University Press, Princeton, New Jersey, USA.



[Back to A013-019]