Ecological Archives A015-009-A1

Toshinori Okuyama and Benjamin M. Bolker. 2005. Combining genetic and ecological data to estimate esa turtle origins. Ecological Applications 15:315–325.

Appendix A. The model specification and Markov Chain Monte Carlo.

The DAG description of the model is shown in Fig. A1. The covariate model is extended from the previous non-covariate model (Pella and Masuda 2001). The extended components of the model are described here (i.e., the gray nodes in Fig. A1).

Let $ G$ (indexed by $ k$) denote the number of gyres. There are $ R_{k}$ rookeries in the $ k$th gyre. The size of rookeries in the $ k$th gyre is described by vector $ \textbf{n}_{k}=(n_{1k},\ldots,n_{R_{k}k})$. $ \textbf{p}=(p_{1},\ldots,p_{G})$ is the contribution vector for gyres (i.e., the rookery contributions of $ k$th gyre sum to $ p_{k}$). Let $ \bm{\theta}$ be the mixed stock contributions such that $ \bm{\theta_{k}}=(\theta_{1k},\ldots,\theta_{R_{k}k})$ is the contribution vector for $ k$th gyre;

$\displaystyle \bm{\theta}=(p_{1}\bm{\theta}_{1},p_{2}~\bm{\theta}_{2},\cdots,p_{G}\bm{\theta}_{G})\\ $    

where $ \bm{\theta}_{i}=\textrm{alr}^{-1}(\bm{\xi}_{i})$. Which follows

$\displaystyle \textrm{alr}(\bm{\theta}_{k}) \sim \textrm{Normal}(a_{0k}+a_{1k}\textrm{alr}(\textbf{n}_{k}),\textbf{M}_{k})$    

where $ a_{0k}$ and $ a_{1k}$ are the intercept and slope parameters for the $ k$th gyre. $ \textbf{M}_{k}$ is a variance covariance matrix for the $ k$th gyre. When performing the additive log ratio (alr) transform, the smallest rookery was taken as the base: i.e., the rookeries were sorted in decreasing order of size, so that the denominator in equation 2 was the smallest rookery size. The prior distribution of the slope parameter was restricted to be nonnegative; $ a_{1k}\sim \textrm{Normal}(0,1)I_{[\geq 0]}$. This incorporates our opinion that rookery size and true rookery contribution might be uncorrelated, but would not be negatively correlated. The other priors were set as follows: $ a_{0k}\sim
\textrm{Normal}(0,1)$, $ \sigma_{k}^{2} \sim \textrm{Gamma}(1,1)$ where $ \sigma_{k}^{2}$ are the diagonal elements of the diagonal covariance matrix $ \textbf{M}_{k}$. These priors have expected values equivalent to the non-covariate model, so that in cases where the data are uninformative about the relationship between rookery size and rookery contribution the model will behave like the non-covariate model. For example, in the simulations presented in this paper, the parameter $ a_{1k}$ (slope of the relationship between rookery size and rookery contribution in gyre $ k$) had a posterior distribution concentrated around the true slope for the additive-log-ratio transformed sizes and contributions. As $ \rho$, the correlation between the non-transformed sizes and contributions, decreased toward zero, $ a_{1k}$ decreased toward zero (equivalent to the non-covariate model) as well. The hyperparameters $ \bm{\beta}$ (parameters for the distribution of rookery haplotypes) were determined by using the pseudo-Bayes method (Pella and Masuda 2001). We used a Dirichlet prior for the gyre contribution, $ \textbf{p}\sim\textrm{Dirichlet}(\mathbf{\alpha})$ where $ \bm{\alpha}=(\alpha_{1},\ldots,\alpha_{G})$. We set $ \alpha_{k}=R_{k}/R$ which corresponds to distributing equal weight to each rookery, and assuming an amount of prior information equivalent to sampling a single individual (a weak prior).

It is not necessary to incorporate ecological covariates for every gyre. For example, when there are two gyres, a covariate structure can be specified for one gyre and a non-covariate structure can be specified for the other gyre. Such a specification may be useful if the rookery sizes provide useful information in only one gyre. For example, in our analysis with green turtle data, we used the covariate model for both gyres, but for loggerhead turtles, we omitted the covariate model for the second gyre (since it contains only two rookeries). In general we would only specify a covariate model if there were at least four rookeries in a gyre.





   FIG. A1. Directed acyclic graph for the stock mixture model. Notation: $ \textbf{x}_{i}=(x_{i1},\ldots,x_{iH})$ is the haplotype profile of $ i$th individual in the mixture population such that if the individual's haplotype is $ m$, $ x_{ih}=1$ for $ h=m$ and $ x_{ih}=0$ for $ h\neq m$. $ H$ denotes the number of observed haplotypes. $ R$ denotes the number of rookeries. $ R_{k}$ denotes the number of rookeries in the $ k$th gyre (i.e., $ R=R_{1}+\cdots+R_{G}$). $ G$ denotes the number of gyres. $ \textbf{q}_{j}=(q_{j1},\ldots,q_{jH})$ is the set of haplotype frequencies for the $ j$th rookery. $ \textbf{y}_{j}=(y_{j1},\ldots,y_{jH})$ is the number of turtles from $ j$th rookery such that $ y_{jh}$ denotes the number of turtles with haplotype $ h$ in the $ j$th rookery. $ z_{i}$ indicates the stock identity of $ i$th individual in the mixture population: $ z_{i}=j$ indicates the $ i$th individual is imputed to the $ j$th rookery. $ \textbf{p}=(p_{1},\cdots,p_{G})$ is the contributions from each of the $ G$ ocean gyres. $ \bm{\xi}_{k}=(\xi_{1k},\ldots,\xi_{(R_{k}-1)k})$ is the additive log ratio transformed contribution within the $ k$th ocean gyre. $ R_{k}$ is the number of rookeries in the $ k$th gyre. $ \textbf{n}_{k}=(n_{1k},\ldots,n_{(R_{k})k})$ is the vector of rookery sizes in the $ k$th gyre.





To implement Gibbs sampling, we need to specify the full conditional distribution for each unobserved node. A DAG model provides a way to describe the parameters using the following general relationship:

$\displaystyle P(V)=\prod_{v \in V} P(v\vert~\textrm{parents of}~v)$    

where $ V$ is a set of all the node $ v$, and $ P(\cdot)$ is a probability distribution. A full conditional distribution for a node is the distribution of that node given current or known values for all the other nodes in the graph (Spiegelhalter et al. 1996). Let $ V_{-v}$ be the set of all the nodes except for $ v$. Then the full conditional distribution is

$\displaystyle P(v\vert V_{-v}) \propto$ $\displaystyle ~P(v,V_{-v})$    
$\displaystyle \propto$ $\displaystyle ~\textrm{terms in}~P(V)~\textrm{containing}~v$    
$\displaystyle =$ $\displaystyle ~P(v\vert\textrm{parents of}~v) \times$    
  $\displaystyle ~\prod_{w\in\textrm{descendants of}~v} P(w\vert~\textrm{parents of}~w)$    

where $ \propto$ means "proportional to". For example, the full conditional distribution for $ \textbf{q}_{j}$ is

$\displaystyle P(\textbf{q}_{j}\vert V_{-\textbf{q}_{j}}) \propto$ $\displaystyle ~ P(\textbf{x}\vert\textbf{q}_{j},\textbf{Z})P(\textbf{y}\vert\textbf{q}_{j})P(\textbf{q}_{j})$    
$\displaystyle \propto$ $\displaystyle ~ \prod_{i=1}^{N}\textrm{Multinomial}(\textbf{x}_{i},\textbf{q}_{...
...ultinomial}(\textbf{y}_{j},\textbf{q}_{j}) \times\textrm{Dirichlet}(\bm{\beta})$    
$\displaystyle \propto$ $\displaystyle ~ \prod_{h=1}^{H}q_{jh}^{\sum_{i=1}^{N}x_{ih} I_{[z_{i}=j]}}\times\prod_{h=1}^{H}q_{jh}^{y_{jh}}\times \prod_{h=1}^{H}q_{jh}^{\beta_{h}}$    
$\displaystyle \propto$ $\displaystyle ~\prod_{h=1}^{H}q_{jh}^{\sum_{i=1}^{N}x_{ih} I_{[z_{i}=j]}+y_{jh}+\beta_{h}}$    
$\displaystyle =$ $\displaystyle ~\textrm{Dirichlet}(\sum_{i=1}^{N}\textbf{x}_{i} I_{[z_{i}=j]}+\textbf{y}_{j}+\bm{\beta})$    

where Z $ =(z_{1},\ldots,z_{N})$ and $ I$ is the indicator function.

Once all the full conditional distributions are specified, we can implement a Markov Chain Monte Carlo (MCMC) that will eventually converge on the posterior distributions of all of the parameters and hyperparameters. For a large variety of models (including those described here), this process can be completely automated using a freely available program called BUGS (Bayesian inference Using Gibbs Sampling) (http://www.mrc-bsu.cam.ac.uk/bugs/). MCMC only converges to the appropriate distribution of parameter values in the long run, after the starting values of the parameters have been ``forgotten". We used standard methods, as implemented in the publicly available CODA package (linked from the BUGS website), to evaluate how long this so-called ``burn-in" period should be, and how long the chain must be run to get reasonable estimates. For our simulations, we used Gelman-Rubin convergence diagnostics with five overdispersed initial values (Gelman et al. 1996). For all MCMC runs, we took 5000 burn-in and 20000 subsequent runs for the analyses. In the simulations, when the convergence criteria was not met, we discarded the results instead of running for longer for the computational simplicity, but such occasions were very rare and should not bias our results.



LITERATURE CITED

Gelman, A., J. Carlin, H. S. Stern, and D. B. Rubin. 1996. Bayesian data analysis. Chapman and Hall, New York, New York, USA.

Pella, J., and M. Masuda. 2001. Bayesian methods for analysis of stock mixtures from genetic characters. Fisheries Bulletin 99:151–167.

Spiegelhalter, D. J., N. G. Best, W. R. Gilks, and H. Inskip. 1996. Hepatitis B: a case study in MCMC methods. Pages 21–74 in W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, editors. Markov Chain Monte Carlo in practice, Chapman and Hall/CRC, Boca Raton, Florida, USA.



[Back to A015-009]