Appendix C. The reduced-rank multinomial logit model and CQO.
Although this article proposes a new fast algorithm, there is another way of obtaining an (approximate) estimate of the constrained coefficients C for equal-tolerances Poisson data. This appendix shows that, for large Poisson data sets, fitting a reduced-rank multinomial logit model (RR-MLM) is approximately the same as fitting an equal-tolerances QRR-VGLM. For further details about the RR-MLM see Yee and Hastie (2003).
One advantage of fitting an RR-MLM is its speed on large data sets (Tables A1 and A2 in Appendix A). Another is that it appears to be less prone to converging to suboptimal solutions.
We now illustrate how well an RR-MLM approximates the equal-tolerances Poisson CQO model by fitting both to the hunting spiders data (see ter Braak (1986), Yee (2004)). The RR-MLM gives a good result even though 45.8% of the entries of Y are zero counts, which convey little information about C, and many of the species have low abundances.
The hunting spiders data set was collected in a Dutch dune area and consists of abundances (numbers trapped over a 60-week period) of 12 species of hunting spiders, together with six environmental variables (water, bare sand, twigs, cover moss, cover herbs, and light reflection). There were $n$ sites, and in this article the environmental variables were standardized to zero mean and unit variance.
The following code fits a rank-1 RR-MLM:
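library(VGAM)  # provides rrvglm(), multinomial() and the hspider data
hspider[, 1:6] = scale(hspider[, 1:6])  # standardize the six environmental variables
# Rank-1 RR-MLM: the 12 species counts are the multinomial responses
hsrrmlm1 = rrvglm(cbind(Alopacce, Alopcune, Alopfabr, Arctlute, Arctperi,
                        Auloalbi, Pardlugu, Pardmont, Pardnigr, Pardpull,
                        Trocterr, Zoraspin) ~
                  WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
                  family = multinomial, data = hspider, Rank = 1)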
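In symbols, the rank-1 RR-MLM fitted by this call can be sketched as follows. This assumes that the last response column (Zoraspin) is the reference level, as is the default for multinomial(); the symbols $\beta_{(j)1}$, $a_j$ and $\mathbf{c}$ are our notation for this sketch:

```latex
\log\frac{\pi_{ij}}{\pi_{i,12}} = \beta_{(j)1} + a_j\,\nu_i,
\qquad \nu_i = \mathbf{c}^{\top}\mathbf{x}_i,
\qquad j = 1,\ldots,11,
```

where $\pi_{ij}$ is the probability that an individual trapped at site $i$ belongs to species $j$, $\mathbf{x}_i$ holds the six standardized environmental variables at site $i$, and $\mathbf{c}$ ($6 \times 1$) is the vector of constrained coefficients returned by ccoef(). Because $a_j \nu_i = (k a_j)(\nu_i / k)$ for any $k \neq 0$, $\mathbf{c}$ is determined only up to a multiplicative constant, which is why the two fits are rescaled with scale() before being compared.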
The equal-tolerances Poisson CQO model is stored in an object called etp1; it can be obtained as p1 in Yee (2004) but with the option EqualTolerance = TRUE. The two models normalize C differently, but they can be made comparable by centering and scaling each set of coefficients with scale():
> round(t(scale(ccoef(hsrrmlm1))), digits=3)
WaterCon BareSand FallTwig CoveMoss CoveHerb ReflLux
   0.017    0.894   -1.668    0.555   -0.638   0.839
> round(t(scale(ccoef(etp1))), digits=3)
WaterCon BareSand FallTwig CoveMoss CoveHerb ReflLux
   -0.57    0.886   -1.464     0.51   -0.483    1.12
Apart from the coefficient for water content, the two sets of coefficients agree quite well, so here the RR-MLM gives much the same result as the CQO model. The constrained coefficients of etp1 agree in sign with those from CCA, and the gradient they define can be interpreted as a moisture gradient.
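One way to put a number on "agree quite well" is to correlate the two rescaled coefficient vectors printed above. The sketch below copies those rounded values. Its helper r_scale() mimics R's scale() on a single column, which centres the values and divides by the sample standard deviation. It then computes Pearson's correlation with and without WaterCon. The helper names are ours, for illustration only.

```python
import numpy as np

# Rescaled constrained coefficients as printed above (rounded to 3 decimal places).
names = ["WaterCon", "BareSand", "FallTwig", "CoveMoss", "CoveHerb", "ReflLux"]
c_rrmlm = np.array([0.017, 0.894, -1.668, 0.555, -0.638, 0.839])  # hsrrmlm1
c_etp1 = np.array([-0.57, 0.886, -1.464, 0.51, -0.483, 1.12])     # etp1


def r_scale(x):
    """Mimic R's scale() on one column: centre, then divide by the sample SD (n - 1)."""
    return (x - x.mean()) / x.std(ddof=1)


def agreement(a, b):
    """Cross-product of the standardized vectors over n - 1, i.e. Pearson's r."""
    return float(r_scale(a) @ r_scale(b) / (len(a) - 1))


r_all = agreement(c_rrmlm, c_etp1)
keep = [k for k, name in enumerate(names) if name != "WaterCon"]
r_no_water = agreement(c_rrmlm[keep], c_etp1[keep])
print(f"all six: r = {r_all:.3f}; without WaterCon: r = {r_no_water:.3f}")
```

With these rounded values the correlation is about 0.95 over all six variables and rises to about 0.99 once WaterCon is dropped. This matches the observation that water content is the only coefficient on which the two fits disagree.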
In this section it is shown that fitting a reduced-rank multinomial logit model (RR-MLM) to a matrix of counts Y, assumed to be generated independently from $Y_{ij} \sim \mathrm{Poisson}(\mu_{ij})$, gives asymptotically the same result as fitting an equal-tolerances Poisson CQO. The matrix of explanatory variables is X.
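Heuristic Proof. Let $Y_j$ be the number of counts of the $j$th species at site $i$, $j = 1, \ldots, S$ (we omit the subscript $i$ for simplicity). It is well known that, conditional on $N = \sum_{j=1}^{S} Y_j$,
$$(Y_1, \ldots, Y_S) \mid N \;\sim\; \mathrm{Multinomial}(N;\, \pi_1, \ldots, \pi_S), \qquad \pi_j = \frac{\mu_j}{\sum_{k=1}^{S} \mu_k},$$
where $N$ is the total count over all species at site $i$. An equal-tolerances Poisson CQO model implies
$$\log \mu_j = \beta_{(j)1} + \mathbf{a}_j^{\top} \boldsymbol{\nu} + \boldsymbol{\nu}^{\top} \mathbf{D}\, \boldsymbol{\nu}, \qquad \boldsymbol{\nu} = \mathbf{C}^{\top} \mathbf{x},$$
where $\mathbf{x}$ is the row of X for site $i$ and the negative-definite matrix $\mathbf{D}$ is the same for every species. An MLM can be written
$$\log \frac{\pi_j}{\pi_S} = \eta_j, \qquad j = 1, \ldots, S-1.$$
Since $\pi_j / \pi_S = \mu_j / \mu_S$, the quadratic term $\boldsymbol{\nu}^{\top} \mathbf{D} \boldsymbol{\nu}$ cancels and
$$\eta_j = \log \mu_j - \log \mu_S = \left(\beta_{(j)1} - \beta_{(S)1}\right) + \left(\mathbf{a}_j - \mathbf{a}_S\right)^{\top} \mathbf{C}^{\top} \mathbf{x},$$
which is an RR-MLM with the same matrix of constrained coefficients $\mathbf{C}$.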
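As a numerical check of the cancellation step, the sketch below takes a rank-1 equal-tolerances Poisson model $\log\mu_j = \alpha_j - (\nu - u_j)^2/(2t^2)$. The optima $u_j$, log-maxima $\alpha_j$ and common tolerance $t$ are made-up values, not estimates from the spider data. Expanding the square shows that $-\nu^2/(2t^2)$ is shared by all species. The multinomial log-odds against the last species should therefore be exactly linear in $\nu$, with intercepts $\alpha_j - u_j^2/(2t^2)$ (relative to species $S$) and slopes $(u_j - u_S)/t^2$.

```python
import numpy as np

# Hypothetical rank-1 equal-tolerances Poisson CQO with S species.
S, t = 5, 1.3                                   # number of species, common tolerance
alpha = np.array([1.0, 0.2, 1.5, 0.7, 1.1])     # log of each species' maximum (made up)
u = np.linspace(-2.0, 2.0, S)                   # species optima on the latent variable (made up)


def mu(nu):
    """Poisson means at latent value nu: log mu_j = alpha_j - (nu - u_j)^2 / (2 t^2)."""
    return np.exp(alpha - (nu - u) ** 2 / (2 * t ** 2))


def log_odds(nu):
    """Multinomial log-odds log(pi_j / pi_S), with pi_j = mu_j / sum_k mu_k, j = 1..S-1."""
    p = mu(nu) / mu(nu).sum()
    return np.log(p[:-1] / p[-1])


# RR-MLM coefficients implied by the CQO, relative to the last species:
# log(pi_j / pi_S) = b0_j + a_j * nu.
beta0 = alpha - u ** 2 / (2 * t ** 2)
b0 = beta0[:-1] - beta0[-1]
a = (u[:-1] - u[-1]) / t ** 2

nus = np.linspace(-3.0, 3.0, 13)
max_err = max(np.abs(log_odds(v) - (b0 + a * v)).max() for v in nus)
print(f"largest deviation from the linear predictor: {max_err:.1e}")
```

The deviation is at the level of floating-point rounding. If each species were given its own tolerance, the $\nu^2$ terms would no longer cancel and the log-odds would be quadratic in $\nu$. This is why the equivalence needs equal tolerances.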
The above argument gives another way of estimating the constrained coefficients C of an equal-tolerances Poisson CQO. The methods available to date are:
The above result can be made more rigorous by using the well-known Poisson-multinomial logit relationship (see Baker, 1994). See also related work in Section 3.9 of ter Braak and Smilauer (1998) and Chapter 1 of ter Braak (1996).
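The Poisson-multinomial relationship used in the heuristic proof can also be checked by simulation. The sketch draws independent Poisson counts for a few species at one site, with made-up means, and conditions on the total $N$. It then confirms that the species proportions behave like multinomial draws with $\pi_j = \mu_j / \sum_k \mu_k$. Replicates with $N = 0$ are dropped because the multinomial is undefined there.

```python
import numpy as np

# Hypothetical Poisson means for S = 4 species at a single site.
mu = np.array([0.5, 2.0, 1.0, 3.5])
pi = mu / mu.sum()                    # multinomial probabilities implied by mu

rng = np.random.default_rng(2024)
Y = rng.poisson(mu, size=(200_000, mu.size))   # independent Poisson counts, one row per replicate
N = Y.sum(axis=1)                               # total count in each replicate
nonempty = N > 0                                # the multinomial needs N >= 1
props = Y[nonempty] / N[nonempty][:, None]      # species proportions given N

# For every N >= 1, E[Y_j / N | N] = pi_j, so the averaged proportions should match pi.
print(np.round(props.mean(axis=0), 3), np.round(pi, 3))

# Conditional on N = 5, species 1's count should be Binomial(5, pi_1), with mean 5 * pi_1.
sub = Y[N == 5, 0]
print(round(sub.mean(), 3), round(5 * pi[0], 3))
```

Only the conditional distribution given $N$ is multinomial. The totals $N$ themselves carry information about $\sum_j \mu_j$, and the RR-MLM does not use it.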
Simulations show that the RR-MLM works well when the species abundances are high, there is plenty of data in Y (i.e., large $n$ and/or $S$), and all species have equal tolerances. One characteristic of the RR-MLM is that sites with no individuals of any species must be deleted ($N_i = 0$ is not allowed).
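In practice this means removing every row of Y whose total is zero, along with the matching row of X, before fitting the RR-MLM. A minimal sketch, using a hypothetical helper drop_empty_sites() and made-up counts:

```python
import numpy as np


def drop_empty_sites(Y, X):
    """Hypothetical helper: drop sites (rows) whose total count N_i = sum_j Y_ij is zero."""
    Y, X = np.asarray(Y), np.asarray(X)
    keep = Y.sum(axis=1) > 0          # sites with at least one individual
    return Y[keep], X[keep], keep


# Made-up counts for 4 sites x 3 species; the third site trapped nothing.
Y = np.array([[3, 0, 1],
              [0, 2, 5],
              [0, 0, 0],
              [1, 1, 0]])
X = np.array([[0.1], [-0.4], [1.2], [0.7]])   # one standardized covariate per site

Y_fit, X_fit, keep = drop_empty_sites(Y, X)
print(keep)          # [ True  True False  True]
print(Y_fit.shape)   # (3, 3)
```

The Poisson CQO, by contrast, can keep such a site, because its likelihood contribution $\exp(-\sum_j \mu_{ij})$ still depends on the parameters.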
LITERATURE CITED