Online Trick: Vegan Package

Showing posts with label Vegan Package. Show all posts

Sunday, September 20, 2015

Permutation Test with Stratified Data and Repeated Measurements

This is an example for a permutation test on stratified samples with repeated measurements. Samples are interdependent firstly because they come from several sites and secondly because the sampling was repeated a second time. That is samples of the same sites are dependent and sample t1 and sample t2, taken from the very same places are dependent.

What I want to test is whether there is a difference between timepoint one (t1) and two (t2) or not. A hypothesis could be: the average difference t1-t2 is sign. larger than zero (a one-sided test). Another hypothesis could be: the average difference is sign. different from zero, either larger or smaller (a two-sided test).

If you deal with a distribution of your data that ordinary Linear Mixed Models (LMMs) or Generalized LMMs (GLMMs) can handle you should vote for this option - but sometimes you deal with awkard data and permutation tests may the only thing to bail you out...

library(vegan)

### the data:

sites <- sort(rep(letters[1:8],6))
tm <- rep(1:2,24)
pairs <- gl(24, 2)
n <- length(sites)
id <- 1:n

print(data <- data.frame(id, sites, tm, pairs))

### set up a Control object:

ctrl <- permControl(strata = sites, type = "series")

### check permutation scheme -
### by permuting the ids of the pair vector with the above
### control we assure that time points are either changed
### or unchanged within each stratum - this allows for the
### dependency within strata - the nr. of possible permutations
### is 2^n(strata) which is 2^8 = 256, meaning that the smallest
### p-value that we will be able to obtain is 1/256 = 0.0039

### in this frame you can retry and see that what i described above
### really happens:

data.frame(id, sites, pairs, tm, tm_p = tm[permuted.index2(n, control = ctrl)])

### now what i want to test is pair differences - i will
### yield differences by aggregating pairs, i will use the
### aggregate function with FUN = sum, to yield differences
### i set up a "sign" variable, with the negative sign aligned
### to the second time point, that is a positive difference
### means a decrease with time and vice versa:

sign <- rep(c(1,-1), 24)

### this will be a simulated dependent variable with a time-effect
### at the 2nd time point var is decreased

var <- rnorm(48, 10, 2)
var <- ifelse(tm == 2, var * 0.7, var * 1)

### in a frame:
data.frame(id, sites, pairs, tm, var)

### x will be the differences:
### you see, the added effect simulates a decrease

print(true <- aggregate(x = var*sign, by = list(sites = sites, pairs = pairs), FUN = sum))

### and now the actual permutation comes in -
### we permute the sign and than aggregate which
### gives the differences of permuted pairs

aggregate(x = var*sign[permuted.index2(n, control = ctrl)],
by = list(pairs = pairs), FUN = sum)

### and now we repeat the above step and record the mean differences
### which will give us a null-population - this we will compare to
### the mean of the observed differences

m.true <- mean(true$x)

### set number of perms

B <- 1999

### setting up frame for the null-population of differences:

pop <- rep(NA, B + 1)
pop[1] <- m.true

for(i in 2:(B+1)){
perm <- aggregate(x = var*sign[permuted.index2(n, control = ctrl)],
by = list(pairs = pairs), FUN = sum)
pop[i] <- mean(perm$x)
}

### show the null-population and the observed difference:

hist(pop, xlab = "Differences")
abline(v = pop[1], col = "red", lty = 3)
text(x = pop[1]*0.9, y = 150, srt = 90, "the true Difference")

### depending on your hypothesis you will take the one-sided p-value:

print(pval_1 <- sum(pop >= pop[1]) / (B + 1))

### or the two-sided, which simply is:

print(pval_2 <- sum(abs(pop) >= abs(pop[1])) / (B + 1))

### the one-sided test would be appropiate if your hypothesis
### was taht the observed difference is sign. larger than the null.
### the two-sided is appropiate if your hypothesis was that
### the obs. diff. is different from the null - that is, it could be larger
### or smaller than the null.

### You can check the result without time effect by resetting
### the var like beneath, then re-run the test:
var <- rnorm(48, 10, 2)

To cite package ‘vegan’ in publications use:

Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, R.

B. O'Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens and

Helene Wagner (2011). vegan: Community Ecology Package. R package

version 1.17-9. http://CRAN.R-project.org/package=vegan

...say you have a multivariate dataset and a two-way factorial design - you do a PERMANOVA and the aov-table (adonis is using ANOVA or "sum"-contrasts) tells you there is an interaction - how to proceed when you want to go deeper into the analysis?
You could, however somewhat tedious, customize contrasts for the PERMANOVA and check for differences between certain level combinations.

Here's an example with a 2-way factorial design with 4 dependent variables, which represent species abundances. I tested for significant differences between "treatment"-level 1 vs 2 and "treatment"-level 2 vs 3 as well as for simple "impact"-effects at each level of "treatment".
When comparing the default aov-table with the table for the customized contrasts one can see that the residual and the total sum of squares are the same for both models and that the main effects are decomposed in its elements as specified by the contrasts.

library(vegan)
 
# species matrix
spdf <- matrix(NA, 60, 4, dimnames =
               list(1:60, c("sp1", "sp2", "sp3", "sp4")))
spdf <- as.data.frame(spdf)
 
# 1st factor = treatment:
treat <- gl(3, 20, labels = paste("t", 1:3, sep=""))
 
# 2nd factor = impact:
imp <- rep(gl(2, 10, labels = c("yes", "no")), 3)
 
# simulating effect -
# simulation will add similar effects
# and no interactions (see plot):
eff <- sort(rep(1:6, 10))
 
# add noise:
spdf$sp1 = eff + rnorm(60, 0, 0.25)
spdf$sp2 = eff + rnorm(60, 0, 0.25)
spdf$sp3 = eff + rnorm(60, 0, 0.25)
spdf$sp4 = eff + rnorm(60, 0, 0.25)
 
# interaction plot for all species:
par(mfrow=c(2, 2), mar = c(2.5, 2.5, 0.5, 0.5))
for (i in 1:4) {interaction.plot(treat, imp, spdf[, i], lty = c(1, 2), legend = F);
                legend("topleft", bty = "n", cex = 1.2,
                       paste("Species", i, sep = " "));
                legend("bottomright", c("no", "yes"),
                       bty = "n", lty = c(2, 1))}
 
## create a design matrix of the contrasts for "imp"
contrasts(imp) <- c(-1, 1)
Imp <- model.matrix(~ imp)[, -1]
 
## create a design matrix of the contrasts for "treat"
contrasts(treat) <- cbind(c(0,1,0),c(0,0,1))
Treat <- model.matrix(~ treat)[, -1]
 
imp.in.t1 <- Imp * ifelse(treat == "t1", 1, 0)
imp.in.t2 <- Imp * Treat[, 1]
imp.in.t3 <- Imp * Treat[, 2]
 
## specify the orthogonal contrasts for "treat"
contrasts(treat) <- cbind(c(1, -1, 0), c(1, 0, -1))
 
## specify the design matrix of the orthogonal
## contrasts for "treat"
Treat.ortho <- model.matrix(~ treat)[, -1]
 
## create a factor for each of the orthogonal "treat" contrasts
treat1vs2 <- Treat.ortho[, 1]
treat1vs3 <- Treat.ortho[, 2]
 
## do the pm-manova with the full model
fm1 <- adonis(spdf ~ treat * imp, method = "euclidean", perm = 999)
 
## do the pm-manova with the orthogonal contrasts for imp and treat'
## and the interaction contrasts of interest
fm2 <- adonis(spdf ~ treat1vs2 + treat1vs3 +
                     imp.in.t1 + imp.in.t2 + imp.in.t3,
              method = "euclidean", perm = 999)
fm1; fm2
 
## check model with changed effects
eff <- sort(rep(1:6, 10))
eff[treat == "t1" | treat == "t2"] <- 3
 
# add noise:
spdf$sp1 = eff + rnorm(60, 0, 0.25)
spdf$sp2 = eff + rnorm(60, 0, 0.25)
spdf$sp3 = eff + rnorm(60, 0, 0.25)
spdf$sp4 = eff + rnorm(60, 0, 0.25)
 
# interaction plot for all species:
par(mfrow=c(2, 2), mar = c(2.5, 2.5, 0.5, 0.5))
for (i in 1:4) {interaction.plot(treat, imp, spdf[, i], lty = c(1, 2), legend = F);
                legend("topleft", bty = "n", cex = 1.2,
                       paste("Species", i, sep = " "));
                legend("bottomright", c("no", "yes"),
                       bty = "n", lty = c(2, 1))}
 
fm1 <- adonis(spdf ~ treat * imp, method = "euclidean", perm = 999)
fm2 <- adonis(spdf ~ treat1vs2 + treat1vs3 +
                     imp.in.t1 + imp.in.t2 + imp.in.t3,
              method = "euclidean", perm = 999)
fm1; fm2

To cite package ‘vegan’ in publications use:
Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre,
R. B. O'Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens
and Helene Wagner (2011). vegan: Community Ecology Package. R package
version 1.17-9. http://CRAN.R-project.org/package=vegan

Before you use PERMANOVA (R-vegan function adonis) you should read the user notes for the original program by the author (Marti J. Anderson) who first came up with this method. An important assumtption for PERMANOVA is same "multivariate spread" among groups, which is similar to variance homogeneity in univariate ANOVA.

I'll show why you may draw the wrong conclusions if this assumption is not met:


library(vegan)

# two similar populations:
dat1a<-matrix(sample(c(0,1,1,1),200,replace=T),10,20)
dat1b<-matrix(sample(c(0,1,1,1),200,replace=T),10,20)

# generating a third sample from the same population, but with reduced
# number of species occurrences. this set will have higher
# beta-diversity (or "multivariate spread"):
dat2<-matrix(sample(c(0,0,0,1),200,replace=T),10,20)
 


# distance matrices:
fac<-gl(2,10)
dist11<-vegdist(rbind(dat1a,dat1b))
dist12<-vegdist(rbind(dat1a,dat2)): 

# when computing sets with same beta-dispersion we get a
# correct adonis result with no sign. group differences
# in species composition:
anova(betadisper(dist11,fac))
adonis(rbind(dat1a,dat1b)~fac)

# when using sets with different beta-diversity you may
# get false significant adonis results - the location/composition
# is actually the same (!) this result is due to different
# multivariate spread in dat1 and dat2:
anova(betadisper(dist12,fac))
adonis(rbind(dat1a,dat2)~fac)

# see ordination diagram where location (centroids) between dat1 and dat2
# is not shifted more than for dat1a dat1b, still you yield a (false)
# sign. adonis result
# plot:

windows(10,5)

opar<-par()
par(mfrow=c(1,2))
plot(meta11<-metaMDS(dist11,zerodist=ignore),type="n",
main="same beta-disp\nsame location")
points(meta11,select=which(fac==1),col="red")
points(meta11,select=which(fac==2),col="blue")
ordispider(meta11,group=fac)

plot(meta12<-metaMDS(dist12,zerodist=ignore),type="n",
main="diff beta-disp\nsame location")
points(meta12,select=which(fac==1),col="red")
points(meta12,select=which(fac==2),col="blue")
ordispider(meta12,group=fac)

par(opar)

Custom Labels for Ordination Diagram

Here is how you do custom labels, hull, spider in a vegan ordination diagram:

library(vegan)

### data on 35 species of Oribatid mites
### data description at: http://cc.oulu.fi/~jarioksa/softhelp/vegan/html/mite.html
data(mite)

### ...and environmental variables
data(mite.env)

### the factor which you wish to use for labeling the displayed sites
### in the ordination diagramm:
fac<-mite.env$Topo
sol<-metaMDS(mite)

windows(9,5)
par(mfrow=c(1,2))

### graph with "spider":
plot(sol,type="n")
points(sol, display = "sites", cex=0.65, select=which(fac=="Hummock"),
       pch = 21,col="black", bg="black")
points(sol, display = "sites", cex=0.65, select=which(fac=="Blanket"),
       pch = 21,col="black", bg="white")
ordihull(sol,group=fac,show.groups="Hummock")
ordihull(sol,group=fac,show.groups="Blanket",lty=3)
orditorp(sol, dis = "sp", pcex=0,air=0.85,col="grey35",cex=0.8 ,font=3)

### graph with "hull":
plot(sol,type="n")
points(sol, display = "sites", cex=0.65, select=which(fac=="Hummock"),
       pch = 21,col="black", bg="black")
points(sol, display = "sites", cex=0.65, select=which(fac=="Blanket"),
       pch = 21,col="black", bg="white")
ordispider(sol,group=fac,show.groups="Hummock")
ordispider(sol,group=fac,show.groups="Blanket",lty=3)
orditorp(sol, dis = "sp", pcex=0,air=0.85,col="grey35",cex=0.8 ,font=3)

R package ‘vegan’ citation: Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre,

R. B. O'Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens

and Helene Wagner (2011). vegan: Community Ecology Package. R package

version 1.17-9. http://CRAN.R-project.org/package=vegan

Multivariate Repeated Measurements with adonis():

Please check the updated code in the comment by Wallace Beiroz, 26 January 2015 at 13:37!

Lately I had to figure out how to do a repeated measures (or mixed effects) analysis on multivariate (species) data. Here I share code for a computation in R with the adonis function of the vegan package. Credit goes to Gavin Simpson providing most of the important pieces of the below code in R-Help.

The design:

We have multivariate species data, sampled at different sites (n = 6) at 3 points in time (N = 24).

The hypothesis:

H0: Species composition is the same across time. The permutation will produce the H0-population by a restricted randomization of time points.

H1: Species composition differs between time points

The analysis:

With repeated measurements one has to allow for the temporal structure and for the dependency within repeated measures. In a pedantic manner one would account for temporal structure by permuting all time-points in the same way also keeping the ordering of time-points. With 3 time-points this would give you only 3 permutations!

But we may relax this very restrictive assumption and say it doesn't matter that all samples at t1 were taken at one time and the ones of t2, etc. at another. Still there would be the repeated measures and the fact that the samples were taken in a temporal sequence. So how to deal with this?

(1) Having in mind that repeated measures on one sample are not independent, we will not permute across these samples.

(2) Now to the temporal correlation within repeated samples: The ordering of time-points is another type of dependency: the ordering is 1, 2, 3 - we will change this ordering only to 3, 1, 2 or 2, 3, 1, This permutation is referred to as toroidal shift which is keeping the "neighborhood" of values intact.

Eventually, we have 3 permutations per site, with 6 sites - this yields 3^6 = 729 permutations overall. Thus, a minimal p-value of 1/729 = 0.0014 can be obtained. I'd say that's good enough!

### species matrix with 20 species abundances (mean = 50, sd = 10)
### one time variable, with 3 timepoints, which should be tested
### and a factor denoting sites that were repeatedly sampled (site)
 
## Load packages
require(vegan)
 
### Data:
sp <- matrix(rnorm(3 * 6 * 20, 50, 10), nrow = 3 * 6, ncol = 20,
         dimnames = list(1:18, paste("Sp", 1:20, sep = "")))
 
time <- as.ordered(rep(1:3, 6))
site <- gl(6, 3)
cbind(site, time, sp)
 
### add time effect at timepoint 3,
### this will effect will be tested by adonis():
sp_1 <- sp
sp_1[time==3,] <-  sp[time==3,] + rnorm(20, 10, 1)
cbind(site, time, sp_1)
 
### choose which species set to test:
test_sp <- sp_1
 
### computing the true R2-value
 
### (btw, using dist() defaults to euclidean distance):
print(fit <- adonis(dist(test_sp) ~ time, permutations=1))
 
### number of perms
B <- 1999
 
### setting up frame which will be populated by
### random r2 values:
pop <- rep(NA, B + 1)
 
### the first entry will be the true r2:
pop[1] <- fit$aov.tab[1, 5]
 
### set up a "permControl" object:
### we turn off mirroring as time should only flow in one direction
ctrl <- permControl(strata = site, within = Within(type = "series", mirror = FALSE))
 
### Number of observations:
nobs <- nrow(test_sp)
 
### check permutation (...rows represent the sample id):
### ..they are ok!
### within in each repeated sample (= sites) timepoints are shuffled,
### with keeping the sequence intact (e.g., for  site 1: 1,2,3 - 2,3,1 - 3,2,1)
shuffle(nobs, control = ctrl)
 
### loop:
### in adonis(...) you need to put permutations = 1, otherwise 
### adonis will not run
set.seed(123)
for(i in 2:(B+1)){
     idx <- shuffle(nobs, control = ctrl)
     fit.rand <- adonis(dist(test_sp) ~ time[idx], permutations = 1)
     pop[i] <- fit.rand$aov.tab[1, 5]
}
 
### get the p-value:
print(pval <- sum(pop >= pop[1]) / (B + 1))
### [1] 0.0035
 
### the sign. p-value supports the H1 (->there is a time effect).
### ..and the fact that samples are not iid is allowed by
### the customized perms - so this p-value is trustworthy as opposed
### to tests not acknowledging dependency of data points..
 
### test sp set without an effect:
### replace test_sp with sp set without effect:
test_sp <- sp
 
### now re-run the script and see the result:
### it is insign. - as expected:
 
### setting up frame which will be populated by
### random r2 values:
pop <- rep(NA, B + 1)
 
### computing the true R2-value:
print(fit <- adonis(dist(test_sp) ~ time, permutations = 1))
 
### the first entry will be the true r2:
pop[1] <- fit$aov.tab[1, 5]
 
### run the loop:
set.seed(123)
for(i in 2:(B+1)){
     idx <- shuffle(nobs, control = ctrl)
     fit.rand <- adonis(dist(test_sp) ~ time[idx], permutations = 1)
     pop[i] <- fit.rand$aov.tab[1, 5]
}
print(pval <- sum(pop >= pop[1]) / (B + 1))
### [1] 0.701
 
## make a histogram to see random R2-values and the true one:
hist(pop, xlab = "Population R2")
abline(v = pop[1], col = 2, lty = 3)
text(0.08, 300, paste("true R2,\np = ", pval, sep = ""))

To cite package ‘vegan’ in publications use:

Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre,
Peter R. Minchin, R. B. O'Hara, Gavin L. Simpson, Peter Solymos, M.
Henry H. Stevens and Helene Wagner (2011). vegan: Community Ecology
Package. R package version 2.0-2.
http://CRAN.R-project.org/package=vegan

http://cran.r-project.org/web/packages/vegan/index.html

http://cc.oulu.fi/~jarioksa/softhelp/vegan.html

Online Trick

Sunday, September 20, 2015

Permutation Test with Stratified Data and Repeated Measurements

Two-Way PERMANOVA (adonis, vegan-Package) with Customized Contrasts

Adonis (PERMANOVA) - Assumptions

Custom Labels for Ordination Diagram

Friday, September 18, 2015

Multivariate Repeated Measurements with adonis():