Sunday, September 20, 2015
Permutation Test with Stratified Data and Repeated Measurements
This is an example for a permutation test on stratified samples with repeated measurements. Samples are interdependent firstly because they come from several sites and secondly because the sampling was repeated a second time. That is samples of the same sites are dependent and sample t1 and sample t2, taken from the very same places are dependent.
What I want to test is whether there is a difference between timepoint one (t1) and two (t2) or not. A hypothesis could be: the average difference t1-t2 is sign. larger than zero (a one-sided test). Another hypothesis could be: the average difference is sign. different from zero, either larger or smaller (a two-sided test).
If you deal with a distribution of your data that ordinary Linear Mixed Models (LMMs) or Generalized LMMs (GLMMs) can handle you should vote for this option - but sometimes you deal with awkard data and permutation tests may the only thing to bail you out...
What I want to test is whether there is a difference between timepoint one (t1) and two (t2) or not. A hypothesis could be: the average difference t1-t2 is sign. larger than zero (a one-sided test). Another hypothesis could be: the average difference is sign. different from zero, either larger or smaller (a two-sided test).
If you deal with a distribution of your data that ordinary Linear Mixed Models (LMMs) or Generalized LMMs (GLMMs) can handle you should vote for this option - but sometimes you deal with awkard data and permutation tests may the only thing to bail you out...
library(vegan)
### the data:
sites <- sort(rep(letters[1:8],6))
tm <- rep(1:2,24)
pairs <- gl(24, 2)
n <- length(sites)
id <- 1:n
print(data <- data.frame(id, sites, tm, pairs))
### set up a Control object:
ctrl <- permControl(strata = sites, type = "series")
### check permutation scheme -
### by permuting the ids of the pair vector with the above
### control we assure that time points are either changed
### or unchanged within each stratum - this allows for the
### dependency within strata - the nr. of possible permutations
### is 2^n(strata) which is 2^8 = 256, meaning that the smallest
### p-value that we will be able to obtain is 1/256 = 0.0039
### in this frame you can retry and see that what i described above
### really happens:
data.frame(id, sites, pairs, tm, tm_p = tm[permuted.index2(n, control = ctrl)])
### now what i want to test is pair differences - i will
### yield differences by aggregating pairs, i will use the
### aggregate function with FUN = sum, to yield differences
### i set up a "sign" variable, with the negative sign aligned
### to the second time point, that is a positive difference
### means a decrease with time and vice versa:
sign <- rep(c(1,-1), 24)
### this will be a simulated dependent variable with a time-effect
### at the 2nd time point var is decreased
var <- rnorm(48, 10, 2)
var <- ifelse(tm == 2, var * 0.7, var * 1)
### in a frame:
data.frame(id, sites, pairs, tm, var)
### x will be the differences:
### you see, the added effect simulates a decrease
print(true <- aggregate(x = var*sign, by = list(sites = sites, pairs = pairs), FUN = sum))
### and now the actual permutation comes in -
### we permute the sign and than aggregate which
### gives the differences of permuted pairs
aggregate(x = var*sign[permuted.index2(n, control = ctrl)],
by = list(pairs = pairs), FUN = sum)
### and now we repeat the above step and record the mean differences
### which will give us a null-population - this we will compare to
### the mean of the observed differences
m.true <- mean(true$x)
### set number of perms
B <- 1999
### setting up frame for the null-population of differences:
pop <- rep(NA, B + 1)
pop[1] <- m.true
for(i in 2:(B+1)){
perm <- aggregate(x = var*sign[permuted.index2(n, control = ctrl)],
by = list(pairs = pairs), FUN = sum)
pop[i] <- mean(perm$x)
}
### show the null-population and the observed difference:
hist(pop, xlab = "Differences")
abline(v = pop[1], col = "red", lty = 3)
text(x = pop[1]*0.9, y = 150, srt = 90, "the true Difference")
### depending on your hypothesis you will take the one-sided p-value:
print(pval_1 <- sum(pop >= pop[1]) / (B + 1))
### or the two-sided, which simply is:
print(pval_2 <- sum(abs(pop) >= abs(pop[1])) / (B + 1))
### the one-sided test would be appropiate if your hypothesis
### was taht the observed difference is sign. larger than the null.
### the two-sided is appropiate if your hypothesis was that
### the obs. diff. is different from the null - that is, it could be larger
### or smaller than the null.
### You can check the result without time effect by resetting
### the var like beneath, then re-run the test:
var <- rnorm(48, 10, 2)
To cite package ‘vegan’ in publications use:
Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, R.
B. O'Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens and
Helene Wagner (2011). vegan: Community Ecology Package. R package
version 1.17-9. http://CRAN.R-project.org/package=vegan
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment