Online Trick: Lattice graphs

Showing posts with label Lattice graphs. Show all posts

Sunday, September 20, 2015

Apprentice Piece with Lattice Graphs

Lattice graphs can be quite tedious to learn. I don't use them too often and when I need them I usually have to dig deep into the archives for details on the parameter details.
The here presented example may serve as a welcome template for the usage of panel functions, panel ordering, for drawing of lattice keys, etc.
You can download the example data HERE.

(Also, check this resource with examples by the lattice-author).

library(lattice)
df <- read.csv("PATH/TO/Downloads/lattice_data.csv",
               header = T, sep = ";")
mypch <- rep(21, 3)
mycol <- c(rgb(0.2, 0.8, 0.9), rgb(0.8, 0.2, 0.9), rgb(0.9, 0.2, 0.2))

stripplot(mean ~ group1 | item_parc, groups = group2, data = df,
          type = c("a", "p"), strip = strip.custom(strip.names = c(F, T)),
          ylab = "Score-Mittelwert", layout = c(2, 4),
          scales = list(y = list(cex = 0.9),
                        x = list(cex = 1.1, labels = c("männl.", "weibl."),
                        tck = c(1, 0))),
          par.strip.text = list(cex = 0.9),
          panel = function(...) {
             panel.stripplot(..., col = mycol, pch = mypch, fill = mycol)
          },
          index.cond = list(c(7,8,5,6,3,4,1,2)),
          key = list(space = "bottom", text = list(c("Alter < 35 J.",
                                                     "Alter 35-50 J.",
                                                     "Alter > 50 J.")),
                     points = list(col = mycol, cex = 0.8, pch = mypch, fill = mycol),
                     rep = F))

Here I compare two distributions, flowering duration of indigenous and allochtonous plant species. The hypothesis is that alien compared to indigenous plant species exhibit longer flowering periods.

download data*

*Data is courtesy of BiolFlor:
Klotz, S., Kühn, I. & Durka, W. [Hrsg.] (2002): BIOLFLOR - Eine Datenbank zu biologisch-ökologischen Merkmalen der Gefäßpflanzen in Deutschland. - Schriftenreihe für Vegetationskunde 38. Bonn: Bundesamt für Naturschutz.

## comparing flowering time in indigenous and alien species by
## usage of a quantile-quantile plot (qqplot), the ks-test, by
## testing shifts in medians (wilcox test) and by testing
## difference of means (t-test, as we deal with integer and
## not a continous variable it is not the most appropiate choice)
## as well as by a chi-square test:

dat <- read.csv("E:\\R\\Data\\flowering_alien_vs_indigen.csv",
                sep = ";")

library(lattice)

histogram(~ Flowering|Status, data = dat, col = "gray60", layout = c(1, 2),
          xlab = list("Months of flowering"),
          ylab = list("Percentage of total"),
          scales = list(y = list(alternating = F)),
          strip = strip.custom(factor.levels = c("alien", "indigenous")))

qqplot(dat$Flowering[dat$Status == "indigen"],
       dat$Flowering[dat$Status == "Neophyt"])
abline(a = 0, b = 1, lty = 3)

ks.test(dat$Flowering[dat$Status == "indigen"],
        dat$Flowering[dat$Status == "Neophyt"])

wilcox.test(Flowering ~ Status, data = dat)

t.test(Flowering ~ Status, data = dat)

## Note that in the two-sample case the estimator for the
## difference in location parameters does not estimate
## the difference in medians (a common misconception) but
## rather the median of the difference between a sample
## from x and a sample from y.

## as we deal with a limited number of classes (1-12 months)
## and sample size is big enough my favourite would be a
## chi-square test:

m <- table(dat$Status, dat$Flowering)

(Xsq <- chisq.test(m))  # Prints test summary
Xsq$observed   # observed counts (same as M)
Xsq$expected   # expected counts under the null
Xsq$residuals  # Pearson residuals
Xsq$stdres     # standardized residuals

I present code for a stacked graph with common axes only for panels of the same row and with axis titles at different sides. This admittedly took me days (because i had not much of a clue how to use lattice), but eventually I did it and maybe someone can use this for his/her own purpose:

library(lattice)

y1 <- rnorm(120,100,10)
y2 <- rnorm(120,10,1)

facs <- expand.grid(Sites=rep(c("Site I","Site II",
                       "Site III"),20),Treatment = c("A","B"))

trellis.par.set(clip = list(panel = "off",strip = "off"),
   layout.widths = list(right.padding = 10,
   left.padding = 8))

PanFun<-function(...){
   panel.dotplot(...)
     if(is.element(panel.number(),1:2))
       {at<-pretty(c(min(y1),max(y1)))
        panel.axis("right",at = at,outside = T,
        labels = F,half = F)}
     if(panel.number() == 3)
       {at<-pretty(c(min(y1),max(y1)))
        panel.axis("right",at = at,outside = T,
        labels = T,half = F)}
     if(panel.number() == 4)
       {at<-pretty(c(min(y2),max(y2)))
        panel.axis("left",at = at,outside = T,
        labels = T,half = F)}
     if(is.element(panel.number(),5:6))
       {at<-pretty(c(min(y2),max(y2)))
        panel.axis("left",at = at,outside = T,
        labels = F,half = F)}}

dotplot(y1+y2 ~ Treatment|Sites,
        outer = TRUE,data = facs,
        scales = list(
             x = list(rot = 0, tck=c(1,0)),
             y = list(draw=F,relation="free",
        limits = list(range(y1),range(y1),range(y1),
                    range(y2),range(y2),range(y2)))),
        ylab = list("response one (y1)", y = 0.75, x = -4),
        ylab.right = list("response two (y2)", y = 0.25, 
                          rot = 270, x = 4),
        xlab = c("Site 1", "Site 2","Site 3"),
        strip = FALSE, panel=PanFun)

R package citation:
Sarkar, Deepayan (2008) Lattice: Multivariate Data Visualization with
R. Springer, New York. ISBN 978-0-387-75968-5

See also the thread:
http://www.mail-archive.com/r-help@r-project.org/msg107565.html

..and examples from:
http://lmdvr.r-forge.r-project.org/figures/figures.html

...here's an example of how to apply multiple comparisons to a generalised linear mixed model (GLMM) using the function glmer from package lme4 & glht() from package multcomp. Also, I present a nice example for visualizing data from a nested sampling design with lattice-plots!

The comparisons: I have a 4-level (stage A, B, C, D) and a 2-level (center vs. periphery) fixed factor and I want to know whether there are differences between center vs. periphery within each stage-level, and whether this effect differs across stage-levels (interactions). For the case being I simulated an effect only for center vs. periphery and this effect was simulated to be the same across all stages, so there should not be any sign. interactions.

This method was used in Cichini et al. (2011).

library(lme4)
library(mvtnorm)
library(multcomp)
library(lattice)

########################################################################
# setting up fake data, for details on sampling design see:            #
# cichini et al. (2011)                                                #
# http://www.springerlink.com/content/t4u305045t214882                 #
########################################################################

# 3 successional stages with 44 units and 1 succ. stage wit 10 units:
N = 44*3+10

cen <- per <- cbind(n = rep(NA, N), X = rep(NA, N))

# i simulate an effect between factor-levels cen and per
# all other fixed and random factors and interactions will have
# no effect.
# X = incidents, n = no. of observations:
for(i in 1:N){cen[i,"X"]<-sum(rbinom(5, 1, 0.35)); cen[i, "n"] <- 5}
for(i in 1:N){per[i,"X"]<-sum(rbinom(8, 1, 0.65)); per[i, "n"] <- 8}

dat <- data.frame(rbind(cen, per)) 

# probabilities:
dat$p <- dat$X/dat$n
stage <- as.factor(rep(c(rep("A",44), rep("B",44), rep("C",44), rep("D",10)),2))
plotA <- paste("A",rep(1:11,c(4,4,4,4,4,4,4,4,4,4,4)),sep="")
plotB <- paste("B",rep(1:13,c(3,4,2,4,4,4,4,4,3,4,4,2,2)),sep="")
plotC <- paste("C",rep(1:12,c(3,4,4,4,4,4,4,4,3,4,4,2)),sep="")
plotD <- paste("D",rep(1:3,c(2,4,4)),sep="")
plot <- as.factor(rep(c(plotA, plotB, plotC, plotD),2))
subplot <-  as.factor(rep(1:142,2))
pos <- as.factor(c(rep("cen", N),rep("per", N)))

df <- data.frame(cbind(dat, pos = pos, stage = stage, plot = plot, subplot = subplot))
str(df)

### visualize data:
stripplot(jitter(p,10) ~ pos | plot, data = df, groups = subplot,
strip = strip.custom(strip.names = c(F,T)),
ylab="p",
layout=c(12,4),
type = c("p", "a"),col=1,cex=0.5,
scales = list(y = list(cex=0.5),
              x = list(rot=90,cex=0.6,labels=c("Centers","Periphery"))),
              par.strip.text=list(cex=0.7),
     page = function(...) {
          ltext(x = 0.39,
                y = 0.835,
                lab = c("CENTERS vs. PERIPHERY\npanels = plots;\nrows = successional stages:\nA: pioneer, B: early, C: late, D: alpine gr.land; \nlines = paired samples (= subplots) "),
                cex = 1,adj=0)
      })

gmod <- glmer(cbind(dat$X, dat$n - dat$X) ~ stage * pos + (1|plot/subplot), family = binomial)

# comparisons -
# i want to know whether there are differences between center vs. periphery
# within each stage, and whether this effect differs across
# stages (interactions).
# as i simulated an effect only for centre vs. periphery and this
# effect was simulated
# to be the same across all stages, there should not be any sign.
# interactions:

c1 <- rbind("A: Center vs. Peri. Effect" = c(0,0,0,0,1,0,0,0),
            "B: Center vs. Peri. Effect" = c(0,0,0,0,1,1,0,0),
            "C: Center vs. Peri. Effect" = c(0,0,0,0,1,0,1,0),
            "D: Center vs. Peri. Effect" = c(0,0,0,0,1,0,0,1),
            "Center vs. Peri. Effect, A vs. B" = c(0,0,0,0,0,1,0,0),
            "Center vs. Peri. Effect, A vs. C" = c(0,0,0,0,0,0,1,0),
            "Center vs. Peri. Effect, A vs. D" = c(0,0,0,0,0,0,0,1),
            "Center vs. Peri. Effect, B vs. C" = c(0,0,0,0,0,-1,1,0),
            "Center vs. Peri. Effect, B vs. D" = c(0,0,0,0,0,-1,0,1),
            "Center vs. Peri. Effect, C vs. D" = c(0,0,0,0,0,0,-1,1))

summary(glht(gmod, c1))
###########################################################################

# EDIT:
# for investigation of contrasts you can use this functions, i.e.:
stage <- relevel(stage, "B")
model.matrix(~ stage * pos, contrasts = list(stage = "contr.treatment", pos = "contr.treatment"))

To cite package ‘lme4’ in publications use:

Douglas Bates, Martin Maechler and Ben Bolker (2011). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-39. http://CRAN.R-project.org/package=lme4

Please cite the multcomp package by the following reference:

Torsten Hothorn, Frank Bretz and Peter Westfall (2008). Simultaneous Inference in General Parametric Models. Biometrical Journal 50(3), 346--363.

To cite the lattice package in publications use:

Sarkar, Deepayan (2008) Lattice: Multivariate Data Visualization with R. Springer, New York. ISBN 978-0-387-75968-5

..an illustrative example for testing proportions and presenting the results.

the data: number of indigenous and alien plant species with and without vegetative reproduction (N = 3399, mid-european species, data-courtesy: BiolFlor) . Hypothesis: The proportion of species with vegetative reproduction is different between alien and indigenuos plant species.

result: the prop. of plants with veg. reproduction is sign. lower for alien compared to indigenous plant species. this is simply due to the large number of agricultural weeds and contaminants within alien species - these species almost always reproduce by seeds.

## data:
dat <- data.frame(list(structure(list(flstat = structure(c(2L, 1L, 2L, 1L),
.Label = c("allo", "auto"), class = "factor"),
reprod = structure(c(1L, 1L, 2L, 2L),
.Label = c("non-veg", "veg"), class = "factor"),
X = c(872L, 423L, 1872L, 232L)),
.Names = c("flstat", "reprod", "X"),
class = "data.frame", row.names = c(NA, -4L))))

## proportion of species with vegetative reproduction
p_allo <- dat$X[4] / (dat$X[2] + dat$X[4])
p_auto <- dat$X[3] / (dat$X[1] + dat$X[3])
p_allo
p_auto

## restructure data for glm:
dat1 <- dat[rep(1:4, dat$X), 1:2]
head(dat1)
dat1$inc <- ifelse(dat1$reprod == "non-veg", 0, 1)

## glm:
summary(gmod <- glm(inc ~ flstat, data = dat1, family = binomial))

## intercept = logit(p_allo):
print(est_p_allo <- plogis(gmod$coef[1]))

## intercept + b = logit(p_allo+p_auto):
print(est_p_auto <- plogis(gmod$coef[1] + gmod$coef[2]))

## alternatively test difference in two proportions with prop.test():
ptest_diff <- prop.test(x = c(dat$X[1], dat$X[2]),
                        n = c(dat$X[1] + dat$X[3], dat$X[2] + dat$X[4]))

## only for one proportion prop.test gives you the confidence
## intervals of p.
## (you could also extract the glm-standard errors and calculate
## the conf.int. for this purpose..):
ptest_auto <- prop.test(x = dat$X[3], n = dat$X[1] + dat$X[3])
ptest_allo <- prop.test(x = dat$X[4], n = dat$X[2] + dat$X[4])

## plot with confidence intervals from prop.test
## (see methods in ?prop.test):

## coordinates for plotting confidence interval bars:
y0_al <- ptest_allo$conf[1]
y1_al <- ptest_allo$conf[2]
y0_au <- ptest_auto$conf[1]
y1_au <- ptest_auto$conf[2]

library(grid)
library(lattice)

## panel function for suppressing tck at top and right side,
## drawing bar with confidence interval,
## plotting glm-estimates (the crosses)
mpanel = function(...) {grid.segments(x0 = c(0.2725, 1 - 0.2725),
                                      x1 = c(0.2725, 1 - 0.2725),
                                      y0 = c(y0_al, y0_au),
                                      y1 = c(y1_al, y1_au))
                        panel.points(x = c(0.8, 2.2),
                                    y = c(est_p_allo, est_p_auto), pch = 4)
                        panel.abline(h = c(p_allo, p_auto), lty = 15,
                                     col = "grey70")
                        panel.text(x = 1.5, y = 0.9, cex = 1.2,
                                   "Species With\nVegetative Reproduction");
                        panel.xyplot(...)}

xyplot(c(p_allo, p_auto) ~ as.factor(c("Alien", "Indigenous")), type = "b",
       ylab = "Prop. +/- CIs\nX = GLM-Estimates",
       xlab = "", ylim = c(0, 1),
       panel = mpanel, pch = 16,
       scales = list(alternating = 1, tck = c(1, 0)))

Online Trick

Sunday, September 20, 2015

Apprentice Piece with Lattice Graphs

Comparing Two Distributions

Lattice Plots - Usage of Panel Functions - Different Axes For Panel-Rows - Alternating Axis Titles

Multiple Comparisons for GLMMs using glmer() & glht()

Test Difference between Two Proportions & Plot Confidence Intervals