Sunday, September 20, 2015

Custom Summary Stats as Dataframe or List

On Stack Overflow I found this useful example of how to apply custom summary statistics to each group of a data frame and return the results either as a list or as a data frame:



# Example data: a percentage observed every five years for three groups
somedata <- data.frame(
                year    = rep(c(1990, 1995, 2000, 2005, 2010), times = 3),
                country = rep(c("US", "Brazil", "Asia"), each = 5),
                pct     = c(0.99, 0.99, 0.98, 0.05, 0.9,
                            0.4,  0.5,  0.55, 0.5,  0.45,
                            0.7,  0.85, 0.9,  0.85, 0.75)
                )

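# Custom statistics for one group (a subset of the rows of somedata)
someStats <- function(x)
{
  # Centre both variables, so a regression without an intercept
  # returns the ordinary least-squares slope of pct on year
  dp   <- as.matrix(x$pct)  - mean(x$pct)
  indp <- as.matrix(x$year) - mean(x$year)
  f <- lm.fit(indp, dp)$coefficients
  w <- sd(x$pct)    # standard deviation of pct
  m <- min(x$pct)   # smallest pct value
  results <- c(f, w, m)
  names(results) <- c("coef", "sdev", "minPct")
  results
}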

# Summary statistics per country as a list-like "by" object with by():
by(somedata, list(country = somedata$country), someStats)

# ...or as a data frame with ddply() from the plyr package:
library(plyr)
ddply(somedata, .(country), someStats)
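The same per-country table can also be built in base R without plyr. The following is a minimal sketch, not part of the original Stack Overflow answer: it splits the data by country with split(), applies a compact rewrite of someStats() to each piece with lapply(), and row-binds the named result vectors with do.call(rbind, ...). The object names groups, statsList, statsDF and slopes are purely illustrative. The last line checks that the "coef" column, computed from centred variables without an intercept, matches the slope that lm(pct ~ year) reports for each country.

```r
# Same toy data as above
somedata <- data.frame(
  year    = rep(c(1990, 1995, 2000, 2005, 2010), times = 3),
  country = rep(c("US", "Brazil", "Asia"), each = 5),
  pct     = c(0.99, 0.99, 0.98, 0.05, 0.9,
              0.4,  0.5,  0.55, 0.5,  0.45,
              0.7,  0.85, 0.9,  0.85, 0.75)
)

# Compact rewrite of someStats(): same three statistics, named directly
someStats <- function(x) {
  dp    <- as.matrix(x$pct)  - mean(x$pct)
  indp  <- as.matrix(x$year) - mean(x$year)
  slope <- unname(lm.fit(indp, dp)$coefficients)[1]  # plain numeric value
  c(coef = slope, sdev = sd(x$pct), minPct = min(x$pct))
}

# Base-R alternative to ddply(): split by group, apply, then row-bind
groups    <- split(somedata, somedata$country)   # one data frame per country
statsList <- lapply(groups, someStats)           # one named vector per country
statsDF   <- data.frame(country = names(statsList),
                        do.call(rbind, statsList),
                        row.names = NULL)        # drop the matrix row names
print(statsDF)

# The centred fit without intercept gives the same slope as lm(pct ~ year)
slopes <- sapply(groups, function(g) unname(coef(lm(pct ~ year, data = g))[2]))
stopifnot(isTRUE(all.equal(statsDF$coef, unname(slopes))))
```

print(statsDF) shows one row per country with the columns country, coef, sdev and minPct.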
