Showing posts with label R.

Sunday, September 20, 2015

Blog Statistics with StatCounter & R

If you're interested in analysing your blog's statistics, this is easily done with a web service like StatCounter (free, registration required, quite extensive) and with R.
After embedding the StatCounter script in the HTML code of a webpage or blog, you can download the log files and inspect visitor activity with a few short lines of R code, like below.

url <- "http://statcounter.com/p7447608/csv/download_log_file?form_user=MYUSERNAME&form_pass=MYPASSWORD"
file <- file.path(tempdir(), "log.csv")
download.file(url, destfile = file)
log <- read.csv(file, as.is = TRUE, header = TRUE)

str(log)

'data.frame': 500 obs. of 19 variables:
$ Date.and.Time : chr "2011-12-19 23:32:30" "2011-12-19 23:20:04" "2011-12-19 23:16:24" "2011-12-19 23:14:40" ...
$ IP.Address : chr "93.129.245.130" "128.227.27.189" "207.63.124.250" "140.247.40.121" ...
$ IP.Address.Label: logi NA NA NA NA NA NA ...
$ Browser : chr "Chrome" "Firefox" "Chrome" "Firefox" ...
$ Version : chr "16.0" "8.0" "15.0" "6.0" ...
$ OS : chr "MacOSX" "WinXP" "Win7" "MacOSX" ...
$ Resolution : chr "1280x800" "1680x1050" "1280x1024" "1280x800" ...
$ Country : Factor w/ 44 levels "Argentina","Australia",..: 17 44 44 44 44 44 44 44 44 44 ...
$ Region : chr "Nordrhein-Westfalen" "Florida" "Illinois" "Massachusetts" ...
$ City : chr "Köln" "Gainesville" "Chicago" "Cambridge" ...
$ Postal.Code : int NA 32611 NA 2138 2138 NA 10003 2138 2138 2138 ...
$ ISP : chr "Telefonica Deutschland GmBH" "UNIVERSITY OF FLORIDA" "Illinois Century Network" "Harvard University" ...
$ Returning.Count : int 2 0 4 2 2 0 0 2 2 2 ...
$ Page.URL : chr "http://onlinetrickpdf.blogspot.com/2015/09/r-function-google-scholar-webscraper.html" "http://onlinetrickpdf.blogspot.com/2015/09/if-then-vba-script-usage-in-arcgis.html" "http://onlinetrickpdf.blogspot.com/2015/09/how-to-link-to-google-docs-for-download.html" "http://onlinetrickpdf.blogspot.com/2015/09/two-way-permanova-adonis-with-custom.html" ...
$ Page.Title : Factor w/ 53 levels "","onlinetrickpdf*",..: 36 50 23 46 10 20 13 9 10 46 ...
$ Came.From : chr "http://stackoverflow.com/questions/5005989/how-to-download-search-results-on-google-scholar-using-r" "http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CCwQFjAC&url=http%3A%2F%2Fonlinetrickpdf.blogspot.com%2F2011%"| __truncated__ "" "" ...
$ SE.Name : chr "" "" "" "" ...
$ SE.Host : chr "" "" "" "" ...
$ SE.Term : chr "" "" "" "" ...
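
With the log at hand you can inspect visitor activity right away. A minimal sketch, using the column names shown by str() above:

# visits per day:
log$Date <- as.Date(log$Date.and.Time)
barplot(table(log$Date), las = 2, cex.names = 0.7,
        main = "Visits per Day", ylab = "No. of Visits")

# top 10 visitor countries:
sort(table(log$Country), decreasing = TRUE)[1:10]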

How to Download and Run Google Docs Script in the R Console

...There is not much to it:
upload a txt file with your script, share it with anyone who has the link, then simply run something like the code below.

ps: When using the code for your own purposes, mind to change "https" to "http" and to insert your individual document id.
pps: You could use download.file() in this way to download any file from Google Docs..


# Example 1:
setwd(tempdir())
download.file("http://docs.google.com/uc?export=download&id=0B2wAunwURQNsMDViYzllMTMtNjllZS00ZTc4LTgzMzEtNDFjMWQ3MTUzYTRk",
              destfile = "test_google_docs.txt", mode = "wb")
# the file contains: x <- sample(100); plot(x)
source(paste(tempdir(), "/test_google_docs.txt", sep = ""))
# remove files from tempdir:
unlink(dir())

# Example 2:
setwd(tempdir())
download.file("http://docs.google.com/uc?export=download&id=0B2wAunwURQNsY2MwMzNhNGMtYmU4Yi00N2FlLWEwYTctYWU3MDhjNTkzOTdi",
              destfile = "google_docs_script.txt", mode = "wb")
# the downloaded script is the GScholarScraper-Function,
# read it and run an example:
source(paste(tempdir(), "/google_docs_script.txt", sep = ""))
# remove files from tempdir:
unlink(dir())

EDIT, MARCH 2013:
The method is outdated - use Tony's approach below!

Convert OpenStreetMap Objects to KML with R

A quick geo-tip:
With the osmar and maptools packages you can easily pull an OpenStreetMap object and convert it to KML, like below (thanks to adibender for helping out on SO). I found the relation ID by googling for it (www.google.at/search?q=openstreetmap+relation+innsbruck).

# get OSM data
library(osmar)
library(maptools)

innsbruck <- get_osm(relation(113642), full = T)
sp_innsbruck <- as_sp(innsbruck, what = "lines")

# convert to KML
for (i in seq_along(sp_innsbruck)) {
  kmlLine(sp_innsbruck@lines[[i]], kmlfile = "innsbruck.kml",
          lwd = 3, col = "blue", name = "Innsbruck")
}

shell.exec("innsbruck.kml")

Retrieve GBIF Species Occurrence Data with Function from dismo Package

..The dismo package is awesome: with a few short lines of code you can read and map species distribution data from GBIF (the Global Biodiversity Information Facility):

library(dismo)

# get GBIF data with function:
myrger <- gbif("Myricaria", "germanica", geo = T)

# check:
str(myrger)

# plot occurrences:
library(maptools)
data(wrld_simpl)
plot(wrld_simpl, col = "light yellow", axes = T)
points(myrger$lon, myrger$lat, col = "red", cex = 0.5)
text(-140, -50, "MYRICARIA\nGERMANICA")

Taxonomy with R: Exploring the Taxize-Package

First off, I'd really like to give a shout-out to the brave people who created and maintain this great package - the fame is yours!

So, while exploring the capabilities of the package, some issues with the ITIS server arose, and with large datasets things weren't working out quite well for me.
I then switched to the NCBI API and found that the results were much better (way quicker, and at first glance also with higher coverage).
At the time of writing there is no taxize function that pulls taxonomic details from a classification returned by NCBI, so I plugged together a little wrapper - see here:

# some species data:
spec <- data.frame("Species" = I(c("Bryum schleicheri", "Bryum capillare", "Bryum argentum", "Escherichia coli", "Glis glis")))
spl <- strsplit(spec$Species, " ")
spec$Genus <- as.character(sapply(spl, "[[", 1))

# for pulling taxonomic details we'd best submit higher rank taxons
# in this case Genera. Then we'll submit Genus Bryum only once and
# save some computation time (might be an issue if you deal
# with large datasets..)

gen_uniq <- unique(spec$Genus)

# function for pulling classification details ("phylum" in this case)
get_sys_level <- function(x) {
  require(taxize)
  a <- classification(get_uid(x))
  y <- data.frame(a[[1]])  # if there are multiple results, take the first..
  z <- tryCatch(as.character(y[which(y[, 2] == "phylum"), 1]),
                error = function(e) NA)  # in case of any error put NA
  z <- ifelse(length(z) != 0, z, NA)  # if the taxonomic detail is not covered return NA
  return(data.frame(Taxon = x, Syslevel = z))
}

# call function and rbind the returned values
result <- do.call(rbind, lapply(gen_uniq, get_sys_level))
print(result)
# Taxon Syslevel
# 1 Bryum Streptophyta
# 2 Escherichia Proteobacteria
# 3 Glis Chordata

# now merge back to the original data frame
spec_new <- merge(spec, result, by.x = "Genus", by.y = "Taxon")
print(spec_new)
# Genus Species Syslevel
# 1 Bryum Bryum schleicheri Streptophyta
# 2 Bryum Bryum capillare Streptophyta
# 3 Bryum Bryum argentum Streptophyta
# 4 Escherichia Escherichia coli Proteobacteria
# 5 Glis Glis glis Chordata
#

Download all Documents from Google Drive with R

A commentator on my blog recently asked if it is possible to retrieve all direct links to your Google Documents. And indeed it can be done very easily with R, like so:

# you'll need RGoogleDocs (with RCurl dependency..)
install.packages("RGoogleDocs", repos = "http://www.omegahat.org/R", type="source")
library(RGoogleDocs)

gpasswd = "mysecretpassword"
auth = getGoogleAuth("kay.cichini@gmail.com", gpasswd)
con = getGoogleDocsConnection(auth)

CAINFO = paste(system.file(package="RCurl"), "/CurlSSL/ca-bundle.crt", sep = "")
docs <- getDocs(con, cainfo = CAINFO)

# get file references
hrefs <- lapply(docs, function(x) return(x@access["href"]))
keys <- sub(".*/full/.*%3A(.*)", "\\1", hrefs)
types <- sub(".*/full/(.*)%3A.*", "\\1", hrefs)

# make urls (for url-scheme see: http://techathlon.com/download-shared-files-google-drive/)
# put format parameter for other output formats!
pdf_urls <- paste0("https://docs.google.com/uc?export=download&id=", keys)
doc_urls <- paste0("https://docs.google.com/document/d/", keys, "/export?format=", "txt")

# download documents with your browser
gdoc_ids <- grep("document", types)
lapply(gdoc_ids, function(x) shell.exec(doc_urls[x]))

pdf_ids <- grep("pdf", types, ignore.case = T)
lapply(pdf_ids, function(x) shell.exec(pdf_urls[x]))
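
# nb: shell.exec() is Windows-only - on other platforms
# browseURL() can be used instead.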


Make a KML-File from an OpenStreetMap Trail

Ever wished to use an OSM trail on your GPS device or smartphone? With this neat little R script that's easily done. You'll just need to search OpenStreetMap for the ID of the trail (way), pass it to osmar::get_osm, convert the result to KML, and you're good to go!

# get OSM data
library(osmar)
library(maptools)

rotewandsteig <- get_osm(way(166274005), full = T)
sp_rotewandsteig <- as_sp(rotewandsteig, what = "lines")

# convert to KML
kmlLine(sp_rotewandsteig@lines[[1]], kmlfile = "rotewandsteig.kml",
        lwd = 3, col = "blue", name = "Rotewandsteig")

# view it
shell.exec("rotewandsteig.kml")

knitr-Example: Use World Bank Data to Generate Report for Threatened Bird Species

I'll use the script below, which retrieves data on threatened bird species from the World Bank via its API and does some processing, plotting and analysis. The WDI package allows you to access the data easily.

# world bank indicators for species -
# I'll check bird species:
library(WDI)

code <- as.character(WDIsearch("bird")[1, 1])
bird_data <- WDI(country = "all", indicator = code, start = 2010, end = 2012)

# remove NAs and select values in the range 50 - 1000:
bird_data_sub <- bird_data[!is.na(bird_data$EN.BIR.THRD.NO) &
                           bird_data$EN.BIR.THRD.NO < 1000 &
                           bird_data$EN.BIR.THRD.NO > 50, ]

# change in numbers across years 2010 and 2011:
change.no <- aggregate(EN.BIR.THRD.NO ~ country, diff,
                       data = bird_data_sub)
# plot:
par(mar = c(3, 3, 5, 1))
plot(x = change.no[, 2], y = 1:nrow(change.no),
     xlim = c(-12, 12), xlab = "", ylab = "",
     yaxt = "n")
abline(v = 0, lty = 2, col = "grey80")
title(main = "Change in Threatened Bird Species in\nCountries with Rich Avifauna (>50)")
text(y = 1:nrow(change.no),
     x = -2, adj = 1,
     labels = change.no$country)
segments(x0 = 0, y0 = 1:nrow(change.no),
         x1 = change.no[, 2], y1 = 1:nrow(change.no))

# test the hypothesis that the probability of a decrease in
# species numbers equals the probability of an increase:
binom.test(sum(change.no[, 2] < 0), sum(change.no[, 2] != 0))

For generating the report you can source the script from dropbox.com and stitch it in this fashion:

stitch("http://dl.dropbox.com/s/ga0qbk1o17n17jj/Change_threatened_species.R")

..this is one line of code - can you dig it?
BTW, for simplicity I use knitr::stitch with its default template..

You should get something like THIS PDF.

EDIT, MARCH 2013
OUTDATED! you can use this approach instead:

library(knitr); library(RCurl); library(WDI)

destfile <- file.path(tempdir(), "script.txt")
x <- getBinaryURL("https://dl.dropbox.com/s/ga0qbk1o17n17jj/Change_threatened_species.R",
                  followlocation = TRUE, ssl.verifypeer = FALSE)
writeBin(x, destfile, useBytes = TRUE)
source(destfile)

stitch(destfile)

Function to Collect Geographic Coordinates for IP-Addresses

I added the function IPtoXY to the onlinetrickpdf-Archives; it collects geographic coordinates for IP addresses.
It uses a web service at http://www.datasciencetoolkit.org/ and works with base R packages only.
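
For illustration, here is a minimal sketch of how such a function might look - assuming the service's ip2coordinates endpoint and the longitude/latitude field names of its JSON reply (the actual IPtoXY lives in the linked archive):

IPtoXY <- function(ip) {
  # query the (assumed) ip2coordinates endpoint:
  url <- paste0("http://www.datasciencetoolkit.org/ip2coordinates/", ip)
  ans <- paste(readLines(url, warn = FALSE), collapse = "")
  # pull the (assumed) longitude/latitude fields from the JSON with base R regex:
  lon <- gsub('.*"longitude": ([-0-9.]+).*', "\\1", ans)
  lat <- gsub('.*"latitude": ([-0-9.]+).*', "\\1", ans)
  paste(lon, lat, sep = ";")
}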

# System time to collect coordinates of 100 IP-addresses:
> system.time(sapply(log$IP.Address[1:100], FUN = IPtoXY))
   user  system elapsed
   0.05    0.02   33.10

Use GDAL from R Console to Split Raster into Tiles

When working with raster datasets I often encounter performance issues caused by large file sizes. I thus wrote up a little R function that invokes gdal_translate to split a raster into tiles, which makes subsequent processing more CPU-friendly. I didn't use built-in R functions simply because performance is much better when using GDAL from the command line..

The screenshot to the left shows a raster in QGIS that was split into four parts with the script below.

## get file names (assuming the datasets were downloaded already;
## please see http://onlinetrickpdf.blogspot.co.at/2013/06/use-r-to-bulk-download-digital.html
## on how to download high-resolution DEMs)
setwd("D:/GIS_DataBase/DEM")
files <- dir(pattern = "\\.hgt$")

## function for processing a single file - mind to replace the path to gdalinfo.exe!
## s = divisions applied to each side of the raster, i.e. s = 2 gives 4 tiles, s = 3 gives 9, etc.
split_raster <- function(file, s = 2) {

  filename <- gsub(".hgt", "", file, fixed = TRUE)
  gdalinfo_str <- paste0("\"C:/OSGeo4W64/bin/gdalinfo.exe\" ", file)

  # pick the pixel size of each side:
  dims <- as.numeric(gsub("[^0-9]", "", unlist(strsplit(system(gdalinfo_str, intern = TRUE)[3], ", "))))
  x <- dims[1]
  y <- dims[2]

  # t is the number of iterations per side
  t <- s - 1
  for (i in 0:t) {
    for (j in 0:t) {
      # [-srcwin xoff yoff xsize ysize] src_dataset dst_dataset
      srcwin_str <- paste("-srcwin ", i * x/s, j * y/s, x/s, y/s)
      gdal_str <- paste0("\"C:/OSGeo4W64/bin/gdal_translate.exe\" ", srcwin_str, " ",
                         "\"", file, "\" ", "\"", filename, "_", i, "_", j, ".tif\"")
      system(gdal_str)
    }
  }
}

## process all files and save to same directory
mapply(split_raster, files, 2)

Get Long-Term Climate Data from KNMI Climate Explorer

You can query global climate data from the KNMI Climate Explorer (the KNMI is the Royal Netherlands Meteorological Institute) with R.


Here's a little example of how I retrieved data for my hometown, Innsbruck, Austria, and plotted annual total precipitation. You can choose station data by pointing at a map, by setting coordinates, etc.

# get climate (precipitation) data from url:
# http://climexp.knmi.nl/selectstation.cgi?id=someone@somewhere

# station INNSBRUCK, FLUGHAFEN (11120), 47.27N, 11.35E:
ibk_dat <- read.table("http://climexp.knmi.nl/data/pa11120.dat", sep = "",
                      row.names = 1, col.names = 0:12)

# cut off first and last yr, due to missing data..
ibk_dat <- ibk_dat[-c(1, nrow(ibk_dat)), ]

# plot yearly sums:
windows(width = 15, height = 5)
plot(rowSums(ibk_dat), type = "s", ylab = "Annual Total Precipitation (mm)",
     xlab = NA, col = "blue", xaxt = "n", lwd = 1.5, las = 2, cex.axis = 0.8,
     main = "INNSBRUCK FLUGHAFEN, 47.27N, 11.35E, 593m, WMO station code: 11120")
axis(1, labels = rownames(ibk_dat), at = 1:nrow(ibk_dat), las = 2, cex.axis = 0.85)

abline(h = mean(rowSums(ibk_dat)), col = 1, lty = 2, lwd = 1.2)
text(2.5, 1250, "Long-term average", adj = 0, cex = 0.75)
arrows(x0 = 2.5, y0 = 1220,
       x1 = 2.5, y1 = 930, length = 0.05)

An Image Crossfader Function

A spin-off from some project work: the jpgfader function (a fun application of it can be viewed HERE):


# purpose: crossfade 2 jpeg images
# packages: jpeg
# arguments: img1 (path.to.img1), img2 (path.to.img2),
#            outpath (defaults to the home directory), outname,
#            frames
# output: png

require(jpeg)

jpgfader <- function(img1 = NA, img2 = NA, outpath = NA, frames = NA, outname = NA) {

  if (is.na(outpath)) {outpath <- path.expand("~")}
  if (is.na(outname)) {outname <- "img.1.2"}
  if (is.na(frames)) {frames <- 10}

  # stop if an image is missing
  if (is.na(img1) | is.na(img2)) stop("\nAt least one image is missing!\n")

  # read 2 jpegs, assuming the same size!
  pic.1 <- readJPEG(img1)
  pic.2 <- readJPEG(img2)

  # warn if the images don't have the same size:
  if (any(dim(pic.1) != dim(pic.2))) warning("\nImages do not have the same dimensions!")

  # create a new array with a 4th channel representing alpha:
  by <- 1/(frames - 1)
  alpha <- seq(0, 1, by)
  n <- length(alpha)

  for (j in n:1) {

    pic.2.a <- array(data = c(as.vector(pic.2),
                              rep(alpha[j], dim(pic.1)[1] * dim(pic.1)[2])),
                     dim = c(dim(pic.1)[1], dim(pic.1)[2], 4))

    # assign the output file name:
    pic.out <- file.path(outpath, paste(outname, j, "png", sep = "."))

    # and open the device:
    png(pic.out, width = dim(pic.1)[2], height = dim(pic.1)[1])

    # plot parameters:
    par(mar = rep(0, 4), oma = rep(0, 4), new = F)

    # print pic.1 to the plot region:
    plot(1:2,
         xlim = c(0, dim(pic.1)[2]), ylim = c(0, dim(pic.1)[1]),
         xlab = "", ylab = "", type = "n",
         yaxs = "i", xaxs = "i")
    rasterImage(pic.1, 0, 0, dim(pic.1)[2], dim(pic.1)[1])

    # overplot with the alpha version of pic.2 - across the frames
    # alpha runs from 0 to 1, crossfading from pic.1 to pic.2:
    rasterImage(pic.2.a, 0, 0, dim(pic.1)[2], dim(pic.1)[1])
    dev.off()
  }
}

# Example, with 2 images, one system.file and one altered
# version of it:

# make a black jpg and save it to the home folder
Rlogo <- readJPEG(system.file("img", "Rlogo.jpg", package = "jpeg"))
Rlogo[] <- 0
jpeg(path.expand("~/Rlogo_black.jpg"), dim(Rlogo)[2], dim(Rlogo)[1])
par(mar = rep(0, 4), oma = rep(0, 4))

# save black image:
plot(1:2,
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "", ylab = "", type = "n",
     yaxs = "i", xaxs = "i")
rasterImage(Rlogo, 0, 0, 1, 1)
dev.off()

# function call:
jpgfader(img1 = system.file("img", "Rlogo.jpg", package = "jpeg"),
         img2 = path.expand("~/Rlogo_black.jpg"),
         outname = "img12",
         outpath = path.expand("~"),
         frames = 10)

# see the images:
browseURL(path.expand("~"))

# remove files:
# files <- dir(path.expand("~"), full.names = T)
# file.remove(c(files[grep("img12.", files)],
#               path.expand("~/Rlogo_black.jpg")))

Some More Regex Examples Added to Collection


Find the examples below, which I added to my list of regex examples HERE.
ps: just found THIS very informative presentation on regex.

str <- c("i.e., George W. Bush", "Lyndon B. Johnson, etc.")
gsub("([A-Z])[.]?", "\\1", str)
# this will find abbreviated names and remove the full stops:
# an uppercase letter followed by an optional full stop is
# matched by [A-Z][.]? - the question mark means "repeated at
# most once". the parentheses delineate a back-reference, i.e.
# the uppercase letter, which is put back in by \\1, the first
# back-reference.

# output:
[1] "i.e., George W Bush" "Lyndon B Johnson, etc."

str <- c("George W. Bush", "Lyndon B. Johnson")
sub(" .*", "", str)
# keeps the first word and removes the rest.
# matches and replaces the substring comprised of the first
# white space followed by any single character,
# designated by the period, repeated zero or more times, as
# given by the asterisk.

# output:
[1] "George" "Lyndon"

sub("\\s\\w+$", "", str)
# removes the last word plus the preceding space from a string.
# looks for a space followed by a word that is the last one in
# the line: the dollar sign $ is a meta-character that matches
# the end of a line.

# output:
[1] "George W." "Lyndon B."

sub(".*\\s(\\w+$)", "\\1", str)
# keeps only the last word of a string.
# looks for anything repeated arbitrarily often, followed by a
# space (".*\\s") and a word that is the last in the line.
# this word is wrapped in parentheses for a back-reference,
# which is returned by "\\1", the 1st back-reference.

# output:
[1] "Bush" "Johnson"

str <- c("&George W. Bush", "Lyndon B. Johnson?")
gsub("[^[:alnum:][:space:].]", "", str)
# keeps alphanumeric characters, spaces AND full stops, and
# removes anything else, that is, all other punctuation. what
# should not be matched is designated by the caret inside the
# character class.

# output:
[1] "George W. Bush" "Lyndon B. Johnson"

R-Function to Read Data from Google Docs Spreadsheets

I used this idea posted on Stack Overflow to plug together a function for reading data from Google Docs spreadsheets into R.

google_ss <- function(gid = NA, key = NA)
{
  if (is.na(gid)) {stop("\nWorksheet number (gid) is missing\n")}
  if (is.na(key)) {stop("\nDocument key (key) is missing\n")}
  require(RCurl)
  csv_content <- getURL(paste("https://docs.google.com/spreadsheet/pub?key=", key,
                              "&single=true&gid=", gid, "&output=csv", sep = ""),
                        cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
  read.csv(textConnection(csv_content), header = T, sep = ",")
}

## Example:
## Mind that the worksheets are numbered consecutively from 0 to n,
## irrespective of the actual worksheet names.
## The key should be put in quotes.
## And, the URL works only for published spreadsheets!

(data <- google_ss(gid = 0,
                   key = "0AmwAunwURQNsdDNpZzJqTU90cmpTU0sza2xLTW9fenc"))

Programmatically Download CORINE Land Cover Seamless Vector Data with R

Thanks to a helpful SO answer I was able to download all CLC vector data (43 zip files) programmatically:

require(XML)

path_to_files <- "D:/GIS_DataBase/CorineLC/Seamless"
dir.create(path_to_files)
setwd(path_to_files)

doc <- htmlParse("http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-2")
urls <- xpathSApply(doc,'//*/a[contains(@href,".zip/at_download/file")]/@href')

# function to get zip file names
get_zip_name <- function(x) unlist(strsplit(x, "/"))[grep(".zip", unlist(strsplit(x, "/")))]

# function to plug into sapply
dl_urls <- function(x) try(download.file(x, get_zip_name(x), mode = "wb"))

# download all zip-files
sapply(urls, dl_urls)

# function for unzipping
try_unzip <- function(x) try(unzip(x))

# unzip all files in dir and delete them afterwards
sapply(list.files(pattern = "\\.zip$"), try_unzip)

# unlink(list.files(pattern = "\\.zip$"))

Default Convenience Functions in R (Rprofile.site)

I keep my blog reference functions, snippets, etc. on GitHub and want to source them from there. This can be achieved by utilizing a function (source_https, customized for my purpose HERE). The original function was provided by the R blogger Tony Breyal - thanks Tony! As I will use this function quite frequently, I added its code to my Rprofile.site and am now able to source from GitHub whenever I run code in the R console. This is very handy, and I thought it might be worth sharing..
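
For illustration, a minimal sketch of such a function, following Tony's RCurl-based approach (the GitHub URL in the usage line is just a made-up placeholder - insert your own raw file link):

source_https <- function(url, ...) {
  require(RCurl)
  # fetch, parse and evaluate each remote script in the global environment:
  sapply(c(url, ...), function(u) {
    eval(parse(text = getURL(u, followlocation = TRUE,
                             cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))),
         envir = .GlobalEnv)
  })
}

# usage, e.g. from within Rprofile.site:
# source_https("https://raw.github.com/gimoya/some-repo/master/some_function.R")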

Transformation of Several Variables in a Dataframe

This is how I transform several columns of a dataframe at once - here, turning count data into binary (0/1) coded data (the same pattern applies to any other conversion..).

count1 <- count2 <- count3 <- count4 <- sample(c(rep(0, 10), 1:10))
some <- LETTERS[1:20]
thing <- letters[1:20]
mydf <- data.frame(count1, count2, count3, count4, some, thing)

ids <- grep("count", names(mydf))
myfun <- function(x) {ifelse(x > 0, 1, 0)}
mydf[, ids] <- lapply(mydf[, ids], myfun)

p.s.: Let me know if you know of a slicker way.

A Function for Adding up Matrices with Different Dimensions

I couldn't find a function that adds up matrices with different dimensions, so I coded one myself. It combines a list of matrices, matching rows and columns by name, and sums the overlapping cells.


# File: combmat.R
# Purpose: add up matrices with different dimensions
# Input: a list of 2-dimensional matrices
# Output: a combined matrix
# Author: Kay Cichini
# Date: Nov. 23rd 2011

combmat <- function(m_l = list(NA)) {
  n_m <- length(m_l)                          # no. of matrices used
  rownames_l <- lapply(m_l, rownames)         # list of rownames
  colnames_l <- lapply(m_l, colnames)         # list of colnames
  rownames_new <- unique(unlist(rownames_l))  # new, general rownames
  colnames_new <- unique(unlist(colnames_l))  # new, general colnames

  dimnames_new <- list(rownames_new, colnames_new)
  m_new <- matrix(nrow = length(rownames_new),
                  ncol = length(colnames_new),
                  data = 0,
                  dimnames = dimnames_new)

  # array of intermediate matrices, all with the same dimensions,
  # one layer per element in the list of input matrices
  # (note the third dimnames element must be given, even if NULL):
  m_interm_arr <- array(m_new,
                        dim = c(length(rownames_new), length(colnames_new), n_m),
                        dimnames = c(dimnames_new, list(NULL)))

  # take the i-th element of the list of input matrices and add
  # its values at the appropriate row and col indexes of the
  # i-th layer (i-th matrix) within the array:
  for (i in 1:n_m) {
    m_interm_arr[, , i][rownames_l[[i]], colnames_l[[i]]] <- m_l[[i]]
  }
  return(apply(m_interm_arr, c(1, 2), sum))
}

# Example:
print(m1 <- matrix(sample(1:40), 4, 10, dimnames = list(1:4,1:10)))
print(m2 <- matrix(sample(1:40), 10, 4, dimnames = list(1:10,1:4)))

combmat(m_l = list(m1, m2))

It is very likely that someone else could come up with a more effective approach - I'd be happy to hear about improvements, or whether there is a package/function that does the same..

Use Case: Make Contour Lines for Google Earth with Spatial R

Here comes a script I wrote for creating contour lines in KML format to be used with Google Earth: https://github.com/gimoya/onlinetrickpdf-Archives/blob/master/R/r_contours_for_google_earth.R
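
The gist of it, as a minimal sketch - assuming you already have a DEM at hand (the file name dem.tif is hypothetical; see the linked script for the full version):

library(raster)
library(maptools)

dem <- raster("dem.tif")  # hypothetical input DEM
contours <- rasterToContour(dem, nlevels = 10)

# write the contour lines to KML, as in the examples above:
for (i in seq_along(contours@lines)) {
  kmlLine(contours@lines[[i]], kmlfile = "contours.kml",
          lwd = 1, col = "brown", name = "Contours")
}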

If you want to check or just use the datasets I created for the Alps region, you can download them here: http://terrain-overlays.blogspot.co.at/index.html

Web-Scraper for Google Scholar Updated!

I have updated the Google Scholar web scraper function GScholarScraper_2 to GScholarScraper_3 (and GScholarScraper_3.1), as it was outdated due to changes in the Google Scholar HTML code. The new script is more slender and faster. It returns a dataframe or, optionally, a CSV file with the titles, authors, publications & links. Feel free to report bugs, etc.

Update 11-07-2013: bug fixes due to Google Scholar code changes - https://github.com/gimoya/onlinetrickpdf-Archives/blob/master/R/Functions/GScholarScraper_3.2.R. Note that lately Google will block your IP at around the 1000th (cumulated) search result - so there's not much fun to be had if you want to do extensive bibliometrics..