Showing posts with label R.

Sunday, September 20, 2015

Blog Statistics with StatCounter & R

If you're interested in analysing your blog's statistics, this is easily done with a web service like StatCounter (free, registration required, quite extensive) and with R.
After embedding the StatCounter script in the HTML code of a webpage or blog, you can download the log files and inspect visitor activity with a few short lines of R code, like below.

url <- "http://statcounter.com/p7447608/csv/download_log_file?form_user=MYUSERNAME&form_pass=MYPASSWORD"
file <- file.path(tempdir(), "log.csv")
download.file(url, destfile = file)
log <- read.csv(file, as.is = TRUE, header = TRUE)

str(log)

'data.frame': 500 obs. of 19 variables:
$ Date.and.Time : chr "2011-12-19 23:32:30" "2011-12-19 23:20:04" "2011-12-19 23:16:24" "2011-12-19 23:14:40" ...
$ IP.Address : chr "93.129.245.130" "128.227.27.189" "207.63.124.250" "140.247.40.121" ...
$ IP.Address.Label: logi NA NA NA NA NA NA ...
$ Browser : chr "Chrome" "Firefox" "Chrome" "Firefox" ...
$ Version : chr "16.0" "8.0" "15.0" "6.0" ...
$ OS : chr "MacOSX" "WinXP" "Win7" "MacOSX" ...
$ Resolution : chr "1280x800" "1680x1050" "1280x1024" "1280x800" ...
$ Country : Factor w/ 44 levels "Argentina","Australia",..: 17 44 44 44 44 44 44 44 44 44 ...
$ Region : chr "Nordrhein-Westfalen" "Florida" "Illinois" "Massachusetts" ...
$ City : chr "Köln" "Gainesville" "Chicago" "Cambridge" ...
$ Postal.Code : int NA 32611 NA 2138 2138 NA 10003 2138 2138 2138 ...
$ ISP : chr "Telefonica Deutschland GmBH" "UNIVERSITY OF FLORIDA" "Illinois Century Network" "Harvard University" ...
$ Returning.Count : int 2 0 4 2 2 0 0 2 2 2 ...
$ Page.URL : chr "http://onlinetrickpdf.blogspot.com/2015/09/r-function-google-scholar-webscraper.html" "http://onlinetrickpdf.blogspot.com/2015/09/if-then-vba-script-usage-in-arcgis.html" "http://onlinetrickpdf.blogspot.com/2015/09/how-to-link-to-google-docs-for-download.html" "http://onlinetrickpdf.blogspot.com/2015/09/two-way-permanova-adonis-with-custom.html" ...
$ Page.Title : Factor w/ 53 levels "","onlinetrickpdf*",..: 36 50 23 46 10 20 13 9 10 46 ...
$ Came.From : chr "http://stackoverflow.com/questions/5005989/how-to-download-search-results-on-google-scholar-using-r" "http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CCwQFjAC&url=http%3A%2F%2Fonlinetrickpdf.blogspot.com%2F2011%"| __truncated__ "" "" ...
$ SE.Name : chr "" "" "" "" ...
$ SE.Host : chr "" "" "" "" ...
$ SE.Term : chr "" "" "" "" ...
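
With the log at hand you can inspect visitor activity right away. A minimal sketch, using the column names shown by str() above:

# visits per day:
log$Date <- as.Date(log$Date.and.Time)
barplot(table(log$Date), las = 2, cex.names = 0.7,
        main = "Visits per Day", ylab = "No. of Visits")

# top 10 visitor countries:
sort(table(log$Country), decreasing = TRUE)[1:10]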

How to Download and Run Google Docs Script in the R Console

...There is not much to it:
upload a txt file with your script, share it with anyone who has the link, then simply run something like the code below.

ps: When using the code for your own purposes, mind to change "https" to "http" and to insert your individual document id.
pps: You could use download.file() in this way to download any file from Google Docs..


# Example 1:
setwd(tempdir())
download.file("http://docs.google.com/uc?export=download&id=0B2wAunwURQNsMDViYzllMTMtNjllZS00ZTc4LTgzMzEtNDFjMWQ3MTUzYTRk",
              destfile = "test_google_docs.txt", mode = "wb")
# the file contains: x <- sample(100); plot(x)
source(paste(tempdir(), "/test_google_docs.txt", sep = ""))
# remove files from tempdir:
unlink(dir())

# Example 2:
setwd(tempdir())
download.file("http://docs.google.com/uc?export=download&id=0B2wAunwURQNsY2MwMzNhNGMtYmU4Yi00N2FlLWEwYTctYWU3MDhjNTkzOTdi",
              destfile = "google_docs_script.txt", mode = "wb")
# the downloaded script is the GScholarScraper-Function,
# read it and run an example:
source(paste(tempdir(), "/google_docs_script.txt", sep = ""))
# remove files from tempdir:
unlink(dir())

EDIT, MARCH 2013:
The method is outdated - use Tony's approach below!

Convert OpenStreetMap Objects to KML with R

A quick geo-tip:
With the osmar and maptools packages you can easily pull an OpenStreetMap object and convert it to KML, like below (thanks to adibender for helping out on SO). I found the relation ID by googling for it (www.google.at/search?q=openstreetmap+relation+innsbruck).

# get OSM data
library(osmar)
library(maptools)

innsbruck <- get_osm(relation(113642), full = T)
sp_innsbruck <- as_sp(innsbruck, what = "lines")

# convert to KML
for (i in seq_along(sp_innsbruck)) {
  kmlLine(sp_innsbruck@lines[[i]], kmlfile = "innsbruck.kml",
          lwd = 3, col = "blue", name = "Innsbruck")
}

shell.exec("innsbruck.kml")

Retrieve GBIF Species Occurrence Data with Function from dismo Package

..The dismo package is awesome: with a few short lines of code you can read and map species distribution data from GBIF (the Global Biodiversity Information Facility):

library(dismo)

# get GBIF data with function:
myrger <- gbif("Myricaria", "germanica", geo = T)

# check:
str(myrger)

# plot occurrences:
library(maptools)
data(wrld_simpl)
plot(wrld_simpl, col = "light yellow", axes = T)
points(myrger$lon, myrger$lat, col = "red", cex = 0.5)
text(-140, -50, "MYRICARIA\nGERMANICA")

Taxonomy with R: Exploring the Taxize-Package

First off, I'd really like to give a shout-out to the brave people who created and maintain this great package - the fame is yours!

So, while exploring the capabilities of the package, some issues with the ITIS server arose, and with large datasets things weren't working out quite well for me.
I then switched to the NCBI API and found that the results were much better (way quicker, and at first glance also with higher coverage).
At the time of writing there is no taxize function that pulls taxonomic details from a classification returned by NCBI, so I plugged together a little wrapper - see here:

# some species data:
spec <- data.frame("Species" = I(c("Bryum schleicheri", "Bryum capillare", "Bryum argentum", "Escherichia coli", "Glis glis")))
spl <- strsplit(spec$Species, " ")
spec$Genus <- as.character(sapply(spl, "[[", 1))

# for pulling taxonomic details we'd best submit higher rank taxons
# in this case Genera. Then we'll submit Genus Bryum only once and
# save some computation time (might be an issue if you deal
# with large datasets..)

gen_uniq <- unique(spec$Genus)

# function for pulling classification details ("phylum" in this case)
get_sys_level <- function(x) {
  require(taxize)
  a <- classification(get_uid(x))
  y <- data.frame(a[[1]])  # if there are multiple results, take the first..
  z <- tryCatch(as.character(y[which(y[, 2] == "phylum"), 1]),
                error = function(e) NA)  # in case of any error put NA
  z <- ifelse(length(z) != 0, z, NA)  # if the taxonomic detail is not covered return NA
  return(data.frame(Taxon = x, Syslevel = z))
}

# call function and rbind the returned values
result <- do.call(rbind, lapply(gen_uniq, get_sys_level))
print(result)
# Taxon Syslevel
# 1 Bryum Streptophyta
# 2 Escherichia Proteobacteria
# 3 Glis Chordata

# now merge back to the original data frame
spec_new <- merge(spec, result, by.x = "Genus", by.y = "Taxon")
print(spec_new)
# Genus Species Syslevel
# 1 Bryum Bryum schleicheri Streptophyta
# 2 Bryum Bryum capillare Streptophyta
# 3 Bryum Bryum argentum Streptophyta
# 4 Escherichia Escherichia coli Proteobacteria
# 5 Glis Glis glis Chordata
#

Download all Documents from Google Drive with R

A commentator on my blog recently asked if it is possible to retrieve all direct links to your Google Documents. And indeed it can be done very easily with R, like so:

# you'll need RGoogleDocs (with RCurl dependency..)
install.packages("RGoogleDocs", repos = "http://www.omegahat.org/R", type="source")
library(RGoogleDocs)

gpasswd = "mysecretpassword"
auth = getGoogleAuth("kay.cichini@gmail.com", gpasswd)
con = getGoogleDocsConnection(auth)

CAINFO = paste(system.file(package="RCurl"), "/CurlSSL/ca-bundle.crt", sep = "")
docs <- getDocs(con, cainfo = CAINFO)

# get file references
hrefs <- lapply(docs, function(x) return(x@access["href"]))
keys <- sub(".*/full/.*%3A(.*)", "\\1", hrefs)
types <- sub(".*/full/(.*)%3A.*", "\\1", hrefs)

# make urls (for url-scheme see: http://techathlon.com/download-shared-files-google-drive/)
# put format parameter for other output formats!
pdf_urls <- paste0("https://docs.google.com/uc?export=download&id=", keys)
doc_urls <- paste0("https://docs.google.com/document/d/", keys, "/export?format=", "txt")

# download documents with your browser
gdoc_ids <- grep("document", types)
lapply(gdoc_ids, function(x) shell.exec(doc_urls[x]))

pdf_ids <- grep("pdf", types, ignore.case = T)
lapply(pdf_ids, function(x) shell.exec(pdf_urls[x]))
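
# nb: shell.exec() is Windows-only - on other platforms
# browseURL() can be used instead.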


Make a KML-File from an OpenStreetMap Trail

Ever wished to use an OSM trail on your GPS device or smartphone? With this neat little R script that's easily done. You'll just need to search OpenStreetMap for the ID of the trail (way), pass it to osmar::get_osm, convert the result to KML, and you're good to go!

# get OSM data
library(osmar)
library(maptools)

rotewandsteig <- get_osm(way(166274005), full = T)
sp_rotewandsteig <- as_sp(rotewandsteig, what = "lines")

# convert to KML
kmlLine(sp_rotewandsteig@lines[[1]], kmlfile = "rotewandsteig.kml",
        lwd = 3, col = "blue", name = "Rotewandsteig")

# view it
shell.exec("rotewandsteig.kml")

knitr-Example: Use World Bank Data to Generate Report for Threatened Bird Species

I'll use the script below, which retrieves data on threatened bird species from the World Bank via its API and does some processing, plotting and analysis. The WDI package allows you to access the data easily.

# world bank indicators for species -
# I'll check bird species:
library(WDI)

code <- as.character(WDIsearch("bird")[1, 1])
bird_data <- WDI(country = "all", indicator = code, start = 2010, end = 2012)

# remove NAs and select values in the range 50 - 1000:
bird_data_sub <- bird_data[!is.na(bird_data$EN.BIR.THRD.NO) &
                           bird_data$EN.BIR.THRD.NO < 1000 &
                           bird_data$EN.BIR.THRD.NO > 50, ]

# change in numbers across years 2010 and 2011:
change.no <- aggregate(EN.BIR.THRD.NO ~ country, diff,
                       data = bird_data_sub)
# plot:
par(mar = c(3, 3, 5, 1))
plot(x = change.no[, 2], y = 1:nrow(change.no),
     xlim = c(-12, 12), xlab = "", ylab = "",
     yaxt = "n")
abline(v = 0, lty = 2, col = "grey80")
title(main = "Change in Threatened Bird Species in\nCountries with Rich Avifauna (>50)")
text(y = 1:nrow(change.no),
     x = -2, adj = 1,
     labels = change.no$country)
segments(x0 = 0, y0 = 1:nrow(change.no),
         x1 = change.no[, 2], y1 = 1:nrow(change.no))

# test the hypothesis that the probability of a decrease in
# species numbers equals the probability of an increase:
binom.test(sum(change.no[, 2] < 0), sum(change.no[, 2] != 0))

For generating the report you can source the script from dropbox.com and stitch it in this fashion:

stitch("http://dl.dropbox.com/s/ga0qbk1o17n17jj/Change_threatened_species.R")

..this is one line of code - can you dig it?
BTW, for simplicity I use knitr::stitch with its default template..

You should get something like THIS PDF.

EDIT, MARCH 2013
OUTDATED! you can use this approach instead:

library(knitr); library(RCurl); library(WDI)

destfile <- file.path(tempdir(), "script.txt")
x <- getBinaryURL("https://dl.dropbox.com/s/ga0qbk1o17n17jj/Change_threatened_species.R",
                  followlocation = TRUE, ssl.verifypeer = FALSE)
writeBin(x, destfile, useBytes = TRUE)
source(destfile)

stitch(destfile)

Function to Collect Geographic Coordinates for IP-Addresses

I added the function IPtoXY to the onlinetrickpdf-Archives; it collects geographic coordinates for IP addresses.
It uses a web service at http://www.datasciencetoolkit.org/ and works with base R packages only.
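
For illustration, here is a minimal sketch of how such a function might look - assuming the service's ip2coordinates endpoint and the longitude/latitude field names of its JSON reply (the actual IPtoXY lives in the linked archive):

IPtoXY <- function(ip) {
  # query the (assumed) ip2coordinates endpoint:
  url <- paste0("http://www.datasciencetoolkit.org/ip2coordinates/", ip)
  ans <- paste(readLines(url, warn = FALSE), collapse = "")
  # pull the (assumed) longitude/latitude fields from the JSON with base R regex:
  lon <- gsub('.*"longitude": ([-0-9.]+).*', "\\1", ans)
  lat <- gsub('.*"latitude": ([-0-9.]+).*', "\\1", ans)
  paste(lon, lat, sep = ";")
}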

# System time to collect coordinates of 100 IP-addresses:
> system.time(sapply(log$IP.Address[1:100], FUN = IPtoXY))
   user  system elapsed
   0.05    0.02   33.10

Use GDAL from R Console to Split Raster into Tiles

When working with raster datasets I often encounter performance issues caused by large file sizes. I thus wrote up a little R function that invokes gdal_translate to split a raster into tiles, which makes subsequent processing more CPU-friendly. I didn't use built-in R functions simply because performance is much better when using GDAL from the command line..

The screenshot to the left shows a raster in QGIS that was split into four parts with the script below.

## get file names (assuming the datasets were downloaded already;
## please see http://onlinetrickpdf.blogspot.co.at/2013/06/use-r-to-bulk-download-digital.html
## on how to download high-resolution DEMs)
setwd("D:/GIS_DataBase/DEM")
files <- dir(pattern = "\\.hgt$")

## function for processing a single file - mind to replace the path to gdalinfo.exe!
## s = divisions applied to each side of the raster, i.e. s = 2 gives 4 tiles, s = 3 gives 9, etc.
split_raster <- function(file, s = 2) {

  filename <- gsub(".hgt", "", file, fixed = TRUE)
  gdalinfo_str <- paste0("\"C:/OSGeo4W64/bin/gdalinfo.exe\" ", file)

  # pick the pixel size of each side:
  dims <- as.numeric(gsub("[^0-9]", "", unlist(strsplit(system(gdalinfo_str, intern = TRUE)[3], ", "))))
  x <- dims[1]
  y <- dims[2]

  # t is the number of iterations per side
  t <- s - 1
  for (i in 0:t) {
    for (j in 0:t) {
      # [-srcwin xoff yoff xsize ysize] src_dataset dst_dataset
      srcwin_str <- paste("-srcwin ", i * x/s, j * y/s, x/s, y/s)
      gdal_str <- paste0("\"C:/OSGeo4W64/bin/gdal_translate.exe\" ", srcwin_str, " ",
                         "\"", file, "\" ", "\"", filename, "_", i, "_", j, ".tif\"")
      system(gdal_str)
    }
  }
}

## process all files and save to same directory
mapply(split_raster, files, 2)

Get Long-Term Climate Data from KNMI Climate Explorer

You can query global climate data from the KNMI Climate Explorer (the KNMI is the Royal Netherlands Meteorological Institute) with R.


Here's a little example of how I retrieved data for my hometown, Innsbruck, Austria, and plotted annual total precipitation. You can choose station data by pointing at a map, by setting coordinates, etc.

# get climate (precipitation) data from url:
# http://climexp.knmi.nl/selectstation.cgi?id=someone@somewhere

# station INNSBRUCK, FLUGHAFEN (11120), 47.27N, 11.35E:
ibk_dat <- read.table("http://climexp.knmi.nl/data/pa11120.dat", sep = "",
                      row.names = 1, col.names = 0:12)

# cut off first and last yr, due to missing data..
ibk_dat <- ibk_dat[-c(1, nrow(ibk_dat)), ]

# plot yearly sums:
windows(width = 15, height = 5)
plot(rowSums(ibk_dat), type = "s", ylab = "Annual Total Precipitation (mm)",
     xlab = NA, col = "blue", xaxt = "n", lwd = 1.5, las = 2, cex.axis = 0.8,
     main = "INNSBRUCK FLUGHAFEN, 47.27N, 11.35E, 593m, WMO station code: 11120")
axis(1, labels = rownames(ibk_dat), at = 1:nrow(ibk_dat), las = 2, cex.axis = 0.85)

abline(h = mean(rowSums(ibk_dat)), col = 1, lty = 2, lwd = 1.2)
text(2.5, 1250, "Long-term average", adj = 0, cex = 0.75)
arrows(x0 = 2.5, y0 = 1220,
       x1 = 2.5, y1 = 930, length = 0.05)

An Image Crossfader Function

A spin-off from some project work: the jpgfader function (a fun application of it can be viewed HERE):


# purpose: crossfade 2 jpeg images
# packages: jpeg
# arguments: img1 (path.to.img1), img2 (path.to.img2),
#            outpath (defaults to the home directory), outname,
#            frames
# output: png

require(jpeg)

jpgfader <- function(img1 = NA, img2 = NA, outpath = NA, frames = NA, outname = NA) {

  if (is.na(outpath)) {outpath <- path.expand("~")}
  if (is.na(outname)) {outname <- "img.1.2"}
  if (is.na(frames)) {frames <- 10}

  # stop if an image is missing
  if (is.na(img1) | is.na(img2)) stop("\nAt least one image is missing!\n")

  # read 2 jpegs, assuming the same size!
  pic.1 <- readJPEG(img1)
  pic.2 <- readJPEG(img2)

  # warn if the images don't have the same size:
  if (any(dim(pic.1) != dim(pic.2))) warning("\nImages do not have the same dimensions!")

  # create a new array with a 4th channel representing alpha:
  by <- 1/(frames - 1)
  alpha <- seq(0, 1, by)
  n <- length(alpha)

  for (j in n:1) {

    pic.2.a <- array(data = c(as.vector(pic.2),
                              rep(alpha[j], dim(pic.1)[1] * dim(pic.1)[2])),
                     dim = c(dim(pic.1)[1], dim(pic.1)[2], 4))

    # assign the output file name:
    pic.out <- file.path(outpath, paste(outname, j, "png", sep = "."))

    # and open the device:
    png(pic.out, width = dim(pic.1)[2], height = dim(pic.1)[1])

    # plot parameters:
    par(mar = rep(0, 4), oma = rep(0, 4), new = F)

    # print pic.1 to the plot region:
    plot(1:2,
         xlim = c(0, dim(pic.1)[2]), ylim = c(0, dim(pic.1)[1]),
         xlab = "", ylab = "", type = "n",
         yaxs = "i", xaxs = "i")
    rasterImage(pic.1, 0, 0, dim(pic.1)[2], dim(pic.1)[1])

    # overplot with the alpha version of pic.2 - across the frames
    # alpha runs from 0 to 1, crossfading from pic.1 to pic.2:
    rasterImage(pic.2.a, 0, 0, dim(pic.1)[2], dim(pic.1)[1])
    dev.off()
  }
}

# Example, with 2 images, one system.file and one altered
# version of it:

# make a black jpg and save it to the home folder
Rlogo <- readJPEG(system.file("img", "Rlogo.jpg", package = "jpeg"))
Rlogo[] <- 0
jpeg(path.expand("~/Rlogo_black.jpg"), dim(Rlogo)[2], dim(Rlogo)[1])
par(mar = rep(0, 4), oma = rep(0, 4))

# save black image:
plot(1:2,
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "", ylab = "", type = "n",
     yaxs = "i", xaxs = "i")
rasterImage(Rlogo, 0, 0, 1, 1)
dev.off()

# function call:
jpgfader(img1 = system.file("img", "Rlogo.jpg", package = "jpeg"),
         img2 = path.expand("~/Rlogo_black.jpg"),
         outname = "img12",
         outpath = path.expand("~"),
         frames = 10)

# see the images:
browseURL(path.expand("~"))

# remove files:
# files <- dir(path.expand("~"), full.names = T)
# file.remove(c(files[grep("img12.", files)],
#               path.expand("~/Rlogo_black.jpg")))

Some More Regex Examples Added to Collection


Find the examples below, which I added to my list of regex examples HERE.
ps: just found THIS very informative presentation on regex.

str <- c("i.e., George W. Bush", "Lyndon B. Johnson, etc.")
gsub("([A-Z])[.]?", "\\1", str)
# this will find abbreviated names and remove the full stops:
# an uppercase letter followed by an optional full stop is
# matched by [A-Z][.]? - the question mark means "repeated at
# most once". the parentheses delineate a back-reference, i.e.
# the uppercase letter, which is put back in by \\1, the first
# back-reference.

# output:
[1] "i.e., George W Bush" "Lyndon B Johnson, etc."

str <- c("George W. Bush", "Lyndon B. Johnson")
sub(" .*", "", str)
# keeps the first word and removes the rest.
# matches and replaces the substring comprised of the first
# white space followed by any single character,
# designated by the period, repeated zero or more times, as
# given by the asterisk.

# output:
[1] "George" "Lyndon"

sub("\\s\\w+$", "", str)
# removes the last word plus the preceding space from a string.
# looks for a space followed by a word that is the last one in
# the line: the dollar sign $ is a meta-character that matches
# the end of a line.

# output:
[1] "George W." "Lyndon B."

sub(".*\\s(\\w+$)", "\\1", str)
# keeps only the last word of a string.
# looks for anything repeated arbitrarily often, followed by a
# space (".*\\s") and a word that is the last in the line.
# this word is wrapped in parentheses for a back-reference,
# which is returned by "\\1", the 1st back-reference.

# output:
[1] "Bush" "Johnson"

str <- c("&George W. Bush", "Lyndon B. Johnson?")
gsub("[^[:alnum:][:space:].]", "", str)
# keeps alphanumeric characters, spaces AND full stops, and
# removes anything else, that is, all other punctuation. what
# should not be matched is designated by the caret inside the
# character class.

# output:
[1] "George W. Bush" "Lyndon B. Johnson"

R-Function to Read Data from Google Docs Spreadsheets

I used this idea posted on Stack Overflow to plug together a function for reading data from Google Docs spreadsheets into R.

google_ss <- function(gid = NA, key = NA)
{
  if (is.na(gid)) {stop("\nWorksheet number (gid) is missing\n")}
  if (is.na(key)) {stop("\nDocument key (key) is missing\n")}
  require(RCurl)
  csv_content <- getURL(paste("https://docs.google.com/spreadsheet/pub?key=", key,
                              "&single=true&gid=", gid, "&output=csv", sep = ""),
                        cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
  read.csv(textConnection(csv_content), header = T, sep = ",")
}

## Example:
## Mind that the worksheets are numbered consecutively from 0 to n,
## irrespective of the actual worksheet names.
## The key should be put in quotes.
## And, the URL works only for published spreadsheets!

(data <- google_ss(gid = 0,
                   key = "0AmwAunwURQNsdDNpZzJqTU90cmpTU0sza2xLTW9fenc"))

Programmatically Download CORINE Land Cover Seamless Vector Data with R

Thanks to a helpful SO answer I was able to download all CLC vector data (43 zip files) programmatically:

require(XML)

path_to_files <- "D:/GIS_DataBase/CorineLC/Seamless"
dir.create(path_to_files)
setwd(path_to_files)

doc <- htmlParse("http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-2")
urls <- xpathSApply(doc,'//*/a[contains(@href,".zip/at_download/file")]/@href')

# function to get zip file names
get_zip_name <- function(x) unlist(strsplit(x, "/"))[grep(".zip", unlist(strsplit(x, "/")))]

# function to plug into sapply
dl_urls <- function(x) try(download.file(x, get_zip_name(x), mode = "wb"))

# download all zip-files
sapply(urls, dl_urls)

# function for unzipping
try_unzip <- function(x) try(unzip(x))

# unzip all files in dir and delete them afterwards
sapply(list.files(pattern = "\\.zip$"), try_unzip)

# unlink(list.files(pattern = "\\.zip$"))

Default Convenience Functions in R (Rprofile.site)

I keep my blog reference functions, snippets, etc. on GitHub and want to source them from there. This can be achieved by utilizing a function (source_https, customized for my purpose HERE). The original function was provided by the R blogger Tony Breyal - thanks Tony! As I will use this function quite frequently, I added its code to my Rprofile.site and am now able to source from GitHub whenever I run code in the R console. This is very handy, and I thought it might be worth sharing..
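
For illustration, a minimal sketch of such a function, following Tony's RCurl-based approach (the GitHub URL in the usage line is just a made-up placeholder - insert your own raw file link):

source_https <- function(url, ...) {
  require(RCurl)
  # fetch, parse and evaluate each remote script in the global environment:
  sapply(c(url, ...), function(u) {
    eval(parse(text = getURL(u, followlocation = TRUE,
                             cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))),
         envir = .GlobalEnv)
  })
}

# usage, e.g. from within Rprofile.site:
# source_https("https://raw.github.com/gimoya/some-repo/master/some_function.R")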

Transformation of Several Variables in a Dataframe

This is how I transform several columns of a dataframe at once - here, turning count data into binary (0/1) coded data (the same pattern applies to any other conversion..).

count1 <- count2 <- count3 <- count4 <- sample(c(rep(0, 10), 1:10))
some <- LETTERS[1:20]
thing <- letters[1:20]
mydf <- data.frame(count1, count2, count3, count4, some, thing)

ids <- grep("count", names(mydf))
myfun <- function(x) {ifelse(x > 0, 1, 0)}
mydf[, ids] <- lapply(mydf[, ids], myfun)

p.s.: Let me know if you know of a slicker way.

A Function for Adding up Matrices with Different Dimensions

I couldn't find a function that adds up matrices with different dimensions, so I coded one myself. It combines a list of matrices, matching rows and columns by name, and sums the overlapping cells.


# File: combmat.R
# Purpose: add up matrices with different dimensions
# Input: a list of 2-dimensional matrices
# Output: a combined matrix
# Author: Kay Cichini
# Date: Nov. 23rd 2011

combmat <- function(m_l = list(NA)) {
  n_m <- length(m_l)                          # no. of matrices used
  rownames_l <- lapply(m_l, rownames)         # list of rownames
  colnames_l <- lapply(m_l, colnames)         # list of colnames
  rownames_new <- unique(unlist(rownames_l))  # new, general rownames
  colnames_new <- unique(unlist(colnames_l))  # new, general colnames

  dimnames_new <- list(rownames_new, colnames_new)
  m_new <- matrix(nrow = length(rownames_new),
                  ncol = length(colnames_new),
                  data = 0,
                  dimnames = dimnames_new)

  # array of intermediate matrices, all with the same dimensions,
  # one layer per element in the list of input matrices
  # (note the third dimnames element must be given, even if NULL):
  m_interm_arr <- array(m_new,
                        dim = c(length(rownames_new), length(colnames_new), n_m),
                        dimnames = c(dimnames_new, list(NULL)))

  # take the i-th element of the list of input matrices and add
  # its values at the appropriate row and col indexes of the
  # i-th layer (i-th matrix) within the array:
  for (i in 1:n_m) {
    m_interm_arr[, , i][rownames_l[[i]], colnames_l[[i]]] <- m_l[[i]]
  }
  return(apply(m_interm_arr, c(1, 2), sum))
}

# Example:
print(m1 <- matrix(sample(1:40), 4, 10, dimnames = list(1:4,1:10)))
print(m2 <- matrix(sample(1:40), 10, 4, dimnames = list(1:10,1:4)))

combmat(m_l = list(m1, m2))

It is very likely that someone else could come up with a more effective approach - I'd be happy to hear about improvements, or whether there is a package/function that does the same..

Use Case: Make Contour Lines for Google Earth with Spatial R

Here comes a script I wrote for creating contour lines in KML format to be used with Google Earth: https://github.com/gimoya/onlinetrickpdf-Archives/blob/master/R/r_contours_for_google_earth.R
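
The gist of it, as a minimal sketch - assuming you already have a DEM at hand (the file name dem.tif is hypothetical; see the linked script for the full version):

library(raster)
library(maptools)

dem <- raster("dem.tif")  # hypothetical input DEM
contours <- rasterToContour(dem, nlevels = 10)

# write the contour lines to KML, as in the examples above:
for (i in seq_along(contours@lines)) {
  kmlLine(contours@lines[[i]], kmlfile = "contours.kml",
          lwd = 1, col = "brown", name = "Contours")
}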

If you want to check or just use the datasets I created for the Alps region, you can download them here: http://terrain-overlays.blogspot.co.at/index.html

Web-Scraper for Google Scholar Updated!

I have updated the Google Scholar web scraper function GScholarScraper_2 to GScholarScraper_3 (and GScholarScraper_3.1), as it was outdated due to changes in the Google Scholar HTML code. The new script is more slender and faster. It returns a dataframe or, optionally, a CSV file with the titles, authors, publications & links. Feel free to report bugs, etc.

Update 11-07-2013: bug fixes due to Google Scholar code changes - https://github.com/gimoya/onlinetrickpdf-Archives/blob/master/R/Functions/GScholarScraper_3.2.R. Note that lately Google will block your IP at around the 1000th (cumulated) search result - so there's not much fun to be had if you want to do extensive bibliometrics..