Showing posts with label Ordination diagram. Show all posts

Sunday, September 20, 2015

A Word Cloud with Spatial Meaning

Some time ago I made a word cloud to represent a Google Scholar search result. Tal Galili pointed me to a post by Drew Conway that expanded on the topic of word clouds lacking spatial meaning. Indeed, the spatial ordering of words in a word cloud is arbitrary and meaningless.

As I am an ecologist, I soon came to the idea that text could be treated as a multivariate data set, with words treated as species and sentences as samples. So, presuming that it makes sense to put sentences and words in a cross-table, as I would do with a species / samples matrix, it may also be sensible to analyze such a matrix with the ordination methods for multivariate data that are mostly used by ecologists. I chose NMDS ordination, as it is robust and quite easy to compute with the R package {vegan}.


In an NMDS ordination plot, species/words that often co-occur within sentences and/or within groups of sentences (say, sentences said by you vs. sentences said by me) are placed close together. That is, words associated with each other, or with the words within a level of a grouping factor, are plotted closer to each other than words with low association.

In my simple example two texts are compared, each with five sentences: one with sentences I said about you (denoted by red "Is") and one with sentences you said about yourself (the red "Ys"). Words used by both of us are in the intersection, whereas, e.g., words said exclusively by me are far away from the centroid of the sentences said by you, and vice versa. I will not annoy you with the nitty-gritty of ordination methods or NMDS; you will have to look that up yourself.
Word frequencies are represented by the size of the plotted text, as in the usual word clouds.
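
To make the idea concrete, here is a minimal sketch of the workflow described above. It is not the stand-alone code linked below: the ten toy sentences, the object names (sentences, tokens, mat, word.freq) and the simple letters-only tokenizer are all assumptions made for illustration.

```r
library(vegan)

# Hypothetical toy data: five sentences "I" said about you (I1..I5)
# and five sentences "you" said about yourself (Y1..Y5)
sentences <- c(
  I1 = "you are kind and clever",
  I2 = "you are often late",
  I3 = "you like long walks",
  I4 = "you are clever but stubborn",
  I5 = "you like coffee and books",
  Y1 = "i am kind and patient",
  Y2 = "i am never late",
  Y3 = "i like long walks and coffee",
  Y4 = "i am clever and patient",
  Y5 = "i like books"
)

# Split sentences into lower-case words (assumed tokenizer: letters only)
tokens <- strsplit(tolower(sentences), "[^a-z]+")
words  <- sort(unique(unlist(tokens)))

# Cross-table: sentences as rows ("samples"), words as columns ("species")
mat <- t(sapply(tokens, function(tok) table(factor(tok, levels = words))))
rownames(mat) <- names(sentences)

# Word frequencies over all sentences, used to scale the labels
word.freq <- colSums(mat)

# NMDS on Bray-Curtis dissimilarities between sentences
set.seed(1)
sol <- metaMDS(mat, distance = "bray", k = 2,
               autotransform = FALSE, trace = FALSE)

# Sentence scores and word scores (weighted averages of sentence scores)
sites <- scores(sol, display = "sites")
spp   <- scores(sol, display = "species")

# Sentences in red, words sized by their frequency
plot(sol, type = "n")
text(sites[, 1], sites[, 2], labels = rownames(sites), col = "red", cex = 0.8)
text(spp[, 1], spp[, 2], labels = rownames(spp),
     cex = 0.7 + word.freq / max(word.freq))
```

With so few sentences the configuration will not be very stable, so treat the resulting plot as a demonstration of the mechanics rather than a result.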

So, to all linguists out there, what do you think?

The stand-alone code to produce this word cloud can be found HERE.

Avoid Overplotting of Text in Ordination Diagram

Referring to a recent posting on the r-sig-eco mailing list, I'll add this example to onlinetrickpdf:

library(vegan)
data(dune)
sol <- metaMDS(dune)

# use ordipointlabel(), which places the labels so that overlaps are minimized;
# here label size (cex) is scaled by species frequency (share of sites occupied):
plot(sol, type = "n")
cex.lab <- colSums(dune > 0) / nrow(dune) + 1
col.lab <- rgb(0.2, 0.5, 0.4, alpha = 0.6)
ordipointlabel(sol, display = "species", col = col.lab, cex = cex.lab)


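# you could also use pointLabel() from the maptools package
# (note: maptools has since been archived on CRAN; a pointLabel() function
# is also provided by the car package, so library(car) may be needed instead):
library(maptools)
x <- as.vector(sol$species[, 1])   # species scores on NMDS axis 1
y <- as.vector(sol$species[, 2])   # species scores on NMDS axis 2
w <- row.names(sol$species)        # species names used as labels

plot(sol, type = "n")
points(sol, display = "species", cex = 1, pch = 4, col = 3)
pointLabel(x, y, w, col = col.lab, cex = cex.lab)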

Custom Labels for Ordination Diagram

Here is how to add custom labels, hulls, and spiders to a vegan ordination diagram:


library(vegan)

### data on 35 species of Oribatid mites
### data description at: http://cc.oulu.fi/~jarioksa/softhelp/vegan/html/mite.html
data(mite)

### ...and environmental variables
data(mite.env)

### the factor which you wish to use for labeling the displayed sites
### in the ordination diagram:
fac <- mite.env$Topo
sol <- metaMDS(mite)

### open a 9 x 5 inch graphics device (dev.new() works on any platform,
### windows() only on Windows) and split it into two panels
dev.new(width = 9, height = 5)
par(mfrow = c(1, 2))

### graph with "hull":
plot(sol, type = "n")
points(sol, display = "sites", cex = 0.65, select = which(fac == "Hummock"),
       pch = 21, col = "black", bg = "black")
points(sol, display = "sites", cex = 0.65, select = which(fac == "Blanket"),
       pch = 21, col = "black", bg = "white")
ordihull(sol, groups = fac, show.groups = "Hummock")
ordihull(sol, groups = fac, show.groups = "Blanket", lty = 3)
orditorp(sol, display = "species", pcex = 0, air = 0.85, col = "grey35", cex = 0.8, font = 3)

### graph with "spider":
plot(sol, type = "n")
points(sol, display = "sites", cex = 0.65, select = which(fac == "Hummock"),
       pch = 21, col = "black", bg = "black")
points(sol, display = "sites", cex = 0.65, select = which(fac == "Blanket"),
       pch = 21, col = "black", bg = "white")
ordispider(sol, groups = fac, show.groups = "Hummock")
ordispider(sol, groups = fac, show.groups = "Blanket", lty = 3)
orditorp(sol, display = "species", pcex = 0, air = 0.85, col = "grey35", cex = 0.8, font = 3)

R package ‘vegan’ citation: Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre,
R. B. O'Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens
and Helene Wagner (2011). vegan: Community Ecology Package. R package
version 1.17-9. http://CRAN.R-project.org/package=vegan
