Sunday, September 20, 2015

Webscraping Google Scholar & Show Result as Word Cloud Using R

NOTE: Please see the update HERE and HERE!

...When reading Scott Chemberlain's last post about web-scraping I felt it was time to pick up and complete an idea that I was brooding over for some time now:

When a scientist aims out for a new project the first thing to do is to evaluate if other people already have come along to answer the very questions he is about to work on. I.e., I was interested if there has been done any research regarding amphibian diversity at regional/geographical scales correlated to environmental/landscape parameters. Usually I would got to Google-Scholar and search something like - intitle:amphibians AND intitle:richness OR intitle:diversity AND environment OR landscape - and then browse thru the results. But, this is often tedious and a way for a quick visual examination would be of great benefit.

The code I present will solve this task. It may be awkward in places and there might be a more effective way to yield the same result - but it may serve as a starter and I would very much appreciate people more literate than me picking up the torch...

For my example-search it is shown that there has not been very much going on regarding amphibian diversity correlated to environment and landscape...

See code HERE.

PS: I'd be happy about collaboration / tips / editing - so feel free to contact me and I will add you to the list of editors - you then could edit / comment / add to the script on Google Docs.

...some drawbacks need to be considered:
  • Maximum no. of search results = 100
  • Only titles are considered. Additionally considering abstracts may yield more representative results.. but abstracts are truncated in the search result and I don't know if it is possible to retrieve the full abstracts.
  • Also, long titles may be truncated...
  • A more illustrative result would be achieved if one could get rid of all other words than nouns, verbs and adjectives - don't know how to do this, but I am sure this is possible.
  • more drawbacks? you tell..

No comments:

Post a Comment