Monday, October 31, 2011

R 2.14.0 is released!

The new R 2.14.0 is out! Get the source code from here.
Take a look at these posts for some miscellaneous advice on making the upgrade easier.
This thread on Stack Overflow and this post contributed by Tal Galili can also help make the procedure less painful.
Feel free to contribute suggestions on how to upgrade your R installation.
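
One common suggestion, sketched here under assumptions rather than taken from the linked posts, is to record the names of your installed packages before upgrading and reinstall whatever is missing afterwards. The helper names and the file name `installed_pkgs.rds` are hypothetical.

```r
# Hypothetical helpers: save the installed package names before an upgrade
# and work out which ones are missing afterwards.
save_package_list <- function(file = "installed_pkgs.rds") {
  pkgs <- rownames(installed.packages())
  saveRDS(pkgs, file)
  invisible(pkgs)
}

missing_packages <- function(file = "installed_pkgs.rds") {
  old <- readRDS(file)
  setdiff(old, rownames(installed.packages()))
}

# Before upgrading (in the old R):
#   save_package_list()
# After upgrading (in the new R):
#   install.packages(missing_packages())
```

Only package names are stored, so the new installation picks up whatever versions are current in your repository.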

Wednesday, July 27, 2011

Word Cloud in R

A word cloud (or tag cloud) can be a handy tool when you need to highlight the most commonly cited words in a text with a quick visualization. Of course, you can use one of the several online services, such as wordle or tagxedo, which are feature rich and have a nice GUI. Being an R enthusiast, I always wanted to produce this kind of image within R and now, thanks to Ian Fellows' recently released wordcloud package, I finally can!
To test the package I retrieved the titles of the XKCD web comics included in my RXKCD package and produced a word cloud based on the titles' word frequencies, calculated with the powerful tm package for text mining (I know, it is like killing a fly with a bazooka!).

library(RXKCD)
library(tm)
library(wordcloud)
library(RColorBrewer)
path <- system.file("xkcd", package = "RXKCD")
datafiles <- list.files(path)
xkcd.df <- read.csv(file.path(path, datafiles))
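# VectorSource takes the column directly; newer tm versions require
# doc_id/text columns for DataframeSource
xkcd.corpus <- Corpus(VectorSource(as.character(xkcd.df[, 3])))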
xkcd.corpus <- tm_map(xkcd.corpus, removePunctuation)
xkcd.corpus <- tm_map(xkcd.corpus, content_transformer(tolower))
xkcd.corpus <- tm_map(xkcd.corpus, function(x) removeWords(x, stopwords("english")))
tdm <- TermDocumentMatrix(xkcd.corpus)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
pal <- brewer.pal(9, "BuGn")
pal <- pal[-(1:2)]
png("wordcloud.png", width=1280,height=800)
wordcloud(d$word, d$freq, scale = c(8, .3), min.freq = 2, max.words = 100,
          random.order = TRUE, rot.per = .15, colors = pal,
          vfont = c("sans serif", "plain"))
dev.off()
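
To make the frequency step less of a black box, here is a minimal base R sketch of what the pipeline above does conceptually: lowercase the text, strip punctuation, drop stop words, and count what is left. The titles and the tiny stop-word list are made up for illustration, and tm's own tokenization rules differ in the details.

```r
# Toy titles, made up for illustration (not the real XKCD data)
titles <- c("The Barrel, Part 1", "The Barrel, Part 2", "Tree by the River")

# A tiny hand-picked stop-word list; stopwords("english") in tm is much longer
stops <- c("the", "by")

words <- tolower(gsub("[[:punct:]]", "", titles))  # drop punctuation, lowercase
words <- unlist(strsplit(words, "[[:space:]]+"))   # split into single words
words <- words[!(words %in% stops)]                # remove stop words

# Count occurrences and build a word/frequency table like `d` above
freq <- sort(table(words), decreasing = TRUE)
d_toy <- data.frame(word = names(freq), freq = as.vector(freq))
print(d_toy)
```

The resulting data frame has the same two columns, word and freq, that are passed to wordcloud().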

As a second example, inspired by this post from the eKonometrics blog, I created a word cloud from the descriptions of the 3177 available R packages listed at http://cran.r-project.org/web/packages.
require(XML)
require(tm)
require(wordcloud)
require(RColorBrewer)
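u <- "http://cran.r-project.org/web/packages/available_packages_by_date.html"
# Named pkg.table rather than t to avoid masking base::t()
pkg.table <- readHTMLTable(u)[[1]]
# The third column holds the package descriptions
ap.corpus <- Corpus(VectorSource(as.character(pkg.table[, 3])))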
ap.corpus <- tm_map(ap.corpus, removePunctuation)
ap.corpus <- tm_map(ap.corpus, content_transformer(tolower))
ap.corpus <- tm_map(ap.corpus, function(x) removeWords(x, stopwords("english")))
ap.corpus <- Corpus(VectorSource(ap.corpus))
ap.tdm <- TermDocumentMatrix(ap.corpus)
ap.m <- as.matrix(ap.tdm)
ap.v <- sort(rowSums(ap.m),decreasing=TRUE)
ap.d <- data.frame(word = names(ap.v),freq=ap.v)
table(ap.d$freq)
pal2 <- brewer.pal(8,"Dark2")
png("wordcloud_packages.png", width=1280,height=800)
wordcloud(ap.d$word,ap.d$freq, scale=c(8,.2),min.freq=3,
max.words=Inf, random.order=FALSE, rot.per=.15, colors=pal2)
dev.off()
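
The table(ap.d$freq) call above is a quick way to decide on a sensible min.freq. As a rough sketch with made-up frequencies (not the real CRAN data), you can count how many words would survive each candidate threshold before drawing anything:

```r
# Made-up frequencies standing in for ap.d$freq
freq <- c(50, 20, 20, 7, 3, 3, 2, 1, 1, 1)

# How many words occur exactly n times
print(table(freq))

# How many words would be kept for a few candidate min.freq values
thresholds <- c(1, 2, 3, 5)
kept <- sapply(thresholds, function(k) sum(freq >= k))
names(kept) <- paste("min.freq", thresholds, sep = "=")
print(kept)
```

If the cloud looks too crowded, raising min.freq or lowering max.words trims the long tail of rare words.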

As a third example, thanks to Jim's comment, I took advantage of Duncan Temple Lang's RNYTimes package to access user-generated content on the NY Times and produce a word cloud of today's comments on articles.
Caveat: in order to use the RNYTimes package you need an API key from The New York Times, which you can get (free of charge) by registering with The New York Times Developer Network here.
require(XML)
require(tm)
require(wordcloud)
require(RColorBrewer)
# RNYTimes is hosted on Omegahat, not CRAN
install.packages("RNYTimes", repos = "http://www.omegahat.org/R", type = "source")
require(RNYTimes)
my.key <- "your API key here"
what= paste("by-date", format(Sys.time(), "%Y-%m-%d"),sep="/")
# what="recent"
recent.news <- community(what=what, key=my.key)
pagetree <- htmlTreeParse(recent.news, error=function(...){}, useInternalNodes = TRUE)
x <- xpathSApply(pagetree, "//*/body", xmlValue)
# do some clean up with regular expressions
x <- unlist(strsplit(x, "\n"))
x <- gsub("\t","",x)
x <- sub("^[[:space:]]*(.*?)[[:space:]]*$", "\\1", x, perl=TRUE)
x <- x[!(x %in% c("", "|"))]
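# VectorSource takes the cleaned lines directly; newer tm versions require
# doc_id/text columns for DataframeSource
ap.corpus <- Corpus(VectorSource(as.character(x)))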
ap.corpus <- tm_map(ap.corpus, removePunctuation)
ap.corpus <- tm_map(ap.corpus, content_transformer(tolower))
ap.corpus <- tm_map(ap.corpus, function(x) removeWords(x, stopwords("english")))
ap.tdm <- TermDocumentMatrix(ap.corpus)
ap.m <- as.matrix(ap.tdm)
ap.v <- sort(rowSums(ap.m),decreasing=TRUE)
ap.d <- data.frame(word = names(ap.v),freq=ap.v)
table(ap.d$freq)
pal2 <- brewer.pal(8,"Dark2")
png("wordcloud_NewYorkTimes_Community.png", width=1280,height=800)
wordcloud(ap.d$word,ap.d$freq, scale=c(8,.2),min.freq=2,
max.words=Inf, random.order=FALSE, rot.per=.15, colors=pal2)
dev.off()
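
The regular-expression cleanup in the middle of the script is easier to follow on a small input. The sketch below runs the same four steps on a made-up string standing in for the parsed page body; the sample text is purely illustrative.

```r
# Made-up text standing in for the parsed page body
raw <- "  First comment \n\t\n | \n\tSecond comment\t \n"

x <- unlist(strsplit(raw, "\n"))  # one element per line
x <- gsub("\t", "", x)            # drop tab characters
# trim leading and trailing whitespace
x <- sub("^[[:space:]]*(.*?)[[:space:]]*$", "\\1", x, perl = TRUE)
x <- x[!(x %in% c("", "|"))]      # discard empty lines and lone separators
print(x)
```

Only the two comment lines remain, which is the shape of input the corpus step expects.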


Thursday, July 14, 2011

R meets XKCD

Being a big fan of XKCD and, of course, of the R programming language, I thought that a package for displaying my favorite strips would be something useless but cool!
So, mimicking the approach (and the code) of the fortunes package (thanks, Achim Zeileis!), I created a simple package (named RXKCD) that lets users display their favorite XKCD strip by selecting a specific number, picking one at random, or simply displaying the current strip.
You can install the package using:
if (!require('RJSONIO')) install.packages('RJSONIO', repos = 'http://cran.r-project.org')
if (!require('png')) install.packages('png', repos = 'http://cran.r-project.org')
if (!require('ReadImages')) install.packages('ReadImages', repos = 'http://cran.r-project.org')
install.packages("RXKCD", repos="http://R-Forge.R-project.org")
And you can use it by typing:
library(RXKCD)
searchXKCD("someone is wrong")
getXKCD(386)
The result is shown below (xkcd license):


Update: The updated version of the package, which is available from CRAN (just type install.packages("RXKCD")), allows the user to save the xkcd metadata database in a local directory (.Rconfig) and update it to get access to the latest XKCD info: see ?saveConfig and ?updateConfig.

Friday, June 24, 2011

Installing Multiple Versions of R in Parallel on the Same Machine - Mac OS X

In a few days I'm going to attend a Bioconductor course, and I was asked to install a development version of R (plus ad hoc Bioconductor packages) on my MacBook (Mac OS X 10.5.8). To keep my old R installation (2.13) alongside the new one (2.14), I decided to use the RSwitch app (you can download it from here) and follow the instructions you can read here.
In practical terms, you type the following commands in Terminal so that the system forgets the existing installation receipts and the new installer does not remove the old version:

sudo pkgutil --forget org.r-project.R.Leopard.fw.pkg
sudo pkgutil --forget org.r-project.R.Leopard.GUI.pkg
sudo pkgutil --forget org.r-project.R.Leopard.GUI64.pkg


You then install the alternative version of R (for example, following the procedure described here), and you can switch between the different versions using the RSwitch GUI (see the screenshot below). So easy!
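
After switching, it is worth checking from within R which version is actually running. The sketch below assumes the standard framework location on the Mac, /Library/Frameworks/R.framework/Versions; if your installation lives elsewhere, the listing is simply skipped.

```r
# Report the version and home directory of the running R session
cat("Running:", R.version.string, "\n")
cat("R home: ", R.home(), "\n")

# Assumed standard location of the R framework on Mac OS X
versions_dir <- "/Library/Frameworks/R.framework/Versions"
if (file.exists(versions_dir)) {
  # Each installed version has its own subdirectory here
  print(list.files(versions_dir))
} else {
  cat("No R framework directory found at", versions_dir, "\n")
}
```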



Thursday, April 14, 2011

R 2.13.0 is released!

The new R 2.13.0 is out! Get the source code from here.
Take a look at these posts for some miscellaneous advice on making the upgrade easier.
This thread on Stack Overflow and this post contributed by Tal Galili can also help make the procedure less painful.
Feel free to contribute suggestions on how to upgrade your R installation.

Wednesday, February 2, 2011

Friday, December 31, 2010