Monday, 16 November 2009

R in Action - early thoughts

I was invited to review the book R in Action, written by Rob Kabacoff. Since I consider the Quick-R website, created by the same smart guy, one of the most valuable resources on R, it is both an honor and a pleasure to take an early look at his book and share some thoughts about it.

First, this book is distributed under an early access policy which, as stated on the publisher's website, means that: "This Early Access version of the book enables you to receive new chapters as they are being written. You can also interact with the authors to ask questions, provide feedback and errata, and help shape the final manuscript on the Author Online." This is a nice publishing approach: the publisher has set up an ad hoc forum that allows real-time feedback from early adopters. This beta-test approach benefits both the author, who can fix errata and improve the content before the final version is published, and the early adopters, who get access to useful content in advance and receive valuable explanations directly from the author.

Since only the initial part of the book is available, this short review is necessarily incomplete and presents only preliminary thoughts. I'll update the review as soon as I have had the chance to read the rest of the book.

As its structure reflects, R in Action aims to guide newcomers from the very basics of the language through to its most advanced features by means of a progressive, task-driven approach carefully curated by the author.

In the initial part of the book, Kabacoff covers all the basic features of the language, from data manipulation to the basic statistics required to make sense of the data, plus the most common and useful graphical methods for visualizing it.

The author makes extensive use of worked examples. In my opinion, this is one of the most effective teaching techniques, because it encourages readers to apply the knowledge they have acquired immediately.

Another nice ingredient of Kabacoff's method is the introduction of effective, high-quality packages from the huge R collection to solve a proposed task. For example, in chapter three the author introduces the rename function from the awesome reshape package to rename the columns of a data.frame. This is a trivial task that can easily be handled in standard R (as the author shows shortly afterward), but the smooth introduction of this useful package, explained and used more extensively in later chapters, is a nice touch: it handles the task more elegantly and introduces the reader to a powerful tool.
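For readers who want to compare the two routes on their own, here is a minimal sketch on a made-up data frame; the column names are hypothetical and this is not the book's example. The base R version assigns through names(); the reshape version, run only if the package is installed, calls rename() with a named character vector whose names are the old column names and whose values are the new ones.

```r
# Made-up data frame with hypothetical column names
df <- data.frame(id = 1:3, val = c(2.5, 3.1, 4.8))

# Base R: overwrite the selected entries of names()
df.base <- df
names(df.base)[names(df.base) == "id"]  <- "subjectID"
names(df.base)[names(df.base) == "val"] <- "score"
print(names(df.base))

# reshape package (only if installed): rename() maps old names to new names
# through a named character vector, c(old = "new")
if (requireNamespace("reshape", quietly = TRUE)) {
  df.reshape <- reshape::rename(df, c(id = "subjectID", val = "score"))
  print(names(df.reshape))
}
```

Both routes leave the original df untouched; the reshape call simply reads more like a statement of intent.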
In this fashion, the tasks presented in the text are addressed using several different packages in order to depict the various alternative methods available in R.
Furthermore, the numerous notes accompanying the explanations both make the described concepts easier to understand and provide useful insights into R features and idiosyncrasies.

To sum up, the chapters I had the opportunity to examine provide a solid foundation for people getting started with R. I'm eager to dig into the forthcoming chapters of the book, which deal with advanced statistics and graphics!

I warmly recommend this book even at this early stage: if you are new to R programming, it offers a sound way to become familiar with the language and to use it effectively from day one.

Thursday, 29 October 2009

Bioconductor 2.5 is out

For all the bioinformaticians and R users out there: version 2.5 of the Bioconductor project, for the analysis and comprehension of genomic data, is out! There is a lot of interesting new stuff. See the full announcement here.

Monday, 26 October 2009

R 2.10.0 is Out!

The new R 2.10.0 is out! Get it from here.
If you like, take a look at these posts for some miscellaneous advice on making the upgrade easier.
Feel free to contribute suggestions on how to upgrade your R installation.
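One common approach, not necessarily the one described in the linked posts, is to record the packages installed under the old version and reinstall whichever are missing under the new one. The sketch below assumes both installations can read the same working directory; the file name old_pkgs.RData and the helper missing_packages() are hypothetical names chosen for illustration.

```r
# Hypothetical helper: packages recorded from the old library that are
# not installed in the current one
missing_packages <- function(old_pkgs, current = rownames(installed.packages())) {
  setdiff(old_pkgs, current)
}

# Step 1, run in the OLD R installation: save the installed package names
old_pkgs <- rownames(installed.packages())
save(old_pkgs, file = "old_pkgs.RData")

# Step 2, run in the NEW R installation: reinstall whatever is missing
load("old_pkgs.RData")  # restores the 'old_pkgs' vector
to_install <- missing_packages(old_pkgs)
if (length(to_install) > 0) install.packages(to_install)
```

Packages that are already present but were built under the old version can then be rebuilt with update.packages(checkBuilt = TRUE, ask = FALSE).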

Wednesday, 14 October 2009

The Elements of Statistical Learning

The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani and Jerome Friedman, is a must-read for everyone involved in data mining! You can now legally download a PDF copy of the book from the authors' website. Grab it here!

Friday, 4 September 2009

R Flashmob #2

As I said before, I consider the R-Help mailing list an invaluable source of information if you want to get things done in R. Recently, stackoverflow, a site where programmers can post and answer questions about a wide range of programming languages, has been populated with a lot of R questions and answers thanks to a 'virtual' flash mob. As a result of this event, stackoverflow has become an extremely valuable Web 2.0 resource for the R community.

Another R Flash Mob event is scheduled for Tuesday, 8 September. I warmly encourage all my readers to take part in this event to populate the stackoverflow site with even more useful questions and answers about our beloved R.

Here you can find both the event details and a letter describing the event, which you may forward to your colleagues and fellow R fans.

Wednesday, 5 August 2009

Locate the position of CRAN mirror sites on a map using Google Maps

Inspired by this post (pointed out here by the always useful Revolutions blog), I tried to plot the positions of the CRAN mirrors on a map using the nice R package RgoogleMaps (check its dependencies!). Here is the code:

library(XML)
# Download the GML file only if it is not already in the working directory
if (!file.exists("cran.gml"))
  download.file("http://www.maths.lancs.ac.uk/~rowlings/R/Cranography/cran.gml", destfile = "cran.gml")
cran.gml <- xmlInternalTreeParse("cran.gml")
# Helper: extract the text of every node matching an XPath query
gml.values <- function(path) sapply(getNodeSet(cran.gml, path), xmlValue)
# Create a data.frame assembling all the information from the GML file
Name <- gml.values("//ogr:Name")
Country <- gml.values("//ogr:Country")
City <- gml.values("//ogr:City")
URL <- gml.values("//ogr:URL")
Host <- gml.values("//ogr:Host")
Maintainer <- gml.values("//ogr:Maintainer")
CountryCode <- gml.values("//ogr:countryCode")
lng <- as.numeric(gml.values("//ogr:lng"))
lat <- as.numeric(gml.values("//ogr:lat"))
cran.mirrors <- data.frame(Name, Country, City, URL, Host, Maintainer, CountryCode, lng, lat,
                           stringsAsFactors = FALSE)
# cran.mirrors <- cbind(getCRANmirrors(), lng, lat) ## alternatively (assumes the same row order)
library(RgoogleMaps)
# Define the markers (one row per mirror):
n <- nrow(cran.mirrors)
cran.markers <- data.frame(lat = cran.mirrors$lat, lon = cran.mirrors$lng,
                           size = rep("tiny", n), col = colors()[seq_len(n)],
                           char = rep("", n), stringsAsFactors = FALSE)
# Map types offered by the Google Static Maps API at the time of writing
maptypes <- c("roadmap", "mobile", "satellite", "terrain", "hybrid", "mapmaker-roadmap", "mapmaker-hybrid")
# Get the bounding box:
bb <- qbbox(lat = cran.markers$lat, lon = cran.markers$lon)
num.mirrors <- seq_len(nrow(cran.markers)) ## replace with a subset to show only some mirrors
maptype <- maptypes[1]
# Download the map (either jpg or png):
MyMap <- GetMap.bbox(bb$lonR, bb$latR, destfile = paste("Map_", maptype, ".png", sep = ""), GRAYSCALE = FALSE, maptype = maptype)
# Plot, colouring the markers by country (one integer code per country):
country.col <- as.integer(factor(cran.mirrors$Country))
png(paste("CRANMirrorsMap_", maptype, ".png", sep = ""), 640, 640)
tmp <- PlotOnStaticMap(MyMap, lat = cran.markers[num.mirrors, "lat"], lon = cran.markers[num.mirrors, "lon"],
                       cex = 1, pch = "R", col = country.col[num.mirrors], add = FALSE)
dev.off()

## Hosts from Italy
maptype <- maptypes[4]  # "terrain"
num.it <- which(cran.mirrors$CountryCode == "IT")
# Get the bounding box:
bb.it <- qbbox(lat = cran.markers[num.it, "lat"], lon = cran.markers[num.it, "lon"])
# Download the map (either jpg or png):
ITMap <- GetMap.bbox(bb.it$lonR, bb.it$latR, destfile = paste("ITMap_", maptype, ".png", sep = ""), GRAYSCALE = FALSE, maptype = maptype)
# ITMap <- GetMap.bbox(bb.it$lonR, bb.it$latR, destfile = paste("ITMap_", maptype, ".jpg", sep = ""), GRAYSCALE = FALSE, maptype = maptype)
# Plot:
png(paste("CRANMirrorsMapIT_", maptype, ".png", sep = ""), 640, 640)
tmp <- PlotOnStaticMap(ITMap, lat = cran.markers[num.it, "lat"], lon = cran.markers[num.it, "lon"],
                       cex = 2, pch = "R", col = "dodgerblue", add = FALSE)
# Optionally, overlay the host names:
# tmp <- PlotOnStaticMap(ITMap, lat = cran.markers[num.it, "lat"], lon = cran.markers[num.it, "lon"], labels = cran.mirrors$Host[num.it], col = "black", FUN = text, add = TRUE)
dev.off()


CAVEAT: to reproduce the example you need the GML file (you can download it from here), a Google account and a Google Maps API key. You can sign up for a free API key here.
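In the first plot, each marker's colour comes from converting the Country column to integer codes, one per country. That conversion only works when Country is a factor: since R 4.0.0, data.frame() no longer turns character columns into factors by default, and as.numeric() on a character vector of country names returns NA (with a warning). The snippet below uses made-up mirror data, not the real CRAN list, to show the difference in isolation.

```r
# Made-up mirror data, for illustration only
mirrors <- data.frame(
  Host    = c("a.example.org", "b.example.org", "c.example.org", "d.example.org"),
  Country = c("Italy", "Austria", "Italy", "Brazil"),
  stringsAsFactors = FALSE
)

# Character column: as.numeric() cannot parse country names, so every entry is NA
bad <- suppressWarnings(as.numeric(mirrors$Country))

# Factor column: as.integer() returns the level index, one code per country
# (levels are sorted alphabetically: Austria = 1, Brazil = 2, Italy = 3)
country.col <- as.integer(factor(mirrors$Country))

print(bad)
print(country.col)
```

Passing country.col to the col argument gives every mirror in the same country the same palette colour, whichever R version is in use.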

Sunday, 26 July 2009

Rosetta Code

Today I'd like to recommend the interesting Rosetta Code site:

Rosetta Code is a programming chrestomathy site. The idea is to present solutions to the same task in as many different languages as possible, to demonstrate how languages are similar and different, and to aid a person with a grounding in one approach to a problem in learning another.

Since R's coverage of the various tasks is still largely incomplete, I encourage everyone to fill in the missing tasks with appropriate R code.