One R Tip A Day: function

Tuesday, 28 April 2009

Tips from the R-help list: shadow text in a plot and bumps charts

Browsing the R-help mailing list I found, as often happens, two threads in the spirit of this blog (and, since they come from the list, the quality is higher). The first provides a function that draws text in a plot with a shadow outline. The second is an interesting discussion of how to produce bumps charts in R.

Thursday, 11 December 2008

Tips from Jason

I want to thank Jason Vertrees for the following collection of useful tips!

(1) Use ~/.Rprofile for repeated environment initialization
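
As an illustration of tip (1), here is a minimal sketch of what a ~/.Rprofile might contain. The specific settings (digits, the CRAN mirror URL, the start-up message) are assumptions chosen for the example, not Jason's own configuration.

```r
# ~/.Rprofile -- R sources this file at the start of every session
# (all settings below are illustrative assumptions)

# print fewer significant digits by default
options(digits = 4)

# pick a default CRAN mirror so install.packages() does not ask each time
options(repos = c(CRAN = "https://cloud.r-project.org"))

# .First() is called by R once start-up has finished
.First <- function() {
  if (interactive()) message("R session started in ", getwd())
}
```

Anything defined here is available in every new session, so it is a natural home for small helpers such as the functions below.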

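(2) Ever had a large data frame displayed across only 40% of your terminal window? You can resize the R output to fit the width of the terminal. Use the following "wideScreen" function. It reads the COLUMNS environment variable, which the shell does not always export to R; if the variable is unset, the width is left unchanged.

# define wideScreen
wideScreen <- function() {
  cols <- suppressWarnings(as.integer(Sys.getenv("COLUMNS")))
  # only change the width when COLUMNS holds a usable value
  if (!is.na(cols) && cols >= 10) options(width = cols)
  invisible(getOption("width"))
}

#
# Test wideScreen
#
a <- rnorm(100)
a
wideScreen()
# notice how the data fill the screen
a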

(3) Get familiar with the colorspace package. For example, if you need to color data points across a range of hues, you can easily do:

##
## lut.R -- small function that returns a cool palette of nColors
##
library(colorspace)
lut <- function(nColors = 20) {
  # nColors evenly spaced hues; 360 is dropped because it duplicates 0 (red)
  hues <- seq(0, 360, length.out = nColors + 1)[-(nColors + 1)]
  hex(HSV(hues, 1, 1))
}
# Now use lut.
plot(rnorm(100), col = lut(100))
# Now use just a range: colors near purple, pretty
# much like taking a subsection of rainbow()
plot(rnorm(30), col = lut(100)[71:100])
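
If colorspace is not installed, a similar palette can be sketched with base R alone. The function name lut_base is hypothetical; it assumes that grDevices::hsv(), which takes the hue on a 0 to 1 scale, is an acceptable stand-in for HSV() in this case.

```r
# Base-R sketch of a hue palette (lut_base is a made-up name)
lut_base <- function(nColors = 20) {
  # nColors evenly spaced hues in [0, 1); 1 is dropped because it equals 0 (red)
  hues <- seq(0, 1, length.out = nColors + 1)[-(nColors + 1)]
  hsv(h = hues, s = 1, v = 1)
}

pal <- lut_base(100)
plot(rnorm(100), col = pal)
```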

(4) Given a data set of m instances in N dimensions, find the k nearest neighbors of a given row/instance/point:

##
## neighbors -- find and return the k closest neighbors to "home"
##
neighbors <- function(dat, home, k = 10) {
  # Euclidean distance from every row of dat to home
  theHood <- apply(dat, 1, function(x) sqrt(sum((x - home)^2)))
  # row indices of the k smallest distances
  head(order(theHood), k)
}
# Use it. Create a random 10x10 matrix and find which rows
# in d are closest (Euclidean-wise) to row 1. Row 1 itself is
# returned first, since its distance to itself is zero.
d <- matrix(rnorm(100), nrow = 10, ncol = 10)
neighbors(d, d[1, ], k = 3)

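(5) A _VERY_ useful tip is to show users the difference in speed between a for loop and the apply family (apply, sapply, mapply and tapply; see ?apply). A for loop that grows its result one element at a time is typically slow, whereas the apply functions are more concise and often faster, and fully vectorized code is usually the fastest of all. To see the difference, time the apply version of the neighbors function above against a for-loop version on a large data set.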
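
The comparison suggested in tip (5) could be sketched as follows. The helper names (dist_loop, dist_apply, dist_vec) and the data size are assumptions made for this example; actual timings depend on the machine and the R version, so none are quoted here.

```r
# Three ways to compute the distance from every row of dat to home
set.seed(1)
dat  <- matrix(rnorm(2000 * 10), ncol = 10)
home <- dat[1, ]

# explicit for loop with a preallocated result vector
dist_loop <- function(dat, home) {
  out <- numeric(nrow(dat))
  for (i in seq_len(nrow(dat))) out[i] <- sqrt(sum((dat[i, ] - home)^2))
  out
}

# apply over rows, as in the neighbors function
dist_apply <- function(dat, home) {
  apply(dat, 1, function(x) sqrt(sum((x - home)^2)))
}

# fully vectorized: subtract home from each row, then sum the squares by row
dist_vec <- function(dat, home) {
  sqrt(rowSums(sweep(dat, 2, home)^2))
}

print(system.time(res_loop  <- dist_loop(dat, home)))
print(system.time(res_apply <- dist_apply(dat, home)))
print(system.time(res_vec   <- dist_vec(dat, home)))

# all three give the same distances
all.equal(res_loop, res_apply)
all.equal(res_loop, res_vec)
```

On a data set this small all three finish quickly; increase the number of rows to make the differences visible.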

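(6) Another useful tip, also used in neighbors, is generating difference vectors and their lengths:

# the difference vector between two vectors of equal length is very easy
a <- c(1, 2, 3)
b <- c(4, 6, 3)
# avoid calling the result "c", which is also the name of the base function c()
delta <- a - b
# now the vector length (how far apart in Euclidean space these two points are)
sqrt(sum(delta^2))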
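
As a cross-check of the tip on vector lengths, base R's dist() computes the same Euclidean distance when the two points are stacked as rows of a matrix. The vectors p and q are made-up values for this sketch.

```r
# Two points in 3-dimensional space (illustrative values)
p <- c(1, 2, 3)
q <- c(4, 6, 3)

# length of the difference vector, computed by hand
by_hand <- sqrt(sum((p - q)^2))

# the same distance via dist(), which works on the rows of a matrix
via_dist <- as.numeric(dist(rbind(p, q)))

by_hand
via_dist
```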

Wednesday, 3 December 2008

Retrieving the author of a script

I know that the recommended way to record the authorship of R code is to build a package containing a DESCRIPTION file.
Nevertheless, I wrote a very basic function that retrieves the names of the authors of a script (or any text file), provided these names are written within the first three lines of the file (easily changed) in this format:

##
## Author:Pinco Palla, Paolino Paperino, Topo Gigio
##

The function:

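catch.the.name <- function(filename = "myscript.R", nlines = 3) {
  # read only the header lines where the author tag is expected
  header <- readLines(filename, n = nlines, warn = FALSE)
  author <- grep("Author:", header, value = TRUE)
  if (length(author) == 0) return(character(0))
  # drop everything up to the tag, then split the comma-separated names
  author <- sub("^.*Author:", "", author[1])
  # trimws() is in base R, so the gdata package is no longer needed
  trimws(strsplit(author, ",")[[1]])
}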
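
For comparison, the package route mentioned above stores the authors in a DESCRIPTION file, which base R can read with read.dcf(). The sketch below writes a throwaway DESCRIPTION-style file first; the package name and version in it are invented for the example.

```r
# Write a minimal DESCRIPTION-style file to a temporary location
desc_file <- tempfile()
writeLines(c("Package: demo",
             "Version: 0.1",
             "Author: Pinco Palla, Paolino Paperino, Topo Gigio"),
           desc_file)

# read.dcf() returns a character matrix with one column per requested field
author_field <- read.dcf(desc_file, fields = "Author")[1, "Author"]

# split the comma-separated names and strip surrounding whitespace
authors <- trimws(strsplit(author_field, ",")[[1]])
authors
```

For an installed package, packageDescription("stats")$Author returns the same field without reading the file by hand.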

Thursday, 17 May 2007

Quick and dirty function for descriptive statistics

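desc <- function(mydata) {
  # skewness() and kurtosis() come from the e1071 package
  if (!requireNamespace("e1071", quietly = TRUE)) stop("package 'e1071' is required")
  quantls <- quantile(x = mydata, probs = seq(from = 0, to = 1, by = 0.25))
  themean <- mean(mydata)
  thesd <- sd(mydata)
  # excess kurtosis: 0 for a normal distribution
  kurt <- e1071::kurtosis(mydata)
  skew <- e1071::skewness(mydata)
  list(Quantiles = quantls, Mean = themean, StandDev = thesd,
       Skewness = skew, Kurtosis = kurt)
}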
# example
exampledata <- rnorm(10000)
summary(exampledata)
desc(exampledata)
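
If e1071 is not available, skewness and excess kurtosis can be sketched from the central moments in base R. The function names below are hypothetical, and these plain moment estimators may differ slightly from the e1071 defaults on small samples.

```r
# Moment-based skewness: m3 / m2^(3/2), using central moments
moment_skewness <- function(x) {
  dev <- x - mean(x)
  mean(dev^3) / mean(dev^2)^(3 / 2)
}

# Moment-based excess kurtosis: m4 / m2^2 - 3
moment_kurtosis <- function(x) {
  dev <- x - mean(x)
  mean(dev^4) / mean(dev^2)^2 - 3
}

# For a large normal sample both values should be close to 0
normal_sample <- rnorm(10000)
moment_skewness(normal_sample)
moment_kurtosis(normal_sample)
```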

Wednesday, 2 May 2007

ls() improved!

This marvelous little function lists all objects in the current workspace
together with their mode, class and 'size'. Thanks to Bendix Carstensen!

lls <- function(pos = 1, pat = "") {
  # "size" is the length of a vector or the dimensions of an array
  dimx <- function(dd) if (is.null(dim(dd))) length(dd) else dim(dd)
  lll <- ls(pos = pos, pattern = pat)
  namewidth <- max(nchar(lll), 4) + 1
  # header row, left-justified in fixed-width columns
  cat(formatC("mode", width = 15, flag = "-"),
      formatC("class", width = 18, flag = "-"),
      formatC("name", width = namewidth, flag = "-"),
      "size\n-----------------------------------------------------------------\n")
  for (nm in lll) {
    # fetch the object by name instead of building code with eval(parse())
    obj <- get(nm, pos = pos)
    cat(formatC(mode(obj), width = 15, flag = "-"),
        formatC(paste(class(obj), collapse = " "), width = 18, flag = "-"),
        formatC(nm, width = namewidth, flag = "-"),
        dimx(obj), "\n")
  }
}
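
The same overview can also be collected in a data frame, which is easier to sort or filter than printed text. This is only a sketch: workspace_overview and demo_env are names invented here, and the example inspects a small throwaway environment rather than the real workspace.

```r
# Summarize the objects of an environment as a data frame
workspace_overview <- function(env = globalenv()) {
  objs <- mget(ls(env), envir = env)
  data.frame(
    name  = names(objs),
    mode  = vapply(objs, mode, character(1)),
    class = vapply(objs, function(o) paste(class(o), collapse = " "), character(1)),
    # length for vectors, dimensions (e.g. "2 x 3") for arrays
    size  = vapply(objs, function(o) {
      d <- dim(o)
      paste(if (is.null(d)) length(o) else d, collapse = " x ")
    }, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Try it on a small environment with a vector and a matrix
demo_env <- new.env()
assign("x", 1:10, envir = demo_env)
assign("m", matrix(0, nrow = 2, ncol = 3), envir = demo_env)
ov <- workspace_overview(demo_env)
ov
```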

Thursday, 26 April 2007

How to Superimpose Histograms

This function is inspired by code by Martin Maechler posted on the R-help list at http://tolstoy.newcastle.edu.au/R/help/06/06/30059.html

superhist2pdf <- function(x, filename = "super_histograms.pdf",
                          dev = "pdf", title = "Superimposed Histograms",
                          nbreaks = "Sturges") {
  # x must be a list of numeric vectors, one per histogram
  junk <- unlist(x)
  grouping <- factor(rep(seq_along(x), lengths(x)))
  n.gr <- nlevels(grouping)
  xr <- range(junk)
  histL <- tapply(junk, grouping, hist, breaks = nbreaks, plot = FALSE)
  maxC <- max(sapply(histL, function(h) max(h$counts)))
  if (dev == "pdf") {
    pdf(filename, version = "1.4")
    on.exit(dev.off())  # close the file even if plotting fails
  }
  # semi-transparent fills need a device that supports alpha
  transparent.cols <- .Device %in% c("pdf", "png")
  if (transparent.cols) {
    cols <- hcl(h = seq(30, by = 360 / n.gr, length.out = n.gr), l = 65, alpha = 0.5)
    plot(histL[[1]], xlim = xr, ylim = c(0, maxC), col = cols[1], xlab = "x", main = title)
    # seq_len(n.gr)[-1] is empty for a single group, unlike 2:n.gr
    for (j in seq_len(n.gr)[-1]) plot(histL[[j]], add = TRUE, col = cols[j])
  } else {
    # fall back to hatching; the three settings are recycled for more groups
    h.den <- rep(c(10, 15, 20), length.out = n.gr)
    h.ang <- rep(c(45, 15, -30), length.out = n.gr)
    plot(histL[[1]], xlim = xr, ylim = c(0, maxC), density = h.den[1], angle = h.ang[1], xlab = "x", main = title)
    for (j in seq_len(n.gr)[-1]) plot(histL[[j]], add = TRUE, density = h.den[j], angle = h.ang[j])
  }
  invisible(NULL)
}

# How to use the function:
d1 <- rnorm(100)
d2 <- rnorm(100) + 4
# the input object MUST be a list!
l1 <- list(d1, d2)
superhist2pdf(l1, nbreaks = "Sturges")
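
The function above lets hist() choose the breaks separately for each group, so the bars of different groups may have different widths. When comparable bars matter, one option is to compute a single set of breaks for all groups first. The sketch below assumes two groups and draws on the current device; the object names and colors are illustrative only.

```r
set.seed(42)
g1 <- rnorm(100)
g2 <- rnorm(100) + 4

# one set of breaks spanning both samples
brks <- pretty(range(c(g1, g2)), n = 20)

# compute the histograms without drawing them
h1 <- hist(g1, breaks = brks, plot = FALSE)
h2 <- hist(g2, breaks = brks, plot = FALSE)
ymax <- max(h1$counts, h2$counts)

# draw both with semi-transparent fills so the overlap stays visible
plot(h1, col = rgb(1, 0, 0, 0.5), xlim = range(brks), ylim = c(0, ymax),
     main = "Superimposed histograms, shared breaks", xlab = "x")
plot(h2, col = rgb(0, 0, 1, 0.5), add = TRUE)
```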