Monday, 30 July 2007

screen - another VERY useful Unix tool

from R News Vol. 7/1 April 2007 (http://cran.r-project.org/doc/Rnews/Rnews_2007-1.pdf):
If you need to run R code that executes for long periods of time on remote machines, this amazing Unix tool will become your best friend!
screen is a so-called terminal multiplexor, which allows us to create, shuffle, share, and suspend command line sessions within one window. It provides protection against disconnections and the flexibility to retrieve command line sessions remotely.

Getting started with this utility is as easy as ABC:

  1. Log in to remote server
  2. Run screen
  3. Run R and the long calculation
  4. Detach screen (Ctrl-a, Ctrl-d)
  5. Logout

The R session continues working in the background, contained within the screen session. If we want to revisit the session to check its progress, then we:

  1. Log in remotely via secure shell
  2. Run screen -r, which reattaches the detached session
  3. Examine how your calculation/script is performing
  4. Detach the screen session (Ctrl-a, Ctrl-d)
  5. Log out

The same procedure works for any Unix program or command: simply replace the R invocation with the command line program you need (for example, python).

As usual in shell-space, invoking man (man screen in this case) will provide all sorts of information you need to know about the tool.

Monday, 16 July 2007

R upgrading on Windows© revisited

From the list:
When I update R the following has worked for me (Windows XP)
1. Install the new version to a new directory (say C:\Program Files\R\R-2.5.1).
2. Rename the new library subdirectory to library2.
3. Copy the entire contents of the old library subdirectory (say
C:\Program Files\R\R-2.4.0\library\) to the new R root to create
C:\Program Files\R\R-2.5.1\library\ .
4. Copy the contents of library2 to library to update your basic library.
5. Now start your new version of R and update packages from the GUI or
from the R console. (You may need to first check Rprofile.site to
ensure that no packages have been loaded.)
6. On occasion I have got warning messages when I tried to load
packages after this procedure. This has been cleared by running
update.packages(checkBuilt = TRUE)
This checks that your packages have been built with the latest
version. When I do this I agree to install all available updates.
7. You may wish to copy various autoloads etc from your old
Rprofile.site to your new Rprofile.site. I understand that there are
some compatibility problems with 2.5.1 and SciViews so be careful.
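
As an alternative to copying library directories by hand, one could record the names of the installed packages under the old version and reinstall the missing ones under the new version. This is only a sketch of that different technique, not the procedure described in the list post; the file name pkg_file and the variable names are hypothetical, and here both halves run in the same session just to show the mechanics.

```r
# Sketch: carry a list of package names from the old R to the new R.
# pkg_file is a hypothetical location; pick any path both versions can read.
pkg_file <- file.path(tempdir(), "old_packages.rds")

# --- run under the OLD version of R ---
old_pkgs <- rownames(installed.packages())
saveRDS(old_pkgs, pkg_file)

# --- run under the NEW version of R ---
wanted <- readRDS(pkg_file)
missing_pkgs <- setdiff(wanted, rownames(installed.packages()))

# Uncomment to actually install whatever is missing:
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
print(missing_pkgs)
```

Because the packages are reinstalled rather than copied, they are built for the new version, so the checkBuilt step should not be needed in this variant.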

Wednesday, 20 June 2007

String manipulation, insert delim

From the list, as usual:

I want to be able to insert delimiters, say commas, into a string
of characters at uneven intervals such that:

foo<-c("haveaniceday")# my string of character
bar<-c(4,1,4,3) # my vector of uneven intervals
my.fun(foo,bar) # some function that places delimiters appropriately
have,a,nice,day # what the function would ideally return


1)

paste(read.fwf(textConnection(foo), bar, as.is = TRUE), collapse = ",")
[1] "have,a,nice,day"


2)

my.function <- function(foo, bar){
# construct a matrix with start/end character positions
start <- head(cumsum(c(1, bar)), -1) # delete last one
sel <- cbind(start=start,end=start + bar -1)
strings <- apply(sel, 1, function(x) substr(foo, x[1], x[2]))
paste(strings, collapse=',')
}

my.function(foo, bar)
[1] "have,a,nice,day"
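
3)

A third possibility, sketched here as an assumption rather than taken from the list: base R's substring() is vectorized over its start and end arguments, so the pieces can be cut without apply(). The names start, end and pieces are only illustrative.

```r
foo <- "haveaniceday"   # the string to split
bar <- c(4, 1, 4, 3)    # widths of the pieces

end   <- cumsum(bar)        # last character of each piece: 4 5 9 12
start <- end - bar + 1      # first character of each piece: 1 5 6 10

# substring() recycles foo over the start/end vectors
pieces <- substring(foo, start, end)
result <- paste(pieces, collapse = ",")
print(result)
```

With the foo and bar above this gives the same "have,a,nice,day" as the two solutions from the list.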

Friday, 8 June 2007

Back to back histogram

library(Hmisc)
age <- rnorm(1000,50,10)
sex <- sample(c('female','male'),1000,TRUE)
out <- histbackback(split(age, sex), probability=TRUE, xlim=c(-.06,.06), main = 'Back to Back Histogram')
#! just adding color
barplot(-out$left, col="red" , horiz=TRUE, space=0, add=TRUE, axes=FALSE)
barplot(out$right, col="blue", horiz=TRUE, space=0, add=TRUE, axes=FALSE)


Monday, 4 June 2007

How do you get the most common row from a matrix?

If I have a matrix like this:

array(1:3,dim=c(4,5))

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    1    2
[2,]    2    3    1    2    3
[3,]    3    1    2    3    1
[4,]    1    2    3    1    2


in which rows 1 and 4 are identical, I want to find that vector c(1,2,3,1,2).

library(cluster)
x <- array(1:3,dim=c(4,5))
dissim <- as.matrix(daisy(as.data.frame(x)))
dissim[!upper.tri(dissim)] <- NA
unique(x[which(dissim == 0, arr.ind=TRUE), ])


or

count <- table(apply(x, 1, paste, collapse=" "))
count[which.max(count)]
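
Note that the table approach returns the pasted string and its count, not the row itself. A minimal sketch of one way to get back the actual vector, assuming it is enough to look up the first row whose pasted key matches (keys, top_key and most_common are illustrative names):

```r
x <- array(1:3, dim = c(4, 5))

# one string key per row, e.g. "1 2 3 1 2"
keys  <- apply(x, 1, paste, collapse = " ")
count <- table(keys)

# key with the highest count (ties resolve to the first in table order)
top_key <- names(count)[which.max(count)]

# first row of x carrying that key
most_common <- x[match(top_key, keys), ]
print(most_common)
```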

Friday, 1 June 2007

R number output format

I'd like to save the number 0.0000012 to a file just as it appears:
?formatC
formatC(.000000012, format='fg')
[1] "0.000000012"
also
?sprintf
sprintf("%.10f", 0.0000000012)
[1] "0.0000000012"
or
format(.0000012, scientific=FALSE)
[1] "0.0000012"
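
Since the question was about saving to a file, here is a small sketch that combines format() with writeLines(). The temporary file out_file is only a stand-in for whatever file you actually write to.

```r
value <- 0.0000012

# hypothetical output location; replace with your own path
out_file <- tempfile(fileext = ".txt")

# format() returns a character string, which is written verbatim
writeLines(format(value, scientific = FALSE), out_file)

# read it back to check what ended up in the file
written <- readLines(out_file)
print(written)
```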

Tuesday, 29 May 2007

Detecting outliers through boxplots of the features

This function detects univariate outliers simultaneously using boxplots
of the features:

require(dprep)

data(diabetes)

outbox(diabetes,nclass=1)
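
If dprep is not available, a rough base R approximation is to apply boxplot.stats() to each column and collect the points beyond the whiskers (the usual 1.5 times the hinge spread). This is an assumption-laden sketch, not a description of how outbox works internally, and the toy data frame df is made up for illustration.

```r
# toy data: column a has one obvious outlier, column b has none
df <- data.frame(a = c(1, 2, 3, 4, 100),
                 b = c(5, 6, 5, 6, 5))

# boxplot.stats()$out holds the values lying beyond the whiskers
outliers <- lapply(df, function(col) boxplot.stats(col)$out)
print(outliers)

# boxplot(df) would draw the matching plot, one box per feature
```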