Showing posts with label regexp.

Wednesday, 3 December 2008

Retrieving the author of a script

I know that the recommended way to record the authorship of R code is to build a package with a DESCRIPTION file.
Nevertheless, I wrote a very basic function that retrieves the names of the authors of a script (or any text file), provided the names appear within the first three lines of the file (a limit that is easy to change) in this format:

##
## Author:Pinco Palla, Paolino Paperino, Topo Gigio
##

The function:

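catch.the.name <- function(filename = "myscript.R", nlines = 3) {
  # Read only the first 'nlines' lines, one string per line
  header <- readLines(filename, n = nlines, warn = FALSE)
  # Keep the line(s) carrying the "Author:" tag
  author <- grep("Author:", header, value = TRUE, fixed = TRUE)
  if (length(author) == 0) return(character(0))  # no tag found
  # Drop everything up to and including the tag
  author <- sub("^.*Author:", "", author[1])
  # Split on commas and strip surrounding whitespace (trim() is from gdata)
  gdata::trim(strsplit(author, ",")[[1]])
}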
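
The individual steps can also be tried on an in-memory header, without reading a file. The following is only a minimal sketch: the `header` vector is made up to match the format shown above, and `trimws()` from base R (available from R 3.2.0) stands in for `gdata::trim()`, which is an assumption on my part rather than what the function itself uses.

```r
# Hypothetical header lines, in the format described above
header <- c("##",
            "## Author:Pinco Palla, Paolino Paperino, Topo Gigio",
            "##")

# 1. Keep only the line that carries the "Author:" tag
author_line <- grep("Author:", header, value = TRUE, fixed = TRUE)

# 2. Drop everything up to and including the tag
names_field <- sub("^.*Author:", "", author_line)

# 3. Split on commas and strip the surrounding whitespace
authors <- trimws(strsplit(names_field, ",")[[1]])

print(authors)  # a character vector with the three names
```

Splitting first and trimming afterwards means that spaces after the commas do not end up in the names.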

Thursday, 10 July 2008

Parsing problem solved thanks to R-Help mailing list

Recently I needed to parse several HUGE text files (~6M lines, ~600 MB file size) that were not formatted in a standard way (so not easy to import via scan, read.table, etc.).
Because of the size of these files, I had to avoid loops and find a way to vectorize my problem.
After several hours spent trying to solve this problem without success, I decided to send a help request to the R-help list. In no time I got the answer to this very problematic (at least for me) exercise :-)

You can read the full story here.

I REALLY love the R-Help mailing list! Thanks Guys!