28 Feb 2012

Apprentice Piece with Lattice Graphs

Lattice graphs can be quite tedious to learn. I don't use them very often, and when I need them I usually have to dig deep into the archives for the details of the parameters.
The example presented here may serve as a handy template for using panel functions, ordering panels, drawing lattice keys, etc.
You can download the example data HERE.

(Also, check this resource with examples by the author of lattice.)
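
A minimal sketch of the three ingredients mentioned above (a custom panel function, explicit panel ordering and a hand-built key). It uses the built-in mtcars data rather than the example data from the post, and all variable choices and styling are illustrative assumptions:

```r
library(lattice)

## illustrative symbols/colours for the two groups (am = 0: automatic, am = 1: manual)
grp_pch <- c(1, 17)
grp_col <- c("black", "firebrick")

## custom panel function: reference grid, grouped points, overall regression line
my_panel <- function(x, y, ...) {
  panel.grid(h = -1, v = -1, col = "grey90")
  panel.xyplot(x, y, ...)                    # draws the groups passed on via '...'
  panel.lmline(x, y, col = "grey40", lty = 2)
}

p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars, groups = am,
            pch = grp_pch, col = grp_col,
            layout = c(3, 1),
            index.cond = list(c(3, 1, 2)),   # panel order: 8, 4, 6 cylinders
            panel = my_panel,
            key = list(space = "top", columns = 2,
                       text = list(c("automatic", "manual")),
                       points = list(pch = grp_pch, col = grp_col)),
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(p)
```

The key is built by hand from the same pch/col vectors that are passed to the plot, so legend and panels cannot drift apart.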

10 Nov 2011

An Image Crossfader Function

A spin-off from a project: the jpgfader function (you can see it in playful use HERE):
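
The jpgfader source itself is not reproduced here. As a rough sketch of the general idea only, a crossfade can be written as a pixel-wise weighted average of two images of equal size. The helper name crossfade() and the synthetic arrays (standing in for what jpeg::readJPEG() returns) are assumptions for illustration:

```r
## hypothetical helper (not the original jpgfader code):
## blend two images of identical dimensions; w = 0 gives img_a, w = 1 gives img_b
crossfade <- function(img_a, img_b, w) {
  stopifnot(identical(dim(img_a), dim(img_b)), w >= 0, w <= 1)
  (1 - w) * img_a + w * img_b
}

## synthetic 20 x 30 RGB arrays with values in [0, 1],
## standing in for arrays returned by jpeg::readJPEG()
img_a <- array(0, dim = c(20, 30, 3)); img_a[, , 1] <- 1   # pure red
img_b <- array(0, dim = c(20, 30, 3)); img_b[, , 3] <- 1   # pure blue

## draw five fading steps side by side
op <- par(mfrow = c(1, 5), mar = rep(0.5, 4))
for (w in seq(0, 1, length.out = 5)) {
  plot(c(0, 1), c(0, 1), type = "n", axes = FALSE, xlab = "", ylab = "")
  rasterImage(crossfade(img_a, img_b, w), 0, 0, 1, 1)
}
par(op)
```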

9 Nov 2011

Add Transparency to JPEG - Yes, We Can!



...Just read in your JPEG and add an alpha channel manually, then assign values for transparency. Of course for printing you need to use a device that accepts alpha.

See how it's done HERE.
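
In case the link is not at hand, here is a minimal sketch of the approach described above. It assumes the image is an RGB array with values in [0, 1] (height x width x 3, as jpeg::readJPEG() returns for colour JPEGs). The helper add_alpha() is a hypothetical name, and a synthetic array is used in place of a real file:

```r
## hypothetical helper: append an alpha plane to an RGB array (values in [0, 1])
add_alpha <- function(img, alpha = 0.5) {
  stopifnot(length(dim(img)) == 3, dim(img)[3] == 3)
  out <- array(NA_real_, dim = c(dim(img)[1:2], 4))
  out[, , 1:3] <- img
  out[, , 4] <- alpha     # a scalar or a height x width matrix of alpha values
  out
}

## synthetic image standing in for: img <- jpeg::readJPEG("your_file.jpg")
img <- array(runif(40 * 60 * 3), dim = c(40, 60, 3))

## opaque at the left edge, fully transparent at the right edge
alpha <- matrix(seq(1, 0, length.out = 60), nrow = 40, ncol = 60, byrow = TRUE)
img_a <- add_alpha(img, alpha)

plot(c(0, 1), c(0, 1), type = "n", xlab = "", ylab = "")
abline(h = seq(0, 1, 0.1), col = "blue")   # background that should show through
rasterImage(img_a, 0, 0, 1, 1)
```

rasterImage() treats a fourth plane as alpha, so the blue lines become visible towards the right-hand side on devices that support transparency (pdf() does, for example).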

R-Function GScholarScraper to Webscrape Google Scholar Search Result

NOTE: You'll find the update HERE and HERE.

NOTE: The script is currently not working because the code of the Google-Scholar site has changed...
I'll look into this as soon as I find some spare time for it!

NOTE: If you try to access Google Scholar programmatically, consider these words of caution:
http://stackoverflow.com/questions/7523961/google-scholar-with-matlab/7587994#7587994
...

Based on my previous post on Web Scraping I coded and uploaded the Function "GScholarScraper" HERE for testing!
The function will pull all (!) results, processing pages in chunks of 100 results/titles, and return a file with all titles, links, etc. It will also produce a word cloud using the words in the publication titles.

Please try your own search strings and report errors, etc.!

Built and runs properly under:
R version 2.13.0 (2011-04-13) and R version 2.13.2 (2011-09-30)

Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] stringr_0.5 tm_0.5-6 wordcloud_1.2 Rcpp_0.9.7

loaded via a namespace (and not attached):
[1] plyr_1.5.1 slam_0.1-23

PS: The errors reported lately (see comments) have been resolved and the source code has been updated.

10 Oct 2011

Plot Animation with Imported Images

...I really dig the animation package! So here's the outcome of my first encounters with saveHTML(): I produced an animation from pre-existing images by utilizing the functions readJPEG() and rasterImage() from the R-packages jpeg and ReadImages. Credit goes out to xingmowang (nzprimarysectortrade-blog), from whom I picked up the concept of putting images into the plot region of a graph produced with the animation functions.

20 Sept 2011

Use of Classification Trees to Investigate Traits of Invasive Species

Which traits make an alien species invasive?
Which traits allow an alien species to become established in a foreign flora?


Questions of this kind can be analysed with recursive partitioning and classification trees.
(The example below also includes some useful data manipulation techniques.)
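
A small sketch of such an analysis with rpart, on simulated data. All trait names and effect sizes below are invented for illustration and are not taken from the original data set:

```r
library(rpart)   # recommended package shipped with R

set.seed(1)
n <- 300
## simulated, purely illustrative traits (all names are hypothetical)
traits <- data.frame(
  flower_months = sample(1:8, n, replace = TRUE),
  veg_reprod    = factor(sample(c("no", "yes"), n, replace = TRUE)),
  height_cm     = round(runif(n, 5, 200))
)
## invasiveness is made to depend on flowering duration and vegetative reproduction
p_inv <- plogis(-3 + 0.5 * traits$flower_months +
                1.2 * (traits$veg_reprod == "yes"))
traits$status <- factor(ifelse(runif(n) < p_inv, "invasive", "non-invasive"))

## data manipulation: derive a categorical trait before modelling
traits$height_class <- cut(traits$height_cm, breaks = c(0, 30, 100, 200),
                           labels = c("low", "medium", "tall"))

fit <- rpart(status ~ flower_months + veg_reprod + height_class,
             data = traits, method = "class")
printcp(fit)                       # complexity table, useful for pruning

## plot only if the tree has at least one split
if (nrow(fit$frame) > 1) {
  plot(fit, margin = 0.1)
  text(fit, use.n = TRUE, cex = 0.8)
}
```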

29 Aug 2011

Comparing Two Distributions

Here I compare two distributions: the flowering duration of indigenous and allochthonous plant species. The hypothesis is that alien plant species exhibit longer flowering periods than indigenous ones.
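
One possible way to approach such a comparison, sketched with simulated durations (the values and the choice of tests are my assumptions here, not necessarily what the original post used):

```r
set.seed(1)
## simulated flowering durations in months (illustrative values only)
indig <- rgamma(200, shape = 9,  rate = 3)    # mean of 3 months
alien <- rgamma(120, shape = 12, rate = 3)    # mean of 4 months

## one-sided rank test: are alien durations shifted towards larger values?
wt <- wilcox.test(alien, indig, alternative = "greater")
## two-sided Kolmogorov-Smirnov test: do the distributions differ at all?
kt <- ks.test(alien, indig)
wt$p.value
kt$p.value

## visual comparison of the empirical distribution functions
plot(ecdf(indig), main = "Flowering duration", xlab = "Months",
     verticals = TRUE, do.points = FALSE)
lines(ecdf(alien), col = "red", verticals = TRUE, do.points = FALSE)
legend("bottomright", legend = c("indigenous", "alien"),
       col = c("black", "red"), lty = 1, bty = "n")
```

With real flowering data recorded in whole months there will be many ties, in which case both tests issue warnings and the p-values are approximate.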

11 Aug 2011

Test Difference between Two Proportions & Plot Confidence Intervals

..an illustrative example for testing proportions and presenting the results.

The data: number of indigenous and alien plant species with and without vegetative reproduction (N = 3399, mid-European species, data courtesy of BiolFlor). Hypothesis: the proportion of species with vegetative reproduction differs between alien and indigenous plant species.

Result: the proportion of species with vegetative reproduction is significantly lower for alien than for indigenous plant species. This is simply due to the large number of agricultural weeds and contaminants among the alien species, which almost always reproduce by seeds.
## data:
dat <- data.frame(list(structure(list(flstat = structure(c(2L, 1L, 2L, 1L),
.Label = c("allo", "auto"), class = "factor"),
reprod = structure(c(1L, 1L, 2L, 2L),
.Label = c("non-veg", "veg"), class = "factor"),
X = c(872L, 423L, 1872L, 232L)),
.Names = c("flstat", "reprod", "X"),
class = "data.frame", row.names = c(NA, -4L))))

## proportion of species with vegetative reproduction
p_allo <- dat$X[4] / (dat$X[2] + dat$X[4])
p_auto <- dat$X[3] / (dat$X[1] + dat$X[3])
p_allo
p_auto

## restructure data for glm:
dat1 <- dat[rep(1:4, dat$X), 1:2]
head(dat1)
dat1$inc <- ifelse(dat1$reprod == "non-veg", 0, 1)

## glm:
summary(gmod <- glm(inc ~ flstat, data = dat1, family = binomial))

## intercept = logit(p_allo), as "allo" is the reference level:
print(est_p_allo <- plogis(coef(gmod)[1]))

## intercept + b = logit(p_auto):
print(est_p_auto <- plogis(coef(gmod)[1] + coef(gmod)[2]))

## alternatively test difference in two proportions with prop.test()
## (x = species with vegetative reproduction, n = group totals):
ptest_diff <- prop.test(x = c(dat$X[3], dat$X[4]),
                        n = c(dat$X[1] + dat$X[3], dat$X[2] + dat$X[4]))
ptest_diff

## only for a single proportion does prop.test() return the
## confidence interval of p itself (with two groups it returns
## the interval of the difference).
## (you could also extract the glm standard errors and calculate
## the conf.int. from those..):
ptest_auto <- prop.test(x = dat$X[3], n = dat$X[1] + dat$X[3])
ptest_allo <- prop.test(x = dat$X[4], n = dat$X[2] + dat$X[4])

## plot with confidence intervals from prop.test
## (see methods in ?prop.test):

## coordinates for plotting confidence interval bars:
y0_al <- ptest_allo$conf.int[1]
y1_al <- ptest_allo$conf.int[2]
y0_au <- ptest_auto$conf.int[1]
y1_au <- ptest_auto$conf.int[2]

library(grid)
library(lattice)

## panel function drawing the bars with the confidence intervals,
## the glm-estimates (the crosses) and reference lines at the
## observed proportions
## (tck at top and right side is suppressed via 'scales' in the
## xyplot call below)
mpanel <- function(...) {
    grid.segments(x0 = c(0.2725, 1 - 0.2725),
                  x1 = c(0.2725, 1 - 0.2725),
                  y0 = c(y0_al, y0_au),
                  y1 = c(y1_al, y1_au))
    panel.points(x = c(0.8, 2.2),
                 y = c(est_p_allo, est_p_auto), pch = 4)
    panel.abline(h = c(p_allo, p_auto), lty = 15,
                 col = "grey70")
    panel.text(x = 1.5, y = 0.9, cex = 1.2,
               "Species With\nVegetative Reproduction")
    panel.xyplot(...)
}

xyplot(c(p_allo, p_auto) ~ as.factor(c("Alien", "Indigenous")), type = "b",
       ylab = "Prop. +/- CIs\nX = GLM-Estimates",
       xlab = "", ylim = c(0, 1),
       panel = mpanel, pch = 16,
       scales = list(alternating = 1, tck = c(1, 0)))
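
The comment in the code above mentions that the confidence intervals could also be derived from the glm standard errors. A minimal sketch of that alternative, refitting the same model from the aggregated counts: the intervals are Wald intervals computed on the logit scale and back-transformed, which is one common choice and my assumption here. They will differ slightly from the prop.test() intervals.

```r
## counts taken from the data above
counts <- data.frame(flstat = factor(c("allo", "auto")),
                     veg    = c(232, 1872),
                     nonveg = c(423, 872))

## same binomial model as above, fitted to aggregated counts
fit <- glm(cbind(veg, nonveg) ~ flstat, data = counts, family = binomial)

## predictions and standard errors on the logit (link) scale
nd <- data.frame(flstat = factor(c("allo", "auto")))
pr <- predict(fit, newdata = nd, type = "link", se.fit = TRUE)

## back-transform estimate and Wald limits to the probability scale
z  <- qnorm(0.975)
ci <- data.frame(flstat = nd$flstat,
                 est = plogis(pr$fit),
                 lwr = plogis(pr$fit - z * pr$se.fit),
                 upr = plogis(pr$fit + z * pr$se.fit))
ci
```

Building the interval on the logit scale keeps the limits inside [0, 1], which a symmetric interval on the probability scale does not guarantee.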

14 Jun 2011

Multiple Comparisons for GLMMs using glmer() & glht()

...here's an example of how to apply multiple comparisons to a generalised linear mixed model (GLMM) using the function glmer from package lme4 & glht() from package multcomp. Also, I present a nice example for visualizing data from a nested sampling design with lattice-plots! 
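
A minimal sketch of the glmer() plus glht() combination on simulated data. The design (3 treatments, 6 sites per treatment, 5 plots per site), the variable names and the Poisson response are assumptions for illustration, not the data of the original post:

```r
library(lme4)
library(multcomp)

set.seed(1)
## simulated nested design: plots within sites, sites within treatments
d <- expand.grid(plot = 1:5, site_in_trt = 1:6, treat = c("A", "B", "C"))
d$treat <- factor(d$treat)
d$site  <- factor(paste(d$treat, d$site_in_trt, sep = "_"))

## site-level random effects and treatment effects on the log scale
site_eff  <- rnorm(nlevels(d$site), sd = 0.3)
treat_eff <- c(A = 0, B = 0.4, C = 0.8)
mu <- exp(1 + treat_eff[as.character(d$treat)] + site_eff[as.integer(d$site)])
d$count <- rpois(nrow(d), mu)

## GLMM with a random intercept for site
mod <- glmer(count ~ treat + (1 | site), data = d, family = poisson)

## all pairwise (Tukey) comparisons between treatment levels
mc <- glht(mod, linfct = mcp(treat = "Tukey"))
summary(mc)
```

summary() on the glht object reports p-values adjusted for multiple comparisons (single-step method by default).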

1 Jun 2011

Drawing Grids in R

Here's an example of how to draw a grid in R and how to fill it.
I used the grid package and its functions to display species cover values in the squares of a recording frame...
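
A minimal sketch of the idea, with simulated cover values for a hypothetical 5 x 5 recording frame (frame size, grey shading and labels are my assumptions):

```r
library(grid)

set.seed(1)
n <- 5                                               # frame of n x n squares
cover <- matrix(round(runif(n * n, 0, 100)), n, n)   # simulated cover in %

## one row per square, with centre coordinates in npc units
cells <- expand.grid(row = 1:n, col = 1:n)
cells$cover <- cover[cbind(cells$row, cells$col)]
cells$x <- (cells$col - 0.5) / n
cells$y <- 1 - (cells$row - 0.5) / n                 # row 1 at the top

grid.newpage()
## square viewport, so the frame keeps its shape on any device
pushViewport(viewport(width = unit(0.8, "snpc"), height = unit(0.8, "snpc")))
## squares filled according to cover (darker = higher cover)
grid.rect(x = cells$x, y = cells$y, width = 1 / n, height = 1 / n,
          gp = gpar(fill = grey(1 - cells$cover / 100), col = "black"))
## cover value printed in each square
grid.text(cells$cover, x = cells$x, y = cells$y,
          gp = gpar(col = ifelse(cells$cover > 50, "white", "black")))
popViewport()
```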

2 May 2011

Import dbf to R, Manipulate Strings with grep & sub Function

Here's a set of historical species presence records of a certain geographical region (data-link). I wanted to manipulate / simplify strings (species names) and get an overview of the data. 
...The tasks were to split genera and epithets, to exclude species whose names contain specific strings, and to get rid of unwanted text (author names). For a graphical presentation of the species record history I drew a plot with segments indicating the first and last year of a species record:
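
A small sketch of these string manipulations on made-up records (the names, years and exclusion patterns are invented for illustration; the dbf import is left out):

```r
## made-up records, for illustration only
rec <- data.frame(
  name = c("Abies alba Mill.", "Carex flava L.", "Carex sp.",
           "Salix x rubens Schrank", "Poa annua L.", "Carex flava L."),
  year = c(1890, 1902, 1911, 1925, 1950, 1987),
  stringsAsFactors = FALSE
)

## exclude undetermined taxa ("sp.") and hybrids (" x ")
rec <- rec[!grepl(" sp\\.| x ", rec$name), ]

## strip author names: keep only the first two words
rec$species <- sub("^([^ ]+ [^ ]+).*$", "\\1", rec$name)
## split genus and epithet
rec$genus   <- sub(" .*$", "", rec$species)
rec$epithet <- sub("^[^ ]+ ", "", rec$species)

## first and last record per species
first <- tapply(rec$year, rec$species, min)
last  <- tapply(rec$year, rec$species, max)

## one segment per species, from first to last record
i <- seq_along(first)
op <- par(mar = c(4, 8, 1, 1))
plot(range(rec$year), c(0.5, length(first) + 0.5), type = "n",
     yaxt = "n", xlab = "Year", ylab = "")
segments(first, i, last, i, lwd = 3)
points(c(first, last), c(i, i), pch = 16)
axis(2, at = i, labels = names(first), las = 1, cex.axis = 0.8)
par(op)
```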

28 Apr 2011

20 Apr 2011

Bootstrap Confidence Intervals, Stratified Bootstrap

Here's a worked example for comparing group averages with bootstrap confidence intervals, allowing for different subsample sizes by using the strata argument of the bootstrap function.
The data (simulated) is set up analogous to a before-after impact experiment conducted on plots across 4 levels of a grouping factor ('stage'). Similarities were calculated for each composition before and after an impact and are averaged over the grouping factor. Our hypothesis was that the levels of the grouping factor would show significantly different average similarities, that is, a higher or lower impact on composition. As plots were aggregated in different sites within the 'stages', this dependency had to be allowed for by use of the "strata" argument in the boot call.

The conclusion from this simulated example would be that the average similarities at stages C and D are significantly different from those at stages A and B. That is, as the similarities are higher in C and D than in A and B, the impact on composition is significantly lower in C and D.
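
A minimal sketch of a stratified bootstrap for one stage, assuming sites of unequal size act as strata. The data and names are simulated and hypothetical, and percentile intervals are just one of the types boot.ci() offers:

```r
library(boot)   # recommended package shipped with R

set.seed(1)
## simulated similarities for one stage; plots clustered in sites of unequal size
d <- data.frame(site = factor(rep(c("s1", "s2", "s3"), times = c(5, 8, 12))))
d$sim <- rbeta(nrow(d), 6, 4)

## statistic: mean similarity of the resampled plots
mean_sim <- function(data, idx) mean(data$sim[idx])

## resampling is done within each site, so every site keeps its sample size
b  <- boot(d, statistic = mean_sim, R = 999, strata = d$site)
ci <- boot.ci(b, conf = 0.95, type = "perc")
ci$percent[4:5]     # lower and upper limit of the percentile interval
```

Repeating this for each stage and checking whether the intervals overlap gives the kind of comparison described above.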

Custom Labels for Ordination Diagram

Here is how to add custom labels, hulls and spiders to a vegan ordination diagram:
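
A minimal sketch using the dune data that ships with vegan. The choice of a PCA via rda(), the grouping by Management and the label format are assumptions for illustration, not the original example:

```r
library(vegan)

data(dune)
data(dune.env)

ord <- rda(dune)                               # unconstrained ordination (PCA)

## empty plot first, then add the elements one by one
plot(ord, display = "sites", type = "n")
ordihull(ord, groups = dune.env$Management, draw = "polygon", col = "grey90")
ordispider(ord, groups = dune.env$Management, col = "grey40")

## custom labels: management type plus running site number
sc <- scores(ord, display = "sites", choices = 1:2)
lab <- paste0(dune.env$Management, seq_len(nrow(sc)))
text(sc, labels = lab, cex = 0.7)
```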

19 Apr 2011

Lattice Plots - Usage of Panel Functions - Different Axes For Panel-Rows - Alternating Axis Titles

I present code for a stacked graph with common axes only for panels of the same row and with axis titles on alternating sides. This admittedly took me days (because I had little idea how to use lattice), but eventually I got there, and maybe someone can use this for their own purposes: