Review: Kölner R Meeting 18 October 2013
The Cologne R user group met last Friday for two talks, on split-apply-combine in R and on XLConnect, by Bernd Weiß and Günter Faes respectively, before the usual Schnitzel and Kölsch at the Lux.
Split apply combine in R
The apply family of functions in R is incredibly powerful, yet often somewhat mysterious to newcomers. So Bernd gave an overview of the different apply functions and their cousins. The various functions differ in their input objects (e.g. vectors, arrays, data frames or lists) and in their outputs. Other related functions are by, aggregate and ave. While functions like aggregate reduce the output size, others like ave return as many rows as the input object and repeat the results where necessary. As an alternative to the base R functions, Bernd also touched on the **ply functions of the plyr package. Their names are certainly easier to remember, but their syntax, e.g. .(), can be a little awkward. Bernd's slides, in German, are already available from our Meetup site.
XLConnect
When dealing with data stored in spreadsheets, most members of the group rely on read.csv and write.csv in R. However, if you have a spreadsheet with multiple tabs and formatted numbers, read.csv becomes clumsy, as you would have to save each tab without any formatting in a separate file. Günter presented the XLConnect package as an alternative to read.csv, or indeed RODBC, for reading spreadsheet data. It uses the Apache POI API as the underlying interface. XLConnect requires a Java runtime environment on your computer, but no installation of Excel. That makes it a truly platform-independent solution for exchanging data between spreadsheets and R. Not only can you read defined rows and columns, or indeed named ranges, from Excel into R, you can also write data back to Excel files in the same way and, to top it all, even graphical output from R.
Next Kölner R meeting
The next meeting is scheduled for 13 December 2013. A discussion of the data.table package is already on the agenda. Please get in touch if you would like to present and share your experience, or indeed if you have a request for a topic you would like to hear more about. For more details see also our Meetup page.
Thanks again to Bernd Weiß for hosting the event and Revolution Analytics for their sponsorship.
22 Oct 2013, 07:45
Labels: aggregate, apply, ave, Koelner R User, Kölner R Users, R, XLconnect
ave and the "[" function in R
The ave function in R is one of those little helper functions I feel I should be using more. Investigating its source code showed me another twist about R and the "[" function. But first, let's look at ave. The top of ave's help page reads:
Group Averages Over Level Combinations of Factors
Subsets of x[] are averaged, where each subset consist of those observations with the same factor levels.
As an example, I look at revenue data by product and shop.
revenue <- c(30,20, 23, 17)
product <- factor(c("bread", "cake", "bread", "cake"))
shop <- gl(2,2, labels=c("shop_1", "shop_2"))
To answer the question "Which shop sells proportionally more bread?" I need to divide the revenue vector by the sum of revenue per shop, which can be calculated easily by ave:
(shop_revenue <- ave(revenue, shop, FUN=sum))
# [1] 50 50 40 40
(revenue_split_in_shop <- revenue/shop_revenue)
# [1] 0.600 0.400 0.575 0.425 # Shop 1 sells proportionally more bread (60% vs 57.5%)
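Here is a short sketch on the same data, using only base R, that contrasts ave with its default FUN = mean and with functions that reduce the output to one value per group. The name shop_sums is just an illustrative variable, not something from the original example.

```r
# Revenue data from the example above
revenue <- c(30, 20, 23, 17)
product <- factor(c("bread", "cake", "bread", "cake"))
shop    <- gl(2, 2, labels = c("shop_1", "shop_2"))

# Default FUN = mean: average revenue per product, repeated for each observation
ave(revenue, product)
# [1] 26.5 18.5 26.5 18.5

# aggregate() reduces the output to one row per shop (shop_1: 50, shop_2: 40)
aggregate(revenue, by = list(shop = shop), FUN = sum)

# tapply() returns one named value per shop ...
shop_sums <- tapply(revenue, shop, sum)

# ... and indexing it by the factor (i.e. by its integer codes)
# expands it back to the full-length result that ave() gives
as.vector(shop_sums[shop])
# [1] 50 50 40 40
```

Indexing by a factor uses its integer codes rather than its labels, which is why the per-shop sums line up with the original observations here.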
In other words, ave has to split the revenue vector by shop and apply the sum function to each piece. Well, that's exactly what it does. Here is the source code of ave:
# Copyright (C) 1995-2012 The R Core Team
ave <- function (x, ..., FUN = mean)
{
    if(missing(...))
        x[] <- FUN(x)
    else {
        g <- interaction(...)
        split(x, g) <- lapply(split(x, g), FUN)
    }
    x
}
However, and this is what intrigued me, if I don't provide a grouping variable (missing(...)), ave applies FUN to x itself and writes the output to x[]. That's actually what the description in ave's help file says: subsets of x[] are averaged. So what does it do? Here is an example again:
ave(revenue, FUN=sum)
# [1] 90 90 90 90
I get the sum of revenue repeated as many times as the vector has elements, not just once, as with sum(revenue). The trick is that the output of FUN(x) is written into x[]. Assigning to x[] calls the replacement function "[<-" with an empty index, so the single value returned by sum is recycled over every element of x, and x keeps its length and attributes. I think it is the following sentence in the help file of "[" (see ?"[") that explains it:
Subsetting (except by an empty index) will drop all attributes except names, dim and dimnames.
So there we are. I feel less inclined to use ave more, as it is just shorthand for the usual split, lapply routine, but I learned something new about the subtleties of R.
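To make the two mechanisms concrete, here is a minimal sketch on the same revenue data. my_ave is a hypothetical hand-rolled function, not part of base R; it only mimics the grouped branch of ave with a single grouping factor. The second part contrasts assigning to x[] with assigning to x.

```r
revenue <- c(30, 20, 23, 17)
shop    <- gl(2, 2, labels = c("shop_1", "shop_2"))

# Hypothetical hand-rolled version of ave's grouped branch:
# split x by group, apply FUN to each piece, write the results back in place
my_ave <- function(x, g, FUN = mean) {
  split(x, g) <- lapply(split(x, g), FUN)  # uses the replacement function `split<-`
  x
}
my_ave(revenue, shop, FUN = sum)
# [1] 50 50 40 40

# Assignment with an empty index, as in ave's ungrouped branch:
# the single value returned by sum() is recycled over all elements,
# and x keeps its length and attributes (here its names)
x <- c(a = 30, b = 20, c = 23, d = 17)
x[] <- sum(x)
x  # four elements, all 90, still named a to d

# Plain assignment replaces x with the length-1 result instead
y <- c(a = 30, b = 20, c = 23, d = 17)
y <- sum(y)
y
# [1] 90
```

Passing a single factor g keeps the sketch short; ave itself accepts several grouping variables via ... and combines them with interaction().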