Showing posts with label LondonR. Show all posts

Test Driven Analysis

I mused over Test Driven Analysis on this blog before, but it was Richard Pugh's talk on SAS to R Migration at LondonR last week that brought the topic back into my mind and clarified a few things.

Rich's presentation focused on the challenge of how to ensure that the new system (R) would provide the same answers as the legacy system (SAS).

This is when it clicked with me: my brain is just another system as well. Suppose you have an idea for an analysis in your head. Taking that idea and transforming it into code is basically the same as migrating code from one system to another. Or isn't it?

Rich showed us how he does it: Start with the old code, write unit tests in the legacy system to confirm your understanding, re-write the unit tests in the new system and then start building the new analysis code in the new system.


Once he had achieved that, he said, he would go backwards and forwards between the different pieces until he had enough confidence that the new system does what it is supposed to do.


Test Driven Analysis is just that as well.

I start with an idea in my head, think about reasonable checks and following that I (should) write down unit tests and only then start writing the analysis code. Finally I go backwards and forwards until I have gained enough evidence and confidence to present my output and be able to defend it.
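A minimal sketch of that workflow in base R follows. The function name loss_ratio() and the test values are hypothetical, chosen only for illustration; the point is the order of the steps, not the particular calculation.

```r
## Step 1: write down the checks first, using small inputs with known answers.
## loss_ratio() is a hypothetical function name used for illustration only.
test_loss_ratio <- function(f) {
  stopifnot(
    isTRUE(all.equal(f(claims = 50, premium = 100), 0.5)),
    isTRUE(all.equal(f(claims = 0, premium = 100), 0)),
    is.na(f(claims = 50, premium = 0))   # no premium: ratio is undefined
  )
  invisible(TRUE)
}

## Step 2: only now write the analysis code.
loss_ratio <- function(claims, premium) {
  ifelse(premium > 0, claims / premium, NA_real_)
}

## Step 3: run the tests; go backwards and forwards until they pass.
test_loss_ratio(loss_ratio)
```

For anything larger than a toy example, a dedicated testing package would probably be more convenient than plain stopifnot() calls.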


Interactive pivot tables with R

I love interactive pivot tables. That is the number one reason why I keep using spreadsheet software. The ability to look at data quickly in lots of different ways, without a single line of code helps me to get an understanding of the data really fast.

Perhaps I can do the same now in R as well. At yesterday's LondonR meeting Enzo Martoglio briefly presented his rpivotTable package. Enzo builds on Nicolas Kruchten's PivotTable.js JavaScript library, which provides drag'n'drop functionality, and wraps it into R with htmlwidgets. The result is an interactive pivot table rendered in either your default browser or the viewer pane of RStudio with one line of code:


## Install packages
library(devtools)
install_github("ramnathv/htmlwidgets") 
install_github("smartinsightsfromdata/rpivotTable")
## Load rpivotTable
library(rpivotTable)
data(mtcars)
## One line to create pivot table
rpivotTable(mtcars, rows="gear", cols="cyl", aggregatorName="Average",
            vals="mpg", rendererName="Treemap")

The following animated GIF from Nicolas' project page gives an idea of the interactive functionality of PivotTable.js.

Example of PivotTable.js. Source: Nicolas Kruchten

Session Info

R version 3.1.3 (2015-03-09)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.2 (Yosemite)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils    
[5] datasets  methods   base     

other attached packages:
[1] rpivotTable_0.1.3.4

loaded via a namespace (and not attached):
[1] digest_0.6.8      htmltools_0.2.6  
[3] htmlwidgets_0.3.2 RJSONIO_1.3-0    
[5] tools_3.1.3       yaml_2.1.13

Using panel.groups in lattice

Last Tuesday I attended the LondonR user group meeting, where Rich and Andy from Mango argued about the better package for multivariate graphics with R: lattice vs. ggplot2.

As part of their talk they had a little competition in visualising London Underground performance data; see their slides. Both made heavy use of the respective panelling / faceting capabilities. Additionally, Rich used the panel.groups argument of xyplot to finely control the content of each panel. Brilliant! I had never used this argument before. So, here is a silly example with the iris data set to remind myself of panel.groups in the future.
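A minimal sketch of such an example, not necessarily the one shown at the talk, assuming the lattice package is installed: panel.superpose splits the data by group, and the function given as panel.groups is then called once per group, here to draw the points of each species together with their own regression line.

```r
library(lattice)
data(iris)

## panel.superpose calls the panel.groups function once for each species,
## passing only that group's x and y values plus its graphical parameters.
p <- xyplot(Sepal.Length ~ Sepal.Width, data = iris,
            groups = Species,
            panel = panel.superpose,
            panel.groups = function(x, y, ...) {
              panel.xyplot(x, y, ...)   # points for this group
              panel.lmline(x, y, ...)   # least-squares line for this group
            },
            auto.key = list(columns = 3))
print(p)
```

Anything that can be drawn with the panel.* functions can go inside the panel.groups function, which is what gives the fine control over each panel.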


There is definitely R in July


The useR! 2013 conference in Albacete, Spain, will commence next Wednesday, 10 July, and on the day before Diego and I will give a googleVis tutorial.

The following Monday, 15 July, the first R in Insurance event will take place at Cass Business School and I am absolutely delighted with the programme and the fact that we are sold out.

On Tuesday, 16 July, the LondonR user group meets in the City, awaiting presentations by Andrie de Vries (Revolution Analytics), Rich Pugh (Mango Solutions) and Hadley Wickham (RStudio).

Finally on Friday, 19 July, the next Cologne R user group meeting is scheduled with two talks: Predicting the Euro/Dollar exchange rates with Twitter (Dietmar Janetzko) and Networks in R using igraph (Afshin Sadeghi).

Test Driven Analysis?

At the last LondonR meeting Francine Bennett from Mastodon C shared some of her experience and findings from an analysis of a large prescriptions data set of the UK's National Health Service (NHS). However, it was her last slide that I found the most thought-provoking. It asked for the definition of the following term:
Test-driven analysis?
Francine explained that test driven development (TDD) is a concept often used in software development for quality assurance, and she wondered if a similar approach could also be used for data analysis. Unfortunately the audience couldn't provide her with the answer, but many said that they face similar challenges. So do I.


Indeed, how do I go about test driven analysis? How do I know that I haven't made a mistake, when I start an analysis of a new data set? Well, I don't. But I try to mitigate risks. Similar to TDD, I consider which outputs I should expect from my analysis. Those outputs form the test scenarios of my analysis. Basically I try to write down everything I know, before I start working with the data, e.g.
  • any other data sets or reports I can use for cross referencing,
  • any back-of-the-envelope analysis I can carry out to provide ballpark answers,
  • any relativities and ratios which should hold true,
  • any known boundaries and thresholds,
  • test scenarios for my code with small well known data, for which I know the outcome,
  • names of experts, who could sense check and peer review my output.
But most importantly: I try to think long and hard about which questions I want to answer, following the advice of John Tukey: "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise."
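Some of the checks listed above can be written down as code before the real analysis starts. The following is only a sketch: the data frame, the reference total and the threshold are all made-up values standing in for a real data set, a published report and a known boundary.

```r
## Illustrative data only; in practice this would be the new data set.
claims <- data.frame(
  region  = c("North", "South", "East", "West"),
  premium = c(120, 200, 80, 100),
  paid    = c(60, 150, 40, 70)
)

## Expectations written down beforehand (all values hypothetical).
expected_total_premium <- 500   # e.g. a figure from another report
max_loss_ratio         <- 1.5   # a known upper threshold

loss_ratio <- claims$paid / claims$premium

## Each check is a named TRUE/FALSE, so failures are easy to spot.
checks <- c(
  total_matches_report = isTRUE(all.equal(sum(claims$premium),
                                          expected_total_premium)),
  no_negative_amounts  = all(claims$premium >= 0 & claims$paid >= 0),
  ratios_within_bounds = all(loss_ratio >= 0 & loss_ratio <= max_loss_ratio)
)
print(checks)
stopifnot(all(checks))
```

Keeping the checks in a named vector makes it easy to add further ones as more is learned about the data.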

Dynamical systems in R with simecol

This evening I will talk about Dynamical systems in R with simecol at the LondonR meeting.

Thanks to the work by Thomas Petzoldt, Karsten Rinke, Karline Soetaert and R. Woodrow Setzer it is really straightforward to model and analyse dynamical systems in R with their deSolve and simecol packages.

I will give a brief overview of the functionality using a predator-prey model as an example.
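As a rough sketch of what such a model looks like in code, the following solves a Lotka-Volterra predator-prey system with the ode() function from deSolve directly, rather than through a simecol model object. It assumes the deSolve package is installed; the parameter names and values are illustrative and not taken from the talk.

```r
library(deSolve)

## Lotka-Volterra predator-prey model; names and values are illustrative.
lotka_volterra <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dPrey     <- growth * prey - predation * prey * predator
    dPredator <- conversion * prey * predator - mortality * predator
    list(c(dPrey, dPredator))   # derivatives, in the same order as 'state'
  })
}

parms <- c(growth = 0.2, predation = 0.2, conversion = 0.2, mortality = 0.2)
state <- c(prey = 0.5, predator = 1)
times <- seq(0, 100, by = 0.1)

## Integrate the system numerically over the given time points.
out <- ode(y = state, times = times, func = lotka_volterra, parms = parms)

## Plot both populations over time.
matplot(out[, "time"], out[, c("prey", "predator")], type = "l",
        xlab = "time", ylab = "population")
legend("topright", legend = c("prey", "predator"), lty = 1:2, col = 1:2)
```

simecol wraps the same ingredients (equations, parameters, initial values, time steps and solver) into a single model object.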


This is of course a repeat of my presentation given at the Köln R user group meeting in March.

For a further example of a dynamical system with simecol see my post about the Hodgkin-Huxley model, which describes the action potential of a giant squid axon.

I shouldn't forget to mention the other talks tonight as well:

For more information about venue and timing see the LondonR web site.

LondonR, 6 December 2011

The London R user group met again last Wednesday at the Shooting Star pub. And it was busy. More than 80 people had turned up. Was it the free beer and food, sponsored by Mango, which attracted the folks, or the speakers? Or the venue? James Long, who organises the Chicago R user group meetings and who gave the first talk that night, noted that to his knowledge only the London and Chicago R users would meet in a pub.


However, it was the speakers and their talks that attracted me:
You will notice that this London R meeting had a theme around risk pricing. James talked about reinsurance pricing using R in the cloud, while Chibisi focused more on personal lines insurance with generalised linear models and Richard came from the angle of investment management and portfolio optimisation.

LondonR, 7 September 2011

On 7 September 2011 I attended the London R user group meeting. It was a very good turnout with about 50 attendees at the Shooting Star, a pub close to Liverpool Street Station. The session started at 18:00 with four presentations, followed by drinks sponsored by Mango Solutions. The slides of the presentations are available on londonr.org.

The first presentation was given by Lisa Wainer from the UCL Department of Security and Crime Science about crime data analysis using R. Lisa presented a project with Merseyside police, for which she had built software in R with the gWidgets package. The software, called the Hot Products Early Warning System, is used to help understand and characterise the acquisitive crime problem in Merseyside on an ongoing basis by detecting emerging trends in hot products.

Chris Wood gave an insightful talk about his research on sediment biogeochemical modelling in the North Sea. His model uses a set of differential equations with over 20 parameters. Chris is able to analyse and fit his model to data he gathered on an expedition in the North Sea using R, the deSolve package and access to the super-computer at the University of Southampton. How cool is this?

Jean-Robert Avettand-Fenoel talked about the Rook package and how R and Rook have helped him to roll out new applications to his colleagues faster than using Excel, VBA and C++ or RExcel. Rook allows you to build web apps with R. The package is maintained by Jeffrey Horner, who also brought us the brew package. The brew package allows us, in combination with rApache, to mix HTML and R code in the same file. This is quite similar to the approach taken by Sweave for LaTeX and R. However, Rook provides a way to run R web applications on your desktop with the new internal R web server named Rhttpd.

The final presentation was actually given by me, on the googleVis package and the recent developments in version 0.2.9: