
Monday, 5 December 2011

The Art of R Programming - my two cents

What makes this book different from other books about R is stated clearly by the author Norman Matloff in the introduction:
"This book is not a compendium of the myriad types of statistical methods that are available in the wonderful R package. It really is about programming and covers programming-related topics missing from most other books on R".
Most books about R present a gentle introduction to the language and then jump to practical applications. Over the 350 pages of this book, Norman Matloff instead guides the reader in developing the skills needed to write software properly, focusing on the characteristics and idiosyncrasies of the R language.

In the first six chapters of the book the author covers the basic R data types: vector, matrix, list, data.frame and factor. Starting from basic examples and progressing to more complex ones, each data type is properly introduced and used in the proper context. Furthermore, some extended examples are improved or re-implemented as new types are introduced, in order to show the expressiveness of the language. The explanation of small details, such as the drop=FALSE argument in matrix/data.frame subsetting or the stringsAsFactors=FALSE argument when building a data.frame, is the proverbial icing on the cake that can make your day-to-day workflow more productive.
Chapters 7, 8 and 9 are the heart of The Art of R Programming, introducing the structures, idioms, peculiarities and idiosyncrasies of R as a programming language.
Chapter 7 presents how the typical programming structures are implemented in R and how to use them correctly: control statements, functions, recursion etc. are explained with clear and appropriate examples of increasing complexity and usefulness.
Chapter 8, about doing math and simulation in R, is a more 'traditional' chapter describing the mathematical/statistical facilities built into R. Since the main selling point of R is its statistical capabilities, an introduction to their characteristics and use makes perfect sense.
Chapter 9 covers S3 and S4, the two most commonly used object-oriented programming (OOP) paradigms implemented in R. If you are going to start designing and developing R software in a proper and reusable form, this chapter will provide all the necessary information and a good collection of examples tailored to R's mathematical/statistical peculiarities.
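To give an idea of why these two small arguments matter, here is a minimal sketch of my own (not taken from the book); the object names m, df and so on are just illustrative. Note that from R 4.0.0 onward stringsAsFactors already defaults to FALSE, so the explicit argument mainly matters on older versions like the one current when the book came out.

```r
# A 3 x 2 integer matrix, filled column by column
m <- matrix(1:6, nrow = 3)

# Selecting a single row silently simplifies the result to a plain vector
row_default <- m[1, ]
is.matrix(row_default)            # FALSE

# drop = FALSE keeps the matrix structure (one row, two columns)
row_kept <- m[1, , drop = FALSE]
dim(row_kept)                     # 1 2

# stringsAsFactors = FALSE keeps text columns as character vectors
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
is.character(df$name)             # TRUE

# The same drop argument applies when picking one data.frame column
col_default <- df[, "name"]                 # character vector
col_kept    <- df[, "name", drop = FALSE]   # still a data.frame
is.data.frame(col_kept)           # TRUE
```

Keeping the dimensions with drop=FALSE is what saves you when code written for matrices suddenly receives a one-row result.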
Chapter 10 is about I/O and provides the directions needed to read and parse data in R, both from local files and from the internet.
Chapter 11 is about string manipulation and is less technical than the previous chapters, presenting a sort of cheat-sheet collection of the most common functions for handling strings in R. The author covers the string capabilities built into base R but advises taking a look at Hadley Wickham's stringr package for more consistent handling of strings in R.
Chapter 12 introduces graphics in R, providing a gentle overview of R's huge graphics capabilities without presenting an in-depth discussion. Fortunately there are a lot of other books dedicated to this subject (for example Paul Murrell's R Graphics), which is indeed one of R's strong points.
Chapter 13, about debugging, is short but points out almost everything that is important to know about debugging R code; furthermore, it provides a broad view of debugging in general: the author Norman Matloff is also the co-author of The Art of Debugging with GDB and DDD and clearly knows what he is talking about.
Chapter 14 covers strategies to handle the time/space trade-off in order to enhance the performance of R programs. In particular it explains the proper use of vectorization in order to speed up your code.
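As a small illustration of what vectorization means in practice, here is a sketch of my own (it is not an example from the book, and the variable names are hypothetical). Both versions compute the same squares, but the second lets R do the looping internally:

```r
x <- c(1.5, 2.0, 3.5, 4.0)

# Explicit loop: the result vector is preallocated, then filled element by element
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized version: the operator is applied to the whole vector at once
squares_vec <- x^2

# Both approaches give the same values
all.equal(squares_loop, squares_vec)
```

On a vector this short the difference is invisible; the gain shows up on large inputs, where the interpreted loop overhead dominates.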
Chapters 15 and 16 are a sort of follow-up to Chapter 14: they explain how to enhance the performance of your code by integrating R with other languages, such as Python and C/C++ (Chapter 15), and by parallelizing your code (Chapter 16). Both chapters offer only an introductory glance at these topics, but the coverage is sufficient to be useful.
Conclusions:
Is this book worth buying? The short answer is YES. If you are serious about learning R, both to analyze your data in the most appropriate and effective way (e.g. using the right data type for your specific task) and to develop software, The Art of R Programming will be beneficial to you.
Caveats: given the peculiar approach and aim of this book, my advice is to buy it together with a more statistically oriented one, for example Rob Kabacoff's R in Action, and one or two about graphics in R (e.g. Hrishi Mittal's R Graph Cookbook or Hadley Wickham's ggplot2 book).

Disclaimer: No Starch Press provided me with a free copy for review.

Monday, 16 November 2009

R in Action - early thoughts

I was invited to review the book R in Action, written by Rob Kabacoff. Since I consider the Quick-R website, created by the same smart guy, one of the most valuable resources about R, it is both an honor and a pleasure to have the opportunity to take an early look at his book and to share some thoughts about it.

First, this book is distributed under an early access policy, which means, as stated on the publisher's web site: "This Early Access version of the book enables you to receive new chapters as they are being written. You can also interact with the authors to ask questions, provide feedback and errata, and help shape the final manuscript on the Author Online." This is a nice publishing approach: the publisher has set up an ad-hoc forum that allows real-time feedback from early adopters. This beta-test approach is convenient both for the author, who can fix errata and improve the content before the final version is published, and for the early adopters, who can access useful content in advance and receive valuable explanations directly from the author.

Since only the initial part of the book is available, this short review will be incomplete at best and present only preliminary thoughts. I'm going to update the review as soon as I have the chance to read the rest of the book.

R in Action, as its structure reflects, aims to guide new adopters from the very basics of the language through to its most advanced features, using a progressive, task-driven approach carefully curated by the author.

In the initial part of the book, Kabacoff covers all the basic features of the language, from data manipulation to the basic statistics required to make sense of the data, plus the most common and useful graphical methods for visualizing them.

The author makes extensive use of working examples. This is one of the most effective teaching techniques, in my opinion, because it encourages readers to immediately apply the knowledge they have acquired.

Another nice ingredient of Kabacoff's method is to introduce effective, high-quality packages from the huge R collection to solve a proposed task. For example, in chapter three the author introduces the rename function from the awesome reshape package to rename the columns of a data.frame. This is a very trivial task that can easily be managed in standard R (as the author shows shortly afterward), but the smooth introduction of this useful package, explained and used more extensively in the forthcoming chapters, is a nice touch that both handles the task in a more elegant way and introduces the user to a powerful tool.
In this fashion, the tasks presented in the text are addressed using several different packages in order to illustrate the various alternative methods available in R.
Furthermore, the numerous notes accompanying the explanations serve both to make the described concepts easier to understand and to provide useful insights about R features and idiosyncrasies.
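For readers who have not seen the column-renaming task mentioned above, here is a minimal sketch of my own (not the book's listing). The data frame scores and its column names are made up for illustration; the reshape call only runs if that package happens to be installed.

```r
# A small illustrative data frame (hypothetical data)
scores <- data.frame(q1 = c(3, 5, 4), q2 = c(2, 4, 5))

# Standard R: replace the matching entry of names()
renamed_base <- scores
names(renamed_base)[names(renamed_base) == "q1"] <- "item1"
names(renamed_base)               # "item1" "q2"

# reshape package: rename() takes a named vector of the form c(old = "new")
if (requireNamespace("reshape", quietly = TRUE)) {
  renamed_pkg <- reshape::rename(scores, c(q1 = "item1"))
}
```

The base version works everywhere, while the package version reads more like a description of the intent, which is the elegance the paragraph above refers to.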

To sum up, the chapters I had the opportunity to examine are a solid base for people getting started with R. I'm eager to dig into the forthcoming chapters of the book, which deal with advanced statistics and graphics!

I warmly recommend this book even at this early stage: if you are new to R programming, this is a sound way to become familiar with the language and make effective use of it from day one.