R for Dummies – De Vries and Meys (2012)

The for Dummies series has been around since 1991. (A bit of trivia, DOS for Dummies was the first title.) I’ve owned a few books in the series and have been adequately impressed with most of them, but when I learned there was an R for Dummies I was immediately skeptical. Possibly I was skeptical because R has a steep learning curve and many idiosyncrasies, so the idea of an R for Dummies text seemed oxymoronic – it’s difficult to imagine a (successfully) dumbed-down version of an introductory R text. But if you’re familiar with the for Dummies series, you already know that the moniker is just for marketing. In reality, these books usually do a good job of distilling a topic down to the important components a new user needs to know. This edition is no exception.

Title: R for Dummies
Author(s): Andrie de Vries and Joris Meys
Publisher/Date: Wiley and Sons/2012
Statistics level: Not Applicable
Programming level: Beginner
Overall recommendation: Highly Recommended

The core topic areas that R for Dummies covers should come as no surprise: a basic overview of R and its capabilities, importing data into R, writing and debugging functions, summarizing data, and graphing. In addition, there are sections covering tasks that beginners often find frustrating, such as working with dates and multidimensional arrays.

I am a big fan of periodically reviewing the fundamentals. I pick up introductory texts every once in a while to make sure I haven’t forgotten anything important, and R for Dummies is a nice addition to my library for that reason.

One thing that sets this book apart from other currently available introductory R texts is that it covers a couple of recent and important developments in R coding – namely the RStudio development environment and ggplot2 graphics.

If you’re not familiar with the for Dummies series, it is important to note that they are written in a specific informal style, which is in stark contrast to most R texts. (For reference, the style is more similar to The R Inferno than the ggplot2 manual.) You can get a sense of this style by browsing a few pages on Amazon to see if you find it helpful or distracting. On the other hand, R for Dummies has a more polished feel than many R texts I’ve read. I didn’t encounter any of the frustrating and distracting editing errors that are common in some R texts.

R for Dummies is primarily focused on R as a programming language, so for the most part, statistical analyses are presented only as a means of illustrating programming techniques. Given its focus on programming and fundamentals, this book is highly recommended for someone with little to no experience in R who wants to learn R programming. Intermediate to advanced R programmers like myself who want a current review of the fundamentals might also find it useful.

I might also recommend R for Dummies to experienced users of other programming languages who are new to R. The discussion of basic programming concepts, such as control flow, is minimal and focused primarily on details specific to R. It is not recommended for those looking to learn statistics in conjunction with R.

The current price of $20 USD puts it in the middle price range for texts of its kind. It is available as a paperback or Kindle text.

The Art of R Programming – Matloff (2011)

It’s difficult to write a book on an entire programming language and keep it manageable and concise, but The Art of R Programming does it as well as any text I’ve seen. Matloff covers, in detail and among other things, R data structures, programming idioms, performance enhancements, interfaces with other languages, debugging and graphing.

Title: The Art of R Programming
Author(s): Norman Matloff
Publisher/Date: No Starch Press/2011
Statistics level: Very Low
Programming level: Intermediate
Overall recommendation: Highly Recommended

There is the requisite “Introduction to R” section that is present in almost all R texts, but any beginners who benefit from this chapter may benefit from re-reading The Art of R Programming after some additional practical experience with R. The issues that Matloff addresses and the solutions he provides are more salient after you’ve spent hours trying to resolve them.

The section on graphing is a good overview, but the average programmer may find it less useful than the other sections. Anyone looking for graphic optimization tips will be better served by a book focused specifically on graphing.

With that minor critique in mind, put simply, The Art of R Programming is a must-read for all intermediate-level R programmers. It covers nearly every method of performance enhancement available and provides a review of key fundamentals that may have been forgotten or missed.

One point of note: this text focuses almost solely on programming – the statistical examples are a means to an end, not an end in themselves. For that reason, this book is recommended for those seeking to improve the efficiency of their programming rather than their statistical acumen.

At around $25 USD from Amazon, The Art of R Programming is one of the best R text values available. I highly recommend it for almost all R users. (You can also purchase this book directly from the publisher and get both the print and e-book versions for about $40.)

Bayesian Computation with R – Albert (2009)

Title: Bayesian Computation with R
Author(s): Jim Albert
Publisher/Date: Springer/2009
Statistics level: High
Programming level: Low
Overall recommendation: Recommended

Bayesian Computation with R focuses primarily on providing the reader with a basic understanding of Bayesian thinking and the relevant analytic tools included in R. It does not explore either of those areas in detail, though it does hit the key points for both.

As with many R books, the first chapter is devoted to an introduction to data manipulation and basic analyses in R. This introductory chapter focuses more heavily on analyses than many of the comparable chapters in other texts. The new R user who hasn’t yet built up a library of these chapters will find it useful, but for experienced R users or those with multiple R texts, there is little new information.

Albert’s introduction to the foundational Bayesian concepts (e.g., Bayes’ theorem) is concise and will be clear to those with a statistical background, but others may need to refresh their statistical knowledge before they can fully grasp the content in the second chapter. Those from programming backgrounds without extensive statistical knowledge may be better off beginning with a text that deals specifically with Bayesian analysis.

Many of the topics discussed in this text have limited application, but possibly the most broadly applicable chapter deals with Bayesian regression. Those interested in learning how to run and diagnose Bayesian regression in R will find almost everything they need to know here.

As with many R texts, Bayesian Computation with R has an accompanying package of functions available on CRAN (“LearnBayes”). The functions in this package are focused mainly on teaching Bayesian analysis, but also include some useful basic implementations.

This book straddles the line between introductory theory and intermediate-level statistical programming. Because of the omissions of information on each side of that line, the reader will get the most mileage from the text if he or she has access to resources (i.e., other texts, colleagues, or previous knowledge) that can fill in those omissions. For that reason, it would work well as a text for an upper-level course on Bayesian statistics and their application, but it is not well suited as a reference text, or as a guide for real-world analysis.

Overall, I recommend this book, with the caveat that interested readers should review the sample pages available on the Springer website and the functions in the “LearnBayes” package prior to purchasing. The text is currently available for approximately $50 in paperback and $40 for the Kindle version.

Data Manipulation with R – Spector (2008)

Title: Data Manipulation with R
Author(s): Phil Spector
Publisher/Date: Springer/2008
Statistics level: N/A
Programming level: Intermediate
Overall recommendation: Highly Recommended

If there is one book that every beginning R user coming from a programming background should have, it is Spector’s Data Manipulation with R. New R users with analytic backgrounds and experience with software packages such as SAS and SPSS will do well to start with Muenchen’s R for SAS and SPSS Users, especially given that a free abbreviated version is available, but those users should also make Data Manipulation with R a quick second addition to their library.

The text of this book is as concise and to the point as its title. It covers almost every relevant data manipulation topic in R, from modes and classes, through accessing data via database connections, to complex reshaping and aggregating functions. It has copious examples and the text hits just the right level of sophistication for the individual who has some experience with programming, but little experience with R idioms and data manipulation techniques.

My only critique of this book is that it skips over the basics of creating user-defined functions for data manipulation tasks. Spector addresses mapping functions to various data structures, but it seems likely that, at this level, the average R analyst would be better served by a discussion of how to simply create a function in R. Keep in mind that if you are looking for that type of information, you will need to look elsewhere. The same is true if you are looking for any sort of statistical instruction, as Data Manipulation with R focuses almost exclusively on programming.

Overall, I highly recommend this book. At around $45 USD, it is well worth the price. You’ll breeze through it on your first pass, but if you’re new to R you will get your money’s worth out of it as a reference text.

A Handbook of Statistical Analyses Using R – Everitt and Hothorn (2006)

Title: A Handbook of Statistical Analyses Using R
Author(s): Brian S. Everitt; Torsten Hothorn
Publisher/Date: Chapman & Hall/2006
Statistics level: Intermediate to advanced
Programming level: Intermediate
Overall recommendation: Highly Recommended

A Handbook of Statistical Analyses Using R addresses several common statistical analyses in great detail. Over the course of 15 chapters, the handbook takes the reader from an introduction to R through a discussion of statistical inference, to linear and logistic regression, tree analysis, survival analysis, longitudinal analysis, meta-analysis, factoring, scaling, and clustering. The handbook has a peer-reviewed journal style that will be familiar to academic researchers, and each chapter stands on its own. This approach makes the text exceptionally useful in the academic setting, as a professor can distribute and assign the first chapter of the book to her Research Methods 101 course; the final chapters on scaling and dimensionality to her Psychometrics Methods course; the last chapter on clustering to her Marketing Research course; and require the entire book for her graduate methods course. For custom research shops that are making the transition to R or that frequently hire entry-level R users, this book will work well as a reference and training manual.

The handbook does show typical first-edition flaws. There are sporadic editing mistakes such as misspellings and incorrect words. The overall organization of the book is strong, but the chapter-level organization is less effective. Each chapter begins with a discussion of all of the datasets used in that chapter and is followed by examples and applications based on those datasets. In chapters where there are several examples, the discussion of the data is too detached from its corresponding example. By the time readers reach the example based on the first dataset, they have likely forgotten the relevant details about that data’s structure. Grouping the data discussions with the examples they accompany would have made the example-based approach more effective.

The introductory section on R is one of the best introductory sections I have read. It strikes an almost perfect balance between the programming and statistical features of R. I frequently recommend this initial chapter to colleagues who have research experience but are new to R. There are numerous graphs included in the examples in the text, and although there is virtually no general discussion of producing graphs in R, each graph presented in this text includes the code required to reproduce it. This omission is a welcome one, as it allows the authors to focus more on statistical details. Readers looking for a more general discussion of how to produce graphs in R should consider Data Analysis and Graphics Using R.

Data Analysis and Graphics Using R – Maindonald and Braun (2003)

Title: Data Analysis and Graphics Using R: An Example-Based Approach
Author(s): John Maindonald; John Braun
Publisher/Date: Cambridge University Press/2003
Statistics level: Intermediate to advanced
Programming level: Beginner to intermediate
Overall recommendation: Highly Recommended

Data Analysis and Graphics Using R (DAAG) covers an exceptionally large range of topics. Because of the book’s breadth, new and experienced R users alike will find the text helpful as a learning tool and resource, but it will be of most service to those who already have a basic understanding of statistics and the R system.

Although the text includes both an Introduction to R section (chapter one) and a discussion of the basics of quantitative data analysis (chapters two through four), these chapters will be most useful as overviews (or reviews for more experienced readers), as they lack the detail required to take a reader from no knowledge of these subjects to a functional understanding. For example, chapter one discusses importing data in .txt and .csv format, but the foreign package is not discussed until chapter fourteen – the final chapter of the book. In practice, .txt data structures are not common enough to justify relegating a discussion of the foreign package to the supplemental materials and a researcher stuck with a .sav or .dbf file would not leave chapter one with enough knowledge to import their data into R.

Chapters five through thirteen deal primarily with different flavors of regression techniques. These chapters are the truly valuable pieces of this work, as each chapter covers one or two approaches in detail. The major analyses covered in this section include bivariate and multivariate regression, GLM and survival models, time-series analyses, repeated measures, classification trees, and factor analysis. As regression techniques are a core component of quantitative methods, these chapters will be useful to many researchers across many industries and disciplines. Much of the discussion of graphing comes via diagnostic and exploratory techniques that are related to the analyses in this section.

As the subtitle suggests, examples of code accompany most significant discussions of analyses. Additionally, several full-color plates of graphs are included in the appendices, allowing the authors to provide examples of color options.

DAAG is highly recommended for readers who have at least a basic understanding of quantitative analysis and at least some limited experience with R; however, more advanced readers will also find this book useful as a review and reference.