The controversy about R: epic fail or epic success?

Posted on Apr. 28, 2010, under JMP & JSL

Statisticians and data analysts are in a kerfuffle about the recent remarks of AnnMaria De Mars, Ph.D. (President of The Julia Group and a SAS Global Forum attendee) in her blog that the open source statistical analysis tool R is an “epic fail,” or to put it in Twitterese, #epicfail:

I know that R is free and I am actually a Unix fan and think Open Source software is a great idea. However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail.

And oh, how the hashtags and comments and teeth-gnashing began!

Nathan Yau’s excellent FlowingData blog recaps the kerfuffle nicely, and his post has accumulated a thoughtful comments thread, as has Dr. De Mars’. I added my thoughts to both threads and expand on them here:

To make my prejudices clear, I’ve spent several decades in commercial statistical software development (working in a variety of R&D roles at SYSTAT, StatView, JMP, SAS, and Predictum), and I now do custom JMP scripting, etc., for Global Pragmatica LLC.

I can say with hard-won authority that:

– good statistical software development is difficult and expensive
– good quality assurance is more difficult and expensive
– designing a good graphical user interface is difficult and expensive
– a good GUI is worthwhile, because the easier it is to try more things, the more things you will try, and
– creative insight is worth a lot more than programming skill

Even commercial software tends to be under-supported, and I’ll be the first to admit that my own programming is as buggy as anybody else’s. But if I’m making life-and-death or world-changing decisions, I want to be sure that I’m not the only one who’s looked at my code, tested border cases, considered the implications of missing values, controlled for underflow and overflow errors, done smart things with floating point fuzziness, and generally thought about any given problem in a few more directions than I have. I want to know that when serious bugs are discovered, the knowledge will be disseminated and somebody’s job is on the line to fix them.

For all these reasons, I temper my sincere enthusiasm about the wide open frontiers of open source products like R with a conservative appreciation for software that has a big company’s reputation and future riding on its accuracy, and preferably a big company that has been in the business long enough to develop the paranoia that drives a fierce QA program.

R is great for what it is, as long as you bear in mind what it isn’t. Your own R code or R code that you find sitting around is only as good as your commitment to testing and understanding of thorny computational gotchas.
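To make a few of those gotchas concrete, here is a minimal sketch in Python. It is an illustration only, not code from R, JMP, or any package discussed here, and all variable names are hypothetical. It shows floating point fuzziness, overflow, and a missing value silently poisoning a mean.

```python
import math

# Floating point fuzziness: 0.1 + 0.2 is not exactly 0.3 in binary floating point,
# so exact comparison fails while a tolerance-based comparison succeeds.
naive_equal = (0.1 + 0.2 == 0.3)
tolerant_equal = math.isclose(0.1 + 0.2, 0.3)

# Overflow: multiplying two very large values overflows to infinity,
# while summing their logarithms stays finite.
values = [1e200, 1e200]
direct_product = values[0] * values[1]
log_product = sum(math.log(v) for v in values)

# Missing values: a NaN propagates silently through a naive mean,
# so it has to be handled explicitly (here, by dropping it).
data = [2.0, float("nan"), 4.0]
naive_mean = sum(data) / len(data)
observed = [x for x in data if not math.isnan(x)]
clean_mean = sum(observed) / len(observed)

print(naive_equal, tolerant_equal)
print(direct_product, log_product)
print(naive_mean, clean_mean)
```

Whether dropping missing values is the right treatment depends on the analysis. The point is only that somebody has to decide, and test the decision.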

I share the apparently-common opinion that R’s interface leaves a lot to be desired. Confidentiality agreements prevent me from confirming or denying the rumors about JMP 9 interfacing with R, but I will say that if they turn out to be true, both products would benefit from it. JMP, like any commercial product, improves when it faces stiff competition and attends to it, and R, like most open source products, could use a better front end.

And now let me make my case for R being an epic success.

I like open source software. I use a bunch of it, and I do what I can for the cause (which isn’t much more than evangelism, unfortunately). For me, the biggest win with open source software is that it makes tools available to me, and others, who don’t need them enough to justify much of a price, but who can benefit from them when they’re affordable or free. When an open source tool gets something done for me, or eases some pain at least, I’m not that picky about its interface, and I’m willing to do my own validation (where applicable).

I can’t say that I love using Linux, but as a long-time UNIX geek and Mac OS X bigot, I am glad Linux is available, I use it for certain things, and I think it’s a whole lot better than Windows and other OSes, especially when Ubuntu builds work out. (I’ve had trouble getting JMP for Linux installed on Ubuntu, but that’s probably due to my own incompetence.) OpenOffice is kind of a pain, but it’s better than paying Microsoft for the privilege of enduring the epic fail that is Office, and it has much better support than Office for import/export of other formats. I love it that any number of open source projects are developing such fabulous tools as bzr version control, which I use daily, and that the FINK project is porting a whole bunch of great open source UNIX widgets to Mac OS X.

I think it’s wonderful that some of the world’s greatest analytical minds are using R to create publicly available routines for power-analysts. I love it that students and people who can’t afford commercial stats software, or who won’t use it enough to justify buying a license, have a high-quality open source option, if they’re willing to work at it a bit. I think it’s great that people who think Excel is good enough can’t make a price objection to upgrading to R.

I believe that democratizing innovation and proliferating analytical competence are good for us all. I count on projects like R and Linux to push commercial developers to make better products, and to force pricing and licensing of those products to remain reasonable. Monopolies are good for nobody, including monopolists.

Long live the proponents of R!

What do you think? Do you trust open source stats code? Do you think R’s interface is good enough? Is JMP’s any better? How heavily do you factor quality of documentation into decisions about software?


6 Comments for this entry

  • @marnen

    Interesting article.

  • Erin Vang

    A LinkedIn comment:

    I think your response was well-balanced. I haven’t seen the word “kerfuffle” in a long time, but using that word surely places the controversy in a certain context. I wasn’t aware of the controversy until your post.

    I use JMP and I appreciate all of the support and the webinars and the conferences, although I’ve yet to go to a seminar. And certainly, the link between JMP and SAS helps me with analytical “name dropping.” Most people have heard of SAS.

    I’ve been tempted to learn R or use the R-add-in for Excel for the analyst group I manage. The reasoning is this: 1) We don’t do enough statistical analysis to support multiple JMP licenses. 2) There’s a large learning curve with JMP as well. I’ve been working with JMP for a year, and there’s still plenty of stuff I don’t know how to do. So for the team, another software package to learn. 3) There’s a core set of simple stat routines that we would use: the basic descriptive stats and graphs and some t-tests. Excel is a pain to use, even for a simple histogram. So the R-add-in for Excel is very tempting. Just haven’t tried it yet.

    Posted by Richard Giambrone, Manager, Business Intelligence & Analytics
    Patterson Companies

  • Erin Vang

    Another LinkedIn comment:

    I think your statement […] sums it up perfectly, and I also enjoyed the use of kerfuffle.

    Your post doesn’t mention it (although I’m sure you know), but R is not only a Linux tool. R runs perfectly fine in the beloved Windows environment.

    The comments section on AnnMaria’s blog is priceless:

    I think JMP is making an excellent move by simply allowing its users to connect to R. The kerfuffle arises from SAS/JMP users confused over what to do with R. Simple; you don’t *have* to do anything. I’ve used R quite extensively in graduate school, yet in industry the open source phobia is blatantly apparent (there are several exceptions to this rule – Google comes to mind). I think the strategic move here is to capture R users (and future SAS users) with JMP as the common interface. I see this as a win for R, SAS, and both their user bases.

    Posted by Mike Olson, Analysis Manager

  • Erin Vang

    Isn’t “kerfuffle” a great word? I love how it conveys controversy and pokes gentle fun at the same time. Most people seem to agree that, although we probably have strong preferences about what we’ll use ourselves, having a range of choices is a good thing.

  • Dr. Winfried Koch

    As a Biostatistician I am aware of the benefit of combination products in medicine if it has been shown that both components contribute significantly to the efficacy of the combination.
    JMP has an excellent user interface but has made little progress in providing new statistical routines.
    R, with its user-contributed packages, shows the most dynamic progress in the development of statistical procedures that I am aware of.
    Therefore I expect great things from the combination of both via the new JMP-to-R interface in JMP 9 and future releases of JMP.

    • Erin Vang

      That’s a good point, and I enjoy how you compare it to combination products in medicine!

      Do you think SAS is doing something wrong that causes JMP to be slow in statistical innovation, or is it perhaps the nature of a business to lag a bit behind?

      Does a bleeding-edge procedure gain credibility or stature simply by being added to a big-name product like SAS? If so, what responsibility do companies like SAS, Mathematica, SPSS, etc. have to evaluate the worth and applicability of new methods? Is it enough that the market seems to demand things?

      I always go back to 3D pie charts. We all know they’re atrocious, but customers want them. So what is the proper role of market demand in the evolution of a product? How prescriptivist should software providers be?