The controversy about R: epic fail or epic success?

Posted on Apr. 28, 2010, under JMP & JSL

Statisticians and data analysts are in a kerfuffle about the recent remarks of AnnMaria De Mars, Ph.D. (President of The Julia Group and a SAS Global Forum attendee) in her blog that the open source statistical analysis tool R is an “epic fail,” or to put it in Twitterese, #epicfail:

I know that R is free and I am actually a Unix fan and think Open Source software is a great idea. However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail.

And oh, how the hashtags and comments and teeth-gnashing began!

Nathan Yau’s excellent FlowingData blog recaps the kerfuffle nicely, and his post has accumulated a thoughtful comments thread, as has Dr. De Mars’s. I added my thoughts to both threads and expand on them here:

To make my prejudices clear, I’ve spent several decades in commercial statistical software development (working in a variety of R&D roles at SYSTAT, StatView, JMP and SAS, and Predictum), and I now do custom JMP scripting, etc., for Global Pragmatica LLC.

I can say with hard-won authority that:

- good statistical software development is difficult and expensive
- good quality assurance is more difficult and expensive
- designing a good graphical user interface is difficult and expensive
- a good GUI is worthwhile, because the easier it is to try more things, the more things you will try, and
- creative insight is worth a lot more than programming skill

Even commercial software tends to be under-supported, and I’ll be the first to admit that my own programming is as buggy as anybody else’s, but if I’m making life-and-death or world-changing decisions, I want to be sure that I’m not the only one who’s looked at my code, tested border cases, considered the implications of missing values, controlled for underflow and overflow errors, done smart things with floating point fuzziness, and generally thought about any given problem in a few more directions than I have. I want to know that when serious bugs are discovered, the knowledge will be disseminated and somebody’s job is on the line to fix them.

For all these reasons, I temper my sincere enthusiasm about the wide open frontiers of open source products like R with a conservative appreciation for software that has a big company’s reputation and future riding on its accuracy, and preferably a big company that has been in the business long enough to develop the paranoia that drives a fierce QA program.

R is great for what it is, as long as you bear in mind what it isn’t. Your own R code, or R code that you find sitting around, is only as good as your commitment to testing it and your understanding of thorny computational gotchas.
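To make a few of those gotchas concrete, here is a minimal sketch of floating point fuzziness, missing values, and lost precision in summation. It is written in Python purely for illustration (the post itself contains no code), it uses only the standard library, and the helper names `naive_sum`, `naive_mean`, and `mean_skipping_missing` are hypothetical, not taken from R, JMP, or any package.

```python
import math

# 1. Floating point fuzziness: 0.1 + 0.2 is not exactly 0.3 in binary floats.
exact_match = (0.1 + 0.2 == 0.3)              # False
close_enough = math.isclose(0.1 + 0.2, 0.3)   # True, within a relative tolerance


def naive_sum(values):
    """Plain left-to-right accumulation (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total


def naive_mean(values):
    """Mean that ignores the possibility of missing values (hypothetical helper)."""
    return naive_sum(values) / len(values)


def mean_skipping_missing(values):
    """Mean of the non-NaN values, or NaN if none remain (hypothetical helper)."""
    present = [v for v in values if not math.isnan(v)]
    return math.fsum(present) / len(present) if present else math.nan


# 2. Missing values: a single NaN silently turns the naive mean into NaN.
readings = [1.0, math.nan, 3.0]

# 3. Lost precision: adding 1.0 to 1e16 rounds back to 1e16, so the naive
#    sum drops the 1.0 entirely; math.fsum tracks the rounding error instead.
big_and_small = [1e16, 1.0, -1e16]

print(exact_match, close_enough)
print(naive_mean(readings), mean_skipping_missing(readings))
print(naive_sum(big_and_small), math.fsum(big_and_small))
```

None of this is specific to Python; the same traps exist in any language that uses IEEE 754 doubles, which is why untested analysis code deserves suspicion wherever it comes from.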

I share the apparently-common opinion that R’s interface leaves a lot to be desired. Confidentiality agreements prevent me from confirming or denying the rumors about JMP 9 interfacing with R, but I will say that if they turn out to be true, both products would benefit from it. JMP, like any commercial product, improves when it faces stiff competition and attends to it, and R, like most open source products, could use a better front end.

And now let me make my case for R being an epic success.

I like open source software. I use a bunch of it, and I do what I can for the cause (which isn’t much more than evangelism, unfortunately). For me, the biggest win with open source software is that it makes tools available to me, and others, who don’t need them enough to justify much of a price, but who can benefit from them when they’re affordable or free. When an open source tool gets something done for me, or eases some pain at least, I’m not that picky about its interface, and I’m willing to do my own validation (where applicable).

I can’t say that I love using Linux, but as a long-time UNIX geek and Mac OS X bigot, I am glad Linux is available, I use it for certain things, and I think it’s a whole lot better than Windows and other OSes, especially when Ubuntu builds work out. (I’ve had trouble getting JMP for Linux installed on Ubuntu, but that’s probably due to my own incompetence.) OpenOffice is kind of a pain, but it’s better than paying Microsoft for the privilege of enduring the epic fail that is Office, and it has much better support than Office for import/export of other formats. I love it that any number of open source projects are developing such fabulous tools as bzr version control, which I use daily, and that the FINK project is porting a whole bunch of great open source UNIX widgets to Mac OS X.

I think it’s wonderful that some of the world’s greatest analytical minds are using R to create publicly available routines for power-analysts. I love it that students and people who can’t afford commercial stats software, or who won’t use it enough to justify buying a license, have a high-quality open source option, if they’re willing to work at it a bit. I think it’s great that people who think Excel is good enough can’t make a price objection to upgrading to R.

I believe that democratizing innovation and proliferating analytical competence are good for us all. I count on projects like R and Linux to push commercial developers to make better products, and to force pricing and licensing of those products to remain reasonable. Monopolies are good for nobody, including monopolists.

Long live the proponents of R!

What do you think? Do you trust open source stats code? Do you think R’s interface is good enough? Is JMP’s any better? How heavily do you factor quality of documentation into decisions about software?




© 2009–14 Global Pragmatica LLC®

