The controversy about R: epic fail or epic success?

Posted on Apr. 28, 2010, under JMP & JSL

Sta­tis­ti­cians and data ana­lysts are in a ker­fuf­fle about the recent remarks of Ann­Maria De Mars, Ph.D. (Pres­i­dent of The Julia Group and a SAS Global Forum attendee) in her blog that the open source sta­tis­ti­cal analy­sis tool R is an “epic fail,” or to put it in Twit­terese, #epicfail:

I know that R is free and I am actu­ally a Unix fan and think Open Source soft­ware is a great idea. How­ever, for me per­son­ally and for most users, both indi­vid­ual and orga­ni­za­tional, the much greater cost of soft­ware is the time it takes to install it, main­tain it, learn it and doc­u­ment it. On that, R is an epic fail.

And oh, how the hash­tags and com­ments and teeth-gnashing began!

Nathan Yau’s excel­lent Flow­ing­Data blog recaps the ker­fuf­fle nicely, and his post has accu­mu­lated a thought­ful com­ments thread, as has Dr. De Mars’, to both of which I added my thoughts, expanded here:

To make my prej­u­dices clear, I’ve spent sev­eral decades in com­mer­cial sta­tis­ti­cal soft­ware devel­op­ment (work­ing in a vari­ety of R&D roles at SYSTAT, StatView, JMP and SAS, and Pre­dic­tum), and I now do cus­tom JMP script­ing, etc., for Global Prag­mat­ica LLC.

I can say with hard-won author­ity that:

- good statistical software development is difficult and expensive
- good quality assurance is more difficult and expensive
- designing a good graphical user interface is difficult and expensive
- a good GUI is worthwhile, because the easier it is to try more things, the more things you will try, and
- creative insight is worth a lot more than programming skill

Even com­mer­cial soft­ware tends to be under-supported, and I’ll be the first to admit that my own pro­gram­ming is as buggy as any­body else’s, but if I’m mak­ing life-and-death or world-changing deci­sions, I want to be sure that I’m not the only one who’s looked at my code, tested bor­der cases, con­sid­ered the impli­ca­tions of miss­ing val­ues, con­trolled for under­flow and over­flow errors, done smart things with float­ing point fuzzi­ness, and gen­er­ally thought about any given prob­lem in a few more direc­tions than I have. I want to know that when seri­ous bugs are dis­cov­ered, the knowl­edge will be dis­sem­i­nated and somebody’s job is on the line to fix them.

For all these rea­sons, I tem­per my sin­cere enthu­si­asm about the wide open fron­tiers of open source prod­ucts like R with a con­ser­v­a­tive appre­ci­a­tion for soft­ware that has a big company’s rep­u­ta­tion and future rid­ing on its accu­racy, and prefer­ably a big com­pany that has been in the busi­ness long enough to develop the para­noia that dri­ves a fierce QA program.

R is great for what it is, as long as you bear in mind what it isn’t. Your own R code or R code that you find sit­ting around is only as good as your com­mit­ment to test­ing and under­stand­ing of thorny com­pu­ta­tional gotchas.
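To make a few of those gotchas concrete, here is a minimal sketch. It is written in Python rather than R purely for illustration, and the function names (`naive_variance`, `careful_variance`) are hypothetical; nothing here is taken from R, JMP, or SAS. It assumes missing values are encoded as NaN and shows three things: floating point fuzziness, a border case where a textbook formula loses all precision, and a missing value silently poisoning a result.

```python
import math

def naive_variance(xs):
    """Textbook one-pass formula, mean(x^2) - mean(x)^2.
    Subtracting two huge, nearly equal numbers wipes out the answer."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean * mean

def careful_variance(xs):
    """Two-pass population variance that drops missing values (NaN)
    and uses math.fsum to limit accumulated rounding error."""
    clean = [x for x in xs if not math.isnan(x)]
    if not clean:
        return math.nan
    mean = math.fsum(clean) / len(clean)
    return math.fsum((x - mean) ** 2 for x in clean) / len(clean)

# Floating point fuzziness: compare with a tolerance, not with ==.
exact_equal = (0.1 + 0.2 == 0.3)            # False
fuzzy_equal = math.isclose(0.1 + 0.2, 0.3)  # True

# Border case: small spread on top of a large offset.
# The true population variance of 4, 7, 13, 16 is 22.5.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
good = careful_variance(data)   # 22.5
bad = naive_variance(data)      # not 22.5; the squares exceed double precision

# Missing value: the naive version returns NaN, the careful one skips it.
with_missing = [4.0, math.nan, 7.0, 13.0, 16.0]
```

Both functions look reasonable on small, clean test data, which is exactly why this class of bug survives when only one person has looked at the code.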

I share the apparently-common opin­ion that R’s inter­face leaves a lot to be desired. Con­fi­den­tial­ity agree­ments pre­vent me from con­firm­ing or deny­ing the rumors about JMP 9 inter­fac­ing with R, but I will say that if they turn out to be true, both prod­ucts would ben­e­fit from it. JMP, like any com­mer­cial prod­uct, improves when it faces stiff com­pe­ti­tion and attends to it, and R, like most open source prod­ucts, could use a bet­ter front end.

And now let me make my case for R being an epic success.

I like open source soft­ware. I use a bunch of it, and I do what I can for the cause (which isn’t much more than evan­ge­lism, unfor­tu­nately). For me, the biggest win with open source soft­ware is that it makes tools avail­able to me, and oth­ers, who don’t need them enough to jus­tify much of a price, but who can ben­e­fit from them when they’re afford­able or free. When an open source tool gets some­thing done for me, or eases some pain at least, I’m not that picky about its inter­face, and I’m will­ing to do my own val­i­da­tion (where applicable).

I can’t say that I love using Linux, but as a long-time UNIX geek and Mac OS X bigot, I am glad Linux is avail­able, I use it for cer­tain things, and I think it’s a whole lot bet­ter than Win­dows and other OSes, espe­cially when Ubuntu builds work out. (I’ve had trou­ble get­ting JMP for Linux installed on Ubuntu, but that’s prob­a­bly due to my own incom­pe­tence.) OpenOf­fice is kind of a pain, but it’s bet­ter than pay­ing Microsoft for the priv­i­lege of endur­ing the epic fail that is Office, and it has much bet­ter sup­port than Office for import/export of other for­mats. I love it that any num­ber of open source projects are devel­op­ing such fab­u­lous tools as bzr ver­sion con­trol, which I use daily, and that the FINK project is port­ing a whole bunch of great open source UNIX wid­gets to Mac OS X.

I think it’s won­der­ful that some of the world’s great­est ana­lyt­i­cal minds are using R to cre­ate pub­licly avail­able rou­tines for power-analysts. I love it that stu­dents and peo­ple who can’t afford com­mer­cial stats soft­ware, or who won’t use it enough to jus­tify buy­ing a license, have a high-quality open source option, if they’re will­ing to work at it a bit. I think it’s great that peo­ple who think Excel is good enough can’t make a price objec­tion to upgrad­ing to R.

I believe that democ­ra­tiz­ing inno­va­tion and pro­lif­er­at­ing ana­lyt­i­cal com­pe­tence are good for us all. I count on projects like R and Linux to push com­mer­cial devel­op­ers to make bet­ter prod­ucts, and to force pric­ing and licens­ing of those prod­ucts to remain rea­son­able. Monop­o­lies are good for nobody, includ­ing monopolists.

Long live the pro­po­nents of R!

What do you think? Do you trust open source stats code? Do you think R’s inter­face is good enough? Is JMP’s any bet­ter? How heav­ily do you fac­tor qual­ity of doc­u­men­ta­tion into deci­sions about software?


6 Comments for this entry

  • Erin Vang

    A LinkedIn comment:

    I think your response was well-balanced. I haven’t seen the word “kerfuffle” in a long time, but using that word surely places the controversy in a certain context. I wasn’t aware of the controversy until your post.

    I use JMP and I appreciate all of the support and the webinars and the conferences, although I’ve yet to go to a seminar. And certainly, the link between JMP and SAS helps me with analytical “name dropping.” Most people have heard of SAS.

    I’ve been tempted to learn R or use the R-add-in for Excel for the ana­lyst group I man­age. The rea­son­ing is this: 1) We don’t do enough sta­tis­ti­cal analy­sis to sup­port mul­ti­ple JMP licenses. 2) There’s a large learn­ing curve with JMP as well. I’ve been work­ing with JMP for a year, and there’s still plenty of stuff I don’t know how to do. So for the team, another soft­ware pack­age to learn. 3) There’s a core set of sim­ple stat rou­tines that we would use: the basic descrip­tive stats and graphs and some t-tests. Excel is a pain to use, even for a sim­ple his­togram. So the R-add-in for Excel is very tempt­ing. Just haven’t tried it yet.

    Posted by Richard Giambrone, Man­ager, Busi­ness Intel­li­gence & Ana­lyt­ics
    Pat­ter­son Companies

  • Erin Vang

    Another LinkedIn com­ment:

    I think your state­ment […] sums it up per­fectly, and I also enjoyed the use of kerfuffle.

    Your post doesn’t mention it (although I’m sure you know), but R is not only a Linux tool. R runs perfectly fine in the beloved Windows environment.

    The comments section on AnnMaria’s blog is priceless: http://www.thejuliagroup.com/blog/?p=433

    I think JMP is making an excellent move by simply allowing its users to connect to R. The kerfuffle arises from SAS/JMP users confused over what to do with R. Simple; you don’t *have* to do anything. I’ve used R quite extensively in graduate school, yet in industry the open source phobia is blatantly apparent (there are several exceptions to this rule; Google comes to mind). I think the strategic move here is to capture R users (and future SAS users) with JMP as the common interface. I see this as a win for R, SAS, and both their user bases.

    Posted by Mike Olson, Analy­sis Man­ager
    EYC

  • Erin Vang

    Isn’t “ker­fuf­fle” a great word? I love how it con­veys con­tro­versy and pokes gen­tle fun at the same time. Most peo­ple seem to agree that, although we prob­a­bly have strong pref­er­ences about what we’ll use our­selves, hav­ing a range of choices is a good thing.

  • Dr. Winfried Koch

    As a biostatistician I am aware of the benefit of combination products in medicine when it has been shown that both components contribute significantly to the efficacy of the combination.
    JMP has an excellent user interface but has made little progress in providing new statistical routines.
    R, with its user-contributed packages, has the most dynamic progress in development of statistical procedures I am aware of.
    Therefore I expect great things from the combination of both via the new JMP to R interface in JMP 9 and future releases of JMP.

    • Erin Vang

      That’s a good point, and I enjoy how you com­pare it to com­bi­na­tion prod­ucts in medicine!

      Do you think SAS is doing some­thing wrong that causes JMP to be slow in sta­tis­ti­cal inno­va­tion, or is it per­haps the nature of a busi­ness to lag a bit behind?

      Does a bleeding-edge pro­ce­dure gain cred­i­bil­ity or stature sim­ply by being added to a big-name prod­uct like SAS? If so, what respon­si­bil­ity do com­pa­nies like SAS, Math­e­mat­ica, SPSS, etc. have to eval­u­ate the worth and applic­a­bil­ity of new meth­ods? Is it enough that the mar­ket seems to demand things?

      I always go back to 3D pie charts. We all know they’re atro­cious, but cus­tomers want them. So what is the proper role of mar­ket demand in the evo­lu­tion of a prod­uct? How pre­scrip­tivist should soft­ware providers be?



All con­tent © 2009–14 by Global Prag­mat­ica LLC®. All rights reserved worldwide.
