Should JMP scripts exploit data objects? Yes and no.

by on Jul.21, 2010 , under JMP & JSL

Answering a LinkedIn group question about object oriented programming in JSL from Philip Brown, the always helpful Mark Bailey wrote, “The best scripts exploit JMP objects such as data columns and platforms.”

I don’t quite agree.

In my opinion, Mark’s both right and wrong in saying that the best scripts exploit JMP objects like data columns and platforms. Yes, if a JMP data object or analysis platform already knows how to do something you need, you shouldn’t be reinventing that wheel. BUT–and this is a big but–JMP’s data tables and their sub-objects (columns, rows, etc.) come with a heavy overhead cost, and this is where I think it’s sometimes better to avoid using JMP’s object.

It all depends, of course. If you just need a quick result, by all means, let JMP do the work for you.

But if you need to do something computationally intensive, you’re far better off grabbing just what you need in a JSL data structure–a vector, a matrix, a list, an associative array, whatever is appropriate–and doing that computation outside the data table. You’ll speed up your code, reduce the memory consumption, and even avoid difficult-to-chase-down crashes.

Don’t forget that most of JMP’s formula operators can work on things other than data columns, too. I recently worked on a client’s script that was doing its own calculations of means on lists of numbers by adding up the values, counting the number of values, and dividing the sum by that. It was correct, of course, but it was SLOW. I got the script to run about twice as fast by using JMP’s column-wise mean operator, Col Mean(), on data columns instead, and then I got it to run five times faster still by using JMP’s regular Mean() operator on vectors instead of data columns.

Why did this change give me a tenfold speed improvement? Two reasons.

  1. JMP’s internal calculation of means is more efficient than anything we can build in JSL—JMP’s developers have always optimized their code so that JMP’s numerical computations are done lickedy-split.
  2. Those calculations happen much faster when you don’t run them through the data table, which is a complex object with a lot of internal and external dependencies—and a big, fancy window that needs to be repainted whenever the data change. Window-painting alone can cause significant script slowdowns.

Another more important reason to use JMP’s operators is that JMP’s developers have already thought through pesky details like the proper handling of missing values, underflow and overflow errors, and other arcana of numerical computation that can cause reasonable-looking calculations to get wrong results.

I’m not bashing data tables, by the way. It’s amazing what JMP’s data tables can do for us, with all their table and column properties, row states, formulas, dynamic links to graphs, and so on. All that power comes at a cost, though, and basic numerical computations will go faster when they’re not manipulating complex data objects.

Bottom line: if you’re taking advantage of the rich features in JMP’s data tables, use data tables, but if you’re just doing some calculations, and speed is an issue, then do the calculations outside the data table. In either case, take advantage of JMP’s built-operators as much as possible.

:,