JMP & JSL

One Proportion Test

Posted on Jul. 21, 2010, under JMP & JSL

One Proportion Test is a JMP script for performing a One Proportion Test quickly and easily.

Background

This script is inspired by a recent question on the LinkedIn Group “JMP Professional Network” from Jennifer Atlas, Senior Business Development Coordinator at Minitab, who asked:

I know I can calculate a sample size for a one proportion test in JMP, but how do I test for 1 proportion?

Karen Copeland, Ph.D., from Boulder Statistics promptly gave a helpful explanation of how to do it in JMP. JMP handles the problem quite capably, but you have to know where to find it and remember a bunch of details about how to use it:

To test one proportion in JMP I use the distribution platform. When you run a distribution for a nominal variable then in the red drop down menu there is an option for testing probabilities.

Details:

If you have a data table with a column for your proportion then you can proceed straight to the distribution platform. If not then first create a summary table with two columns. The first column would be your outcome such as Y/N or 0/1 and the second column would be your frequency column of how many of each outcome you observed.

Testing of a proportion:

1. With a column of outcomes: First make sure that your column is of the nominal type. Second, use the distribution platform to create a distribution of the outcome. Under the red triangle drop down menu select “Test Probabilities” and a dialog box will appear with various options for testing your probabilities. Note, you need only fill in the one probability that you are interested in testing.

2. With a summary table for your outcomes: Again, make sure that your outcome column is of the nominal type. Second, use the distribution platform with your outcome column as the Y and the frequency column as the “Freq”. Then continue as above by selecting the test probabilities from the red triangle menu.

Note that you will also find confidence intervals for your proportions in the red drop down menu.
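Here is a minimal JSL sketch of the summary-table route described above. The table name, column names, and counts are made up for illustration. The hand calculation at the end is a plain normal-approximation z-test that I've added only as a rough cross-check; it is not necessarily the statistic that Test Probabilities reports.

```jsl
// Summary table: one row per outcome, plus a frequency column.
// Names and counts here are hypothetical.
dt = New Table( "Outcome Summary",
	Add Rows( 2 ),
	New Column( "Outcome", Character, Nominal, Set Values( {"Y", "N"} ) ),
	New Column( "Count", Numeric, Continuous, Set Values( [37, 63] ) )
);

// Launch Distribution with the outcome as Y and the counts as Freq.
dist = dt << Distribution(
	Nominal Distribution( Column( :Outcome ) ),
	Freq( :Count )
);
// From here, choose Test Probabilities from the red triangle menu
// on the Outcome report and fill in the hypothesized probability.

// Rough cross-check by hand (normal approximation, an assumption of this sketch).
x = 37;      // observed count of "Y"
n = 100;     // total count
p0 = 0.5;    // hypothesized proportion
z = (x / n - p0) / Sqrt( p0 * (1 - p0) / n );
pValue = 2 * (1 - Normal Distribution( Abs( z ) ));
Show( z, pValue );
```

If you already have a raw column of outcomes instead of a summary table, the same launch should work without the Freq role.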

Boulder Statistics LLC and Global Pragmatica LLC are strategic allies, pairing Boulder Statistics’ analytical expertise with Global Pragmatica’s JMP scripting expertise to deliver outstanding solutions to our clients. When I saw Karen’s explanation, I immediately thought this would be a great opportunity to collaborate with her and build a JSL widget to make this easier.

This script is the result, and Global Pragmatica and Boulder Statistics are proud to make it available free, in an encrypted (run-only) script you can download today.

How to use One Proportion Test

You must license or download a demo copy of JMP software from SAS to use this JMP add-on script.

Launch the script. You are asked to choose a data table, which can be set up either of the two ways Dr. Copeland describes above. A dialog box requests the necessary column assignments. Click OK. At the bottom, fill in the details of the One Proportion Test. Use the Start Over button to restart the analysis with a different data table. For further help, see text at the top of the window and tooltips when hovering over buttons.

How encrypted scripts work (the free widget)

Encrypted scripts are run-only scripts. They are not human-readable, so you can’t modify them or adapt them for other purposes. If you would like to study the JSL, modify, or adapt the script, you should license the unencrypted version of this script instead.

Why pay for the unencrypted script?

Don’t! At least not right away… Start by “buying” the free widget. Try it out and see if you like it!

If you do, you might want to buy the full-price unencrypted script, so that you can modify, customize, or adapt the script for your own specialized needs, or so you can study a professional-quality script! This script demonstrates quite a few advanced scripting techniques, including:

  • building an elegant, all-in-one-window user interface using display objects
  • attaching scripts to buttons, radio buttons, column lists, etc.
  • including a custom logo graphic
  • implementing a different dialog box according to the user’s radio button choice
  • opening, closing, and deleting display tree elements dynamically
  • offering tips and help right in the window
  • including email and web links for more information, sending feedback, etc.
  • hiding globals—whether to protect intellectual property or to avoid cluttering up Show Globals() output
  • optimizing memory management and simplifying between-use value-clearing by storing problem-specific “globals” as entries in an Associative Array() instead of in globals
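To illustrate the last technique in the list, here is a small sketch of keeping problem-specific values in an Associative Array() rather than in separate globals. The variable name and keys are hypothetical and are not taken from the One Proportion Test script itself.

```jsl
// One container for all problem-specific values (keys are hypothetical).
state = Associative Array();
state["table name"] = "Outcome Summary";
state["hypothesized p"] = 0.5;

// Read a value back when it is needed.
Show( state["hypothesized p"] );

// See which values are currently stored.
Show( state << Get Keys );

// "Start Over": swap in a fresh, empty array to clear everything at once.
state = Associative Array();
Show( N Items( state ) );
```

The appeal is that clearing values between uses becomes a single assignment instead of a list of globals that you have to remember to reset one by one.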

You can read more about encrypted vs. unencrypted scripts offered by Global Pragmatica here.

Compatibility

One Proportion Test has been tested on JMP 8 for Mac and Windows as well as current beta versions of JMP 9 for Mac and Windows.

Known issues

  • In the second (no longer current) early adopter release of JMP 9 for Windows, you have to start the script manually. (This is not a problem in JMP 8 or later beta releases of JMP 9.) There are several ways to do this:
    • press Ctrl-R, or
    • click the Run Script button in the toolbar, or
    • from the Edit menu, select Run Script

Buy now!

Contact me if you’re interested in purchasing this script or commissioning one like it.


Should JMP scripts exploit data objects? Yes and no.

Posted on Jul. 21, 2010, under JMP & JSL

Answering a LinkedIn group question about object oriented programming in JSL from Philip Brown, the always helpful Mark Bailey wrote, “The best scripts exploit JMP objects such as data columns and platforms.”

I don’t quite agree.

In my opinion, Mark’s both right and wrong in saying that the best scripts exploit JMP objects like data columns and platforms. Yes, if a JMP data object or analysis platform already knows how to do something you need, you shouldn’t be reinventing that wheel. BUT, and this is a big but, JMP’s data tables and their sub-objects (columns, rows, etc.) come with a heavy overhead cost, and this is where I think it’s sometimes better to avoid using JMP’s objects.

It all depends, of course. If you just need a quick result, by all means, let JMP do the work for you.

But if you need to do something computationally intensive, you’re far better off grabbing just what you need in a JSL data structure–a vector, a matrix, a list, an associative array, whatever is appropriate–and doing that computation outside the data table. You’ll speed up your code, reduce the memory consumption, and even avoid difficult-to-chase-down crashes.

Don’t forget that most of JMP’s formula operators can work on things other than data columns, too. I recently worked on a client’s script that was doing its own calculations of means on lists of numbers by adding up the values, counting the number of values, and dividing the sum by that. It was correct, of course, but it was SLOW. I got the script to run about twice as fast by using JMP’s column-wise mean operator, Col Mean(), on data columns instead, and then I got it to run five times faster still by using JMP’s regular Mean() operator on vectors instead of data columns.

Why did this change give me a tenfold speed improvement? Two reasons.

  1. JMP’s internal calculation of means is more efficient than anything we can build in JSL; JMP’s developers have always optimized their code so that JMP’s numerical computations are done lickety-split.
  2. Those calculations happen much faster when you don’t run them through the data table, which is a complex object with a lot of internal and external dependencies—and a big, fancy window that needs to be repainted whenever the data change. Window-painting alone can cause significant script slowdowns.
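The three approaches I compared look roughly like this. This is a toy sketch with made-up values, not the client's script, and it doesn't time anything; it only shows the shape of each version.

```jsl
// A tiny table with one numeric column (values are made up; "." is missing).
dt = New Table( "Mean Demo",
	Add Rows( 5 ),
	New Column( "x", Numeric, Continuous, Set Values( [2, 4, 6, ., 8] ) )
);

// 1. Hand-rolled mean over a list: add up, count, divide.
//    (The missing value is simply left out of this list.)
vals = {2, 4, 6, 8};
total = 0;
For( i = 1, i <= N Items( vals ), i++,
	total += vals[i]
);
handMean = total / N Items( vals );

// 2. Column-wise mean, computed through the data table.
colMean = Col Mean( :x );

// 3. Pull the column into a vector once, then use Mean() on the vector.
v = Column( dt, "x" ) << Get Values;
vecMean = Mean( v );

Show( handMean, colMean, vecMean );
```

In a real script the payoff comes from grabbing the vector once and then doing all the heavy computation on it, rather than going back to the data table inside a loop.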

Another, more important reason to use JMP’s operators is that JMP’s developers have already thought through pesky details like the proper handling of missing values, underflow and overflow errors, and other arcana of numerical computation that can cause reasonable-looking calculations to get wrong results.

I’m not bashing data tables, by the way. It’s amazing what JMP’s data tables can do for us, with all their table and column properties, row states, formulas, dynamic links to graphs, and so on. All that power comes at a cost, though, and basic numerical computations will go faster when they’re not manipulating complex data objects.

Bottom line: if you’re taking advantage of the rich features in JMP’s data tables, use data tables, but if you’re just doing some calculations, and speed is an issue, then do the calculations outside the data table. In either case, take advantage of JMP’s built-in operators as much as possible.


Marker State()s and states of confusion

Posted on May 12, 2010, under JMP & JSL

When I wrote the first JMP Scripting Guide, way back in the dark old days of JMP 4, easily the most difficult chapter to write was the one about data tables, and the most difficult section of that chapter was the one about Row States. I must have spent the better part of a month trying to figure out all the relevant concepts. Not only was JSL still in its gestational period, but I was new to JMP entirely, so I wasn’t just confused about how to work with row states in JSL. I was confused about row states, period.

Slowly I untangled it all, but I had to keep referring to two cheat sheets I’d built for myself, so I decided they’d probably better become a part of the book. These live on in current editions of the JMP Scripting Guide as Table 5.3, “Row states and how they affect JMP’s analyses, charts, and plots,” and Table 5.4, “Operators for converting between numbers and row states.” Then I had to struggle some more to figure out how to use the operators, and on and on. By the time I was done, I still had only a tenuous grip on how to get anything done with row states, but at least I’d gotten to the point where I knew which of my pages to reread and could usually adapt one of my own examples after a few minutes of squinting and swearing.

Fast forward about a dozen years, and I’m programming in JSL full-time for a variety of clients—cool projects for big companies you’ve heard of—but I still cringe when I have to use row state operators. Unless I can refer to another recent project and steal code from myself, I know that I’ve got a good half hour in that darned book and its ridiculously confusing section on row states. I should point out that I haven’t been involved in that book since version 4—since then, it’s been updated by Lee Creighton, Ph.D. (version 5) and Melanie Drake (since version 6). Honestly, I’m not sure whether the section on row states has changed at all, so the fact that I still hate that section and how hard it is for me to read and understand could still be entirely my own fault. I hate that!

In version 8 markers and several other row state operators got a lot more complicated, because JMP 8 introduced marker themes and unicode markers, among several other improvements by the talented developer Xan Gregg, the guy who brought us Graph Builder. The new features are great, but the already confusing row state operators got even more confusing.

So, when Glenn Donahey at W.L. Gore & Associates posed a question about markers on the LinkedIn group “JMP Professional Network,” I thought, “Great! I’ll answer his question and refresh my own memory while I’m at it.” Here’s his question:

Simple JSL question – assigning custom markers. This is a simple question, but I can’t seem to find the answer. Starting with a populated data table, one can right click on the row, choose Markers >> Custom… and assign a marker such as “D”. What is the analagous command in JSL?

Unfortunately, my first few attempts to bang out the command he needed didn’t work, and then I got busy with something else. Monday morning I made another attempt at it but once again got myself confused, and then the ever-helpful Mark Bailey, Ph.D., popped in with some answers. Meanwhile, I banged my head against some of the less-documented details of the Marker State() operator, and here’s what I figured out.

Either I’m dumber than the average bear…

This could be the problem. It really could.

…or row state operators are really confusing!

You know what, they are. Regardless of the first part—whether I’m some special kind of stupid or not—row state operators are confusing. And now here are the things I learned (again!) that might be helpful to everyone else!

To set one row’s state at a time, use a pinch of Whatever State( rowNumber )

For example, to set row 1 to pink, which is for no particular reason color #11:

Row State( 1 ) = Color State( 11 );

Or to set row 1 to an inverted solid triangle, which is for no particular reason marker #21:

Row State( 1 ) = Marker State( 21 );

To use Unicode characters as markers, prepare to sweat

Up until JMP 8, there were a few dozen markers known mysteriously as 1 through 31. These are pictured at right and ought to be familiar to veteran JMP users. JMP 8 introduced a way to exploit a whole mess of Unicode characters as markers—not the entire Unicode character set, unfortunately, but the first 19,560 of them—by using 16-bit numbers as arguments. You have three choices for how to give the argument:

  1. As an integer from 33 to 19,560, inclusive
  2. As a hex number in a string, inside a Hex to Number() operator
  3. As a hex number in a four-digit string escaped with \!u

That makes perfect sense, right? No, me either. Let’s go through these one at a time.

1. Integers

Integers are the easy part. We’ve already seen that 1-31 give JMP’s good old-fashioned marker symbols as seen at right. Number 32 is empty. Not coincidentally, the first 32 positions of the Unicode character set are all empty; JMP’s standard markers fill that gap, and #32 is just empty. After that, you start getting Unicode characters by their positions given in any Unicode character table.

Note: be sure to set your Marker Font() preference to a unicode font:

// set the Marker Font to be Arial Unicode MS, Lucida Grande,
// or another font containing an extensive set of Unicode characters
If( Host is( Windows ),
  Preferences( Fonts( English( Marker Font( "Arial Unicode MS", 10 ) ) ) ),
  Preferences( Fonts( English( Marker Font( "Lucida Grande", 10 ) ) ) )
);

Whoa! What are the Unicode characters, anyway?

Windows users have a decent tool for browsing Unicode. Go to Start / Programs / Accessories / System Tools / Character Map. For Font, choose a unicode font such as Arial Unicode MS. To get details for a specific character, click it and look in the lower part of the window:

The Advanced view is marginally more helpful. Searching online for “Unicode character map” yields plenty of useful options.

Mac users have a great built-in tool for browsing Unicode: in System Preferences, choose Language & Text, click the Input Sources panel, check Keyboard & Character Viewer in the list of input methods, and check “Show Input menu in menu bar.” (These steps are for OS 10.6, Snow Leopard.) Now you have an input menu in the upper right corner of the menu bar. From this menu, choose Show Character Viewer, and you get a handy browser of the entire Unicode character set. (The characters shown are limited to those available in fonts installed on your machine. Mac OS X’s best font for this kind of thing is Lucida Grande.)

So, once you find a symbol you like, these tools tell you which character it is, but they tell you in hexadecimal notation. For example, the FOR ALL character (∀) is in position 2200 hexadecimal (base 16). Now, just figure out what that is in decimal (base ten) and you’re all set! To do that, you can either grab your favorite programmer’s calculator, or you can use JMP’s own conversion operator to learn that it’s 8704:

show(hex to number("2200"));
Hex To Number("2200"):8704

Armed with this knowledge, you can set FOR ALL as the marker symbol for row 5 this way:

Row State( 5 ) = Marker State( 8704 );

2. Hex numbers

But it’s kind of silly to work that hard at it. If you’re using Unicode character tables to find symbols, they’re telling you hex numbers, and you might as well just let JMP do the work:

Row State( 5 ) = Marker State( Hex To Number( "2200" ) );

3. Escaped numbers

This too is making it too hard, though. Why not just use the common \u-escaped notation, e.g. \u2200 to mean “Unicode character in position hex 2200”? In this case, our argument needs to be a string, and the usual \u escape needs to be escaped with both a backslash and a bang, like most other special characters for JSL. So what in the rest of the world is “\u2200” becomes “\!u2200” for JSL, for a command like this:

Row State( 5 ) = Marker State ( "\!u2200" );

This isn’t too bad, but if you want to store a bunch of these in a character data table column, you need to do more escaping to get them to play nicely with JSL.

New Column( "unicode",
  Character, Nominal,
  Formula( "\!"\!\\!!u" || Substr( Hex( Row(), "integer" ), 5, 4 ) || "\!"" )
);

What a mess! Let’s break it down. In the middle we ask for row numbers to be converted from integers to hex, but we only take the last four digits by using Substr().

Around the whole string, we need quotation marks, but we can’t just use them directly: the character column gets string values, which are quoted strings by definition, and those surrounding quotes are thrown away when we read the values out of the column. So we need escaped quotation marks, \!", before and after. So far we’ve got, for example on row 8704, "\!"2200\!"". But we also need \!u before the number, which is two more special characters that need escaping in JSL: backslash (\) and bang (!). To get the backslash, we need to use \!\, and to get the bang, we need \!!. The u is the easy part. So add \!\ and \!! and u before the 2200, and you’ve got "\!"\!\\!!u2200\!"".

But it gets worse! Now if you want to ask for markers from those values, you have to parse all that to get it back to “\!u2200”!

I know: “Yikes!”

To put it all together into a useful script, let’s create a data table, give it the 19560 rows we’ll need to see all the supported Unicode characters, and then set marker states For Each Row(). We’ll add a While() in the middle to make sure the data table finishes getting built before JMP tries to run the For Each Row(); if we don’t, we’ll get an error about arguments.

// a data table to see the entire set of standard markers
// and Unicode characters available for markers
New Table( "unicode characters",
  add rows( 19561 ),
  New Property( "plot my markers",
    Overlay Plot(
      X( :X ),
      Y( :Y ),
      Separate Axes( 1 ),
      SendToReport( Dispatch( {}, "Overlay Plot",
      FrameBox, {Frame Size( 1600, 1600 )} ) )
    )
  ),
  New Column( "unicode",
    Character,
    Nominal,
    Formula( "\!"\!\\!!u" || Substr( Hex( Row(), "integer" ), 5, 4 ) || "\!"" )
  ),
  New Column( "X", Numeric, Continuous, Format( "Best", 10 ),
    Formula( Sequence( 1, 140, 1, 1 ) ) ),
  New Column( "Y", Numeric, Continuous, Format( "Best", 10 ),
    Formula( Sequence( 1, 140, 1, 140 ) ) )
);

// use While() to see when the data table is finished -
// else next line processes before the data table is built
// and you get error: "Argument should be row State{1}"
while(is missing(:Y[19561]), wait(.0001) );

// see all markers using unicode arguments
For Each Row( Row State( Row() ) = Marker State( Parse( :unicode ) ) );

// the equivalent command, using integer arguments
For Each Row( Row State( Row() ) = Marker State( Row() ) );

To set several aspects of one row’s state, add a teaspoon of Combine States()

For example, to set row 1 to a solid pink inverted triangle, stick both of the previous examples together with a Combine States() operator. This would be handy for a World War II historian plotting deaths in the Holocaust.

Row State( 1 ) = Combine States( Marker State( 21 ), Color State( 11 ) );

Or to set row 1 to a yellow (color #41) star of David (marker #10017), to plot the most horrendous statistics from the Holocaust, you could set:

Row State( 1 ) = Combine States( Marker State( 10017 ), Color State( 41 ) );
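If you want to check what a combined row state actually contains, the Marker Of() and Color Of() operators pull the pieces back out. This is a small sketch on a throwaway table; the table name is made up.

```jsl
// A scratch table so there is a row whose state we can set.
dt = New Table( "Row State Demo", Add Rows( 3 ) );

// Set marker and color together on row 1.
Row State( 1 ) = Combine States( Marker State( 21 ), Color State( 11 ) );

// Read the two components back out of the combined row state.
Show( Marker Of( Row State( 1 ) ), Color Of( Row State( 1 ) ) );
```

This is a handy sanity check when a marker doesn't show up the way you expected and you're not sure whether the row state or the marker font is to blame.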

Some interesting Unicode characters to use as markers

Just scroll through the data table created above to see some Unicode characters that might be particularly useful for JMP graph markers, such as:

  • Greek and Coptic start at \u0370
  • Currency symbols start at \u20A0
  • Hebrew, Arabic, etc. start at \u0590
  • Fractions and roman numerals start at \u2150
  • Box elements and other geometric shapes and dingbats start at \u2500
  • Arrows, math operators, and technical symbols start at \u2190
  • Weather, pointing hand, smiley faces, gender, and astrology symbols start at \u2600
  • Chess pieces, card suits, musical symbols, recycling symbols, dice, monogram, digram, and trigram symbols start at \u2654
  • Braille starts at \u2800
  • Asian language characters start around \u2E80
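Since the list above gives hex positions, here is a short sketch that converts a few of those starting points to the decimal integers that Marker State() also accepts. The choice of entries is arbitrary.

```jsl
// A few block starting positions from the list above, as hex strings.
starts = {"0370", "20A0", "2190", "2600", "2800"};

// Print each one alongside its decimal equivalent.
For( i = 1, i <= N Items( starts ), i++,
	Write(
		"hex " || starts[i] || " = decimal " ||
		Char( Hex To Number( starts[i] ) ) || "\!N"
	)
);
```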

Download the script

Don’t bother with copying and pasting from the examples above; you can download a tidy sample script here:

  • [download id=”4″]

What did I forget?

What is still confusing about marker states? Post your questions and comments below, and I’ll do my best to address them in future posts (or in the comments, if they’re not too complex).


The controversy about R: epic fail or epic success?

Posted on Apr. 28, 2010, under JMP & JSL

Statisticians and data analysts are in a kerfuffle about the recent remarks of AnnMaria De Mars, Ph.D. (President of The Julia Group and a SAS Global Forum attendee) in her blog that the open source statistical analysis tool R is an “epic fail,” or to put it in Twitterese, #epicfail:

I know that R is free and I am actually a Unix fan and think Open Source software is a great idea. However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail.

And oh, how the hashtags and comments and teeth-gnashing began!

Nathan Yau’s excellent FlowingData blog recaps the kerfuffle nicely, and his post has accumulated a thoughtful comments thread, as has Dr. De Mars’, to both of which I added my thoughts, expanded here:

To make my prejudices clear, I’ve spent several decades in commercial statistical software development (working in a variety of R&D roles at SYSTAT, StatView, JMP, SAS, and Predictum), and I now do custom JMP scripting, etc., for Global Pragmatica LLC.

I can say with hard-won authority that:

– good statistical software development is difficult and expensive
– good quality assurance is more difficult and expensive
– designing a good graphical user interface is difficult and expensive
– a good GUI is worthwhile, because the easier it is to try more things, the more things you will try
– creative insight is worth a lot more than programming skill

Even commercial software tends to be under-supported, and I’ll be the first to admit that my own programming is as buggy as anybody else’s, but if I’m making life-and-death or world-changing decisions, I want to be sure that I’m not the only one who’s looked at my code, tested border cases, considered the implications of missing values, controlled for underflow and overflow errors, done smart things with floating point fuzziness, and generally thought about any given problem in a few more directions than I have. I want to know that when serious bugs are discovered, the knowledge will be disseminated and somebody’s job is on the line to fix them.

For all these reasons, I temper my sincere enthusiasm about the wide open frontiers of open source products like R with a conservative appreciation for software that has a big company’s reputation and future riding on its accuracy, and preferably a big company that has been in the business long enough to develop the paranoia that drives a fierce QA program.

R is great for what it is, as long as you bear in mind what it isn’t. Your own R code or R code that you find sitting around is only as good as your commitment to testing and understanding of thorny computational gotchas.

I share the apparently-common opinion that R’s interface leaves a lot to be desired. Confidentiality agreements prevent me from confirming or denying the rumors about JMP 9 interfacing with R, but I will say that if they turn out to be true, both products would benefit from it. JMP, like any commercial product, improves when it faces stiff competition and attends to it, and R, like most open source products, could use a better front end.

And now let me make my case for R being an epic success.

I like open source software. I use a bunch of it, and I do what I can for the cause (which isn’t much more than evangelism, unfortunately). For me, the biggest win with open source software is that it makes tools available to me, and others, who don’t need them enough to justify much of a price, but who can benefit from them when they’re affordable or free. When an open source tool gets something done for me, or eases some pain at least, I’m not that picky about its interface, and I’m willing to do my own validation (where applicable).

I can’t say that I love using Linux, but as a long-time UNIX geek and Mac OS X bigot, I am glad Linux is available, I use it for certain things, and I think it’s a whole lot better than Windows and other OSes, especially when Ubuntu builds work out. (I’ve had trouble getting JMP for Linux installed on Ubuntu, but that’s probably due to my own incompetence.) OpenOffice is kind of a pain, but it’s better than paying Microsoft for the privilege of enduring the epic fail that is Office, and it has much better support than Office for import/export of other formats. I love it that any number of open source projects are developing such fabulous tools as bzr version control, which I use daily, and that the FINK project is porting a whole bunch of great open source UNIX widgets to Mac OS X.

I think it’s wonderful that some of the world’s greatest analytical minds are using R to create publicly available routines for power-analysts. I love it that students and people who can’t afford commercial stats software, or who won’t use it enough to justify buying a license, have a high-quality open source option, if they’re willing to work at it a bit. I think it’s great that people who think Excel is good enough can’t make a price objection to upgrading to R.

I believe that democratizing innovation and proliferating analytical competence are good for us all. I count on projects like R and Linux to push commercial developers to make better products, and to force pricing and licensing of those products to remain reasonable. Monopolies are good for nobody, including monopolists.

Long live the proponents of R!

What do you think? Do you trust open source stats code? Do you think R’s interface is good enough? Is JMP’s any better? How heavily do you factor quality of documentation into decisions about software?


Iterative Regression script

Posted on Feb. 9, 2010, under JMP & JSL

“Iterative Regression” is a script that Global Pragmatica LLC® developed for Ron Tanasichuk of the Canadian Department of Fisheries and Oceans, implementing his own iterative regression technique. Performing the analysis by hand is cumbersome and extremely slow. Automating the analysis reduces the processing time from weeks to minutes and eliminates the human error.

The method:

  1. for a single Y response and multiple X factors, fit all possible models having one and two factors (no interactions)
  2. for each model, exclude any outliers, and refit the model
  3. repeat fitting and excluding until there are no outliers for the model
  4. save the results from that model
  5. re-include all rows (unexclude all rows) and begin again with the next factor combination

Outliers are defined as rows where (Abs(Studentized residuals) > 2.5) AND (Y-hats > 4/n).
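To give a feel for the bookkeeping, here is a sketch of two pieces of the method: listing every one- and two-factor model, and applying the outlier rule to one fit. This is not the Iterative Regression script. The factor names and numbers are hypothetical, the model fitting itself is left out, and the vector called yhat simply stands in for whatever quantity the rule above calls "Y-hats".

```jsl
// Step 1: enumerate all one- and two-factor models (no interactions).
factors = {"X1", "X2", "X3", "X4"};   // hypothetical factor names
models = {};
k = N Items( factors );
For( i = 1, i <= k, i++,
	Insert Into( models, factors[i] );   // one-factor model
	For( j = i + 1, j <= k, j++,
		Insert Into( models, factors[i] || " + " || factors[j] )   // two-factor model
	);
);
Show( N Items( models ) );   // k + k(k-1)/2 models, so 10 when k = 4

// Steps 2-3: flag outliers for one fit, given two vectors from that fit.
// Both vectors are made-up numbers for illustration.
studentized = [0.4, -3.1, 2.7, 1.2];
yhat = [0.9, 1.6, 0.5, 2.0];
n = N Rows( studentized );
outliers = {};
For( r = 1, r <= n, r++,
	If( Abs( studentized[r] ) > 2.5 & yhat[r] > 4 / n,
		Insert Into( outliers, r )
	)
);
Show( outliers );
```

In the full method, the flagged rows would be excluded, the model refit, and the check repeated until the outlier list comes back empty; then all rows are re-included before moving on to the next entry in the model list.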

The script is compatible with JMP 7, 8, and 9 and has been thoroughly tested on the Mac versions of JMP 7, 8, and 9. A version that has also been tested on Windows will be available shortly.

Buy now!

Contact me if you’re interested in purchasing this script or commissioning one like it.
