Marker State()s and states of confusion

Posted on May 12, 2010, under JMP & JSL

When I wrote the first JMP Scripting Guide, way back in the dark old days of JMP 4, easily the most difficult chapter to write was the one about data tables, and the most difficult section of that chapter was the one about Row States. I must have spent the better part of a month trying to figure out all the relevant concepts. Not only was JSL still in its gestational period, but I was new to JMP entirely, so I wasn’t just confused about how to work with row states in JSL. I was confused about row states, period.

Slowly I untangled it all, but I had to keep referring to two cheat sheets I’d built for myself, so I decided they’d probably better become a part of the book. These live on in current editions of the JMP Scripting Guide as Table 5.3, “Row states and how they affect JMP’s analyses, charts, and plots,” and Table 5.4, “Operators for converting between numbers and row states.” Then I had to struggle some more to figure out how to use the operators, and on and on. By the time I was done, I still had only a tenuous grip on how to get anything done with row states, but at least I’d gotten to the point where I knew which of my pages to reread and could usually adapt one of my own examples after a few minutes of squinting and swearing.

Fast forward about a dozen years, and I’m programming in JSL full-time for a variety of clients—cool projects for big companies you’ve heard of—but I still cringe when I have to use row state operators. Unless I can refer to another recent project and steal code from myself, I know that I’ve got a good half hour in that darned book and its ridiculously confusing section on row states. I should point out that I haven’t been involved in that book since version 4—since then, it’s been updated by Lee Creighton, Ph.D. (version 5) and Melanie Drake (since version 6). Honestly, I’m not sure whether the section on row states has changed at all, so the fact that I still hate that section and how hard it is for me to read and understand could still be entirely my own fault. I hate that!

In version 8, markers and several other row state operators got a lot more complicated, because JMP 8 introduced marker themes and Unicode markers, among several other improvements by the talented developer Xan Gregg, the guy who brought us Graph Builder. The new features are great, but the already confusing row state operators got even more confusing.

So, when Glenn Donahey at W.L. Gore & Associates posed a question about markers on the LinkedIn group “JMP Professional Network,” I thought, “Great! I’ll answer his question and refresh my own memory while I’m at it.” Here’s his question:

Simple JSL question – assigning custom markers. This is a simple question, but I can’t seem to find the answer. Starting with a populated data table, one can right click on the row, choose Markers >> Custom… and assign a marker such as “D”. What is the analagous command in JSL?

Unfortunately, my first few attempts to bang out the command he needed didn’t work, and then I got busy with something else. Monday morning I made another attempt at it but once again got myself confused, and then the ever-helpful Mark Bailey, Ph.D., popped in with some answers. Meanwhile, I banged my head against some of the less-documented details of the Marker State() operator, and here’s what I figured out.

Either I’m dumber than the average bear…

This could be the problem. It really could.

…or row state operators are really confusing!

You know what, they are. Regardless of the first part—whether I’m some special kind of stupid or not—row state operators are confusing. And now here are the things I learned (again!) that might be helpful to everyone else!

To set one row’s state at a time, use a pinch of Whatever State( rowNumber )

For example, to set row 1 to pink, which is for no particular reason color #11:

Row State( 1 ) = Color State( 11 );

Or to set row 1 to an inverted solid triangle, which is for no particular reason marker #21:

Row State( 1 ) = Marker State( 21 );
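
If you want to check what you just set, here is a minimal sketch of reading the codes back out. It assumes the current data table has at least three rows; Color Of(), Marker Of(), and Selected State() are standard JSL row state operators, and the row numbers are arbitrary picks for illustration.

```jsl
// Sketch only: assumes the current data table has at least 3 rows.

// Set a color on row 1 and a marker on row 2, as in the examples above.
Row State( 1 ) = Color State( 11 );
Row State( 2 ) = Marker State( 21 );

// Other row state operators follow the same pattern.
Row State( 3 ) = Selected State( 1 );

// Read the numeric codes back out of the row states and print them to the log.
Show( Color Of( Row State( 1 ) ) );
Show( Marker Of( Row State( 2 ) ) );
```

Looking at the log after running this is a quick sanity check that the number you asked for is the number JMP stored.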

To use Unicode characters as markers, prepare to sweat

Up until JMP 8, there were a few dozen markers known mysteriously as 1 through 31. These are pictured at right and ought to be familiar to veteran JMP users. JMP 8 introduced a way to exploit a whole mess of Unicode characters as markers—not the entire Unicode character set, unfortunately, but the first 19,560 of them—by using 16-bit numbers as arguments. You have three choices for how to give the argument:

  1. As an integer from 33 to 19,560, inclusive
  2. As a hex number in a string, inside a Hex to Number() operator
  3. As a hex number in a four-digit string escaped with \!u

That makes perfect sense, right? No, me either. Let’s go through these one at a time.

1. Integers

Integers are the easy part. We’ve already seen that 1-31 give JMP’s good old-fashioned marker symbols as seen at right. Number 32 is empty. Not coincidentally, the first 32 positions of the Unicode character set are all empty; JMP’s standard markers fill that gap, and #32 is just empty. After that, you start getting Unicode characters by their positions given in any Unicode character table.

Note: be sure to set your Marker Font() preference to a Unicode font:

// set the Marker Font to be Arial Unicode MS, Lucida Grande,
// or another font containing an extensive set of Unicode characters
If( Host is( Windows ),
  Preferences( Fonts( English( Marker Font( "Arial Unicode MS", 10 ) ) ) ),
  Preferences( Fonts( English( Marker Font( "Lucida Grande", 10 ) ) ) )
);

Whoa! What are the Unicode characters, anyway?

Windows users have a decent tool for browsing Unicode. Go to Start / Programs / Accessories / System Tools / Character Map. For Font, choose a Unicode font such as Arial Unicode MS. To get details for a specific character, click it and look in the lower part of the window.

The Advanced view is marginally more helpful. Searching online for “Unicode character map” yields plenty of useful options.

Mac users have a great built-in tool for browsing Unicode. In System Preferences, choose Language & Text, click the Input Sources panel, check Keyboard & Character Viewer in the list of input methods, and check “Show Input menu in menu bar.” (These are the steps in OS 10.6, Snow Leopard.) Now you have an input menu in the upper right corner of the menu bar. From this menu, choose Show Character Viewer, and you get a handy browser of the entire Unicode character set. (The characters shown are limited to those available in fonts installed on your machine. Mac OS X’s best font for this kind of thing is Lucida Grande.)

So, once you find a symbol you like, these tools tell you which character it is, but they tell you in hexadecimal notation; e.g., the FOR ALL character (∀) is in position 2200 hexadecimal (base 16). Now, just figure out what that is in decimal (base ten) and you’re all set! To do that, you can either grab your favorite programmer’s calculator, or you can use JMP’s own conversion operator to learn that it’s 8704:

show(hex to number("2200"));
Hex To Number("2200"):8704

Armed with this knowledge, you can set FOR ALL as the marker symbol for row 5 this way:

Row State( 5 ) = Marker State( 8704 );
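
The conversion also works in the other direction, which is handy when you have a decimal marker number and want to look it up in a Unicode table. This is a minimal sketch using Hex To Number() and Hex(); the variable names are my own, and the Substr() call assumes that Hex() with the "integer" option returns an eight-digit padded string, so only the last four digits are kept.

```jsl
// Convert a hex position from a Unicode table to the decimal integer
// that Marker State() accepts.
code = Hex To Number( "2200" );
Show( code );

// Going the other way: Hex() with the "integer" option returns a padded
// hex string (assumed to be eight digits), so keep only the last four.
hexString = Hex( code, "integer" );
Show( Substr( hexString, 5, 4 ) );
```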

2. Hex numbers

But it’s kind of silly to work that hard at it. If you’re using Unicode character tables to find symbols, they’re telling you hex numbers, and you might as well just let JMP do the work:

Row State( 5 ) = Marker State( Hex To Number( "2200" ) );

3. Escaped numbers

This too is making it too hard, though. Why not just use the common \u-escaped notation, e.g. \u2200 to mean “Unicode character in position hex 2200”? In this case, our argument needs to be a string, and the usual \u escape needs to be escaped with both a backslash and a bang, like most other special characters for JSL. So what in the rest of the world is “\u2200” becomes “\!u2200” for JSL, for a command like this:

Row State( 5 ) = Marker State ( "\!u2200" );

This isn’t too bad, but if you want to store a bunch of these in a character data table column, you need to do more escaping to get them to play nicely with JSL.

New Column( "unicode",
  Character, Nominal,
  Formula( "\!"\!\\!!u" || Substr( Hex( Row(), "integer" ), 5, 4 ) || "\!"" )
);

What a mess! Let’s break it down. In the middle we ask for row numbers to be converted from integers to hex, but we only take the last four digits by using Substr().

Around the whole string, we need quotation marks, but we can’t just use them directly: the character column gets string values, which are quoted strings by definition, and those surrounding quotes are thrown away when we read the values out of the column. So we need escaped quotation marks, \!", before and after. So far we’ve got, for example on row 8704, "\!"2200\!"". But we also need \!u before the number, which is two more special characters that need escaping in JSL: backslash (\) and bang (!). To get the backslash, we need to use \!\, and to get the bang, we need \!!. The u is the easy part. So add \!\ and \!! and u before the 2200, and you’ve got "\!"\!\\!!u2200\!"".

But it gets worse! Now if you want to ask for markers from those values, you have to parse all that to get it back to “\!u2200”!

I know: “Yikes!”

To put it all together into a useful script, let’s create a data table, give it the 19560 rows we’ll need to see all the supported Unicode characters, and then set marker states For Each Row(). We’ll add a While() in the middle to make sure the data table finishes getting built before JMP tries to run the For Each Row(); if we don’t, we’ll get an error about arguments.

// a data table to see the entire set of standard markers
// and Unicode characters available for markers
New Table( "unicode characters",
  Add Rows( 19561 ),
  New Property( "plot my markers",
    Overlay Plot(
      X( :X ),
      Y( :Y ),
      Separate Axes( 1 ),
      SendToReport( Dispatch( {}, "Overlay Plot",
      FrameBox, {Frame Size( 1600, 1600 )} ) )
    )
  ),
  // character column holding the escaped \!u strings, as in the formula above
  New Column( "unicode",
    Character, Nominal,
    Formula( "\!"\!\\!!u" || Substr( Hex( Row(), "integer" ), 5, 4 ) || "\!"" )
  ),
  // X and Y lay the markers out on a 140 x 140 grid
  New Column( "X", Numeric, Continuous, Format( "Best", 10 ),
    Formula( Sequence( 1, 140, 1, 1 ) ) ),
  New Column( "Y", Numeric, Continuous, Format( "Best", 10 ),
    Formula( Sequence( 1, 140, 1, 140 ) ) )
);

// use While() to see when the data table is finished -
// else next line processes before the data table is built
// and you get error: "Argument should be row State{1}"
while(is missing(:Y[19561]), wait(.0001) );

// see all markers using unicode arguments
For Each Row( Row State( Row() ) = Marker State( Parse( :unicode ) ) );

// the equivalent command, using integer arguments
For Each Row( Row State( Row() ) = Marker State( Row() ) );

To set several aspects of one row’s state, add a teaspoon of Combine States()

For example, to set row 1 to a solid pink inverted triangle, stick both of the previous examples together with a Combine States() operator. This would be handy for a World War II historian plotting deaths in the Holocaust.

Row State( 1 ) = Combine States( Marker State( 21 ), Color State( 11 ) );

Or to set row 1 to a yellow (color #41) star of David (marker #10017), to plot the most horrendous statistics from the Holocaust, you could set:

Row State( 1 ) = Combine States( Marker State( 10017 ), Color State( 41 ) );
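
If several rows need the same combination, one option is to build the combined state once and reuse it. This is only a sketch: it assumes the current data table has at least three rows, and the variable name and row range are hypothetical. It relies on the fact that a row state is an ordinary JSL value that can be stored in a variable.

```jsl
// Sketch only: assumes the current data table has at least 3 rows.

// Build the combined row state once (marker #21, color #11, as above).
combined = Combine States( Marker State( 21 ), Color State( 11 ) );

// Apply the same combined state to rows 1 through 3.
For( i = 1, i <= 3, i++,
  Row State( i ) = combined
);
```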

Some interesting Unicode characters to use as markers

Just scroll through the data table created above to see some Unicode characters that might be particularly useful for JMP graph markers, such as:

  • Greek and Coptic start at \u0370
  • Currency symbols start at \u20A0
  • Hebrew, Arabic, etc. start at \u0590
  • Fractions and roman numerals start at \u2150
  • Box elements and other geometric shapes and dingbats start at \u2500
  • Arrows, math operators, and technical symbols start at \u2190
  • Weather, pointing hand, smiley faces, gender, and astrology symbols start at \u2600
  • Chess pieces, card suits, musical symbols, recycling symbols, dice, monogram, digram, and trigram symbols start at \u2654
  • Braille starts at \u2800
  • Asian language characters start around \u2E80
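
To sample a few of these ranges without building the whole 19,561-row table, here is a small sketch. The table name, column names, and the particular hex positions are my own picks for illustration (each one is a starting position from the list above, plus FOR ALL at 2200); it assumes your Marker Font preference is already set to a Unicode font.

```jsl
// Sketch only: hex positions chosen for illustration from the ranges above.
codes = {"2200", "20A0", "2600", "2654"};

// A tiny table with one row per sample marker; it becomes the current table.
New Table( "marker sample",
  Add Rows( N Items( codes ) ),
  New Column( "X", Numeric, Continuous, Set Values( [1, 2, 3, 4] ) ),
  New Column( "Y", Numeric, Continuous, Set Values( [1, 2, 3, 4] ) )
);

// Convert each hex position to decimal and use it as that row's marker.
For( i = 1, i <= N Items( codes ), i++,
  Row State( i ) = Marker State( Hex To Number( codes[i] ) )
);
```

Plot Y by X from this table to see the four markers side by side.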

Download the script

Don’t bother with copying and pasting from the examples above; you can download a tidy sample script here:

  • [download id=”4″]

What did I forget?

What is still confusing about marker states? Post your questions and comments below, and I’ll do my best to address them in future posts (or in the comments, if they’re not too complex).


The controversy about R: epic fail or epic success?

Posted on Apr. 28, 2010, under JMP & JSL

Statisticians and data analysts are in a kerfuffle about the recent remarks of AnnMaria De Mars, Ph.D. (President of The Julia Group and a SAS Global Forum attendee) in her blog that the open source statistical analysis tool R is an “epic fail,” or to put it in Twitterese, #epicfail:

I know that R is free and I am actually a Unix fan and think Open Source software is a great idea. However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail.

And oh, how the hashtags and comments and teeth-gnashing began!

Nathan Yau’s excellent FlowingData blog recaps the kerfuffle nicely, and both his post and Dr. De Mars’ have accumulated thoughtful comment threads. I added my thoughts to both, and I expand on them here:

To make my prejudices clear, I’ve spent several decades in commercial statistical software development (working in a variety of R&D roles at SYSTAT, StatView, JMP, SAS, and Predictum), and I now do custom JMP scripting, etc., for Global Pragmatica LLC.

I can say with hard-won authority that:

– good statistical software development is difficult and expensive
– good quality assurance is more difficult and expensive
– designing a good graphical user interface is difficult and expensive
– a good GUI is worthwhile, because the easier it is to try more things, the more things you will try, and
– creative insight is worth a lot more than programming skill

Even commercial software tends to be under-supported, and I’ll be the first to admit that my own programming is as buggy as anybody else’s, but if I’m making life-and-death or world-changing decisions, I want to be sure that I’m not the only one who’s looked at my code, tested border cases, considered the implications of missing values, controlled for underflow and overflow errors, done smart things with floating point fuzziness, and generally thought about any given problem in a few more directions than I have. I want to know that when serious bugs are discovered, the knowledge will be disseminated and somebody’s job is on the line to fix them.

For all these reasons, I temper my sincere enthusiasm about the wide open frontiers of open source products like R with a conservative appreciation for software that has a big company’s reputation and future riding on its accuracy, and preferably a big company that has been in the business long enough to develop the paranoia that drives a fierce QA program.

R is great for what it is, as long as you bear in mind what it isn’t. Your own R code or R code that you find sitting around is only as good as your commitment to testing and understanding of thorny computational gotchas.

I share the apparently-common opinion that R’s interface leaves a lot to be desired. Confidentiality agreements prevent me from confirming or denying the rumors about JMP 9 interfacing with R, but I will say that if they turn out to be true, both products would benefit from it. JMP, like any commercial product, improves when it faces stiff competition and attends to it, and R, like most open source products, could use a better front end.

And now let me make my case for R being an epic success.

I like open source software. I use a bunch of it, and I do what I can for the cause (which isn’t much more than evangelism, unfortunately). For me, the biggest win with open source software is that it makes tools available to me, and others, who don’t need them enough to justify much of a price, but who can benefit from them when they’re affordable or free. When an open source tool gets something done for me, or eases some pain at least, I’m not that picky about its interface, and I’m willing to do my own validation (where applicable).

I can’t say that I love using Linux, but as a long-time UNIX geek and Mac OS X bigot, I am glad Linux is available, I use it for certain things, and I think it’s a whole lot better than Windows and other OSes, especially when Ubuntu builds work out. (I’ve had trouble getting JMP for Linux installed on Ubuntu, but that’s probably due to my own incompetence.) OpenOffice is kind of a pain, but it’s better than paying Microsoft for the privilege of enduring the epic fail that is Office, and it has much better support than Office for import/export of other formats. I love it that any number of open source projects are developing such fabulous tools as bzr version control, which I use daily, and that the FINK project is porting a whole bunch of great open source UNIX widgets to Mac OS X.

I think it’s wonderful that some of the world’s greatest analytical minds are using R to create publicly available routines for power-analysts. I love it that students and people who can’t afford commercial stats software, or who won’t use it enough to justify buying a license, have a high-quality open source option, if they’re willing to work at it a bit. I think it’s great that people who think Excel is good enough can’t make a price objection to upgrading to R.

I believe that democratizing innovation and proliferating analytical competence are good for us all. I count on projects like R and Linux to push commercial developers to make better products, and to force pricing and licensing of those products to remain reasonable. Monopolies are good for nobody, including monopolists.

Long live the proponents of R!

What do you think? Do you trust open source stats code? Do you think R’s interface is good enough? Is JMP’s any better? How heavily do you factor quality of documentation into decisions about software?


Putting disasters in perspective, or Our crappy economy isn’t so bad

Posted on Jan. 13, 2010, under JMP & JSL, random

Many people are depressed these days, for many valid reasons. The economy is still a disaster. Many of us are out of work and have been for a frighteningly long time. Many of us are clinging to scaled-back jobs. Many of us are worried about how long the work we’re grateful to have will last. When even the blue chip companies are slashing workforces and budgets and the banks themselves are declaring bankruptcy, we know our economy is a disaster.

Looking outside the devastated economy of the developed world, let’s consider the vastly greater struggles in the two-thirds world.

Terminology break! When people say “third world,” they mean “undeveloped or developing nations,” and these represent over two-thirds of the world’s population, so let’s stop saying that and say what we really mean: “two-thirds world.”

In the news today, hundreds of thousands of Haitians are believed dead after a major magnitude-7.0 earthquake hit, its epicenter right in the most populous part of an already fragile island. Most Haitians are black and live on less than US$1 a day. Putting this in perspective, fewer than 3000 people were killed in the horrifying 9/11 attacks. However, I fear that history will show the great failure of our humanity when the global public response to the crisis gets those metrics backwards.

Because I have spent several decades working in statistical software in various roles, I can’t help wanting to look at the desperation quantitatively. Here are two graphs that will probably startle most people—and, mind you, I mean the well-educated, privileged, mostly white people in the developed world who have the means to read my blog. First, let’s compare the death tolls from a handful of disasters that have filled our headlines in recent years. Before you look at the graph, which do you think was worse?

  • 9/11 terrorist attacks
  • Hurricane Katrina
  • Indian Ocean tsunami
  • Haiti earthquake
  • 2008 Earthquakes in the People’s Republic of China

And how do you think the economies of these places compare?

First, the scale of the disasters. For my North American readers: remember how devastated you felt watching the TV coverage of 9/11 and of Hurricane Katrina, please.

That’s right. The devastation of 9/11 and Katrina combined is trivial compared to any of the others.

Now let’s consider the economies of these places. Most of us know that USA’s wealth dwarfs that of most countries by most measures. A relevant measure for this situation would be the gross domestic product per capita–that is, the total economic output of each state or nation, divided by its number of people.


We all know that New York is wealthier than Louisiana, but did you realize that the New York-Louisiana comparison is almost meaningless in the big picture? Even the difference between those two tall bars dwarfs the size of the bars in the two-thirds world nations!

So now let’s put those two ideas together: let’s look at the wealth in each place, as measured by GDP per capita, lined up with the scale of the disaster in each place.

This composition of the most massive bloodbaths in big red bars lining up directly with the meager economic means of each place in tiny green bars is the most devastating graph of all. The biggest disasters have taken place where people are least prepared to cope with them.

There are many ways to help, and of course there are many craven imbeciles who take this opportunity to scam the people of goodwill with fraudulent donation methods. Here are some ways that have been vetted and determined to be reliable:

Here are some flaws in my analysis that could distract nitpickers from the clarion call to our humanity:

  • My national and state GDP data are from different years and sources, and they’re probably inflation-adjusted differently.
  • I’m considering these events to have taken place in New York, Louisiana, Indonesia, China, and Haiti, where the most deaths occurred, although other states and nations were affected.
  • The costs of 9/11 and Katrina were borne nationally, but the victims were (mostly) local, so I considered the state economies instead of the national economy.
  • Estimates of the death tolls in the two-thirds world are always much fuzzier, because the poorer you are, the less likely you are to be accurately counted.
  • Estimates of the death toll in Haiti are wildly premature. Some sources say “hundreds of thousands,” and while they might mean “100,000 give or take a few 10,000,” a careful speaker would mean the far more frightening “100,000 or 200,000 or 300,000” by that description.
  • It’s a little weird to measure ability to recover by comparing the GDP per person to the number of persons dead. The dead people are dead, and no amount of money will help them. But the people left behind are living in economies that are more or less capable of recovering.
  • These data are confounded, if you consider that poorer nations have a lesser ability to build safety into their communities. Wealthier nations have higher survival rates in times of disaster because their buildings are sturdier, more of their citizens live in buildings in the first place, their bridges and roads and so on are more prevalent and higher quality, their emergency responders are more numerous and better-equipped and -funded, and on and on and on. The ways in which wealth mitigates disaster and the lack of wealth compounds disaster are numerous and heartbreaking.

My data sources:

The analysis was my own, and I prepared all the graphs using JMP’s Graph Builder.
