Tag: JMP & JSL

Is South Carolina lazy? Um… no, not so much!

by on Aug.30, 2010 , under JMP & JSL

We recently had the priv­i­lege of help­ing out the good folks at Moore Data LLC with a SAS data import project. Scott Moore had noticed an arti­cle in the Post and Courier claim­ing that South Car­olina was the eighth lazi­est state in the US. He smelled a rat, though—as he put it, “Typ­i­cally, sub­jec­tive words used to describe data pop a red flag that warns me of impend­ing data mis­use doom.”

With a lit­tle SAS import­ing help from Global Prag­mat­ica LLC, Scott was able to peruse the data him­self in JMP, and he reached a rather dif­fer­ent conclusion!

Read Scott’s rebut­tal here.

Leave a Comment :, more...

How to parse a tricky date string

by on Jul.28, 2010 , under JMP & JSL

Recently on LinkedIn, Asit Rairkar asked about how to parse a tricky date string into a numeric date col­umn in JMP. He has date/time data as char­ac­ter data, but unfor­tu­nately it’s not for­mat­ted in one of the many date for­mats that JMP sup­ports directly. He wrote:

I have found that infor­mat does not work with all sorts of texted dates. e.g. I have had some tables with date codes as YYMMDDHHMMSS and I have had to man­u­ally seper­ate the 2 dig­its each and com­bine them again into YY/mm/dd hh:mm:ss for­mat to feed in. Not sure if there is a bet­ter way to deal with this type of input.

Writ­ing JSL to han­dle this sit­u­a­tion is con­cep­tu­ally sim­ple, but get­ting the JSL just right is tricky. The for­mat JMP sup­ports that comes clos­est to this for­mat is “yyyy-mm-ddThh:mm:ss,” so that’s what I decided to work with, but over­com­ing the lit­tle dif­fer­ences between JMP’s for­mat and Asit’s string takes some work. We have to doc­tor up the input string so that it matches the for­mat JMP can han­dle. We’ll need to:

  1. change the two-digit years to four-digit
  2. add the “-” delim­iter between year, month, and day
  3. add the “T” delim­iter between date and time
  4. add the “:” delim­iters between hours, min­utes, and seconds

Once we’ve got­ten the input string to match a JMP-supported for­mat, we put that inside Num() as the Formula() for a new numeric col­umn. We also have to spec­ify one of the date/time for­mats for the col­umn, and to keep things sim­ple, I’ll just use the “yyyy-mm-ddThh:mm:ss” for­mat again, but you could choose any format.

Here’s how it works

Sup­pose your data table looks like this:

You can gen­er­ate my exam­ple by run­ning this script:

dt = New Table( "tricky times",
  Add Rows( 4 ),
  New Column( "event",
    Character,
    Nominal,
    Set Values(
      {"Erin starts researching Asit's question", "Erin starts this blog post",
      "Erin wants a beer", "This blog post better be done"}
    )
  ),
  New Column( "when",
    Character,
    Nominal,
    Set Values( {"100726174324", "100728154315", "1007281730", "110728154315"} )
  )
)

So our first pro­gram­ming task is to con­vert a string like "100726174324" to a string like "2010-07-26T17:43:24", by per­form­ing the steps listed above. We could prob­a­bly do this most effi­ciently with a clever pattern-match or reg­u­lar expres­sion state­ment, but both of those strate­gies present some dis­ad­van­tages for a casual scripter:

  • they’re hard to learn, and it’s prob­a­bly too much trou­ble given how sel­dom a casual scripter needs to do this sort of thing
  • they’re hard to read and deci­pher later—somebody who’s got a lot of prac­tice at this kind of thing can write the pat­terns or reg­u­lar expres­sions fairly eas­ily and quickly, but fig­ur­ing out a pat­tern or reg­u­lar expres­sion that some­one else wrote (or that you your­self wrote a week ago) is much harder
  • they’re hard to debug
  • they’re hard to “bor­row from” later on when you encounter a prob­lem that’s slightly different

For these rea­sons, I’d rather solve this prob­lem with a for­mula that is per­haps inef­fi­cient but is easy for me to read and easy for me to adapt later on. It won’t run as fast, but the time I really want to save is my own—I don’t care so much, in this case, if JMP has to work a lit­tle harder. So, let’s take it one step at a time.

1. change the two-digit years to four-digit

This is poten­tially tricky. Are all of the years in this cen­tury, e.g. 20-something? Or do we have a mix of cen­turies? If they’re all the same, it’s easy. We just Concat()enate a “20” in front of the two year dig­its we do have. I like the short­hand || oper­a­tor bet­ter than the func­tion Concat(). We get those two year dig­its by tak­ing a Substr()ing of :when start­ing on the first char­ac­ter and con­tin­u­ing for two characters:

"20" || Substr( :when, 1, 2 )

If they’re a mix­ture of cen­turies, we’ll need to add some logic for guess­ing the cor­rect cen­tury. Using a tran­si­tion point is one way to do it. For exam­ple, if none of the dates are expected to be in the future or more than a hun­dred years in the past, we can safely assume that any­time the yy por­tion of the string is greater than the cur­rent year, we should prepend “19” rather than “20.” To do that, we need to com­pare the string’s yy, Substr( mydate, 1, 2 ), against today’s yy. But what’s today’s yy? Start by get­ting today, as a year: Year( Today() ) is the numeric value 10. Now con­vert that to a string with Char() so that we can use Substr() to grab just the last two characters:

Substr( Char( Year( Today() ) ), 3, 2 )

Now do the com­par­i­son inside an If(), and make the result "19" when yy is greater than the cur­rent year and "20" oth­er­wise. Concat() that result onto the yy:

If( Substr( mydate, 1, 2 ) > Substr( Char( Year( Today() ) ), 3, 2 ),
  "19",
  "20"
) || Substr( mydate, 1, 2 )

Whew! For­tu­nately the next few steps are easy.

2. add the “-” delim­iter between year, month, and day

We’ve already got the yyyy piece. Now we use Substr(:when, x, 2) a few more times with x being the first char­ac­ter of the mm and dd pieces. We insert a string "-" between each piece, stick­ing it all together with ||s:

 || "-" || Substr( :when, 3, 2 ) || "-" || Substr( :when, 5, 2 )

3. add the “T” delim­iter between date and time

That’s easy:

|| "T" ||

4. add the “:” delim­iters between hours, min­utes, and seconds

Same old same old! Just more Substr() and || action.

Substr( :when, 7, 2 ) || ":"
|| Substr( :when, 9, 2 ) || ":"
|| Substr( :when, 11, 2 )

Put it all together!

Those four steps have pro­duced a string for­mat­ted like "2010-07-26T17:43:24", and if we put all that inside Formula( Num( ) ) for a New Column com­mand, we’ll be cre­at­ing numeric date values.

New Column( "time",
  Format( "yyyy-mm-ddThh:mm:ss" ),
  Formula(
    Num(
      // convert what you have to something like "2010-07-26T17:43:24"
      // assume dates only in past, within last 100 years
      If( Substr( :when, 1, 2 ) > Substr( Char( Year( Today() ) ), 3, 2 ),
        "19",
        "20"
      ) || Substr( :when, 1, 2 ) || "-" || Substr( :when, 3, 2 )
        || "-" || Substr( :when, 5, 2 ) || "T" ||
      Substr( :when, 7, 2 ) || ":" || Substr( :when, 9, 2 )
        || ":" || Substr( :when, 11, 2 )
    )
  )
);

Now we have:

Notice that this tech­nique is even for­giv­ing of strings that are too short. I inten­tion­ally left off the ss part of a few of those strings, but since Substr(:when, 11, 2) just pro­duced an empty string "", the for­mula still worked. That’s an advan­tage we’ll lose in the next strategy.

Uh-oh! There’s a prob­lem in that last row! When I wrote my exam­ple time strings, I was think­ing of 2011, but our algo­rithm above assumes that all data are from the cur­rent year or ear­lier, so that date got con­verted to 1911 instead. My event descrip­tion needs improvement.

What if the dates are not in a data column?

Sup­pose the date value strings aren’t in a char­ac­ter col­umn of a data table—suppose they’re just string val­ues. In that case, we can use the In Format() func­tion to con­vert them. The guts of it are still the same, but this time we’re pars­ing a string stored in a global called mydate instead of rows in a column.

mydate = "100726174324";
mydateNum=Informat(
  // convert the string to "2010-07-26T17:43:24",
  // assume dates only in past, within last 100 years
  If( Substr( mydate, 1, 2 ) > Substr( Char( Year( Today() ) ), 3, 2 ),
    "19",
    "20"
  ) || Substr( mydate, 1, 2 ) || "-" || Substr( mydate, 3, 2 ) || "-"
    || Substr( mydate, 5, 2 ) || "T" || Substr( mydate, 7, 2 ) || ":"
    || Substr( mydate, 9, 2 ) || ":" || Substr( mydate, 11, 2 ),
  "yyyy-mm-ddThh:mm:ss" // or another format
);

Run­ning this script returns the numeric value 26Jul2010:17:43:24 and shows that in the log. Notice, how­ever, that when mydate = "1007281730";, miss­ing the ss part of the string, it doesn’t work, and we get a miss­ing value. Our ugly hand-made parser is actu­ally more for­giv­ing than JMP’s In Format() com­mand. But it’s not much better—if we were miss­ing any other part of the string than the part at the end, we’d get mixed-up results.

What pesky date, time, or dura­tion strings have vexed you?

The good folks at JMP keep adding sup­port for more and more date/time/duration for­mats, but it seems like there’s a never-ending sup­ply of crazy for­mats that ana­lysts encounter. Typ­i­cally the strings are com­ing into JMP from other pro­grams and from data-streaming devices, but they might also arrive when peel­ing data off web pages or deal­ing with data entry per­formed by other people.

What are the crazy for­mats that you’ve had to crunch into numeric val­ues in JMP?

What are the tricks you’ve devel­oped for crunch­ing them?

Please share your expe­ri­ences in the Com­ments! I’ll try to answer any ques­tions you leave there, too.

3 Comments :, , more...

How to change string columns to numeric date columns

by on Jul.22, 2010 , under JMP & JSL

Sudar­shan Krish­nan asked a ques­tion on the LinkedIn JMP Pro­fes­sional Net­work about how to change a string col­umn into a numeric date column:

What is the JSL com­mand to con­vert a char­ac­ter data type (“hr:m:s” –eg:“23:59:59″) to a Numeric, Con­tin­u­ous, dura­tion (“hr:m:s”) data type?

Dates and times in a char­ac­ter column

Let’s assume you have a data table includes date, time, or dura­tion val­ues as strings, in char­ac­ter columns. You might run into this sit­u­a­tion fre­quently when import­ing data from text files or from other pro­grams such as Microsoft Excel. The columns might look like this: Here’s a sim­ple JSL script you can run to cre­ate a data table like this one:

New Table( "sample",
	Add Rows( 4 ),
	New Column( "timeChar A",
		Character,
		Nominal,
		Set Values( {"23:59:59", "10:14:01", "22:01:03", "08:01:05"} )
	),
	New Column( "timeChar B",
		Character,
		Nominal,
		Set Values( {"23:59:59", "10:14:01", "22:01:03", "08:01:05"} )
	)
);

Con­vert date/time strings to numeric

You have two options for how to han­dle this, depend­ing on what your goals are. You can either make a new, numeric col­umn that con­verts the strings to date/time val­ues using a for­mula, or you can change the char­ac­ter col­umn directly into a numeric col­umn. Let’s try it each way.

Option A: make a new numeric column

To make a new numeric copy of the orig­i­nal col­umn, use the New Column() oper­a­tor with the argu­ments spec­i­fy­ing the new column’s name (“timeNum A”), data type (Numeric), the new for­mat desired (in this case “hr:m:s”), and a for­mula. The for­mula is sim­ply the orig­i­nal col­umn, :timeChar A, inside the Num() operator.

New Column( "timeNum A",
	Numeric,
	Format( "hr:m:s" ),
	Formula( Num( :timeChar A ) )
);

Now you have the orig­i­nal string col­umn, unchanged, plus a new numeric copy of that col­umn. Note that the new col­umn is linked to the orig­i­nal with a for­mula, so if you want to delete the orig­i­nal col­umn, you’ll get an error:

Can­not delete col­umn “timeChar A” while it is ref­er­enced by the for­mula of col­umn “timeNum A”. Remov­ing a for­mula leaves the data unchanged. Remov­ing ref­er­ences replaces each ref­er­ence with an empty value. These effects are per­ma­nent and can­not be undone.

Just click “Remove For­mula” to pro­ceed. Of course, if you no longer want the orig­i­nal char­ac­ter col­umn, you might as well go with Option B instead.

Option B: change a char­ac­ter col­umn to numeric directly

To change an exist­ing char­ac­ter col­umn into numeric, just send a series of mes­sages to the col­umn, first chang­ing the type to numeric, then spec­i­fy­ing a for­mat. You might also want to change the mod­el­ing type to con­tin­u­ous, and you might want to change the column’s name.

Column( "timeChar B" )
	<< data type( numeric )
	<< Format( "hr:m:s" )
	<< modeling type( continuous )
	<< set name( "timeNum B" );

After

Now the data table with two char­ac­ter columns should have one char­ac­ter col­umn, a sec­ond col­umn that was char­ac­ter and has been changed to numeric, and a third col­umn that is a numeric copy of the first col­umn:

1 Comment :, more...

One Proportion Test (99¢ JSL widget!)

by on Jul.21, 2010 , under scripts, widgets

One Pro­por­tion Test is a JMP script for per­form­ing a One Pro­por­tion Test quickly and easily.

Back­ground

This script is inspired by a recent ques­tion on the LinkedIn Group “JMP Pro­fes­sional Net­work” from Jen­nifer Atlas, Senior Busi­ness Devel­op­ment Coör­di­na­tor at Minitab, who asked:

I know I can cal­cu­late a sam­ple size for a one pro­por­tion test in JMP, but how do I test for 1 proportion?

Karen Copeland, Ph.D., from Boul­der Sta­tis­tics promptly gave a help­ful expla­na­tion of how to do it in JMP. JMP han­dles the prob­lem quite capa­bly, but you have to know where to find it and remem­ber a bunch of details about how to use it:

To test one pro­por­tion in JMP I use the dis­tri­b­u­tion plat­form. When you run a dis­tri­b­u­tion for a nom­i­nal vari­able then in the red drop down menu there is an option for test­ing probabilities.

Details:

If you have a data table with a col­umn for your pro­por­tion then you can pro­ceed straight to the dis­tri­b­u­tion plat­form. If not then first cre­ate a sum­mary table with two columns. The first col­umn would be your out­come such as Y/N or 0/1 and the he sec­ond col­umn would be your fre­quency col­umn of how many of each out­come you observed.

Test­ing of a proportion:

1. With a col­umn of out­comes: First make sure that your col­umn is of the nom­i­nal type. Sec­ond, use the dis­tri­b­u­tion plat­form to cre­ate a dis­tri­b­u­tion of the out­come. Under the red tri­an­gle drop down menu select “Test Prob­a­bil­i­ties” and a dia­log box will appear with var­i­ous options for test­ing your prob­a­bil­i­ties. Note, you need only fill in the one prob­a­bil­ity that you are inter­ested in testing.

2. With a sum­mary table for your out­comes: Again, make sure that your out­come col­umn is of the nom­i­nal type. Sec­ond, use the dis­tri­b­u­tion plat­form with your out­come col­umn as the Y and the fre­quency col­umn as the “Freq”. Then con­tinue as above by select­ing the test prob­a­bil­i­ties from the red tri­an­gle menu.

Note that you will also find con­fi­dence inter­vals for your pro­por­tions in the red drop down menu.

Boul­der Sta­tis­tics LLC and Global Prag­mat­ica LLC are strate­gic allies, pair­ing Boul­der Sta­tis­tics’ ana­lyt­i­cal exper­tise with Global Pragmatica’s JMP script­ing exper­tise to deliver out­stand­ing solu­tions to our clients. When I saw Karen’s expla­na­tion, I imme­di­ately thought this would be a great oppor­tu­nity to col­lab­o­rate with her and build a JSL wid­get to make this easier.

This script is the result, and Global Prag­mat­ica and Boul­der Sta­tis­tics are proud to make it avail­able free, in an encrypted (run-only) script you can down­load today.

How to use One Pro­por­tion Test

You must license or down­load a demo copy of JMP soft­ware from SAS to use this JMP add-on script.

Launch the script. You are asked to choose a data table, which can be set up either of the two ways Dr. Copeland describes above. A dia­log box requests the nec­es­sary col­umn assign­ments. Click OK. At the bot­tom, fill in the details of the One Pro­por­tion Test. Use the Start Over but­ton to restart the analy­sis with a dif­fer­ent data table. For fur­ther help, see text at the top of the win­dow and tooltips when hov­er­ing over buttons.

How encrypted scripts work (the free widget)

Encrypted scripts are run-only scripts. They are not human-readable, so you can’t mod­ify them or adapt them for other pur­poses. If you would like to study the JSL, mod­ify, or adapt the script, you should license the unen­crypted ver­sion of this script instead.

Why pay for the unen­crypted script?

Don’t! At least not right away… Start by “buy­ing” the free wid­get. Try it out and see if you like it!

If you do, you might want to buy the full-price unen­crypted script, so that you can mod­ify, cus­tomize, or adapt the script for your own spe­cial­ized needs, or so you can study a professional-quality script! This script demon­strates quite a few advanced script­ing tech­niques, including:

  • build­ing an ele­gant, all-in-one-window user inter­face using dis­play objects
  • attach­ing scripts to but­tons, radio but­tons, col­umn lists, etc.
  • includ­ing a cus­tom logo graphic
  • imple­ment­ing a dif­fer­ent dia­log box accord­ing to the user’s radio but­ton choice
  • open­ing, clos­ing, and delet­ing dis­play tree ele­ments dynamically
  • offer­ing tips and help right in the window
  • includ­ing email and web links for more infor­ma­tion, send­ing feed­back, etc.
  • hid­ing globals—whether to pro­tect intel­lec­tual prop­erty or to avoid clut­ter­ing up Show Globals() output
  • opti­miz­ing mem­ory man­age­ment and sim­pli­fy­ing between-use value-clearing by stor­ing problem-specific “glob­als” as entries in an Associative Array() instead of in globals

You can read more about encrypted vs. unen­crypted scripts offered by Global Prag­mat­ica here.

Com­pat­i­bil­ity

One Pro­por­tion Test has been tested on JMP 8 for Mac and Win­dows as well as cur­rent beta ver­sions of JMP 9 for Mac and Windows.

Known issues

  • In the sec­ond (no longer cur­rent) early adopter release of JMP 9 for Win­dows, you have to start the script man­u­ally. (This is not a prob­lem in JMP 8 or later beta releases of JMP 9.) There are sev­eral ways to do this:
    • press Ctrl-R, or
    • click the Run Script but­ton in the tool­bar, or
    • from the Edit menu, select Run Script

Buy now!

Run-only wid­get: $0.99
Editable script: $500.00

Note: be sure to choose either the $0 (wid­get) or $500 (script) option in the combo-box. For $0 orders, you can use the “No pay­ment needed” pay­ment method; for orders over $0, please use Google Check­out, a secure pay­ment gate­way for pur­chas­ing by credit card.

3 Comments :, more...

Should JMP scripts exploit data objects? Yes and no.

by on Jul.21, 2010 , under JMP & JSL

Answer­ing a LinkedIn group ques­tion about object ori­ented pro­gram­ming in JSL from Philip Brown, the always help­ful Mark Bai­ley wrote, “The best scripts exploit JMP objects such as data columns and platforms.”

I don’t quite agree.

In my opin­ion, Mark’s both right and wrong in say­ing that the best scripts exploit JMP objects like data columns and plat­forms. Yes, if a JMP data object or analy­sis plat­form already knows how to do some­thing you need, you shouldn’t be rein­vent­ing that wheel. BUT–and this is a big but–JMP’s data tables and their sub-objects (columns, rows, etc.) come with a heavy over­head cost, and this is where I think it’s some­times bet­ter to avoid using JMP’s object.

It all depends, of course. If you just need a quick result, by all means, let JMP do the work for you.

But if you need to do some­thing com­pu­ta­tion­ally inten­sive, you’re far bet­ter off grab­bing just what you need in a JSL data structure–a vec­tor, a matrix, a list, an asso­cia­tive array, what­ever is appropriate–and doing that com­pu­ta­tion out­side the data table. You’ll speed up your code, reduce the mem­ory con­sump­tion, and even avoid difficult-to-chase-down crashes.

Don’t for­get that most of JMP’s for­mula oper­a­tors can work on things other than data columns, too. I recently worked on a client’s script that was doing its own cal­cu­la­tions of means on lists of num­bers by adding up the val­ues, count­ing the num­ber of val­ues, and divid­ing the sum by that. It was cor­rect, of course, but it was SLOW. I got the script to run about twice as fast by using JMP’s column-wise mean oper­a­tor, Col Mean(), on data columns instead, and then I got it to run five times faster still by using JMP’s reg­u­lar Mean() oper­a­tor on vec­tors instead of data columns.

Why did this change give me a ten­fold speed improve­ment? Two reasons.

  1. JMP’s inter­nal cal­cu­la­tion of means is more effi­cient than any­thing we can build in JSL—JMP’s devel­op­ers have always opti­mized their code so that JMP’s numer­i­cal com­pu­ta­tions are done lickedy-split.
  2. Those cal­cu­la­tions hap­pen much faster when you don’t run them through the data table, which is a com­plex object with a lot of inter­nal and exter­nal dependencies—and a big, fancy win­dow that needs to be repainted when­ever the data change. Window-painting alone can cause sig­nif­i­cant script slowdowns.

Another more impor­tant rea­son to use JMP’s oper­a­tors is that JMP’s devel­op­ers have already thought through pesky details like the proper han­dling of miss­ing val­ues, under­flow and over­flow errors, and other arcana of numer­i­cal com­pu­ta­tion that can cause reasonable-looking cal­cu­la­tions to get wrong results.

I’m not bash­ing data tables, by the way. It’s amaz­ing what JMP’s data tables can do for us, with all their table and col­umn prop­er­ties, row states, for­mu­las, dynamic links to graphs, and so on. All that power comes at a cost, though, and basic numer­i­cal com­pu­ta­tions will go faster when they’re not manip­u­lat­ing com­plex data objects.

Bot­tom line: if you’re tak­ing advan­tage of the rich fea­tures in JMP’s data tables, use data tables, but if you’re just doing some cal­cu­la­tions, and speed is an issue, then do the cal­cu­la­tions out­side the data table. In either case, take advan­tage of JMP’s built-operators as much as possible.

Leave a Comment :, more...

Artwork

Global Pragmatica’s art­work includes paint­ings by Zsuzsi Saper and dig­i­tal pho­tographs by Erin Vang. Fur­ther notes on spe­cific pieces of art are given at the bot­tom of pages in which they appear. All art­work is copy­right 2009–2010 by Global Prag­mat­ica LLC®. All rights reserved worldwide.

© 2009–14 Global Pragmatica LLC®

All con­tent © 2009–14 by Global Prag­mat­ica LLC®. All rights reserved worldwide.

Global Prag­mat­ica LLC® is a reg­is­tered trade­mark of Global Prag­mat­ica LLC. The ® sym­bol indi­cates USA trade­mark registration.

Contact Global Pragmatica LLC®

info@globalpragmatica.com
+1 415.997.9671
Oak­land, CA 94611