JMP & JSL
Lessons from smart JSL mistakes
by Erin Vang on Jan.11, 2011 , under JMP & JSL
This is a follow-up to my previous post, “A script to check and change a JMP preference ($2).”
Background
Marianne Toft asked the JMP scripting group on LinkedIn a question about managing preferences with JSL:
Does anybody know how to read a specific setting from the Preferences? I need to read the status of ‘Use JMP locale settings’ and if 0 then change to 1 and give the user a message to restart JMP.
A $2 solution
I wrote a $2 script that solves the problem, and I sent Marianne a coupon code for a free copy, in thanks for posting such a thought-provoking JSL challenge.
My strategy is described in detail in that previous post, if you’d like to try writing it yourself with a bunch of pointers.
It’s also available as a $2 download if you’d rather just get the final code, fully commented and tested on JMP 8 and 9 on both Windows and Mac. One reason to consider buying this script instead of a tall drip is that the script is abstracted to check and change any preference, not just Use JMP Locale Settings( 0|1 ). That abstraction reveals some scripting tricks that might be helpful in your other scripting projects.
I’ll be delighted if someone else comes up with a more elegant solution, because mine is brute force. That’s because JMP doesn’t return the current preferences as an expression or a string or anything else you could potentially operate on with JSL. It just echoes the preferences to the Log window, so you need to read the text of the Log window and deal with that. If I’m missing something here, someone please enlighten me!
Several smart mistakes and lessons to learn from them
Several other people wrote helpful suggestions. Unfortunately, their code didn’t work, and we can all learn some important things about JSL by studying their smart mistakes.
Mark Anawi suggested using Show Preferences(Use JMP Locale Settings) to see the setting for just that preference. It’s a good guess at an argument for the Show Preferences() command, but it’s not actually a supported argument. It doesn’t work. You’ll just get the usual output for Show Preferences() without an argument.
Lesson: when you supply an argument JMP doesn’t understand, sometimes you get errors, but a lot of the time it’s just ignored, silently. You get no indication that you did something that doesn’t work. The only way you can be sure it was ignored is to try it again without the argument, or with a different argument, and check whether anything changed.
Peter Wiebe wrote an elegant solution that uses a For() loop to step through each Arg() of an expression, looking for and if necessary changing the preference in question. His code is concise and beautiful:
jmp_rulz = getpreferences();
For( i = 1, i <= N Arg( jmp_rulz ), i++,
	If( Contains( Char( Arg( jmp_rulz, i ) ), "Use JMP Locale Settings(0)" ),
		Preferences( Use JMP Locale Settings( 1 ) );
		New Window( "Restart JMP", Panel Box( "Restart JMP", Button Box( "OK", << closewindow ) ) )
	)
);
His code is far more elegant than mine. It’s an efficient strategy! Unfortunately, it doesn’t work, for several reasons.
Get Preferences() isn’t a command in JMP 8 or 9, so the first line returns an error:
Name Unresolved: get preferences in access or evaluation of 'get preferences' , get preferences()
In the following script, error marked by /*###*/
::jmp_rulz = get preferences() /*###*/
I assume that was just a typo, though. Peter meant Show Preferences(), but as I warned at the outset, that command doesn’t return an expression. It just echoes text to the Log window. So, after running the first line (with “get” changed to “show”), jmp_rulz contains . (missing).
Therefore, N Arg(jmp_rulz) is 0, and i, which starts at 1, is never <= 0, so the For() loop never runs.
I like Peter’s script better than mine. It should have worked: Show Preferences() really ought to return an expression so that his For() loop can step through it and look for the specific preference in question. Unfortunately it doesn’t. I hope our heroes in JMP R&D will consider improving this in a future version.
Why did Peter think this worked? Probably because when he ran it, he didn’t get any error messages. I didn’t either, and at first I was trying to figure out why his script worked when I’d already proven to myself that Show Preferences() doesn’t return anything into the jmp_rulz global.
He probably also checked the Log window after running it to confirm that the preference was on, saw that it was, and figured that it must have worked. He wouldn’t have expected to see his “Restart JMP” window unless the setting was off to begin with, so he probably figured it had already been on.
Peter’s a clever scripter, by the way—whenever he posts something on LinkedIn, I make sure to read it, because I know I’ll learn something.
Lesson: Not getting error messages in the Log window doesn’t mean your script works.
Why should we care more than “Oops!”?
OK, we all make mistakes, especially when we’re writing JSL. (We do, right? Please tell me it’s not just me!) No big deal.
So why am I writing about these mistakes? Believe me, I’m not doing this to make anyone feel bad! They were both great suggestions, and they looked like they worked. Both Mark and Peter had good reasons to think they were correct.
I see some important lessons for JMP scripters.
Be aware that JMP has a lot of silent error conditions
Mark’s command Show Preferences(Use JMP Locale Settings) didn’t produce any error messages, and it did echo the current settings to the Log, so it was reasonable for Mark to assume that putting a specific preference as an argument to the command was a valid syntax variation that would show the setting of that specific preference.
This is a type of mistake I make all the time: I come up with a syntax improvement in my imagination or in my failing memory, one that isn’t real, and it seems to work, so I think it is real.
But it’s not. There are a lot of situations where JMP is silent about errors in JSL. Sending an unrecognized argument to a command is a typical example. Another is when you put something that needs to be evaluated—say, an arithmetic expression to set a parameter from a global—inside an object message or platform launch command, and it turns out that the particular message or command you’re using is not actually capable of evaluating its arguments. (JMP is inconsistent that way, unfortunately. Sometimes you can get away with it, sometimes you can’t, and you can only find out by testing.)
Be sure you know which kinds of errors are silent (and deadly)
This article would go on for a hundred pages if I tried to list every possible error and whether JMP will report an error message or not, but here are some of the most important things to know:
Some typical mistakes that usually produce error messages:
- Operators that don’t exist, e.g. Until() instead of While()
- Globals that don’t exist or haven’t been assigned values yet (or have had their values cleared with Clear Globals())
- Using = instead of == as a conditional test, e.g. If(a=1, write("a"), write("not a")) would assign 1 to a, but JMP alerts you to the fact that you probably meant to test whether a equals 1 with a==1
Some typical mistakes that do not produce error messages:
- Messages that don’t exist, e.g. Distribution[1]<<close; instead of Distribution[1]<<close window; just silently pretends to run, returning the name of the object (in this example, Distribution[]) as if it had done something to it
- Arguments that don’t exist, e.g. Show Preferences(Initial Splash Window) instead of Show Preferences(All) simply runs as if you hadn’t included the unrecognized argument
- Scoping mistakes, e.g. inside Formula(), any variable that you don’t explicitly scope, such as by putting :: before a global’s name, is going to be interpreted as a column name, and unless that leads to an error condition or some kind of unexpected behavior that you notice, you will never realize your mistake
- Conditional mistakes, e.g. For() loops or other loops that never need to iterate simply don’t iterate, and they don’t tell you that they haven’t, unless you write some code to report on the iteration, e.g. you could set some global inside the For() loop and then check the global after the For() loop
Is it a bug when JMP doesn’t report an error?
Not necessarily.
In the case of invented messages not being flagged, my opinion is yes, that should be reported as an error. It’s too dangerous to think you’re accomplishing something and never realize JMP’s just ignoring you. But consider what the JMP developers are up against: JSL is so flexible that you can send lists of messages like obj << {mesg1, mesg2, …, mesgN}, and you can send messages to an obj << mesg group, and that’s just for starters. It would be hard to make the error-checking and reporting bulletproof.
What about invented arguments? Well, again, I’d say yes, if JMP can’t interpret something, it should say so. But again, consider what the developers are up against: most, but not all, commands in JSL can take arbitrarily complex expressions inside their parentheses. For example, myList={}; InsertInto(myList, 4) is a straightforward way to put something in a list, but you could also have it calculate the thing it needs to insert, e.g. myList={}; InsertInto(myList, 1+3). That calculation could be incredibly complex, involving If() tests and For() loops and all kinds of insanity. Any number of things could go wrong inside those calculations. How is JMP’s InsertInto() supposed to know that the problem is you made up an argument and not that you attempted a calculation that didn’t work out?
What about unscoped, mis-scoped, or insufficiently scoped variables? Hard to say. The name-resolution and scoping details of JSL keep improving with each new version of JMP, so that careful scripters can get better and better control of these kinds of problems, but for most scripters (and I include myself), the bigger problem is that we don’t realize when we’re not scoping carefully enough. For example, inside a Formula expression, any variables that aren’t scoped otherwise will get interpreted as operators (if one by that name exists) or as columns (if one by that name exists). If either of those exist, then the global you thought you were calling is not going to come into play. For example:
x=1; column("Age Next Year") << set formula(x + :Age)
That should add 1 to each row’s Age value and put it in the Age Next Year column, right? Right. As long as you don’t also have a column named “x”! If you do, it’s going to add the value in the Age column to the value in the x column, row by row, and put the result in the Age Next Year column.
Or what about this:
For=1; column("Age Next Year") << set formula(For + :Age)
Will that work? Well, if you want to play along, open good old Big Class.jmp, create an Age Next Year column, and try these two scripts. First try the one with x=1. That’ll work fine since Big Class doesn’t have an x column. Now try the one with For=1.
What happens?
Do you think it worked? Well, you might—after all, the Age Next Year column has the right answers in it, and you didn’t get an error message in the Log.
But did it? Let’s check more carefully. Change the first bit to For=5 and run it again. Do you get values like 17, 18, 19 as you should?
No!
Do you get error messages?
No!
What’s going on? Remember, For is an operator! JMP sees a For() operator even though its () parentheses and arguments aren’t there. Since For has no arguments, it doesn’t do anything, and then it can’t add :Age to it because there’s no “it” to add it to, so the column is unchanged. JMP doesn’t think anything is wrong, because it didn’t see anything it couldn’t interpret. It knows For()! It just didn’t realize that by For what you really meant was 5!
The first time, when For=1, we didn’t realize the problem, because the Age Next Year column already had the answers we expected and we didn’t have any way of knowing they didn’t change because of an error rather than because the answers were the same. The second time, when we were expecting different answers, we had a chance of noticing our mistake, but only if we checked those answers!
What do you do if you actually do want to use a global that shares a name with an operator or a column? You scope it. Change it to ::For inside the Formula(), and you’re back in business.
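Here’s a minimal sketch of the scoped version of the x example. It assumes the Big Class sample table can be opened through the $SAMPLE_DATA path variable and that the table doesn’t already have an Age Next Year column; the column-creation line is an addition for illustration, not part of the example above.

```jsl
// Sketch only: assumes Big Class.jmp is installed with the JMP sample data.
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
dt << New Column( "Age Next Year", Numeric );

// Scope the global explicitly so the formula cannot mistake it
// for a column (or an operator) that happens to share its name.
::x = 1;
Column( dt, "Age Next Year" ) << Set Formula( ::x + :age );
```

With the :: prefix in place, adding a column named x to the table later should not change what the formula computes.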
So, it’s not clear that JMP is in the wrong when errors don’t get messages. It’s not necessarily a bug, although it certainly can be frustrating! We can all hope that these kinds of errors will be trapped better in future versions of JMP, but in the meantime, we need to take that responsibility ourselves.
Tips on interpreting error messages
JMP’s error reporting is quite helpful, but it can be frustrating at times. Here are a few tips:
- Sometimes it’s hard to find the /*###*/ in the Log to see where your error is. Don’t forget about Edit/Find, or the keyboard shortcut Ctrl-F or Command-F!
- Sometimes the /*###*/ will be in a misleading place. Start by investigating the thing that comes right before it, but if you’re sure that’s not in error, see if you can find some other problem that’s causing that thing to get read in some way other than what you meant, e.g. if you had a comma instead of a semicolon, it might be getting read as the “else” action in an If(), where you meant it to be an extra command in the “then” action.
- When you get error messages about an unexpected or a missing ) or , (parentheses or commas), if the /*###*/ marker doesn’t help you, try getting some help from the Reformat Script utility. Try reformatting the whole script, and see where the cursor sits after you get the error message again. If you don’t see the problem nearby, try selecting chunks of the script and reformatting those. Keep going until you find the chunk of script that won’t format; your problem is inside that chunk somewhere.
- Notice that the Reformat Script utility does helpful auto-indenting of your code, so that things happening inside other things will be indented further than their containers, and the closing ) will appear vertically-aligned to the command and opening ( that JMP thinks it corresponds to. Make sure that JMP’s alignment of )s matches what you think you’re doing. Getting messed up inside the parentheses is an especially common mistake when you’re writing custom display boxes.
- Also try using the fence-matching feature: double-click on a parenthesis, and the Script Editor should highlight everything up to the matching parenthesis. See if it matches what you think it should; if it doesn’t, you’ve got one missing or extra somewhere. Note: one problem with the Script Editor’s fence-matching is that it won’t always select all the way up to a matching parenthesis; it’ll just do nothing when you double-click. This seems to happen when the chunk is too big: if the matching parenthesis is scrolled too far off-screen, JMP doesn’t even try. I hope this will be improved in a future version of JMP. In this situation, using an external programmer’s editor (such as jEdit on Windows or BBEdit on Mac) can be helpful.
Be sure to test that your code is doing something
The smart JSL mistakes we saw above didn’t report errors, so the scripters didn’t realize anything was wrong. This brings up an important point: you need to test not only that the results you get are correct, but also that you’re getting results!
Mark’s error was to invent an argument. JMP’s response was to ignore the fictional argument. So how did he know it was a nonexistent argument and not just an undocumented one? The only way to be sure is to try the command twice—first without the argument, and again with the argument—and see if the results actually differ. If they don’t, that argument isn’t doing anything. Either it doesn’t exist, or you’re making some kind of a mistake in how you’re using it.
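One way to run that with-and-without comparison without eyeballing the Log is sketched below. It assumes JMP 9 or later, because it relies on Get Log(), and it assumes that after each Clear Log() the log holds nothing but the Show Preferences() output. Treat it as an illustration of the idea, not a tested utility.

```jsl
// Sketch only: requires JMP 9+ for Get Log().
Clear Log();
Show Preferences();
withoutArg = Get Log();

Clear Log();
Show Preferences( Use JMP Locale Settings );   // the argument under suspicion
withArg = Get Log();

// Compare the two captures line by line.
same = (N Items( withoutArg ) == N Items( withArg ));
For( i = 1, same & (i <= N Items( withoutArg )), i++,
	If( withoutArg[i] != withArg[i], same = 0 )
);

If( same,
	Write( "\!NIdentical output: the argument appears to be ignored." ),
	Write( "\!NOutput differs: the argument is doing something." )
);
```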
Peter’s mistake was not checking that his For() loop actually did any iterations. Since NArg() was zero, his i counter that started at 1 was never less than or equal to zero, so the loop didn’t do a single iteration. JMP skipped right over it. There are several ways to test for this kind of problem. The easiest way is just to put Write() statements inside the For() loop that will report what’s going on to the Log window. As long as you make sure to look at the Log and see that the messages you expect are showing up, that’s good enough.
But if your For() loop is happening as part of a larger program, you might not think to look for the absence of a message, right? In that case it would be better to lay a trap. For example, you could set a global right before the For() loop, do something to that global inside the For() loop, and then test the global after the For() loop. For example:
test = 0;
For( i = 1, i < 6, i++,
	// do whatever you need to do here
	test++
);
If( test == 0, Throw( "The For loop did not iterate" ) );
(Note: it wouldn’t necessarily be a mistake for a For() loop never to iterate. There might come a situation where you want to write an iteration test that takes advantage of For()’s ability to decide whether to iterate or not.)
What are your issues with JSL errors?
What kinds of silent-but-deadly errors have you noticed? What kind of errors have you found it hard to track down? What tips do you have for trapping silent errors, or figuring out reported errors?
Please ask your questions and share your experiences in the Comments below. We’ll be happy to follow up with replies to your comments or at greater length in future blog posts.
A script to check and change a JMP preference
by Erin Vang on Dec.23, 2010 , under JMP & JSL
Marianne Toft asked the JMP scripting group on LinkedIn a question about managing preferences with JSL:
Does anybody know how to read a specific setting from the Preferences? I need to read the status of ‘Use JMP locale settings’ and if 0 then change to 1 and give the user a message to restart JMP.
Several people had helpful suggestions, but unfortunately this is a surprisingly tricky problem.
Challenges
First, by default Show Preferences() only shows the preferences that you’ve changed. You need to use Show Preferences(all) to see the defaults listed, too.
Second, Show Preferences(SomeSpecificPreferenceHere), although a good guess, doesn’t actually work. You’ll just get the usual output for Show Preferences() without an argument.
Lesson: when you supply an argument JMP doesn’t understand, sometimes you get errors, but a lot of the time it’s just ignored, silently. You get no indication that you did something that doesn’t work. The only way you can be sure it was ignored is to try it again without the argument, or with a different argument, and check whether anything changed.
Third, Show Preferences( [all] ) doesn’t return an expression. It returns missing. The information you need is echoed to the Log window, but it’s not returned as a string or an expression or a list or anything else you’d be able to do some work on. So you need to overcome that.
How to solve the basic problem
In JMP 9, you can use Get Log(), which returns a list of strings, one for each line of whatever’s in the log. So to work efficiently, you need to Clear Log() first, then Show Preferences(all), then do Get Log().
Well! Now you have a list of strings, and you need to find the item in that list that is relevant, so you need to put a Contains() test inside a For() loop.
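The capture-and-search steps might be sketched like this. It is only an outline under the assumptions stated in the text (JMP 9 or later, preferences echoed one per log line); the variable names are invented for illustration, and this is not the script offered for sale.

```jsl
// Sketch only: requires JMP 9+ for Get Log().
Clear Log();
Show Preferences( all );
logLines = Get Log();                 // list of strings, one per log line

target = "Use JMP Locale Settings";   // preference name to look for
found = "";
For( i = 1, i <= N Items( logLines ), i++,
	If( Contains( logLines[i], target ),
		found = logLines[i];
		Break();
	)
);

If( found == "",
	Write( "\!NPreference not found in the log: " || target ),
	Write( "\!NFound: " || found )
);
```

Checking for an empty found afterward is the same trap-laying idea as testing whether a For() loop ever iterated.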
It’s a string, so you have to transform it into an expression with Parse() and some other bookkeeping, so that you can probe its argument with Arg().
Then check whether the argument is the desired one, and if not, change it, by editing the expression with Substitute() and then immediately doing Eval() on the result of that.
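A minimal sketch of the Parse() and Arg() steps is below. The string in line is a hand-written stand-in for one cleaned-up log line, and the corrected command is built with Eval Expr() rather than the Substitute() approach described above, simply because it keeps the sketch short. The final Eval() that would actually change the preference is deliberately left out.

```jsl
// Sketch only: line is a hand-written stand-in for one cleaned-up log line.
line = "Use JMP Locale Settings( 0 )";
desired = 1;

prefExpr = Parse( line );        // string -> expression, not evaluated
current = Arg( prefExpr, 1 );    // the preference's single argument

If( current == desired,
	Write( "\!NPreference already has the desired value." ),
	// Build the corrected command; Eval Expr() fills in the Expr() part.
	// Eval( fix ) would apply the change; this sketch only displays it.
	fix = Eval Expr( Preferences( Use JMP Locale Settings( Expr( desired ) ) ) );
	Show( Name Expr( fix ) )
);
```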
Now we need to let the user know we did something, and perhaps that they need to restart. So, put up a New Window() with a Text Box() saying that a preference has changed and they need to restart. Include a Button Box() for “OK” that closes the alert window.
So, that’s the gist of the strategy.
Abstract the script
But, why write all this just for this one specific preference? Why not write it to check and if necessary change any preference?
So, start out with a couple globals that set the preference and the desired value.
But wait! Marianne’s example involves a numeric argument, but other preferences have text arguments; consider Use JMP Locale Settings(0) vs. Report Table Style("Plain"). So we need to make sure our script can work properly whether the particular preference takes a numeric or string value.
And while Marianne’s example requires a restart for the changed preference to take effect, most preferences do not. Might as well set a restartNeeded=1 if it’s important, or 0 if it’s not.
In that alert window, we only need to tell them to restart if it’s important, not if it’s not, so let’s add an If Box() around the Text Box() that only shows the instruction if it’s needed.
If the preference was what we wanted in the first place, we should probably echo something to the Log window, just so we know it ran, if nothing else. We need a Write() statement for that.
Which version of JMP?
But wait, there’s more! The Get Log() command didn’t come along until JMP version 9. If you haven’t been able to upgrade yet, then you need to take another strategy to this whole problem.
There might be a better way to do this, but the only way I could find was to Save Log() to a text file, then Load Text File() to a string, then use Substr() and a bunch of arguments with Contains() to chop out the single preference we care about. From there, the script runs about the same way.
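The chopping step can be sketched on a literal string. Here logText is a made-up stand-in for what Load Text File() would return after Save Log(); the real log text will be longer and may be laid out differently. The sketch also assumes the preference’s value contains no closing parenthesis of its own.

```jsl
// Sketch only: logText is a hypothetical stand-in for the saved log's contents.
logText = "Preferences(\!NUse JMP Locale Settings( 0 ),\!NReport Table Style( \!"Plain\!" )\!N)";
target = "Use JMP Locale Settings";

start = Contains( logText, target );       // position where the name begins
stop = Contains( logText, ")", start );    // first ")" at or after that position
pref = Substr( logText, start, stop - start + 1 );

Show( pref );
```

From there, pref can be handed to Parse() just like a line taken from Get Log().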
And we need to put all this inside a test for JMP Version(), which returns a string, and usually we need to Trim() it and then do an inequality test, e.g. If( Trim( JMP Version() ) > "9", …), where the “then” argument is the Get Log() strategy described first, and the “else” argument uses the Save Log() strategy.
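One caution about comparing version strings: strings sort character by character, so a version beginning with "10" would sort before "9". A hedged alternative is to compare the major version as a number. The sketch below assumes JMP Version() returns something like "9.0.0", possibly padded with spaces, and that Word() is available in your JMP version.

```jsl
// Sketch only: assumes the version string looks like "9.0.0".
majorVersion = Num( Word( 1, Trim( JMP Version() ), "." ) );

If( majorVersion >= 9,
	Write( "\!NUsing the Get Log() strategy" ),
	Write( "\!NUsing the Save Log() strategy" )
);
```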
What about nested preferences?
Yet another problem! The script so far deals with top-level preferences with simple, single arguments, either numeric or character. But some preferences are trickier, e.g.,
Preferences( Fonts( English( Monospaced Font("Courier", 10) ) ) );
In this case, the Parse() line will need to be more elaborate, and you’ll have to nest more Arg()s around it to dig down to the particular setting value of interest. Conditioning for such a situation is left as an exercise for the reader, primarily because it’s a hassle, it’s not what Marianne needs, and the syntax of the Preferences() command in JMP is arbitrary enough that we can’t really write code today that will be future-proof; we don’t know how deeply nested a given preference might be, nor what kind of arguments it might expect.
Buy now!
Contact me if you’re interested in purchasing this script or commissioning one like it.
Is South Carolina lazy? Um… no, not so much!
by Erin Vang on Aug.30, 2010 , under JMP & JSL
We recently had the privilege of helping out the good folks at Moore Data LLC with a SAS data import project. Scott Moore had noticed an article in the Post and Courier claiming that South Carolina was the eighth laziest state in the US. He smelled a rat, though—as he put it, “Typically, subjective words used to describe data pop a red flag that warns me of impending data misuse doom.”
With a little SAS importing help from Global Pragmatica LLC, Scott was able to peruse the data himself in JMP, and he reached a rather different conclusion!
Read Scott’s rebuttal here.
How to parse a tricky date string
by Erin Vang on Jul.28, 2010 , under JMP & JSL
Recently on LinkedIn, Asit Rairkar asked about how to parse a tricky date string into a numeric date column in JMP. He has date/time data as character data, but unfortunately it’s not formatted in one of the many date formats that JMP supports directly. He wrote:
I have found that informat does not work with all sorts of texted dates. e.g. I have had some tables with date codes as YYMMDDHHMMSS and I have had to manually seperate the 2 digits each and combine them again into YY/mm/dd hh:mm:ss format to feed in. Not sure if there is a better way to deal with this type of input.
Writing JSL to handle this situation is conceptually simple, but getting the JSL just right is tricky. The format JMP supports that comes closest to this format is “yyyy-mm-ddThh:mm:ss,” so that’s what I decided to work with, but overcoming the little differences between JMP’s format and Asit’s string takes some work. We have to doctor up the input string so that it matches the format JMP can handle. We’ll need to:
- change the two-digit years to four-digit
- add the “-” delimiter between year, month, and day
- add the “T” delimiter between date and time
- add the “:” delimiters between hours, minutes, and seconds
Once we’ve gotten the input string to match a JMP-supported format, we put that inside Num() as the Formula() for a new numeric column. We also have to specify one of the date/time formats for the column, and to keep things simple, I’ll just use the “yyyy-mm-ddThh:mm:ss” format again, but you could choose any format.
Here’s how it works
Suppose your data table looks like this:
You can generate my example by running this script:
dt = New Table( "tricky times",
	Add Rows( 4 ),
	New Column( "event", Character, Nominal,
		Set Values( {"Erin starts researching Asit's question", "Erin starts this blog post", "Erin wants a beer", "This blog post better be done"} )
	),
	New Column( "when", Character, Nominal,
		Set Values( {"100726174324", "100728154315", "1007281730", "110728154315"} )
	)
)
So our first programming task is to convert a string like "100726174324" to a string like "2010-07-26T17:43:24", by performing the steps listed above. We could probably do this most efficiently with a clever pattern-match or regular expression statement, but both of those strategies present some disadvantages for a casual scripter:
- they’re hard to learn, and it’s probably too much trouble given how seldom a casual scripter needs to do this sort of thing
- they’re hard to read and decipher later—somebody who’s got a lot of practice at this kind of thing can write the patterns or regular expressions fairly easily and quickly, but figuring out a pattern or regular expression that someone else wrote (or that you yourself wrote a week ago) is much harder
- they’re hard to debug
- they’re hard to “borrow from” later on when you encounter a problem that’s slightly different
For these reasons, I’d rather solve this problem with a formula that is perhaps inefficient but is easy for me to read and easy for me to adapt later on. It won’t run as fast, but the time I really want to save is my own—I don’t care so much, in this case, if JMP has to work a little harder. So, let’s take it one step at a time.
1. change the two-digit years to four-digit
This is potentially tricky. Are all of the years in this century, e.g. 20-something? Or do we have a mix of centuries? If they’re all the same, it’s easy. We just concatenate a “20” in front of the two year digits we do have. I like the shorthand || operator better than the function Concat(). We get those two year digits by taking a substring of :when with Substr(), starting on the first character and continuing for two characters:
"20" || Substr( :when, 1, 2 )
If they’re a mixture of centuries, we’ll need to add some logic for guessing the correct century. Using a transition point is one way to do it. For example, if none of the dates are expected to be in the future or more than a hundred years in the past, we can safely assume that anytime the yy portion of the string is greater than the current year, we should prepend “19” rather than “20.” To do that, we need to compare the string’s yy, Substr( :when, 1, 2 ), against today’s yy. But what’s today’s yy? Start by getting today, as a year: Year( Today() ) is the numeric value 2010 as I write this. Now convert that to a string with Char() so that we can use Substr() to grab just the last two characters:
Substr( Char( Year( Today() ) ), 3, 2 )
Now do the comparison inside an If(), and make the result "19" when yy is greater than the current year and "20" otherwise. Concat() that result onto the yy:
If( Substr( :when, 1, 2 ) > Substr( Char( Year( Today() ) ), 3, 2 ), "19", "20" ) || Substr( :when, 1, 2 )
Whew! Fortunately the next few steps are easy.
2. add the “-” delimiter between year, month, and day
We’ve already got the yyyy piece. Now we use Substr(:when, x, 2) a few more times with x being the first character of the mm and dd pieces. We insert a string "-" between each piece, sticking it all together with ||s:
|| "-" || Substr( :when, 3, 2 ) || "-" || Substr( :when, 5, 2 )
3. add the “T” delimiter between date and time
That’s easy:
|| "T" ||
4. add the “:” delimiters between hours, minutes, and seconds
Same old same old! Just more Substr() and || action.
Substr( :when, 7, 2 ) || ":" || Substr( :when, 9, 2 ) || ":" || Substr( :when, 11, 2 )
Put it all together!
Those four steps have produced a string formatted like "2010-07-26T17:43:24", and if we put all that inside Formula( Num( ) ) for a New Column command, we’ll be creating numeric date values.
New Column( "time",
	Format( "yyyy-mm-ddThh:mm:ss" ),
	Formula(
		Num(
			// convert what you have to something like "2010-07-26T17:43:24"
			// assume dates only in past, within last 100 years
			If( Substr( :when, 1, 2 ) > Substr( Char( Year( Today() ) ), 3, 2 ), "19", "20" )
			|| Substr( :when, 1, 2 ) || "-" || Substr( :when, 3, 2 ) || "-" || Substr( :when, 5, 2 )
			|| "T"
			|| Substr( :when, 7, 2 ) || ":" || Substr( :when, 9, 2 ) || ":" || Substr( :when, 11, 2 )
		)
	)
);
Now we have:
Notice that this technique is even forgiving of strings that are too short. I intentionally left off the ss part of one of those strings, but since Substr(:when, 11, 2) just produced an empty string "", the formula still worked. That’s an advantage we’ll lose in the next strategy.
Uh-oh! There’s a problem in that last row! When I wrote my example time strings, I was thinking of 2011, but our algorithm above assumes that all data are from the current year or earlier, so that date got converted to 1911 instead. My event description needs improvement.
What if the dates are not in a data column?
Suppose the date value strings aren’t in a character column of a data table; suppose they’re just string values. In that case, we can use the Informat() function to convert them. The guts of it are still the same, but this time we’re parsing a string stored in a global called mydate instead of rows in a column.
mydate = "100726174324";
mydateNum = Informat(
	// convert the string to "2010-07-26T17:43:24",
	// assume dates only in past, within last 100 years
	If( Substr( mydate, 1, 2 ) > Substr( Char( Year( Today() ) ), 3, 2 ), "19", "20" )
	|| Substr( mydate, 1, 2 ) || "-" || Substr( mydate, 3, 2 ) || "-" || Substr( mydate, 5, 2 )
	|| "T"
	|| Substr( mydate, 7, 2 ) || ":" || Substr( mydate, 9, 2 ) || ":" || Substr( mydate, 11, 2 ),
	"yyyy-mm-ddThh:mm:ss" // or another format
);
Running this script returns the numeric value 26Jul2010:17:43:24 and shows that in the log. Notice, however, that when mydate = "1007281730";, missing the ss part of the string, it doesn’t work, and we get a missing value. Our ugly hand-made parser is actually more forgiving than JMP’s Informat() command. But it’s not much better: if we were missing any other part of the string than the part at the end, we’d get mixed-up results.
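If the only thing that can be missing is the seconds, one possible workaround is to pad the string before converting it. The sketch below assumes that a 10-character string always means yymmddhhmm with the ss left off, and for brevity it hard-codes the “20” century instead of using the If() test shown earlier.

```jsl
// Sketch only: assumes a 10-character string is yymmddhhmm with seconds omitted.
mydate = "1007281730";
If( Length( mydate ) == 10, mydate = mydate || "00" );

// Rebuild the string in the yyyy-mm-ddThh:mm:ss layout.
iso = "20" || Substr( mydate, 1, 2 ) || "-" || Substr( mydate, 3, 2 ) || "-" ||
	Substr( mydate, 5, 2 ) || "T" || Substr( mydate, 7, 2 ) || ":" ||
	Substr( mydate, 9, 2 ) || ":" || Substr( mydate, 11, 2 );

mydateNum = Informat( iso, "yyyy-mm-ddThh:mm:ss" );
Show( iso );
```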
What pesky date, time, or duration strings have vexed you?
The good folks at JMP keep adding support for more and more date/time/duration formats, but it seems like there’s a never-ending supply of crazy formats that analysts encounter. Typically the strings are coming into JMP from other programs and from data-streaming devices, but they might also arrive when peeling data off web pages or dealing with data entry performed by other people.
What are the crazy formats that you’ve had to crunch into numeric values in JMP?
What are the tricks you’ve developed for crunching them?
Please share your experiences in the Comments! I’ll try to answer any questions you leave there, too.
How to change string columns to numeric date columns
by Erin Vang on Jul.22, 2010 , under JMP & JSL
Sudarshan Krishnan asked a question on the LinkedIn JMP Professional Network about how to change a string column into a numeric date column:
What is the JSL command to convert a character data type (“hr:m:s” –eg:”23:59:59″) to a Numeric, Continuous, duration (“hr:m:s”) data type?
Dates and times in a character column
Let’s assume you have a data table that includes date, time, or duration values as strings, in character columns. You might run into this situation frequently when importing data from text files or from other programs such as Microsoft Excel. The columns might look like this:
Here’s a simple JSL script you can run to create a data table like this one:
New Table( "sample",
	Add Rows( 4 ),
	New Column( "timeChar A", Character, Nominal,
		Set Values( {"23:59:59", "10:14:01", "22:01:03", "08:01:05"} )
	),
	New Column( "timeChar B", Character, Nominal,
		Set Values( {"23:59:59", "10:14:01", "22:01:03", "08:01:05"} )
	)
);
Convert date/time strings to numeric
You have two options for how to handle this, depending on what your goals are. You can either make a new, numeric column that converts the strings to date/time values using a formula, or you can change the character column directly into a numeric column. Let’s try it each way.
Option A: make a new numeric column
To make a new numeric copy of the original column, use the New Column() operator with the arguments specifying the new column’s name (“timeNum A”), data type (Numeric), the new format desired (in this case “hr:m:s”), and a formula. The formula is simply the original column, :timeChar A, inside the Num() operator.
New Column( "timeNum A", Numeric, Format( "hr:m:s" ), Formula( Num( :timeChar A ) ) );
Now you have the original string column, unchanged, plus a new numeric copy of that column. Note that the new column is linked to the original with a formula, so if you want to delete the original column, you’ll get an error:
Cannot delete column “timeChar A” while it is referenced by the formula of column “timeNum A”. Removing a formula leaves the data unchanged. Removing references replaces each reference with an empty value. These effects are permanent and cannot be undone.
Just click “Remove Formula” to proceed. Of course, if you no longer want the original character column, you might as well go with Option B instead.
Option B: change a character column to numeric directly
To change an existing character column into numeric, just send a series of messages to the column, first changing the type to numeric, then specifying a format. You might also want to change the modeling type to continuous, and you might want to change the column’s name.
Column( "timeChar B" ) << data type( numeric ) << Format( "hr:m:s" ) << modeling type( continuous ) << set name( "timeNum B" );
After
Now the data table with two character columns should have one character column, a second column that was character and has been changed to numeric, and a third column that is a numeric copy of the first column: