Wednesday, April 18, 2007

Explicit...

I'm a raving SAS partisan. Yeah, it's my current meal ticket, but it's not the only thing on my resume - it's just that good. But that's not to say that there isn't room for improvement.

For instance, SAS doesn't require declaration of variables. That's part of why it's very nimble - it's incredible what you can do with very little code. If you only write tiny code blocks and run them one at a time, it's terrific. And if you're the kind of person who writes a data step just to rename a variable, you might never regret this.

Today's hardware can be lightning fast. CPUs and disks are multiplying, and SAS procs have been refined to the point that running through a few million records isn't the kind of thing you need to coordinate any more. Run time for processing a few billion is on the order of hours, instead of days or simply impossible.

Yet capacity is still finite, and I/O takes forever relatively speaking. I remember the old days, and when I read gigabytes of data I want to wring it dry before I let it go. That means writing more code and thus having more opportunities for error.

Many of those opportunities involve typos. Mistyping a variable name in data step code gets you one of three things: a new variable, a syntax error, or a crossed wire (the typo happens to name some other variable that already exists). If it's not a syntax error, SAS can't help you.

If you're paying attention to your SAS log, those typo variables will generally result in a note like "NOTE: Variable TYPO is uninitialized." (If you can't explain notes like that I might have to come by and bop you upside the head.)
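Here's a minimal sketch of the new-variable case. The data set names, the variables and the typo SCROE are all made up for illustration, and the exact log wording may vary a little by SAS version.

```sas
/* Hypothetical input data, invented for this example. */
data work.scores;
    input name $ score;
    datalines;
Ann 90
Bob 85
;
run;

/* SCROE below is a deliberate typo for SCORE. */
data work.curved;
    set work.scores;
    /* No syntax error here: SAS creates SCROE as a new numeric  */
    /* variable that is never assigned, so CURVED comes out      */
    /* missing on every row.                                     */
    curved = scroe + 5;
run;
```

The step runs to completion and writes WORK.CURVED, with an extra SCROE column and CURVED missing throughout. The only hints are notes in the log: one saying the variable is uninitialized, and usually another about missing values being generated by an operation on missing values.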

Other languages, like Perl and BASIC derivatives like VB, let you do without declarations too, so the compiler can't help keep you out of trouble. But Perl and VB* also permit you to force declarations (use strict, Option Explicit).

It seems that providing a SAS option like strict or explicit could do a lot of good without a tremendous amount of work. It's not as if you'd have to rewrite every PROC - my SWAG is that the impact would be confined to the compiler, and it's not as if nobody ever taught a compiler to demand declarations.

Enough for now. Don't go mistyping any variable names, OK?
