Sunday, April 8, 2012

Shell Game


[This week I’m essaying a very technical topic.  This is partially because I want to spit out something fast, and I can do technobabble pretty easily.  Plus I’ve always wanted to write this down, because people—other technogeeks, obviously—ask me about this all the time, and it’s difficult to sum up in a quick sound bite.  But, if you’re not a techie (and specifically a *nix techie at that), you may wish to pretend that I scribbled out another “I don’t have time to write a proper post this week” interstitial post.]

Let’s talk for a moment about shells.  Linux shells, I mean, although we’ll speak in broad enough terms that it shouldn’t matter which flavor of Unix you prefer.  If you don’t know what a shell is, you may as well find another blog post to read.  If you do, you probably know most of the history I’ll cover below, but it’s a nice refresher anyway.

The first Unix shell that most of us would still recognize was the Bourne shell (sh), by Stephen Bourne, which goes all the way back to 1977 (it replaced Ken Thompson’s original shell).  Like all things Unix, there was nearly instantaneously a competing product: the C shell (csh), released in 1978, by Bill Joy (who also gave us vi).  The C shell has a number of improvements over the Bourne shell, but the two are utterly incompatible: the syntax isn’t remotely the same, in some cases to the point where you suspect Joy just did the opposite of what Bourne had done, out of spite.  Broadly speaking, the shells that followed hewed to one or the other syntax, giving us two “families” of shells, which are mostly compatible among themselves and not at all with each other.

So, next to come along was the Korn shell (ksh), by David Korn, in 1983; then the Tenex C shell (tcsh), by Ken Greer, later that same year; then the “Bourne-again” shell (bash) by Brian Fox, in 1989.  On the one side, we have the Bourne shell family (sh/ksh/bash); on the other side, the C shell family (csh/tcsh).  There are some other options out there, but these are the most popular by far, with the youngest member of each family eclipsing (for the most part) its elders.  And, these days, bash has emerged as the clear winner, and the others are hardly ever seen.

Well, except on my machines.

You see, I have the following philosophy on *nix shells: use tcsh at the command line; program with bash.  These days, at least on the Linux machines that I work on, that means having to install (or request to be installed) tcsh manually.  Many people ask me why I cling to tcsh.  This post will hopefully explain why.

Now, the first thing you must understand is that, when I came along, it wasn’t a choice between tcsh and bash: it was a choice between tcsh and ksh.  I never even saw a machine with bash on it until sometime in the 90’s, and I was well-established in my patterns by then.  So some of the reasons I made my decisions don’t even apply any more: bash has features that ksh lacked, and is just as good as tcsh these days in many areas, such as history.  But there are still enough reasons that do apply that I continue to refuse to switch.  If you think I’m wrong, though, I welcome your comments to show me the error of my ways.  However, realize that, while I always switch my personal account, I never switch the shell for the root account, so it’s not like I never use bash at the command line.

So, why is tcsh better for an interactive shell?  First of all, let’s look at some of those reasons that don’t really apply any more.  I won’t go into too much detail here, since ... well, since they don’t really apply any more.

  • Tab completion.  ksh didn’t originally have it at all; the ‘93 ksh added it, and bash can do it pretty much as well as any other shell around.  But tcsh was basically invented for this, and had it first and best for many years.  Completion of history commands, in particular, I have never gotten to work right in either ksh or bash, although both claim to support it.
  • History recall.  Now that bash does it too, it’s easy to forget how awesome !! was when only the C shells could do it ...
  • Command line editing.  tcsh lets me use vi keys to edit my command line (which drives anyone trying to use my terminals crazy).  I’m pretty sure bash can do this as well; if ksh ever could, I never knew about it.

Now, what about the reasons that do still apply?  (Or, at least, do still apply as far as I know ... no doubt bash might have snuck some of these in when I wasn’t looking.)

  • Shell redirection.  If I want to redirect both stderr and stdout of a command in the Bourne shell family, I have to do something like this: command >/dev/null 2>&1.  Ick.  In the C shells, it’s simpler: command >& /dev/null.  The Bourne versions let you redirect the two separately, true, but, in practice, I never want to do that on the command line.  Whereas redirecting both at once is very common.
  • “Separate” environments.  This one is more conceptual.  Shells maintain variables, and those variables can either be visible to subshells (and other child processes) or not.  In the Bourne family, you think of them as two different types of variables.  I can set a variable, and it’s local; I then “export” that variable and it becomes global.  In the C shell family, you think of them as two different sets of variables.  If I want a local variable, I use one command (set), and if I want a global variable, I use a whole different command (setenv).  No mixing.  Perhaps it’s just a personal preference, but this really works for me much better.
  • Alias arguments.  All shells will let you define aliases.  But, in the C shells, your aliases can have arguments.  So I can define lln as ls -lhtr !* | tail (and I have, as it happens).  In the Bourne shells, the arguments to your alias go at the end of the command line, period, no exceptions.  If you want it otherwise, you have to write a function.  Why do I want to write a function when I have a perfectly good alias?
  • Customized prompts.  It’s true, bash has come a long way in this department.  But I still find tcsh easier.  My current prompt is "[${LOCALHOSTNAME}:%.04] ".  To do that in bash, I’d have to fiddle around with $PROMPT_DIRTRIM or somesuch, and I’m still not convinced I could end up with exactly the same thing.
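
For comparison, here is roughly what the Bourne-family side of the first three bullets looks like.  This is only a sketch in bash, not anything lifted from my actual dotfiles: the helper name quiet and the variables greeting and GREETING_ENV are made up for illustration, and lln is rewritten as a function because a Bourne-style alias can’t put its arguments in the middle of a pipeline.

```shell
#!/usr/bin/env bash
# Bourne-family counterparts to the tcsh idioms above (illustrative names only).

# 1. Redirect stdout and stderr together.  This portable form works in sh,
#    ksh, and bash; bash also accepts "&> file" as a shorthand.
quiet() {
    "$@" >/dev/null 2>&1
}

# 2. One kind of variable; "export" is what makes it visible to children.
greeting="hello"              # local to this shell
export GREETING_ENV="hello"   # inherited by child processes

# 3. A function standing in for the tcsh alias:  ls -lhtr !* | tail
lln() {
    ls -lhtr "$@" | tail
}
```

Calling quiet some_command swallows both streams, and lln /some/dir lists the most recently modified entries last.  Whether that counts as “just as easy” as the tcsh versions is, of course, the whole argument.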

Now, on the flip side, I would never use tcsh for a shell script.  The only scripting I ever do in tcsh is in my .tcshrc, and even that I find almost unbearably painful.  I regularly have to look up the syntax for loops and even if conditionals.  I originally used ksh for all my shell scripts, and I even held on to it long past the point when it was rational to do so: past the point where I was manually downloading pdksh because no one was shipping ksh any more.  I finally made the switch to bash about 10 years ago, and, other than having to change all my prints back to echos, it was fairly painless.

The definitive proscription against using the C shells for scripting is of course Tom Christiansen’s Csh Programming Considered Harmful, but I’ll give you my personal breakdown:

  • Shell redirection.  In a shell script, I quite often do want to redirect stdout and stderr separately, and not being able to do so in the C shells is practically a non-starter.
  • Unset variables.  In the Bourne shells, an unset variable expands to the empty string, which makes sense.  In the C shells, it expands to ... a syntax error.  You have to use $?VARNAME to see if the variable is set before trying to use it, unless you’re very very sure that it will be.  That’s just annoying.
  • Backquotes.  Trying to use backquotes for capturing command output just sucks.  The ksh/bash construct of $(command) is so much better.  Interpolation works, nesting works, it just ... works.
  • Basic string manipulation.  In ksh or bash, I can chop off prefixes or suffixes, do global substitutions, put in default values, and all sorts of other stuff.  In tcsh I pretty much have to echo my variables to sed or somesuch.
  • Basic arithmetic.  In ksh or bash, it’s $(( $VAR + 6 )), right there inline.  In tcsh, the best you get is a separate @ statement (@ x = $VAR + 6), or piping it to bc or something.
  • Newlines in strings.  You just can’t in the C shells.
  • Getopts.  No such thing in the C shells.
  • Trap.  No such thing in the C shells.
  • Extended pattern matching.  No such thing in the C shells.
  • "$@".  No such thing in the C shells.
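
To make a few of those bullets concrete, here is a small bash sketch of the features I lean on.  Every variable and function name in it (MAYBE, file, count_args, forward, parse_flags, with_cleanup, and the -v and -n flags) is invented for the example, so treat it as an illustration rather than a recipe.

```shell
#!/usr/bin/env bash
# Hypothetical names throughout; none of them come from a real script.

# Unset variables expand to the empty string, and a default can go inline.
unset MAYBE
label="[${MAYBE}]"              # "[]"
fallback="${MAYBE:-default}"    # "default"

# $(...) nests cleanly, with quoting intact at each level.
parent_name=$(basename "$(dirname /usr/local/bin/tool)")   # "bin"

# Chopping prefixes and suffixes, plus global substitution (bash/ksh93).
file="archive.tar.gz"
stem="${file%%.*}"       # "archive"
ext="${file#*.}"         # "tar.gz"
dashed="${file//./-}"    # "archive-tar-gz"

# Arithmetic expansion.
VAR=36
sum=$(( VAR + 6 ))       # 42

# "$@" passes each argument through intact, embedded spaces and all.
count_args() {
    echo "$#"
}
forward() {
    count_args "$@"
}

# getopts handles flag parsing; -v and -n NAME are made up for this sketch.
parse_flags() {
    local OPTIND=1 opt verbose=0 name=""
    while getopts "vn:" opt; do
        case "$opt" in
            v) verbose=1 ;;
            n) name="$OPTARG" ;;
            *) return 1 ;;
        esac
    done
    echo "verbose=$verbose name=$name"
}

# trap runs cleanup code when the subshell running the body exits.
with_cleanup() (
    trap 'echo "cleaned up"' EXIT
    echo "working"
)
```

For instance, forward "a b" c reports two arguments rather than three, and parse_flags -v -n bob prints verbose=1 name=bob.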

I could probably go on.  Or you could just read Tom Christiansen’s essay, linked above.  I don’t agree with everything he says, and some of what he says is nitpicky and doesn’t actually come up that often, but it’s comprehensive, and Tom has experience with “considered harmful” essays.

So hopefully the next time someone asks me why I still use tcsh, after all these years, I’ll pass on a link to this post and that’ll be the end of it.  I don’t mind switching to newer things when it makes sense (I did, eventually, switch from ksh to bash for scripting, as I mentioned above), but right now my .tcshrc is nearly 250 lines of painstakingly handcrafted customization, built up over almost 20 years, and getting used to bash on the command line would be too much effort for not enough gain.  Maybe one day ... but probably not.  Maintaining compatibility with the Bourne syntax inherently creates some difficulties that I simply don’t have any need or desire to overcome.  Perhaps if tcsh ever gets as hard to come by as Betamax machines or PS/2 MCA buses, I’ll reluctantly give it up.  Until then ... why should I?