Sunday, September 7, 2014

Modernizing "less"

I have just spent an all-nighter doing something I didn't expect to do.

I've "modernized" less(1).  (That link is to the changeset.)

First off, let me explain the motivation.  We need a pager for illumos that can meet the requirements for POSIX IEEE 2003.1-2008 more(1).  We have a suitable pager (barely), in closed source form only, delivered into /usr/xpg4/bin/more.  We have an open source /usr/bin/more, but it is incredibly limited, hearkening back to the days of printed hard copy I think.  (It even has Microsoft copyrights in it!)

So closed source is kind of a no go for me.

less(1) looks attractive.  It's widely used, and has been used to fill in for more(1) to achieve POSIX compliance on other systems (such as MacOS X.)

So I started by unpacking it into our tree, and trying to get it to work with an illumos build system.

That's when I discovered the crazy contortions autoconf was doing that basically wound up leaving it with just legacy BSD termcap.   Ewww.   I wanted it to use X/Open Curses.

When I started trying to do that, I found that there were severe crasher bugs in less, involving the way it uses scratch buffer space.  I started trying to debug just that problem, but pretty soon the effort mushroomed.

Legacy less supports all kinds of crufty and ancient systems.   Systems like MS-DOS (actually many different versions with different compiler options!) and Ultrix and OS/2 and OS9, and OSK, etc.  In fact, it apparently had to support systems where the C preprocessor didn't understand #elif, so the #ifdef maze was truly nightmarish.  The code is K&R style C even.

I decided it was high time to modernize this application for POSIX systems.  So I went ahead and did a sweeping update.  In the process I ditched thousands of lines of code (the screen handling bits in screen.c are less than half as big as they were).

So, now it:

  • Speaks terminfo (X/Open Curses) instead of ancient BSD termcap
  • Uses glob(3C) instead of a hack involving the shell and a helper program (lessecho, which I've removed from my tree.)
  • Functions properly as /usr/bin/more, both with and without -e (even on broken xterms)
  • Is fully ANSI C (or ISO C, if you prefer)
  • Passes illumos' cstyle code style checks
  • Is lint(1) clean

There is more work to do in the future if someone wants to.  Here are the ideas for the future:

  • Internationalization.  This is a pretty easy task involving gettext().
  • Make less use getopt() instead of its byzantine option parser (it needed that for PC operating systems.  We don't need or want this complexity on POSIX.)
  • Fix its character set handling so it can use the mbstring and wcstring routines in the platform instead of relying on it's own implementation of UTF-8.  (This would make it support other multibyte locales.)
  • Make it support port events instead of sleeping when acting in "tail -f" mode.

If someone wants to pick up any of this work, let me know.  I'm happy to advise.  Oh, and this isn't in illumos proper yet.  It's unclear when, if ever, it will get into illumos -- I expect a lot of grief from people who think I shouldn't have forked this project, and I'm not interested in having  a battle with them.  The upstream has to be a crazy maze because of the platforms it has to support.  We can do better, and I think this was a worthwhile case.  (In any event, I now know quite a lot more about less internals than I did before.  Not that this is a good thing.)