Wednesday, October 13, 2010

New implementation of printf

So I finally got tired of waiting for someone else to write an open replacement in illumos for Oracle's closed printf(1) binary. I had thought this would be trivial to do with ksh93/libcmd, using a symbolic link à la /usr/bin/alias.

Lo and behold, it wasn't! Why? Because ksh93 printf, like all ksh93 builtins, insists on getopt-style processing of - and -- options. This is fundamentally incompatible with legacy printf, which takes its first argument as the format string even when it begins with a dash. (Why does ksh93 do this? So that it can dump its built-in man page to the console, e.g. with printf --man. This is a feature I've railed against in the past.)

Here's what should happen (legacy printf takes -v as the format string and simply prints it):

% printf -v

Here's what ksh93 does:

garrett@thinkpad:~$ printf -v
ksh93: printf: -v: unknown option
Usage: printf [ options ] format [string ...]

Now, there is an argument to be made that a script which relies on the legacy behavior is fundamentally broken. But it doesn't matter: the scripts are in the field (there are real examples of them), and the legacy behavior must be preserved. Breaking these legacy scripts just so that we can dump printf --version output is... silly. This is a case where pragmatism wins over purity.

Rather than try to rip this out and fight with the ksh93 maintainers about "deviation from upstream" (apparently the ksh93 folks view any changes we make in illumos or OpenSolaris as automatically toxic unless they originate from David Korn or Glenn Fowler), I've simply gone ahead and implemented my own printf(1) on top of FreeBSD's. This will be the implementation in illumos.

I've added significantly to FreeBSD's code, though. Specifically, I added support for %n$ positional specifiers, which select the nth argument explicitly. This is needed for internationalization: it lets a translated format string, such as one returned by gettext(1), reorder the arguments in the output. (You need this when word order has to change to accommodate the grammar of a different natural language.)

So my implementation is superior to FreeBSD's, and it's superior to the legacy closed binary. Why? Because rather than making a half-hearted attempt at processing positional parameters, my version really handles them, including full support for the usual format specifiers and for reusing the format when more arguments remain. For example:

New open code:

garrett@thinkpad{4}> printf '%2$1d %1$s\n' one 2 three 4
2 one
4 three

Old closed code:

garrett@master{22}> printf '%2$1d %1$s\n' one 2 three 4
134511600 one

Clearly the old behavior is just plain wrong: it prints a garbage number in place of 2 and ignores the second pair of arguments entirely. For the record, ksh93 does the right thing here too (although somewhat older versions of ksh93 would dump core on this command line).

My diffs relative to FreeBSD (which also include style and lint fixes required for illumos) are online. You can also review a webrev of the changes that I hope to integrate into illumos. The license remains BSD, so the various BSD operating systems (or even Oracle) are free to incorporate these improvements if they like.


Chris said...

Mr. Gdamore, option parsing in printf is mandated by the UNIX standard. The Austin Group has issued an interpretation of the standard that the printf tool, although it has no options yet, must do option parsing.
Ignoring options, as in your example, is not allowed, and VSC6 will fail with your code.

Garrett D'Amore said...

Do you have a pointer to that interpretation?

Perhaps this is a situation where /usr/xpg4/bin/printf should have the option parsing.

The problem is that there are scripts that may require the legacy behavior. I'm more concerned with compatibility with legacy scripts than with a hypothetical standard here.

Garrett D'Amore said...

So I've posted a new webrev. This one adds a few extra lines to accommodate "--", per a POSIX requirement:

Standard utilities that do not accept options, but that do accept operands, shall recognize "--" as a first argument to be discarded.

Notably, the legacy closed printf also handles -- the same way.