Alpha-Channel Extensions to the pbmplus / netpbm Suite

Primarily prompted by activities related to the PNG image format, Greg in mid-1997 started implementing a hack (that is to say, an unofficial extension) to allow alpha-channel support in some of the pbmplus / netpbm utilities. The motivation for and status of this effort are described below (including links to the modified sources).

Quick Introduction

For those of you who don't already know, the pbm utilities are a set of Unix-based (usually), command-line programs for manipulating bi-level (portable bitmap or pbm or 1-bit), grayscale (portable graymap or pgm or 8-bit), and RGB truecolor (portable pixmap or ppm or 24-bit) images. The file formats themselves are quite simple; the real benefit is in the vast array of associated programs, which together support the conversion of almost any format to any other and the manipulation of images in numerous ways. For example, to convert a 24-bit TIFF image to a nicely dithered, 256-color, interlaced GIF, one might do this:

    tifftopnm greg.tiff | ppmquant -floyd 256 | ppmtogif -interlace > greg.gif

Jef Poskanzer wrote the original pbmplus utilities, but others have contributed many additional programs, and the name of the most recent public collection, netpbm, reflects this collaboration. It is still supported via a mailing list; send subscription requests to pbmplus-request@acme.com to be added to the list. [Jef moved the list from best.com to acme.com (his own site) in early December 1999. Unfortunately there is no list archive.]

Approaches to Transparency

The pbm utilities are particularly weak in one area important to the Web: transparency or, more generally, alpha-channel (variable opacity) support. Certain utilities such as ppmtogif and pnmtopng / pngtopnm have options to enable transparency, but this is purely local: there is no way to send such information through a pipe. In the example above, the original TIFF image might include an alpha channel, and the final GIF image certainly could support a single level of (full) transparency, but there is no way to transfer knowledge of the transparency from the TIFF to the GIF except via temporary files.

[The following is somewhat out of date, since Greg is now in favor of a format that allows the orthogonal inclusion or omission of an alpha channel. That is, not only RGBA should be supported but also grayscale+alpha (GA) and conceivably even bilevel+alpha--though Greg doesn't think it's worth extending things quite that far. Since the entire suite would have to be modified in order to support this new capability--whether just RGBA or any type with an alpha channel--one may as well revise the ASCII header, too, and upgrade the moderately bad P1-P6 magic signatures. See the new ``The "Right" Way'' section below for details.]

The better solution to this would be to introduce a new pbm file format, the portable alphamap or pam format, and extend all of the pbm utilities to understand it. Just as the ppm format is basically nothing but raw red/green/blue (RGB) triplets, the pam format would consist of 32-bit red/green/blue/alpha (RGBA) quads. The problem with this solution is that every utility would need to be modified in order even to recognize the format as something related to pbm. That's a lot of work. [It turns out that the crude solution outlined in the next paragraph was almost as much work, so interleaved is definitely the way to go. Indeed, it's already in regular use at one fairly large map site (as part of the generation process).]
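To make the "RGBA quads" idea concrete, here is a minimal sketch of interleaving a raw RGB raster with a separate alpha raster. It assumes 8-bit samples and headerless pixel data; the function name interleave_rgba is purely illustrative and is not part of any pbmplus / netpbm release.

```python
def interleave_rgba(rgb: bytes, alpha: bytes) -> bytes:
    """Merge packed RGB triplets and one alpha byte per pixel into RGBA quads.

    Assumes 8-bit samples and raw raster data with no header.
    """
    if len(rgb) != 3 * len(alpha):
        raise ValueError("RGB and alpha rasters describe different pixel counts")
    out = bytearray(4 * len(alpha))
    out[0::4] = rgb[0::3]   # red samples
    out[1::4] = rgb[1::3]   # green samples
    out[2::4] = rgb[2::3]   # blue samples
    out[3::4] = alpha       # alpha samples
    return bytes(out)
```

Because every pixel carries its own alpha sample, a filter reading such a stream could process it row by row without buffering the whole image.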

A simpler but cruder solution (and the one taken here) is to treat the RGB part of the image as a normal ppm stream and the alpha channel as an appended pgm stream. (Grayscale + alpha images can be handled similarly as a pair of pgm streams.) The advantage of this approach is that RGBA pam images are trivial to produce: simply concatenate a ppm and a pgm of identical dimensions, and voilà. The images are also backward compatible in the sense that non-alpha-aware utilities will simply ignore the appended alpha info and otherwise behave normally. The disadvantage is that most normal image-processing cannot be done on a streaming basis; the entire image (or most of it, anyway) must be read into memory before processing can begin. An appended-alpha pam also cannot be easily recognized by utilities except by comparing its true size (in bytes) with that expected on the basis of its height and width.
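The size-comparison trick can be sketched roughly as follows. This is only an illustration, not the code in the modified utilities: it assumes a raw (P6) ppm held entirely in memory, and the helper names read_header_tokens and split_appended are made up for the example.

```python
def read_header_tokens(data: bytes, pos: int, count: int):
    """Read `count` whitespace-separated header tokens, skipping # comments.

    Returns the tokens and the offset of the first raster byte.
    """
    tokens = []
    while len(tokens) < count:
        while pos < len(data) and data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":            # comment runs to end of line
            while pos < len(data) and data[pos:pos + 1] != b"\n":
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        if start == pos:
            raise ValueError("truncated header")
        tokens.append(data[start:pos])
    return tokens, pos + 1                       # one whitespace byte ends the header


def split_appended(data: bytes):
    """Split an appended-alpha image into (ppm, pgm); pgm is None if absent."""
    (magic, width, height, maxval), offset = read_header_tokens(data, 0, 4)
    if magic != b"P6":
        raise ValueError("expected a raw ppm (P6) stream")
    bytes_per_sample = 1 if int(maxval) < 256 else 2
    expected_end = offset + int(width) * int(height) * 3 * bytes_per_sample
    if len(data) < expected_end:
        raise ValueError("ppm raster is shorter than its header claims")
    rest = data[expected_end:]                   # anything extra is the alpha pgm
    return data[:expected_end], (rest or None)
```

Producing such a file needs no code at all beyond concatenating the two streams (in Python, simply ppm_bytes + pgm_bytes); splitting it again depends on the true size exceeding the size implied by the ppm header.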

Nevertheless, the crude method suffices as a stopgap measure for now, and many of the modifications necessary to support it will also carry over to the good method, assuming anyone ever gets around to implementing that.

Progress So Far

A relatively quiet weekend in Tahoe was sufficient to whack out basic alpha-channel support in three utilities: pngtopnm, pnmtopng and ppmquant. (The latter was actually cloned into pamquant since its algorithms are so closely tied to the image type.) These three are all that's needed to convert full 32-bit RGBA PNGs into 8-bit RGBA-palette PNGs.

to do:

Source Code

Completely bare-bones; there are no makefiles or even build scripts. The normal pnmtopng makefiles (see the PNG Converters page for links to the official distribution, currently version 2.37.4) can be used for the enhanced versions contained herein, however:

[1/16-size image of AlphaSnakes.png being viewed with Arena]

Example Images

The first is the original, 32-bit RGBA image; the second is the output of the modified pngtopnm / pnmtopng and the new pamquant:

The latter is also embedded in the Miscellaneous PNG Images page, although both Netscape Navigator and Internet Explorer choke on the OBJECT tags and generally fail to display anything. [Navigator and MSIE have both improved somewhat since this was written, and the snakes image has been replaced with four others that are each available in three sizes--the smaller ones as 8-bit RGBA-palette PNGs and the largest ones as 32-bit RGBA PNGs. There is also an alternate version of the page that uses standard IMG tags, since the Big Two have not yet improved enough for OBJECT to be particularly usable.] Arena does a nice job, however. A reduced-size screenshot from Arena is at right.

The "Right" Way

[Added 21 November 1999, restating a proposal in an e-mail message to Willem van Schaik dated 15 February 1999]

To expand upon the bracketed comments above, here's a quick summary of Greg's thoughts on the correct way to extend pbmplus/netpbm to include alpha support. First, any such extension will require a major revision of all utilities--even if only to pass through or discard the alpha information, for example--so there's no real point in slavishly sticking to the current P1, P2, ..., P6 signature scheme. That is, old versions of the utilities will not be able to understand alpha-enhanced images, and since PBM, PGM and PPM were never really that useful as final storage formats anyway, a clean break with the older releases isn't that big a deal. Indeed, even with the changes proposed below, a simple sed or perl one-liner could be used to convert new-style PBM/PGM/PPM files back into the old format for use with old versions of the utilities. (But of course one would include a full set of old-to-new and new-to-old conversion utilities with any new release, thus allowing maximal interoperability with old-style utilities for which the source code is no longer available.)

Second, since it's either one format or six formats, why does it currently have three file extensions? Let's forget that, stick with the one-format philosophy (e.g., PNG, GIF, TIFF, JPEG), and choose a single file extension. Conveniently, there's already a candidate: PNM, which currently is merely a catchall name for any of the original three ("portable anymap"), but could be made into a "real" format now.

Third, netpbm's two-byte "magic" (or three-byte if you count the newline after the `P' and digit) is insufficient for unique identification of the file type, and the variable offset of its dimension info makes it unfriendly to the file(1) command. (Recall that the PBM format supports comment lines, so the dimensions need not begin at offset 3.) The comment issue may be difficult to avoid, but there's no question that one could improve the magic signature and at the same time include more information in very simple, machine-readable form--yet still encoded as human-readable ASCII, of course. Additional information beyond the signature bytes that would either be required or merely useful includes the image type (bilevel, grayscale or RGB), the data encoding (ASCII or binary), the transparency type (none, bitmask or alpha channel), the number of bytes per sample, etc. Such a signature/image-type line might look as follows:

NetPBM2k:3:B:-:1\n
1024 768\n
255\n

The first 9 bytes (NetPBM2k:) are a pretty unique file signature, though not absolutely foolproof (including a byte with its eighth bit set, a la PNG's signature, would be slightly better). The remaining fields are separated by colons, though they're intentionally limited to single bytes for simple parsing--so the colons are actually redundant from a machine-processing standpoint, but not necessarily from a human-readability standpoint.

The first such field indicates the image type: 1 = PBM, 2 = PGM and 3 = PPM.

The second field indicates the data encoding: A = ASCII, B = binary.

The third field indicates the transparency option: - = no transparency, t (or m?) = bitmask (simple transparency a la GIF), a = full alpha channel. (One would have to decide whether to support both associated/premultiplied and unassociated/non-premultiplied alpha or just one of them; if both, a fourth character would be necessary, perhaps p = premultiplied alpha.)

The fourth field indicates the number of bytes (not bits) per sample, where 0 = shorthand for 1-bit (PBM), 1 = 8-bit, and 2 = 16-bit. (This is related to pbmplus's maxval but not exactly identical to it; for example, an image with a maxval of 63 must still be stored with 8-bit samples.)
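Here is a rough sketch of how a reader might decode the three-line header shown above. Since the format is only a proposal, everything here is an assumption layered on the field descriptions: comment lines are not handled, only t (not m) is accepted for bitmask transparency, and the names parse_header and old_magic are invented for the example. The old_magic helper uses the existing P1-P6 convention (P1-P3 are ASCII, P4-P6 binary) to show how a transparency-free file could be mapped back to an old-style signature.

```python
SIGNATURE = b"NetPBM2k:"
IMAGE_TYPES = {"1": "PBM", "2": "PGM", "3": "PPM"}
ENCODINGS = {"A": "ascii", "B": "binary"}
TRANSPARENCY = {"-": "none", "t": "bitmask", "a": "alpha"}
BYTES_PER_SAMPLE = {"0": 0, "1": 1, "2": 2}   # 0 is shorthand for 1-bit (PBM)


def parse_header(data: bytes) -> dict:
    """Parse the proposed three-line header; comment lines are not handled."""
    lines = data.split(b"\n", 3)
    if len(lines) < 4:
        raise ValueError("header needs three newline-terminated lines")
    if not lines[0].startswith(SIGNATURE):
        raise ValueError("missing NetPBM2k signature")
    fields = lines[0][len(SIGNATURE):].decode("ascii").split(":")
    if len(fields) != 4:
        raise ValueError("expected exactly four single-byte fields")
    kind, encoding, transparency, sample_bytes = fields
    try:
        header = {
            "type": IMAGE_TYPES[kind],
            "encoding": ENCODINGS[encoding],
            "transparency": TRANSPARENCY[transparency],
            "bytes_per_sample": BYTES_PER_SAMPLE[sample_bytes],
        }
    except KeyError as exc:
        raise ValueError(f"unrecognized field value {exc}") from None
    width, height = (int(v) for v in lines[1].split())
    header.update(width=width, height=height, maxval=int(lines[2]))
    header["data_offset"] = len(data) - len(lines[3])   # first raster byte
    return header


def old_magic(header: dict) -> bytes:
    """Map image type and encoding back to the classic P1..P6 magic."""
    if header["transparency"] != "none":
        raise ValueError("old-style formats cannot carry transparency")
    digit = {"PBM": 1, "PGM": 2, "PPM": 3}[header["type"]]
    if header["encoding"] == "binary":
        digit += 3
    return b"P%d" % digit
```

Because each flag is a single byte at a fixed offset, the same information is equally easy to pick out with a file(1) magic entry or a sed one-liner; the dictionary lookups above are just the most direct way to express it in Python.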

One could optionally move the width, height, and maxval values up to the end of the line, again colon-separated but no longer fixed-width. For future extensibility, one might wish to allow other flags or values later on; if any are detected, the current suite would assume that the file is unreadable. A double-colon could be used to separate the flags from the height/width/maxval values, if this approach seems worthwhile:

NetPBM2k:3:B:-:1::1024:768:255\n
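A sketch of how the single-line variant might be split, including the "unknown flags mean unreadable" rule, follows. As before, this is an assumption-laden illustration rather than an implementation of anything that exists; parse_one_line is a made-up name, and the flags are returned as raw strings without further validation.

```python
def parse_one_line(line: bytes):
    """Split the one-line header into (flags, (width, height, maxval))."""
    text = line.decode("ascii").rstrip("\n")
    prefix = "NetPBM2k:"
    if not text.startswith(prefix):
        raise ValueError("missing NetPBM2k signature")
    # The double colon separates the fixed-width flags from the dimensions.
    flags_part, separator, values_part = text[len(prefix):].partition("::")
    if not separator:
        raise ValueError("no '::' separator found")
    flags = flags_part.split(":")
    if len(flags) != 4:
        # Extra flags come from a newer revision this reader does not know.
        raise ValueError("unknown flags present; treating file as unreadable")
    width, height, maxval = (int(v) for v in values_part.split(":"))
    return flags, (width, height, maxval)
```

With the example line above, this yields the flags 3, B, - and 1 together with the values 1024, 768 and 255, while a line carrying a fifth flag is rejected.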

Of course, comment-lines can be extremely useful, so Greg does not advocate eliminating that capability--at least, not without some serious thought and discussion on the matter. In fact, nothing about the proposal is cast in stone, but there is much to be said for a format that is both human-readable (i.e., sort of backward-compatible) and trivially machine-readable (i.e., for which file(1) can return useful info), and, if left to himself, Greg will likely implement something like this.

Related Links

Here are pointers to some other folks with ideas about and/or fixes for pbmplus/netpbm:

And here's the subscription address for the recently moved pbmplus mailing list:

Acknowledgments

Thanks to Stefan Schneider for the proof-of-concept of RGBA quantization and dithering (check out his LatinByrd app); for the psychological push to get this underway; and for some sample ppmquant-RGBA code (which, even though Greg didn't use it, was nevertheless greatly appreciated). Thanks also to Pieter van der Meulen for the idea of simply appending the alpha channel to a normal ppm file, and to Willem van Schaik, Alexander Lehmann and Jef Poskanzer for some excellent utilities in the finest Unix tradition.


Last modified 31 December 2004 by Greg Roelofs, you betcha. This page is http://gregroelofs.com/greg_rgba.html .