Wednesday, November 02, 2005

Perl, XML, and XSLT

Our website has a thumbnail of a campus photo which rotates randomly and links to a gallery page of all of the student/faculty-submitted photos. In the past, we would have managed this in one of two ways:
  1. JavaScript reading an array file, typically a comma-delimited list of filenames and metadata, and picking a random array element to write into the HTML
  2. Perl doing the same thing, pulled into the page with a server-side include (SSI)
Thing is, the amount of metadata we're collecting about our photos is reasonably complex. The thumbnail image URL, the fullsize image URL, the author, their title... and whether the image should only appear in the gallery. This is the kind of data that looks less like a table of columns and more like an XML manifest.
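
To make that concrete, here is a rough sketch of what such a manifest might look like and how the random thumbnail could be picked from it. Everything in it is an assumption for illustration: the element names, the `galleryonly` attribute, and the file paths are invented, and it uses Python's standard library only to show the shape of the idea, not the Perl we actually run.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical manifest: element names, the galleryonly attribute,
# and all paths are illustrative assumptions, not the real file.
MANIFEST = """<gallery>
  <photo galleryonly="no">
    <thumbnail>thumbs/quad.jpg</thumbnail>
    <fullsize>full/quad.jpg</fullsize>
    <author>A. Student</author>
    <title>Sophomore</title>
  </photo>
  <photo galleryonly="yes">
    <thumbnail>thumbs/lab.jpg</thumbnail>
    <fullsize>full/lab.jpg</fullsize>
    <author>B. Faculty</author>
    <title>Professor</title>
  </photo>
  <photo>
    <thumbnail>thumbs/library.jpg</thumbnail>
    <fullsize>full/library.jpg</fullsize>
    <author>C. Student</author>
    <title>Senior</title>
  </photo>
</gallery>"""


def rotation_candidates(root):
    """Return the photos allowed in the thumbnail rotation.

    A photo marked galleryonly="yes" appears only on the gallery page;
    a missing attribute is treated as "no".
    """
    return [p for p in root.findall("photo")
            if p.get("galleryonly", "no") != "yes"]


def pick_thumbnail(root, rng=random):
    """Pick one eligible photo at random; return (thumbnail, fullsize)."""
    photo = rng.choice(rotation_candidates(root))
    return photo.findtext("thumbnail"), photo.findtext("fullsize")


root = ET.fromstring(MANIFEST)
thumb, full = pick_thumbnail(root)
print(thumb, full)
```

The gallery-only flag is the part a flat comma-delimited list handles awkwardly: here it is just an attribute the picker filters on, and the gallery page can ignore it.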

The transformation of the manifest into the gallery is tailor-made for XSLT (Extensible Stylesheet Language Transformations), and strictly speaking a modern browser is capable of doing the transformation by itself -- as long as that's the only content on the page. Inserted into the middle of an XHTML document, that's another story. So, it makes sense to have the server manage the XSLT.

In order to do that, the Perl we use has to be made aware of XML and XSLT. Good things and bad: first, an XML file really needs a DTD. Why? For one thing, Firefox and IE throw hissies at XML without DTDs even though they don't validate the XML against them. For another, it forces you to define the syntax of your XML. Better yet, XML editors (even Dreamweaver) actually read them and autocomplete your entries. DTDs aren't hard to write once you get used to SGML definition syntax.
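
For a sense of scale, a DTD for a small photo manifest might be only a handful of lines. The sketch below is hypothetical: the element names and the `galleryonly` attribute are invented for illustration, and the DTD is embedded as an internal subset just to keep the example in one piece. It is wrapped in Python (standard library only) to show the non-validating behavior: the expat-based parser reads the DTD but does not enforce it, so the second photo, which is missing elements the DTD requires, still parses.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest with an internal DTD subset. All element and
# attribute names are illustrative assumptions, not the real file.
MANIFEST_WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE gallery [
  <!ELEMENT gallery (photo*)>
  <!ELEMENT photo (thumbnail, fullsize, author, title)>
  <!ATTLIST photo galleryonly (yes|no) "no">
  <!ELEMENT thumbnail (#PCDATA)>
  <!ELEMENT fullsize (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
]>
<gallery>
  <photo galleryonly="yes">
    <thumbnail>thumbs/quad.jpg</thumbnail>
    <fullsize>full/quad.jpg</fullsize>
    <author>A. Student</author>
    <title>Sophomore</title>
  </photo>
  <photo>
    <thumbnail>thumbs/incomplete.jpg</thumbnail>
  </photo>
</gallery>
"""

# ElementTree parses the document without validating it against the
# DTD, so the incomplete second photo is accepted.
dtd_root = ET.fromstring(MANIFEST_WITH_DTD)
photos = dtd_root.findall("photo")
print(len(photos), "photos parsed")
print("second photo has an author:", photos[1].find("author") is not None)
```

Actually enforcing the DTD takes a validating parser or a separate validation step, which is a good argument for letting the editor catch mistakes while you type.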

Perl still sucks its thumb and sits in the corner, until you integrate XML parsers into it. XML parsers in turn rely on libexpat, whose installation was covered earlier in this blog. As a non-root user, I have to install the XML modules to a local directory; fortunately, previous experience installing FAQ-O-Matic helps here.

Why go to all this trouble? The answer's simpler than it seems. It's a database important enough to need server-side manipulation but not important enough to justify an Oracle table. Currently we're manually editing the XML manifest for the gallery, but with a DTD-based framework for this XML, we can construct webapps for maintaining that database that aren't dependent on a particular language.

With XML::Parser, XML::Simple, and XML::XSLT, we can slurp the database into a Perl data structure, add, edit, or remove items, and write the result back out as XML. Should another application need to work with the data, it's there in a friendly, future-extensible format.
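
As a rough sketch of that read, modify, write cycle, and of the point that the data isn't tied to one language, here is the same round trip using only Python's standard library. The element names, the `galleryonly` attribute, and the helper functions `add_photo` and `remove_photo` are all hypothetical, made up for this example; this is not the Perl code described above.

```python
import xml.etree.ElementTree as ET

FIELDS = ("thumbnail", "fullsize", "author", "title")  # assumed element names


def add_photo(root, galleryonly="no", **values):
    """Append a <photo> element built from keyword values; return it."""
    photo = ET.SubElement(root, "photo", {"galleryonly": galleryonly})
    for tag in FIELDS:
        ET.SubElement(photo, tag).text = values.get(tag, "")
    return photo


def remove_photo(root, thumbnail):
    """Remove every photo with the given thumbnail; return how many."""
    matches = [p for p in root.findall("photo")
               if p.findtext("thumbnail") == thumbnail]
    for photo in matches:
        root.remove(photo)
    return len(matches)


# Slurp: parse the (hypothetical) manifest into an in-memory tree.
root = ET.fromstring(
    "<gallery>"
    "<photo galleryonly='no'><thumbnail>thumbs/quad.jpg</thumbnail>"
    "<fullsize>full/quad.jpg</fullsize><author>A. Student</author>"
    "<title>Sophomore</title></photo>"
    "</gallery>"
)

# Add a new entry.
add_photo(root, galleryonly="yes", thumbnail="thumbs/lab.jpg",
          fullsize="full/lab.jpg", author="B. Faculty", title="Professor")

# Edit an existing entry in place.
root.find("photo/title").text = "Junior"

# Remove an entry by its thumbnail URL.
removed = remove_photo(root, "thumbs/lab.jpg")

# Write the result back out as XML text.
xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
```

A maintenance webapp would wrap the same three operations in a form, then save the serialized XML over the old manifest.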