API Design

My discussion about easy scripting of web apps is part of the subject of API design, which I do find very interesting. API choice is subjective, but is it that subjective? There should be examples of APIs everyone can point to and say "That is how/how not to do it".

Obviously, I'm a fan of object-oriented APIs, but the ease of use of OO APIs differs greatly.

Python's standard library API is too flat. The modules are self-contained rather than building on one another, which means most functions tend to return a primitive rather than a more suitable datatype. There is little consistency in naming, but that's not too bad because of the interactive interpreter and help().

Java's API is too nested. It is highly interdependent, but its power usually comes from composition of objects rather than subclassing, so some of the packages get extremely complex. It is entirely based on design patterns and has a consistent verbNounPhrase naming policy which is only broken occasionally (*cough* System.currentTimeMillis() *cough*). I think I prefer this to any other API I've used, despite areas of high complexity and still not having found a truly succinct way of using AWT and Swing (perhaps I will try a XUL implementation next time).

The worst API I've ever looked at is libxml2, which you have to faff about with a little to get XSLT up and running in Python. It's so bad it gives me a headache just looking at the API documentation!

Scriptable Interfaces Revisited

I solved my problem mentioned in my last post by creating my own scriptable interface in Python.

Happily, I had decoded enough of the database format to be able to write a class - about 50 lines of Python - which encapsulates the configuration I needed to modify, and the ability to store it into the database (I didn't fully implement reading the old configuration from the database though).

Then I was able to do the reconfiguration as simple object instantiation and member function calls. As expected, it was much easier than logging in as administrator for each CMS in turn, going through the interface to find the configuration page required, and updating and saving it.
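
To give a flavour, the shape of the thing was roughly as follows. This is a reconstructed sketch rather than the original script: the class name, table and column names are invented for illustration, as is the way the parameters are stored.

# Reconstructed sketch - class, table and column names are invented.
import MySQLdb

class SiteConfig:
    def __init__(self, host, user, passwd, db, prefix='jos_'):
        self.conn = MySQLdb.connect(host=host, user=user, passwd=passwd, db=db)
        self.prefix = prefix
        self.params = {}

    def set(self, component, name, value):
        """Queue a configuration value to be written for a component."""
        self.params.setdefault(component, {})[name] = value

    def store(self):
        """Write the queued configuration into the database."""
        cursor = self.conn.cursor()
        for component, params in self.params.items():
            blob = '\n'.join('%s=%s' % item for item in sorted(params.items()))
            cursor.execute(
                "UPDATE %scomponents SET params = %%s WHERE `option` = %%s" % self.prefix,
                (blob, component))
        self.conn.commit()

# The reconfiguration itself is then just instantiation and method calls:
for db in ('client_site1', 'client_site2', 'client_site3'):
    config = SiteConfig('localhost', 'joomla', 'secret', db)
    config.set('com_example', 'sitename', 'Example Site')
    config.set('com_example', 'offline', '0')
    config.store()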

There were a number of advantages:

  • All the configuration to be input was collected at the bottom of the script, together on one page, so it was easy to check that everything was present and correct.
  • The script can be re-run at any point, so it's maintainable.
  • The script performs some checks to ensure the new configuration is valid.

Obviously, it would be easier if Joomla offered this kind of capability itself, so that it didn't have to be maintained separately in Python. Joomla is written in PHP, but that doesn't mean you can just write a PHP script that loads up Joomla's classes and tweaks them as I've described. PHP doesn't have import semantics, so you need to know exactly what has to be included and in what order. It also doesn't have a very strong object model, so trying to make object manipulations correspond to data manipulations is inherently inconvenient. Finally, Joomla's PHP objects consist, as far as I've examined, of monolithic functions that perform long sequences of procedural steps rather than small, reusable functions.

So I'm one step closer to defining exactly what I need here for administering web applications. I need to be able to write administration scripts quickly, without prior knowledge of the application's internals (ideally using interactive introspection to obtain the knowledge I need).

  1. An API which allows me to succinctly retrieve and manipulate the application's data. This should be object-oriented so that for any object I'm handed, I know what operations I can perform with it. The structure of the object model should reflect the way the data is presented in the web app's front end, so that I can find my way around blindly, using only what I can see of the web app's data model. (There's a sketch of what such a script might look like after this list.)
  2. A very low overhead in terms of lines of code for getting up and running. I don't want to learn code by rote just so I can bootstrap this API.
  3. Implicit or succinct persistence for the objects I've retrieved. I can call store() on each object I modify, if I must, but implicit persistence (ie. everything is automatically updated when the script ends) will better allow the API to handle ACID for me.
  4. No SQL queries. I don't want to have to understand the database structure. I also don't care about efficiency in this instance so just hand me a list of all the objects and I'll filter it myself.
  5. No using built-in types to represent abstract data. For example, no using associative arrays to represent objects. A class for everything, with useful member functions, please. I don't want to have to work out the semantics or write any code that might already exist in the application.
  6. A few CLI utilities that use this API. Not only can these perform useful tasks, but they prove the API works and is succinct, and they give examples of its use that I can copy.
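
Put together, the kind of script I want to be able to write would look something like this. The module, connection call and object model here are entirely hypothetical - invented to illustrate the requirements, not a real library:

# Hypothetical API - 'webappadmin' and everything on it is invented.
import webappadmin

site = webappadmin.connect('/var/www/client-site')    # requirement 2: one line to get going

for article in site.articles:                          # requirement 4: just hand me the objects
    if article.category.name == 'News':                # requirement 5: real classes, not arrays
        article.published = True                       # requirement 1: manipulate what I can see
# requirement 3: changes are persisted implicitly when the script ends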

Note that the API doesn't need to remain backwards compatible. I'm happy to modify my management scripts. The aim is for the scripts to be as brief as possible, so it shouldn't be hard to tweak them if the data model is changed (improved, we hope).

Web apps need scriptable interfaces

I was just working on a set of separate Joomla installations for a client today when I realised that I really needed to be able to run scripts against the different installations.

I was trying to install three different Mambots (one of Joomla's three types of extension) in about eight installations of Joomla, each with different database configurations and paths. I had started out with a Bash script that merely copied the plugin files into place, but I realised that automating the whole operation would mean reading a configuration file in PHP syntax and then performing some MySQL queries with what it contained - and coding that would probably take longer than installing the plugins manually.

There are not very many web apps which have any kind of scriptable API. In fact, I only really know of Mailman, which is only partly a web application. But it's a feature I've used frequently in Mailman - there is a script bin/withlist which acquires locks and opens the list, allows you to modify the list as a Python object, and saves it on exit. Mailman provides a few CLI tools too which can be used in scripting but which are really only trivial examples of the power of the scriptable API.
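
From memory (so the details may be slightly off), a withlist script is just a module containing a function which receives the already-locked list object, something like:

# fix_lists.py - run with something like: bin/withlist -l -r fix_lists.update mylist
# From memory: -l locks the list, -r names the function to call with the
# MailList object. The attributes touched here are only illustrative.
def update(mlist):
    mlist.max_message_size = 0
    mlist.owner = ['admin@example.com']
    mlist.Save()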

When I began writing Mailhammer, my own announcement-only mailing list software, I took this scriptability even further based on my positive experience with Mailman's scriptable API. All of the working parts are implemented in Python; the PHP is just an HTML wrapper which opens and talks to a CLI Python script over pipes. This means that the PHP is kept extremely simple, the Python core is a very clean and simple API, and the CLI can do everything reliably. It's a cleanly divided implementation of an n-tier architecture. In fact, in practice I only use the web interface for viewing the data already in the database. Consequently, that interface isn't very powerful - yet!
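
Roughly - and this is a simplified sketch of the shape of the design rather than Mailhammer's actual code - the Python side is just a command loop on stdin/stdout that the PHP wrapper drives over pipes:

#!/usr/bin/env python
# Simplified sketch: read one command per line on stdin, dispatch it, write
# the result to stdout. The command names are invented for illustration.
import sys
import shlex

def handle(command, args):
    if command == 'subscribe':
        return 'OK %s' % args[0]          # would call into the real core here
    if command == 'list-members':
        return '\n'.join(['alice@example.com', 'bob@example.com'])
    return 'ERROR unknown command'

for line in sys.stdin:
    parts = shlex.split(line)
    if parts:
        print(handle(parts[0], parts[1:]))
        sys.stdout.flush()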

Python is well-suited for scriptable APIs - its interactive interpreter and neat object model mean that it's easy to perform arbitrary operations interactively on complex, persistent data structures. In PHP web applications it might be more feasible to build an XML-RPC interface of some kind and provide a command-line client.
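
Even a thin XML-RPC layer would make the command-line client trivial. Something like this, assuming - hypothetically - an endpoint at /xmlrpc.php exposing a cms.getArticles method (both names invented):

# Hypothetical client: the URL and method name are invented; the point is how
# little code a CLI client needs once an XML-RPC interface exists.
import xmlrpc.client    # xmlrpclib in older Pythons

server = xmlrpc.client.ServerProxy('https://www.example.com/xmlrpc.php')
for article in server.cms.getArticles('News'):
    print(article['title'], article['published'])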

I don't think that scriptability is considered as even a potential feature for almost any web application I've tried; their operation is tied inextricably to their unique interfaces.

For anybody developing a new web application please ask yourself this: will administrators using your software want to be locked in to your pretty and easy-to-use interface, or will they end up cursing you for failing to provide them with power beyond what HTML can provide?

Data mining with AJAX

Just had an idea: how about using Javascript to record client-side usage of your website?

The principle is this:

  1. Register Javascript listeners which construct a list of events, particularly mouse, scroll and click events, along with the time that the event was fired.
  2. Register an unload event which posts the information as XML with AJAX to a script on the server when the user leaves the page (a sketch of the server-side receiver follows this list).
  3. Browsing sessions can be collated on the server using cookies.
  4. Create a player, which reads the events as XML and renders them using a DHTML 'cursor' and/or by firing events within the DOM. Could have a time slider and fast-forward controls, etc, depending on how complex you want to get.
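
For steps 2 and 3, the server-side receiver can be tiny. A minimal sketch as a CGI script (the log location and cookie name are invented):

#!/usr/bin/env python
# Sketch of the collection script: read the posted event XML and file it under
# the visitor's session cookie so the browsing session can be replayed later.
import os
import sys
import uuid
from http import cookies

cookie = cookies.SimpleCookie(os.environ.get('HTTP_COOKIE', '')).get('session')
session_id = cookie.value if cookie else uuid.uuid4().hex

length = int(os.environ.get('CONTENT_LENGTH') or 0)
events_xml = sys.stdin.read(length)

with open('/var/log/usage/%s.xml' % session_id, 'a') as log:
    log.write(events_xml)

print('Content-Type: text/plain')
print('Set-Cookie: session=%s; Path=/' % session_id)
print()
print('OK')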

Voila - see exactly what people are doing with your site. I have knocked up a test which implements the first two steps, for mousemove events, and that much works, so the whole concept would be workable. I can imagine it would break down if your site uses plugins (or Javascript navigation, depending on how easy it is to replay the events accurately) but that's a limitation you would have to live with.

There are obviously privacy concerns, but these are relatively mild as no personal data would be recorded. Perhaps it could pop up a Javascript window.confirm() dialog asking if it's OK to record your behaviour. But it would be a very useful tool for examining site usage, especially for commercial sites. This is the way modern marketing works. I leave it up to your conscience as to whether it's ethical.

IE7 and FF2

Well, this was the week that the world of web design was turned on its head. The release of Firefox 2.0 wasn't the cause: only Microsoft can shake the industry up like this, with its release of Windows Internet Explorer 7.

It's been 5 years with the same IE bugs, but now we get a lot of them taken away and a whole new load handed back. Because of this, we once again have three major platforms to target: IE6, IE7 and Real Browsers. There has been a lot of talk about better standards compliance in IE7 but it only takes a glance at comparative Acid2 renderings to see that IE is still way off the mark. Firefox 2 doesn't pass Acid2, but its performance is actually not bad at all. In fact there is only one major class of bugs left for Acid2, bugs which will all be fixed by the reflow rewrite, which has been in progress for a while now.

Happily, the disparity between IE6 and IE7 isn't as great as I had feared; having checked a few of my websites, I found they generally all work fine in Internet Explorer. Until I start seriously working with IE7 I won't know why that is: either it is largely backwards compatible with IE6 and the hacks and workarounds are still working, or it's much more standards compliant and the workarounds aren't working but also aren't needed. It's probably a mixture - some things remain broken but the workarounds still work; other things are fixed and the workarounds either no longer apply or are harmless anyway. I have avoided working with the IE7 betas so far, apart from a brief test of the first beta; it wasn't worth monitoring the situation until now, and I'm not interested in Internet Explorer any more than this job requires.

One thing I had noted since the early IE7 announcements was that it would respect PNG alpha transparency, something which I started to use years ago and which for the past two or three years I have relied on in almost every website I design. Because I've usually served a simple non-transparent replacement to Internet Explorer (usually a screen-capture of Firefox!), switched on the User-Agent request header, it's been habit to allow IE7 to get the transparent PNGs like Real Browsers do. (Actually, Microsoft did debate whether to change the format of the User-Agent string rather than just the version number, so it's lucky that they didn't or this crafty pre-emptiveness wouldn't have worked.)
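
The switching itself is trivial - along these lines, as a hypothetical CGI sketch (in practice it can just as easily live in the web server configuration):

#!/usr/bin/env python
# Hypothetical sketch: serve a flat replacement image to IE6 and below, and
# the real alpha-transparent PNG to IE7 and Real Browsers.
import os
import re
import sys

ua = os.environ.get('HTTP_USER_AGENT', '')
match = re.search(r'MSIE (\d+)', ua)
path = 'logo-flat.png' if match and int(match.group(1)) < 7 else 'logo.png'

sys.stdout.write('Content-Type: image/png\r\n\r\n')
sys.stdout.flush()
with open(path, 'rb') as image:
    sys.stdout.buffer.write(image.read())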

So what do I think of IE7? It's OK. The radical UI layout is alright, but I don't know if putting your menus in hard-to-reach places is actually any better. With much more screen space for web pages (much more even than Firefox used to trumpet back when 1.0 was released, gained by ditching the menu bar and using the tab bar for other toolbar icons), and a sleek rocker button for back/forward, it's a little bit tidier and fresher than Firefox 2.0, but it's a radical departure from the old Windows styles that people are used to. Of course, I in no way recommend that anyone should choose IE over Firefox, and particularly not simply because it looks nice. In fact I would still suggest that nobody should use IE for anything other than Microsoft Update.

Upgrading to IE7 is a pain if you still need access to IE6. I've installed a copy of XP in a VMware VM so that I've got a usable copy of IE6 tucked away, and I will also try to get a standalone copy installed. Most people don't care that IE takes over your system, but all computer scientists find it horrifyingly inelegant. I loathe IE because it's insecure and it genuinely makes my work harder day-to-day, so I also find it mildly sickening that it's always there in the background, driving some application I didn't want it to drive.

One of the biggest changes that's happened in the past 5 years is that now, Microsoft pushes out software updates over Automatic Updates. Microsoft haven't pushed the button to send out IE7 yet, but they are planning to, and when they do we will see browser uptake like never before. It's a scary thought, actually. There could easily be a 50% jump in IE7 market share within a week, so it's entirely up to Microsoft exactly when your website will start getting crawled over by masses of IE7-using idiots and whether they will or won't be able to view your website properly.

Firefox 2.0 is also a welcome improvement. They've changed the icons, clearly to compete with the new Internet Explorer: they have been given a perspex glossiness mimicking IE7's icons. There are also improvements to many of the minor dialogs and views, such as a search engine manager and a style for RSS feeds. Mainly, though, this release is just a wrapper around a new Gecko, which is not really a bad thing. It just doesn't need much hype, especially because many of the new Gecko features are so far ahead of IE7 that they are entirely useless on any Internet website.

If you're using a browser which doesn't run on an IE (Trident) or Mozilla (Gecko) engine, then good for you. Firefox is best for me because of its flexibility as a development tool, but I happily endorse all the other Real Browsers too.

More PHP segfaults

Another case of PHP segfaulting. This time, at least, it was behaving deterministically and by inserting

print "Meep!";
flush();

throughout various bits of the code I managed to track down the problem. It was segfaulting trying to read a config file to which it didn't have read permissions.

PHP is bad.

Why I'm not sold on RSS

I don't know if I'm the only one but I've just never gotten on with RSS (under the umbrella of which I include Atom too). Nothing I've read about it resolves these open questions:

  • What is RSS for?
  • Why is RSS the best way to do... whatever it is that it's for?

I think that RSS's history lends weight to the suspicion that nobody really has the answers to those questions.

I discussed this recently with Dee on IRC, and I think we both started to understand one another's views on the subject. He said he was a fan because it allowed him to set up notifications when websites were changed, and since then I've been using it for the same purpose. I added a couple of feeds to Thunderbird and yes, I can now easily see when a website has new content. But that hasn't really addressed the difficulties I have with the concept:

Linear view of a complex data structure

One of the strangest things about RSS is that it's been shoehorned into a variety of applications where it isn't ideal. Most dynamic sites will have a very rich structure and RSS is merely one projection of that structure onto a sequence. It's usually chronological, but it's always inflexible.

You are robbed of the richness of structure that the web interface provides. It's possible that the web author has gone to great lengths to provide a user-friendly way to navigate around the site and you're missing it by viewing merely a sequence of excerpts.

All blogs have a list of posts which corresponds to the RSS feed. The problem is that there's usually much more on the page that you will miss out on.

Suppose I blog about cocktails (cocktails are a really good example, one I spotted in Sean Kelly's screencasts about Plone: they are colourful, visual and rich in content, with histories and ingredients). Maybe in the sidebar of each post I have a widget that links to other cocktails - a random cocktail, and "if you like this cocktail, you may enjoy these". Suppose I allow the posts to be filtered by what ingredients are available, and also by category/tag. The webpage also lets you sort the list in ways other than chronologically - by votes or by alcohol content. I've also got AJAX which pops up a little glossary entry on clicking arbitrary terms. All very slick, integrated and non-linear. It's still a blog, because the front page shows the most recently added articles. But an RSS feed of the blog wouldn't expose any of that richness. And surely that richness is what makes the difference between a brilliant site and Yet Another Blog?

To an extent, choosing to read a site through RSS is a conscious choice to cut yourself off from that functionality, but that doesn't detract from the argument that RSS isn't particularly appropriate.

A simpler example - a photoblog. Each post has a mini-gallery attached. The RSS feed can't (as standard) describe a gallery as nested within the articles - it can only provide the XHTML markup to describe how it looks.

WordPress has to provide two feeds: one for posts and one for comments.

As I understand it, W3C's RDF format can describe these kinds of data structures and that seems much more appropriate. The argument that RSS is 'Really Simple' is nonsense: RSS is universally generated and consumed by software, not humans, so the complexity of the description is irrelevant to users.

Nobody knows what RSS is for

RSS isn't for anything specific. People use it in different ways. I recently looked through a list of aggregators for Windows and tried half a dozen, which lets me be somewhat authoritative on the subject. The aggregators range from software which merely pops up a notification when there's new content to full-blown power tools for notifying, merging and reading dozens of feeds. There is almost no commonality among feature sets. Firefox's RSS folders don't notify; they just silently list items. And then there's Planet, which creates a webpage by merging feeds.

This makes it difficult, as far as I'm concerned, to conceptualize uses for RSS. If authors don't know what their audience would like to do, then how can they know that RSS is providing the capabilities they want to provide? And so I suspect many authors just blindly provide a basic RSS feed, in case visitors want to do something with it.

RSS nominally does syndication - ie. it can allow publishers to collect and republish new articles, or adverts, from other sources. This requires metadata, and providing the requisite metadata requires some knowledge of the problem domain - knowledge which blog maintainers don't have and which RSS doesn't encapsulate (although Dublin Core does).

More importantly, what does RSS do for me that I can't do without a new format? I can already poll for updates on a page by using an HTTP query like

HEAD / HTTP/1.0
If-Modified-Since: time-I-last-checked

and waiting for a 200 OK rather than a 304 Not Modified.
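
In Python that polling loop is only a few lines (a sketch - the host and timestamp are placeholders):

# Poll with a conditional request: 304 means nothing new, 200 means the page
# has changed since we last looked.
import http.client

last_checked = 'Sat, 28 Oct 2006 12:00:00 GMT'    # placeholder timestamp

conn = http.client.HTTPConnection('www.example.com')
conn.request('HEAD', '/', headers={'If-Modified-Since': last_checked})
response = conn.getresponse()
if response.status == 200:
    print('Page has changed')
elif response.status == 304:
    print('Nothing new')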

If I was intending to improve on such a scheme I would think about using notification rather than polling. Or simply notify that there's something to poll.

Non-normalized XML

RSS is bad XML. It's rather ambivalent about using namespaces, allowing namespaces for DC and others, but the main problem is that it doesn't use namespaces to embed HTML content (and in most versions, doesn't provide a default namespace). Instead it usually (I think it's possible to do the right thing, but it doesn't appear that this happens in the wild) embeds the HTML and escapes it as CDATA either with an explicit CDATA section or character entities.

This is undeniably bad form. It means that to parse RSS you need not only an XML parser but also a tag-soup SGML parser, and all the functionality that has been built for XML - validation, transformation, querying, character-set awareness - is lost. Embedding structured data within XML CDATA is equivalent to storing non-normalized data in a relational database.

The argument that "well, we need to put HTML in there if that's what people are blogging in" doesn't hold water. The requirements of the format must be defined by what the consumers of the feed need. Data consumers want to work with a known data type, not whatever language I happen to be blogging in.

Atom is better XML but still allows for mixed content formats.

MP3-spliced Encrypted Filesystem

I've had a crazy idea for a way of protecting data using an MP3 collection. It's completely ridiculous, inefficient, and it can probably be shot to pieces. But it's fun.

MP3 streams consist of a sequence of frames. Frames begin with a sync word - a 12-bit string of ones - then a brief header which gives the bitrate and sample rate (and hence the length) of the frame, then the frame data. MP3 players should wait for the 12-bit sync, then read the frame, then wait for another sync (in most MP3 data this follows immediately).

It should be possible to bung arbitrary bytes in between frames and have players ignore them completely.

What if the bytes you stick in there comprise a filesystem? Say you use ten 6MB MP3s and pad each by 10% with filesystem data: that's a 6MB filesystem! Enough for your most important secrets, and nobody is going to look for it there. Security through obscurity. Still, that's only version 1 of the protocol (ie. the first version that occurred to me).
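
As a rough sketch of version 1 (assuming MPEG-1 Layer III frames, and ignoring ID3 tags, Xing headers and free-format bitrates), the splicing is just a walk over the frames:

# A rough sketch, not a robust MP3 parser: walk MPEG-1 Layer III frames and
# interleave chunks of the payload between them. Decoders resynchronise on the
# next sync word, so the interleaved bytes should simply be skipped.
BITRATES = [None, 32, 40, 48, 56, 64, 80, 96, 112,
            128, 160, 192, 224, 256, 320, None]        # kbit/s, MPEG-1 Layer III
SAMPLE_RATES = [44100, 48000, 32000, None]

def frame_length(header):
    """Length in bytes of an MPEG-1 Layer III frame, or None if these four
    bytes don't look like a valid frame header."""
    if header[0] != 0xFF or (header[1] & 0xFE) != 0xFA:
        return None
    bitrate = BITRATES[header[2] >> 4]
    sample_rate = SAMPLE_RATES[(header[2] >> 2) & 0x03]
    if bitrate is None or sample_rate is None:
        return None
    padding = (header[2] >> 1) & 0x01
    return 144 * bitrate * 1000 // sample_rate + padding

def splice(mp3, payload, chunk=64):
    """Copy each frame of mp3, slipping `chunk` bytes of payload in after it."""
    out, pos, p = bytearray(), 0, 0
    while pos + 4 <= len(mp3):
        length = frame_length(mp3[pos:pos + 4])
        if length is None:          # not a frame header: copy the byte and move on
            out.append(mp3[pos])
            pos += 1
            continue
        out += mp3[pos:pos + length]
        out += payload[p:p + chunk]
        pos += length
        p += chunk
    return bytes(out + mp3[pos:])

Recovering the data is the same walk in reverse: skip over each frame and collect the bytes in between.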

Version 2: use a block cipher to encrypt the filesystem. Obviously, this is important as otherwise plaintext bytes are readily visible.

Version 3: hide the MP3s that you've used to create the filesystem in a collection of MP3s - a large but random number of MP3s that have been similarly padded, but with junk. Now you need the right MP3s in the right order. Choosing r MP3s in the right order from a set of n gives nPr permutations, which, for r << n, is approximately n^r. For example, choosing 10 (in order) from a relatively modest collection of 500 (~ 2^9) is roughly equivalent to a 90-bit passphrase, or a 15-character random password consisting of A-Z, a-z, 0-9. But the nice thing about this is that humans should be good at reconstructing playlists from memory, even with thousands of MP3s to choose from.
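
A quick sanity check of that estimate, using nothing but the numbers in the paragraph above:

# nPr for 10 MP3s chosen in order from 500, versus a 15-character random password.
import math

print(math.log2(math.perm(500, 10)))    # ~89.5 bits
print(math.log2(62 ** 15))              # A-Z, a-z, 0-9: ~89.3 bits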

Version 4: Stripe the bytes between different MP3s. Ensure that cipher blocks are split between MP3s. This ensures that you can't run a brute force crack attempt against part of the encrypted data because you can't dig a whole block out of any one file.

Version 5: (optionally) use some acoustic element of the assembled MP3 playlist as part of the passphrase for the block cipher. Entering one 'digit' of passphrase might be the equivalent of selecting a riff in the right song, or one particular lyric. Say you have to choose a 5-second segment from your 10 3-minute MP3s - that's about 8.5 bits of passphrase.

Version 6: swap some, but not all, of your MP3s over P2P networks. There is an element of deniability - the random data may or may not be yours. Most MP3 collections I've seen have been collected from hundreds of different sources, and anyone using the system will have lots of MP3s padded with a mixture of junk and real data, so finding many MP3s containing junk on any one system does not mean they hold encrypted data. It just means their owner is guilty of piracy.

Told you it's completely ridiculous. But isn't it fun? :)

How I came to love developing in Python

As I've implied previously, I find PHP a desperately bad language for developing web applications. Python is my current favourite; it is a joy to work with, both when writing code and when maintaining it. Using Python, I can develop web applications faster, and with more complexity, than I ever could with PHP.

There was a disaster a couple of years ago with PHP which was the reason my preference changed. PHP fell apart when it came to the crunch, but using Python I was able to rapidly pick up the pieces. I was developing an application which would display quite an extensive mortgage application form, collect the answers and print them back onto the PDF, because the mortgage lender was still using a paper-based system.

I developed a system which read its questions from XML. The asking of some questions could be predicated on the answers given to previous questions. This allowed me to omit questions which the original paper form didn't require, which in turn meant I could insist on valid values for all of the questions I did ask.

I had written the system in PHP, as was our standard practice at the time. Obviously this required quite complex data structures; each question was an object, but the predication was effectively a parse tree which could be evaluated - collapsed to a single value: true, false or unknown. PHP makes this kind of work a huge nuisance. It only has a SAX parser, which means you need to maintain your own stack while parsing, and when you're doing any data structure work in PHP you have to be very careful to keep references rather than copies, which means you have to insert & in every assignment and function signature, and you can't update the $v in foreach ($x as $k=>$v) - that's also a copy.
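
To give an idea of the structure involved, here's a reconstructed sketch of such a predicate tree (in Python, since that's where the code eventually ended up; the class names and question identifiers are invented):

# Reconstructed sketch of the predicate parse tree: each node collapses to one
# of True, False or None (unknown), and a question is only asked once its
# predicate has evaluated to True.
class Answered:
    """True/False once the named question has an answer, None until then."""
    def __init__(self, question_id, expected):
        self.question_id, self.expected = question_id, expected

    def evaluate(self, answers):
        if self.question_id not in answers:
            return None
        return answers[self.question_id] == self.expected

class All:
    """True if every child is True, False if any child is False, else unknown."""
    def __init__(self, *children):
        self.children = children

    def evaluate(self, answers):
        values = [child.evaluate(answers) for child in self.children]
        if False in values:
            return False
        if None in values:
            return None
        return True

# e.g. only ask about a second income if there is a second applicant
predicate = All(Answered('joint_application', 'yes'),
                Answered('second_applicant_employed', 'yes'))
print(predicate.evaluate({'joint_application': 'yes'}))    # None - still unknown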

The system worked on my simple hand-drafted test data, which was much of the first page of the form, but it was extremely laborious to set up the XML source, because the questions needed coordinates from the PDF.

I stopped work on the web application and swapped over to writing a tool to generate the XML input from the original paper form, which we had in the form of a PDF. I wrote a Java tool which called on Ghostscript to render the PDF, and displayed a Swing and Java2D UI to draw the fields onto the page.

A week of programming, then three days and twelve pages of questions later, I plugged a completed XML file into the PHP application, and... nothing. Blank page. I couldn't get any output from PHP at all. It turned out PHP was segfaulting while serialising the data structure. This was an almost impossible situation to resolve: the gdb trace was useless, the project was running late, and PHP wasn't behaving deterministically, making it impossible to debug.

The best solution I could think of was to rewrite the entire application in a language I trusted more than PHP, and Python, which I had been experimenting with, seemed appropriate. I already had a very basic framework for writing CGI applications in Python, and even though I didn't start with a session system, I was able to write one, transcribe the PHP into Python, and get it all up and running within about 2 hours, which I remain impressed with to this day.

As I worked, I found I could transcribe every PHP construct into Python quickly and more succinctly. I could simply omit the & nonsense as objects are always passed by reference. It's amazing to be able to look at a block of PHP code, recall what it does, and write one line of Python which can do the same thing, omitting all the hoops that PHP requires you to jump through to construct data structures.