Atom and RDF

I'm annoyed with Atom.

I was hoping to use Atom to describe a range of things within Mauvespace, such as blogs, logs and so on, but it appears that even though there is a mapping from Atom to RDF, there is no inverse mapping from that RDF vocabulary to Atom, because it is not universally possible to convert in that direction.

For example, the <atom:author> element mandates exactly one child element <atom:name>. Even if an RDF reasoner can assume an author has a name, it does not necessarily know what it is. Also, if you pull Atom data from two feeds written under different pseudonyms into an RDF model and then claim that two authors are the same, the model stops being able to distinguish which name each feed was written under, unless you add a vocabulary to subclass Atom authors as pseudonyms of FOAF people.

These may seem like gripes about the mapping, but it's more serious than that. It means that there is no bijection between an arbitrary RDF model and a valid Atom 1.0 document.
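The loss of information can be sketched with a toy triple store. Everything here is an assumption made for illustration: the `ex:` predicates, the feed and author identifiers, and the pseudonyms are invented, and this is not the real Atom-to-RDF vocabulary.

```javascript
// Toy triple store: each triple is [subject, predicate, object].
// Predicates and identifiers are hypothetical, not a real Atom/RDF vocabulary.
const triples = [
  ["feedA", "ex:author", "authorA"],
  ["authorA", "ex:name", "Nom de Plume"],
  ["feedB", "ex:author", "authorB"],
  ["authorB", "ex:name", "Pen Name"],
];

// Claim that two authors are the same resource by rewriting every
// occurrence of `from` to `to`, roughly as a node merge would.
function mergeNodes(store, from, to) {
  return store.map(t => t.map(term => (term === from ? to : term)));
}

// Names an exporter could choose from when writing <atom:name> for a feed.
function authorNames(store, feed) {
  const authors = store
    .filter(([s, p]) => s === feed && p === "ex:author")
    .map(([, , o]) => o);
  return store
    .filter(([s, p]) => authors.includes(s) && p === "ex:name")
    .map(([, , o]) => o);
}

const before = authorNames(triples, "feedA");
const merged = mergeNodes(triples, "authorB", "authorA");
const after = authorNames(merged, "feedA");

console.log(before); // a single name for feedA's author
console.log(after);  // two candidate names, with nothing to say which belongs to feedA
```

Once the nodes are merged, the exporter has two names for one author and Atom wants exactly one, so the original document cannot be reproduced from the model alone.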

I can see a few options:

  • Map RDF to Atom only. Construct a mapping from any sensible RDF model structure to Atom. That this would not be bijective means that the software could not import from Atom. An alternative RDF version would have to be provided to import triples.
  • Work instead with RSS 1.0, which is pure RDF but is widely considered deprecated and isn't as expressive as Atom anyway. Trivially, however, this is bijective.
  • Map Atom to RDF only. Create a separate Atom store and map into RDF only temporarily when I need to reason upon it or style it with the RDF template code. Lack of bijection means the store could not invariably re-export Atom intact. Synchronising updates bidirectionally between Atom and triple store becomes an issue.

PHP4 must die

Sitting down this evening to code some PHP from scratch after a couple of months of working exclusively in Python, I am stunned to realise how bad plain PHP (PHP4, we're talking about) is. The shop code provides a fairly comprehensive framework on top of standard PHP, wrapping database, output and error handling. Without it, PHP is so much more dreadful than I remember.

I'm actually amazed that PHP doesn't print a stack trace when there's an error. You're clearly not supposed to write functions.

Perhaps I will make the jump to PHP5. There are compatibility issues even now, but maybe it's simply time.

Plan for 2007

New Year is a good time to look forward to the things we hope to achieve over the next year. So I thought I'd define now my main (technological) priorities for the year ahead so that I can get some sense of focus.

  1. Get up to speed on RDF and get using it in applications. I am not a total stranger to RDF but I've not used it at all so far. The main focus of my effort for now is a new project called Mauvespace. Mauvespace is an open-source web application that is a cross between a semantic CMS for personal homepages and a full social networking service. I don't want to hype it too much now though until there is something to show. But I hope very soon to roll up all of my homepage stuff from Mauveweb into Mauvespace, then throw it open to other people to use it for the same thing, either on my server or on their own. This frees up the mauveweb.co.uk domain, which could become a place for web projects. Sorry about all the 'Mauve's. I guess I'm not very imaginative with names. Although, it works as a brand, I suppose.
  2. Deploy some applications using Zope. My Python web applications are becoming increasingly Zope-like. The latest one I've been working on for a client is a self-contained web server, but that's partly because I wanted very careful handling of file uploads. I needed to remove file size and memory limits imposed by PHP, and I implement concurrent querying of the status of uploads, which allows me to provide AJAX progress bars. There are lots of parallels with Zope: that it's Python; that it's a web server; that any persistence is object-based (although in this application it's in-memory persistence; non-volatile data is retrieved from other network services mandated by the brief). Anyway, in 2007 I hope to transfer from ad-hoc Zope-like systems to Zope proper with all the advantages that brings. It's just a shame there have always been reasons not to so far. Unfortunately Mauvespace is PHP by necessity. PHP is the only language that enjoys widespread hosting support and I consider that vital.
  3. Hack Inkscape. Inkscape is of course hugely important to my work and as a result I've become quite involved with making sure it meets my needs, mainly through bug reporting, feature requesting, and so on. I would like to stretch my C++ legs and improve things, if I find time. Incidentally Inkscape 0.45 has been bug hunted and is moving to feature freeze very soon. The headline news is the Gaussian blur feature but there are a plethora of other improvements too.
  4. Continue the high standard of technical commentary on this blog :) Actually, I wish I could get it more organised and make it more accessible to people who aren't knowledgeable web developers. But it would be less personally useful to me if that were the case. So the status quo may have to suffice.

Cineworld Cinemas

Cineworld Cinemas' website has been revamped again recently. It was not all that long ago that it was last done, but it has frequently had performance problems, which leads me to believe that this is why it has been redone (more or less from scratch). I use our local Cineworld Cinema a lot. I saw 37 films there last year. This stuff matters a lot to me.

This makes it the third iteration in a row with severe accessibility and usability problems.

  1. The earliest website I saw was static, but ugly with a large spinning raytraced star. This was their branding style at the time. Although it had weekly film times, you could not book online. You had to phone a telephone number which had a horrific voice recognition system to book. Film times were displayed in one weekly timetable, by cinema.
  2. This was replaced by a much more contemporary website in their new branding style with AJAX drop-down menus for booking and searches for film times. This was clumsy and unintuitive; the menus looked exactly like tabs, and you were supposed to select one item from each tab/menu - cinema, film, showing, number of tickets - before moving on to book. The link I needed was a less prominent "What's On" at the top of the page to get showing times for the week ahead. However there was no way to bookmark the showing times for my local cinema, because it was a form POST. Most people are unlikely to want to search for their nearest cinema every time they are thinking of going! As I mentioned, this site ground to a halt regularly.
  3. The new one looks similar but works even worse. There are three somewhat cryptic tab/buttons called Cinemas, Films and Dates, plus a larger button saying "Find out what's on here and book now" which doesn't do anything. Cinemas takes you to a horrid Flash map to select region and cinema, but will then display showing times for today only: much less useful than a week's timetable. But it can now be bookmarked. Films lists all films that are showing at Cineworld Cinemas, but not necessarily at cinemas anywhere near me. Dates takes you, via the Flash region selection map, to a screen which lets you pick one date, one time, and one cinema to see which films are on. It then ignores the cinema you chose and displays film times for all cinemas in the "region" (19 cinemas covering the whole of the South of England). The page title, for the whole site, is "Cinematheque1". I can't operate this site on my smartphone, perhaps because it doesn't support the latest versions of Flash.

I just find it bizarre that their website should get so steadily worse, especially when Odeon was so strongly criticised for lack of accessibility.

The death of * HTML

As I've now started to look at how some of my sites work in IE7, I discover that the main thing that has gone wrong is that the hack where you prepend CSS selectors with * html is now disabled. Of course I could have found this out six months ago but frankly, learning the particular quirks of an as-yet-unreleased and sickeningly broken user agent is not something I am going to invest time in.

Obviously, this means that for any site that IE7 breaks, it is failing on the standards-compliant CSS. But equally, as I noted before, not as many sites break as was expected. So that's pretty good news.

But for the sites that do break, and future sites in general, the situation is a mess. It's now necessary to split IE7 styles into different, conditionally included files, either by selecting on UA (which is unreliable) or by using a gut-wrenchingly sickening IE misfeature called "conditional comments". IE6 rules need to be moved too, as IE7 will have significant common ground with IE6 and you don't want to have to maintain two copies of any rule that is shared.

Having rules split between different files makes it harder to work with, because in CSS you need to literally compare selectors to work out the precedence. * html is much easier to work with because you can place your IE rule right next to your real browser rule, and easily cross-reference the differences. Or if you change the real-browser rule, you can also amend the IE rule at the same time. More annoyingly, if you already have a tidy collection of CSS stylesheets importing one another with the @import statement, you have to either collect your IE styles in one file, or duplicate the tree for IE. Neither is very maintainable. The @import statement is not very useful anyway though, because CSS's precedence rules don't allow CSS to be modularised in an elegant way.

In Microsoft's defence, they are looking for a painless route to standards compliance that in all honesty does not exist. But I apportion the blame entirely to them, for two reasons. First, it was they who let the situation regarding standards compliance get so out of hand in the first place; in effect, they are five years too late in starting to take this course of action. Secondly, bundling IE with Windows is one of the most devastatingly damaging things they have ever done. The cost to businesses worldwide could run into hundreds of millions, if not billions of dollars, and this money is not even paying Microsoft shareholders: it's being flushed down the toilet. Web designers to some extent profit from it at the expense of other businesses, but even so, we would rather not have to do it because we would then simply be able to achieve more. We would probably charge more or less the same, but all sites would be stunningly beautiful with rich interactivity.

CSS has its own problems, of course, but these would have been very hard to predict all that time ago when it was drawn up:

  • Selector-based stylesheets cannot be refactored without reference to a schema for the source document.
  • CSS cannot deal with varying capabilities across implementations.
  • CSS cannot be modularised, because selectors can very easily collide and supersede each other, dependent on the way the selector is described rather than the structure of the module. For modularised HTML/CSS 'components', you really need to prevent styles being overridden unless explicitly requested.
  • CSS units cannot be specified as an arithmetic operation on unknowns, such as '(1em-3px)'. This means dimensions measured in ems (useful for text) are incompatible with lengths measured in pixels (useful for images).
  • CSS has insufficient control of vertical positioning. The basic operations available are "the vertical order is the same as the document order", "x is at this vertical position", and with a bit of creativity with floating blocks, "x follows all of these", plus one modifier, "all my descendants lay out relative to me".
  • CSS doesn't allow constants. This means that you have to repeat constants, say colour codes or border styles, and change them in more than one place.
  • CSS has some bizarre quirks. For example, it doesn't include padding in the width and height dimensions, so if you increase the padding, you have to decrease the width correspondingly. IE for years did the opposite, which is totally wrong but much more intuitive.
  • CSS selectors are lacking tidy disjunctions and assertions, things that exist nicely in XPath. It's possible to work around these, but not necessarily succinctly.
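The box-model quirk in the list above comes down to simple arithmetic, sketched here. The function names and pixel values are arbitrary illustrations, and borders are included alongside padding because both models treat them the same way as padding.

```javascript
// Rendered width of a box under the two models discussed above.
// All lengths are in pixels; the numbers used below are arbitrary.
function renderedWidthContentBox({ width, padding, border }) {
  // Standard CSS model: `width` is the content area only,
  // so padding and border on both sides are added on top.
  return width + 2 * padding + 2 * border;
}

function renderedWidthBorderBox({ width }) {
  // Old IE model: `width` already includes padding and border.
  return width;
}

// Width to declare under the standard model so the box renders at `target`.
function declaredWidthFor(target, { padding, border }) {
  return target - 2 * padding - 2 * border;
}

const box = { width: 200, padding: 10, border: 1 };
console.log(renderedWidthContentBox(box)); // 222
console.log(renderedWidthBorderBox(box));  // 200

// Doubling the padding means the declared width has to shrink to compensate.
console.log(declaredWidthFor(200, { padding: 20, border: 1 })); // 158
```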

IE7 scores a point over FF2

If anyone's keeping score, I've found my first snippet of code which IE7 handles flawlessly but FF2 does not.

It's code for creating multiple recipient email fields with Javascript, such that it creates as many as necessary and keeps the last field spare. Specifically this code is for deleting fields, which happens when you backspace or delete from a field which you've already emptied. This is modelled on how Thunderbird/Icedove's compose window behaves.

function autoCompleteContact(e)
{
    //normalise the event object and its target across IE and other browsers
    var event=(window.event)?window.event:e;
    var target=(window.event)?window.event.srcElement:e.target;

    //catch backspace or delete on an empty field as deleting the field
    if (target.value == '' && (event.keyCode == 8 || event.keyCode == 46))
    {
        //don't delete the last field if it's empty
        if (target == recips[recips.length -1])
            return;

        //find the container element for all recipient input fields
        var recips_container=document.getElementById('recipients');
        if (!recips_container) return;

        //Javascript doesn't seem to provide an array remove()
        //so do effectively newrecips=recips.remove(target) manually
        var newrecips=new Array;
        for (var i=0; i < recips.length; i++) {
            var r = recips[i];
            if (r != target) {
                newrecips.push(r);
            }
        }
        recips_container.removeChild(target);
        recips=newrecips;

        //Focus the last field
        recips[recips.length-1].focus();
    }


    ...

IE7 does it perfectly; FF2's layout breaks and it starts showing the contents of the INPUT elements in the wrong place.
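The manual copy loop in the snippet above could also be written as an in-place removal with `splice`. This is only a sketch: `removeItem` is a hypothetical helper name, and unlike the loop above it removes just the first matching element rather than every match.

```javascript
// Remove the first occurrence of `item` from `arr` in place.
// Returns true if something was removed, false otherwise.
function removeItem(arr, item) {
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] === item) {
      arr.splice(i, 1); // delete exactly one element at index i
      return true;
    }
  }
  return false;
}

// Stand-in values; in the page these would be the INPUT elements.
const fields = ["to1", "to2", "to3"];
const removed = removeItem(fields, "to2");
console.log(removed, fields);
```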

GnuCash Accounts

The past couple of days have been spent tidying up my accounts in GnuCash. It's great when it all comes together and your accounts reconcile perfectly with your statements.

I like GnuCash a lot actually. It's slightly harder to get your head around than just listing your accounts in a spreadsheet, but much more powerful when it's done. Because money always has to go from somewhere, to somewhere, you can view transactions from both ends immediately. So every time I pay for a domain name on card, I see the money transfer from my credit card, with the net cost going to the registrar, and the VAT value going to my VAT account and reducing my debt to the VAT man. And then I can turn it round and see the actual cost to me of the domains, or track my VAT debt.
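The double-entry idea can be sketched as a transaction made of splits that must sum to zero. This is not GnuCash's data model or file format; the account names and amounts are invented, and positive amounts stand for debits while negative amounts stand for credits.

```javascript
// A transaction is a list of splits; each split moves an amount into or
// out of one account. Amounts are in pence to avoid floating point.
// Account names and figures are invented for illustration.
const domainPurchase = {
  description: "Domain renewal",
  splits: [
    { account: "Liabilities:Credit Card", amount: -1175 }, // card debt grows
    { account: "Expenses:Domains", amount: 1000 },         // net cost
    { account: "Liabilities:VAT", amount: 175 },           // VAT debt shrinks
  ],
};

// Money always goes from somewhere to somewhere: the splits cancel out.
function isBalanced(tx) {
  return tx.splits.reduce((sum, s) => sum + s.amount, 0) === 0;
}

// Viewing the same transactions "from the other end": totals per account.
function balances(transactions) {
  const totals = {};
  for (const tx of transactions) {
    for (const { account, amount } of tx.splits) {
      totals[account] = (totals[account] || 0) + amount;
    }
  }
  return totals;
}

const totals = balances([domainPurchase]);
console.log(isBalanced(domainPurchase)); // true
console.log(totals["Expenses:Domains"]); // 1000
```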

The other neat thing is that accounts are nested, so for example, I can create an account for each client within Accounts Receivable, and see how much each client owes, plus clients' debts to me can be included within my assets. GnuCash's own customer invoice tools don't use subaccounts though, which makes them actually harder to work with than doing it manually, I find.

At first I found GnuCash kind of quirky, and I did struggle with it. But the new 2.0 series is better on the UI front (now a GTK2 app) and now I know what I'm doing with it, it's actually quite easy to get everything to work and incredibly useful when it does. It becomes quite frustrating that all the other accounting information I receive is in a simple flat transaction list, like a spreadsheet or a bank statement or some printed accounts. It's not wrong; there may be no other way to do it; but it's simply not so elegant and right.

All I need is some way to get the accounts data to my accountant.

I tried a few different ways:

  • Linux VM with GnuCash and accounts, burned to a CD along with VMware player. Couldn't get Ubuntu VM to fit on a CD; Debian and Damn Small Linux wouldn't install properly.
  • Converting to QIF with a Java tool. Tried importing this into Grisbi and it looked a mess.
  • Importing GnuCash directly into Grisbi (with the intention of exporting to QIF or CSV or something). Seemed to make a mess of it, not as much as the Java exporter, but the account balances were all wrong.
  • Transforming to Gnumeric sheet with an XSL stylesheet and sabcmd. No account balances, but these can be added quite easily within the spreadsheet app. Required me to install Gnumeric.

I sent the QIF and the spreadsheet (saved as XLS) to the accountant. Other ways that occurred to me:

  • Hand them an Ubuntu CD and my GnuCash files. This would require them to reboot into Ubuntu and GnuCash isn't even included on the CD anyway.
  • Hand them an Ubuntu CD, an empty VMware VM and my accounts, and let them install everything. Probably too technical and overkill.
  • Set up a VNC server that they can log into to access a copy of GnuCash. Security aside, I don't know what kind of connection they have. It could either be too slow for them or it could DoS my outbound connection.

Zope

You might have noticed that the title of this blog includes the name Zope, but that I have, thus far, not so much as mentioned Zope. Not to worry, I have been looking into Zope, but it is a big subject, and it's incredibly time consuming to just sit and read swathes of documentation to even get an inkling as to how to work with it.

I do have a couple of ideas for web applications which I would consider writing/rewriting in Zope pretty soon.

I have only looked into Zope 2 so far, because that's what's available in Debian Stable, which is what is currently running on my web server. I am starting to understand some of the things Zope does, but I'm not at this point particularly sure why it does them.

The truly wonderful thing about Zope is the ZODB, Zope's Object DBMS. This was what really attracted me to Zope in the first place, working as I have been with Python CGIs with persistence mainly provided by my session class, which just pickles objects to temporary storage.

ZODB allows you to avoid dodgy object-relational mappers and just store real Python objects. This sounds wonderful, but in practice it's nowhere near this simple and if you have any experience of concurrent programming you will know why. Relational databases these days provide ACID for you quickly and effectively, and they can do this because the operations to perform are passed to the DBMS for it to execute. When operations must be performed in application code, there can't be a way to transparently persist the objects without running into concurrency issues. I couldn't understand how the ZODB was transparently solving these concurrency issues, so I read up a bit further on that subject, and as I understand it, it isn't. You get transactional locking, but the granularity is very coarse, so you need to do your own concurrency programming if you need performance. Happily it looks like there are some utility classes which implement some abstract datatypes in a concurrent way.
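The underlying problem, read-modify-write done in application code, can be shown with a toy store. This is a generic sketch and makes no claim about ZODB's actual API or internals; the `Store` class and its method names are hypothetical.

```javascript
// A toy store holding one versioned record. Not ZODB's API.
class Store {
  constructor(value) {
    this.value = value;
    this.version = 0;
  }
  read() {
    return { value: this.value, version: this.version };
  }
  // Blind write: the last writer wins.
  write(value) {
    this.value = value;
    this.version++;
  }
  // Checked write: refuse if another commit happened since `version` was read.
  writeIfUnchanged(value, version) {
    if (version !== this.version) return false;
    this.write(value);
    return true;
  }
}

// Two clients read the same state, compute in application code, write back.
const blind = new Store(10);
const a = blind.read();
const b = blind.read();
blind.write(a.value + 1);
blind.write(b.value + 1);
console.log(blind.value); // 11, not 12: one increment was lost

// With conflict detection the second writer is refused and must retry.
const checked = new Store(10);
const c = checked.read();
const d = checked.read();
const firstOk = checked.writeIfUnchanged(c.value + 1, c.version);
const secondOk = checked.writeIfUnchanged(d.value + 1, d.version);
const retry = checked.read();
const retryOk = checked.writeIfUnchanged(retry.value + 1, retry.version);
console.log(firstOk, secondOk, retryOk, checked.value); // true false true 12
```

The retry is the "own concurrency programming" part: the store can detect the clash, but only the application knows how to redo its work.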

Zope provides 'products' which are object systems you can import into the ZODB hierarchy and combine to produce applications. To achieve anything useful I believe you must write your own products, which might manipulate other products (your own, built-in ones, or third-party) within the ZODB to achieve their task. For example, the folder object contains other objects and allows them to be accessed at a sub-URL.

Most of the examples involve DTML and ZPT, templating schemes which, although they could produce XML to plug into XSL if I wanted, just don't really fit the way I work.

There's also a Python Script object which might be useful as glue but is sandboxed quite severely.

At this point I must explain my main reservation with Zope 2. It's that the strategy of constructing object hierarchies by creating and configuring object instances in the ZMI (Zope's web-based administration interface) means that your web application is only stored within the ZODB. It depends on the contents of the database not only for data but code structure. Therefore none of your existing tools will work.

That means, for example, you can't use tools like Subversion to maintain a code repository or create a working copy. Perhaps it's possible to use it to maintain your Zope products, but not your entire web application. If you're already using Subversion, it's difficult for anyone to construct any kind of argument as to why you're better off not using it. ZODB's own versioning is not equivalent. In fact, almost anything you already have a tool to achieve, you must write a script to do in Python, or else do manually with the ZMI.

My other point of issue is that Zope's documentation describes in detail a principle called acquisition, which is really some magic with the Python object model to inherit functionality from other objects, a lot like subclassing, but not. The documentation has not, however, given me the slightest inkling as to what it's for.
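As a rough illustration of the mechanism only: acquisition looks an attribute up through the objects that contain an object, rather than through its class. The sketch below imitates that lookup in plain Javascript; the `container` property and the `acquire` function are hypothetical names, not Zope's API.

```javascript
// Look up `name` on `obj`, falling back to each enclosing container in turn.
// `container` is a made-up property linking an object to its parent.
function acquire(obj, name) {
  for (let o = obj; o; o = o.container) {
    if (Object.prototype.hasOwnProperty.call(o, name)) return o[name];
  }
  return undefined;
}

// A small containment hierarchy: site > folder > page.
const site = { container: null, stylesheet: "site.css", title: "Home" };
const folder = { container: site, title: "Blog" };
const page = { container: folder };

console.log(acquire(page, "title"));      // "Blog" (nearest container wins)
console.log(acquire(page, "stylesheet")); // "site.css" (found further up)
console.log(acquire(page, "missing"));    // undefined
```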

I've just been reading a few bits and pieces about Zope 3 which suggest that it might be a more straightforward step, providing the bits of Zope that I do want as libraries without forcing me down a road of total Zopiness, which would mean abandoning some of the tools that I have come to rely on.

Apache Batik

I installed Apache Batik for the first time last night. Batik is a Java library for SVG, particularly rasterising drawings. It did exactly what I needed (converting SVG graphs to PNG) without much fuss, either with installation or integration (I used the commandline rasteriser). I'd just not had a situation where I'd needed something other than Inkscape's commandline before.

I did have to install some nicer ttf fonts as there weren't any installed apart from Lucida in the JRE, but simply using aptitude to install an xfonts package worked.

It was a little slower than I had hoped, unfortunately, even for these very simple graphs, which means that working with this in web apps won't be quite as straightforward. It was much faster doing batches than individual graphs.

Anyway, server-side rasterisation is one obstacle to SVG adoption which happily proves relatively simple to overcome.