More on Django

As I come to the end of my first project using Django, I can offer a slightly better picture of how Django actually measures up in the field. In general I found that the bulk of the application was easier than I'd expected, but the remainder was more time-consuming than I'd expected.

As mentioned before, Django's weakest link is its rudimentary template language. Content and data have to be digested and spoon-fed to the templates because the templating language doesn't provide the tools required to manipulate data arbitrarily. This prevents the separation of presentation and controller code that is the purpose of a templating language. Instead, each template has a one-to-one dependence on the code which sets up the model. On the plus side, it's very easy to track where the thread of execution goes in these templates. A potentially dangerous issue is that the templating system defaults to not escaping data variables, which makes it effectively XSS-unaware. Inverting the semantics, so as to escape everything unless told otherwise (to insert XML/SGML markup, say), would fix this.
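To illustrate the inverted semantics suggested above, here is a minimal sketch in plain Python rather than in Django's template engine. The names Markup and render are hypothetical, invented for this example: every substituted value is escaped unless the caller has explicitly wrapped it to say it is already markup.

```python
import html


class Markup(str):
    """Marks a string as trusted markup that must not be escaped again."""


def render(template, **context):
    """Fill {name} placeholders, escaping every value unless it is Markup."""
    safe = {
        name: value if isinstance(value, Markup) else html.escape(str(value))
        for name, value in context.items()
    }
    return template.format(**safe)


# Untrusted input is escaped by default.
print(render("<p>{comment}</p>", comment="<script>alert(1)</script>"))
# Markup is inserted verbatim, but only because the caller asked for it.
print(render("<p>{comment}</p>", comment=Markup("<em>fine</em>")))
```

The point of the design is that forgetting to think about escaping produces ugly output rather than an XSS hole.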

The philosophy given by the Django documentation for the design of its templating is that

We wanted Django’s template language to be usable for more than just XML/HTML templates. At World Online, we use it for e-mails, JavaScript and CSV. You can use the template language for any text-based format.

This "one size fits all" approach is self-evidently misguided. XML and JSON need a guarantee of structural validity but they rarely need re-styling anyway. (X)HTML needs escaping. Javascript (other than JSON) should be static anyway. CSV barely needs an API, let alone a templating language. Emails... well, OK, this is a fine templating language for text-only emails.

I also found my Django application hard to structure. Views (functions called by the Django context in response to different request URIs) shouldn't contain application code. Following that philosophy, my code is split mainly into four classes in two Django "applications", a handful of utility functions in the ORM models, and everything else is wrapped up in the 20 or so views. They don't do more than delegation and template context setup, but there be semantics in them there views, and those semantics aren't codified in a form that can be nested and built into more advanced and interdependent web applications. For larger applications, Django will need code to modify and extend the framework, but there isn't a good API to plug in this code. It's a mixture of adding the class names to the config file and giving modules special names within the package.

These issues tend to give me the general sense that Django code is hard to tie together. Overall, to build an application as elegantly structured as my current web shop code would require a sizable layer on top of the Django framework, with the occasional aspect of Django rewritten too. But then, my current shop code has already borrowed some of the best aspects of programming with Django as well as a handful of other frameworks too ;)

Time

Look, an actual website which uses Swatch Internet Time! If you haven't heard of Swatch Internet Time, it was Swatch's bizarre marketing ploy from 1998 to unify time on the Internet by promoting a time system which was equally baffling to everyone the world over. With an @-sign so that everyone knows it's all Internet-y. A sensible approach to i18n for time is included in the HTML5/Web Apps 1.0 draft. I'll talk more about this spec soon.

In other news, my desktop box has a dmesg entry stating that it inserted a leap second last night. Leap seconds are the extra seconds that get wedged in at the end of the occasional 30th of June or 31st of December to correct UTC for the gradual slowing of the Earth's rotation. However, there was no leap second scheduled for last night. I have investigated a little bit but not deduced the cause. Apparently leap seconds are configured by ntpd using the kernel linux/timex.h API. NTP servers pass out announcements about leap seconds. Either my kernel or ntpd has its knickers in a twist, or a low-stratum NTP server I've trusted has erroneously issued a leap second. Obviously this is pretty much immaterial but for some reason it really frustrates me.

PayPal Encrypted Web Payments

I've spent this afternoon fighting with PayPal's developer sandbox to make encrypted web payments work. This is the system whereby the details of an order are transferred as a form field encrypted with public key cryptography, but I wouldn't expect anyone to know that because PayPal has a wide glossary of internal terminology that is almost impenetrable to the novice.

I have never done a particularly large amount with PayPal. It's come up occasionally but I've always done the hastiest job possible, perhaps pasting some code from PaypalTech.com or something. This time I've been working in Python with Django so I've had to develop everything from scratch with the M2Crypto OpenSSL wrapper. PayPal's developer documentation lacks sufficient detail for a clean-room implementation (though there are numerous examples to be found, none corresponded exactly to M2Crypto/Python). Furthermore, PayPal does not give useful error messages, which can make it extremely troublesome to integrate with.

This is how I made EWP work.

  1. PayPal expects data to arrive as a set of key-value pairs, which you should already have/know about. The documentation for these is extensive.
  2. Make the payment system work unencrypted, using input fields with the corresponding names and values to the key-value pairs. This ensures that the information PayPal needs to receive is correct before you start faffing with encrypting that data. If you do not have a PayPal account, you can sign up for a "sandbox" account, then register as many scratch accounts as you like. Note that PayPal does not email you the confirmation emails for sandbox accounts; they appear in the sandbox interface under the "Emails" tab.
  3. Generate an RSA keypair in PEM format with openssl genrsa. If you view this file, it is titled "PRIVATE KEY", but it contains both the private and public keys. The exact command-line arguments for this command are documented in the PayPal Website Payments Standard Integration Guide.
  4. Generate a self-signed SSL certificate from your keypair with openssl req. An SSL certificate contains your public key, some details about the owner, and one or more cryptographic signatures. Again, this is well documented.
  5. Exchange certificates with PayPal by logging in and visiting "Encrypted Web Payments" under "Profile". You save PayPal's certificate to a file, and upload your own. PayPal assigns a certificate ID to your certificate, which you must now add to your key-value pairs under the key cert_id. It displays this certificate ID in the table and will also email you a copy. Recall, however, that PayPal's sandbox development server does not actually email you; "emails" are stored and are available on the web interface under the "Emails" tab.
  6. Generate the plaintext for the signing by encoding the key-value pairs as key=value, separated only by linefeed (\n, ASCII 0x0a) characters. CR-LF does not work.
  7. Sign the plaintext using S/MIME. This requires both your private key (for the cryptography) and your certificate (to identify whose signature it is). Use these options:
    • Use binary input mode, which prevents OpenSSL munging its input.
    • Encode the data in opaque form. This implies that the text to be signed is encoded along with the signature, as opposed to detached, which doesn't encode the plaintext.
    • Output the resulting PKCS7 structure in DER (not PEM, nor S/MIME) format. This is a binary format.

  8. Encrypt the resulting DER using PayPal's certificate (i.e. public key):
    • Again, use binary input mode
    • Use the 3-key triple-DES, CBC mode block cipher. OpenSSL calls this des-ede3-cbc or just des3.
    • Output the resulting PKCS7 structure in PEM format, which is a base64-encoded format.

  9. Insert the PEM blob into a form field named encrypted. There must also be a hidden form field named cmd, with the value _s-xclick.
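Step 6 is the easiest one to get subtly wrong, so here is a minimal sketch of it using only the standard library. The function name ewp_plaintext is made up for illustration, and of the keys shown only cert_id comes from the steps above; business, item_name and amount and all the values are placeholders. Rejecting values that contain line breaks and encoding as UTF-8 are my own assumptions, not something taken from PayPal's documentation. The signing and encryption of steps 7 and 8 are deliberately left out.

```python
def ewp_plaintext(fields):
    """Encode key-value pairs as key=value lines joined by bare linefeeds."""
    for key, value in fields.items():
        # Assumption: a line break inside a value would corrupt the format.
        if "\n" in str(value) or "\r" in str(value):
            raise ValueError("field %r contains a line break" % key)
    # Join with "\n" only; never "\r\n".
    return "\n".join("%s=%s" % (key, value) for key, value in fields.items())


fields = {
    "cert_id": "EXAMPLECERTID",         # placeholder for the ID assigned in step 5
    "business": "seller@example.com",   # placeholder values from here on
    "item_name": "Example item",
    "amount": "9.99",
}

# Bytes ready to be handed to the signing step (UTF-8 is an assumption).
plaintext = ewp_plaintext(fields).encode("utf-8")
print(plaintext)
```

Building the plaintext in one place like this makes it easy to check that no stray carriage returns have crept in before the data reaches the signing step.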

PHP Superclass constructor

I have just found this expression in some PHP code I wrote around two years ago:

// Call the constructor of the parent class

$this->{get_parent_class(__CLASS__)}();

I've never seen this syntax before, and I couldn't find any reference to it in the PHP documentation. The closest I found was a brief mention that curly brackets could be used to resolve ambiguity in expressions like ${$a}[0]. I have probably pasted it from somewhere after taking a dislike to some of the other ways of calling the superclass constructor (which must be done explicitly in PHP).

This looks like it might be useful in a variety of situations, except that I'm reluctant to use what is, AFAICT, undocumented syntax.

2012 Olympic Logo

I'm watching the London 2012 logo fiasco with interest because it's very rare for the public to take an interest in graphic design in this way. The criticism of it has covered almost every aspect, and there are remarkably few people who actually like it. This logo represents £10 billion of investment so it's crucial that they get it right. On that basis, £400,000 isn't unreasonable.

If we are talking about a budget of £400,000 for just the branding (and I believe that figure covers the production costs for the entire marketing campaign), we're in a very different league to the kind of ad-hoc logo design I usually deal with. Normally with logo design, I come up with a few ideas, as different as possible, based on what I perceive the brandee's identity to be, and there's usually one or two in there that are decent enough for the client to want to run with.

Trusted with a budget as large as this, and forced to provide some measure of accountability rather than just using Inkscape's random polygon tool and stuffing the cash into my pockets, I would probably conceive of a procedure like:

  1. Write down design criteria that the marketing campaign must meet, at both a technical and an aesthetic level.
  2. Produce a whole load of logos that meet the formalised criteria.
  3. Allow LOCOG to narrow it down to a few candidate logos.
  4. Pitch each campaign and logo to a separate focus group comprising a proportion of foreign nationals, Brits and Londoners, to judge public response to each. At this stage, you can not only ask whether they like it, but actually collect feedback on how it can be improved.
  5. Repeat steps 3 and 4, unless the response is so poor that you have to return to step 2.

I cannot imagine that this logo has come through any such process. Focus groups are cheap and they can prevent mistakes which cost millions! I can conceive of how the graphic designer might submit this to LOCOG, but not how it could have been selected as the final logo unless the alternatives were truly dreadful. Even then, that does not constitute endorsement, and focus group responses would have reflected that.

I would envisage that design criteria for any Olympics logo would be along these lines:

  • MUST incorporate the Olympic rings device unaltered and preferably in full colour.
  • MUST incorporate the name of the host city in legible Roman script, and optionally local script.
  • MUST incorporate the year 2012 in legible Arabic or Roman numerals.
  • MUST NOT incorporate other text.
  • MAY convey a mild national theme or style, contemporary if possible.
  • SHOULD convey athletic achievement and/or Olympic tradition.
  • SHOULD remain identifiable as the Olympic logo regardless of treatment, orientation and low-fidelity reproduction.
  • MUST NOT exhibit any image likely to cause offense to any group, particularly with a view to avoiding cross-cultural faux pas.
  • SHOULD NOT exhibit anachronism.
  • SHOULD be distinctive, worldwide.

Do not construct URLs with concatenation

I'm working on an installation of the Joomla! CMS where none of the links are working correctly. Joomla! is very sloppy with URLs. The uploads directory appears to be called images/stories but a quick grep shows that that exact string is referenced 146 times in the Joomla! installation. That's in the source code, not the database. Most of those times it is being concatenated into strings to make URLs.

I've just spent three hours working out that I have no idea what Joomla!, or the XHTMLSuite editor the client has chosen to use, is doing, and that I don't give a damn, because whatever they are doing, they are wrong.

The correct way to construct a URL from a filename is not concatenation. Do not do this. It does not work properly. So to avoid any confusion let me state categorically how URLs are supposed to work.

Resolving relative URLs is the only situation where a web browser tries to interpret the path of a URL. For this purpose, the URLs http://hostname/directory and http://hostname/directory/ are not the same. The latter form is correct. The former works because Apache works out that this is a directory and issues an HTTP redirect to "canonicalise" it. Never hard-code a URL for a "directory" which does not contain a trailing slash. If it isn't hard-coded, make sure that the application appends a trailing slash if none exists.
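Python's standard urllib.parse.urljoin applies the same resolution rules as a browser, so it makes the difference easy to see. The hostname and the filename page.html are placeholders.

```python
from urllib.parse import urljoin

# Without the trailing slash, "directory" is treated as a file name and replaced.
without_slash = urljoin("http://hostname/directory", "page.html")

# With the trailing slash, the relative URL is resolved inside the directory.
with_slash = urljoin("http://hostname/directory/", "page.html")

print(without_slash)  # http://hostname/page.html
print(with_slash)     # http://hostname/directory/page.html
```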

There are two operations which you then need to define to be able to construct URLs:

  • Given an absolute base URL A, and an absolute or relative URL B, compute a new URL B` which is an absolute representation of B in the context of A.
  • Given an absolute or relative URL, append a query-string parameter.

The first operation is not concatenation. Learn this.

In notation, let A ~ B = B`

So say you want a URL for a specific uploaded image. Start with a base URL for your site.

http://mysite/

We then have a relative url of our image directory from the base url.

images/stories/

Then http://mysite/ ~ images/stories/ = http://mysite/images/stories/

We have a filename of our image, "Uploaded Image.jpg". First, we need to make that a relative URL. This requires URL encoding:

Uploaded%20Image.jpg

Then http://mysite/images/stories/ ~ Uploaded%20Image.jpg = http://mysite/images/stories/Uploaded%20Image.jpg

At this point we have a working URL. I know, it looks like all we've done is concatenation, and that's why people appear to make this mistake time and time and time again. But it isn't concatenation. What if our base URL was http://mysite/CMS/ and our images URL was /uploads/ ? Or what if our images URL is http://uploads.mysite/?

More than that, using this operation doesn't let people go wrong. It discourages them from just wedging a / in there in the hope that it will make their URLs work, and prevents ambiguity about whether a piece of code works in all situations or just the way they've got it configured.
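As a sketch of the two operations in Python, using only the standard urllib.parse module: resolve is the ~ operation, and append_param adds a query-string parameter. Both function names, and the size=thumb parameter, are hypothetical and chosen just for this example.

```python
from urllib.parse import (
    parse_qsl, quote, urlencode, urljoin, urlsplit, urlunsplit,
)


def resolve(base, url):
    """The A ~ B operation: resolve url in the context of the absolute base."""
    return urljoin(base, url)


def append_param(url, key, value):
    """Append one query-string parameter, keeping any that already exist."""
    parts = urlsplit(url)
    # Existing parameters are decoded here and re-encoded by urlencode below.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


# The worked example from the text; quote() does the URL encoding.
images = resolve("http://mysite/", "images/stories/")
picture = resolve(images, quote("Uploaded Image.jpg"))
print(picture)  # http://mysite/images/stories/Uploaded%20Image.jpg

# The cases where concatenation goes wrong:
print(resolve("http://mysite/CMS/", "/uploads/"))  # http://mysite/uploads/
print(resolve("http://mysite/CMS/", "http://uploads.mysite/"))  # http://uploads.mysite/

print(append_param(picture, "size", "thumb"))
```

Most languages ship an equivalent of urljoin, so there is rarely a reason to write the resolution logic by hand.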

Unit Testing

I am missing a way to write unit tests for web applications. I found a few options online, but they aren't really along the lines of what I'm looking for. I want to be able to describe unit and regression tests with respect to expected or unexpected DOM fragments, make requests, fill and post forms, check the results and run the whole test suite automatically as a cron job or before committing. I want the whole test suite to be described in XML so that writing web-level tests doesn't require programming, and so that it can go into Subversion along with the project code.

I think I will have to write this myself. In fact it's something I've been really wanting for years. But I'm way too busy at the moment to do it.

Django

For the past couple of weeks I've been developing a web application for a client in the Django framework. I looked at frameworks before I started, wondering what I could use to make the development of this project easier.

I have previously mentioned why I abhor PHP even as I developed Mauvespace in that language, but as I started this project PHP seemed utterly unconscionable. PHP doesn't offer anything apart from a dodgy syntax, incomplete object model and an API consisting of a basic CGI wrapper and bindings for a few library functions. Alternatives included my Python framework, but that is a long way from being ready for the mainstream. In fact it's been all but abandoned. I originally wrote it to serve static sites generated from XSL and XML, but then found myself bolting in little bits of CGI scripts and eventually writing a whole ORM.

I had also been looking at Zope, and particularly Zope 3, but Zope has a ridiculous learning curve. It does seem appropriate for more large-scale, extensible development though. It's just that most web application projects for SMEs don't need to be so enterprise that they require unit tests for everything and interfaces saying what each class will do as well as an implementation. I am looking into Zope but I don't expect to find a project that it will be suitable for very often.

So Django then. I'd seen Django before and discounted it because of its templating language, which was not what I wanted. I wanted to use XSL, obviously. XSL and XML can together be handed to browsers. This makes writing AJAX applications much easier: no special handlers are required to be able to pass XML data to the client. With the same requests, Javascript can create whatever REST model it likes.

But regardless, it looked more appropriate than anything else at this point.

So Django then. I was really impressed with Django initially. However, while I managed to write about half of the application in the first two hours of using it, the rest has drawn out and out.

Django has an excellent ORM. However, it makes the mistake of trying to encapsulate all of SQL in the Python Object Model, which is a kind of antipattern in itself. This happens all the time with ORMs, because while it's easy to make database rows appear to be objects, SQL is designed to do clever things that you don't really want to re-implement in another layer just because you can abstract it. Django's ORM allows users to write code like Game.objects.get(id=the_game) to retrieve a single game. That's lovely. Then it allows you to write code like Game.objects.filter(category=this_category), and it lazily evaluates the query, even allowing you to chain these filters lazily. Which is fantastic.

Then you hit queries like finding objects with date ranges. In SQL this is easy.

SELECT * FROM games WHERE start <= NOW()

In Django this is munged into a horrific double-underscore variable name:


import datetime

# start__lte means "start <= value"
now = datetime.datetime.now()
Game.objects.filter(start__lte=now)

Two or more sets of double underscores start performing joins, I believe.

So Django's ORM is a bit unwieldy, but this should come as no surprise because I think many ORMs are like this. One that I have seen that avoids this is Joomla's, where you just hand off an SQL query to the ORM and it just populates objects from the results. Trying to construct complex queries in an object-oriented way is likely to be harder and less maintainable than just writing the SQL query, and so I think this is the best solution in general.
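To show what I mean by handing off an SQL query and getting objects back, here is a minimal sketch using Python's built-in sqlite3 module. This is not Joomla's API or Django's: fetch_objects and the Game dataclass are hypothetical, the table is created in memory just for the demonstration, and SQLite's datetime('now') stands in for NOW().

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Game:
    id: int
    name: str
    start: str


def fetch_objects(conn, cls, sql, params=()):
    """Run a hand-written query and build one cls instance per result row."""
    cursor = conn.execute(sql, params)
    columns = [col[0] for col in cursor.description]
    return [cls(**dict(zip(columns, row))) for row in cursor]


# Throwaway in-memory database with one past and one future game.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (id INTEGER PRIMARY KEY, name TEXT, start TEXT)")
conn.executemany(
    "INSERT INTO games (name, start) VALUES (?, ?)",
    [("Past game", "2000-01-01 00:00:00"), ("Future game", "2999-01-01 00:00:00")],
)

started = fetch_objects(
    conn, Game, "SELECT id, name, start FROM games WHERE start <= datetime('now')"
)
print([game.name for game in started])  # ['Past game']
```

The query stays as plain SQL, and the only mapping logic is matching column names to constructor arguments.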

Django's ORM libraries handily integrate with other bits of the system, such as administration screens, meaning almost no work has to be done to make your database rows entirely editable, with configurable permissions, from the admin screen. This is a big win for Django. There's no way you could even contemplate this with, say, Zope, because the ZODB stores arbitrary objects and an admin screen would have to deal with, basically, all of Python.

The annoyance I've found later on in developing with Django is that it's not intuitive. There's a vast wealth of features, and you have to keep looking at the documentation to find them all. There is a lot of magic involved. This is Python's fault, really. Magic is easy to do in Python, and a lot of developers won't have any issues with jumping in and creating magic for every situation. This means that there is much less code to be written for performing many tasks, but magic has drawbacks. Developers have to understand what the magic is doing for them. In languages where magic is hard, developers can read the code and know, with very little context, exactly what it will do and why. With magic, the object model and even elements of the language can be warped. I'd agree that magic is good for an ORM: it should mean that you can get an object that magically persists. You don't need to know what's going on to make it persist. However, Django goes way too far with the magic. Admin screens are driven by magic variables. The URL configuration is slightly magical. I think permissions are almost entirely magical but I haven't got into that yet.

Then there's the templating language. There is a much-vaunted feature called template inheritance (according to the documentation, "the real power of Django templates"). In practice this doesn't do anything more than allow you to stuff some chunk of content into another template. The only situation in which this appears to be useful is stuffing the main content of your page into a layout. Woo. It doesn't allow you to do really useful things, like calling templates that create generic page elements with arbitrary content. At least, not without creating an inheritance tree of dozens of little template element files in a manner that is likely to be quite unmaintainable. Templates can read Python variables and attributes of Python objects that are passed to them. They can apply "filters" which are written in the core Python code. They can't do anything else, not even compute simple expressions. This is annoying; few other templating languages are so restricted.

In future it's likely that I'll write a Django middleware to do XSL transformation. Then I won't need Django's templating. This doesn't sound like it will be too hard.

I'm currently trying to solve a problem with User authentication. I have a function, tickets_remaining() which I wanted to add to the User authentication class. But I can't do this. I need to create a separate Profile model and retrieve this instead.

So Django: on the whole, I'm impressed with it. The design is easy, extensible, and fast to get going with, but there are some caveats with the API and particularly the templating.

Payment Gateways

Why are payment gateways such a pain to integrate with? There are only two real models:

  1. Merchant site directs the user to the payment gateway site with details of what they are paying for. Confirmation of the transaction is POSTed back.
  2. Merchant site collects the data on a secure server, and requests payment via RPC.

So why does every payment gateway have a myriad slight variations? This means developers have to write adapters for each payment gateway, running the risk of introducing various security vulnerabilities in each. But not just that, error handling is at best muddy. I've used systems with non-existent error handling. I've even written systems without error handling (the trick is in the wording; you have to avoid saying "Thank you for your payment" and instead say "You will receive confirmation of your payment by email." This architecture is not my design, I hasten to add).

Strangely, these variations on a theme seem to have a single meme for the documentation: to split it between at least two PDFs. These PDFs are usually along the lines of "Integration Notes" and "Advanced Integration Notes", although if both of the above models are supported there could be one or two more. I haven't the faintest idea why these people think that PDF documentation is better than HTML, which is what the rest of the world uses.

The payment gateways should put their heads together and come up with a standard. Two protocols. So that with a single library, you would be able to use whatever payment gateway you want, without having to maintain dozens of adapters. This could also take the time to remove paranoid "security" checks, like verifying HTTP Referer headers (which is nonsense), and they could codify how to make Payment Gateway pages look less rubbish (because they always do).

MySpace Errors

Ha! For as long as I've had an account, MySpace has been plagued with messages saying "Sorry, an unexpected error has occurred". This happens quite a lot, probably every couple of minutes. Obviously, MySpace has unique load problems, but I'd be worried that any application the size of MySpace written in ColdFusion, as opposed to plain Java Servlets, would just fall apart.

But now, the MySpace administrators have come up with a really clever solution. It appears they've changed the error message. It now reads "This user's profile is down for routine maintenance". Not an error at all!

It's easy to tell this is a lie, because the errors appear when you aren't viewing a profile, like when checking your mailbox or viewing bulletins: in fact, at the same frequency as the old one used to appear. Even if you buy that it's relevant somehow, profiles going down for maintenance every couple of minutes sounds equally incompetent.