The HTML Standards, Part 1

Daniel Pope

I am an XML addict. XML has that simplicity and elegance that programmers crave. XML represents a flow of structured data between applications in a form that is an ideal blend of computer-readability and human readability, and that makes profound sense to a lot of people.

XHTML bottles that for web markup. HTML does not.

I have been using XHTML exclusively since before I started Mauve Internet. The transition was not hard because I had already been working for a long time in the kind of rigid mindset that XHTML mandates. My HTML was not tag soup, and this was instinctive, because I'm a perfectionist and not a pragmatist. There are actually advantages to this anyway; for example, it's possible to relocate explicitly closed <p> elements anywhere within an HTML document without changing their semantics. This is not possible with implicitly closed elements, because implicit closing is context-sensitive. Also, DOM scripting makes much more sense if the UA's DOM matches the apparent source structure.

Anyway, publishing XHTML requires these steps:

Change the DOCTYPE and add the XHTML namespace.
Get your markup to validate as XHTML. This is simple, because XML is simple.
Get rid of the inelegant commenting you've been using to hide styles and scripts from old browsers. This always made me queasy anyway. Link external scripts and stylesheets instead.
Negotiate on the HTTP Accept header (because not having a working website in IE is not usually acceptable). I prefer to procedurally convert XHTML to real HTML rather than use the XHTML compatibility provisions. This requires maybe 30 lines of code in Python but obviously adds a small overhead in extra processing.
Make any scripts XHTML/HTML agnostic. document.write(), the function that largely guarantees your pages won't degrade gracefully without JavaScript, must go. Where the browser provides it, document.createElementNS() should now replace document.createElement().
Make any styles XHTML/HTML agnostic. The big catch is the difference in body versus html element semantics.
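
The negotiation step above can be sketched roughly as follows. This is a minimal illustration, not the code I actually run: the function names (parse_accept, quality, choose_content_type) are hypothetical, and the policy is an assumption, namely that XHTML is served only when the client explicitly lists application/xhtml+xml and rates it at least as highly as text/html. A browser that sends only */* therefore gets plain HTML.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when the client gives no q parameter
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        entries.append((media_range, q))
    return entries


def quality(media_type, entries):
    """Return the q-value for media_type from the most specific matching range."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in entries:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_content_type(accept_header):
    """Choose XHTML only for clients that explicitly ask for it (assumed policy)."""
    if not accept_header:
        return HTML_TYPE
    entries = parse_accept(accept_header)
    explicit = any(r == XHTML_TYPE and q > 0 for r, q in entries)
    if explicit and quality(XHTML_TYPE, entries) >= quality(HTML_TYPE, entries):
        return XHTML_TYPE
    return HTML_TYPE
```

The returned value would go into the Content-Type response header, and the HTML branch is where the XHTML-to-HTML conversion would run. Because the response now depends on a request header, it is worth sending Vary: Accept as well, so that caches do not hand the XHTML version to a browser that cannot display it.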

Serving XHTML gains you XML elegance, an extensive suite of tools, embedding XML from other namespaces, embedding XHTML in other XML, custom extensions (useful for scripting), DOM libraries and easier processing, screen-scraping and so on. You lose very little. There are a few niggles involved in serving it, and Mozilla won't display it incrementally until Firefox 3.
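
As a small illustration of the "easier processing, screen-scraping" point, here is a sketch that pulls the links out of an XHTML page using nothing but the XML parser in Python's standard library. The sample document and the extract_links name are made up for the example, and it assumes the page is well-formed and uses no HTML named entities such as &nbsp;, which this parser does not resolve on its own.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample page, used only to demonstrate the idea.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p>See <a href="/about">about</a> and <a href="/contact">contact</a>.</p>
  </body>
</html>"""


def extract_links(xhtml_source):
    """Return (href, link text) pairs for every XHTML <a> element with an href."""
    root = ET.fromstring(xhtml_source)
    ns = {"x": XHTML_NS}  # the prefix is local to this query
    return [
        (a.get("href"), "".join(a.itertext()))
        for a in root.iterfind(".//x:a[@href]", ns)
    ]


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(href, text)
```

No tag-soup parser or error recovery is involved; a generic XML toolchain is enough, which is precisely what tag-soup HTML cannot offer.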

Other than that, and as I've already implied, XHTML codifies the best practice for web page design. Much that was inelegant and hard to maintain in HTML is banned or really inconvenient in XHTML, as a direct consequence of XML being rigidly elegant and hard to shoehorn sloppiness into. You should treat XHTML conformance not as conformance to a different markup language but to a best-practice, maintenance-friendly school of thought.

I briefly mentioned Internet Explorer's lack of support. Poor, dear old Internet Explorer, being a shit, as ever, like a bigoted, racist, unintelligible old man whom you'd rather not converse with any longer than you have to and who you secretly hope would just die. IE got left behind with XHTML, or rather it got left behind entirely for five years before it had cottoned onto XHTML. IE has its little crowd of web developers who prioritise it, treating IE's behaviour as the standard rather than... well, the standards. Similarly, there are those people who just don't know or don't care but use software which embeds the IE-powered MFC CHTMLView and therefore targets IE.

Obviously nobody would reasonably hold up the corpus of websites which aren't using XHTML yet, and that small collection of compatibility problems, as evidence that XHTML is dead, in the face of the overwhelming value to its users, would they?
