Sir Tim Inaugural Lecture

Just watching the live video feed of Prof. Sir Tim Berners-Lee's inaugural lecture in the Electronics and Computer Science department at Southampton Uni. I can't see the slides, which is a nuisance. I thought I'd type up a few notes as I listen.

He started off talking about engineering versus analysis of network systems. And creativity, which is part of engineering.

Amusingly, he is trying to talk about Web 2.0 sites but without mentioning the actual term "Web 2.0".

He made a big point about macroscopic social elements (the web community) deriving from microscopic ones (URI schemes and HTTP and HTML and stuff and junk). (This is exactly the point I make when trying to explain where TBL fits into the history of the web: TBL is not responsible for the massive cultural system built on top of the web. It's mere chance that his distributed hypermedia system took root. A lot of people can't distinguish the utility of the web now from the seed protocols (not even ideas, as such, which were already established) that TBL gave us.)

He mentioned something about email and how it's abused.

The web - what it was intended to do and the primary concepts that drive it. Layering technologies on top of one another. Wow. Abstraction.

The web is an information space. A mapping between a URI and some information.

PageRank. Google. Deriving macroscopic web usage models from something very simple like the number of links. Audio went a bit rubbish for a while but it's back now.

Wiki. How microscopic behaviour like collaborative editing grows into macroscopic systems like Wikipedia. This will revolutionise democracy and politics.

Blogs. Woo. The Blogosphere. May be rubbish. Who knows. Probably both rubbish and excellent at the same time.

Information in HTML format is not machine-manipulable, so we need a semantic web to re-use data as data. RDF, OWL, SPARQL. Use URIs for things rather than web pages. And the relationships between overhead projectors and colours. Merge and query is very easy. FOAF networks. (Yay! I know all about those.)
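To see why "merge and query" is so easy, here is a toy sketch in JavaScript. It is not RAP or SPARQL: triples are plain string arrays, prefixed names like foaf:knows are just labels, and merge, match and query are hypothetical helpers. Merging two graphs is a set union, and a query is a list of triple patterns whose ?variables must bind consistently.

```javascript
// Toy in-memory triple store: each triple is [subject, predicate, object].
function key(triple) {
  return JSON.stringify(triple);
}

// Merging graphs is just a set union; identical triples collapse into one.
function merge(...graphs) {
  const seen = new Map();
  for (const graph of graphs) {
    for (const triple of graph) seen.set(key(triple), triple);
  }
  return [...seen.values()];
}

// Match one pattern against the graph, extending the given variable bindings.
function match(graph, pattern, bindings = {}) {
  const results = [];
  for (const triple of graph) {
    const b = { ...bindings };
    let ok = true;
    for (let i = 0; i < 3 && ok; i++) {
      const term = pattern[i];
      if (term.startsWith('?')) {
        if (term in b && b[term] !== triple[i]) ok = false; // conflicting binding
        else b[term] = triple[i];
      } else if (term !== triple[i]) {
        ok = false; // constant term does not match
      }
    }
    if (ok) results.push(b);
  }
  return results;
}

// A basic graph pattern: join the patterns left to right.
function query(graph, patterns) {
  let solutions = [{}];
  for (const pattern of patterns) {
    solutions = solutions.flatMap((b) => match(graph, pattern, b));
  }
  return solutions;
}
```

Merging my FOAF file with a friend's and asking `query(g, [['me', 'foaf:knows', '?f'], ['?f', 'foaf:name', '?n']])` finds friends' names even though the two facts came from different documents.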

Some websites are tables, some are trees, some are "hypercubes". (He keeps calling tables and matrices "rectangles". That strikes me as such a cute web-kiddy thing to do, labelling arrays as "Square, daddio" while graphs are new and "cool".)

Something to do with trees and top-down OOP. (shrug)

What shape is the Internet? It's a net. (It's not. It's a fluffy cumulus cloud. Every first-year computer science student knows that.) It's robust.

The web is a web. What shape is that? What does that mean? It should be shaped like the world.

Common vocabularies for describing things with RDF. You get local collaboration to produce specific ontologies and you use some terms from global ontologies. Spatial things can be used in lots of applications. Overlapping ontologies.

The web is actually fractal. Structure at all different levels.

Much less work is done in describing ontologies than using them.

Web Science includes:

  • User interface for the web. SemWeb doesn't have this.
  • Building resilient systems: against slashdotting and attack, at an architectural level.
  • New devices - handheld and large screens.
  • Creativity. Connecting people and making them more effective. Allowing them to understand one another; letting half-formed ideas in two different people's heads on different sides of the planet connect.

Right, done.

It was a whistlestop tour of web science, I suppose, but I didn't feel that it was particularly insightful. Of course, I'm not in the business of rationalising the way that the web works; I just program. I think TBL has to try to rationalise it because that's what he's famous for; at a personal level he probably feels people look to him to explain the ways of the beast. But of course he didn't create it. Mainly people just create web apps and they either catch on or not, or they need a bit of pointing to make them work the way people want. With a lot of Web 2.0 sites, it takes a huge amount of development to get to a web app that works well enough and scales, and then creative ideas can be tried out piece by piece, beta tested and deployed.

This is exactly how the web started and evolved, and I don't think I understand how we got to where we are now any better than I did before. I don't think it's possible to, either; the web evolves in parallel across the globe. It doesn't have a single history behind it or a single motivation driving it.

There is a podcast available.

Mauvespace 0.1.0

Yesterday evening I finally managed to release Mauvespace. You can read more, download it or sign up at mauvespace.com.

Version 0.1.0 is a kind of halfway house to a full social network. It's got a blog, user details, photos and a templating language, but it can't syndicate any of the information that it exports.

I'm eyeing up Magpie as the parser behind blog syndication, and RAP can already parse RDF/XML so that's a pretty good start. The main issues are in finding profiles to syndicate, importing them into the database and making sure that it's all updated properly.

I keep thinking of new mashups that the Mauvespace model allows. In fact it's a bit rich to even call them mashups. Mashups are usually defined as third-party scripts that combine and relate data from various large online databases to display interesting or useful things. With Mauvespace everything is a kind of mashup. Its data is (well, will be) drawn from a distributed semantic web and the templating language makes no distinction between local data and syndicated data.
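One way to handle the "updated properly" problem, sketched under assumptions: keep each profile's triples in its own named graph keyed by the URL it came from, and replace that whole graph on every re-fetch, so statements a remote user has deleted also vanish locally. SyndicatedStore and its methods are hypothetical, not Mauvespace or RAP APIs. Local data could live in the same store under its own URL, which is what lets allTriples() hand templates one undifferentiated graph.

```javascript
// Hypothetical store: one named graph per source URL.
class SyndicatedStore {
  constructor() {
    this.graphs = new Map(); // source URL -> { triples, fetchedAt }
  }

  // Re-importing a source replaces all of its triples wholesale.
  importProfile(sourceUrl, triples, fetchedAt = Date.now()) {
    this.graphs.set(sourceUrl, { triples: [...triples], fetchedAt });
  }

  // Sources not refreshed within maxAgeMs: candidates for re-fetching.
  stale(now, maxAgeMs) {
    return [...this.graphs]
      .filter(([, graph]) => now - graph.fetchedAt > maxAgeMs)
      .map(([url]) => url);
  }

  // Every triple, local or syndicated, as one graph for the templates.
  allTriples() {
    return [...this.graphs.values()].flatMap((graph) => graph.triples);
  }
}
```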

My next task is to do some publicising. I'm also going to do a couple of more varied themes for 0.1, I think, before I start doing anything involved for 0.2.

IE DOM Tables

More outstanding issues are biting me with Internet Explorer 7, specifically these two well-known issues:

IE doesn't render table row elements <tr> unless they are added to an explicit <tbody> element. (The HTML and XHTML DTDs allow <tbody> to be optional/implicit.)
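A minimal sketch of the usual workaround, assuming rows are built with createElement: find or create the table's tbody and append rows to that rather than to the table itself. The helper name appendRow is hypothetical, and it sticks to old-style JavaScript (var, plain functions) so it would run in IE 6/7 too.

```javascript
// Append a new <tr> to the table's first <tbody>, creating one if needed.
// Pass `document` as doc in a real page.
function appendRow(table, doc) {
  var tbody = table.tBodies[0];
  if (!tbody) {
    tbody = doc.createElement('tbody');
    table.appendChild(tbody);
  }
  var row = doc.createElement('tr');
  tbody.appendChild(row);
  return row;
}
```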

IE doesn't accept XHTML attribute names as XML DOM setAttribute() keys, instead requiring the HTML DOM HTMLElement property names (className rather than class, for example). (The HTML DOM is defined as subclassing the XML DOM without overriding these methods.)
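A common workaround is to route the problem attributes through their HTML DOM property names and use setAttribute() for everything else. This is a sketch: setAttr is a hypothetical helper, the mapping covers only a few common attributes rather than an exhaustive list, and the style branch assumes you want to replace the whole inline style.

```javascript
// XHTML attribute name -> HTML DOM property name (not exhaustive).
var PROPERTY_NAMES = {
  'class': 'className',
  'for': 'htmlFor',
  'colspan': 'colSpan',
  'rowspan': 'rowSpan',
  'maxlength': 'maxLength',
  'readonly': 'readOnly',
  'tabindex': 'tabIndex',
  'cellpadding': 'cellPadding',
  'cellspacing': 'cellSpacing',
  'accesskey': 'accessKey'
};

function setAttr(el, name, value) {
  var key = name.toLowerCase();
  if (key === 'style') {
    el.style.cssText = value; // IE 6/7 do not apply setAttribute('style', ...)
  } else if (PROPERTY_NAMES.hasOwnProperty(key)) {
    el[PROPERTY_NAMES[key]] = value; // property assignment works everywhere
  } else {
    el.setAttribute(name, value);
  }
}
```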

I'm now using the Internet Explorer Developer Toolbar, which comprises a few of the features of Firefox's Web Developer Toolbar and Firebug extensions. It's really helpful in diagnosing these IE issues.

Looking through some of the unofficial bug lists for Internet Explorer, you start to get an idea of just how far behind Internet Explorer still is.

Subtle IE tab order glitch

In Internet Explorer, both 6 and 7, automatically reloading an <iframe/> breaks the tab order and makes IE focus the address bar instead of the next field when you tab.

I wonder if anyone else has spotted this; it seems triflingly minor, but the client is so keen on tab order working correctly that they want me to rewrite the <iframe/> with AJAX polling.
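A polling replacement for the reloading iframe might look roughly like this. The assumption, not verified here, is that replacing only the content gives IE no reason to move focus. fetchText, render and the interval are hypothetical hooks: in a page, fetchText would wrap an XMLHttpRequest (an ActiveXObject in IE 6) to the URL the iframe used to load, and render would write into a div where the iframe used to be.

```javascript
// Poll fetchText every intervalMs and call render only when the text changes.
// schedule defaults to setTimeout; returns a function that stops polling.
function startPolling(fetchText, render, intervalMs, schedule) {
  schedule = schedule || function (fn, ms) { return setTimeout(fn, ms); };
  var last = null;
  var stopped = false;
  function tick() {
    if (stopped) return;
    fetchText(function (text) {
      if (text !== last) { // leave the DOM (and focus) alone if nothing changed
        last = text;
        render(text);
      }
      if (!stopped) schedule(tick, intervalMs);
    });
  }
  tick();
  return function stop() { stopped = true; };
}
```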

Writing an RSS client

Interestingly, my latest paid project is to build an RSS reader. I am doing this not out of bloody-minded determination to reinvent the wheel, and I would be perfectly happy to adapt an existing project to work in the way I want, but none of the apps I have seen or tried does what I want it to do.

This project is a desktop feed notifier. It will poll feeds and pop up messages (non-intrusively) either when it starts or when it first sees them.
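The "pop up when first seen" rule boils down to remembering item identifiers. A sketch, assuming items are already parsed into objects with a guid or, failing that, a link; FeedNotifier is a hypothetical name, and a real notifier would persist `seen` between runs so that a restart reports only what arrived while it was off.

```javascript
// Report only feed items whose identifier has not been seen before.
class FeedNotifier {
  constructor(seen = []) {
    this.seen = new Set(seen); // identifiers restored from a previous run
  }

  // Returns the items that should trigger a notification.
  poll(items) {
    const fresh = [];
    for (const item of items) {
      const id = item.guid || item.link; // RSS 2.0 guid, else the link
      if (!this.seen.has(id)) {
        this.seen.add(id);
        fresh.push(item);
      }
    }
    return fresh;
  }
}
```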

I have mentioned my views on RSS before, but happily they don't conflict with this project. Because this is aimed at intranet service notifications, there is a contract between producer and consumer, not merely a shared protocol.

I think that one good aspect of RSS is its ubiquity. Several apps already in use on this intranet are RSS-aware and can be wired into this system with a minimum of work.

Without wanting to revisit the previous arguments too much, I might as well summarise them for completeness. I can envisage only two useful strategies for a syndication format:

  • Fixed contract: Specify a unique set of obligations for producer and consumer, covering both syntax and semantics, e.g. RSS 0.90.
  • Negotiated contract: Specify obligations of syntax, but encourage producers to offer a complete semantic representation, and allow consumers to build a customised syndication from within it, e.g. RDF.

Rape Conspiracy

Channel 4 news was just reporting on the conviction of three men for conspiracy to rape children. The details are horrific, obviously. I'm not trying to get into the details of that.

However, I was amazed that Channel 4 moved on from the details of the crime to an absolute rinsing of the company hosting their website. It was introduced by some woman from the NSPCC who was demanding that web communications be restricted in some non-specific but utopian way.

There followed a confused explanation of the DNS system that sounded accusatory but didn't really serve to illustrate anything even if viewers had understood it. Were they claiming that the company hosting the DNS should be policing websites?

Then they started talking about the hosting company - and by this point I assumed it was the web host and perhaps the DNS guff had simply been a red herring - and how the hosting company, while not bound by law to police its websites, should be doing so anyway. And then they actually contacted the hosting company's other clients to badger them on the issue.

The web is being policed. It is being policed by... the police. The police are ideally placed to locate and identify sex offenders online due to access to a wide variety of data from a range of sources. The police receive public funds to do this. The police have powers to demand that members of the public turn over encryption keys. The police can obtain warrants, confiscate computers, detain people, and if they have a case, they can prosecute.

And the story, if you actually remember what the story was and haven't just been sold on the idea that hosting companies are to blame for child abuse, was that this exact strategy has just put three potential sex offenders behind bars.

PHP Gotcha

While it has become quite common practice for me to berate PHP, I really never imagined I would come across actual rock-solid evidence that PHP is just one big practical joke the Zend people are playing on me. Noel Edmonds is probably hiding somewhere waiting to pop out and present a trophy, and then pour gunge all over all PHP developers and just generally be weird.

You know objects, right? Those things which are, by definition, not null? That are, in fact, the exact opposite of null?

<?php
class TestCls {}
$a = new TestCls();
print ($a == null) ? 'null' : 'not null';
?>

Guess. Go on. I'll give you one guess what PHP prints.

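Yes, I know it works if I use === rather than ==. But that doesn't mean the == behaviour isn't sick and wrong. (The culprit: loose comparison with null converts both sides to boolean, and PHP 4 considers an object with no member variables to be false.)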

Mauvespace vs Facebook

I find Facebook very annoying. I can't seem to make it do anything useful. It gets certain key things stunningly wrong, making assumptions that simply don't hold in my case and make it seem broken. I can't find any friends on it and I'm getting bombarded by junk which isn't applicable to me. I can't find options to do many of the things which I'm sure are possible.

However, I'm impressed with what Facebook is supposed to do. It's far and away the closest of the social networking sites to what Mauvespace aims to do. That in itself is interesting. I didn't invent very many of the concepts regarding what Mauvespace can do: many of the suggestions about the combined expressibility of RDF vocabularies come from the web. However, it occurs to me that a fair number of those might have been inspired by Facebook or others, and Mauvespace merely inherits those suggestions (albeit mostly unimplemented as yet).

For example, annotating not only whole pictures but regions of pictures as depicting a person is something I've read about in comments describing RDF ontologies. I'm surprised Facebook isn't semantic.

Still, several key factors differentiate Mauvespace as a social network even if it could do everything Facebook can (and the eventual plan is certainly to implement some of those things):

  • It's open source.
  • It's entirely themable.
  • It's semantic.
  • It's distributed and interoperable (as a result of being semantic).

Not all of these will matter to all people. Many people I've spoken to simply say "I'm interested, but only because I tried x and didn't like it." But regardless of what matters to other people, these things are exactly the most important things to me personally:

  • I can make it work the way I want it to (as can anyone else).
  • I can make it look as pretty as I like without resort to hackery (as can anyone else).
  • I can use whatever data users make available in any way I see fit.
  • No for-profit organisation controls my data, forces me to use their system to talk to my friends, forces my friends to use their system to talk to me, requires me to pay them money or requires me to view their ads.

I don't think any proprietary social networking site could ever meet these requirements. That is why Mauvespace exists. Or very soon will.

Burning SQL bras

RDF makes me feel so liberated now that I've actually got it all up and running! Storing data in a freeform RDF graph is so easy when you don't have to worry about setting up tables or writing queries or anything. Add an arc, remove an arc. It's that simple.

Liberation is not enough of an incentive on its own, perhaps, but the fact that your web applications are trivially Semantic Web-ready when using an RDF database means that this is definitely the way I will be writing web applications from now on! (Subject to caveats about speed and optimisation and legacy code and pure appropriateness).

In particular, I've been able to write a single, easy-to-use class that displays a configurable form, pre-filled from the model, and saves changes back to the model on submit. The code looks like this:

$form = new RDFForm($model, $me);
$form->setAction('profile.php?view=basics');
$form->addMultipleFieldMapping(vocab('foaf:name'), new StringLiteralProperty(_('Name')));
$form->addFieldMapping(vocab('foaf:title'), new StringLiteralProperty(_('Title'), 4));
$form->addFieldMapping(vocab('foaf:givenName'), new StringLiteralProperty(_('First Name')));
$form->addFieldMapping(vocab('foaf:surname'), new StringLiteralProperty(_('Surname')));
$form->addMultipleFieldMapping(vocab('foaf:nick'), new StringLiteralProperty(_('Nickname')));
$gender = new LiteralEnumProperty(_('Gender'));
$gender->addOption('male', _('Male'));
$gender->addOption('female', _('Female'));
$form->addFieldMapping(vocab('foaf:gender'), $gender);
if (isset($_POST['save'])) {
    $form->updateModel();
}
$form->render();

Of course, this is possible with relational databases too, given enough layers of wrappers, but this approach makes it trivial to implement new fields and new field types. Here is a screenshot of how this appears on the page.

RDF Form Screenshot
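Stripped of rendering, a form like this mostly does two things: pre-fill each field from the arcs (subject, property, *) and, on submit, remove those arcs and add one per submitted value. Here is a hypothetical JavaScript sketch of that "add an arc, remove an arc" logic; ArcStore and FormMapping are illustrative names, not RDFForm's actual implementation, which isn't shown here.

```javascript
// Minimal arc store: a flat list of [subject, predicate, object] arcs.
class ArcStore {
  constructor() { this.arcs = []; }
  add(s, p, o) { this.arcs.push([s, p, o]); }
  remove(s, p) { this.arcs = this.arcs.filter(([subj, pred]) => !(subj === s && pred === p)); }
  values(s, p) { return this.arcs.filter(([subj, pred]) => subj === s && pred === p).map((a) => a[2]); }
}

// Maps form fields onto properties of one subject (e.g. the current user).
class FormMapping {
  constructor(store, subject) {
    this.store = store;
    this.subject = subject;
    this.fields = []; // { property, label, multiple }
  }

  addField(property, label, multiple = false) {
    this.fields.push({ property, label, multiple });
  }

  // Current values, used to pre-fill the form.
  prefill() {
    const out = {};
    for (const f of this.fields) out[f.property] = this.store.values(this.subject, f.property);
    return out;
  }

  // Replace each submitted property's arcs with the submitted values.
  update(submitted) {
    for (const f of this.fields) {
      if (!(f.property in submitted)) continue;
      let vals = [].concat(submitted[f.property]).filter((v) => v !== '');
      if (!f.multiple) vals = vals.slice(0, 1); // single-valued field
      this.store.remove(this.subject, f.property);
      for (const v of vals) this.store.add(this.subject, f.property, v);
    }
  }
}
```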

File uploads

I have briefly mentioned work that I was doing to wrap file uploading in AJAX for a proper user experience. Browser-based file uploads have been neglected over the past few years.

In client terms, file uploads work in almost exactly the same way as they have always done: the page blocks while the data is posted, and a very small progress bar shows up in the status bar. This is a user interface disaster for big files.

On the server side, the situation is more varied, but there is often little support for streaming of file uploads. In PHP, file uploads are read wholly into memory, parsed and saved out to a temporary folder before a script even gets called. The request must fit within both PHP's file upload size limit and its memory limit. As far as I can tell, something similar happens in Zope, although you can argue that Zope natively allows other standards for upload such as DAV and FTP. In plain CGI, of course, there is no upload handling at all, so a CGI wrapper library can handle uploads however it likes. Perl's CGI.pm module allows an upload hook, at least. Python's cgi module doesn't, nor is it easy to subclass.

All in all, binding file uploads to form submissions, and processing them in common server-side languages, is wholly inadequate as file sizes grow. File uploads are convenient because they are a commonly-supported fall-back, but the workarounds, although solving some of these problems, don't have the simplicity of a browser-native solution.

In my recent project I looked at ways of working around these limitations. The best workaround for the client-side problems I have found so far is to perform the upload in an <iframe>, using AJAX queries to present a progress bar. This still has problems, notably that it's one file at a time, both for choosing and for uploading. In Firefox I can actually perform two concurrent uploads in different <iframe>s, but the AJAX progress bar doesn't then update.

Server-side, I wrote the whole thing as a webserver so that the AJAX queries could talk directly to the thread streaming the upload. Additionally, I wrote my own parser to parse the uploaded data on the fly, so that the daemon knows what is uploading at any given stage. It works quite well, and the system is extensible in that it could be combined with a daemon accepting other forms of upload; feedback for these would also appear in the browser windows.
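To illustrate what parsing on the fly involves, here is a sketch of an incremental multipart/form-data scanner. It is not the daemon's actual parser: chunks are assumed to arrive as strings, part bodies are discarded rather than saved, and MultipartProgressScanner is a hypothetical name. The tricky part is that a boundary may be split across chunks, so the scanner holds back a tail shorter than the delimiter until it knows whether the tail is body data.

```javascript
// Tracks which part (field or file) is uploading and how many body bytes
// of each part have arrived, as chunks of a multipart body are fed in.
class MultipartProgressScanner {
  constructor(boundary) {
    this.delimiter = '\r\n--' + boundary; // every delimiter follows a CRLF
    this.buffer = '\r\n'; // lets the first delimiter match the same pattern
    this.current = null;  // { field, filename, bytes } for the part in progress
    this.parts = [];
    this.finished = false;
  }

  feed(chunk) {
    if (this.finished) return; // ignore any epilogue
    this.buffer += chunk;
    for (;;) {
      const idx = this.buffer.indexOf(this.delimiter);
      if (idx === -1) {
        // Everything except a possible partial delimiter at the end is body data.
        const safe = Math.max(0, this.buffer.length - (this.delimiter.length - 1));
        this._count(safe);
        this.buffer = this.buffer.slice(safe);
        return;
      }
      this._count(idx);
      const afterDelim = idx + this.delimiter.length;
      if (this.buffer.startsWith('--', afterDelim)) { // closing delimiter
        this.finished = true;
        this.current = null;
        this.buffer = '';
        return;
      }
      const headerEnd = this.buffer.indexOf('\r\n\r\n', afterDelim);
      if (headerEnd === -1) { // part headers incomplete: wait for more data
        this.buffer = this.buffer.slice(idx);
        return;
      }
      const headers = this.buffer.slice(afterDelim, headerEnd);
      this.current = {
        field: (headers.match(/;\s*name="([^"]*)"/) || [])[1] || null,
        filename: (headers.match(/filename="([^"]*)"/) || [])[1] || null,
        bytes: 0,
      };
      this.parts.push(this.current);
      this.buffer = this.buffer.slice(headerEnd + 4);
    }
  }

  _count(n) {
    if (this.current) this.current.bytes += n;
  }
}
```

The thread reading the request body would call feed() for each chunk, and the AJAX progress handler would read `current` and `parts`.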

Even so, I wish that file uploading was something people were thinking about more. It's central to so many web applications now.

There are numerous problems:

  • File uploads are synchronous. Downloads can happen in the background in their own window, but uploads can't.
  • File uploads don't have a proper UI. Current browsers appear to show a tiny upload bar that isn't really very accurate and doesn't give data rates or estimated time remaining.
  • Uploads are chosen one at a time.
  • Javascript can't be used for polish. The model that has empowered Web 2.0 improvements is that of taking an existing HTML/HTTP model and allowing it to be controlled by Javascript. However, there is no way into the uploading or the file selection processes with Javascript.

The most general solution I can see would provide a Javascript API for uploading. This would allow Javascript to show a (native) file chooser dialog, and instruct the browser on what to do with the files it returns. POST or PUT to the origin server seem useful, as does FTP upload. Clearly there are security concerns, but as long as Javascript may start an upload and read its statistics, but not read the filesystem, I fail to see how this presents a problem.

Perhaps an AJAX-style API could be along these lines:

//configure a native dialog to present to the user
var ufc = new UploadFileChooser();
ufc.setAcceptableFileTypes(['image/jpeg', 'image/png']);
var uploads = ufc.chooseFiles();

for (var i = 0; i < uploads.length; i++)
{
  var u = uploads[i];
  u.onreadystatechange = doSomething; //callback
  //this URL is constrained to the origin server, so a page can't send files to a third-party site
  u.beginHttpPost('http://example.com/upload');
}

After this, the user could close the tab or leave the page, and the browser would upload the files in the background, perhaps with a progress bar appearing within the Downloads window. Note that it could queue the files rather than uploading them all at once, depending on user settings. The Javascript, and indeed the user, should be able to request that an upload is aborted. The Javascript should also be able to query the upload, using the object reference provided.
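The queueing, abort and query behaviour described above can be modelled independently of any browser API. This sketch assumes a hypothetical startTransfer(file, callbacks) hook that begins a transfer, reports progress and completion asynchronously, and returns a cancel function; UploadQueue is an illustrative name, not a proposed standard.

```javascript
// At most maxConcurrent uploads run at once; each handle can be queried
// (state, sent) or aborted. Callbacks are expected to fire asynchronously.
class UploadQueue {
  constructor(startTransfer, maxConcurrent = 1) {
    this.startTransfer = startTransfer;
    this.maxConcurrent = maxConcurrent;
    this.uploads = []; // every upload enqueued, in order
  }

  enqueue(file) {
    const upload = { file, state: 'queued', sent: 0, cancel: null };
    this.uploads.push(upload);
    this._pump();
    return upload; // the object reference the script can query later
  }

  abort(upload) {
    if (upload.state === 'uploading' && upload.cancel) upload.cancel();
    if (upload.state === 'queued' || upload.state === 'uploading') {
      upload.state = 'aborted';
    }
    this._pump(); // an aborted upload frees a slot
  }

  // Start queued uploads until the concurrency limit is reached.
  _pump() {
    let active = this.uploads.filter((u) => u.state === 'uploading').length;
    for (const upload of this.uploads) {
      if (active >= this.maxConcurrent) break;
      if (upload.state !== 'queued') continue;
      upload.state = 'uploading';
      active += 1;
      upload.cancel = this.startTransfer(upload.file, {
        progress: (bytes) => { upload.sent = bytes; },
        done: () => {
          if (upload.state !== 'uploading') return; // ignore after abort
          upload.state = 'done';
          this._pump();
        },
      });
    }
  }
}
```

Setting maxConcurrent to 1 gives the "queue rather than upload all at once" behaviour; a user setting could raise it.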