Semantic Whitespace

Daniel Pope

A little-known feature of many applications, including most web browsers, is that as well as ordinary click-and-drag selection, you can often double-click and drag to select whole words. There's also a triple-click paragraph or line selection that you may not be aware of. (Internet Explorer has a heuristic selection model that makes it easier to select words at the expense of making it harder to select arbitrary amounts of text.)

Though little-known, it's extremely useful! A favourite trick is to double-click and drag to select words, then right-click and "Search current search provider for".

This word selection can expose an accessibility problem. Browsers, and probably some search engines, identify words by splitting the content on whitespace and block-level HTML tags, not on inline-level tags. This is sensible: if I write HyperText Markup Language with the initials H, T, M and L in bold, I don't want the semantic content to be "H yper T ext M arkup L anguage"!

The accessibility problem is this. With CSS it's possible to accidentally write HTML that is neatly padded to look like separate words, but which doesn't tokenize (split up into words) properly. For two words to be considered separate, the markup must contain actual whitespace characters between them: this is what I mean by semantic whitespace. Sites as big as Facebook and Twitter still make this mistake!
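To make the splitting rule concrete, here is a rough Python sketch of it. This is only an approximation of what a browser might do, not any browser's actual algorithm: the `BLOCK_TAGS` set is a small illustrative subset I've picked, and the `WordExtractor` class and `words` function are names invented for this example. It treats block-level tags as word boundaries, ignores inline tags, and then splits on whitespace.

```python
from html.parser import HTMLParser

# Assumption: a small illustrative subset of block-level tags, not a full list.
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "h1", "h2", "h3", "blockquote"}


class WordExtractor(HTMLParser):
    """Collects text, inserting a space wherever a block-level tag occurs."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Block-level tags separate words; inline tags (b, span, ...) do not.
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        self.chunks.append(data)


def words(html):
    """Return the words in `html`, split on whitespace and block-level tags."""
    parser = WordExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).split()


# Inline bold tags do not break words apart.
initials = "<b>H</b>yper<b>T</b>ext <b>M</b>arkup <b>L</b>anguage"
print(words(initials))  # ['HyperText', 'Markup', 'Language']

# CSS padding makes these look like three words, but there is no whitespace.
padded = (
    '<span style="padding-right: 0.5em">without</span>'
    '<span style="padding-right: 0.5em">semantic</span>'
    "<span>whitespace</span>"
)
print(words(padded))  # ['withoutsemanticwhitespace']

# Real whitespace between the spans gives three separate words.
spaced = "<span>with</span> <span>semantic</span> <span>whitespace</span>"
print(words(spaced))  # ['with', 'semantic', 'whitespace']
```

The padded version comes out as a single token, which is exactly what a double-click would select.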

If your browser supports proper word selection (Internet Explorer's word-selection model is useless here), try double-clicking near formatting changes to check that your website is semantically correct.

Try it out! Can you detect the difference between these?

withoutsemanticwhitespace

with semantic whitespace
