Damian Cugley’s Weblog

I once visited a real printing house, and discovered that the keyboards actually have two quotation-mark keys: one for the apostrophe (’) and one for the inverted comma (‘). Alas! That such simplicity was denied to us by, well, by Apple.

Donald E. Knuth could not depend on anything beyond the limited repertoire of 7-bit ASCII (ISO-646, essentially), so had to use the otherwise useless quote (') and backquote (`) characters as substitutes. The result was that English typesetting was actually fairly reasonable:

I said `She said ``Foo!'', and meant it.'

becomes

I said ‘She said “Foo!”, and meant it.’
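The substitution is mechanical enough to sketch in a few lines of Python. This is only an illustration of the convention, not how TeX itself does it (TeX handles the doubled forms through font ligatures), and the function name `tex_quotes_to_unicode` is made up for the example.

```python
def tex_quotes_to_unicode(text: str) -> str:
    """Convert TeX-style ASCII quotes to Unicode quotation marks."""
    # The two-character forms must be replaced first, otherwise `` would
    # be read as two separate single opening marks.
    replacements = [
        ("``", "\u201c"),  # left double quotation mark
        ("''", "\u201d"),  # right double quotation mark
        ("`", "\u2018"),   # left single quotation mark (turned comma)
        ("'", "\u2019"),   # right single quotation mark (apostrophe)
    ]
    for ascii_form, unicode_form in replacements:
        text = text.replace(ascii_form, unicode_form)
    return text


sample = "I said `She said ``Foo!'', and meant it.'"
print(tex_quotes_to_unicode(sample))
```

A run of three or more quote characters in a row is ambiguous under this naive left-to-right replacement, so a real converter would need a rule for that case.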

This was in 1982. Twenty-one years ago.

When Apple designed the Macintosh they gave it its own file system, bitmap fonts, screen and keyboard. This was in an era when everyone was designing their own character sets. Nothing stopped them from having two keys for the quotation marks, or from mapping code points 0x27 and 0x60 in their proprietary character set on to the apostrophe (now Unicode U+2019) and turned comma (U+2018). They went part way to proper punctuation, allowing access via obscure keystrokes using the Option key. This extra complexity led to the impression that proper punctuation was a difficult alternative to the default of bashing on the ' key. Hence the term smart punctuation to describe what could have been the default.

SGML came along. They did not adopt Knuth’s sensible kludge, preferring to use elements to enclose quotations:

I said <q>She said <q>Foo!</q>, and meant it.</q>

With complicated rules for working out whether single or double marks are required (whereas the TeX system is simple and foolproof).
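To give a feel for what a processor has to do, here is a rough Python sketch of the simplest possible rule: track the nesting depth of q elements and alternate single and double marks. The class name `QuoteRenderer`, the helper `render_quotes` and the single-outside convention are assumptions made for this example; it ignores language-specific conventions and simply drops every other tag.

```python
from html.parser import HTMLParser

# Assumed convention: the outermost quotation takes single marks and each
# nested level alternates with double marks.
QUOTE_PAIRS = [("\u2018", "\u2019"), ("\u201c", "\u201d")]


class QuoteRenderer(HTMLParser):
    """Replace <q> and </q> tags with marks chosen by nesting depth."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0   # number of q elements currently open
        self.parts = []  # output fragments, joined at the end

    def handle_starttag(self, tag, attrs):
        if tag == "q":
            opening, _ = QUOTE_PAIRS[self.depth % len(QUOTE_PAIRS)]
            self.parts.append(opening)
            self.depth += 1

    def handle_endtag(self, tag):
        # Ignore a stray </q> that has no matching open element.
        if tag == "q" and self.depth > 0:
            self.depth -= 1
            _, closing = QUOTE_PAIRS[self.depth % len(QUOTE_PAIRS)]
            self.parts.append(closing)

    def handle_data(self, data):
        self.parts.append(data)


def render_quotes(markup: str) -> str:
    renderer = QuoteRenderer()
    renderer.feed(markup)
    renderer.close()
    return "".join(renderer.parts)


print(render_quotes("I said <q>She said <q>Foo!</q>, and meant it.</q>"))
```

Even this toy version needs a parser and a depth counter, where the TeX convention needs only a character substitution.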

The requirement that element start and end tags match causes trouble in those situations where English tradition includes unmatched symbols (such as at the start of a new paragraph if a quotation extends beyond one paragraph). It is also much more intrusive than the TeX style.

What does it gain us to force one particular grammatical structure, quotations, to be reflected in the XML structure of the document? We do not require sentences to be marked up so that the full stops be added!

<sentence class="question">what does it gain us to force one particular grammatical structure <parenthesis class="comma">quotations</parenthesis> to be reflected in the <acronym>xml</acronym> structure of the document</sentence> <sentence class="exclamation">we do not require sentences to be marked up so that the full stops be added</sentence>

HTML 4 added its own q element, but no web browsers support it—at least none that I know of. In desperation the CSS standard was augmented with features to force browsers to insert quote marks—a huge burden on CSS processors to crack this fairly pointless nut.

It also means loss of CSS removes the quotation marks!

The way to get English punctuation is to use the correct Unicode characters. You should be able to use &lsquo; to get an inverted comma and &rsquo; to get an apostrophe. Some browsers do not yet support these HTML-4 character entities, so perhaps numeric entities (&#x2018;, &#x2019;) are better. Alas! Older browsers require numeric entities to be in decimal (&#8216;, &#8217;), which is annoying because Unicode numbers are conventionally always shown in hex...
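The three spellings all name the same character, which is easy to check with Python's standard `html` module. The helper names `decimal_reference` and `hex_reference` are invented for this sketch; they just do the hex-to-decimal bookkeeping that is tedious by hand.

```python
import html

# Named, hexadecimal and decimal references for the inverted comma.
forms = ["&lsquo;", "&#x2018;", "&#8216;"]
decoded = [html.unescape(form) for form in forms]

# All three decode to the single character U+2018.
print([f"U+{ord(char):04X}" for char in decoded])


def decimal_reference(char: str) -> str:
    """Build the decimal numeric reference for one character."""
    return f"&#{ord(char)};"


def hex_reference(char: str) -> str:
    """Build the hexadecimal numeric reference for one character."""
    return f"&#x{ord(char):X};"


apostrophe = "\u2019"
print(decimal_reference(apostrophe), hex_reference(apostrophe))
```

Generating the decimal form from the code point this way avoids converting 0x2018 and 0x2019 to 8216 and 8217 by hand.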

All of these options are much more intrusive than Knuth’s convention.

There are now some Unicode-savvy text editors that allow you to store proper punctuation in your document, so long as you somehow manage to type it in the first place. Presumably on a Mac you can still use those Option key keystrokes. On Windows you use Character Map or one or two apps that try to guess which quotation mark you intended when you pressed the quote key. Sigh.

It’s crazy that my keyboard has ¬, |, `, ~, etc. keys, but not the punctuation marks we were all taught at school.

14 January 2003

An anonymous poster points out some browsers do support q elements.

Obviously I was remembering an older version of Mozilla—my information is out of date, sorry! I just tried it again, and it sort-of works. Here’s what that phrase comes out as in whatever browser you are using right now:

I said She said Foo! and meant it.

In Phoenix 0.5 and Opera 5 this comes out as:

I said "She said "Foo!" and meant it".

Which means that, yes, the q tag does do something, but, alas! does not use the correct punctuation marks... :-(

The main bug for Mozilla quotes is 16206. They say that they don’t support inner quotes at all, and adding multiple languages is going to be horribly inefficient.

In fact it’s really hard to work out quoting depth properly, because any HTML element can be set to have quotation-mark-inserting properties! This is because by the time you add enough features to CSS 2 to specify quotation marks, these new features can be combined in strange ways that are tough to implement correctly and efficiently.

I stand by my claim that trying to make q do the right thing in the general case has required and will require enormous programming effort, and that being able to just type in the quotation marks oneself would be immeasurably simpler and more transparent!

Also... apostrophes appear in non-quote contexts too, and q does nothing to help there.

16 January 2003
