Style guide for online hypertext

Written by Arnoud Engelfriet. Largely based on Tim Berners-Lee's style guide.

The latest version of this document is available from <URL:http://www.htmlhelp.com/design/style/>

According to some of the search engines, there are now over thirty million documents on the Web. This means that almost every topic is covered in many locations. If a document is hard to read, or the information therein is hard to find, chances are your reader will go elsewhere instead. That's why it is important to make documents accessible to everyone. This style guide will hopefully help you write easy-to-read documents.

Terminology

An HTML document (often also called a Web page) is the unit in which information is provided to a reader. An on-line document can be as large as a whole book or as small as a chapter, a page or even a single footnote. In this guide, a set of documents that logically belong together is referred to as a site.

This style guide requires some knowledge of HTML and the functionality of the Web. The WDG's HTML reference discusses HTML elements mentioned in this guide in more detail.

Overview

Documents written for the Web usually become part of a larger collection. It's important that the site follows a common structure, so that every document can be found in a logical place. Of course, each individual document has its own structural considerations as well.

For a document, style is also very important. By using a common style, you ensure that a reader can use the site effectively. Some important aspects are indicating the status of the document, using images and icons, and writing in a device-independent way. Also, don't forget to validate your documents!

Structure of hypertext documents
- The structure of a Web site
- The structure of a single document
  - Document size
Referring vs copying
Document style
Validate the document

About this style guide

This guide is still in an experimental phase. Please send me your comments, suggestions and recommendations.

Structure of hypertext documents

There are two separate structures to keep in mind when writing for the Web. First, there's the structure of your site, the way the documents are connected and can be navigated. It is important that this can be done easily, so that a reader does not get lost when browsing the site. And second, there's the structure of each separate document.

In general, there are a few things to watch out for.

Be search-engine friendly

A very important thing nowadays is to be search-engine friendly. Most of your visitors will find your site through a search engine; there are simply too many documents on the Web to find a particular one any other way. Because of this, you have to make sure that the search engines can index your site properly.

Normally, search indices will use the first few lines of text from a document as a short description to display in a search result. If for some reason you want a different text, use the <META> element as follows: <META NAME="description" CONTENT="Your description here."> The description cannot contain HTML markup, and should be less than 1024 characters in length.

In the same way, you can add extra keywords: <META NAME="keywords" CONTENT="Keyword, keyword, keyword."> These keywords are used in addition to the ones found in your document. If you overuse a keyword (some engines use an upper limit of 7 occurrences), the entire list will be ignored.
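A minimal sketch of a document head that combines both META elements. The title, description and keyword values are hypothetical placeholders for the XYZ Company used elsewhere in this guide, and support for these elements varies between search engines.

```html
<HEAD>
<TITLE>The XYZ Company -- Widgets and Repairs</TITLE>
<!-- Plain text only; may be shown instead of the first lines of the document -->
<META NAME="description" CONTENT="Widgets, spare parts and repair services from the XYZ Company.">
<!-- Used in addition to the words found in the document; do not repeat any keyword excessively -->
<META NAME="keywords" CONTENT="widgets, spare parts, repairs, XYZ Company">
</HEAD>
```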

You have the reader's attention

In traditional media, the purpose of an advertisement or flyer is to grab the reader's attention to give him your message. On the Web, you have his attention already. The reader found the URL of your site and came there to find some piece of information. The Web site should not look like a TV advertisement, but instead offer the information that the readers came for.

That does not mean the site has to be boring and devoid of graphics. On the contrary, just make sure they don't distract from the main purpose of the site. Plug-ins, background music and animations have their uses, but unless they are essential to the site's message, do not focus the reader's attention on them. And never exclude a reader just because he doesn't have a plug-in for something you offer (see "Don't bother with the mechanics").

Also, if there is more information on some subject, put it on the site, don't just add a 1-800 phone number with the text "Call us for more information."

Don't use imagemaps as the sole means of navigation

Make sure that all documents you want indexed can be reached with normal links (no imagemaps) from the index documents. A search engine cannot use an imagemap to navigate your site. This also makes it possible for people who have image loading disabled to use the site.

Imagemaps also often take a long time to load. For this reason, avoid them on the main index pages. An index should load fast so it can be used immediately.
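One way to follow this advice, sketched with a client-side image map (the USEMAP, MAP and AREA markup from HTML 3.2) followed by ordinary text links to the same destinations. The file name, coordinates and section names are hypothetical.

```html
<!-- Graphical navigation; ALT is empty because the text links below repeat every destination -->
<IMG SRC="/images/navbar.gif" WIDTH="400" HEIGHT="40" ALT="" USEMAP="#navbar" BORDER="0">
<MAP NAME="navbar">
<AREA SHAPE="rect" COORDS="0,0,132,39" HREF="/products/" ALT="Products">
<AREA SHAPE="rect" COORDS="133,0,265,39" HREF="/support/" ALT="Support">
<AREA SHAPE="rect" COORDS="266,0,399,39" HREF="/about/" ALT="About us">
</MAP>
<!-- Plain links that search engines and readers without images can follow -->
<P>[<A HREF="/products/">Products</A>]
[<A HREF="/support/">Support</A>]
[<A HREF="/about/">About us</A>]</P>
```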

Keep the technical details out of sight

If the documents are generated automatically, or indexed by a script periodically, they usually contain special processing information somewhere. Do not put this in the text, but in a comment so your reader doesn't have to see it.
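For example, processing information for an automatic indexing script (a hypothetical one here, with made-up field names) can live in an SGML comment, which browsers do not display:

```html
<!-- Generated by the site indexing script; do not edit by hand -->
<!-- index-category: widgets; index-weight: 3 -->
<H1>Widget Reference</H1>
```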

Similarly, if a document uses plug-ins, you can add a link to download the viewer, but don't distract the reader from the actual contents of the document.

It is usually not necessary to include information about the links in documents. Only add it if the resource is in some unusual format or is very large. For example, if the site offers a 1.44-megabyte AVI movie, a note like "(AVI, 1.44MB)" after the link indicates this.
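A sketch of such a note, with hypothetical file names and link text: the ordinary HTML link needs no annotation, while the large movie does.

```html
<P>The <A HREF="manual.html">installation manual</A> explains each step, and
a <A HREF="install.avi">video of the installation</A> (AVI, 1.44MB) shows it in action.</P>
```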

The structure of a Web site

The great thing about the WWW is the ability to create crosslinks, find documents with search engines, and follow topics in ways not foreseen by the author. It is important that you establish a clear structure for your site. If your readers cannot figure out how your site is organized, they will soon become disoriented and go elsewhere. A tree structure is often a good way to organize a site.

Using the tree model for a Web site, each "leaf" represents a document, and a branch represents a link to that document. The main index then forms the root of the tree, and offers a route to each document. It is not guaranteed that this index will be the only way to reach a document! A reader can always bookmark a file, or locate it with a search engine and then go there directly. Make sure that each document can be used out of context.

To make the tree model easier to follow, let the directory structure for the documents reflect it. Create a directory for each distinct topic, even when there are only one or two documents concerning it. In the future there might be more information on that topic, and then having the structure in place saves a lot of trouble: moving documents later breaks existing links, and trying to get the rest of the Web to change a reference is a nightmare.

Dividing the contents of your site into logical groups depends on who the expected audience is. For novices, providing a firm structure, perhaps also a "Guided tour", greatly helps them navigate the information. An experienced user might want to skip the introductions and go to the interesting material immediately. This reader has his own expectations about how the information is organized; if the site uses a different structure, he can become confused and be put off if there is no way to bypass it.

When making a reference, it can be useful to indicate what type of information can be found there. This allows a reader to determine if he should read the linked document. For example, "A step-by-step introduction is in the tutorial" or "The technical details are available in the reference section".
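A short sketch of such references, using hypothetical directory names that follow the tree-shaped directory layout suggested above:

```html
<P>A step-by-step introduction is in the <A HREF="/widgets/tutorial/">tutorial</A>.
The technical details are available in the <A HREF="/widgets/reference/">reference section</A>.</P>
```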

All documents should be available through more than one means. An organized index helps those who want to browse the site, but an alphabetical table of contents, or an alternative index sorted by some other criterion, is also very helpful. When the information is very technical, or there are many documents available, a local search engine is a useful addition.

The structure of a single document

As said in the introduction, an on-line document can correspond to a whole book or just a footnote. Usually it will be something in between, which means that one piece of information may be spread over more than one HTML document. The most important thing here is that each document should contain one well-defined concept. Just splitting a file to reduce its size, or joining many small files into one, is generally not a good idea.

If the information is available in separate documents, the reader has to load each subdocument to read it. On a slow network connection (or busy server) this might take longer than the reader is willing to wait.

However, presenting everything in one large document also has its disadvantages. If it does not fit in one "screen" (whatever is displayed at once in a browser window) then the reader has to scroll through the document. If his interest hasn't been grabbed within the first couple of screens, he will likely go elsewhere. To prevent this, don't split up the document into arbitrary pieces, but add an overview and perhaps a table of contents at the top.

A single-file or archived/compressed version of all the information on a particular topic is often useful. A reader can then download it and read (or print) it offline.

Document size

It's hard to give even a rough guideline for the size of a document, since there is no way to predict how much space it will occupy on a reader's screen. One of the few aspects you can control is the loading time. A typical speed for loading a document is about 1 kilobyte per second. Many people use slow modems, and even when the physical connection is faster, the network can be very slow.

For introductory pages, tables of contents and the like, keep the total size (text and images) under 60 kilobytes in all cases. A size of 30 kilobytes is a recommended upper limit, since such a document takes only about 30 seconds to download and render completely. Any longer and your reader may go elsewhere.

Informational documents should be split into separate documents, as discussed above. There are situations where this is impractical: if each aspect is covered in one or two paragraphs of text, splitting the document up is not necessary. A reasonable upper limit for such documents is 60 to 100 kilobytes. To make things easier for readers, the document could also be made available in a compressed archive or as a single file.
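At the rough rate of about 1 kilobyte per second assumed above, download time is simply size divided by rate. In this worked arithmetic, S (size) and r (rate) are symbols introduced here for illustration; it ignores rendering time and server delays, so real waits are likely to be longer.

```latex
\begin{aligned}
t &= \frac{S}{r}, \qquad r \approx 1\ \text{KB/s} \\
S = 30\ \text{KB} &\;\Rightarrow\; t \approx 30\ \text{s} \\
S = 60\ \text{KB} &\;\Rightarrow\; t \approx 60\ \text{s} \\
S = 100\ \text{KB} &\;\Rightarrow\; t \approx 100\ \text{s}
\end{aligned}
```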

On a related note, make sure that any navigational images in documents are less than 450 pixels wide. Most browser windows are about this size, and if your image is wider, the reader has to scroll horizontally to view the rest. For preformatted text, use a maximum of 75 characters for the same reason. Scrolling horizontally to read a document that is slightly wider than what you are used to gets tiresome really quickly.

Having a table of contents does not mean you can't link the documents it lists directly to each other. Include links between related documents (e.g. "Next", "Previous", "More") so that readers aren't forced to go back to the table of contents every time they want to move on (this is known as the "staircase syndrome").
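A sketch of such links at the foot of one chapter in a sequence; the file names and chapter titles are hypothetical.

```html
<HR>
<P>Previous: <A HREF="chapter2.html">Installing the software</A><BR>
Next: <A HREF="chapter4.html">Configuring the software</A><BR>
Up: <A HREF="index.html">Table of contents</A></P>
```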

Referring vs copying

When referring to information which is on a server outside your own control, you might be tempted to make a local copy of it. In some cases, this can be a good thing, but there are also good arguments against it.

Reasons for leaving the document where it is:

- When the document is updated by its owner, the link automatically refers to the updated information, so there is no longer any need to keep checking the remote site for changes.
- The other server may be on a faster connection than yours, depending on where your readers will be coming from.
- Copying a document to your server requires permission from the copyright owner. Referring to it doesn't.

Reasons for making a local copy:

- If the information is only temporarily available (a news article, for example), then you have to make a copy. But make sure you do not violate the author's copyright if you want to make the copy available.
- If you want to refer to a particular piece of information, which might later be changed, then a copy ensures you keep that version.
- Several documents might each contain information on one aspect of a problem. Simply linking to each of them can confuse your readers; extracting the relevant information and merging it into one document is a better choice.

Hotlists and indices

There are already many collections of "cool" links, indices to RFCs, FAQ lists and information on specific topics. Rather than creating the 1,000th list of links, make a personalized, annotated list of pointers of specific interest. The best WWW pages are those that have something meaningful to say for themselves.

Document style

The style of a document helps a reader to browse through it, in order to find the information he is looking for. All documents on one subject should share the same style. This makes it easier to understand any particular document, and to jump to related information. Use the same "skeleton" or template to create all documents, and ensure that each element is used for the same function in all documents. This also helps in maintaining the documents in the long term: because all documents share the same markup style, information can be extracted automatically.

The following aspects are the most important:

- Sign all documents to indicate who owns them and where to send comments,
- Give the status of a document,
- Make it usable out of context, so a reader isn't lost when he comes in through the back door,
- Use images and icons in a responsible way,
- Don't write for one browser, use browser-specific elements in a responsible way,
- Use the right tag for the job, don't rely on how one browser renders a tag,
- Avoid online-only aspects, so the document is still usable when printed,
- Don't bother with the mechanics, that can be done in HTML, and lastly,
- Validate the document.

Sign all documents

An important aspect of information is the ability to trace its author. With hypertext this is easy to do - just add a link to the author's home page and include an e-mail address or form to which comments can be sent. The <ADDRESS> element is commonly used for this purpose.

Instead of the author's home page, you can also link to a generic "About" document containing copyright notices, disclaimers and the like. This avoids cluttering up each document with long signatures, and reduces the information block to something like "Copyright © 1996 by Name", with the word "Copyright" linking to that About document.

It is also important to include contact information for comments, suggestions and the like. This is usually done with a "mailto" URL pointing to an e-mail address for this purpose. Always include the address itself in the text as well, so people can e-mail you with their favourite mail program. When the document is printed, for example, the link no longer works and the URL behind it is no longer visible.
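A sketch of a signature block along these lines. The About page, home page path, name and e-mail address are hypothetical placeholders (example.com is a reserved example domain); the address is repeated as visible text so it survives printing.

```html
<HR>
<ADDRESS>
<A HREF="/about.html">Copyright</A> &copy; 1996 by <A HREF="/~name/">Name</A><BR>
Comments to <A HREF="mailto:webmaster@example.com">webmaster@example.com</A>
</ADDRESS>
```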

Give the document's status

As the saying goes, "All Web pages are permanently under construction." Adding the infamous "Under construction" logo serves little purpose, but a note that some sections are incomplete is useful: even an unfinished document might give a reader what he needs. Do not link to pages that are not ready.

The date of the last update or modification is all but essential. If it is at the top of a document, a reader can quickly see whether he has already read this version and abort the transfer if so. This is also why a consistent style for all documents matters so much: it helps a reader locate the information he wants.
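For example, the status and date could sit directly under the main header; the date and section name below are placeholders.

```html
<H1>Widget Reference</H1>
<P>Last updated: 12 March 1997. The section on repairs is not yet complete.</P>
```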

Make the document usable out of context

Even when you structure your site according to the tree model described earlier, people can jump in at any point. They can find the document through a search engine, or they could simply have bookmarked it on an earlier visit. Regardless of the reason, there is no guarantee that a reader has followed the path he is supposed to.

If the documents are naturally sorted in a specific order, then keeping the flow from one to the next is important. It is not necessary to rewrite the entire document set to help those who jump in half-way, but there should be enough information to prevent them from being completely lost. Some ways to do this are:

- The introduction of a document should not rely on the intended context. "The next thing discussed..." or "The solution to this problem is..." as the first line of your document will certainly confuse readers.
- If you use acronyms or technical terms, link the first instance of the word in a document to an explanation in a glossary, or to an information document about the concept behind it. For example, "IETF" could be a link to the IETF's Web site.
- A navigation bar can give explicit pointers to indices and to the previous and next documents. This is also useful for impatient readers.
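A sketch of the glossary technique from the list above: the first mention links to a named anchor in a hypothetical glossary.html, and later mentions are left plain. The sentences about the protocol are illustrative only.

```html
<!-- In the document: only the first mention is linked -->
<P>The protocol was standardized by the <A HREF="glossary.html#ietf">IETF</A>.
Later revisions were also published by the IETF.</P>

<!-- In glossary.html: the named anchor that the link points to -->
<DL>
<DT><A NAME="ietf">IETF</A>
<DD>Internet Engineering Task Force; its own Web site is
<A HREF="http://www.ietf.org/">www.ietf.org</A>.
</DL>
```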

Always make sure there is a way to navigate to the index or overview document(s).

Backlinks

Because of the nonlinear structure of the Web, readers may come into your site at any point. A link claiming to take them "Back" to some document makes no sense. Their browser will already have a "Back" function that takes them back to the previous document that they were viewing, which is not necessarily the previous document in your structure. All links in the document are forward links, as far as the browser is concerned. The job of the navigational links and icons is to help readers navigate the structure that you have defined. To do this, add links that make the logical structure visible, but make sure they are usable out of context. For example, "Up" (to the table of contents), "Previous/Next" (to documents that belong in the same logical sequence), or "More" (to a document that gives more detail about the topic in the current document).
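A sketch of the difference, where index.html stands for a hypothetical table of contents:

```html
<!-- Avoid: duplicates the browser's Back function and may not lead where the reader came from -->
<P><A HREF="index.html">Back</A></P>

<!-- Better: names the destination in your own structure -->
<P><A HREF="index.html">Up: Table of contents</A></P>
```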

The document's title

Since the title is often used by search engines to list the results, and by browsers to label a bookmark for the document, it should be understandable out of context. "Introduction" makes no sense when it appears in a hotlist, but "The Gutenberg Project -- Introduction" does. Try to keep the length under 64 characters; this prevents it from being cut off in browser windows and bookmark lists.
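For the example above, the TITLE element in the document head would look like this; the commented-out form is the one to avoid.

```html
<!-- Understandable in a bookmark list, and well under 64 characters -->
<TITLE>The Gutenberg Project -- Introduction</TITLE>

<!-- Avoid: <TITLE>Introduction</TITLE> -->
```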

Using images & icons

Images are used for a variety of purposes in HTML documents, but the most popular ones are logos and navigation icons.

In all cases, it is important that images are small. On the Web, an image might be worth a thousand words, but it often costs more than a thousand bytes. Because of the extra delay in loading they introduce, only add images that are essential to the document, and make them as small as possible. And even for essential images, make sure the document still "works" if they are not loaded. Readers on the other end of a slow link will most likely not load any images at all. The ALT text is the most important tool to achieve this.

To make the documents render faster, include the WIDTH and HEIGHT attributes on the IMG element. These attributes indicate the width and height of the image, which a browser can use to draw a box of the appropriate size. It can then continue rendering the rest of the document, and load the image inside the box later. It also makes the calculation of table appearance easier.
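A sketch, assuming a hypothetical logo image that really is 120 by 60 pixels. The attribute values should match the image's actual size, since browsers scale the image to whatever WIDTH and HEIGHT say.

```html
<IMG SRC="/images/logo.gif" WIDTH="120" HEIGHT="60" ALT="The XYZ Company">
```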

If you want to provide large images, then use small thumbnails. They allow a user to get a preview of the image quickly. This is especially important if a document contains a lot of images, as in a gallery. If an image is essential to the text, then inline it, but also add a link to the image itself, so people who do not load images by default can still explicitly load this particular image easily.
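Two sketches with hypothetical file names and sizes: a thumbnail that links to the full-size picture, and an essential inline image with a separate link to the image itself.

```html
<!-- Gallery: small preview linked to the large image, with its size noted -->
<A HREF="harbour-large.jpg"><IMG SRC="harbour-small.jpg" WIDTH="100" HEIGHT="75"
ALT="View of the harbour"></A> (JPEG, 85KB)

<!-- Essential image: inlined, plus a direct link for readers with image loading off -->
<P><IMG SRC="wiring.gif" WIDTH="300" HEIGHT="200" ALT="[Wiring diagram]"><BR>
<A HREF="wiring.gif">Wiring diagram</A> (GIF, 12KB)</P>
```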

Following the notes on document size, keep the size of document and inline images under 60 kilobytes total whenever possible. It is strongly recommended that you put all images in a separate directory, so that all documents can refer to the same images. Since images are stored in the browser's cache, they can be re-loaded from there for every document after the first one.

Logos

A logo is often used instead of a header to make a better impression. It typically includes the name of the subject of the page and some graphical logo. As long as you keep the image small and use a sensible ALT text, this is perfectly acceptable. In addition to the notes on ALT text below, make sure the ALT text replaces the logo text: "Company logo" is hardly a replacement, "The XYZ Company" is. If you also put the image inside a header, the alternative text will stand out even when the document is viewed with image loading off.

Navigational icons

Icons are a good way to make it more obvious how to navigate the site. They stand out against the text, so the links are easier to spot. Of course this only works if the icons are used consistently on the site. Make sure all documents use the same sequence of icons, with the same text and leading to the same logical destination. For example, an icon labeled "Home" should always go to the document that serves as the home page for the current document. When the reader is viewing a document that appears in the toolbar, replace its icon with a non-linked version so the "look" stays the same.

Make sure the meaning of an icon is obvious, and add effective ALT texts. There are several "standard" icons for navigation. Use these as a basis for your icons to make it easier to identify the meaning of the icon. A magnifying glass, for example, is commonly used to indicate an interface to a local search engine.
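A sketch of a consistent toolbar. The icons live in one shared directory, as recommended above, so the browser can reuse them from its cache; the file names and sizes are hypothetical, with search.gif standing for a magnifying-glass icon. On the search page itself, the search icon is shown without a link.

```html
<!-- Toolbar on an ordinary page: every icon is a link -->
<A HREF="/index.html"><IMG SRC="/images/home.gif" WIDTH="32" HEIGHT="32" BORDER="0" ALT="[Home]"></A>
<A HREF="/search.html"><IMG SRC="/images/search.gif" WIDTH="32" HEIGHT="32" BORDER="0" ALT="[Search]"></A>

<!-- The same toolbar on the search page: same icons, but the current one is not linked -->
<A HREF="/index.html"><IMG SRC="/images/home.gif" WIDTH="32" HEIGHT="32" BORDER="0" ALT="[Home]"></A>
<IMG SRC="/images/search.gif" WIDTH="32" HEIGHT="32" BORDER="0" ALT="[Search]">
```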

For large navigational images, a good technique to reduce loading time when some information on the image changes is to split the image into several small images, one for each part. When loaded, the pieces are placed next to each other and should give a seamless view; when one part changes, only that small image has to be downloaded again, while the others can come from the browser's cache.
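A sketch of a hypothetical banner split into three parts, where only the middle part changes from month to month. The line breaks fall inside the tags rather than between them, because whitespace between inline images would show up as gaps; the widths add up to 420 pixels, within the limit suggested earlier.

```html
<IMG SRC="/images/banner-left.gif" WIDTH="140" HEIGHT="50" ALT="XYZ Company: "><IMG
SRC="/images/banner-news.gif" WIDTH="140" HEIGHT="50" ALT="New widgets this month"><IMG
SRC="/images/banner-right.gif" WIDTH="140" HEIGHT="50" ALT="">
```

When the news changes, only banner-news.gif needs to be replaced and downloaded again.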

Use a sensible ALT text

The ALT attribute on the IMG tag specifies a text to be displayed when the image itself isn't loaded. It should convey the image's meaning, not describe the image. In no case should it contain instructions on how to load images!
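Three variants of the same hypothetical warning icon; only the first ALT text replaces the image's meaning.

```html
<!-- Good: the ALT text reads naturally as part of the sentence -->
<P><IMG SRC="/images/warning.gif" WIDTH="32" HEIGHT="32" ALT="Warning:">
Do not unplug the device during an update.</P>

<!-- Avoid: describes the picture instead of its meaning -->
<P><IMG SRC="/images/warning.gif" WIDTH="32" HEIGHT="32" ALT="Yellow triangle">
Do not unplug the device during an update.</P>

<!-- Avoid: instructions on how to load images -->
<P><IMG SRC="/images/warning.gif" WIDTH="32" HEIGHT="32" ALT="Turn on image loading to see this icon">
Do not unplug the device during an update.</P>
```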

For various reasons, the ALT text is rather limited. It cannot be longer than 1024 characters, and it may not contain markup. This sometimes makes it hard to create an acceptable alternative text for the image. However, the IMG tag itself may be enclosed in HTML markup, and the ALT text then "inherits" this markup when displayed. Use the proper HTML tag to mark up the image as if it were text. For example, if the image serves as the main header for the document, use <H1><IMG SRC="logo.gif" ALT="The XYZ Company"></H1> in your document. This results in either the company's logo or the name displayed as a level-one header.

Alan Flavell has written an excellent discussion on choosing good ALT texts for images.

Don't write for one browser

Unless your documents are intended for a local network, you cannot know in advance which browsers your readers will be using. By writing for one browser, you limit your audience needlessly.

There is a difference between using browser-specific or experimental elements in your documents and writing HTML that produces garbage on every browser but the one you have in mind. If you use browser-specific elements, do not try to avoid problems by telling your reader to "Download NetXploder 4 NOW!", but fix your document so it still works if the browser-specific material is ignored.

If the browser-specific elements you use are essential to the document or cannot degrade gracefully on other browsers, then provide an alternative way to get at the information. For example, if you have a complex table in a document, you can make a screenshot of that table available separately.
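A shortened sketch with a hypothetical price table: the table itself, followed by an ordinary link to a screenshot of it for browsers that cannot render it.

```html
<TABLE BORDER="1">
<CAPTION>Widget prices</CAPTION>
<TR><TH>Model</TH><TH>Price</TH></TR>
<TR><TD>Standard</TD><TD>$10</TD></TR>
<TR><TD>Deluxe</TD><TD>$25</TD></TR>
</TABLE>
<P>The price table is also available as a <A HREF="prices.gif">screenshot</A> (GIF, 15KB).</P>
```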

Use the right tag for the job

The body of an HTML document contains the actual information. This information is contained in several block elements, each of which is marked up further with appropriate text-level elements. Block elements include headers, paragraphs and lists. Text-level elements include (for example) <EM>, <CITE> and <TT>.

The block elements can express the meaning of the document most clearly. Use the right one to describe what its contents are.

As an aside, if a particular element does not "work" the right way in a browser, fix it in the browser, not in the document. Other browsers may handle the element in the desired way, and the "fix" will only break it there. For example, don't skip header levels (from H1 to H3 directly, without H2 in between) just because a particular header looks ugly on your system. Reconfigure the browser instead. If your browser doesn't let you do that, get one that does.
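A sketch of the header hierarchy this implies, with hypothetical section names. If H3 looks too small on your system, change the browser's settings rather than the markup.

```html
<H1>Widget Reference</H1>
<H2>Installation</H2>
<H3>Requirements</H3>
<H3>Step-by-step instructions</H3>
<H2>Maintenance</H2>
<!-- Avoid: H1 followed directly by H3 just because H2 looks ugly on one system -->
```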

And bear in mind that the site may be "viewed" in unexpected ways, e.g. by indexing robots, character cell browsers, or speaking machines. If the text is marked up according to its structure, it will "work" even in situations that you are unfamiliar with; if the text is kludged for one particular browser, there is no way to predict what it will "look" like to others.

Current browsers will support most, if not all, elements from HTML 3.2, but also other, non-standard elements. These are usually experimental and not (or differently) supported in other browsers. Use these with care, and make sure the document still "works" if those elements were omitted.

Tables for layout

Currently, a popular technique to lay out documents in a specific way is to enclose the entire document in a table. Since the various cells can be marked up individually, it is now quite easy to line up various elements in ways that otherwise would not be possible.

There are several disadvantages to this technique. First of all, it usually results in a mess in browsers that do not support tables, unless special precautions are taken. Second, calculating the layout of a large table can only be done once the entire table and all images in it are loaded. This can significantly increase the rendering time of the document.

Third, an over-designed layout takes away one of the greatest advantages of HTML: the ability to adjust the layout to the browser's capabilities and the reader's choices. This is particularly a hazard when the table specifies explicit column widths in pixels or is used to force images to be laid out side-by-side. If the browser window is smaller than the table, the reader is forced to scroll horizontally to read it. This is one of the most annoying things to do.

Presentation vs structure

The text-level elements can roughly be divided into three groups. The presentation elements (such as B, I or TT) concern themselves only with the looks of the text they contain, whereas the structural elements (such as STRONG, EM or CITE) describe the meaning of that text. The third group contains elements like IMG and A, and is more action-oriented.

It is recommended that you use the structural elements whenever you can. They allow a browser to present the text with the most appropriate style available on the platform it is running on. If, for example, an italics version of the current font is not available, emphasized text can be displayed in a different color. This is only possible if the browser knows the text is emphasized. When the text is only indicated with the presentation-element I (italics), it can't know that. It could just as well be a citation or book title.

The presentation elements are most useful if the text must be displayed in that way by convention, for example the book title in a list of references.
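A short sketch of the difference, with hypothetical text and a hypothetical book title:

```html
<!-- Structural: the browser knows this is emphasis and the title of a work -->
<P>Do <EM>not</EM> open the casing; see <CITE>The Widget Handbook</CITE> instead.</P>

<!-- Presentational: italics, but the browser cannot tell emphasis from a title -->
<P>Do <I>not</I> open the casing; see <I>The Widget Handbook</I> instead.</P>

<!-- Where convention requires a specific look, such as a list of references -->
<P><I>The Widget Handbook</I>, 2nd edition, XYZ Company.</P>
```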

HTML 3.2 formally specifies the FONT element, which was previously a browser-specific one. Unless you are very careful when using it, it should be avoided. It gives the author a false impression of control over the appearance of the document. If you use it, bear in mind that in many browsing situations the font name, size or color indicated will not be honored, and the text will be displayed no differently than normal text. Never use it as a substitute for headers or for appropriate logical markup. Start with a document that's properly marked up as to its logical structure, and then, if absolutely necessary, use FONT tags as optional enhancements only.
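A sketch of FONT used only as an optional enhancement on top of proper structure; the text and color value are hypothetical. A browser that ignores FONT still shows a header and strongly emphasized text.

```html
<!-- Structure first: a real header and STRONG; FONT only adds color -->
<H2>Special offer</H2>
<P><STRONG><FONT COLOR="#CC0000">All widgets 20% off this week.</FONT></STRONG></P>

<!-- Avoid: FONT as a substitute for a header -->
<P><FONT SIZE="+2"><B>Special offer</B></FONT></P>
```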

For a better solution to presentation problems, use style sheets. These are separate documents which can be used to suggest a style for the presentation of an HTML document, or even an entire site.
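A minimal sketch of two ways to attach a style sheet; the file name /style/site.css and the rules are hypothetical. Browsers without style sheet support ignore both and fall back to their own rendering of the structural markup.

```html
<HEAD>
<TITLE>The XYZ Company -- Widget Reference</TITLE>
<!-- One shared style sheet for the entire site -->
<LINK REL="stylesheet" TYPE="text/css" HREF="/style/site.css">
<!-- Or rules for this document only; the comment hides them from older browsers -->
<STYLE TYPE="text/css">
<!--
H1 { color: #003366 }
EM { font-style: italic; color: #990000 }
-->
</STYLE>
</HEAD>
```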

Avoid online-only aspects

While the WWW is the most common destination for an HTML document, it's certainly not the only one. And even when it is on the WWW, the reader's way of accessing it may be nothing like yours. Fortunately, this is not important when the document is properly written in HTML. Because the language is structure-based, it allows for device independence: a document can be viewed on any platform, no matter how limited.

Printability is especially important for reference documents, but also for indices and home pages. People still prefer a hardcopy version of useful texts. If the text is full of explicit references to things that only work in the on-line version, the printed document becomes unusable.

A particularly well-known aspect of this is the "click here" syndrome. That is, hyperlinks with "click here" or just "here" as the anchor text. Not only does it assume that the user is using a mouse, it also draws the attention away from the surrounding text. The reader has to re-read the surrounding text to make sure he is selecting the right "here". A hyperlink anchor should be understandable on its own.

Similarly, things like "Information about X is available by following this link" or "Click here for technical details" are to be avoided. Embed the link in the actual text, in such a way that it is not required to follow the link to understand the text. Instead of the explicit text in the first example, just link the first mention of "X" to the explanatory document on "X". Someone who reads this document off-line will not be able to directly find out more about "X", but the document is still readable.
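A sketch of the rewrite described above, with HTML 3.2 standing in for "X" and a hypothetical html32.html as the explanatory document:

```html
<!-- Avoid: the anchor text means nothing on its own or on paper -->
<P>Information about HTML 3.2 is available by <A HREF="html32.html">following this link</A>.</P>
<P>For technical details, <A HREF="html32.html">click here</A>.</P>

<!-- Better: link the first mention of the subject itself -->
<P>This site uses only elements defined in <A HREF="html32.html">HTML 3.2</A>.</P>
```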

Don't bother with the mechanics

As said earlier, "click here" is a bad anchor text. Not just because it makes the text awkward to read when printed, but also because it assumes the user is using a mouse and doesn't know how to operate his browser. Unless the document in question is "My first steps with a graphical browser", the user does know how to navigate the Web.

The ability to include hyperlinks to almost every resource available on the Internet means that you don't have to include instructions anymore. Before the WWW was started, any document about Internet services had to contain some basic instructions on how to download files, use a mail server or connect to a particular computer. Now, all this can be done with a simple hyperlink, which hides all the technical details from the user.

So, when offering a file in a particular format, do not discuss where to download a program to view or play it, how to save the file to disk or how to decompress it: just add the link and make sure the server sends the right MIME type to identify the file.

Validate the document

To ensure that a document can be successfully read by any browser, make sure it adheres to the syntax rules for HTML. You don't need to do this by hand, that's what computers are for. Several tools exist to check HTML for syntax errors. See the WDG's list of validators for a complete overview.
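Most validators decide which rules to check against from the document type declaration on the first line, so it is worth including one. A minimal sketch of a complete HTML 3.2 document, with hypothetical title and text:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>The XYZ Company -- Widget Reference</TITLE>
</HEAD>
<BODY>
<H1>Widget Reference</H1>
<P>Every element used here is defined in HTML 3.2.</P>
</BODY>
</HTML>
```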

Don't assume that because the document renders as you expect in the browser you use, it is valid and will be displayed this way by all browsers. A browser is designed to fix bad HTML, and sometimes it may even be able to completely repair a syntax error so the result is what was intended. However, such fixes usually depend on the browser's parser, so a future release (with an updated parser) may repair the invalid HTML differently.

Test the document

Even when the document passes validation, it will not necessarily work as you intended on all platforms. Several programs are available to check documents for stylistic problems. Even a syntactically valid document can be hard to read because of things like forgotten alternative texts for images, unwanted whitespace, deeply nested lists or non-hierarchical use of headers.