This document answers questions asked frequently by web authors. While its focus is on HTML-related questions, this FAQ also answers some questions related to CSS, HTTP, JavaScript, server configuration, etc.
This document is maintained by Darin McGrew <darin@htmlhelp.com> of the Web Design Group, and is posted regularly to the newsgroup comp.infosystems.www.authoring.html. It was last updated on April 26, 2007.
The following questions have moved to another section of the FAQ.
Everyone has a different preference for which tool works best for them. Keep in mind that typically the less HTML the tool requires you to know, the worse the output of the HTML. In other words, you can always do it better by hand if you take the time to learn a little HTML.
Within the HTML example, first replace the "&" character with "&" everywhere it occurs. Then replace the "<" character with "<" and the ">" character with ">" in the same way.
Note that it may be appropriate to use the CODE and/or PRE elements when displaying HTML examples. Also, the next Q&A addresses the more general issue of representing arbitrary characters in HTML documents.
The previous Q&A addressed the special case of the less-than ('<'), greater-than ('>'), and ampersand ('&') characters. In general, the safest way to write HTML is in US-ASCII (ANSI X3.4, a 7-bit code), expressing characters from the upper half of the 8-bit code by using HTML entities.
Working with 8-bit characters can also be successful in many practical situations: Unix and MS-Windows (using Latin-1), and also Macs (with some reservations).
Latin-1 (ISO-8859-1) is intended for English, French, German, Spanish, Portuguese, and various other western European languages. (It is inadequate for many languages of central and eastern Europe and elsewhere, let alone for languages not written in the Roman alphabet.) On the Web, these are the only characters reliably supported. In particular, characters 128 through 159 as used in MS-Windows are not part of the ISO-8859-1 code set and will not be displayed as Windows users expect. These characters include the em dash, en dash, curly quotes, bullet, and trademark symbol; neither the actual character (the single byte) nor its &#nnn; decimal equivalent is correct in HTML. Also, ISO-8859-1 does not include the Euro currency character. (See the last paragraph of this answer for more about such characters.)
On platforms whose own character code isn't ISO-8859-1, such as MS-DOS and Mac OS, there may be problems: you have to use text transfer methods that convert between the platform's own code and ISO-8859-1 (e.g., Fetch for the Mac), or convert separately (e.g., GNU recode). Using 7-bit ASCII with entities avoids those problems, but this FAQ is too small to cover other possibilities in detail.
If you run a web server (httpd) on a platform whose own character code isn't ISO-8859-1, such as a Mac or an IBM mainframe, then it's the job of the server to convert text documents into ISO-8859-1 code when sending them to the network.
If you want to use characters not in ISO-8859-1, you must use HTML 4 or XHTML rather than HTML 3.2, choose an appropriate alternative character set (and for certain character sets, choose the encoding system too), and use one method or other of specifying this.
It is never wrong to quote attribute values, and many people recommend quoting all attribute values even when the quotation marks are technically optional. XHTML 1.0 requires all attribute values to be quoted. Like previous HTML specifications, HTML 4 allows attribute values to remain unquoted in many circumstances (e.g., when the value contains only letters and digits).
Be careful when your attribute value includes double quotes, for instance when you want ALT text like "the "King of Comedy" takes a bow" for an image. Humans can parse that to know where the quoted material ends, but browsers can't. You have to code the attribute value specially so that the first interior quote doesn't terminate the value prematurely. There are two main techniques:
Both these methods are correct according to the specification and are supported by current browsers, but both were poorly supported in some earlier browsers. The only truly safe advice is to rewrite the text so that the attribute value need not contain quotes, or to change the interior double quotes to single quotes, like this: ALT="the 'King of Comedy' takes a bow".
Technically, since HTML is an SGML application, HTML uses SGML comment syntax. However, the full syntax is complex, and browsers don't support it in its entirety anyway. Therefore, use the following simplified rule to create HTML comments that both have valid syntax and work in browsers:
An HTML comment begins with "
<!--
", ends with "-->
", and does not contain "--
" or ">
" anywhere in the comment.
The following are examples of HTML comments:
<!-- This is a comment. -->
<!-- This is another comment,
and it continues onto a second line. -->
<!---->
Do not put comments inside tags (i.e., between "<
"
and ">
") in HTML markup.
The URL structure defines a hierarchy (or relationship) that is similar to the hierarchy of subdirectories (or folders) in the filesystems used by most computer operating systems. The segments of a URL are separated by slash characters ("/"). When navigating the URL hierarchy, the final segment of the URL (i.e., everything after the final slash) is similar to a file in a filesystem. The other segments of the URL are similar to the subdirectories and folders in a filesystem.
A relative URL omits some of the information needed to locate the referenced document. The omitted information is assumed to be the same as for the base document that contains the relative URL. This reduces the length of the URLs needed to refer to related documents, and allows document trees to be accessed via multiple access schemes (e.g., "file", "http", and "ftp") or to be moved without changing any of the embedded URLs in those documents.
Before the browser can use a relative URL, it must resolve the relative URL to produce an absolute URL. If the relative URL begins with a double slash (e.g., //www.htmlhelp.com/faq/html/), then it will inherit only the base URL's scheme. If the relative URL begins with a single slash (e.g., /faq/html/), then it will inherit the base URL's scheme and network location.
If the relative URL does not begin with a slash (e.g., all.html , ./all.html or ../html/), then it has a relative path and is resolved as follows.
Some examples may help make this clear. If the base document is <URL:http://www.htmlhelp.com/faq/html/basics.html>, then
Please note that the browser resolves relative URLs, not the server. The server sees only the resulting absolute URL. Also, relative URLs navigate the URL hierarchy. The relationship (if any) between the URL hierarchy and the server's filesystem hierarchy is irrelevant.
The URL structure defines a hierarchy similar to a filesystem's hierarchy of subdirectories or folders. The segments of a URL are separated by slash characters ("/"). When navigating the URL hierarchy, the final segment of the URL (i.e., everything after the final slash) is similar to a file in a filesystem. The other segments of the URL are similar to the subdirectories and folders in a filesystem.
When resolving relative URLs (see the answer to the previous question), the browser's first step is to strip everything after the last slash in the URL of the current document. If the current document's URL ends with a slash, then the final segment (the "file") of the URL is null. If you remove the final slash, then the final segment of the URL is no longer null; it is whatever follows the final remaining slash in the URL. Removing the slash changes the URL; the modified URL refers to a different document and relative URLs will resolve differently.
For example, the final segment of the URL http://www.htmlhelp.com/faq/html/ is empty; there is nothing after the final slash. In this document, the relative URL all.html resolves to http://www.htmlhelp.com/faq/html/all.html (an existing document). If the final slash is omitted, then the final segment of the modified URL http://www.htmlhelp.com/faq/html is "html". In this (nonexistent) document, the relative URL all.html would resolve to http://www.htmlhelp.com/faq/all.html (another nonexistent document).
When they receive a request that is missing its final slash, web servers cannot ignore the missing slash and just send the document anyway. Doing so would break any relative URLs in the document. Normally, servers are configured to send a redirection message when they receive such a request. In response to the redirection message, the browser requests the correct URL, and then the server sends the requested document. (By the way, the browser does not and cannot correct the URL on its own; only the server can determine whether the URL is missing its final slash.)
This error-correction process means that URLs without their final slash will still work. However, this process wastes time and network resources. If you include the final slash when it is appropriate, then browsers won't need to send a second request to the server.
The exception is when you refer to a URL with just a hostname (e.g., http://www.htmlhelp.com). In this case, the browser will assume that you want the main index ("/") from the server, and you do not have to include the final slash. However, many regard it as good style to include it anyway.
HTML validators check HTML documents against a formal definition of HTML syntax and then output a list of errors. Validation is important to give the best chance of correctness on unknown browsers (both existing browsers that you haven't seen and future browsers that haven't been written yet).
HTML checkers (linters) are also useful. These programs check documents for specific problems, including some caused by invalid markup and others caused by common browser bugs. Checkers may pass some invalid documents, and they may fail some valid ones.
All validators are functionally equivalent; while their reporting styles may vary, they will find the same errors given identical input. Different checkers are programmed to look for different problems, so their reports will vary significantly from each other. Also, some programs that are called validators (e.g. the "CSE HTML Validator") are really linters/checkers. They are still useful, but they should not be confused with real HTML validators.
When checking a site for errors for the first time, it is often useful to identify common problems that occur repeatedly in your markup. Fix these problems everywhere they occur (with an automated process if possible), and then go back to identify and fix the remaining problems.
Link checkers follow all the links on a site and report which ones are no longer functioning. CSS checkers report problems with CSS style sheets.
According to HTML standards, each HTML document begins with a DOCTYPE declaration that specifies which version of HTML the document uses. Originally, the DOCTYPE declaration was used only by SGML-based tools like HTML validators, which needed to determine which version of HTML a document used (or claimed to use).
Today, many browsers use the document's DOCTYPE declaration to determine whether to use a stricter, more standards-oriented layout mode, or to use a "quirks" layout mode that attempts to emulate older, buggy browsers.