This document answers questions asked frequently by web authors. While its focus is on HTML-related questions, this FAQ also answers some questions related to CSS, HTTP, JavaScript, server configuration, etc.
This document is maintained by Darin McGrew <darin@htmlhelp.com> of the Web Design Group, and is posted regularly to the newsgroup comp.infosystems.www.authoring.html. It was last updated on April 26, 2007.
The following questions have moved to another section of the FAQ.
HTML itself offers no way to seamlessly incorporate the content of one file into another.
True dynamic inclusion of one HTML document (even in a different "charset") into another is offered by the OBJECT element, but due to shortcomings of browser versions in current use, it seems unwise to rely on this yet for essential content. The same can be said for IFRAME.
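As a rough illustration of the OBJECT and IFRAME approaches, the fragment below embeds a second document and supplies a fallback link for browsers that do not render it. The file name included.html and the dimensions are hypothetical placeholders, not values from this FAQ.

```html
<!-- included.html is a hypothetical file name -->
<object data="included.html" type="text/html" width="400" height="200">
  <!-- Fallback content for browsers that cannot render the object -->
  <a href="included.html">View the included document</a>
</object>

<iframe src="included.html" width="400" height="200">
  <!-- Fallback content for browsers without IFRAME support -->
  <a href="included.html">View the included document</a>
</iframe>
```

In both cases the included document appears in its own rectangular region rather than flowing into the surrounding text, which is one reason these elements are not a seamless substitute for the techniques described next.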
Two popular ways of including the contents of one file seamlessly into another for the WWW are preprocessing and server-side inclusion. A preprocessor converts its source into a plain HTML document that you publish on your server. In contrast, documents that use server-side inclusion are processed every time the document is retrieved from the server.
Preprocessing techniques include the C preprocessor and other generic text manipulation methods, and several HTML-specific processors.
Beware of making your "source code" non-portable. Also, the HTML can only be validated after preprocessing, so the typical cycle "Edit, Check, Upload" becomes "Edit, Preprocess, Check, Upload" (here, "Check" includes whatever steps you use to preview your pages: validation, linting, management walk-through etc.; and "upload" means whatever you do to finally publish your new pages to the web server).
A much more powerful and versatile preprocessing technique is to use an SGML processor (such as the SP package) to generate your HTML; this can be self-validating.
Examples of server-side inclusion are Server Side Includes (SSI, supported by Apache, NCSA, and other web servers), and Microsoft's Active Server Pages (ASP, supported by MS IIS). Processing occurs at the time the documents are actually retrieved. A typical inclusion looks like
<!--#include virtual="/urlpath/to/myfile.htm" -->
However, be sure to consult your own server's documentation, as the details vary somewhat between implementations. The whole directive gets replaced by the contents of the specified file.
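For instance, a page that shares a header and footer with the rest of a site might look like the sketch below. The paths /includes/header.htm and /includes/footer.htm are hypothetical, and the directive syntax is the same form as in the example above; check whether your server uses a different form.

```html
<html>
<head><title>Example page</title></head>
<body>
<!-- Each directive is replaced by the contents of the named file -->
<!--#include virtual="/includes/header.htm" -->
<p>Page-specific content goes here.</p>
<!--#include virtual="/includes/footer.htm" -->
</body>
</html>
```

Because the directive is replaced by the file's contents verbatim, the included files in this sketch should hold only fragments that are valid inside BODY, not complete HTML documents.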
Using server-side inclusion (a potentially powerful tool) merely as a way to insert static files such as standard header/footers has implications for perceived access speed and for server load, and is better avoided on heavily loaded servers. If you use it in this way, consider making the result cacheable (e.g., via "XBitHack full" on Apache; setting properties of the "Response" object in ASP).
Proper HTML validation of server-side inclusion is only possible after server-side processing is done (e.g. by using an on-line validator that retrieves the document from the server).
Another approach is to create a database-backed site, as described in Philip and Alex's Guide to Web Publishing. A simple change to the database template instantly changes the whole site.
Finally, note that if the included file contains arbitrary plain text, then some provision must be made to convert the characters "&" and "<" (in the plain text file) to the entities "&amp;" and "&lt;" (in the HTML document).
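To illustrate, suppose a plain text file contains the made-up line shown in the comment below. Once it is included in an HTML document, "&" has to appear as "&amp;" and "<" as "&lt;":

```html
<!-- Original plain text: if a < b, write to AT&T -->
<pre>if a &lt; b, write to AT&amp;T</pre>
```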
Browsers don't allow web authors to download and run arbitrary programs on the client system, because that would be an unacceptable security risk. Users would be unable to visit untrusted web sites safely.
You can link to an executable program file, allowing users to download it. Users could then choose to run the program, assuming that it runs on their operating systems, and that they are not concerned about software viruses.
If you want to run the program on your web server, then check your server documentation for configuration details for server-side programs.
When client-side scripting (e.g., JavaScript) is enabled on the client system (browser), it can be used to perform computations and to manipulate the data on and appearance of a web page.
If you want to launch a specialized viewer for a particular kind of file, say Adobe Acrobat Reader when the visitor follows a link to a PDF file, then that should be handled automatically by the visitor's browser, assuming that it is configured correctly. You need only configure your server to send the file with the correct MIME type.
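In HTML, characters can be represented in three ways: as a coded character in the document's character encoding, as a numeric character reference (&#number;), or as a character entity reference (&entityname;).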
In theory these representations are equally valid. In practice, authoring convenience and limited support by browsers complicate the issue.
HTTP being a guaranteed "8-bit clean" protocol, you can safely send out 8-bit or multibyte coded characters, in the various codings that are supported by browsers.
By now there seems no convincing reason to choose &entityname; versus &#number;, so use whichever is convenient.
If you can confidently handle 8-bit-coded characters this is fine too, probably preferred for writing heavily-accented languages. Take care if authoring on non-ISO-8859-based platforms such as Mac, Psion, IBM mainframes etc., that your upload technique delivers a correctly coded document to the server. Using &-representations avoids such problems.
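As an illustration, the letter é can be written in each of the forms discussed here. The ISO-8859-1 charset declaration is an assumption made for this example; it has to match how the file is actually saved and served.

```html
<html>
<head>
<!-- Assumed encoding for this example -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Three ways to write the same character</title>
</head>
<body>
<p>caf&eacute;</p> <!-- character entity reference -->
<p>caf&#233;</p>   <!-- numeric character reference (decimal 233) -->
<p>café</p>        <!-- coded character, the single byte 0xE9 in ISO-8859-1 -->
</body>
</html>
```

The first two paragraphs contain only ASCII characters, so they survive an upload that mangles 8-bit data; the third depends on the file reaching the server in the declared encoding.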
In such codings as ISO-8859-7 Greek, koi8-r Russian Cyrillic, and Chinese, Japanese and Korean (CJK) codings, use of coded characters is the most widely supported and used technique.
Although not covered by HTML 3.2, browsers have supported this quite widely for some time now; it is a valid option within the HTML 4 specifications--use a validator such as the WDG HTML Validator or the W3C HTML Validation Service which supports HTML 4 and understands different character encodings.
Browser support for coded characters may depend on configuration and font resources. In some cases, additional programs called "helpers" or "add-ins" supply virtual fonts to browsers.
"Add-in" programs have in the past been used to support numeric references to 15-bit or 16-bit code protocols such as Chinese Big5 or Chinese GB2312.
In theory you should be able to include not only coded characters but also Unicode numeric character references, but browser support is generally poor. Numeric references to the "charset-specified" encoding may appear to produce the desired characters on some browsers, but this is wrong behavior and should not be used. Character entities are also problematical, aside from the HTML-significant characters <, & etc.
Recent versions of the popular browsers have support for some of these features, but at time of writing it seems unwise to rely on this when authoring for a general audience.
In HTML, element and attribute names in tags are case insensitive, so it doesn't matter. The choice is just a matter of style and personal preference. (You may have noticed that this FAQ is not absolutely consistent in capitalization.) Many people prefer upper case, which makes the tags "stand out" more amongst the text, and makes it easier to find element or attribute names using an editor's search function. Others prefer lower case, which makes the tags "stand out" less amongst the text, and which is required by XHTML.
Note that some attribute values are case sensitive.
For example, <OL TYPE="A"> and <ol type="A"> are the same, but <ol type="a"> is different from either of them. (For clearer communication, it's worth getting the terminology right. In this example, OL is the element, TYPE is the attribute name, and A or a is the attribute value. The tag is <OL TYPE="A">.)
Entity names like &nbsp; are sometimes incorrectly referred to as tags. They are all case sensitive. For example, &Eacute; and &eacute; are two different and valid entities, and while &nbsp; is a valid entity, &NBSP; is invalid.
Note that XHTML requires all element and attribute names to be in lower case.
HTML does not depend on screen size. Normally, the text will be wrapped by the browser when the end of its display area is encountered. (Note that graphical browsers are often used with windows that are smaller than the full area of the screen.)
Preformatted lines (text within <PRE> elements) should only ever exceed 70 characters if the nature of the content makes it unavoidable. Longer lines will cause ugly line breaks on text-mode browsers, and will force horizontal scrolling on graphical browsers. Readers strongly dislike horizontal scrolling, except where they can realise that the nature of the content made it inevitable.
Images cannot be wrapped, so you have to be careful with them. It seems that 600 pixels is a reasonable width; anything wider will mean a certain fraction of users will have to scroll to see the rightmost bit. This fraction increases with your image width. (Keep in mind that not everyone uses full-screen browser windows!)
MSN TV (formerly WebTV) users have no ability to scroll horizontally, so anything beyond 544 pixels will be compressed by their browser. Other devices (especially portable devices) are even more limited.
The use of tables for layout, especially when fixed-width cells are used, is the most common single factor that prevents pages from adapting to various window widths.
There are several possibilities.
First, you may have incorrect HTML or CSS syntax. Browsers vary in their ability to guess what you meant, and different browsers recover differently from syntax errors.
Second, you may have valid HTML and CSS that different browsers interpret differently. For example, the CSS specifications allow conforming browsers to ignore certain properties and property values. Also, it is not clear from the specifications what should be done with a string of &nbsp; characters. Some browsers will collapse them for rendering as a single space; others will render one space per &nbsp;.
Third, your server may be sending incorrect MIME types for some of your files. Internet Explorer incorrectly ignores server-provided MIME types, so it sometimes "does the right thing" when the server is misconfigured. Other browsers correctly heed the server-provided MIME types, so they will reveal server misconfigurations. This includes external style sheets, which should be sent as "text/css".
Fourth, you may have encountered a browser bug. For example, many common browsers handle CSS better when HTML documents include optional closing tags like </p>, </li>, and </td>.
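A small illustration of markup with the optional closing tags written out explicitly (the content is placeholder text):

```html
<p>First paragraph.</p>
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<table>
  <tr><td>Cell one</td><td>Cell two</td></tr>
</table>
```

HTML 4 would also accept this markup without the closing P, LI, TD, and TR tags, but writing them out leaves less for the browser to infer.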
Another possibility is different user option settings in the browsers.
If Microsoft Internet Explorer displays your document normally, but other browsers display your plain HTML source, then most likely your web server is sending the document with the MIME type "text/plain". Your web server needs to be configured to send that filename with the MIME type "text/html". Often, using the filename extension ".html" or ".htm" is all that is necessary.
If you are seeing this behavior while viewing your HTML documents on your local Windows filesystem, then your text editor may have added a ".txt" filename extension automatically. You should rename filename.html.txt to filename.html so that Windows will treat the file as an HTML document.
This is a "feature" of using frames: The browser displays the URL of the frameset document, rather than that of the framed documents.
However, this behavior can be circumvented easily by the user. Many browsers allow the user to open links in their own windows, to bookmark the document in a specific frame (rather than the frameset document), or to bookmark links. Thus, there is no reliable way to stop a user from getting the URL of a specific document.
Furthermore, preventing users from bookmarking specific documents can only antagonize them. A bookmark or link that doesn't find the desired document is useless, and probably will be ignored or deleted.
A common way to do this is to use a two-column table with your links in the left column and your content in the right column. This is often combined with a background image that creates a colored strip on the left behind the links. The background image can tile vertically, but to avoid horizontal tiling the image should be extremely wide (e.g., 1600 pixels).
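A minimal sketch of this two-column arrangement follows. The image name strip.gif, the 150-pixel column width, and the cell padding are hypothetical values chosen for illustration.

```html
<!-- strip.gif: a hypothetical, very wide image with a colored band on its left edge -->
<body background="strip.gif">
<table width="100%" cellspacing="0" cellpadding="8">
<tr>
  <td valign="top" width="150"><!-- links go here --></td>
  <td valign="top"><!-- content goes here --></td>
</tr>
</table>
</body>
```

The width of the left cell should roughly match the colored band in the background image so that the links sit on top of it.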
A variation of this technique (which minimizes some of the problems with using tables for layout) uses a single-cell table with ALIGN="left". Only the links go inside the table, which floats to the left. The document's content wraps to fill the space remaining to the right of and below the table. Here is an example:
<table align="left">
<tr><td><!-- links go here --></td></tr>
</table>
<!-- content goes here -->
Layout tables can be avoided entirely by using CSS. The navigation links and the page's main content are placed inside separate DIV elements, and then CSS is used to position these DIV elements relative to each other. The style sheet can use a smaller background image that repeats vertically and is aligned along the left, for example:
body { color: black;
background: white url(foo.gif) repeat-y left }
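Put together, one possible version of the DIV-based layout might look like the sketch below. The id names nav and content and the em widths are assumptions for illustration; floating the navigation is only one of several ways to position the two DIV elements.

```html
<html>
<head>
<title>Example layout</title>
<style type="text/css">
  body { color: black;
         background: white url(foo.gif) repeat-y left }
  /* The id names and widths are illustrative assumptions */
  #nav     { float: left; width: 9em }
  #content { margin-left: 10em }
</style>
</head>
<body>
<div id="nav"><!-- links go here --></div>
<div id="content"><!-- content goes here --></div>
</body>
</html>
```

With style sheets disabled, this document still reads sensibly: the links simply appear above the content.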
Finally, a navigation strip on the left can be implemented with frames. However, frames introduce problems that are best avoided if possible.
If you want others to view your web page with specific colors, the most appropriate way is to suggest the colors with a style sheet. Cascading Style Sheets use the color and background-color properties to specify text and background colors. To avoid conflicts between the reader's default colors and those suggested by the author, these two properties should always be used together.
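A minimal sketch of such a style sheet, pairing color with background-color in every rule. The specific color values are arbitrary choices for illustration.

```html
<style type="text/css">
  /* Each rule sets both properties so the reader's defaults cannot clash with them */
  body      { color: #000000; background-color: #ffffff }
  a:link    { color: #0000ff; background-color: #ffffff }
  a:visited { color: #800080; background-color: #ffffff }
  a:active  { color: #000080; background-color: #ffffff }
</style>
```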
With HTML, you can suggest colors with the TEXT, LINK, VLINK (visited link), ALINK (active link), and BGCOLOR (background color) attributes of the BODY element.
Note that these attributes are deprecated by HTML 4. Also, if one of these attributes is used, then all of them should be used to ensure that the reader's default colors do not interfere with those suggested by the author. Here is an example:
<body bgcolor="#ffffff" text="#000000" link="#0000ff"
vlink="#800080" alink="#000080">
Authors should not rely on the specified colors since browsers allow their users to override document-specified colors.
The most appropriate way is to use suitable structural markup, and to suggest the desired color with a style sheet. If you want to specify a color for only certain cases of an element, then you can use a class to specify which cases are special. The following CSS example specifies that emphasized text with the class "special" should be green (on a white background):
em.special { color: green; background: white; }
When displayed according to this CSS ruleset, the emphasized text in the following HTML example will be displayed in green:
normal text <em class="special">emphasized text</em>
normal text
With HTML, the FONT element can also be used to suggest colors. Note that the FONT element is deprecated by HTML 4. Also, use of the FONT element brings numerous usability and accessibility problems.
In Internet Explorer 5.5, Microsoft introduced proprietary CSS properties for scrollbar colors. Since then, other browsers (e.g., KDE Konqueror, Opera) have added support for these properties. These properties are: scrollbar-3dlight-color, scrollbar-arrow-color, scrollbar-base-color, scrollbar-darkshadow-color, scrollbar-face-color, scrollbar-highlight-color, and scrollbar-shadow-color.
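A short sketch of how some of these properties might be used. The color values are arbitrary, and applying the rule to both html and body is an assumption intended to cover browsers that differ in which element they consult.

```html
<style type="text/css">
  /* Proprietary properties: browsers that do not recognize them ignore these declarations */
  html, body {
    scrollbar-face-color: #336699;
    scrollbar-arrow-color: #ffffff;
    scrollbar-shadow-color: #003366;
  }
</style>
```

Since these properties are not part of the CSS specifications, a style sheet that uses them will be flagged by a CSS validator.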
If you want others to view your web page with a specific font, the most appropriate way is to suggest the font rendering with a style sheet. Cascading Style Sheets use the font-family property to specify font faces.
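For example, a style sheet can list several candidate fonts in order of preference and end with a generic family as a last resort. The font names below are illustrative choices, not recommendations.

```html
<style type="text/css">
  /* The browser uses the first listed font that is available on the reader's system */
  body { font-family: Verdana, Arial, Helvetica, sans-serif }
  code { font-family: "Courier New", Courier, monospace }
</style>
```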
With HTML, the BASEFONT element can be used to suggest specific fonts for the entire document.
With HTML, the FONT element can also be used to suggest specific fonts. The FONT element must be repeated inside every block-level element, since it can contain only inline (text-level) elements. Use of the FONT element brings numerous usability and accessibility problems.
Whether specifying fonts with CSS or with HTML, authors run the risk that a reader's system has a font by the same name but which is significantly different. For example, "Chicago" can be a nice text font, a display font with letters formed by "bullet holes", or a novelty font containing images of city buildings (for creating skylines).
Also, authors must either use fonts (or groups of similar fonts) that are commonly available on many systems, or provide dynamic fonts for their readers. Readers who do not have the specified font(s) installed, or who do not download the dynamic fonts provided by the author, will see a default font. Some browsers may use a less legible substitute font than their normal default font in cases where author-specified fonts are not found.
Internet Explorer supports embedded fonts with Microsoft's Web Embedding Fonts Tool (WEFT).