Next section: HTML and SGML
Up to main index
[see parent document for copyright information]
An HTML validator is the HTML equivalent of lint for C. It compares your HTML document to the defined
syntax of HTML and reports any discrepancies.
Be conservative in what you produce; be liberal in what you accept.
Browsers follow the second half of this maxim by accepting Web pages and trying to display them even if they're not legal HTML. Usually this means that the browser will try to make educated guesses about what you probably meant. The problem is that different browsers (or even different versions of the same browser) will make different guesses about the same illegal construct; worse, if your HTML is really pathological, the browser could get hopelessly confused and produce a mangled mess, or even crash.
That's why you want to follow the first half of the maxim by making sure your pages are legal HTML. The best way to do that is by running your documents through one or more HTML validators.
Weblint: http://www.cre.canon.co.uk/~neilb/weblint/
HTMLChek: http://uts.cc.utexas.edu/~churchh/htmlchek.html
WebTechs (née HALSoft): http://www.webtechs.com/html-val-svc/
Weblint and HTMLChek are heuristic validators --- that is, they
do not completely parse your HTML markup, but simply scan it looking for
errors. The advantage of this is that they can detect constructs that
are legal HTML but considered "bad style", such as an
<IMG> tag without an ALT attribute; the disadvantage
is that they can fail to detect some errors. WebTechs is similar to
KGV; both operate directly from the HTML language definition, and both
strictly obey the rules of SGML. If your
document passes one of these validators, you know it's clean.
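To make the "bad style" check concrete, here is a minimal sketch of one such test: flagging <IMG> tags that lack an ALT attribute. It uses Python's standard html.parser module; the names ImgAltChecker and check_img_alt are made up for this illustration, and this is not how Weblint or HTMLChek are actually implemented.

```python
from html.parser import HTMLParser


class ImgAltChecker(HTMLParser):
    """Collect (line, column) positions of <IMG> tags with no ALT attribute."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        if tag == "img" and "alt" not in dict(attrs):
            self.warnings.append(self.getpos())


def check_img_alt(html_text):
    """Return the positions of all <IMG> tags that have no ALT attribute."""
    checker = ImgAltChecker()
    checker.feed(html_text)
    checker.close()
    return checker.warnings
```

Like the heuristic validators described above, this looks at one construct at a time and never consults the HTML DTD, so it will happily accept a document that is illegal in other ways.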
I recommend using a combination of validators: one of Weblint or HTMLChek, and one of WebTechs or KGV. Each has features that the others don't, and they complement each other nicely.
KGV is based on the nsgmls SGML
parser. The Validator itself is a CGI script that fetches your URL,
passes it through nsgmls, and post-processes the resulting
error list for easier reading.
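The post-processing step might look something like the following sketch. It assumes that each diagnostic arrives as one line of the form nsgmls:file:line:column:severity: text; that layout, and the names MESSAGE_RE and summarize, are assumptions made for this illustration rather than a description of KGV's actual script.

```python
import re

# Assumed message layout: "nsgmls:<file>:<line>:<col>:<severity>: <text>"
MESSAGE_RE = re.compile(
    r"^nsgmls:(?P<file>.+?):(?P<line>\d+):(?P<col>\d+):(?P<sev>[A-Z]):\s*(?P<text>.*)$"
)


def summarize(raw_output):
    """Turn raw parser diagnostics into short, readable messages."""
    report = []
    for raw in raw_output.splitlines():
        match = MESSAGE_RE.match(raw.strip())
        if match is None:
            continue  # skip anything that does not look like a diagnostic
        report.append(
            "Line {line}, column {col}: {text}".format(**match.groupdict())
        )
    return report
```

Fetching the URL and running the parser are left out; the sketch only covers turning the raw error list into something easier to read.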
In addition to validating with nsgmls, KGV can perform
other checks and transformations on your document:
KGV can use the <Hn> elements in your document to construct an
outline or "table of contents" of the document. If you've used the
<Hn> elements for something other than section
headers (for instance, to get fake font changes), this outline will
probably come out very strange.
KGV can also use nsgmls to construct a
neatly formatted parse tree of the HTML markup in your document. The
exact format of the resulting parse tree is undergoing revision.
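The outline feature described above can be approximated in a few lines. This is only a sketch using Python's standard html.parser; OutlineBuilder and build_outline are hypothetical names, and KGV's own outline code may work quite differently.

```python
from html.parser import HTMLParser


class OutlineBuilder(HTMLParser):
    """Record the level and text of every H1..H6 element."""

    def __init__(self):
        super().__init__()
        self.outline = []    # list of (level, heading text)
        self._level = None   # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == "h%d" % self._level:
            # Collapse runs of whitespace inside the heading text.
            text = " ".join("".join(self._text).split())
            self.outline.append((self._level, text))
            self._level = None


def build_outline(html_text):
    """Return one indented line per heading, two spaces per level."""
    builder = OutlineBuilder()
    builder.feed(html_text)
    builder.close()
    return ["  " * (level - 1) + text for level, text in builder.outline]
```

Feeding this a page that jumps from H1 straight to H3 for a font effect shows the problem at once: the H3 text appears indented two levels deep with nothing above it.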
You may notice that the parse tree contains HTML tags which don't
appear in your document --- end tags for P and
LI elements, for instance, or start and end tags for the
HTML, HEAD and BODY elements.
This is a reflection of how SGML parsers
operate. The tags in question are marked as optional in the HTML DTD; the author may therefore omit these tags,
and the parser will infer their presence at the appropriate place. For
instance, after a <LI> start tag, the parser will
infer a </LI> end tag just before the first
subsequent tag that cannot appear in a LI element ---
typically the </UL> end tag or the start tag of the
next LI element.
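The inference rule for LI can be illustrated with a small sketch. It is not an SGML parser: a real parser derives this behaviour from the content models in the DTD, whereas the code below hard-wires the one rule for list items. The names ImpliedLiCloser and explicit_tags are invented for the example.

```python
from html.parser import HTMLParser


class ImpliedLiCloser(HTMLParser):
    """List the tags of a document with omitted </LI> end tags made explicit."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._lists = []  # one flag per open list: is an LI currently open?

    def _infer_li_end(self):
        if self._lists[-1]:
            self.events.append("</LI> (inferred)")
            self._lists[-1] = False

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._lists.append(False)
        elif tag == "li" and self._lists:
            self._infer_li_end()      # a new LI ends the previous one
            self._lists[-1] = True
        self.events.append("<%s>" % tag.upper())

    def handle_endtag(self, tag):
        if tag == "li" and self._lists:
            self._lists[-1] = False   # the author closed it explicitly
        elif tag in ("ul", "ol") and self._lists:
            self._infer_li_end()      # the end of the list ends any open LI
            self._lists.pop()
        self.events.append("</%s>" % tag.upper())


def explicit_tags(html_text):
    """Return the tag sequence, including inferred </LI> end tags."""
    closer = ImpliedLiCloser()
    closer.feed(html_text)
    closer.close()
    return closer.events
```

For the input <UL><LI>one<LI>two</UL>, the result contains two inferred </LI> entries: one just before the second <LI> and one just before </UL>, matching the two cases mentioned above.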
You should add a DOCTYPE declaration to your
document; see the section on
DOCTYPE for more information. If your document does
not have a DOCTYPE declaration, KGV will assume an HTML
2.0 document type (and will tell you it is doing so).
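The fallback behaviour can be sketched as follows. The function name document_type and the regular expression are assumptions made for illustration; the sketch only recognises DOCTYPE declarations that carry a quoted public identifier, and it falls back to the HTML 2.0 public identifier when none is found.

```python
import re

# Public identifier of the HTML 2.0 DTD, used as the fallback.
HTML_2_0 = "-//IETF//DTD HTML 2.0//EN"

# Matches: <!DOCTYPE HTML PUBLIC "...public identifier...">
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+HTML\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE
)


def document_type(html_text):
    """Return (public_identifier, was_assumed) for a document."""
    match = DOCTYPE_RE.search(html_text)
    if match:
        return match.group(1), False
    # No usable DOCTYPE declaration: assume HTML 2.0 and say so.
    return HTML_2_0, True
```

Returning the was_assumed flag alongside the identifier lets the caller tell the user that a default was applied, as KGV does.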
The source code for KGV is available at http://validator.w3.org/source/.
Getting a large number of errors? Have you checked your DOCTYPE
declaration (or lack thereof)? Make sure your document has a
syntactically correct DOCTYPE declaration, as described in section 2, and make sure it correctly
identifies the type of HTML you're using. Then run it through KGV
again; if you're lucky, you should get a lot fewer errors.
If this doesn't help, then you may be experiencing a cascade failure --- one error that gets KGV so confused that it can't make sense of the rest of your page. Try correcting the first few errors and running your page through KGV again.
The BORDER attribute is used both as a flag (i.e. <TABLE BORDER> to indicate
the presence of a border) and as a width specifier (i.e. <TABLE
BORDER=5> to specify a five-pixel border). There's no way to
write a DTD that can encompass both of these uses; WebTechs' Mozilla DTD,
which KGV uses, allows the latter usage and not the former.
The WIDTH attribute can be both absolute (i.e. <TABLE WIDTH=50> specifies
a width of 50 pixels) and relative (i.e. <TABLE
WIDTH=50%> specifies half the page width). KGV appears to
have fixed this part of the WebTechs Mozilla DTD; if you put the
attribute value in quotes, as noted above, this should come out fine.
Last update 08 Aug 98
dsb@killerbunnies.org