There are generally two files that are processed by an XML-compliant application to display XML content: the XML document and a style sheet (discussed in this section). In addition, some documents also use a Document Type Definition (DTD) that defines each tag allowed in the document along with its attributes and rules for use. The XML client can use the DTD to "decode" the markup and check it for accuracy. DTDs are discussed later in this chapter.
The software that interprets the information in XML documents is called a parser. Both Microsoft and Netscape have built XML parsers into the latest versions of their browsers (Internet Explorer 5.5 and higher and Netscape 6).
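To make the parser's role concrete, here is a minimal sketch that uses the XML parser in Python's standard library (xml.etree.ElementTree) rather than one built into a browser. The sample markup and the is_well_formed helper are invented for illustration. The sketch only shows that a parser accepts markup whose tags match and rejects markup whose tags do not.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the parser can read the markup, False if it rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents.
good = "<article><title>XML Basics</title></article>"
bad = "<article><title>XML Basics</article>"  # <title> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

A browser's built-in parser does the same job before any style sheet is applied: if the markup cannot be parsed, there is no document to display.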
As we saw at the beginning of this chapter, an XML document contains marked-up content and looks similar to HTML in that regard. While you may think of a document as containing paragraphs and headings, XML documents can contain a wide variety of content. An XML document might be text-based, like a magazine article, or it could contain only numerical data to be transferred from one database or application to another. An XML document might also contain an abstract structure such as a particular vector graphic shape or a mathematical equation.
It is important to note that an XML document is not limited to one physical file but may be made up of content from multiple files. Markup is used to integrate the contents of different files to form the logical structure of a single XML document.
XML documents are composed of units called elements, which are indicated by tags. These elements may be further described or enhanced by attributes. These terms should be familiar to you if you have any experience with HTML.
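The relationship between elements and attributes can be sketched with the same standard-library Python parser. The book markup below, including its element names, attribute names, and values, is hypothetical and exists only to show how a parsed document exposes the two concepts.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <book> is an element; isbn and lang are its attributes.
markup = """
<book isbn="0000000000" lang="en">
  <title>Sample Title</title>
  <author>Jane Doe</author>
</book>
"""

book = ET.fromstring(markup)

print(book.tag)      # the element's name
print(book.attrib)   # its attributes, as a dictionary
for child in book:   # child elements, in document order
    print(child.tag, "=", child.text)
```

As in HTML, the attributes live inside the opening tag and describe the element, while the child elements carry the content.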
In addition, XML documents may contain entities, placeholders for content which you declare once and use throughout the document. We've seen character entities used in HTML (see Chapter 10, "Formatting Text" and Appendix F, "Character Entities"), but in XML, entities have a more versatile role.
In XML, entities can be used not only for single characters, but for any string of text, even another chunk of XML markup. Entities provide a useful shortcut for adding frequently used information to a document, such as company contact information or legal boilerplate. Special external entities are used to pull in parts of the XML document that reside in separate files. General entities are declared in the DTD (either in an external DTD file or in the DTD subset inside the document itself) and are referenced from the document's content. Parameter entities are declared and used only within the DTD.
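As a rough illustration of declaring an entity once and reusing it, the sketch below declares two general entities in a document's internal DTD subset and lets Python's standard-library parser expand them. The memo markup, the entity names (company, legal), and their replacement text are all invented for this example. External entities are not shown, since this parser does not fetch separate files by default.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entities are declared once in the DOCTYPE's
# internal subset, then referenced as &company; and &legal; in the content.
markup = """<!DOCTYPE memo [
  <!ENTITY company "Example Widgets, Inc.">
  <!ENTITY legal "All prices are subject to change.">
]>
<memo>
  <from>&company;</from>
  <footer>&company; &legal;</footer>
</memo>
"""

memo = ET.fromstring(markup)

# By the time the content is read, each reference has been replaced
# with the text from its declaration.
print(memo.find("from").text)
print(memo.find("footer").text)
```

Changing the company name in the single declaration would change it everywhere the entity is referenced, which is the point of the shortcut.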
Remember that a markup language only describes the structure of a document; it is not concerned with how the document looks. Documents refer to external style sheets that give instructions on how each element should look when displayed in a browser (or other display device).
Like HTML, XML documents can use Cascading Style Sheets (see Chapter 17, "Cascading Style Sheets"). A more robust style sheet language called the Extensible Stylesheet Language (XSL) exists for XML documents. The W3C's general rule for which style sheet to use is "Use CSS whenever you can; use XSL whenever you must." XSL creates a large overhead in processing, whereas CSS is fast and simple, making it generally preferable.
XSL is useful when the contents of the XML document need to be "transformed" before final display. Transforming generally refers to the process of converting one XML language to another, such as turning a particular XML language into XHTML on the fly, but it can also be used for transformations as simple as replacing words with other words. An XSLT style sheet (XSLT stands for XSL Transformations, the part of XSL that handles transformation) works as a translator in the transformation process. XSL is not covered in this chapter; for more information, see the XSL information on the W3C site at http://www.w3.org/Style/XSL/.
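The idea of transformation can be sketched without XSLT itself. The example below is plain Python standing in for a style sheet, not an XSLT processor: it walks a parsed document and renames each element from a made-up source vocabulary (article, headline, para) to an XHTML element name. The TAG_MAP table and the transform function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a custom XML vocabulary to XHTML element names.
TAG_MAP = {"article": "div", "headline": "h1", "para": "p"}


def transform(source: ET.Element) -> ET.Element:
    """Copy the tree, renaming each element according to TAG_MAP."""
    target = ET.Element(TAG_MAP.get(source.tag, source.tag), source.attrib)
    target.text = source.text
    target.tail = source.tail
    for child in source:
        target.append(transform(child))
    return target


source = ET.fromstring(
    "<article><headline>Hello</headline><para>First paragraph.</para></article>"
)
xhtml = transform(source)
print(ET.tostring(xhtml, encoding="unicode"))
```

A real XSLT style sheet expresses rules like these declaratively, as templates matched against the source document, and can do far more than rename elements.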
Copyright © 2002 O'Reilly & Associates. All rights reserved.