XML

XML, which stands for Extensible Markup Language, is a family of standards for exchanging structured information, developed by the World Wide Web Consortium (W3C).

What is XML?

Extensible Markup Language (XML) is a family of standards, developed by the World Wide Web Consortium (W3C), for exchanging structured information. XML is often described as the successor to Hypertext Markup Language (HTML), which remains in common use for creating Web sites on the Internet and for publishing corporate intranet content.

XML and its various components allow richly formatted and structured information to be delivered over the Web, and XML promises to be widely used in electronic commerce and electronic business applications.

How XML Works

Like HTML, XML uses embedded tags to mark up the structure of documents and, together with its companion linking standards, to create relationships between documents (that is, to create hypertext). Both languages derive from Standard Generalized Markup Language (SGML): XML is a restricted subset of SGML, which has existed for years but is unsuitable for direct implementation on the Web.

Unlike HTML, with its fixed set of tags, XML allows users to declare and use their own tags, typically by writing document type definitions (DTDs) that define which elements and attributes are allowed and how they may be nested. The meaning of the tags is left to the applications that process them. In other words, XML does not specify the set of available tags, but instead functions as a meta-language for creating and describing other markup languages.
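As a rough illustration of this meta-language idea, the following Python sketch builds and re-reads a small document using the standard library's xml.etree.ElementTree module. The vocabulary (purchaseOrder, item, sku, quantity) is invented for this example and is not defined by XML or by any standard mentioned here.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: none of these tag names is predefined by XML.
order = ET.Element("purchaseOrder")
item = ET.SubElement(order, "item")
ET.SubElement(item, "sku").text = "A-100"
ET.SubElement(item, "quantity").text = "3"

# Serialize the tree to text, then read it back with a generic XML parser.
serialized = ET.tostring(order, encoding="unicode")
reparsed = ET.fromstring(serialized)
quantity = int(reparsed.findtext("item/quantity"))

print(serialized)
print(quantity)
```

The parser accepts the invented tags without any prior registration. What a sku or a quantity means is decided by the program that reads the document, not by XML itself.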

Various DTDs have been created for different subject areas, such as science, commerce, and documentation. XML also extends the idea of a “document” to include not only text files but also e-commerce transactions, server application programming interfaces (APIs), vector graphics, and many other forms. As a result, XML is far more universal than HTML.

XML documents are formatted with Extensible Stylesheet Language (XSL), in which you define classes of XML documents and how they are presented. You can use the XML Linking Language (XLL, later standardized as XLink) to create links in XML documents to external objects such as multimedia objects, and use the XML Pointer Language (XPointer) to define link addresses in an XML document. These two languages go beyond the simple anchor tag (<A>) of HTML and provide ways to create one-to-many links, bidirectional links, read-only links, and other complex structural interactions between XML documents. Other components of the XML system include namespaces, query languages, and schema languages, many of which are still under development.

Here is a simple example of an XML document:

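<?xml version="1.0"?>
<HUMOR>
<BOB><QUOTE>Knock knock.</QUOTE></BOB>
<SALLY><QUOTE>Who's there?</QUOTE></SALLY>
<LAUGHTER/>
</HUMOR>

This example illustrates two of the XML markup types:

  • Processing instructions:
    Supply necessary information to the application parsing the XML document and are written between <? and ?>. The first line of the example, <?xml version="1.0"?>, uses the same syntax. Strictly speaking it is the XML declaration, which tells the application that the document being parsed is written in XML 1.0. Its keywords are case-sensitive and must be lowercase.

  • Elements:
    Surround content with start and end tags, as in <QUOTE>…</QUOTE>. Every start tag must have a matching end tag, and elements must nest properly. Elements of the form <…/>, such as <LAUGHTER/>, are called empty elements.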
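To make the roles of the declaration, elements, and empty elements more concrete, here is a minimal Python sketch that reads a well-formed version of the sample with the standard library's xml.etree.ElementTree module. The helper name is_well_formed and the shortened MALFORMED string are made up for this illustration.

```python
import xml.etree.ElementTree as ET

WELL_FORMED = """<?xml version="1.0"?>
<HUMOR>
<BOB><QUOTE>Knock knock.</QUOTE></BOB>
<SALLY><QUOTE>Who's there?</QUOTE></SALLY>
<LAUGHTER/>
</HUMOR>"""

# Same idea, but the BOB start tag is never closed.
MALFORMED = "<HUMOR><BOB><QUOTE>Knock knock.</QUOTE></HUMOR>"


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(WELL_FORMED)

# Elements with children: pair each speaker with the quote it contains.
spoken = [(child.tag, child.findtext("QUOTE")) for child in root if len(child) > 0]

# Empty elements have no children and no text content.
empty_elements = [child.tag for child in root if len(child) == 0 and not child.text]

print(spoken)
print(empty_elements)
print(is_well_formed(MALFORMED))
```

An XML parser rejects a document with an unclosed start tag outright, which is one practical difference from the more forgiving behavior of typical HTML browsers.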

     

Other types of XML markup include the following:

  • Attributes, which are name-value pairs that extend the definition of a start tag.
  • Comments, which are represented by <!--…-->, as in HTML.
  • CDATA sections, such as <![CDATA[…]]>, which indicate to the parser in the application reading the document that the enclosed section is to be read unparsed. This might be used for computer code, for example.
  • Entity references, which specify reserved and special characters. For example, &lt; represents the less than symbol (<) that indicates the beginning of an element’s start tag. Entity names are case-sensitive, so the predefined name must be written in lowercase.
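The four markup types in this list can be seen together in a short Python sketch using the standard library's xml.etree.ElementTree module. The note, code, and math element names and the priority attribute are invented for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<note priority="high">
  <!-- Comments are dropped by the default ElementTree parser. -->
  <code><![CDATA[if (a < b && b < c) { swap(a, c); }]]></code>
  <math>1 &lt; 2 &amp; 3 &gt; 2</math>
</note>"""

note = ET.fromstring(SAMPLE)

# Attribute: a name-value pair carried by the start tag.
priority = note.get("priority")

# CDATA section: the enclosed characters are taken literally, not as markup.
code_text = note.findtext("code")

# Entity references: the parser replaces them with the characters they name.
math_text = note.findtext("math")

# The comment does not appear among the parsed child elements.
child_tags = [child.tag for child in note]

print(priority)
print(code_text)
print(math_text)
print(child_tags)
```

Both the CDATA section and the entity references yield ordinary text after parsing. They differ only in how reserved characters are written in the source document.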

XML also includes declarations that enable the XML document to communicate various types of meta-information to the application parsing the document. These include declarations for new elements, lists of attributes, and new entities. In the preceding sample XML document, for example, the elements <HUMOR>, <BOB>, <SALLY>, <LAUGHTER/>, and <QUOTE> would all need to be declared using <!ELEMENT…> declaration statements in a DTD before the document could be validated.
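As a sketch of what such declarations might look like, the following Python example embeds one plausible DTD for the sample in an internal DOCTYPE subset. The content models shown (for instance, HUMOR containing BOB, SALLY, and LAUGHTER in that order) are assumptions made for illustration. Python's standard xml.etree.ElementTree module reads the declarations but does not validate against them, so the final check is only a naive regular-expression comparison of declared and used element names, not real DTD validation.

```python
import re
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE HUMOR [
  <!ELEMENT HUMOR (BOB, SALLY, LAUGHTER)>
  <!ELEMENT BOB (QUOTE)>
  <!ELEMENT SALLY (QUOTE)>
  <!ELEMENT QUOTE (#PCDATA)>
  <!ELEMENT LAUGHTER EMPTY>
]>
<HUMOR>
<BOB><QUOTE>Knock knock.</QUOTE></BOB>
<SALLY><QUOTE>Who's there?</QUOTE></SALLY>
<LAUGHTER/>
</HUMOR>"""

# The parser accepts the declarations but does not enforce them.
root = ET.fromstring(DOCUMENT)

# Naive check (not validation): collect declared and used element names.
declared = set(re.findall(r"<!ELEMENT\s+(\w+)", DOCUMENT))
used = {element.tag for element in root.iter()}
undeclared = used - declared

print(sorted(declared))
print(sorted(undeclared))
```

A validating parser would go further and check each element's content against its declared content model. That requires a library beyond the module used here.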

NOTE

Microsoft’s Channel Definition Format (CDF) was one of the earliest uses for XML in Internet environments.
