Extensible Markup Language: Difference between revisions
imported>Paul Derry |
Revision as of 10:19, 16 May 2007
Extensible Markup Language (XML) is a markup language specified by the W3C and derived from SGML (ISO 8879:1986). It is used in a wide variety of applications to store and represent textual data.
Features and Syntax
W3C Definitions
Documents and Well-formedness
XML consists of a hierarchical tree of elements, which may carry attributes, that together express the structure of data. A text object counts as an XML document only if it is well-formed, as defined in the specification provided by the W3C. As of version 1.0, a textual object is a well-formed XML document if:
- Taken as a whole, it matches the production labeled document.
- It meets all the well-formedness constraints given in this specification.
- Each of the parsed entities which is referenced directly or indirectly within the document is well-formed.
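In practice, a simple way to test these conditions is to hand the text to an XML parser and see whether it is accepted. The following is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module is an acceptable stand-in for a conforming parser; the function name is_well_formed and the sample strings are illustrative and do not come from the specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser reports every well-formedness error as ParseError.
        return False
    return True


# Hypothetical sample inputs, chosen to exercise common mistakes.
samples = {
    "single root, matched tags": "<root><child>text</child></root>",
    "mismatched end tag": "<root><child>text</root></child>",
    "two root elements": "<first/><second/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted. The second closes its tags in the wrong order, and the third has more than one top-level element.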
Tags and Elements
This definition leaves some terms open at first, such as what exactly a document is. The XML specification defines a document as a text object that contains a prolog and one or more elements. A document must have exactly one root element, and every other element must nest inside it, delimited by start and end tags:
<?xml version="1.0"?>
<root-element>
  <child-element>
    ...
  </child-element>
</root-element>
<element> is a start tag; notice that it does not have a forward slash in front of the name of the element. </element> is an end tag; notice that it does have a forward slash in front of the name of the element. There are also single-tag elements, in which the forward slash comes after the name of the element: <single-tag-element />. By convention there is a space between the element name and the forward slash. This single-tag form appears in the XHTML definition of a line break, <br />.
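To illustrate that a single-tag element is just a shorthand for a start tag immediately followed by an end tag, the sketch below parses both spellings and compares the results. It assumes Python's standard-library xml.etree.ElementTree; the wrapping <line> element and the helper describe are made up for this example.

```python
import xml.etree.ElementTree as ET

# The same empty element written two ways inside a hypothetical <line> wrapper.
paired = ET.fromstring("<line><br></br></line>")
single = ET.fromstring("<line><br /></line>")


def describe(root):
    """Return (tag, text, number of children) for the first child of root."""
    child = root[0]
    return (child.tag, child.text, len(child))


print(describe(paired))
print(describe(single))
print(describe(paired) == describe(single))
```

Both forms produce an element named br with no text and no children, so a program reading the document cannot tell which spelling was used.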
Usage Examples
Data Storage
Address Book Example
Suppose Thomas wants to write a simple address book program that stores his addresses and phone numbers in a simple, structured manner. He decides to store the information as XML because he can define exactly what information he wishes to store. XML works best when the information has a hierarchical arrangement, so Thomas designs a schema for how his information will be held:
- Person or Company Name
  - Addresses
    - Mailing Address
    - E-mail Address
  - Phone Numbers
    - Fax Number
    - Cell/Mobile Number
  - Note
This arrangement of data is then transformed into XML. One possible arrangement is below:
<?xml version="1.0"?>
<address-book>
  <person name="Thomas Paine">
    <addresses>
      <mailing>812 Juniper Road</mailing>
      <email>tpaine@foo.net</email>
    </addresses>
    <telephone>
      <primary>987-654-4321</primary>
      <fax>555-555-5555</fax>
      <cell>123-456-7890</cell>
    </telephone>
    <note>Me.</note>
  </person>
</address-book>
Let's dissect this example and see what the different components of this structure do.
<?xml version="1.0"?>
The first line of the example is the XML declaration, which opens the prolog. It tells the parser that the file it is about to parse is indeed XML, and its version value names the version of the XML specification the document conforms to. XML 1.0 does not strictly require the declaration for a document to be well-formed, but the specification recommends including it. Well-formedness is defined above; validity will be discussed later.
The root element in this structure is <address-book>. The root element is the top of the hierarchical tree that XML documents typically represent; in this case it represents the address book as a whole. There are ways of specifying whether an element may contain multiple sub-elements with the same tag; however, they are not covered in this example.
The first sub-element in the tree is <person name="Thomas Paine">. This element represents a person within the address book. The name attribute gives the name of the person or entity that the <person> element represents. Each of the other sub-elements is nested inside it, a little like Russian nesting dolls: each one defines a more specific entity within the larger entity.
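The dissection above can be followed programmatically. The sketch below, assuming Python's standard-library xml.etree.ElementTree, reads the name attribute and prints the nesting of the address book. It assumes the <addresses> and <telephone> elements are closed before their next sibling begins; the helper outline and the constant ADDRESS_BOOK are names invented for this illustration.

```python
import xml.etree.ElementTree as ET

# The address book, with <addresses> and <telephone> assumed to be closed.
ADDRESS_BOOK = """<?xml version="1.0"?>
<address-book>
  <person name="Thomas Paine">
    <addresses>
      <mailing>812 Juniper Road</mailing>
      <email>tpaine@foo.net</email>
    </addresses>
    <telephone>
      <primary>987-654-4321</primary>
      <fax>555-555-5555</fax>
      <cell>123-456-7890</cell>
    </telephone>
    <note>Me.</note>
  </person>
</address-book>
"""


def outline(element, depth=0):
    """Return one indented line per element, showing how elements nest."""
    attrs = "".join(f' {key}="{value}"' for key, value in element.attrib.items())
    line = "  " * depth + element.tag + attrs
    text = (element.text or "").strip()
    if text:
        line += ": " + text
    lines = [line]
    for child in element:
        # Each child is described one level deeper than its parent.
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(ADDRESS_BOOK)   # the root element, <address-book>
person = root.find("person")         # the first sub-element
print(person.get("name"))            # the value of the name attribute
print("\n".join(outline(root)))
```

The indentation in the printed outline mirrors the nesting described above: each deeper level names a more specific entity inside the one that contains it.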