Extensible Markup Language: Difference between revisions

From Citizendium
imported>Paul Derry
mNo edit summary
 
(51 intermediate revisions by 6 users not shown)
Line 1: Line 1:
'''eXtensible Markup Language''' ('''XML''') is a [[W3C]] markup language derived from [[SGML]] (ISO8879-1986) used in a wide variety of applications for the storage and representation of textual data.
{{subpages}}


== Features and Syntax ==
'''XML''' (eXtensible Markup Language) is a platform-independent, human-readable format for representing data.  It was released as a [[W3C]] Recommendation in February 1998 and, perhaps because of its platform independence and readability both by humans and software, quickly gained wide acceptance.  Today, XML is in widespread use across the [[World Wide Web]], especially for sending data between [[computer]]s and for [[serialization|serializing]] data to disk, and most computer programming languages include support for it. XML is one of several [[wire format|wire formats]] used in [[Ajax]] (Asynchronous JavaScript and XML), the ''background'' interaction of a web page with programs residing elsewhere on a network.


=== W3C Definitions ===
== XML Specification and Origin ==


==== Documents and Well-formedness ====
XML is a subset of the Standard Generalized Markup Language, or [[SGML]] (ISO 8879:1986). XML's specification first emerged in 1996 through the efforts of the XML Working Group (originally known as the SGML Editorial Review Board), chaired by Jon Bosak of Sun Microsystems, with the active participation of the XML Special Interest Group. The Working Group laid out the following design goals for XML:
XML consists of a hierarchical tree of '''elements''' that contain '''attributes''' to express the structure of data. Ideally, all XML documents are '''well-formed''' as defined in the specifications provided by the W3C. As of version 1.0 the definition is this:
<blockquote>


<!-- http://www.w3.org/TR/2006/REC-xml-20060816/ -->
#''XML shall be straightforwardly usable over the Internet.''
#''XML shall support a wide variety of applications.''
#''XML shall be compatible with SGML.''
#''It shall be easy to write programs which process XML documents.''
#''The number of optional features in XML is to be kept to the absolute minimum, ideally zero.''
#''XML documents should be human-legible and reasonably clear.''
#''The XML design should be prepared quickly.''
#''The design of XML shall be formal and concise.''
#''XML documents shall be easy to create.''
#''Terseness in XML markup is of minimal importance.''


<blockquote>
<ref>Bray, Tim, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, and François Yergeau, eds. "Extensible Markup Language (XML) 1.0 (Fourth Edition)." World Wide Web Consortium Recommendations. 29 Sept. 2006. 18 May 2007 <http://www.w3.org/TR/2006/REC-xml-20060816/#sec-origin-goals>.</ref>
# Taken as a whole, it matches the production labeled document.
# It meets all the well-formedness constraints given in this specification.
# Each of the parsed entities which is referenced directly or indirectly within the document is well-formed.
</blockquote>
</blockquote>


===== Tags and Elements =====
Notably missing from the above list is ''efficiency''.  XML is plain text and can be verbose, and in applications where speed is everything, a different data representation may be preferred.
 
=== Grammar and Definitions ===
 
This section gives a brief overview of the components that define XML. For a more in-depth examination of those components, read the individual sections under the heading '''''Structure'''''.
 
 
 
'''Prolog''' - A prolog is a sort of 'announcement' to the XML parser. Its XML declaration states which version of XML the document is written in, and the prolog can also carry other information pertinent to the document (discussed later).


There are some initial ambiguities in this definition such as what a ''document'' is exactly. The XML specification defines a document as a text object that contains a prolog and one or more elements. A document must have one and only one root element, and any other elements must nest inside of it delimited by start and end tags
<pre>
<pre>
<?xml version="1.0"?>
<?xml version="1.0" standalone="no" ?>
</pre>
 
 
'''Tag''' - A tag denotes the beginning, end, or existence of an XML element. A start tag consists of the '''<''' character, the name of the element, and a closing '''>'''. An end tag has a forward slash between the '''<''' and the name. A single (empty-element) tag has a forward slash immediately before the closing '''>'''; by convention a space is often placed before that slash, but it is not required.
 
''Example''
<pre>
<ninja>
    <inventory>
        <nunchucks />
        <cookies />
    </inventory>
</ninja>
</pre>
 
 
 
'''Element''' - An element is an XML object composed of either a start tag and a matching end tag, together with any content between them, or a single tag.
 
''Example''
<pre>
<?xml version="1.0" standalone="no" ?>
<family>
    <member>
        <name>dad</name>
        <favorite-food>pancakes</favorite-food>
        <favorite-animal>wombats</favorite-animal>
    </member>
</family>
</pre>
 
'''Document''' - A document is an XML object that contains a prolog and exactly one root element; any other elements are nested inside the root element.
 
<pre>
<?xml version="1.0" standalone="no" ?>
<root-element>
<root-element>
     <child-element>
     <lots-of-elements>
     ...
     ...
     </child-element>
     </lots-of-elements>
</root-element>
</root-element>
</pre>
</pre>


<code><element></code> is a start tag, notice that it does not have a forward slash in front of the name of the element.
=== Structure ===
<code></element></code> is an end tag, notice that it '''does''' have a forward slash ''in front'' of the name of the element. There are also '''single tag''' elements which also have a forward slash but are ''behind'' the name of the element:
<code><single-tag-element /></code>. By convention there is a space between the forward slash and the element name. This single tag element appears in the XHTML definition for a line break <code><br /></code>.


== Usage Examples ==
=== Data Storage ===
==== Address Book Example ====


Suppose Thomas wants to write a simple address book program that stores his addresses and phone numbers in a simple structured manner. He decides his best option is to use XML to store his information because he can define what information he wishes to store. XML works best when the information has a hierarchical arrangement, so Thomas designs the '''schema''' of how his information will be held.
==== Prolog ====


*Person or Company Name
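The XML prolog appears at the beginning of the document, and its XML declaration, if present, must be the very first thing in the document. In XML 1.0 the declaration is optional, though recommended; XML 1.1 requires it.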
**Addresses
***Mailing Address
***E-mail Address
**Phone Numbers
***Fax Number
***Cell/Mobile Number
**Note
 
This arrangement of data is then transformed into XML, one possible arrangement is below:


<pre>
<pre>
<?xml version="1.0"?>
<?xml version="###" {encoding="???" standalone='y/n'} ?>
<address-book>
    <person name="Thomas Paine">
        <addresses>
            <mailing>812 Juniper Road</mailing>
            <email>tpaine@foo.net</email>
        <telephone>
            <primary>987-654-4321</primary>
            <fax>555-555-5555</fax>
            <cell>123-456-7890</cell>
        <note>Me.</note>
    </person>
</address-book>
</pre>
</pre>


Let's dissect this example and see what the different components of this structure do.
'''version''' gives the version of the XML specification that the document conforms to.
 
''(Optional)'' '''encoding''' names the character encoding in which the document is written. If no encoding declaration is present, and no byte order mark or external information says otherwise, a parser assumes '''UTF-8'''. The encoding name itself must be written using Latin characters only.<ref>Bray, Tim, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, and François Yergeau, eds. "Extensible Markup Language (XML) 1.0 (Fourth Edition)." World Wide Web Consortium Recommendations. 29 Sept. 2006. 18 May 2007 <http://www.w3.org/TR/2004/REC-xml-20040204/#NT-EncodingDecl> §4.3.3 Character Encoding in Entities.</ref>
 


<code><?xml version="1.0"?></code>
<pre>EncName    ::=   [A-Za-z] ([A-Za-z0-9._] | '-')* </pre>
The encoding name '''must''' begin with an ASCII letter; every other character must be an ASCII letter or digit, a period ( . ), an underscore ( _ ), or a hyphen ( - ).


The first line of the example is known as the '''prolog''', it tells the '''parser''' that the file it's about to parse is indeed XML. The version '''attribute''' is the version of the XML specification the document represents. In order for an XML document to be '''well-formed''' it must contain this prolog. "Well-formed"ness and validity will be discussed later.


The '''root element''' in this structure is <code><address-book></code> The root element is the top of the hierarchical tree that XML documents typically represent. In this case it represents the address book as a whole. There are ways of specifying whether or not an element can contain multiple sub-elements with the same tag however that will not be covered in this example just yet.
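''(Optional)'' '''standalone''' is either ''yes'' or ''no''. A value of ''yes'' declares that the document does not depend on markup declarations held outside the document, such as an external DTD subset; ''no'' means that it may depend on them. If the attribute is omitted and external markup declarations exist, ''no'' is assumed.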


The first sub-element in the tree is <code><person name="Thomas Paine"></code>. Obviously this element represents a person within the address book. The <code>name</code> attribute is the name of the person or entity that the <code><person></code> element will represent. Each of the other sub-elements are embedded inside the first a little like Russian stacking dolls, each one defines a more specific entity within the larger entity.
====Elements====
An '''element''' is an XML object, whose permitted structure may be declared in an accompanying DTD or Schema, that is expressed through tags, parsed text, attributes, and entities. An element may consist of a start tag and an end tag or of a single tag.


==== SOAP ====
<pre>
=== Data Formatting ===
<?xml version="1.0" standalone="no" ?>
==== XHTML ====
<army>
==== MathML ====
    <ninja name="woody">
==== SVG ====
        <inventory>
            <nunchucks count="2" />
            <cookies count="15" />
        </inventory>
    </ninja>
</army>
</pre>


=== Data Description ===
In this case, '''<army>''' is the root element, which can contain a series of elements; the one shown in this example is a '''<ninja>'''. A ninja has an inventory that contains a certain number of nunchucks and a certain number of cookies, each given by the attribute '''count'''. Note that the attribute values are in quotation marks: in every well-formed XML document, attribute values are quoted with either single quotes ''' ' ''' or double quotes '''"'''.
==== RDF ====
==== Schema ====


== References ==
== References ==
<references/>


== See Also ==
== External links ==
*[[Standard Generalized Markup Language|SGML]]
* [http://www.w3schools.com/xml/default.asp http://www.w3schools.com/xml/]
*[[HyperText Markup Language|HTML]]
*[[Scalable Vector Graphics|SVG]]


== Related Technologies ==


[[Category:Computers Workgroup]]
*[[XSL|Extensible Stylesheet Language]]
[[Category: CZ Live]]
*[[XSLT|Extensible Stylesheet Language Transformations]]
*[[XSL-FO|Extensible Stylesheet Language Formatting Objects]]
*[[RSS|Really Simple Syndication]]
*[[RDF|Resource Description Framework]]
*[[XPath|XPath]]
*[[XQuery|XQuery]]
*[[XForms|XForms]]
*[[XPointer|XPointer]]
*[[XLink|XLink]]
*[[WSDL|Web Services Description Language]][[Category:Suggestion Bot Tag]]

Latest revision as of 16:01, 14 August 2024

This article is developing and not approved.

XML (eXtensible Markup Language) is a platform-independent, human-readable format for representing data. It was released as a W3C Recommendation in February 1998 and, perhaps because of its platform independence and readability both by humans and software, quickly gained wide acceptance. Today, XML is in widespread use across the World Wide Web, especially for sending data between computers and for serializing data to disk, and most computer programming languages include support for it. XML is one of several wire formats used in Ajax (Asynchronous JavaScript and XML), the background interaction of a web page with programs residing elsewhere on a network.

XML Specification and Origin

XML is a subset of the Standard Generalized Markup Language, or SGML (ISO 8879:1986). XML's specification first emerged in 1996 through the efforts of the XML Working Group (originally known as the SGML Editorial Review Board), chaired by Jon Bosak of Sun Microsystems, with the active participation of the XML Special Interest Group. The Working Group laid out the following design goals for XML:

  1. XML shall be straightforwardly usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process XML documents.
  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.

[1]

Notably missing from the above list is efficiency. XML is plain text and can be verbose, and in applications where speed is everything, a different data representation may be preferred.

Grammar and Definitions

This section gives a brief overview of the components that define XML. For a more in-depth examination of those components, read the individual sections under the heading Structure.


Prolog - A prolog is a sort of 'announcement' to the XML parser. Its XML declaration states which version of XML the document is written in, and the prolog can also carry other information pertinent to the document (discussed later).

<?xml version="1.0" standalone="no" ?>


Tag - A tag denotes the beginning, end, or existence of an XML element. A start tag consists of the < character, the name of the element, and a closing >. An end tag has a forward slash between the < and the name. A single (empty-element) tag has a forward slash immediately before the closing >; by convention a space is often placed before that slash, but it is not required.

Example

<ninja>
    <inventory>
        <nunchucks />
        <cookies />
    </inventory>
</ninja>
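
As a minimal sketch of the tag forms described above, the following Python snippet parses the same empty element written three ways: as a single tag with a space, as a single tag without one, and as a start tag followed directly by an end tag. It uses the standard library's xml.etree.ElementTree; the helper name normalize is invented here for illustration.

```python
import xml.etree.ElementTree as ET


def normalize(fragment: str) -> str:
    """Parse one XML element and write it back out as text."""
    element = ET.fromstring(fragment)
    return ET.tostring(element, encoding="unicode")


# Three spellings of the same empty element.
single_tag = normalize("<cookies />")
no_space = normalize("<cookies/>")
tag_pair = normalize("<cookies></cookies>")

# All three describe one element with no content, so they serialize alike.
print(single_tag == no_space == tag_pair)
```

Because a parser treats the three spellings as the same element, the comparison is expected to print True. The choice between them is a matter of style.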


Element - An element is an XML object composed of either a start tag and a matching end tag, together with any content between them, or a single tag.

Example

<?xml version="1.0" standalone="no" ?>
<family>
    <member>
        <name>dad</name>
        <favorite-food>pancakes</favorite-food>
        <favorite-animal>wombats</favorite-animal>
    </member>
</family>
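
To illustrate how nested elements look to a program, here is a small sketch in Python that reads the family example with the standard library's xml.etree.ElementTree and collects each member's child elements into a dictionary. The names FAMILY_XML and members are assumptions made for this example, not part of XML itself.

```python
import xml.etree.ElementTree as ET

# The family document from the example above.
FAMILY_XML = """<?xml version="1.0" standalone="no" ?>
<family>
    <member>
        <name>dad</name>
        <favorite-food>pancakes</favorite-food>
        <favorite-animal>wombats</favorite-animal>
    </member>
</family>"""


def members(xml_text: str) -> list[dict[str, str]]:
    """Return one dict per <member>, mapping child tag names to their text."""
    root = ET.fromstring(xml_text)  # root is the <family> element
    return [
        {child.tag: (child.text or "") for child in member}
        for member in root.findall("member")
    ]


for person in members(FAMILY_XML):
    print(person["name"], "likes", person["favorite-food"])
```

Each start tag and end tag pair becomes one element object, and the text between the tags is available as that element's text.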

Document - A document is an XML object that contains a prolog and exactly one root element; any other elements are nested inside the root element.

<?xml version="1.0" standalone="no" ?>
<root-element>
    <lots-of-elements>
    ...
    </lots-of-elements>
</root-element>
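
The single-root and nesting rules can be checked mechanically. The sketch below, assuming Python's standard xml.etree.ElementTree parser is an acceptable stand-in for "an XML parser", reports whether a piece of text parses as a document; the function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as a complete document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for problems such as a second root element or
        # start and end tags that do not nest properly.
        return False
    return True


print(is_well_formed("<root-element><child /></root-element>"))  # one root
print(is_well_formed("<first /><second />"))                     # two roots
print(is_well_formed("<a><b></a></b>"))                          # bad nesting
```

Only the first input has a single root element with properly nested children, so it is the only one the parser should accept.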

Structure

Prolog

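The XML prolog appears at the beginning of the document, and its XML declaration, if present, must be the very first thing in the document. In XML 1.0 the declaration is optional, though recommended; XML 1.1 requires it.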

<?xml version="###" {encoding="???" standalone='y/n'} ?>

version gives the version of the XML specification that the document conforms to.

(Optional) encoding names the character encoding in which the document is written. If no encoding declaration is present, and no byte order mark or external information says otherwise, a parser assumes UTF-8. The encoding name itself must be written using Latin characters only.[2]


EncName    ::=    [A-Za-z] ([A-Za-z0-9._] | '-')* 

The encoding name must begin with an ASCII letter; every other character must be an ASCII letter or digit, a period ( . ), an underscore ( _ ), or a hyphen ( - ).
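
The EncName production translates almost directly into a regular expression. The following Python sketch checks only the syntax of a name, not whether any parser actually supports that encoding; ENC_NAME and is_valid_encoding_name are names chosen for this illustration.

```python
import re

# EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
# The trailing hyphen inside the character class is a literal hyphen.
ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_encoding_name(name: str) -> bool:
    """Return True if the whole string matches the EncName production."""
    return ENC_NAME.fullmatch(name) is not None


for candidate in ["UTF-8", "ISO-8859-1", "8859-1", "UTF 8"]:
    print(candidate, is_valid_encoding_name(candidate))
```

The first two candidates follow the rule. The third starts with a digit and the fourth contains a space, so both are rejected.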


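(Optional) standalone is either yes or no. A value of yes declares that the document does not depend on markup declarations held outside the document, such as an external DTD subset; no means that it may depend on them. If the attribute is omitted and external markup declarations exist, no is assumed.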
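
One way to see what a parser makes of these three fields is to ask it directly. The sketch below uses the XmlDeclHandler callback of Python's standard xml.parsers.expat module, which reports standalone as 1 for yes, 0 for no, and -1 when the clause is omitted. The function name read_declaration is hypothetical.

```python
import xml.parsers.expat


def read_declaration(xml_bytes: bytes) -> dict:
    """Return the fields of the XML declaration, or {} if there is none."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # Called by expat once, when it meets the <?xml ... ?> declaration.
        found.update(version=version, encoding=encoding, standalone=standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return found


print(read_declaration(b'<?xml version="1.0" standalone="no" ?><family />'))
print(read_declaration(b'<?xml version="1.0" encoding="UTF-8"?><family />'))
print(read_declaration(b"<family />"))
```

The last call shows a document without any declaration: the handler is never invoked, so the result stays empty.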

Elements

An element is an XML object, whose permitted structure may be declared in an accompanying DTD or Schema, that is expressed through tags, parsed text, attributes, and entities. An element may consist of a start tag and an end tag or of a single tag.

<?xml version="1.0" standalone="no" ?>
<army>
    <ninja name="woody">
        <inventory>
            <nunchucks count="2" />
            <cookies count="15" />
        </inventory>
    </ninja>
</army>

In this case, <army> is the root element, which can contain a series of elements; the one shown in this example is a <ninja>. A ninja has an inventory that contains a certain number of nunchucks and a certain number of cookies, each given by the attribute count. Note that the attribute values are in quotation marks: in every well-formed XML document, attribute values are quoted with either single quotes ' or double quotes ".
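
A short sketch of how a program might read those attributes, using Python's standard xml.etree.ElementTree: the constant ARMY_XML and the function inventories are names invented for this example, and the code assumes every count value is a whole number.

```python
import xml.etree.ElementTree as ET

# The army document from the example above.
ARMY_XML = """<?xml version="1.0" standalone="no" ?>
<army>
    <ninja name="woody">
        <inventory>
            <nunchucks count="2" />
            <cookies count="15" />
        </inventory>
    </ninja>
</army>"""


def inventories(xml_text: str) -> dict[str, dict[str, int]]:
    """Map each ninja's name to the item counts in that ninja's inventory."""
    root = ET.fromstring(xml_text)  # root is the <army> element
    result = {}
    for ninja in root.findall("ninja"):
        inventory = ninja.find("inventory")
        items = inventory if inventory is not None else []
        # Attribute values always arrive as strings; convert counts to int.
        result[ninja.get("name")] = {
            item.tag: int(item.get("count")) for item in items
        }
    return result


print(inventories(ARMY_XML))
```

Attribute values are plain text to the parser regardless of which quote character surrounds them, which is why the counts are converted explicitly.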

References

  1. Bray, Tim, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, and François Yergeau, eds. "Extensible Markup Language (XML) 1.0 (Fourth Edition)." World Wide Web Consortium Recommendations. 29 Sept. 2006. 18 May 2007 <http://www.w3.org/TR/2006/REC-xml-20060816/#sec-origin-goals>.
  2. Bray, Tim, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, and François Yergeau, eds. "Extensible Markup Language (XML) 1.0 (Fourth Edition)." World Wide Web Consortium Recommendations. 29 Sept. 2006. 18 May 2007 <http://www.w3.org/TR/2004/REC-xml-20040204/#NT-EncodingDecl> §4.3.3 Character Encoding in Entities.

External links

Related Technologies