XML root element. Description of the structure of XML documents


XML (eXtensible Markup Language) is a simplified dialect of SGML designed to describe hierarchical data structures on the World Wide Web. It has been developed by a W3C working group since 1996; the currently accepted recommendation is the second edition of XML 1.0 (October 2000), which is the basis for the presentation that follows.

XML is undoubtedly one of the most promising WWW technologies, which explains the interest it receives from both software corporations and the general public. Before describing it, it is worth discussing why it appeared and why it developed so rapidly. To do this, let us look at the problems of the WWW that the new generation of Web technologies must solve.

HTML does not express the meaning of documents. HTML was created to describe the structure of documents (title, headings, lists, paragraphs, etc.) and, to some extent, the rules for displaying them (bold, italic, etc.). It is in no way intended to describe the meaning of the documents written in it, yet in many cases it is the data that constitutes the essence of a document, be it a stock market report or a scientific publication. Hence the need for a language for describing data, specifically data organized in hierarchical structures.

HTML is cumbersome and inflexible. In recent years HTML has turned into a jumble of tags that often duplicate each other and do not make the text of a document any clearer. Add to this the non-standard HTML extensions that all browser developers are guilty of, and creating even moderately complex HTML documents becomes a serious task. On the other hand, a set of tags fixed once and for all is often not flexible enough to express the content we need.

The Web browser concept is too limited. With the advent of Java applets, scripting languages and ActiveX controls, Web browsers are no longer simply "renderers" of HTML documents; today they look more like programs that run specific applications. However, the very concept of a browser imposes unnecessary restrictions on the user; in many cases we need Web-based applications, i.e. programs that can read specialized information from Web sites and present it to us in a familiar form, for example as spreadsheets.

Document search returns too many links. We all use search engines constantly and constantly criticize them for being inconvenient. Suppose I need all the texts of Sergei Dovlatov's books that are available on the Internet. Searching by the author's name will return a list of all links containing that name, including memoirs about Dovlatov, reviews of his books, etc. It would be much more convenient to use a special tag to indicate exactly what I am looking for.

Related resources cannot be found. Suppose now that I did find several stories by Dovlatov that clearly constitute a single collection. It is nice if they include a link to the table of contents, but often they do not. Therefore, a way is needed to indicate that a given group of pages constitutes a single resource and should be treated as such. This requires a standardized and well-developed system of metadata descriptors for Web pages.

XML is an attempt to solve these problems by creating a simple markup language that describes arbitrary structured data. More precisely, it is a metalanguage in which specialized languages are written that describe data of a certain structure. Such languages are called XML dictionaries. Unlike HTML, XML does not contain any instructions on how the data described in the XML document should be displayed. The way data is displayed for different devices is specified by the XSL stylesheet, which plays roughly the same role for XML as CSS does for HTML. Another fundamental difference from HTML is that XML can contain any tags that the creators of the XML dictionary deem necessary to use. Here is a list of just a few specialized XML-based languages that are currently in various stages of development by W3C working groups:

MathML: a language for mathematical formulas;
SMIL: a multimedia integration and synchronization language;
SVG: a two-dimensional vector graphics language;
RDF: a resource metadata description language;
XHTML: a reformulation of HTML in XML terms.

An XML document is processed as follows. Its text is analyzed by a special program called an XML processor. The XML processor knows nothing about the semantics of the data in the document; it only parses the text of the document and checks its correctness in terms of the XML rules. If the document is well-formed, the XML processor passes the results of parsing to the application program, which performs the meaningful processing; if the document is not well-formed, that is, it contains syntax errors, the XML processor must report them to the user.
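The division of labor described above can be sketched in Python, with the standard library's xml.etree.ElementTree playing the role of the XML processor. This is only an illustration: the helper name check_well_formed and the sample strings are assumptions, not part of the original text.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return (True, root element) if text parses, else (False, error message)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # The processor reports the syntax error instead of passing data on.
        return False, str(err)
    return True, root


# A well-formed document: the parsed tree is handed to the application.
ok, root = check_well_formed("<author>Sergey Dovlatov</author>")

# A document with a missing end tag: the processor reports an error.
bad, message = check_well_formed("<author>Sergey Dovlatov")
```

The application only ever sees the parsed tree; interpreting what an author element means remains its own job.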

8.1.2. Applications of XML

The question arises: what is the point of using an “empty language” devoid of its own content? The fact is that, despite its apparent simplicity, XML has quite sophisticated mechanisms for checking the correctness of data, allows hierarchical relationships within a document to be verified and, most importantly, establishes a single standard for documents that store data, whatever the nature of that data. Let us take a closer look at some areas of application of XML.

Traditional data processing. The capabilities listed above allow us to regard XML as a platform-independent standard for storing and presenting information which, in combination with other modern technologies (in particular, Java technologies), can become the basis for machine-independent applications, including data exchange between server and client. In addition, the XML-based query languages that are actively being developed today may seriously compete with SQL.

Document-driven programming. XML documents can serve as containers for building applications from existing interfaces and components. In this case, the document consists of references to user interface components and data processing modules that are linked together as the page is displayed on the screen.

Component archiving. Modern programming is based on the use of components, which ideally should be easy to assemble into a single whole with a small amount of additional coding. The basis for this is the archiving of components, which in turn requires a uniform approach to their storage and subsequent use. There is every reason to believe that in the near future XML documents will provide an alternative to storing components as binary modules, which is common today.

Data embedding. Once the structure of the XML data has been defined, it is in principle easy to write a code generator that processes this data. As such software develops, all routine data processing (including checking its correctness, presenting it in the required format, etc.) can be automated, allowing developers to focus on the non-standard parts of the product being created.

8.1.3. XML Document Structure

An XML document consists of declarations, elements, comments, special characters, and directives. All these components of the document are described in this chapter.

8.1.3.1. Elements and Attributes

XML is a tag-based language for marking up documents. In other words, any XML document is a collection of elements, and the beginning and end of each element are indicated by special marks called tags.

An element consists of three parts: a start tag, content, and an end tag. A tag is text enclosed in the angle brackets "<" and ">". The end tag has the same name as the start tag, but the name is preceded by a forward slash "/". Example of an XML element:

<author>Sergey Dovlatov</author>

Element names are case sensitive, i.e. <author>, <Author> and <AUTHOR> are the names of different elements. The closing tag is always required. If an element is empty, i.e. has no content and no closing tag, then its tag has a special form:

<element/>

Any element can have attributes containing additional information about the element. Attributes are always placed in the element's start tag and look like this:

Attribute_name="attribute_value"

The attribute must have a value, which must always be enclosed in single or double quotes. Attribute names are also case sensitive. An example of an element that has an attribute:

Sergey Dovlatov
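As a small illustration of these rules, the sketch below parses an element carrying one attribute and reads the value back. The attribute name country and its value are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# "country" is a hypothetical attribute name used only for illustration.
author = ET.fromstring('<author country="USA">Sergey Dovlatov</author>')

# Attribute values are read by name; the name is case sensitive.
attribute_value = author.get("country")
wrong_case = author.get("Country")  # no such attribute, so this is None
```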

The elements must either follow each other or be nested inside one another:

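<books>
  <book isbn="...">
    <title>Part of Speech</title>
    <author>Brodsky, Joseph</author>
    <present/>
  </book>
  <book isbn="...">
    <title>March of the Lonely</title>
    <author>Dovlatov, Sergey</author>
    <present/>
  </book>
</books>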

Here the books element contains two nested book elements, each of which has an isbn attribute and contains three consecutive elements: title, author and present. The last one is empty because in this case it serves as a logical flag.
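A program might walk such a nested structure roughly as follows. This is a sketch under stated assumptions: the isbn values are placeholders invented for the example, and the present element is deliberately left out of the second book to show how an empty element can act as a flag.

```python
import xml.etree.ElementTree as ET

# The isbn values are made-up placeholders; <present/> is omitted from the
# second book on purpose to show the "logical flag" reading.
BOOKS = """
<books>
  <book isbn="0000000001">
    <title>Part of Speech</title>
    <author>Brodsky, Joseph</author>
    <present/>
  </book>
  <book isbn="0000000002">
    <title>March of the Lonely</title>
    <author>Dovlatov, Sergey</author>
  </book>
</books>
"""

root = ET.fromstring(BOOKS.strip())

# Map each isbn to (title, flag), where the flag is True if <present/> exists.
catalog = {
    book.get("isbn"): (book.findtext("title"), book.find("present") is not None)
    for book in root.findall("book")
}
```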

From the above description it is clear that XML syntax resembles HTML syntax (which is natural, since both are dialects of the same language, SGML), but the requirements for well-formed XML documents are stricter. Another very important difference between XML and HTML is that the content of elements, that is, everything contained between the start and end tags, is considered data. This means that XML does not ignore whitespace and line breaks the way HTML does.
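The point about whitespace is easy to check with a parser. In the minimal sketch below (the sample text is arbitrary), the spaces and the line break inside the element reach the application unchanged.

```python
import xml.etree.ElementTree as ET

# Leading spaces, the line break and the trailing space are all part of the data.
element = ET.fromstring("<title>  March of\n  the Lonely </title>")
content = element.text
```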

8.1.3.2. Prologue and directives

Any XML document consists of a prologue and a root element, for example:

<?xml version="1.0"?>
<books>
  <book isbn="...">
    <title>March of the Lonely</title>
    <author>Dovlatov, Sergey</author>
  </book>
</books>

In this example, the prologue is reduced to a single directive (the first line of the document) indicating the XML version. It is followed by an XML element with a unique name, which contains all the other elements and is called the root. A directive (processing instruction) is an expression enclosed in the special tags "<?" and "?>" that contains instructions for the program processing the XML document.

The XML standard reserves only one directive, <?xml version="1.0"?>, which indicates the version of the XML language that the document conforms to (there is no second version of XML yet). In fact, this directive is somewhat richer and in its most general form looks like this:

<?xml version="1.0" encoding="encoding_name" standalone="yes"?>

Here the encoding attribute specifies the character encoding of the document. By default, XML documents should be created in UTF-8 or UTF-16. If any other character encoding is used, its name according to Table A7.1 should be given in this attribute. The standalone attribute indicates whether the document contains external sections. The value yes means that there are no such sections, the value no means that they exist.
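The effect of the encoding attribute can be seen in a short sketch. It assumes a document stored in ISO-8859-1 (an arbitrary choice for the example); the parser reads the encoding name from the directive and decodes the bytes accordingly.

```python
import xml.etree.ElementTree as ET

# The document is stored as ISO-8859-1 bytes and says so in its first line.
document = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><title>Caf\u00e9</title>'
).encode("iso-8859-1")

# The parser uses the declared encoding to turn the bytes back into text.
title = ET.fromstring(document).text
```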

8.1.3.3. Comments

XML documents may contain comments, which are ignored by the application processing the document. Comments follow the same rules as in HTML:

start a comment with "<!--" and end it with "-->";
do not use the "--" characters inside comments.

Example of a comment:

<!-- This is a comment -->

8.1.3.4. Names and data

All names of elements, attributes, and sections must begin with a Unicode letter or an underscore and consist of letters, digits, periods (.), underscores (_), and hyphens (-). The only restriction is that they must not begin with the letters xml in any combination of upper and lower case; such names are reserved for future language extensions. It is important that the standard allows names to contain not only English letters but letters of any other alphabet as well, although existing XML processors are often limited to the encodings that their creators built into them. That is why the names in our examples are written in English.

Data, that is, element contents and attribute values, can consist of any characters except those listed in the next section.

8.1.3.5. Special symbols

A number of characters in XML are reserved and must be represented in a special way:

&lt;	the less-than sign (<)
&gt;	the greater-than sign (>)
&amp;	the ampersand (&)
&quot;	the double quotation mark (")
&apos;	the apostrophe (')

If desired, you can use numeric character references based on the Unicode standard. In this case, a character can be specified by its decimal code (&#code;) or its hexadecimal code (&#xcode;). For example, &#169; represents the copyright symbol ©, and &#x410; represents the Russian letter А. As we will see later, XML is much richer than HTML in the use of such constructions, since it allows arbitrary character strings to be substituted into the text of documents.
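A quick way to see these references at work is to let a parser resolve them. The sketch below uses an arbitrary note element as a container; the application receives the actual characters, not the references.

```python
import xml.etree.ElementTree as ET

# Numeric references (decimal and hexadecimal) and the five predefined ones.
note = ET.fromstring(
    "<note>&#169; &#x410; &lt;tag&gt; &amp; &quot;text&quot; &apos;x&apos;</note>"
)
resolved = note.text
```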

8.1.3.6. CDATA Sections

Another way to include otherwise invalid characters in the content of XML elements is to use so-called CDATA sections (short for Character DATA). Suppose we want the content of the layout element to be a fragment of HTML text, for example:

<layout><H1>Heading</H1></layout>

This construction does not do what we want, because in this case the H1 HTML tag will be treated as an XML tag. For the entire content of the layout element to be treated as data, we must enclose it in a CDATA section:

<layout><![CDATA[<H1>Heading</H1>]]></layout>

As this example shows, a CDATA section is enclosed in the delimiters "<![CDATA[" and "]]>". Everything inside the section is considered character data; in particular, CDATA sections cannot be nested.
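The difference between the two forms can be checked with a parser. In this sketch the layout element follows the text above; without CDATA the H1 markup becomes a child element, while with CDATA it arrives as plain character data.

```python
import xml.etree.ElementTree as ET

# Without CDATA the H1 tag is parsed as a nested XML element.
plain = ET.fromstring("<layout><H1>Heading</H1></layout>")

# With CDATA the same characters are delivered as the text of <layout>.
wrapped = ET.fromstring("<layout><![CDATA[<H1>Heading</H1>]]></layout>")
```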

8.1.4. Sections and their declarations

8.1.4.1. XML Document Sections

Physically, an XML document can consist of several sections (entities). The root element of the document is also a section, called the document section (document entity), although it is not marked in any special way. All sections have content; all of them, except the document section and the external DTD, have a name.

From the point of view of parsing, sections are divided into parsed and unparsed. An unparsed section (unparsed entity) is a resource whose content the XML processor treats as external data and does not parse (for example, text that is not an XML document). Unparsed sections always have a notation indicating their format. Parsed sections (parsed entities) are intended for text substitution: whenever the XML processor encounters a reference to such a section in a document, it replaces it with the content of that section.

8.1.4.2. Internal sections

Section declarations are divided into internal and external. An internal section declaration looks like this:

<!ENTITY name "value">

It includes the content of the section (the value parameter) and is used to substitute this value for the section name. We can, for example, add a genre attribute to the books example and use internal sections to set the genre:

]> Part of speech Brodsky, Joseph March of the Lonely Dovlatov, Sergey

This example shows that a reference to a section (entity reference) looks exactly like a special character reference, i.e. it has the form &name;. In fact, special characters are exactly the same kind of reference, but the corresponding sections are declared implicitly by the XML language itself. Such text substitutions are useful for defining abbreviations that reduce the size of a document and for introducing names for frequently changed fields. For example, we can put the date of the latest revision of a publication in an internal section and then change only the value of that section.
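The substitution mechanism can be tried out with a small document of our own. In this sketch the entity name prose and its value are hypothetical; they stand in for whatever genre names a real document would declare.

```python
import xml.etree.ElementTree as ET

# "prose" is a hypothetical internal section (entity) declared in the DOCTYPE.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE books [
  <!ENTITY prose "Prose">
]>
<books>
  <book genre="&prose;"><title>March of the Lonely</title></book>
</books>"""

root = ET.fromstring(DOCUMENT)

# The parser replaces the reference &prose; with the declared value.
genre = root.find("book").get("genre")
```

Changing the value in the single ENTITY declaration changes every place where the reference is used.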

8.1.4.3. External sections

There are two forms of external section declaration:

<!ENTITY name SYSTEM "URI">
<!ENTITY name PUBLIC "public_ID" "URI">

The first form is called a system section, the second a public section. Both associate the section name with an external resource specified by its URI, which must be in encoded form and must not contain a fragment identifier. The URI of the external resource is called the system ID of the section. How the external resource is used depends on several factors:

If the declaration contains an NDATA parameter specifying the section's notation, then the section is unparsed.
If the NDATA parameter is not specified, then the section is parsed and the corresponding resource must be an XML document. This means that the text of the corresponding resource will be included in the document in place of the reference to the section.
A public section declaration contains a string specifying the public ID of the section. An XML processor can use this identifier to generate an alternative URI for the section. If it fails to do so, it must use the system ID to load the content of the section.

Examples of external resource declarations:

A parsed external section should begin with a text declaration (a directive of the form <?xml ...?>), which may omit the version number but must specify the character encoding. This directive is not part of the substituted text.

8.1.5. Document type declaration

An XML document type declaration contains a document type definition (DTD) or points to one. A DTD is a special grammar that describes the syntax of a certain class of documents; the rules for creating DTDs are discussed in Chapter 8.2. Here we describe only the declarations that provide access to the DTD. A document type declaration, like a section declaration, can be internal or external. The internal declaration looks like this:

<!DOCTYPE name [ declarations ]>

and the external declaration has the same two forms as external sections:

<!DOCTYPE name SYSTEM "URI">
<!DOCTYPE name PUBLIC "public_ID" "URI">

Thus, the difference between a document type declaration and a section declaration is only that:

it starts with the keyword !DOCTYPE, not !ENTITY;
it may have a body enclosed in square brackets.

The name in such a declaration must match the name of the root element that it describes, and the body must follow the rules of DTD construction, which will be described in Chapter 8.2. For now, note that it may contain section declarations. An example of an internal declaration was given in Section 8.1.4.2. Examples of external declarations:

Note that an external document type declaration may contain both a reference to a DTD, which is called the external DTD subset, and a body that describes additions to the external DTD (it is called the internal DTD subset).

8.1.6. Example XML Document

To put all the concepts described above into a single whole, here is an example of a complete XML document containing a bookstore price list.

]> March of the Doomed Sergey Dovlatov 60.00 Part of speech Joseph Brodsky 55.00 Antigone Sophocles 103.50

SGML (Standard Generalized Markup Language) has established itself as a flexible and comprehensive meta-language for creating markup languages. Although the concept of hypertext dates back to 1965, SGML has no hypertext model. The creation of SGML can confidently be called an attempt to embrace the boundless, since it combines capabilities that are very rarely used all together. This is its main drawback: the complexity and, as a result, the high cost of the language restrict its use to large companies that can afford to buy the appropriate software and hire highly paid specialists. In addition, small companies rarely have problems complex enough to call for SGML.

SGML is most widely used to create other markup languages; it was with its help that the hypertext document markup language HTML was created, the specification of which was approved in 1992. Its appearance was driven by the need to organize the rapidly growing body of documents on the Internet. The rapid growth in the number of Internet connections and, accordingly, of web servers created a need for a way of encoding electronic documents that SGML could not meet because it was too difficult to master. The advent of HTML, a very simple markup language, quickly solved this problem: ease of learning and a wealth of document preparation tools made it the most popular language among Internet users. But as the number and quality of documents on the Web grew, so did the requirements placed on them, and the simplicity of HTML became its main drawback. The limited number of tags and complete indifference to the structure of the document prompted developers, represented by the W3C consortium, to create a markup language that would be neither as complex as SGML nor as primitive as HTML. The result was XML, a language that combines the simplicity of HTML with the markup logic of SGML and meets the demands of the Internet.

Well-formed and valid XML documents

The standard defines two levels of correctness for an XML document:

Well-formed. A well-formed document follows all the general rules of XML syntax that apply to any XML document. If, for example, a start tag has no corresponding end tag, the document is not well-formed. A document that is not well-formed cannot be considered an XML document; the XML processor (parser) must not process it normally and must treat the situation as a fatal error.

Valid. A valid document additionally conforms to certain semantic rules. This is a stricter, additional check of the document against predefined external rules, for example rules on the structure and composition of a specific document or family of documents, intended to minimize the number of errors. These rules can be developed either by the user or by third parties, for example the developers of vocabularies or data exchange standards. Such rules are typically stored in special files called schemas, which describe in detail the structure of the document, all permitted element and attribute names, and much more. If a document contains, for example, an element name that is not defined in the schemas, the XML document is considered invalid; when checking for compliance with rules and schemas, a validating XML processor (validator) must report the error (at the user's option).

These two concepts have no well-established standard translation into Russian, especially valid, which can also be rendered as legitimate, reliable, fit, or even checked for compliance with rules, standards and laws. In everyday speech, some programmers simply use a calque of “valid”.

XML syntax

This section discusses only the well-formedness of XML documents, that is, their syntax.

Let's look at an example of a simple recipe marked up using XML:

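<recipe>
  <title>Simple bread</title>
  <composition>
    <ingredient>Flour</ingredient>
    <ingredient>Yeast</ingredient>
    <ingredient>Warm water</ingredient>
    <ingredient>Salt</ingredient>
  </composition>
  <instructions>
    <step>Mix all ingredients and knead thoroughly.</step>
    <step>Cover with a cloth and leave for one hour in a warm room.</step>
    <step>Knead again, place on a baking sheet and put in the oven.</step>
  </instructions>
</recipe>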

XML declaration

The first line of an XML document is called the XML declaration; it is a line indicating the XML version. In version 1.0 the XML declaration may be omitted; in version 1.1 it is required. The character encoding and the presence of external dependencies can also be indicated here.

The specification requires XML processors to support the Unicode encodings UTF-8 and UTF-16 (UTF-32 is optional). Other encodings based on the ISO/IEC 8859 standard are recognized as acceptable, are supported and are widely used (but are not required); still other encodings are also acceptable, for example the Russian Windows-1251 and KOI-8. Tags often contain no non-Latin letters; in that case UTF-8 is a very convenient encoding: the document is usually smaller than in UTF-16; decoding can be performed either for the whole document or for individual attributes and texts; and an attempt to parse the document with the wrong encoding does not produce prohibited characters.

Root element

The most important mandatory syntactic requirement is that the document has exactly one root element (also sometimes called the document element). This means that the text or other data of the entire document must be located between a single root start tag and its corresponding end tag.

The following simplest example is a well-formed XML document:

The following example is not a well-formed XML document because it has two root elements:

<thing>Entity #1</thing>
<thing>Entity #2</thing>
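The single-root rule can be confirmed with a parser. In the sketch below, the element names thing and things and the helper parse_error are assumptions made for the illustration: two sibling elements at the top level are rejected, while the same elements wrapped in one root are accepted.

```python
import xml.etree.ElementTree as ET


def parse_error(text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


# Two top-level elements: the parser reports a fatal error.
two_roots = parse_error("<thing>Entity #1</thing><thing>Entity #2</thing>")

# The same elements inside a single root element parse without error.
one_root = parse_error(
    "<things><thing>Entity #1</thing><thing>Entity #2</thing></things>"
)
```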

Comments

A comment can be placed anywhere in the tree. XML comments are placed inside a special tag that begins with the characters <!-- and ends with the characters -->. Two hyphens (--) cannot appear within a comment.

Tags inside a comment should not be processed.
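This behavior can be observed with Python's default ElementTree parser, which discards comments. In the sketch below the steps and step element names and their contents are made up; the step hidden inside the comment never reaches the application.

```python
import xml.etree.ElementTree as ET

# The middle <step> is commented out, so it is not processed as an element.
root = ET.fromstring(
    "<steps>"
    "<step>Mix</step>"
    "<!-- <step>Read the paper</step> -->"
    "<step>Bake</step>"
    "</steps>"
)

steps = [step.text for step in root.findall("step")]
```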

Advantages and disadvantages

Advantages

Disadvantages

Modeling ambiguity.

XML has no data type support built into the language. It has no strong typing, that is, no notion of “integers”, “strings”, “dates”, “booleans”, etc.
The hierarchical data model offered by XML is limited compared with the relational model and with object-oriented graph and network data models.

Displaying XML on the World Wide Web

The three most common ways to convert an XML document into a user-displayable form are:

Applying CSS styles;
Application of XSL;
Writing an XML document handler in any programming language.

To specify an XSL transformation (XSLT) on the client side, the XML document must contain an xml-stylesheet processing instruction of the following form:

<?xml-stylesheet type="text/xsl" href="..."?>

XML Dictionaries

Since XML is a fairly abstract language, XML vocabularies (dictionaries) have been developed on top of it.

A vocabulary allows developers to agree on a finite set of tag names and on the attributes of those tags. One of the first vocabularies is XHTML, which is understood by most browsers. XHTML is often used to store and edit content in a CMS.

More specialized vocabularies have also been created, such as the SOAP data transfer protocol, which is not human-friendly and is quite difficult to read. There are commercial vocabularies such as CommerceML, xCBL and cXML that are used to transfer trade-oriented data; these vocabularies include descriptions of ordering systems, suppliers, products and more.

Usually, when describing a document, a person comes up with a vocabulary of their own, which is then described using a DTD or XSD, or simply explained informally to the interested parties.

One vocabulary that has become widespread is FB2, which describes the format of a book, with all kinds of footnotes, quotations and even pictures.

XML versions

XML 1.0
XML 1.1

Description of the structure of XML documents.

Each XML document carries information about the data and its structure (metadata description).

XML documents can be of two types:

1. documents created taking into account logical and structural rules;

2. documents that do not use any rules other than syntactic rules for the design of XML documents.

Documents of the first type are checked for compliance with specified rules by an XML processor. The second type of document is checked by the developer.

When creating a document of the first type, a description of its structure can be performed using languages such as Document Type Definitions (DTD), XML Schema, RELAX NG, XML Data-Reduced, etc. The most widely used languages are DTD and XML Schema.

The following analyzes the strengths and weaknesses of the most common structure description languages and provides a summary of their fundamentals. Since this tutorial is devoted to the problems of information systems integration, when considering structure description languages, the main attention will be paid to the issues of modularity and reuse of schemes.

XML Schema Definition (XSD) language.

XML Schema Definition (XSD) is based on XML and is more capable of describing document structure than DTD. It supports data typing, namespaces, regular expressions.

An XML Schema contains a description of the elements and attributes of an XML document, rules for inheritance of elements, including the order and number of children, the content type of elements, the data types of elements and attributes, the values of elements and attributes, and additional restrictions on values. In addition, the use of XML Schema makes it possible to transform an XML document into a hierarchy of objects of specific types that can be accessed programmatically through an interface (PSVI, Post-Schema-Validation Infoset, functionality).

The main advantage of the XML Schema language is its support for strongly typed data. When exchanging data between different applications and databases, the task of agreeing on data types always remains relevant, since the definitions of data types may differ in different systems. These differences include: maximum and minimum possible values, maximum length, support for fractional numbers, internal encoding, and external format (for example, for date and time). Thus, despite the possible overlap in the names of data types, their implementation in different products may differ. The use of data types in schemas allows for the necessary verification of document data when exchanging or sharing data among multiple systems.

This tutorial is not a detailed guide to the XML Schema language, so here we will limit ourselves to only the basic information about the XSD language that is necessary to understand the subsequent material.

An XML Schema is normally created in a separate file with the .xsd extension. An XML file is associated with the corresponding schema using the schemaLocation attribute, which belongs to the XML Schema instance namespace (http://www.w3.org/2001/XMLSchema-instance). To use the schemaLocation attribute, you must declare this namespace. All of these definitions are specified in the root element of the XML document.
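The way a document points to its schema can be sketched as follows. The target namespace http://example.com/books and the file name books.xsd are hypothetical; only the XMLSchema-instance namespace URI is fixed by the standard. The schemaLocation value is a whitespace-separated pair: a namespace followed by the location of the schema for it.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The books namespace and the books.xsd file name are made up for this sketch.
DOCUMENT = """<books xmlns="http://example.com/books"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://example.com/books books.xsd"/>"""

root = ET.fromstring(DOCUMENT)

# ElementTree exposes namespaced attributes under the key "{namespace}name".
namespace, location = root.get("{%s}schemaLocation" % XSI).split()
```

Note that ElementTree only reads the attribute here; it does not load the schema or validate the document against it.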

Let's look at the main elements of the XML Schema structure.

The root element is always the <schema> element. The attributes of this element are described in Table 2.10.

The <schema> root element may contain the following child elements:

1. <element> - used to define elements of an XML document;

2. <attribute> - used to define the attributes of an XML document;

3. <group> - needed to define a group of elements intended for reuse within the schema by reference to the group name;

4. <attributeGroup> - used to define a group of attributes;

5. <annotation> - allows documentation to be included in the schema;

6. <import> - allows components of the specified external schema to be used in the main schema (provides schema modularity);

7. <include> - adds all components of the specified external schema to the main schema (provides schema modularity);

8. <notation> - contains a definition of the notation that describes the format of non-XML data in an XML document;

9. <redefine> - overrides components of an external schema that has the same namespace as the main schema;

10. <simpleType> - declares a simple content type for an element. Elements with a simple data type can contain only character data and cannot include attributes or child elements;

11. <complexType> - declares a complex content type for an element, which can include attributes and other elements.

XML Schema supports three main categories of data types:

1. predefined primitive types - fundamental data types that can be referenced and applied to elements and attributes. Examples of primitive data types are string, float, double, time, date, decimal, anyURI;

2. predefined derived types - built-in types derived from primitive types. Examples of derived data types are integer, long, byte, short, nonPositiveInteger, nonNegativeInteger, ID, etc.;

3. user-defined (non-standard) types - data types that are created from primitive or derived types by introducing additional restrictions. Support for user-defined data types is extremely useful for verifying data against business logic.

To describe elements and attributes that have predefined (primitive and derived) data types, XML Schema uses the following syntactic constructs.

Additionally, the fixed and default attributes can be used in the declaration of an element or attribute to give it a fixed value or a default value.
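Declarations with predefined types, default values and fixed values might look like the following sketch. The element and attribute names and their values are assumptions made for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- element with a predefined primitive type -->
    <xs:element name="title" type="xs:string"/>

    <!-- element with a predefined derived type and a default value -->
    <xs:element name="quantity" type="xs:nonNegativeInteger" default="1"/>

    <!-- element whose value, if present, must be exactly "RUB" -->
    <xs:element name="currency" type="xs:string" fixed="RUB"/>

    <!-- attribute declarations follow the same name/type pattern -->
    <xs:attribute name="lang" type="xs:language" default="en"/>
</xs:schema>
```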

If you need to describe a non-standard data type for an element or attribute, this is done with the simpleType tag, inside which the new data type is described.

New non-standard simple data types are obtained by:

1. narrowing (restriction) of a built-in or previously defined simple type by specifying additional restrictions;

2. union of simple types;

3. using a list of simple types.

An example of using a new simple data type obtained by narrowing a predefined type (the base type string is restricted by a maximum and minimum allowed string length):
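Such a restriction might look like the following sketch. The element name login and the limits 3 and 20 are assumptions chosen for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- "login" and the length limits are hypothetical -->
    <xs:element name="login">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:minLength value="3"/>
                <xs:maxLength value="20"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>
</xs:schema>
```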

An example of using a new simple data type obtained by combining base types (an element or attribute can take non-negative or non-positive integer values):
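A union of this kind could be written roughly as follows. The element name offset is hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- "offset" is a hypothetical element name -->
    <xs:element name="offset">
        <xs:simpleType>
            <!-- a value is valid if it matches any one of the member types -->
            <xs:union memberTypes="xs:nonNegativeInteger xs:nonPositiveInteger"/>
        </xs:simpleType>
    </xs:element>
</xs:schema>
```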

An example of using a list of simple types (the shoeSizes attribute is declared as a list containing the decimal values 10.5, 9, 8 and 11):
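A list type for the shoeSizes attribute might be declared as in this sketch. The type name SizeList and the element name shoes are assumptions; only shoeSizes and the decimal item type come from the text.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- a list type: whitespace-separated decimal values -->
    <xs:simpleType name="SizeList">
        <xs:list itemType="xs:decimal"/>
    </xs:simpleType>

    <!-- "shoes" is a hypothetical element that carries the attribute -->
    <xs:element name="shoes">
        <xs:complexType>
            <xs:attribute name="shoeSizes" type="SizeList"/>
        </xs:complexType>
    </xs:element>

    <!-- a matching instance would be: <shoes shoeSizes="10.5 9 8 11"/> -->
</xs:schema>
```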

The XML Schema language uses various types of data restrictions (see Table 2.8):

1. length restrictions (number of characters);

2. value boundaries (the largest and smallest values as a range or threshold);

3. restrictions on the number of digits of a decimal number (total number of digits or number of digits after the decimal point);

4. list of acceptable values;

5. patterns (regular expressions);

6. handling of whitespace characters.

Examples of using various restrictions are given in Table 2.11.
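A few of these restrictions can be sketched in schema form as follows. All type names and values here are invented for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- value boundaries: an integer from 0 to 100 -->
    <xs:simpleType name="PercentType">
        <xs:restriction base="xs:integer">
            <xs:minInclusive value="0"/>
            <xs:maxInclusive value="100"/>
        </xs:restriction>
    </xs:simpleType>

    <!-- digits: at most 10 digits in total, at most 2 after the point -->
    <xs:simpleType name="MoneyType">
        <xs:restriction base="xs:decimal">
            <xs:totalDigits value="10"/>
            <xs:fractionDigits value="2"/>
        </xs:restriction>
    </xs:simpleType>

    <!-- list of acceptable values -->
    <xs:simpleType name="CurrencyType">
        <xs:restriction base="xs:string">
            <xs:enumeration value="Rubles"/>
            <xs:enumeration value="Dollars"/>
        </xs:restriction>
    </xs:simpleType>

    <!-- pattern plus whitespace handling: exactly six digits -->
    <xs:simpleType name="PostalCodeType">
        <xs:restriction base="xs:string">
            <xs:whiteSpace value="collapse"/>
            <xs:pattern value="[0-9]{6}"/>
        </xs:restriction>
    </xs:simpleType>
</xs:schema>
```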

Elements that have a simple type or predefined standard types can only contain data (cannot contain attributes or child elements).

Any simple data type can contain an arbitrary set of restrictions, which are determined by the business logic of the application working with the data.

If a simple data type is given a name, then a reference to a new non-standard data type can be used repeatedly within a given schema (similar to a reference to predefined data types).

In this example, a non-standard data type with the name “Code” is defined, based on the “string” type: it is used as the data type for the elements “Code1” and “Code2”.
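A definition of this kind might look like the following sketch. The length restriction of 8 characters is an assumption; the names Code, Code1 and Code2 come from the text.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- named simple type based on string -->
    <xs:simpleType name="Code">
        <xs:restriction base="xs:string">
            <!-- hypothetical restriction: exactly 8 characters -->
            <xs:length value="8"/>
        </xs:restriction>
    </xs:simpleType>

    <!-- the named type is reused by reference, like a predefined type -->
    <xs:element name="Code1" type="Code"/>
    <xs:element name="Code2" type="Code"/>
</xs:schema>
```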

To describe XML document elements that contain child elements and attributes, the schema uses a complex data type, which is specified using the complexType tag.

When describing a complex type, you specify the order in which child elements occur (using special tags called order indicators, see Table 2.11) and how many times an element may repeat (using the minOccurs and maxOccurs attributes).

The minOccurs attribute defines the minimum cardinality, that is, the smallest allowed number of occurrences of a child element. A minOccurs value of zero means that the element is optional.

The maxOccurs attribute defines the maximum cardinality, that is, the greatest allowed number of occurrences of an element. Both values are given as non-negative integers, and both default to 1 when omitted; maxOccurs can also be set to unbounded (the element may occur any number of times).

This example describes a complex data type for the “Book” element, which contains the child elements “Title”, “Author”, “Code” and “Price”. The sequence tag is an indicator of the order of occurrence of child elements (Table 2.12), and the maxOccurs attribute shows the maximum allowed number of repetitions of the “Author” element.

The choice order indicator specifies that an element of the Price type can contain either a Rubles element or a Dollars element, but not both.
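A complex type for Book with a choice inside Price might be sketched as follows. The type name PriceType, the maxOccurs value of 3, the optional Code and the decimal price types are assumptions; the element names come from the text.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- choice: exactly one of the listed child elements may appear -->
    <xs:complexType name="PriceType">
        <xs:choice>
            <xs:element name="Rubles" type="xs:decimal"/>
            <xs:element name="Dollars" type="xs:decimal"/>
        </xs:choice>
    </xs:complexType>

    <xs:element name="Book">
        <xs:complexType>
            <!-- sequence: child elements must appear in this order -->
            <xs:sequence>
                <xs:element name="Title" type="xs:string"/>
                <!-- one to three authors (the limit 3 is hypothetical) -->
                <xs:element name="Author" type="xs:string" maxOccurs="3"/>
                <!-- minOccurs="0" makes Code optional (an assumption) -->
                <xs:element name="Code" type="xs:string" minOccurs="0"/>
                <xs:element name="Price" type="PriceType"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
```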


Today we will begin looking at XML, a very popular and convenient markup language. This data format is very flexible and universal and can be used almost anywhere, and that alone says something about it. A novice programmer will therefore have to deal with this language sooner or later, whether you do web programming or database administration: everyone uses XML, and you will use it for your own tasks as well.

We will start, as usual, with theory: what kind of language it is, why it is good, how to use it and where it is used.

XML Language Definition

XML (eXtensible Markup Language) is a universal and extensible data markup language that is independent of the operating system and processing environment. XML is used to present data in a structured form, and you can design this structure yourself or tailor it to a particular program or service. That is why the language is called extensible, and this is its main advantage, for which it is so valued.

As you know, there are quite a lot of markup languages, HTML for example, but all of them depend on a processor in one way or another. HTML, whose code is parsed by the browser, is standardized and not extensible: it has a fixed set of tags and a syntax that cannot be violated. In XML you can create your own tags, i.e. your own markup. The main difference between HTML and XML is that HTML only describes markup for displaying data, while XML is an abstract data structure that can be processed and displayed however and wherever you like. There is therefore no point in comparing these languages; they have completely different purposes.

As noted above, XML is a very common and universal language. Almost all applications, both web and desktop, use it to exchange information, because it makes it easy to pass data between applications or services, even ones written in different languages. For this reason every novice programmer, whatever kind of programming they do, should have an understanding of XML. If you want to become a web master, you simply must know XML; we have already discussed how to become a WEB Master and what you need to know for this.

For example, I once had to write a service that returns data in XML form on request, i.e. to develop the server part of an application. I had no idea what language the client processing this data would be written in; I simply wrote a service that returned the data as XML, and the application worked perfectly. This is just one example from my own experience. Now imagine how many organizations collaborate, jointly develop software and exchange data; I would not be surprised if that data is in XML form.

I also once had to store XML data in an MS SQL 2008 database, in order to represent the data better and to exchange it between the server and client parts of an application; we discussed this in the article - Transact-sql - working with xml.

The XML language itself is very simple, and it is hard to get confused in it. All the complexity arises in processing XML and in its interaction with other applications and technologies, i.e. in everything that surrounds XML; that is where you can easily get lost.

Today we are talking only about the basics of XML. We will not cover the technologies for processing and interacting with this language, since that is truly a very large topic, but I think we will get acquainted with the related technologies in the future.

Let's move on to practice. I will write all the examples in Notepad++ simply because it is very convenient, but we will not dwell on this now, since we have already discussed it in the article - Why Notepad++ is good for a novice developer.

XML tags

XML uses tags (tag names are case sensitive), but unlike HTML these are not predefined: you come up with them yourself. An XML document still has a strict structure: every opening tag has a closing tag, tags can be nested, and values are placed inside the tags. For a basic knowledge of XML, following these rules is all you need. An opening tag, its closing tag and the value between them are together called an element, and the whole XML document consists of elements that together form a data structure. An XML document can have only one root element; remember this, because a document with two root elements is an error.

It is time for an example of XML markup. The first one only illustrates the syntax:

<Start of element>
    <Start of nested element>Nested element value</End of nested element>
</End of element>

As you can see, everything is quite simple, and there can be a lot of such elements nested within each other.

Now let's give an example of a real xml document:
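A sketch of such a document, assuming the catalog, book, name, author and comment elements that this article describes, might look like this:

```xml
<catalog>
    <book>
        <name>Book 1</name>
        <author>Ivan</author>
        <comment>Just book 1</comment>
    </book>
    <book>
        <name>Book 2</name>
        <author>Sergey</author>
        <comment>Just book 2</comment>
    </book>
    <book>
        <name>Book 3</name>
        <author>Roman</author>
        <comment>Just book 3</comment>
    </book>
</catalog>
```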

As you can see, this is simply a kind of book catalog. However, I did not declare the document, i.e. I did not write an XML declaration, which tells the application processing the data that this is XML and which encoding it uses. You can also write comments and attributes, so here is an example of such a document:

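<?xml version="1.0" encoding="UTF-8"?>
<!-- Book catalog (the attribute names and values below are illustrative) -->
<catalog name="Books">
    <book id="1">
        <name>Book 1</name>
        <author>Ivan</author>
        <comment>Just book 1</comment>
    </book>
    <book id="2">
        <name>Book 2</name>
        <author>Sergey</author>
        <comment>Just book 2</comment>
    </book>
    <book id="3">
        <name>Book 3</name>
        <author>Roman</author>
        <comment>Just book 3</comment>
    </book>
</catalog>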

Here the first line is the declaration, which states that this is an XML document and that it must be read in UTF-8 encoding.

Without any processing, this data will look as follows in a browser (for example, Mozilla Firefox):

I hope you understand that catalog here is the root element, which consists of book elements, which in turn consist of the name, author and comment elements. For the sake of the example, I also set several attributes on the catalog element and the book element.

I think that is enough for the basics, because if we dive deeper and deeper into XML and all the technologies associated with it, this article will never end. So that's all for today. Bye!