Document markup languages. HTML. Logical and visual markup

A markup language is a set of special instructions, called tags, designed to form a structure in documents and define the relationships between the various elements of this structure. In other words, markup shows which part of the document is a heading, which is a subtitle, what should be considered the name of the author, and so on. Markup is divided into stylistic, structural, and semantic markup.

Stylistic markup

Stylistic markup is responsible for the appearance of the document. For example, in HTML this type of markup includes tags such as <I> (italics), <B> (bold), <U> (underline), <S> (strikethrough text), etc.

Structural markup

Structural markup defines the structure of the document. In HTML, for example, tags such as <P> (paragraph), <H1> (heading), <DIV> (section), etc. are responsible for this type of markup.

Semantic markup

Semantic markup conveys the meaning of the data. Examples of this type of markup are the tags <TITLE> (document name), <CODE> (code, used for program listings), <VAR> (variable), and <ADDRESS> (author's address).

The basic concepts of any markup language are tags, elements, and attributes.

Tags and elements

The terms "tag" and "element" are often confused.

Tags, or control descriptors as they are also called, serve as instructions telling the program that displays the document on the client side what to do with the tag's contents. To set a tag apart from the main content of the document, angle brackets are used: a tag begins with a less-than sign (<) and ends with a greater-than sign (>), and the name of the instruction and its parameters are placed between them. For example, in HTML the <I> tag indicates that the text that follows should be in italics.

An element is a pair of tags together with their content. The following construction is an example of an element:

<I>This text is in italics</I>

The element consists of an opening tag (in our example, <I>), the tag content (here, the text "This text is in italics"), and a closing tag (</I>), although in HTML the closing tag can sometimes be omitted.
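To make the three parts of an element concrete, here is a minimal sketch using Python's standard html.parser module, which reports the start tag, the content, and the end tag as separate events. The class and function names (ElementEvents, split_element) are illustrative, not part of any library; note that the parser reports tag names in lowercase.

```python
from html.parser import HTMLParser

class ElementEvents(HTMLParser):
    """Logs the parts of each element in the order the parser meets them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start tag", tag))

    def handle_data(self, data):
        self.events.append(("content", data))

    def handle_endtag(self, tag):
        self.events.append(("end tag", tag))

def split_element(markup):
    """Return the (kind, value) events produced for `markup`."""
    parser = ElementEvents()
    parser.feed(markup)
    parser.close()
    return parser.events

for event in split_element("<I>This text is in italics</I>"):
    print(event)
```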

Attributes

Attributes are used to set parameters that refine the characteristics of an element when the element is defined.

An attribute is a name=value pair specified in the start tag of an element. Spaces are allowed on either side of the equals sign. The attribute value is a string enclosed in single or double quotes.

A tag can have a given attribute only if that attribute is defined for it.

When the attribute is used, the element takes the following form:

<tag attribute="value">tag content</tag>

For example:

<P align="center">Text is aligned to the center</P>

One opening tag can contain multiple attributes, for example:

<FONT size="5" color="red">Specified text size and color</FONT>
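The attribute rules above can be observed with Python's standard html.parser module. The sketch below, with the hypothetical names AttributeCollector and collect_attributes, records each start tag together with its attributes. It accepts both quote styles and spaces around the equals sign and, being an HTML parser, lowercases tag and attribute names.

```python
from html.parser import HTMLParser

class AttributeCollector(HTMLParser):
    """Records every start tag as (tag, {attribute name: value})."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs with quotes removed
        self.tags.append((tag, dict(attrs)))

def collect_attributes(markup):
    parser = AttributeCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags

# Two attributes in one start tag, mixed quote styles, spaces around "=".
sample = """<FONT size="5" color = 'red'>Specified text size and color</FONT>"""
print(collect_attributes(sample))
```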

History of the development of markup languages

The concept of hypertext was introduced by Vannevar Bush in 1945, and from the 1960s on, the first applications using hypertext data began to appear. However, the technology developed mainly once a real need arose for a mechanism that combines multiple information resources and makes it possible to create and view non-linear text.

In 1986, ISO approved the Standard Generalized Markup Language (SGML). This language is intended for creating other markup languages; it defines the valid set of tags, their attributes, and the internal structure of a document. This makes it possible to create your own tags related to the content of the document. Clearly, such documents are difficult to interpret without the definition of their markup language, which is stored in the Document Type Definition (DTD). The DTD collects all the rules of a language defined with SGML; in other words, it describes how tags relate to each other and the rules for using them. Moreover, each class of documents has its own set of rules describing the grammar of the corresponding markup language. Thus, only a DTD makes it possible to verify that tags are used correctly, and therefore it must be sent along with the SGML document or included in it.

At that time there were several other similar languages besides SGML competing with each other, but the popularity of HTML, one of its descendants, gave SGML an undeniable advantage over its counterparts.

Using SGML, you can describe structured data, organize information contained in documents, and present this information in some standardized format. But because of its complexity, SGML was used primarily to describe the syntax of other languages, and few applications worked with SGML documents directly. SGML is usually used only in large projects, for example, to create a unified document management system for a large company.

HTML is much simpler and more convenient than SGML; its instructions are primarily intended to control how document content is displayed on the screen. Tim Berners-Lee created HTML in 1991 as a way to mark up technical documents, specifically for the scientific community. It was originally just one of the SGML applications.

Despite the fact that the only thing HTML can do is classify parts of a document and ensure its correct display in the browser, it is the most popular markup language. This is because HTML is quite easy to learn. All you have to do is learn the HTML commands. The DTD for HTML is stored in the browser. In addition, it should be noted that HTML is designed to work on a wide variety of platforms. But it has a number of significant limitations:

  • HTML has a fixed set of tags, and this set cannot be expanded or changed;
  • HTML language tags show only how the data should be presented, that is, the appearance of the document. HTML does not carry information about the meaning of the content contained in the tags or the structure of the document.
In early February 1998, the international organization W3C approved the Extensible Markup Language (XML) 1.0 specification, which marked the beginning of many new markup languages for transmitting information over the Internet based on the XML standard. In essence, this was a new step in the development of hypertext markup languages. Over the four years of its existence, XML has attracted a great deal of attention from ordinary users and web designers alike and has become an integral part of the Internet. Today there are practically no servers that do not use this technology to some degree as an analogue of HTML. However, it is still premature to say that XML is becoming the main method of transmitting hypertext over the global network. The language is still quite young, and some of its elements are still under development. So far, only a general framework has been created for what may one day replace HTML, and it is impossible to say yet what specific form it will take.

    From the beginning

    In November 1990, when Internet users first heard about a new technology whose name fits into just three letters, hardly anyone imagined that very soon it would become practically the only way to transmit information on the global network. Today many inexperienced users strongly associate the word Internet with the WWW, although the two, while closely related, are not the same thing.

    By and large, it was the enormous popularity of the World Wide Web, and of HTML as its integral part, that drew such intense attention to the hypertext markup of documents.

    The concept of hypertext was first introduced by V. Bush back in 1945. However, real applications using such data structures began to be used only from the 60s, and a truly extraordinary surge of activity around this technology began only when there was a real need for a mechanism for combining multiple information resources, providing the ability to create and view non-linear text. And an example of the implementation of this mechanism was the very same WWW.

    The document markup language itself is a set of special instructions called tags (in some translated publications, tags are called labels), designed to create a structure in documents and define the relationships between the various elements of this structure. Markup tags, or control descriptors as they are sometimes called, are encoded in a very specific way that sets them apart from the main content of the document; they then serve as instructions for the program that interprets the document and displays its contents to the person viewing it. (By analogy with the Internet, that person is the client, and the interpreter program is, in the most common case, the browser.) Already in the very first systems, the symbols < and > were used to delimit these commands, with the names of instructions and their parameters placed between them. Today this way of writing tags is a generally accepted standard.

    The use of hypertext structuring for text documents in modern information systems is largely due to the fact that hypertext provides a mechanism for so-called nonlinear viewing of information. This means that data is presented not as a continuous stream of text but as a set of interconnected components that are navigated using hyperlinks.

    The most popular and well-known hypertext markup language today, HTML, was created specifically for structuring and transmitting information on the Internet and is undoubtedly a key component of WWW technology. With the hypertext document model, the way information resources are presented on the network has become more orderly, and users have gained a convenient mechanism for searching and viewing the information they need. However, the forerunner in this field is considered to be a much older language, SGML.

    SGML (Standard Generalized Markup Language) was officially adopted in 1986 as an international standard (ISO 8879:1986) for representing textual information in electronic form independently of devices and systems. It grew out of the older markup language GML (Generalized Markup Language), developed at IBM in the late 1960s. To be precise, SGML is a metalanguage designed to describe other markup languages.

    Originally, the word markup was used to describe annotations or other marks within a text that were intended to instruct the typesetter (or compositor, as they are sometimes called) exactly how a particular passage should be set. Examples include squiggly underlining to indicate italics, special symbols marking phrases to be omitted or printed in a particular font, and so on. As formatting and printing were automated over time, the term came to cover all kinds of special markup codes inserted into electronic text documents to control formatting, printing, or other processing.

    A markup language, then, is a set of markup conventions used to encode texts. It must specify what markup is allowed in a document, what markup is required, how markup is distinguished from plain text, and what the markup means. SGML provides the means to handle the first three; the last one requires an informal description.

    SGML, unlike many other markup systems, uses the principle of so-called descriptive markup instead of procedural markup. Such a system uses markup codes that simply provide names for assigning parts of a document to certain categories. In other words, tags such as <para> or \end{list} simply identify a portion of a document and state that "this portion is a paragraph" or "this is the end of a list", and so on. A system that uses procedural markup (this includes word processors such as Microsoft Word) determines what processing is to be performed at a specific point in a text: "call such-and-such a procedure here with parameters 5, e, and z" or "move the margin 7 mm to the right of some element, skip one line, indent the first line of the next one", and so on. In SGML, the instructions needed to process a document for a specific purpose (for example, formatting) are clearly separated from the descriptive markup inside the document. They are usually collected outside the document in separate procedures or programs.

    By using descriptive rather than procedural markup, the same document can be processed by different programs, each of which can apply its own processing instructions to those parts of it that it deems important. For example, a content parsing program might ignore footnotes entirely, while a formatting program might extract and assemble them for printing at the end of each part. Different kinds of processing instructions may be associated with the same part of the file. For example, one program might extract people's names and place names from a document to create an index or database, while another program processing the same text might print the names in a different font.
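A minimal sketch of this idea in Python, using the standard xml.etree.ElementTree module: the same descriptively marked-up document is handed to two hypothetical programs, one that ignores footnotes and one that collects them. The tag names chapter, para, and footnote are assumptions chosen for illustration; the markup says only what each part is, and each program decides for itself what to do with it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with purely descriptive markup: the tags say
# "this is a paragraph" or "this is a footnote", never how to print them.
DOC = """<chapter>
<para>Markup describes structure.<footnote>First note.</footnote></para>
<para>Processing is decided elsewhere.<footnote>Second note.</footnote></para>
</chapter>"""

def body_text(root):
    """Analysis-style program: keeps paragraph text, skips footnotes.

    para.text is the text before the first child, i.e. before the footnote.
    """
    return [para.text for para in root.iter("para")]

def collect_footnotes(root):
    """Formatting-style program: gathers footnotes for the end of the chapter."""
    return [note.text for note in root.iter("footnote")]

root = ET.fromstring(DOC)
print(body_text(root))
print(collect_footnotes(root))
```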

    SGML also introduces the concept of a document type and, accordingly, a means of defining it: the document type definition (DTD). Documents are regarded as having types, just like other objects processed by computers. A document's type is formally defined by its constituent parts and their structure. For example, one might define a document type "report" as consisting of a title and possibly an author's name, followed by an abstract and a sequence of one or more paragraphs. Under this formal definition, a document lacking a title is not a report, and neither is a sequence of paragraphs followed by an abstract, however report-like such a document may look to a human reader.

    Because documents are of known types, a special program called a parser can process a document that claims to be of a particular type and check whether all the elements required for that type are present, appear in the correct order, and are correctly structured. More importantly, different documents of the same type can be processed in a uniform way. Programs can exploit the knowledge encoded in a document's structure and can thus behave more intelligently.
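The following Python sketch approximates the check a parser performs for the "report" type described above. Python's standard library includes no DTD-validating parser, so instead of real SGML/XML validation, the content model (shown in a comment as the equivalent DTD declaration) is checked by matching the sequence of child element names against a regular expression. The element names and the is_report function are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The "report" type from the text, as a DTD content model would state it:
#   <!ELEMENT report (title, author?, abstract, para+)>
# Here the same rule is expressed as a regex over child element names.
REPORT_MODEL = re.compile(r"title (author )?abstract (para )+")

def is_report(xml_text):
    """True if the root is <report> and its children follow the model."""
    root = ET.fromstring(xml_text)
    if root.tag != "report":
        return False
    sequence = "".join(child.tag + " " for child in root)
    return REPORT_MODEL.fullmatch(sequence) is not None

good = ("<report><title>T</title><abstract>A</abstract>"
        "<para>P1</para><para>P2</para></report>")
no_title = "<report><abstract>A</abstract><para>P1</para></report>"
wrong_order = ("<report><title>T</title><para>P1</para>"
               "<abstract>A</abstract></report>")
print(is_report(good), is_report(no_title), is_report(wrong_order))
```

A real validating parser would also check the content of each child and report where the document deviates; the regex only answers yes or no for the top-level sequence.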

    SGML, as a metalanguage, allows the definition of specific languages (often called "SGML applications") that target specific uses. One example is HTML, widely used on the WWW. Each such language is described by a DTD that defines its elements and their attributes. Given such a DTD, SGML software can correctly process documents written according to it.

    Even at the design stage, HTML was conceived specifically to implement the model of information transfer over the global network that we have now. In other words, HTML is a product of the Internet, although it is in fact a simplified version of SGML (Standard Generalized Markup Language), which ISO approved as a standard back in the 1980s. SGML is not a language in the pure sense but rather a set of rules and descriptions for creating other languages; it defines the valid set of tags, their attributes, and the internal structure of the document. The correct use of descriptors is controlled by a special set of rules called the DTD, which the client interpreter program uses when parsing the document. For each class of documents, a separate set of rules describes the grammar of the corresponding markup language. Using SGML, you can organize the information contained in documents, describe structured data, and present this information in a standardized format for later use. However, because of its complexity, SGML was used mainly to describe the syntax of other languages (the best known of which is HTML), and few applications worked with SGML documents directly.

    HTML is much more convenient and easier to use than SGML. It does not allow other languages to be defined on its basis. Using HTML means marking up a document according to a standard that defines a fairly limited set of instructions, or tags. These instructions are intended primarily to control how the contents of a document are displayed on the screen of a client program; they determine how the document is presented, but not its overall structure. In most cases, HTML data is stored in a plain text file that can easily be transferred over the network using the HTTP protocol.

    However, as time passes and demands on popular technologies grow more stringent, modern applications need not only a language for presenting data on the client screen but also a mechanism for defining the structure of a document and describing the elements it contains. HTML has a simple set of commands and copes quite well with describing text and displaying it in a viewing program, the browser. However, the displayed data itself is in no way related to the tags used for formatting, so parsing programs cannot use HTML tags to find the document fragments we need. That is, on encountering, for example, a description such as

    <FONT color="red">rose</FONT>

    the browser will know what color to display the enclosed text in and will most likely display it correctly, but it is completely indifferent to where in the document this tag appears, which other tags enclose the current fragment, whether fragments are nested inside it, and whether the relationships between objects are built correctly. This indifference to document structure means that searching or analyzing information inside the document is no different from working with a continuous text file that is not divided into elements. And that, as you know, is not the most efficient way to work with information.

    Another significant drawback of the approach taken in HTML is its limited set of tags. The DTD rules for HTML define a fixed set of descriptors, so developers cannot introduce their own special tags. Although new versions of the language appear from time to time (today the latest version of HTML is HTML 4.0), the long road to standardization, accompanied by constant disagreements between the main browser vendors, makes it almost impossible to adapt the language quickly or to use it for displaying specialized information (for example, multimedia, mathematical or chemical formulas, etc.).

    To summarize, HTML today does not fully satisfy the requirements that modern developers place on languages of this kind. As a replacement, a new hypertext markup language was proposed: XML, which is powerful, flexible, and at the same time convenient.

    XML (Extensible Markup Language) is a markup language that describes an entire class of data objects called XML documents. It is used as a means to describe the grammar of other languages and to check the correctness of documents. That is, XML itself contains no tags intended for markup; it only defines the rules by which they are created. So if, for example, we decide that a <flower> tag should represent the rose element in a document, XML lets us freely use the tag we have defined, and we can include fragments like the following in the document:

    <flower>rose</flower>

    The set of tags can easily be extended. Suppose we also want to indicate that the description of the flower belongs inside the description of the greenhouse in which it grows. Then we simply define new tags and choose the order in which they appear:

    <greenhouse><flower>rose</flower></greenhouse>

    If we want to plant a few more flowers there, we make the following changes:

    <greenhouse>
    <flower>rose</flower>
    <flower>tulip</flower>
    <flower>cactus</flower>
    </greenhouse>
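Once the data is in XML, a program can work with it by structure rather than by raw text. A small sketch using Python's standard xml.etree.ElementTree module, assuming the greenhouse is marked up with <greenhouse> and <flower> tags; the extra orchid is purely illustrative.

```python
import xml.etree.ElementTree as ET

# The greenhouse fragment, held in a string for parsing (tag names assumed).
GREENHOUSE = """<greenhouse>
<flower>rose</flower>
<flower>tulip</flower>
<flower>cactus</flower>
</greenhouse>"""

root = ET.fromstring(GREENHOUSE)

# Because the tags name the data, a program can ask for "the flowers in
# this greenhouse" instead of scanning the whole text.
flowers = [flower.text for flower in root.findall("flower")]
print(root.tag, flowers)

# Extending the document is just a matter of adding another element.
ET.SubElement(root, "flower").text = "orchid"
print(len(root.findall("flower")))
```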

    As you can see, the process of creating an XML document is very simple and requires us only to have basic knowledge of HTML and an understanding of the tasks that we want to perform using XML as a markup language. This gives developers the unique ability to define custom commands that allow them to most effectively define the data contained in a document. The author of the document creates its structure, builds the necessary connections between elements, using those commands that satisfy his requirements, and achieves the type of markup that he needs to perform the operations of viewing, searching, and analyzing the document.

    Another obvious advantage of XML is that it can serve as a universal query language for information repositories. The W3C is currently considering a working draft of the XML-QL (or XQL) standard, which may in the future become a serious competitor to SQL. In addition, XML documents can serve as a unique way of storing data that includes the means both for parsing information and for presenting it on the client side. One promising direction here is the integration of Java and XML technologies, which makes it possible to combine the power of both when building machine-independent applications that also use a universal data format for information exchange.

    XML also allows you to control the correctness of data stored in documents, check hierarchical relationships within a document, and establish a unified standard for the structure of documents, the content of which can be a variety of data. This means that it can be used when building complex information systems, in which the issue of information exchange between different applications running in the same system is very important. By creating a structure for an information exchange mechanism at the very beginning of work on a project, a manager can save himself in the future from many problems associated with the incompatibility of data formats used by various components of the system.

    Another advantage of XML is that programs for processing XML documents are simple, and today all kinds of software for working with XML documents are freely distributed. XML is supported in all browsers of the Microsoft Internet Explorer family starting from version 4.0. Support was announced for subsequent versions of Netscape Communicator, the Oracle DBMS, DB2, and MS Office applications. All this suggests that in the near future XML will most likely become the main language for information exchange between information systems, thereby replacing HTML. Well-known specialized markup languages such as SMIL, CDF, MathML, and XSL have already been created on the basis of XML, and the list of working drafts of new languages under consideration by the W3C keeps growing.

    What does an XML document look like?

    If you're familiar with HTML, learning XML won't take much effort. Although XML certainly differs greatly from the HyperText Markup Language in its capabilities and purpose, both languages are subsets of SGML and therefore inherit its basic principles.

    Document structure

    A simple XML document might look like this (Example 1):

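    <?xml version="1.0"?>
    <list_of_items>
    <item id="1"><first/>First</item>
    <item id="2">Second <sub_item>sub-item 1</sub_item></item>
    <item id="3">Third</item>
    <item id="4"><last/>Last</item>
    </list_of_items>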

    Please note that this document is very similar to a regular HTML page. Just like in HTML, instructions enclosed in angle brackets are called tags and serve to mark up the body of the document. In XML, there are opening, closing, and empty tags (in HTML, the concept of an empty tag also exists, but no special designation is required).

    The body of an XML document consists of markup elements and the actual content of the document - data (content). XML tags are designed to define document elements, their attributes and other language constructs. We will talk in more detail about the types of markup used in documents a little later.

    Any XML document must always begin with the <?xml?> instruction, inside which you can also specify the language version number, the character encoding, and other parameters the parser needs to process the document.

    Rules for creating an XML document

    In general, XML documents must satisfy the following requirements:

    The document header contains an XML declaration that specifies the document's markup language, version number, and additional information.

    Each opening tag that defines some data area in the document must have its own closing “partner”, i.e., unlike HTML, closing tags cannot be omitted.

    XML is case sensitive.

    All attribute values used in tag definitions must be enclosed in quotation marks.

    The nesting of tags in XML is strictly controlled, so it is necessary to monitor the order of opening and closing tags.

    All information between the start and end tags is treated as data in XML, so all formatting characters are taken into account (i.e., spaces, line breaks, and tabs are not ignored as they are in HTML).

    If an XML document does not violate the above rules, it is called well-formed (formally correct), and any parser designed for XML documents will be able to process it correctly.
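A quick way to see these rules enforced is to feed deliberately broken fragments to a non-validating XML parser. This sketch uses Python's standard xml.etree.ElementTree, which raises ParseError for input that is not well-formed; the sample fragments and the is_well_formed helper are illustrative. It checks only the formal rules, not any DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as well-formed XML, False otherwise.

    ElementTree's parser is non-validating: it checks the formal rules
    (closing tags, case, quoting, nesting) but not any DTD or schema.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = {
    "correct": '<?xml version="1.0"?><item id="1">First</item>',
    "missing closing tag": "<item>First",
    "case mismatch": "<Item>First</item>",
    "unquoted attribute": "<item id=1>First</item>",
    "improper nesting": "<a><b>text</a></b>",
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")

# Whitespace inside an element is kept as part of its data.
print(repr(ET.fromstring("<p>  two  spaces\n</p>").text))
```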

    However, beyond formal compliance with the grammar of the language, a document can also be checked for meaningful content: whether it follows the rules that define the required relationships between elements and form the structure of the document. For example, the following text, although a perfectly well-formed XML document, is completely meaningless:

    Russia Novosibirsk

    To check this kind of correctness, you need parsers that perform such checks; they are called validating parsers, or verifiers.

    Today, there are two main ways to control the correctness of an XML document: DTD definitions (Document Type Definition) and data schemas (Semantic Schema). We'll talk more about using DTDs and schemas next time. Unlike SGML, defining DTD rules in XML is not necessary, and this circumstance allows us to create any XML documents without racking our brains over the rather complicated DTD syntax.

    The basic principle

    An element is the basic structural unit of an XML document. By enclosing the word rose in <flower> tags, we define a non-empty element named <flower> whose content is rose. In general, the content of an element can be plain text, other nested elements, CDATA sections, processing instructions, or comments, i.e., almost any part of an XML document.

    Any non-empty element must consist of a start tag, an end tag, and the data enclosed between them.

    The set of all elements contained in a document defines its structure and determines all hierarchical relationships. Using elements, a flat data model is transformed into a complex hierarchical system with many possible relationships between elements.

    When searching the document later, the client program will rely on the information embedded in its structure, that is, on the document's elements. For example, if you want to find a particular university in a particular city, you look at the contents of a specific <university> element located inside a specific <city> element. Such a search is, naturally, much more efficient than looking for the desired character sequence throughout the entire document.
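A sketch of such a structural search with Python's xml.etree.ElementTree, whose findall() accepts a limited XPath subset that includes [@attr='value'] predicates. The directory, city, and university element names and the name attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical directory; element and attribute names are illustrative.
DIRECTORY = """<directory>
<city name="Novosibirsk">
<university>Novosibirsk State University</university>
</city>
<city name="Tomsk">
<university>Tomsk State University</university>
</city>
</directory>"""

root = ET.fromstring(DIRECTORY)

def universities_in(city_name):
    """Return only <university> elements inside the matching <city>."""
    path = f"./city[@name='{city_name}']/university"
    return [u.text for u in root.findall(path)]

print(universities_in("Tomsk"))
```

The query never looks at text outside the selected <city> element, which is exactly the advantage over scanning the whole document for a character sequence.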

    Every XML document has exactly one root element, and parsers begin scanning the document from it. In the example above, this element is <list_of_items>.
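The root element can be inspected directly. This sketch parses a document shaped like Example 1 (assuming list_of_items, item, and sub_item, plus the empty first and last elements, as tag names) with Python's xml.etree.ElementTree. It prints the root tag and each item's text, and shows that a second top-level element is rejected.

```python
import xml.etree.ElementTree as ET

# A document shaped like Example 1; the tag names are assumptions.
EXAMPLE = """<?xml version="1.0"?>
<list_of_items>
<item id="1"><first/>First</item>
<item id="2">Second <sub_item>sub-item 1</sub_item></item>
<item id="3">Third</item>
<item id="4"><last/>Last</item>
</list_of_items>"""

root = ET.fromstring(EXAMPLE)
print(root.tag)  # the single root element, where processing starts

# itertext() also picks up text that follows an empty child such as <first/>.
items = [(item.get("id"), "".join(item.itertext()).strip())
         for item in root.iter("item")]
print(items)

def parses(text):
    """True if the parser accepts `text`; a second top-level element fails."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(parses("<a/><b/>"))
```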

    In some cases, tags can change and refine the meaning of particular document fragments, marking up the same information in different ways and thereby telling the application that analyzes the document about the context in which the data is used. For example, after reading the fragment <city>Holliwood</city>, we can guess that this part of the document is about a city, while the fragment <restaurant>Holliwood</restaurant> is about a diner.

    Conclusion

    The Web page formatting language HTML was originally introduced as an application of SGML. Later, with the rapid growth of the WWW, HTML was extended in every possible way to give authors more control over the visual presentation of information. New elements and attributes focused on visual formatting were added. Tools that are not part of the markup language itself appeared and came into wide use: image maps, Java and JavaScript, plug-ins, and so on. There are also many HTML elements that are supported only by certain browsers or that behave differently in different browsers. As a result, it is now hard to say whether HTML is still an SGML application. Very few pages are created in accordance with the HTML specifications and their DTDs.

    Cascading Style Sheets (CSS), whose standard has been adopted by the W3C, are intended to partly alleviate this problem. CSS1 separates the style that defines the visual appearance of elements from the elements' markup.

    Of great interest is XML, which is expected to replace HTML as the markup language for Web pages. It is a variant of SGML aimed primarily at use on the WWW. It does not require a DTD, and the language itself is simplified by dropping rarely used complex constructs. This keeps parsers simple, which should allow XML to be used widely in browsers (quite likely, given that both major browser vendors have shown interest in XML).



    SGML (Standard Generalized Markup Language) is specified in the ISO 8879 standard. It has been adopted as the main language for producing technical documentation, including interactive electronic technical manuals for products developed using CALS technologies.

    SGML defines the structure of a document as a sequence of data objects. Data objects representing parts of a document can be stored in different files. The SGML standard establishes a set of symbols and rules for representing information that allow different systems to recognize and identify this information correctly. These sets are described in a separate part of the document called the DTD (Document Type Definition), which is transmitted together with the main SGML document. The DTD specifies the mapping between characters and their codes, the maximum lengths of identifiers, how tag delimiters are represented, other conventions, the DTD syntax, and the document type and version. SGML can therefore be called a metalanguage for a family of specific markup languages. In particular, the XML and HTML markup languages can be considered subsets of SGML.

    The technical description in the form of an SGML document includes:

    • main file with technical manual, marked with SGML tags;
    • description of entities, if the document belongs to a group in which the same entities are used and are assumed to be known;
    • a dictionary explaining the SGML tags.

    However, SGML is difficult to learn and use. Therefore, to make markup widely usable in documents published with WWW technologies, a simplified language based on SGML, HTML (HyperText Markup Language), was developed in 1991, followed in 1996 by XML (eXtensible Markup Language), which together with HTML is becoming the main language for presenting documents in various applications.

    The HTML language was developed for the widespread use of markup in documents presented in WWW technologies.

    An HTML description consists of ASCII text with embedded commands (control codes), also called descriptors or tags. Such text is called an HTML document or an HTML page, and, once it is posted on a Web server, a Web page. Tags are placed at the appropriate points in the source text; they determine fonts, line breaks, the placement of graphics, links and so on. In Web page editors, tags can be inserted simply by pressing the corresponding buttons or keys.
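    To see how tags are embedded in ordinary text, the following minimal Python sketch uses the standard html.parser module to pull the tags and the text of a small page apart. The class name TagAndTextSplitter and the sample page are illustrative assumptions, not part of the source.

```python
from html.parser import HTMLParser


class TagAndTextSplitter(HTMLParser):
    """Separates the markup (tags) of an HTML page from its text content."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []  # tags in the order they occur
        self.text: list[str] = []  # non-empty text fragments between tags

    def handle_starttag(self, tag, attrs):
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


page = "<html><body><h1>Markup</h1><p>Plain <i>italic</i> text.</p></body></html>"
splitter = TagAndTextSplitter()
splitter.feed(page)
splitter.close()
print(splitter.tags)
print(splitter.text)
```

    The tags come out as a sequence separate from the text: the browser treats that sequence as display instructions, while the text is what the reader actually sees.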

    Like HTML, XML derives from SGML; XML itself is a simplified subset of SGML. XML now aims to be the main document presentation language in information technology. It can be regarded as a metalanguage that serves as the basis for creating specialized markup languages for various applications. XML is more convenient than SGML because it omits some rarely used SGML features. Descriptions in XML are easier to understand and are suited to modern browsers, while XML retains the core capabilities of SGML.

    For specific applications, dedicated XML-based languages are created, called XML vocabularies or XML applications. For example, MathML was developed to describe texts containing mathematical notation, and OSD (Open Software Description) to describe software packages. Of interest to CALS is PDX (Product Definition eXchange), an application dedicated to product data exchange. There are also vocabularies for chemistry (CML, Chemical Markup Language), biology (BSML, Bioinformatic Sequence Markup Language) and other fields.

    Any document has three components:

    · content;

    · structure;

    · style.

    Content is the information that the document presents. The content of a paper document can be purely textual or can also include images. An electronic document may additionally contain multimedia data and links to other documents. Although the contents of different documents vary, documents can be classified into types, such as a book or a train ticket.

    The style of a document determines the form in which its content is displayed on a particular device (for example, a printer or a display). Style covers the font characteristics (typeface, size, color) of the whole document or of its individual blocks, pagination, the placement of blocks on pages and other parameters. The same document can be output in different styles, both on different media and on the same medium.
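    The separation of content from style can be illustrated with a small Python sketch in which one hypothetical record is rendered in two styles: as plain text for a printer and as HTML for a screen. The record and both rendering functions are assumptions made for illustration only.

```python
from html import escape

# Hypothetical content, stored independently of any presentation.
record = {"title": "Session results", "items": ["Ivanov: A", "Petrov: C"]}


def render_plain(doc: dict) -> str:
    """Printer style: title underlined with '=' and items prefixed by dashes."""
    lines = [doc["title"], "=" * len(doc["title"])]
    lines += [f"- {item}" for item in doc["items"]]
    return "\n".join(lines)


def render_html(doc: dict) -> str:
    """Screen style: title as a heading and items as a bulleted list."""
    items = "".join(f"<li>{escape(item)}</li>" for item in doc["items"])
    return f"<h1>{escape(doc['title'])}</h1><ul>{items}</ul>"


print(render_plain(record))
print(render_html(record))
```

    Only the rendering functions differ; the content itself is untouched, which is exactly what allows one document to be output in several styles.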

    Document markup languages are artificial languages designed to describe the structure of a document and the relationships between its structural objects. Markup data is also called metadata.

    The first generalized markup language was GML (Generalized Markup Language), developed at IBM in the late 1960s. Its direct successor was SGML (Standard Generalized Markup Language), which defines the rules for writing document markup elements. A document that follows the rules of this language is called an SGML document.

    The SGML language is defined in the ISO 8879 standard, which sets the following basic requirements for a document markup language:

    · the language must be human-readable;

    · marked-up document files must be text encoded with ASCII (American Standard Code for Information Interchange) characters, although the content of the document itself does not have to be ASCII-encoded or textual.

    SGML and similar languages use the following document markup constructs:

    · elements and accompanying attributes;

    · entities;

    · comments.

    The structural unit of an SGML document is the element. In marked-up text, each element must be delimited in a specific way: a start tag is inserted at the beginning of the element and an end tag at its end. The start and end tags have the same name. To distinguish tags from plain text, each tag must begin with a tag-open character and end with a tag-close character; in addition, the end tag contains an end-tag marker. SGML allows any characters to be assigned to these roles, but most often “<” (left angle bracket) opens a tag, “>” (right angle bracket) closes it, and “/” (slash) marks an end tag, so an element looks like <name>content</name>. Elements in an SGML document can contain other elements, so an SGML document can be represented graphically as a hierarchical (tree) structure.
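    Because every end tag must close the element opened most recently, the tag sequence behaves like a stack, and the stack depth at any moment equals the depth in the tree. The Python sketch below checks this nesting rule. It assumes the default “<”, “>” and “/” delimiters and explicit start and end tags for every element (SGML also allows some tags to be omitted, which the sketch does not handle), and it ignores comments and attribute values containing “>”. The function name is_properly_nested is hypothetical.

```python
import re

# Matches a start tag <name ...> or an end tag </name>; group 1 is "/" for end tags.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*>")


def is_properly_nested(text: str) -> bool:
    """Return True if every end tag closes the most recently opened element."""
    open_elements: list[str] = []  # stack of element names not yet closed
    for match in TAG.finditer(text):
        is_end, name = match.group(1) == "/", match.group(2)
        if not is_end:
            open_elements.append(name)
        elif not open_elements or open_elements.pop() != name:
            return False  # end tag with no matching open element
    return not open_elements  # every opened element must also be closed


print(is_properly_nested("<a><b>text</b></a>"))  # True
print(is_properly_nested("<a><b>text</a></b>"))  # False
```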


    Example 4.3.1. An SGML document specifying a list of students with the results of their examination session can be specified as follows:

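    <student-list>
      <title>List of student assessments in the session</title>
      <student>
        <full-name>Ivanov Ivan Ivanovich</full-name>
        <group-number>TS-61</group-number>
        <mark-list>
          <mark>A</mark>
          <mark>B</mark>
          <mark>B</mark>
          <mark>B</mark>
        </mark-list>
      </student>
      <student>
        <full-name>Petrov Petr Petrovich</full-name>
        <group-number>TS-62</group-number>
        <mark-list>
          <mark>C</mark>
          <mark>C</mark>
          <mark>D</mark>
          <mark>C</mark>
        </mark-list>
      </student>
    </student-list>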

    In this document, the outermost (root) element is student-list. It contains one title element (the title of the list) and several student elements (student data). Each student element in turn contains one full-name element (the student's last name, first name and patronymic), one group-number element (the group number) and one mark-list element (the student's grades for the session). Finally, the mark-list element contains several mark elements (grades).

    The graphical representation of this list in Fig. 4.3.1 shows its tree structure:

    Fig. 4.3.1. Structure of an SGML document in graphical representation
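    The same tree can be obtained by a program. The sketch below assumes that Example 4.3.1 is written with explicit start and end tags, which also makes it well-formed XML, so Python's standard xml.etree.ElementTree parser can read it; print_tree is a hypothetical helper that prints each element name indented by its depth in the tree.

```python
import xml.etree.ElementTree as ET

# Example 4.3.1 with explicit start and end tags (an assumption that also
# makes it well-formed XML, so the standard-library XML parser accepts it).
DOCUMENT = """
<student-list>
  <title>List of student assessments in the session</title>
  <student>
    <full-name>Ivanov Ivan Ivanovich</full-name>
    <group-number>TS-61</group-number>
    <mark-list><mark>A</mark><mark>B</mark><mark>B</mark><mark>B</mark></mark-list>
  </student>
  <student>
    <full-name>Petrov Petr Petrovich</full-name>
    <group-number>TS-62</group-number>
    <mark-list><mark>C</mark><mark>C</mark><mark>D</mark><mark>C</mark></mark-list>
  </student>
</student-list>
"""


def print_tree(element: ET.Element, depth: int = 0) -> None:
    """Print each element name, indented by its depth in the tree."""
    print("  " * depth + element.tag)
    for child in element:
        print_tree(child, depth + 1)


root = ET.fromstring(DOCUMENT)
print_tree(root)
```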

    Attributes can be used to refine SGML elements. Attributes are written in the element's start tag as follows:

    attribute-name="attribute-value".

    An element can have several attributes. Attributes are separated from each other and from the element name by at least one space.

    Example 4.3.2. For the mark elements in example 4.3.1, you can set the subject attribute, the value of which is the name of the discipline in which the exam was taken. Then for the first student the elements will take the following form:

    A

    B

    B

    B
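    A program reads an attribute value by name from the element's start tag, while the grade remains the element's content. In this Python sketch the discipline names are hypothetical placeholders, since the text does not give them, and the markup uses explicit end tags so that xml.etree.ElementTree can parse it.

```python
import xml.etree.ElementTree as ET

# Mark list of the first student, with a subject attribute on each mark.
# The discipline names are hypothetical placeholders, not taken from the text.
MARK_LIST = """
<mark-list>
  <mark subject="mathematics">A</mark>
  <mark subject="physics">B</mark>
  <mark subject="history">B</mark>
  <mark subject="programming">B</mark>
</mark-list>
"""

marks = ET.fromstring(MARK_LIST)
# The attribute comes from the start tag; the grade is the element content.
grades = {mark.get("subject"): mark.text for mark in marks.iter("mark")}
print(grades)
```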

    Languages such as SGML use entities to work with groups of data. An entity is any named data, either textual or non-textual. When the document is processed for viewing, each reference to an entity name is replaced with the entity's value. For example, a reference to the text entity kpi is replaced with its value, Kiev Polytechnic Institute, and a reference to the non-text entity image1 is replaced with the image named image1.
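    Entity substitution can be observed with an XML parser. In the sketch below, which assumes XML syntax rather than full SGML, the text entity kpi is declared in an internal DTD subset, and Python's xml.etree.ElementTree replaces the reference &kpi; with its value. Non-text entities such as image1 are not covered, and the element name note is illustrative.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares the text entity "kpi"; the parser
# replaces every reference &kpi; in the document body with its value.
DOCUMENT = """<!DOCTYPE note [
  <!ENTITY kpi "Kiev Polytechnic Institute">
]>
<note>Graduate of &kpi;</note>
"""

note = ET.fromstring(DOCUMENT)
print(note.text)
```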

    Attributes

    Attributes are used to set parameters that refine the characteristics of an element.

    An attribute is a name=value pair specified in the element's start tag. Spaces are allowed on either side of the equals sign. The attribute value is written as a string enclosed in single or double quotes.

    A tag can carry an attribute only if that attribute is defined for it.

    When the attribute is used, the element takes the following form:

    <tag attribute-name="attribute-value">tag content</tag>

    Text is aligned to the center

    One opening tag can contain multiple attributes, for example:

    Specified text size and color
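    Python's standard html.parser module reports the attributes of each start tag as (name, value) pairs, which makes the rules above easy to observe: spaces around the equals sign and both quote styles are accepted, and one start tag may carry several attributes. The tags p and font and their values in this sketch are illustrative assumptions, and the AttributeCollector class is hypothetical.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Records every start tag together with its attributes as a dict."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[tuple[str, dict]] = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs with quotes removed
        self.found.append((tag, dict(attrs)))


collector = AttributeCollector()
# Spaces around "=" and both single and double quotes are accepted.
collector.feed("<p align = \"center\">Centered</p>"
               "<font size='5' color=\"red\">Sized and colored</font>")
collector.close()
print(collector.found)
```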

    History of the development of markup languages.

    The idea of hypertext goes back to Vannevar Bush, who described it in 1945, and from the 1960s onward the first applications using hypertext data began to appear. However, the technology developed mainly once a real need arose for a mechanism that combines many information resources and makes it possible to create and view non-linear text.

    In 1986, ISO approved the Standard Generalized Markup Language (SGML). This language is intended for creating other markup languages: it provides the means to define a valid set of tags, their attributes and the internal structure of a document, so authors can create their own tags that reflect the document's content. Such documents, however, are difficult to interpret without the definition of their markup language, which is stored in the Document Type Definition (DTD). The DTD collects all the rules of a particular SGML-based language: it describes how tags relate to each other and how they may be used. Each class of documents has its own set of rules describing the grammar of the corresponding markup language. Only with a DTD can the correct use of tags be verified, so the DTD must be sent along with the SGML document or included in it.
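    The role of a DTD as a rule set can be illustrated with a deliberately simplified Python stand-in: a hypothetical table listing, for each element of the student-list document, which child elements it may contain, and a hypothetical function find_violations that reports breaches of those rules. This is not a DTD parser; a real DTD also fixes the order and number of children (for example, one title followed by one or more student elements), which the sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table standing in for a DTD: the child elements each
# element may contain. Order and repetition rules are deliberately ignored.
ALLOWED_CHILDREN = {
    "student-list": {"title", "student"},
    "student": {"full-name", "group-number", "mark-list"},
    "mark-list": {"mark"},
    "title": set(),
    "full-name": set(),
    "group-number": set(),
    "mark": set(),
}


def find_violations(element: ET.Element) -> list[str]:
    """Return a message for every element that is undeclared or misplaced."""
    if element.tag not in ALLOWED_CHILDREN:
        return [f"undeclared element <{element.tag}>"]
    problems = []
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[element.tag]:
            problems.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        problems.extend(find_violations(child))
    return problems


good = ET.fromstring("<student-list><title>T</title><student>"
                     "<mark-list><mark>A</mark></mark-list></student></student-list>")
bad = ET.fromstring("<student-list><mark>A</mark></student-list>")
print(find_violations(good))  # []
print(find_violations(bad))   # ['<mark> is not allowed inside <student-list>']
```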

    At that time, several other similar languages competed with SGML, but the popularity of HTML, one of its descendants, gave SGML an undeniable advantage over its rivals.

    Using SGML, you can describe structured data, organize information contained in documents, and present this information in some standardized format. But because of its complexity, SGML was used primarily to describe the syntax of other languages, and few applications worked with SGML documents directly. SGML is usually used only in large projects, for example, to create a unified document management system for a large company.

    HTML is much simpler and more convenient than SGML; its instructions are primarily intended to control how document content is displayed on screen. Tim Berners-Lee created HTML in 1991 as a way of marking up technical documents, specifically for the scientific community. It was originally just one of many SGML applications.

    Although all HTML can do is classify parts of a document and ensure that it displays correctly in a browser, it is the most popular markup language, because it is quite easy to learn: you only need to learn the HTML tags. The rules of HTML (its DTD) are built into the browser, so they do not have to be sent with each document. HTML is also designed to work on a wide variety of platforms. However, it has a number of significant limitations:

    • HTML has a fixed set of tags, and this set cannot be extended or changed;
    • HTML tags show only how the data should be presented, that is, the appearance of the document. HTML carries no information about the meaning of the content inside the tags or about the structure of the document.