Relational database theory: normalization, relations and joins

Abstract: This and the next two lectures are devoted to relational database theory. Because the relational approach to database organization is practical in nature, the theory is largely pragmatic as well. Its main goal is to identify the useful properties of certain database schemas and to develop methods for constructing such schemas. This problem is usually referred to, for short, as the relational database design problem.

Introduction

Despite its practical orientation, relational database theory is an independent scientific field in which many well-known researchers have worked (and continue to work); their names will appear in our lectures. This course does not attempt to describe the main results of the field in detail. Our goal is to provide only the definitions and statements needed for a general understanding of relational database design based on normalization.

Since the properties of relational databases that matter most in practice are based on the concept of functional dependency, we have devoted a separate lecture to a brief discussion of the relevant theory. It covers closures and covers of sets of functional dependencies, Armstrong's axioms, and Heath's theorem, which gives a sufficient condition for lossless decomposition of a relation. The concepts and statements of this lecture are necessary for mastering the material in Lecture 7, but we also wanted to show readers, through simple examples, what relational database theory is, how complex it is, and how intuitive it is.

Note that we did not set aside a separate lecture for the theory of multivalued dependencies and join dependencies. There are two reasons for this. First, these kinds of dependencies arise less often when modeling a subject area with a database, so we considered it sufficient to present only the basics of the relevant theory within Lecture 8. Second, although the theory of multivalued and join dependencies is in fact not much more complicated than the theory of functional dependencies, its definitions and statements are too cumbersome for this course.

Functional dependencies

The normal forms of relations that matter most in practice are based on a fundamental concept of relational database theory: the functional dependency. For the discussion that follows we will need several definitions and statements, which we will explain and illustrate as we proceed.

General definitions

Let r be a relation variable, and let X and Y be arbitrary subsets of the heading of r ("composite" attributes).

In a value of the relation variable r, attribute Y is functionally dependent on attribute X if and only if each value of X corresponds to exactly one value of Y. In this case we also say that attribute X functionally determines attribute Y (X is the determinant for Y, and Y is dependent on X). We will denote this as r.X->r.Y.
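
The definition can be checked mechanically against one particular relation value. The following is a minimal sketch, assuming rows are represented as Python dictionaries; the function name fd_holds, the attribute names, and the sample rows are hypothetical and are not taken from the lecture.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this particular set of rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first time we meet x we remember its y; afterwards y must not change.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample body, invented for illustration only.
rows = [
    {"emp_no": 1, "emp_name": "Ivanov", "salary": 100, "proj_no": 1},
    {"emp_no": 2, "emp_name": "Petrov", "salary": 120, "proj_no": 1},
    {"emp_no": 3, "emp_name": "Sidorov", "salary": 100, "proj_no": 2},
]

print(fd_holds(rows, ["emp_no"], ["emp_name"]))
print(fd_holds(rows, ["salary"], ["proj_no"]))
```

Such a check only tells us whether a dependency happens to hold in the given rows; whether it must hold for every valid value of the relation variable is a question about the subject area, not about the data at hand.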

As an example, we will use the relation EMPLOYEES_PROJECTS (SLU_NOM, SLU_NAME, SLU_ZARP, PRO_NOM, PROJECT_RUK) (Fig. 6.1), whose attributes are the employee number, employee name, employee salary, project number, and project manager. Obviously, if SLU_NOM is the primary key of the relation EMPLOYEES, then the functional dependency (FD) SLU_NOM->SLU_NAME holds for this relation.

In fact, for the body of the relation EMPLOYEES_PROJECTS as shown in Fig. 6.1, the following FDs (1) also hold:

Fig. 6.1.

SLU_NOM->SLU_NAME
SLU_NOM->SLU_ZARP
SLU_NOM->PRO_NOM
SLU_NOM->PROJECT_RUK
(SLU_NOM, SLU_NAME)->SLU_ZARP
(SLU_NOM, SLU_NAME)->PRO_NOM
(SLU_NOM, SLU_NAME)->(SLU_ZARP, PRO_NOM)
…
PRO_NOM->PROJECT_RUK
and so on.

Since the names of all the employees are different, the following FDs (2) also hold:

SLU_NAME->SLU_NOM
SLU_NAME->SLU_ZARP
SLU_NAME->PRO_NOM
and so on.

Moreover, FD (3) also holds for the example in Fig. 6.1:

SLU_ZARP->PRO_NOM

Note, however, that the nature of the FDs of group (1) differs from that of groups (2) and (3). It is reasonable to assume that employee numbers must always be distinct and that each project has only one manager. Therefore the FDs of group (1) must hold for every valid value of the relation variable EMPLOYEES_PROJECTS and can be regarded as invariants, or integrity constraints, of this relation variable.

The FDs of group (2) rest on the less natural assumption that all employees have different names. This is true for the example in Fig. 6.1, but over time the FDs of group (2) may cease to hold for some value of the relation variable EMPLOYEES_PROJECTS.

Finally, the FD of group (3) rests on the quite unnatural assumption that no two employees working on different projects receive the same salary. Again, this holds for the example in Fig. 6.1, but most likely by coincidence.

From now on we will be interested only in those functional dependencies that must hold for all possible values of a relation variable.

Note that if attribute A of a relation r is a candidate key, then for any attribute B of this relation it always holds

PROGRAMMING IN THE DELPHI 6 ENVIRONMENT

Databases. Creating reports with Word.

Approved by the university's Editorial and Publishing Council as a laboratory workshop

Voronezh 2004


UDC 681.3

Vorobyov E.I., Korotkevich D.E. Programming in the Delphi 6 environment: Laboratory workshop. Part 2: Databases. Creating reports with Word. Threads. Voronezh: Voronezh State Technical University, 2004. 107 p.

The second part of the laboratory workshop discusses theoretical and practical information for writing programs in the Delphi 6 environment on the topic: “Designing databases, creating reports in Word and using threads when creating high-performance applications.”

The publication meets the requirements of the State educational standard of higher professional education in the direction 230100 “Informatics and Computer Science”, specialty 230104 “Computer-Aided Design Systems”, discipline “Programming in High-Level Languages”.

3 tables. 19 illustrations. Bibliography: 7 titles.

Scientific editor: Doctor of Technical Sciences, Prof. Ya.E. Lvovich

Reviewers: Department of Computer Science, Voronezh Forestry Academy (head of the department: Doctor of Technical Sciences, Prof. V.E. Mezhov);

Doctor of Technical Sciences, Prof. O.Yu. Makarov

© Vorobyov E.I., Korotkevich D.E., 2004

© Design. Voronezh State Technical University, 2004


Introduction

Database concept

Database support is considered one of Delphi's main strengths. Even specialized database languages (such as MS Visual FoxPro) fall short of it in the simplicity and power of programming this kind of application. Delphi hides much of the complexity while still giving the programmer a great deal of power, and typical database tasks can be implemented in a short time. Most importantly, all of this is convenient and easy to understand. In Delphi you can create simple applications, even with complex databases, without writing a single line of code. This tutorial contains laboratory assignments for mastering techniques for working with local databases.

Relational database theory

Ten years ago, database programming was a very difficult task. Nowadays it’s hard to imagine, because thanks to Delphi the process of writing programs has been simplified, and the number of database varieties is already in the dozens.

Databases are divided into local (installed on the client computer where the program runs) and remote (installed on a server, a remote computer). Server databases run under the control of server software. Their main advantages are that several users can work with one database at the same time and that the load on the network is minimal.

There are also network databases, which put too much load on the network and are inconvenient for both the programmer and the end user. When a program connects to a network database, it downloads an almost complete copy of it from the server, and if you make changes, your copy is sent back in full. This creates a large load on the network because of the excessive data transfer.

In client-server technology, the client program sends the server a simple text request for some data. The server processes it and returns only the required portion of the data. When data needs to be changed, a request to change it is again sent to the server, and the server changes the data in its database. Thus, mostly text requests travel over the network, and these are usually less than a kilobyte in size. All data is processed by the server, so the client machine carries a much lighter load and needs fewer resources. The server sends the client only the data it needs, so the entire database is never copied unnecessarily.

Because of all this, network databases are outdated and hardly used any more; client-server technology has almost completely replaced them. Local databases, however, are here to stay: their storage format may change and new functions may be added, but the databases themselves will remain.

For what follows, we need to define a new concept: the table. So far only general principles have been discussed, so the general term "database" has been used. A database table is like a two-dimensional array in which data is arranged in columns (an Excel worksheet is a good example of a table). A database is, roughly speaking, just a file that can store one or several tables. Most local databases store only one table per file (dBase, Paradox, XML), but there are local databases in which several tables are kept in one file (for example, Access).

Local databases

Among local databases, let's consider relational ones, as the most common. What is a relational database? In the simplest case it is a table in which the column headings name the data being stored and each row holds the data itself. A database table resembles an Excel worksheet.

Local database tables can be stored on a local hard drive or centrally on a network drive of a file server. These files can be copied with standard tools like any other file, because the tables themselves are not tied to a specific location. The main thing is that the program can find the table.

Each table must have one unique field that uniquely identifies a row. This field is called the key field. Key fields are very often used to link several tables together, but a key field is required even if the table is not linked to any other. It is advisable to use a numeric type for the key and, if the database allows it, an "autoincrement" type (an automatically incremented counter).

Column names in a table must also be unique, although they need not be numeric. They can be anything you like, as long as they are unique and understandable. Each column (database field) must have a specific type. The number of types and their varieties depend on the database format: for example, the dBASE format (files with the DBF extension) supports only 6 types, while Paradox supports up to 15.

A database can be stored in one file (Access) or in several (Paradox, dBase). More precisely, the table data is always stored in one file, but additional information, such as indexes, constraints, or default values for specific fields, can be kept in separate files. If even one of these files is corrupted or deleted, the data may become unavailable for editing.

What are indexes? Data in tables is changed very often, and before you can edit a row you have to find it. Even static tables used as reference lists are searched before the requested data is displayed. Searching is a rather time-consuming operation, especially if the table contains many rows. Indexes are meant to speed up this procedure, and they also serve as the basis for sorting. At this stage it is enough to know that, in these local table formats, a table cannot be ordered by a field that is not indexed.

If you need a table to be ordered by the field "Surname", this field must first be indexed. Then you only need to specify that the table should use that index, and it will be sorted automatically.
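
To make the idea concrete, here is a minimal sketch of what an index does, written in Python rather than Delphi. It assumes the index is simply a sorted list of (field value, row position) pairs; real table formats use more elaborate structures, and the rows and the helper name find are hypothetical.

```python
import bisect

rows = [
    {"id": 1, "surname": "Petrov"},
    {"id": 2, "surname": "Ivanov"},
    {"id": 3, "surname": "Sidorov"},
]

# The index: (field value, row position) pairs kept in sorted order.
index = sorted((row["surname"], pos) for pos, row in enumerate(rows))


def find(surname):
    """Binary search in the index instead of scanning every row."""
    i = bisect.bisect_left(index, (surname, -1))
    if i < len(index) and index[i][0] == surname:
        return rows[index[i][1]]
    return None


# Reading the rows through the index yields them ordered by surname.
ordered = [rows[pos]["surname"] for _, pos in index]
print(find("Ivanov"), ordered)
```

The table itself stays in insertion order; both the fast lookup and the ordered view come from the separate index structure, which is why the index has to exist before the table can be shown sorted by that field.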

In a well-designed database, data redundancy is eliminated and the likelihood of storing inconsistent data is minimized. Thus, database design has two main goals: to reduce data redundancy and to increase the reliability of the data.

The life cycle of any software product, including a database management system, consists (largely) of the stages of design, implementation and operation.

Naturally, the most significant factor in the life cycle of a database application is the design stage. The performance of the system and its information richness, and hence its lifetime, depend on how carefully the structure of the database is thought out and how clearly the connections between its elements are defined.

Database requirements

So, a well designed database:

1. Satisfies all user requirements for database content. Before designing a database, it is necessary to conduct extensive research into the user requirements for database functionality.

2. Ensures data consistency and integrity. When designing tables, you need to define their attributes and the rules that keep the user from entering incorrect values. To verify data before it is written to a table, the database must apply the rules of the data model and thereby ensure that the integrity of the information is maintained.

3. Provides a natural, easy-to-understand structuring of information. A well-constructed database makes queries more "transparent" and easier to understand; consequently, the likelihood of entering incorrect data is reduced and the database is easier to maintain.

4. Meets users' performance requirements. With large volumes of information, performance issues begin to play a major role and immediately expose all the shortcomings of the design stage.

The following points represent the basic steps of database design:

1. Determine the information needs of the database.

2. Analyze real world objects that need to be modeled in the database. From these objects, form entities and characteristics of these entities (for example, for the “part” entity, the characteristics can be “name”, “color”, “weight”, etc.) and form a list of them.

3. Map the entities and their characteristics to tables and columns (fields) in the notation of the DBMS you have chosen (Paradox, dBase, FoxPro, Access, Clipper, InterBase, Sybase, Informix, Oracle, etc.).

4. Define the attributes that uniquely identify each object.

5. Develop rules that will establish and maintain data integrity.

6. Establish connections between objects (tables and columns), normalize tables.

7. Plan for issues of data reliability and, if necessary, maintaining the secrecy of information.
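
Steps 2 to 6 can be illustrated with a small sketch. It uses Python's built-in sqlite3 module instead of any of the DBMSs listed above, purely so that it runs without setup. The "part" entity and its characteristics come from the example in step 2; the "supplier" entity, all column names, and the helper rejected are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

con.executescript("""
    CREATE TABLE supplier (
        supplier_id INTEGER PRIMARY KEY,          -- step 4: unique identifier
        name        TEXT NOT NULL
    );
    CREATE TABLE part (
        part_id     INTEGER PRIMARY KEY,          -- step 4: unique identifier
        name        TEXT NOT NULL,
        color       TEXT,
        weight      REAL CHECK (weight > 0),      -- step 5: integrity rule
        supplier_id INTEGER NOT NULL
                    REFERENCES supplier(supplier_id)  -- step 6: link between tables
    );
""")

con.execute("INSERT INTO supplier VALUES (1, 'Acme')")
con.execute("INSERT INTO part VALUES (1, 'bolt', 'grey', 0.02, 1)")


def rejected(sql):
    """Return True if the database refuses the statement as an integrity violation."""
    try:
        con.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


bad_weight = rejected("INSERT INTO part VALUES (2, 'nut', 'grey', -1, 1)")
bad_supplier = rejected("INSERT INTO part VALUES (3, 'nut', 'grey', 0.01, 99)")
print(bad_weight, bad_supplier)
```

Here the rules live in the table definitions, so incorrect values are refused by the database itself regardless of which program tries to write them.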




Relational algebra is based on set theory and underlies the logic of how databases work.
When I was first studying database structure and SQL, a preliminary acquaintance with relational algebra helped the later material fall into place, and I will try to make this article have a similar effect.


Relational database

First, let's introduce the concept of a relational database, in which we will perform all our operations.

A relational database is a collection of relations that contain all the information to be stored in the database. The key term in this definition is relation, but for now we will leave it without a strict definition.
It is easier to start by picturing a table of products.

PRODUCTS table

ID   NAME        COMPANY           PRICE
123  Cookies     Dark Side LLC     190
156  Tea         Dark Side LLC     60
235  Pineapples  OJSC "Frukty"     100
623  Tomatoes    OOO "Vegetables"  130

The table consists of 4 rows; a row of a table corresponds to a tuple in relational theory. A set of such tuples is called a relation.
Before defining a relation, let's introduce one more term: domain. In terms of a table, domains correspond to columns.

For clarity, we now introduce a strict definition of a relation.

Let n sets D1, D2, ..., Dn (domains) be given. A relation R over these sets is a set of ordered n-tuples of the form <d1, d2, ..., dn>, where d1 belongs to D1, d2 belongs to D2, and so on. The sets D1, D2, ..., Dn are called the domains of the relation R.
Each element of the tuple represents the value of one of the attributes corresponding to one of the domains.
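
The definition can be tried out directly with Python sets. This is only a sketch under simplifying assumptions: the two tiny domains and the helper name is_relation are invented for illustration, and real domains are usually far too large to enumerate.

```python
from itertools import product

# Hypothetical small domains, chosen only for illustration.
D1 = {123, 156}            # possible ID values
D2 = {"Cookies", "Tea"}    # possible NAME values

# Every ordered pair <d1, d2> that can be formed from the domains.
all_tuples = set(product(D1, D2))

# A relation over D1 and D2 is any set of such tuples.
R = {(123, "Cookies"), (156, "Tea")}


def is_relation(candidate, domains):
    """Check that every tuple has one value from each domain, in order."""
    return all(
        len(t) == len(domains) and all(v in d for v, d in zip(t, domains))
        for t in candidate
    )


print(len(all_tuples), is_relation(R, [D1, D2]))
```

Because R is a set, it cannot contain the same tuple twice, which matches the requirement that all tuples of a relation be distinct.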

Keys in relations
All tuples in a relation must be distinct. A primary key serves to identify a tuple uniquely. A primary key is an attribute, or a minimal set of attributes, that uniquely identifies a particular tuple and contains no superfluous attributes.
This means that all attributes in the primary key must be necessary and sufficient to identify a particular tuple: omitting any attribute of the key makes it insufficient for identification.
For example, in the following table the key is the combination of the attributes in the first and second columns.

DRIVERS table

It can be seen that an organization can have several drivers, and in order to uniquely identify the driver, both the value from the “Organization Name” column and from the “Driver Name” column are required. Such a key is called a composite key.
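
The two requirements on a key, uniqueness and minimality, can be sketched as follows. The rows are hypothetical, since the original DRIVERS table is not reproduced here, and the function names are assumptions. Note that the check is made against sample data only, whereas a real key is a property of the schema that must hold for any data.

```python
from itertools import combinations

# Hypothetical rows standing in for the DRIVERS table.
drivers = [
    {"organization": "Alpha", "driver": "Ivanov", "category": "B"},
    {"organization": "Alpha", "driver": "Petrov", "category": "C"},
    {"organization": "Beta", "driver": "Ivanov", "category": "B"},
    {"organization": "Beta", "driver": "Petrov", "category": "B"},
]


def is_unique(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))


def is_key(rows, attrs):
    """Unique, and no attribute can be dropped without losing uniqueness."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for k in range(1, len(attrs))
        for subset in combinations(attrs, k)
    )


print(is_key(drivers, ("organization", "driver")))
print(is_key(drivers, ("organization",)))
```

In this sample neither column identifies a driver on its own, while the pair does, which is exactly the situation a composite key describes.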

In a relational database, tables are linked to one another as main (parent) and subordinate (child) tables. The link between a main table and a subordinate table is made through the primary key of the main table and the foreign key of the subordinate table.
A foreign key is an attribute, or a set of attributes, of the subordinate table whose values refer to the primary key of the main table.

This preparatory theory will be sufficient to get acquainted with the basic operations of relational algebra.

Operations of relational algebra

The eight basic operations of relational algebra were proposed by E. Codd.
  • Union
  • Intersection
  • Difference (subtraction)
  • Cartesian product
  • Selection
  • Projection
  • Join
  • Division
The first four operations are analogous to the corresponding operations on sets. Some operations can be expressed in terms of others. Let's look at most of the operations with examples.

For understanding, it is important to remember that the result of any algebra operation on relations is another relation, which can then be used in other operations.
Let's create another table that will be useful to us in the examples.

SELLERS table

ID   SELLER
123  OOO "Dart"
156  OJSC "Vedro"
235  CJSC "Vegetable Baza"
623  JSC "Firm"

Let's agree that in this table ID is a foreign key associated with the primary key of the PRODUCTS table.

First, let's look at the simplest operation: the relation name. Its result is the same relation; that is, by performing the operation PRODUCTS we obtain a copy of the relation PRODUCTS.

Projection
Projection is an operation that extracts from a relation only the attributes of the specified domains; that is, only the required columns are selected from the table. If several identical tuples result, only one instance of each remains in the resulting relation.
For example, let's take a projection of the PRODUCTS table by selecting ID and PRICE from it.

Operation syntax:
π (ID, PRICE) PRODUCTS
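
As a rough illustration, projection can be sketched in Python over the PRODUCTS data from the table above. The representation of rows as dictionaries and the function name project are assumptions made for this sketch.

```python
PRODUCTS = [
    {"ID": 123, "NAME": "Cookies", "COMPANY": "Dark Side LLC", "PRICE": 190},
    {"ID": 156, "NAME": "Tea", "COMPANY": "Dark Side LLC", "PRICE": 60},
    {"ID": 235, "NAME": "Pineapples", "COMPANY": 'OJSC "Frukty"', "PRICE": 100},
    {"ID": 623, "NAME": "Tomatoes", "COMPANY": 'OOO "Vegetables"', "PRICE": 130},
]


def project(relation, attrs):
    """Keep only the listed attributes; the set drops duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in relation}


id_price = project(PRODUCTS, ["ID", "PRICE"])
companies = project(PRODUCTS, ["COMPANY"])
print(sorted(id_price))
print(len(companies))
```

Projecting onto COMPANY alone shows the duplicate removal: two products share a company, so the result has three tuples rather than four.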

Selection (σ) keeps only those tuples of a relation that satisfy a given condition. In a selection condition we can use any Boolean expression. Let's make a selection with a price greater than 90 and a product ID less than 300:

σ(PRICE>90^ID<300) PRODUCTS
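
The same selection can be sketched in Python, assuming rows are dictionaries and the condition is passed as a function; the name select is hypothetical.

```python
PRODUCTS = [
    {"ID": 123, "NAME": "Cookies", "COMPANY": "Dark Side LLC", "PRICE": 190},
    {"ID": 156, "NAME": "Tea", "COMPANY": "Dark Side LLC", "PRICE": 60},
    {"ID": 235, "NAME": "Pineapples", "COMPANY": 'OJSC "Frukty"', "PRICE": 100},
    {"ID": 623, "NAME": "Tomatoes", "COMPANY": 'OOO "Vegetables"', "PRICE": 130},
]


def select(relation, predicate):
    """Keep the rows for which the Boolean condition is true."""
    return [row for row in relation if predicate(row)]


result = select(PRODUCTS, lambda r: r["PRICE"] > 90 and r["ID"] < 300)
print([row["NAME"] for row in result])
```

Selection never changes the set of columns; it only filters rows, so the result has the same heading as PRODUCTS.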

Multiplication
Multiplication, or the Cartesian product, is an operation on two relations that yields a relation with all the domains of the two original relations. Its tuples are all possible combinations of the tuples of the original relations. An example makes this clearer.

Let's take the Cartesian product of the PRODUCTS and SELLERS tables.
Operation syntax:

PRODUCTS × SELLERS
Note that the two tables share a domain named ID. In this situation, domains with the same name are prefixed with the name of the corresponding relation, as shown below.
For brevity, let's multiply not the complete relations but their selections with the condition ID<235.

PRODUCTS.ID  NAME     COMPANY        PRICE  SELLERS.ID  SELLER
123          Cookies  Dark Side LLC  190    123         OOO "Dart"
156          Tea      Dark Side LLC  60     156         OJSC "Vedro"
123          Cookies  Dark Side LLC  190    156         OJSC "Vedro"
156          Tea      Dark Side LLC  60     123         OOO "Dart"
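
A minimal Python sketch of the same product, assuming tuples are plain Python tuples and using itertools.product from the standard library; the two lists contain only the rows with ID<235.

```python
from itertools import product

# Rows of PRODUCTS and SELLERS with ID < 235.
PRODUCTS = [
    (123, "Cookies", "Dark Side LLC", 190),
    (156, "Tea", "Dark Side LLC", 60),
]
SELLERS = [
    (123, 'OOO "Dart"'),
    (156, 'OJSC "Vedro"'),
]

# Every tuple of the first relation concatenated with every tuple of the second.
cartesian = [p + s for p, s in product(PRODUCTS, SELLERS)]

for row in cartesian:
    print(row)
```

Two tuples on each side give 2 × 2 = 4 combined tuples, including the pairs whose IDs do not match.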

As an example of using this operation, suppose we need to select the sellers of products priced below 90. Without the product, we would first have to obtain the product IDs from the first table and then use these IDs to obtain the corresponding SELLER names from the second table. Using the product, the query is:

π (SELLER) σ (PRODUCTS.ID=SELLERS.ID ^ PRICE<90) PRODUCTS × SELLERS

As a result of this operation we obtain the relation:

SELLER
OJSC "Vedro"
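
The query can be traced step by step in Python. This is a sketch that assumes rows are dictionaries and applies the three operations in the order product, selection, projection; the variable names are hypothetical.

```python
from itertools import product

PRODUCTS = [
    {"ID": 123, "NAME": "Cookies", "PRICE": 190},
    {"ID": 156, "NAME": "Tea", "PRICE": 60},
    {"ID": 235, "NAME": "Pineapples", "PRICE": 100},
    {"ID": 623, "NAME": "Tomatoes", "PRICE": 130},
]
SELLERS = [
    {"ID": 123, "SELLER": 'OOO "Dart"'},
    {"ID": 156, "SELLER": 'OJSC "Vedro"'},
    {"ID": 235, "SELLER": 'CJSC "Vegetable Baza"'},
    {"ID": 623, "SELLER": 'JSC "Firm"'},
]

# PRODUCTS × SELLERS: all pairs of tuples.
pairs = list(product(PRODUCTS, SELLERS))

# σ: keep pairs with equal IDs and a price below 90.
selected = [(p, s) for p, s in pairs if p["ID"] == s["ID"] and p["PRICE"] < 90]

# π (SELLER): keep only the seller name, without duplicates.
result = {s["SELLER"] for _, s in selected}
print(result)
```

Only Tea is priced below 90, so the only seller left after the selection is the one with ID 156.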

Join and natural join
The join operation is, in a sense, the inverse of projection: it creates a new relation from two existing ones. The new relation is obtained by concatenating tuples of the first and second relations, and only tuples in which the values of the specified attributes coincide are concatenated. In particular, when the relations PRODUCTS and SELLERS are joined, these attributes are the ID attributes.

For clarity, a join can also be thought of as the result of two operations. First the product of the source tables is taken, and then a selection is made from the resulting relation with the condition that the attributes from the same domains are equal. In this case the condition is that PRODUCTS.ID equals SELLERS.ID.

Let's join the relations PRODUCTS and SELLERS and look at the resulting relation.

PRODUCTS.ID  NAME        COMPANY           PRICE  SELLERS.ID  SELLER
123          Cookies     Dark Side LLC     190    123         OOO "Dart"
156          Tea         Dark Side LLC     60     156         OJSC "Vedro"
235          Pineapples  OJSC "Frukty"     100    235         CJSC "Vegetable Baza"
623          Tomatoes    OOO "Vegetables"  130    623         JSC "Firm"

A natural join yields a similar relation, but if the database schema is set up correctly (in this case, the primary key ID of the PRODUCTS table is linked to the foreign key ID of the SELLERS table), the resulting relation contains only one ID domain.

Operation syntax:
PRODUCTS ⋈ SELLERS;

The result is this relation:

PRODUCTS.ID  NAME        COMPANY           PRICE  SELLER
123          Cookies     Dark Side LLC     190    OOO "Dart"
156          Tea         Dark Side LLC     60     OJSC "Vedro"
235          Pineapples  OJSC "Frukty"     100    CJSC "Vegetable Baza"
623          Tomatoes    OOO "Vegetables"  130    JSC "Firm"
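
A natural join can be sketched in Python as follows. The sketch assumes rows are dictionaries and joins on every attribute name the two relations have in common, which here is only ID; the function name natural_join is hypothetical.

```python
PRODUCTS = [
    {"ID": 123, "NAME": "Cookies", "COMPANY": "Dark Side LLC", "PRICE": 190},
    {"ID": 156, "NAME": "Tea", "COMPANY": "Dark Side LLC", "PRICE": 60},
    {"ID": 235, "NAME": "Pineapples", "COMPANY": 'OJSC "Frukty"', "PRICE": 100},
    {"ID": 623, "NAME": "Tomatoes", "COMPANY": 'OOO "Vegetables"', "PRICE": 130},
]
SELLERS = [
    {"ID": 123, "SELLER": 'OOO "Dart"'},
    {"ID": 156, "SELLER": 'OJSC "Vedro"'},
    {"ID": 235, "SELLER": 'CJSC "Vegetable Baza"'},
    {"ID": 623, "SELLER": 'JSC "Firm"'},
]


def natural_join(left, right):
    """Combine rows that agree on all attribute names the relations share."""
    result = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                # Merging the dictionaries keeps a single copy of shared attributes.
                result.append({**l_row, **r_row})
    return result


joined = natural_join(PRODUCTS, SELLERS)
for row in joined:
    print(row)
```

Each resulting row has five attributes, with ID appearing once, which corresponds to the single ID domain in the relation shown above.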

Intersection and subtraction
The result of intersection is a relation consisting of the tuples that belong to both relations.
The result of subtraction is a relation consisting of the tuples that belong to the first relation but not to the second.
These operations are analogous to the corresponding operations on sets, so there is no need to describe them in detail.
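
For completeness, a short sketch with Python's built-in set type shows these operations at work. The two relations A and B are hypothetical; the only requirement is that they have the same heading, here (ID, NAME).

```python
# Two hypothetical relations with the same heading (ID, NAME).
A = {(123, "Cookies"), (156, "Tea"), (235, "Pineapples")}
B = {(156, "Tea"), (235, "Pineapples"), (623, "Tomatoes")}

union = A | B          # tuples that are in A or in B
intersection = A & B   # tuples that are in both A and B
difference = A - B     # tuples that are in A but not in B

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Subtraction is not symmetric: A - B and B - A generally give different relations.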


Briefly about the important things.

Database normalization

First normal form (1NF)

  • there are no repeating groups of data
  • the data is atomic (each value is self-contained and indivisible).

At a high level, this is achieved by creating a primary key, then moving repeating groups of data into new tables, creating primary keys for those tables, and so on. In addition, any record whose columns hold composite information must be split into separate rows, one for each piece of data in the column.
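
A small sketch of bringing data to 1NF, written in Python for illustration. The table, its column names, and the comma-separated "phones" column are all hypothetical; the point is only to show a composite column being split into a separate table with one atomic value per row.

```python
# Hypothetical non-1NF rows: the "phones" column holds several values at once.
raw = [
    {"emp_id": 1, "name": "Ivanov", "phones": "111-11, 222-22"},
    {"emp_id": 2, "name": "Petrov", "phones": "333-33"},
]

# The main table keeps only atomic columns and its primary key emp_id.
employees = [{"emp_id": r["emp_id"], "name": r["name"]} for r in raw]

# The repeating group moves to its own table, one phone per row;
# (emp_id, phone) can serve as the key of the new table.
phones = [
    {"emp_id": r["emp_id"], "phone": p.strip()}
    for r in raw
    for p in r["phones"].split(",")
]

print(employees)
print(phones)
```

After the split, a query such as "who has phone 222-22" becomes a simple equality comparison instead of a search inside a text value.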

Second normal form (2NF)

  • the table satisfies the conditions of 1NF
  • each non-key column depends on the whole key, not on part of it.

Third normal form (3NF)

  • the table satisfies the conditions of 2NF
  • no non-key column depends on another column that is not part of the primary key
  • the table contains no derived data
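
The following sketch illustrates a 2NF violation and its repair. The order_items table and its columns are hypothetical. The key is assumed to be (order_id, product_id), while product_name depends on product_id alone, that is, on only part of the key.

```python
# Hypothetical table with the composite key (order_id, product_id).
order_items = [
    {"order_id": 1, "product_id": 10, "product_name": "Tea", "qty": 2},
    {"order_id": 1, "product_id": 20, "product_name": "Cookies", "qty": 1},
    {"order_id": 2, "product_id": 10, "product_name": "Tea", "qty": 5},
]

# product_name depends only on product_id, so it moves to its own table.
products = {r["product_id"]: r["product_name"] for r in order_items}

# What remains depends on the whole key.
items = [{k: r[k] for k in ("order_id", "product_id", "qty")} for r in order_items]

# Joining the two tables on product_id restores the original rows,
# so no information was lost by the split.
rebuilt = [{**i, "product_name": products[i["product_id"]]} for i in items]
print(rebuilt == order_items)
```

A 3NF violation, where one non-key column depends on another non-key column, is handled in the same way: the dependent column moves to a separate table keyed by the column it depends on.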

Other normal forms, which are of less practical importance:

Boyce-Codd normal form

A variant of 3NF. It is intended for situations in which there are several overlapping candidate keys. In practice it is rarely called for outside academic discussion.

Fourth normal form

Intended to resolve issues with multivalued dependencies. Such situations arise when, in a table already reduced to 3NF, one column of a composite primary key depends on another column of the primary key.

Fifth normal form

Deals with lossless and lossy decomposition of relations. It applies when a relation can be split into several relations but, once split, can no longer be logically restored to its original form.

Domain-key normal form (sometimes listed as the sixth normal form)

Ensures that there are no modification anomalies in the database. In real conditions it is practically unattainable.

Relationships

I once heard from women that men
immediately try to leave any room in which
the word "relationship" has been uttered. <...> The key to a successful
relationship is for everyone to be aware of their role
in it, as well as of the rules and restrictions
that the relationship imposes.
(C) Robert Vieira, "Professional SQL Server 2000 Programming"

Types of Relationships

  • One-to-one (makes sense when related data has to be stored in different databases or when the maximum row size is exceeded)
  • One-to-zero-or-one
  • One-to-many
  • One-to-zero, -one, or -many
  • Many-to-many (implemented through junction tables)

Joins

INNER JOIN

Exclusive join. The result includes only those records of a table that have a match in the paired table under the given condition.

LEFT|RIGHT JOIN

Inclusive join. The result includes all records from the table to the left or right of JOIN, respectively. The columns of a missing "paired" record are filled with NULL.
FROM left_table LEFT JOIN right_table: all records from the left table, left_table, are included
FROM left_table RIGHT JOIN right_table: all records from the right table, right_table, are included

FULL JOIN

Inclusive join. The result includes not only the records that have a match in the other table, but also the records from both tables for which no match was found. The columns of a missing "paired" record are filled with NULL.

CROSS JOIN

Cross join (Cartesian product). Every record of one table is paired with every record of the other. The number of resulting records equals the product of the numbers of records in the two tables.
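
The difference between these joins is easy to see on a tiny data set. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table and column names are hypothetical. RIGHT JOIN and FULL JOIN are left out because older SQLite versions do not support them, although the same idea applies.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sellers  (id INTEGER, seller TEXT);
    INSERT INTO products VALUES (123, 'Cookies'), (156, 'Tea'), (235, 'Pineapples');
    INSERT INTO sellers  VALUES (123, 'Dart'), (156, 'Vedro');
""")

# INNER JOIN: only products that have a matching seller.
inner = con.execute(
    "SELECT p.name, s.seller FROM products p "
    "INNER JOIN sellers s ON s.id = p.id ORDER BY p.id"
).fetchall()

# LEFT JOIN: every product; a missing seller is returned as NULL (None in Python).
left = con.execute(
    "SELECT p.name, s.seller FROM products p "
    "LEFT JOIN sellers s ON s.id = p.id ORDER BY p.id"
).fetchall()

# CROSS JOIN: every product paired with every seller.
cross = con.execute("SELECT COUNT(*) FROM products CROSS JOIN sellers").fetchone()[0]

print(inner)
print(left)
print(cross)
```

The product without a seller disappears from the INNER JOIN result, stays in the LEFT JOIN result with an empty seller, and the CROSS JOIN returns 3 × 2 rows.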

Principles for ordering multiple JOINs

If you need to join several tables, keep two principles in mind:

  1. All joins to the left of a JOIN are treated as a single table for the purposes of inclusion in or exclusion from the query.
  2. All joins to the right of a JOIN are likewise treated as a single table for the purposes of inclusion in or exclusion from the query.

These principles lead to the following recommendations for building complex joins:

  • Use INNER JOIN wherever possible.
  • If OUTER JOINs are needed, place them last, and put the INNER JOINs at the beginning of the join.

P.S. All of the above are general “postulates” of the theory of relational databases, not tied to the features of certain DBMSs.

A data model is a set of data structures and the operations for processing them. A data model makes it possible to represent clearly the structure of objects and the relationships established between them. The terminology of data models centers on the concepts of "data element" and "association rules". A data element describes any set of data, and association rules define the algorithms by which data elements are interconnected. Many different data models have been developed to date, but three main ones are used in practice: the hierarchical, network, and relational data models. Accordingly, one speaks of hierarchical, network, and relational DBMSs.

Hierarchical data model. Hierarchically organized data is very common in everyday life. For example, the structure of a higher education institution is a multi-level hierarchy. A hierarchical (tree) database consists of an ordered set of elements. In this model, initial elements give rise to other elements, which in turn give rise to further elements. Each child element has exactly one parent element.

Organizational structures, bills of materials, tables of contents of books, project plans, and many other data sets can be represented hierarchically. The integrity of the links between ancestors and descendants is maintained automatically. The basic rule is that no child can exist without its parent.
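
The two rules of the hierarchical model, exactly one parent per child and no child without a parent, can be sketched with a simple parent map. The class name Hierarchy, its methods, and the sample nodes are hypothetical and serve only as an illustration.

```python
class Hierarchy:
    """A tree stored as a mapping from each node to its single parent."""

    def __init__(self, root):
        self.parent = {root: None}

    def add(self, child, parent):
        # Basic rule: no child can exist without its parent.
        if parent not in self.parent:
            raise ValueError(f"unknown parent: {parent!r}")
        # Each child element has exactly one parent element.
        if child in self.parent:
            raise ValueError(f"{child!r} already has a parent")
        self.parent[child] = parent

    def path(self, node):
        """Walk from a node up to the root."""
        result = []
        while node is not None:
            result.append(node)
            node = self.parent[node]
        return result


uni = Hierarchy("University")
uni.add("Faculty of CS", "University")
uni.add("Department of Databases", "Faculty of CS")
print(uni.path("Department of Databases"))
```

Every lookup follows the one fixed path from a node to the root, which also hints at the limitation discussed next: the data can only be reached through the hierarchy chosen at design time.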

The main disadvantage of this model is that the database must always be used through the hierarchy that was laid down when it was designed. The need for constant reorganization of the data (and often the impossibility of such reorganization) led to the creation of a more general model, the network model.

Network data model. The network approach to organizing data extends the hierarchical approach. This model differs from the hierarchical one in that each child element can have more than one parent element.

Because a network database can directly represent all kinds of relationships inherent in the data of an organization, the data can be navigated, explored, and queried in various ways; that is, the network model is not bound to a single hierarchy. However, to query a network database you have to understand its structure in depth (have the schema of the database at hand) and work out a mechanism for navigating it, which is a significant drawback of this model.

Relational data model. The basic idea of the relational data model is to represent any set of data as a two-dimensional table. In its simplest form the relational model describes a single two-dimensional table, but more often it describes the structure of, and the relationships between, several different tables.

Relational data model

So, the purpose of an information system is to process data about objects of the real world, taking into account the relationships between the objects. In database theory, data items are often called attributes, and objects are called entities. Object, attribute, and relationship are the fundamental concepts of an information system.

An object (or entity) is something that exists and is distinguishable; that is, an object is any "something" that has a name and a way of telling one such object from another. For example, every school is an object. A person, a class at school, a company, an alloy, and a chemical compound are also objects. Objects can be not only material things but also more abstract concepts that reflect the real world: events, regions, works of art, books (not as printed products but as works), theatrical performances, films, legal norms, philosophical theories, and so on.

An attribute (or datum) is an indicator that characterizes an object and takes a numeric, text, or other value for a specific instance of the object. An information system operates on sets of objects designed for a given subject area, using specific attribute values (data) of particular objects. For example, take classes in a school as a set of objects. The number of students in a class is a datum that takes a numeric value (28 for one class, 32 for another). The class name is a datum that takes a text value (10A for one class, 9B for another, and so on).

The development of relational databases began in the late 1960s, when the first papers appeared that discussed the possibility of using familiar and natural ways of presenting data, the so-called tabular datalogical models, in database design.

The founder of relational database theory is considered to be an IBM employee, Dr. E. Codd, who published the article “A Relational Model of Data for Large Shared Data Banks” on June 6, 1970. This article was the first to use the term “relational data model.” The theory of relational databases, developed in the 1970s in the USA by Dr. E. Codd, has a powerful mathematical basis that describes the rules for organizing data effectively. The theoretical framework developed by E. Codd became the basis for the theory of database design.

E. Codd, a mathematician by training, proposed applying the apparatus of set theory (union, intersection, difference, Cartesian product) to data processing. He showed that any set of data can be represented in the form of two-dimensional tables of a special kind, known in mathematics as “relations.”
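The set-theoretic operations mentioned above can be tried out on a toy model. The following is only an illustrative sketch: each relation is modeled as a Python set of tuples, and the relation names and sample rows (`moscow_staff`, `project_staff`, `departments`) are invented for the example rather than taken from Codd's work.

```python
from itertools import product

# Two relations with the same heading (name, city), modeled as sets of tuples.
# A set has no duplicates and no order, just like the tuples of a relation.
moscow_staff = {("Ivanov", "Moscow"), ("Petrova", "Moscow")}
project_staff = {("Petrova", "Moscow"), ("Sidorov", "Kazan")}

# Union, intersection and difference require relations with the same heading.
union = moscow_staff | project_staff
intersection = moscow_staff & project_staff
difference = moscow_staff - project_staff

# The Cartesian product pairs every tuple of one relation
# with every tuple of another relation.
departments = {("Sales",), ("IT",)}
cartesian = {left + right for left, right in product(moscow_staff, departments)}

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
print(sorted(cartesian))
```

Using sets rather than lists is a deliberate choice here: it makes the absence of duplicate tuples and of any row order automatic.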

A relational database is one in which all data is presented to the user as rectangular tables of data values, and all operations on the database reduce to manipulations of those tables.

A table consists of columns (fields) and rows (records) and has a name that is unique within the database. A table represents a type of real-world object (an entity), and each of its rows represents a specific object. Each column holds the values of one attribute of the object. These values are drawn from the set of all possible values of that attribute, which is called a domain.

In its most general form, a domain is defined by specifying a base data type to which the elements of the domain belong, together with an arbitrary Boolean expression applied to a data element. If evaluating this Boolean condition on a data element yields true, the element belongs to the domain. In the simplest case, a domain is defined as the set of admissible potential values of a single type. For example, the collection of birth dates of all employees constitutes the “birth date domain,” and the names of all employees constitute the “employee name domain.” The birth date domain must have a date data type, and the employee name domain must have a character data type.

If two values come from the same domain, they can be compared. For example, two values taken from the birth date domain can be compared to determine which employee is older. If the values come from different domains, comparing them is not allowed, since in all likelihood it makes no sense. For example, nothing meaningful comes of comparing an employee's name with a date of birth.
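The definition of a domain as a base type plus a Boolean condition, and the rule that only values of the same domain may be compared, can be sketched as follows. This is a minimal illustration, not the mechanism of any particular DBMS: the names `Domain`, `Value` and `less_than`, the date bounds and the 100-character name limit are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable


@dataclass(frozen=True)
class Domain:
    """A domain: a base data type plus a Boolean condition on its values."""
    name: str
    base_type: type
    condition: Callable[[Any], bool]

    def contains(self, value: Any) -> bool:
        # A value belongs to the domain if it has the base type
        # and the Boolean condition evaluates to true for it.
        return isinstance(value, self.base_type) and self.condition(value)


@dataclass(frozen=True)
class Value:
    """A value tagged with the domain it was drawn from."""
    domain: Domain
    raw: Any

    def __post_init__(self) -> None:
        if not self.domain.contains(self.raw):
            raise ValueError(f"{self.raw!r} is not in domain {self.domain.name!r}")


def less_than(a: Value, b: Value) -> bool:
    # Comparison is allowed only between values of the same domain.
    if a.domain is not b.domain:
        raise TypeError("values from different domains cannot be compared")
    return a.raw < b.raw


# Hypothetical domains; the bounds are illustrative assumptions.
BIRTH_DATE = Domain("birth date", date,
                    lambda d: date(1900, 1, 1) <= d <= date.today())
EMPLOYEE_NAME = Domain("employee name", str, lambda s: 0 < len(s) <= 100)

older = Value(BIRTH_DATE, date(1980, 5, 1))
younger = Value(BIRTH_DATE, date(1990, 1, 1))
print(less_than(older, younger))  # an earlier birth date means an older employee
```

Passing a `Value` from `EMPLOYEE_NAME` together with one from `BIRTH_DATE` to `less_than` raises `TypeError`, which mirrors the prohibition on comparing a name with a date of birth.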

Each column (field) has a name, which is usually written at the top of the table. When designing tables in a specific DBMS, you can select a type for each field, that is, define a set of rules for displaying it and determine the operations that can be performed on the data stored in the field. The set of available types may differ between DBMSs.

The field name must be unique within the table, but different tables can have fields with the same name. Every table must have at least one field. Fields are arranged in the table in the order in which their names appeared when the table was created. Unlike fields, rows do not have names; their order in the table is not defined, and their number is logically unlimited.

Since the rows of a table are not ordered, a row cannot be selected by its position: there is no “first,” “second,” or “last” row. Every table has one or more columns whose values uniquely identify each of its rows. Such a column (or combination of columns) is called a primary key. An artificial field is often introduced to number the records in a table, for example a sequence number, which guarantees that each record in the table is unique. A key must have the following properties.

Uniqueness. At any given time, no two distinct tuples of the relation have the same value for the combination of attributes that make up the key. That is, the table cannot contain two rows with the same identification number or passport number.

Minimality. No attribute can be removed from the key without violating uniqueness. This means that you should not create a key that includes both the passport number and the identification number, since either of these attributes alone is enough to identify a tuple uniquely. Nor should a key include a non-unique attribute: a combination of identification number and employee name is not a valid key, because each row can still be uniquely identified after the employee's name is removed.

Every relation has at least one possible (candidate) key, since the set of all its attributes satisfies the uniqueness condition (this follows from the very definition of a relation) and can always be reduced to a minimal unique subset.
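The two key properties can be checked mechanically on sample data. The sketch below is illustrative only: the function names `is_unique` and `is_possible_key` and the employee rows are invented for the example. Note also that a key is a constraint on every valid state of a relation, so a check on one set of rows can refute a proposed key but cannot prove it.

```python
def is_unique(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        projection = tuple(row[a] for a in attrs)
        if projection in seen:
            return False
        seen.add(projection)
    return True


def is_possible_key(rows, attrs):
    """True if attrs is unique and no attribute can be dropped from it."""
    if not is_unique(rows, attrs):
        return False
    # Any superset of a unique attribute set is itself unique, so it is
    # enough to test the subsets obtained by dropping one attribute.
    return all(
        not is_unique(rows, [a for a in attrs if a != dropped])
        for dropped in attrs
    )


# Hypothetical employee rows; two employees share the same name.
employees = [
    {"id_number": 1001, "passport": "4509 111111", "name": "Ivanov"},
    {"id_number": 1002, "passport": "4509 222222", "name": "Petrova"},
    {"id_number": 1003, "passport": "4510 333333", "name": "Ivanov"},
]

print(is_possible_key(employees, ["id_number"]))              # unique and minimal
print(is_possible_key(employees, ["passport"]))               # unique and minimal
print(is_possible_key(employees, ["id_number", "name"]))      # unique, not minimal
print(is_possible_key(employees, ["id_number", "passport"]))  # unique, not minimal
print(is_possible_key(employees, ["name"]))                   # not unique
```

On this data only the identification number and the passport number, each taken alone, pass both tests. The set of all three attributes is unique, as it is for any relation, but it is not minimal.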

One of the possible keys is chosen arbitrarily as the primary key. The remaining possible keys, if any, are called alternate keys. For example, if the identification number is chosen as the primary key, the passport number becomes an alternate key.

Relationships between tables are the most important element of the relational data model. They are supported by foreign keys.
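As a rough illustration of how a foreign key supports a relationship between two tables, the following sketch uses Python's standard `sqlite3` module with an in-memory database. The `department` and `employee` tables and their columns are hypothetical, and SQLite is chosen only because it ships with Python; other DBMSs declare and enforce foreign keys in broadly similar ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE employee (
        id_number INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        -- Foreign key: every value must match an existing department row.
        dept_id   INTEGER NOT NULL REFERENCES department (dept_id)
    )
""")

conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1001, 'Ivanov', 1)")

# A row that refers to a nonexistent department is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (1002, 'Petrova', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The relationship is followed by matching stored values, not pointers.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e
    JOIN department AS d ON d.dept_id = e.dept_id
""").fetchall()

print(rejected)
print(rows)
```

The join condition compares the values stored in the two `dept_id` columns, so the link between the tables exists only through the data itself.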

When a relational database model is described, different terms are often used for the same concept, depending on the level of description (theory or practice) and the system (Access, SQL Server, dBase). Table 2.3 summarizes the terms used.

Table 2.3. Database Terminology

Database theory     Relational databases     SQL Server
Relation            Table                    Table
Tuple               Record                   Row
Attribute           Field                    Column

Relational Databases

A relational database is a set of relations containing all the information that must be stored in the database. In other words, the database is the set of tables needed to store all the data. The tables of a relational database are logically related to one another. The requirements for designing a relational database can, in general, be reduced to several rules.

O Each table has a unique name in the database and consists of rows of the same type.

O Each table consists of a fixed number of columns. A column of a single row cannot hold more than one value. For example, in a table with information about a book's author, publication date, circulation, and so on, the author column cannot store more than one last name. If a book is written by two or more authors, additional tables are needed.

O At no point in time will there be two rows in the table that duplicate each other. Rows must differ in at least one value in order to be able to uniquely identify any row in the table.

O Each column is assigned a unique name within the table; a specific data type is set for it so that homogeneous values are placed in this column (dates, last names, telephone numbers, monetary amounts, etc.).

O The complete information content of a database is represented as explicit data values, and this is the only method of representation. For example, relationships between tables are based on the data stored in the corresponding columns, not on pointers that define relationships artificially.

O When processing data, you can freely access any row or any column of the table. The values stored in the table do not impose any restrictions on the order in which the data is accessed. Description of the columns,