What does relational mean? Features, structure and terms associated with the relational model


IN relational databases The data is stored in the form of tables consisting of rows and columns. Each table has its own, predefined set of named fields. Columns in relational database tables can contain scalar data of a fixed type, such as numbers, strings, or dates. Tables in a relational database can be related in a one-to-one or one-to-many relationship. The number of rows of records in the table is unlimited, and each record corresponds to a separate entity.

Relational databases now occupy a dominant position. Hierarchical and network database structures are a thing of the past, giving way to relational databases, for which most modern DBMSs are built (MS SQL Server, MS Access, InterBase, FoxPro, PostgreSQL, Paradox and others).

Details

The relational model focuses on organizing data in the form of two-dimensional tables. Each relational table is two-dimensional array and has the following properties:

  • Each table element is one data element
  • Each column has its own unique name
  • There are no identical rows in the table
  • All columns in the table are homogeneous, that is, all elements in the column are of the same type
  • The order of rows and columns can be arbitrary

Relational DBMSs, focused on implementing operational data processing systems, are less efficient in tasks analytical processing than multidimensional databases. This is due, firstly, to the presence of fairly strict restrictions imposed by the existing implementation of the SQL language. An example of such a real-life constraint is the assumption that the data in a relational database is unordered (or more precisely, randomly ordered). At the same time, their ordering requires additional time spent on sorting each time the database is accessed. IN analytical systems Data entry and retrieval is carried out in large portions. In turn, the data, once it enters the database, remains unchanged for long period time. And here it is more effective to store data in the form of partially denormalized tables, in which not only detailed, but also pre-calculated aggregated values ​​can be stored to increase performance. And for navigation and sampling, specialized addressing and indexing methods, based on the assumption of low variability and low mobility of data in the database, can be used. This method of organizing data is sometimes called pre-computed, thereby emphasizing its difference from the normalized relational approach, which involves dynamic calculation various types results (aggregation) and establishing connections between details from different tables (join operations).

Main disadvantages

In addition to the low efficiency, which was mentioned earlier, the disadvantages of traditional relational DBMSs include the fact that as the main and, often, the only mechanism that provides quick search and selecting individual rows in a table (or in tables linked through foreign keys), various modifications of indexes based on B-trees are usually used. This solution is effective only when processing small groups of records and high intensity of data modification in databases.

Relational database management systems may never go away, but their days of dominance are certainly numbered, says Paul Creel, who wrote an article about it in InfoWorld in September 2011. He quotes analyst Robin Blore, who argues that the architecture of relational DBMSs is obsolete, since it was created in a bygone era and does not meet modern requirements.

Relational DBMSs still dominate financial transaction processing systems, but today companies are increasingly using DBMSs of the new NoSQL architecture - horizontally scalable, distributed and developed in open source. Examples of such systems are Hadoop, MapReduce and VoltDB. According to Forrester analysts, about 75% of data in enterprises is either semi-structured information (XML, Email and EDI), or unstructured (text, images, audio and video), and only 5% of this data is stored in relational DBMSs, and the rest is stored in other types of databases or in the form of files, and is not subject to processing by relational systems.

According to Blore, relational DBMSs “may die without anyone noticing” - for example, if Oracle simply replaces the SQL engine with NoSQL in its DBMS. The analyst believes that one of the existing columnar DBMSs could become such a mechanism.

RELATIONAL DATABASE AND ITS FEATURES. TYPES OF RELATIONS BETWEEN RELATIONAL TABLES

Relational database is a collection of interconnected tables, each of which contains information about objects of a certain type. A table row contains data about one object (for example, a product, a customer), and the table columns describe various characteristics these objects - attributes (for example, name, product code, customer information). Records, i.e. table rows, have the same structure - they consist of fields that store object attributes. Each field, i.e. column, describes only one characteristic of the object and has a strictly defined data type. All records have the same fields, only they display different information properties of the object.

In a relational database, each table must have a primary key - a field or combination of fields that uniquely identifies each row in the table. If a key consists of several fields, it is called composite. The key must be unique and uniquely identify the entry. Using the key value, you can find a single record. Keys also serve to organize information in the database.

Relational database tables must meet the requirements for normalizing relationships. Normalization of relations is a formal apparatus of restrictions on the formation of tables, which eliminates duplication, ensures consistency of data stored in the database, and reduces labor costs for maintaining the database.

Let a Student table be created containing the following fields: group number, full name, student record number, date of birth, specialty name, faculty name. Such an organization of information storage will have a number of disadvantages:

  • duplication of information (the name of the specialty and faculty is repeated for each student), therefore, the volume of the database will increase;
  • the procedure for updating information in the table is complicated due to the need to edit each table entries.

Table normalization is designed to address these shortcomings. Available three normal forms of relationships.

First normal form. A relational table is reduced to first normal form if and only if none of its rows contains more than one value in any of its fields and none of its key fields is empty. So, if you need to obtain information from the Student table by the student’s name, then the Full Name field should be divided into Last Name, First Name, and Patronymic parts.

Second normal form. A relational table is defined in second normal form if it satisfies the requirements of first normal form and all its fields that are not included in the primary key have a full functional dependence on the primary key. To reduce a table to second normal form, it is necessary to determine the functional dependence of the fields. A functional dependence of fields is a dependence in which in an instance of an information object a certain value of a key attribute corresponds to only one value of a descriptive attribute.

Third normal form. A table is in third normal form if it satisfies the requirements of second normal form that none of its non-key fields is functionally dependent on any other non-key field. For example, in the Student table (group number, full name, grade book number, Date of Birth, Headman) three fields - grade book number, group number, Headman are in transitive dependence. The group number depends on the grade book number, and the Headman depends on the group number. To eliminate the transitive dependency, it is necessary to transfer some of the fields of the Student table to another Group table. The tables will take the following form: Student (group number, full name, grade book number, date of birth), Group (group number, Headman).

The following operations are possible on relational tables:

  • Merge tables with the same structure. The result is general table: first first, then second (concatenation).
  • Intersection of tables with the same structure. Result - those records that are in both tables are selected.
  • Subtracting tables with the same structure. Result - those records are selected that are not in the subtracted one.
  • Sample (horizontal subset). Result - records that meet certain conditions are selected.
  • Projection (vertical subset). The result is a relation containing some of the fields from the source tables.
  • Cartesian product of two tables The resulting table's records are obtained by combining each record of the first table with each record of the other table.

Relational tables can be related to each other, hence data can be retrieved from multiple tables simultaneously. Tables are linked to each other in order to ultimately reduce the size of the database. Each pair of tables is connected if they have identical columns.

The following types exist information links:

  • one to one;
  • one-to-many;
  • many-to-many.

One-to-one communication assumes that one attribute of the first table corresponds to only one attribute of the second table and vice versa.

One-to-many communication assumes that one attribute of the first table corresponds to several attributes of the second table.

Many-to-many communication assumes that one attribute of the first table corresponds to several attributes of the second table and vice versa.

Level 1: Level external models- this is the most top level where each model has its own view of the data. This layer defines the database viewpoint of individual applications.

Conceptual level: The central control link, where the database is presented in the most general form, which combines the data used by all applications. In fact, the conceptual level reflects a generalized model of the subject area.

Physical layer(Database): This is the data itself located in files or in page structures located on external storage media.


Data Models

The following data models are distinguished:

1. Infological

2. Date logical

3. Physical

The database design process begins with the design of an information model. An infological data model is a generalized informal description of the created database, made using natural language, mathematical formulas, tables, graphs and other tools that are understandable to all people working on database design.

Domain tuple

The information model reflects the real world in some human-understandable concept, completely independent of the data storage environment. Therefore, the Infology Model should not change until some change in the real world requires a change outside the definition so that the model continues to represent the domain.

There are many approaches to building this model: graph models, semantic networks, entity-connection and others.

Datalogical model

The infological model must be displayed in a datalogical model that is understandable to the DBMS. A datalogical model is a formal description of an information model in the DBMS language.

Hierarchical model

This model is a collection of related elements that form a hierarchical structure. The basic concepts of hierarchy include level, node, and relationship.

communication level


A node is a collection of data attributes that describe an object. Each node is connected to one node for more than high level and with any number of lower level nodes. The exception is the highest level node. The number of trees in the database is determined by the number of tree roots. Each database record has a single path from the root record. A simple example may serve as the Internet domain name system\address. On the first level (the root of the tree) lies our planet earth, on the second the Country, on the third the Region, on the fourth - locality, Street, house, flat. A typical representative is a DBMS from IBM - IMS.

All copies of this type descendant with common copy type of ancestor is called twins. A complete traversal order is defined for the database. From top to bottom and from right to left.

Physical model

A physical model is built based on the datalogical model. The physical organization of data has a major impact on database performance. DBMS developers are trying to create the most productive physical data models, offering users one or another tool to customize the model for a specific database.

Example: In particular for a relational database, it already takes into account:

1. Physical aspects of storing tables in specific files.

2. Creating indexes that optimize the speed of data operations using the application.

3. Execution various actions over data at certain events defined by users using triggers and stored procedures.

Infological models X

Physical models


For all levels and for any presentation method subject area, lies the coding of concepts of relationships between concepts. A key step in the development of any information system is to conduct system analysis:

Formalization of the subject area and representation of the system as a set of components.

Composition as the basis of system analysis can be functional (building a hierarchy).

However, in most systems, when it comes to databases, data types are a more static element than the way they are processed. Therefore, such methods of system analysis as the data flow diagram have received intensive development. Development of relational databases. Stimulated the development of data development methodologies, in particular ER ER diagrams. The relational data model directly uses the concept of relationship as a mapping. She is closest to conceptual model data presentation. And often lies at the heart of it.

Unlike the graph model theorist, in the relational model, connections between relations are implemented in an inexplicit way, for which relation keys are used. For example, relations of a hierarchical type are implemented by the mechanism of primary and foreign keys, when the fact of attributes must be present in the subordinate relation.

Such an attribute of relationships in the main relationship will be called a primary key, and in a subordinate relationship, a secondary one.

Progress in the development of programming languages ​​associated primarily with data typing and the emergence of object-oriented languages ​​has made it possible to approach the analysis of complex systems from the point of view of hierarchical representations, that is, using classes of objects with the properties of polymorphism, inheritance, and encapsulation.

RELATIONSHIP IS A TABLE.

Editing tables, records...

Deleting what you created and

Editing.


Relational database model

Relational data models have currently gained the greatest popularity precisely for this representation of data.

The relational model can be thought of as a special method of representing data that contains its own data (in the form of tables) and ways of working and manipulating them (in the form of relationships). The relational model assumes three conceptual elements: Structure, Integrity and Data Processing. These elements have their own mandatory concepts that need to be explained for further presentation.

The table is considered as a direct data store. Traditionally in relational systems the table is called attitude. A table row is called motorcade, and the column attribute. In this case, the attributes have unique names (within the relation).

The number of tuples in a table is called cardinal number. Number of attributes degree. A unique identifier is established for a relationship, that is, one or more attributes whose values ​​​​are not the same at the same time - the identifier is called primary key.Domain this is the set of valid homogeneous values ​​for a particular attribute. Thus, a domain can be considered as a named set of data, and the components of this set are logically indivisible units (for example, a list of names of employees of an institution can act as a domain, but not all names can be present in the table).

SUMM Kireeva 25.50 Motyleva 17.05 … …. …

Attitude

attributes

The fields KOD, NAME, SUMM are table attributes contained in the header.

Pairs KOD 5216, NAME Kireeva, SUMM 25.50 are elements of the body of the relationship.

In relational databases, unlike other models, the user specifies what data is needed for him and not how to do it. For this reason, the process of moving and navigating a database in relational systems is automatic, and this task is performed in a DBMS optimizer. His job is to make the most effective way retrieve data from the database upon request. Thus, the optimizer at least must be able to determine from which tables the data is selected, how much information is in these tables and what is the physical order of the records in the tables and how they are grouped.

In addition, a relational database also performs directory functions. The directory stores a description of all the objects that make up the database: tables, indexes, triggers, etc. It is obvious that it is vital for proper operation the entire system, such a component as the optimizer. The optimizer uses the information stored in the directory. An interesting fact is that the catalog itself is a set of tables, so the DBMS can manipulate it in traditional ways, without resorting to any special techniques or methods.

Domains and Relationships

Basic definitions: Domains, types of relations, predicates.

Relationships have a number of basic properties:

1. In the most general case, there are no common tuples in relations - this follows from the very definition of relations. However, for some DBMSs, deviations from this property are allowed in some cases. As long as there is a primary key in the relationship, identical tuples are excluded.

2. Tuples are not ordered from top to bottom - there is simply no concept of a positional number in a relation. In relationships, without losing information, you can successfully arrange tuples in any order.

3. Attributes are not ordered from left to right. The attributes in the relationship header can be arranged in any order without compromising the integrity of the data. Therefore, the concept of a positional number in relation to an attribute also does not exist.

4. Attribute values ​​consist of logically indivisible units - this follows from the fact that the values ​​are taken from domains; otherwise, we can say that relations do not contain repetition groups. That is, they are normalized.

Relational systems support several types of relationships:

1. Named ones are relation variables defined in the DBMS by creation operators and, as a rule, necessary for a more convenient presentation of information for the user.

2. Basic relationships are directly important part DB, so when designing they are given their own name.

3. A derived relation is one that was defined through other, usually basic, relations by using DBMS tools.

4. This representation is actually a named derived relation, and the representation is expressed exclusively through DBMS operators applied to named relations, so they do not physically exist in the database.

5. The result of queries is an unnamed derived relation containing data (the result of a specific query). The result is not stored in the database but exists as long as the user needs it.

6. A stored relation is one that is physically maintained in the memory of relations; stored relations most often include the base of relations. Based on the above, we can define a relational database as a set of interconnected relationships.


Contact in in this case is the association of two or more relations.

KOD ADRES
1 1 A one-to-many relationship is that at any given time each element (tuple A) corresponds to several elements of tuples B
∞ Binary connection
Students
Teachers
Timetable of classes

Students

Ternary connections


Data integrity

In relational models, the issue of data integrity is given a special place. Recall that the key or potential clue this is the minimum set of attributes whose values ​​can be used to uniquely find the required tuple; minimality means that excluding any attribute from the set does not allow identifying the tuple by the remaining attributes.

Every relationship has at least one possible key. One of them is taken as the primary key.

When choosing primary key preference should be given to non-composite keys or keys composed of a minimal set of attributes. It is also undesirable to use keys with long text values(It is preferable to use integer attributes as keys). So, to identify an employee, you can use either a unique personnel number, or passport number, or a set of last names, middle names and department numbers. It is not allowed for the primary key of a relationship, that is, any attribute participating in the primary key, to take undefined values. In this case, a contradictory situation will arise ( collision): A non-unique primary key element appears. Therefore, this should be carefully monitored when designing a database.

About foreign keys. It is worth noting that since relation C links relations B and A, it must include foreign keys corresponding to the primary keys of relations A and B.

A table's foreign key is formed using several primary keys of other tables.

Thus, when considering the problem of choosing a method for connecting a relationship in a database, the question arises of what the foreign keys should be. At the same time, for everyone foreign key it is necessary to solve the problem associated with the possibility (or impossibility) of undefined values ​​(NULL – values ​​- value attribute for missing information). In other words, can there be some tuple in a relation for which the tuple in its associated relations is not known?

On the other hand, it is necessary to think in advance about what will happen when removing tuples from a relation referenced by a foreign key. The following possible possibilities exist:

· Operation cascades– that is, deleting tuples in relations leads to deleting tuples associated with the relation. For example, deleting information about last name, first name, etc. employee in one respect leads to the deletion of his salary in another respect;

· Operation limited - that is, only those tuples for which there is no other associated information are removed. Not all information is deleted (not in all respects) since it can be used in another respect, the removal of information in which leads to a violation of data integrity. If such information is available, deletion cannot be carried out, for example, deleting information about the first name, last name, etc. employee is possible only if there is no related information about his salary.

It is necessary to provide technology for what will happen when you try to update the primary key of a relationship that is referenced by a foreign key. Here you have the same options as when deleting:

· The operation is cascaded, that is, when the primary key is updated, the foreign key in the related relation is updated. For example, updating the primary key in a relation where employee information is stored leads to an update of the foreign key in a relation containing salary information.

· The operation is limited to updating only those primary keys for which there is otherwise no associated information. If such information is available, the update cannot be made. For example, updating the primary key in a relation where information about an employee is stored is only possible if information about his salary is missing in the related relation.1


Relational algebra

The formal basis of the relational database model is relational algebra, based on set theory and considering a special operator over relations, and relational calculus based on mathematical logic.

Work

A A A B B C Y Y D
G D
A
A B C Y Y D F F W

It should be noted that relational algebra has great power - complex queries to the database can be expressed using a single expression. It is for this reason that these mechanisms are included in the relational data model. Any query expressed using one relational algebra expression, or one relational calculus formula, can be expressed using one operator in this language.

Relational algebra has important property- it is closed regarding the concept of relationship. This means that a relational algebra expression is performed on the relations of relational databases and the results of their calculation also represent relations.

The main idea of ​​relational algebra is that the means of manipulating relationships considered as a set are based on traditional multiple operations supplemented by some database-specific operations.

Let us describe the version of algebra that was proposed by CODD. The operation consists of 8 main operators:

Relation fetch (unary operation)

Relation projection (unary operation)

· Merging relationships

· Intersection of relations (binary operation)

· Subtraction of ratios

Product of relations

· Connecting relationships

· Division of relationships

These operations can be explained as follows:

· The result of selecting a relation based on some condition is a relation that includes only those tuples of the original relation that satisfy this condition.

· When projecting a relation onto a given set of its attributes, a relation will be obtained whose tuples are taken from the corresponding tuples of the first relation.

· When performing the operation of merging two relations, a relation will be obtained that includes all tuples included in at least one of the relations participating in the operation.

· When performing the operation of intersection of two relations, a relation will be obtained that includes all tuples included in both initial relations.

· When performing the operation of subtracting two relations, a relation will be obtained that includes all tuples included in the first relation, except those that are also included in the second relation.

· When performing the direct product of two relations, a relation is obtained whose tuples are a combination of the tuples of the first and second relation.

· When two relations are connected according to some condition, a resulting relation of tuples is formed whose tuples are a combination of tuples of the first and second relations that satisfy this condition.

· The relational division operation has two operands – a binary relation (consisting of two attributes) and a unary relation (consisting of one attribute). The result of the operation is a relation consisting of tuples including the relation of the first attribute of tuples of the first relation, and such that the set of values ​​of the second attribute coincides with the set of values ​​of the second relation.

In addition to the above, there are a number of special operations specific to working with databases:

· As a result of the renaming operation, a relation is a set of tuples that coincides with the body of the original relation, but the attribute names have been changed.

It follows that the result of a relational operation is a certain relation; it is possible to form relational expressions in which, instead of the original relation (operand), an embedded relational expression will be used. This is due to the fact that the operations of relational algebra are truly closed to the concept of a relation. Let's start with the operation unification of relations, however, this equally applies to the operations of intersection and combination, that is, in relational algebra, the result of the union operation is a relation. If allowed into relational algebra opportunity associations arbitrary two relations with different sets of attributes, then the result of such an operation will be a set, but a set of tuples of different types, that is, generally speaking, not a relation. If we proceed from the requirement that relational algebra is closed with respect to the concept of relation, then such an operation associations is meaningless. This leads to the emergence of the concept relationship compatibility By unification: Two relations are compatible only if they have the same headers, that is, they have the same set of attribute names, and the attributes of the same name are defined in the same domain.

Provided that two relations are compatible in their union, when the operation of union, intersection, and subtraction is normally performed on them, the result of the operation is a relation with a correctly defined header that matches the header of each of the operand relations. If two relations are not fully join compatible, that is, compatible in everything except attribute names, then before performing a join type operation, these relations can be made fully join compatible by applying a rename operation.

The operation of direct product of two relations raises new problems. In Set Theory, the direct product can be obtained for any sets. The elements of the resulting set will be pairs made up of elements of the first and second sets. Since relations are sets, for any two relations it is possible to obtain a direct product. However, the result will not be a relation. The elements of the result will not be tuples, but pairs of tuples. Therefore, in relational algebra, a special form of the operation of taking a direct product is used - the extended direct product of relations. When taking the extended direct product of two relations, the element of the resulting relation is a tuple formed by merging one tuple of the first relation and one tuple of the second relation. A second problem immediately arises related to obtaining a correctly formed header of the resulting relationship; this leads to the need to introduce the concept of relationship compatibility by taking an extended direct product.

Two relations are compatible by taking a direct product only if the set of attribute names of these relations do not intersect. Any two relations can be converted to a compatible direct product form by applying a rename operation to one of the relations.

The fetch operation requires two relations: an initial relation, the operand, and a simple constraint condition. As a result of the selection operation, a relation is produced whose head coincides with the header of the operand relation, and the body includes those tuples of the operand relation that satisfy the values ​​of the constraint condition.

Let's introduce a number of operators.

Let union mean the union operation, intersect – the intersection operation, minus – the subtraction operation. To denote the sampling operation, we will use the construction A where B, where A is the operand relation, and B is a simple comparison condition. Let C1 and C2 be two simple sampling conditions

A where C1 AND C2 is identical (A where C1) intersect (A where C2)

A where C1 OR C2 is identical to (A where C1) union (A where C2)

A where C1 not C2 is identical to (A where C1) minus (A where C2)

Using these definitions, you can implement sampling operations in which the sampling condition is arbitrary logical expression made up of simple conditions using logical connections (and, or, not). The operation of taking projections of the relation A onto the list of attributes a1, a2,…,an will be a relation whose head is the set of attributes, a1,a2,…,an. The body of the result will consist of tuples for which in relation A there is a tuple, attribute a1 has the value b1, attribute a2 has the value b2< и так далее атрибут an – bn. По сути при выполнении операции проекции определяется «Вертикальная» вырезка отношения - операнда с удалением возникающих кортежей –дубликатов.

The join operation, sometimes called a conditional join, requires two operands, the relations being joined, and a third operand, the simple condition. Let the relation A and B be connected. As in the case of the selection operation, the join condition C has the form, (a comp –op b) or (a comp –op const) where A and B are the names of the attributes of the relations A and B, const is literally specified constant. Comp-op is a valid comparison operation in this context. Then, by definition, the result of the connection operation is the relation obtained by performing the restriction operation, according to condition C, the direct product of the relation A and B.

There is an important special case connections, natural connection. A join operation is called a natural join operation if the join conditions are of the form (a=b) where a and b are attributes of different join operands. This case is important because it is particularly common in practice and there are effective implementation algorithms for it in a DBMS. The natural join operation is applied to a pair of relations A and B that have a common attribute P, that is, an attribute with the same name and defined on the same domain. Let ab denote the union of the headers of relations A and B. Then a natural join is the result of the join of A and B projected onto ab. The operations of natural join are not directly included in the set of operations of relational algebra, but they have very important practical significance.

The operation of dividing relations needs more detailed explanation because it is difficult to understand. Let two relations A be given (a1,a2,..,an,b1,b2,…,bm)

B (b1,b2,…,bn) We assume that attribute b1 of relation A and attribute b1 of relation B are defined on the same domain. Let's call the set of attributes (aj) a composite attribute a, and the set (bj) c a composite attribute b. After this, we will talk about the relational division of the binary relation A (a,b) into the unary relation B (b).

The result of dividing A by B is a unary relation C (a), consisting of tuples v such that in relation A there are tuples which in the set of values ​​(w) include the set of values ​​of b in relation to B.

Since division is the most difficult operation, let us explain it with an example. Let there be two relations in the student database: STUDENTS (FULL NAME, NUMBER) and NAMES (FULL NAME), and the unary relation NAMES contains all the names that students of the institute have. Then, after performing the operation of relational division of the STUDENTS relation into the NAMES relation, a unary relation will be obtained containing the numbers of student cards belonging to students with all possible surnames at this institute.


Relational notation

Let's say there is a database with the structure STUDENTS (number, name, scholarship, group code), and the relation GROUPS (gr_nom, gr_col, gr old) Let's assume that you need to find out the names and numbers of students. tickets for students who are prefects of groups with more than 25 people. In relational algebra, you need to take the following actions for a request like this:

1. Connect the relations STUDENTS and GROUPS, according to the condition “student_number = gr_star”;

2. Limit the resulting ratio by the condition gr_col>25.

3. Project the result of the previous operation onto the attribute student_name, student_number.

Here is a step-by-step formulation of the sequence of query execution in the database, each of which corresponds to one relational operation. if we formulate the same query using relational calculus, then we would get a formula that can be read: Issue STUDY_NAME and STUDY_NUMBER for such students so that such a group GR_STAR and the value GR_NUM>25 coexist. In the second formulation, we indicated only the characteristics of the resulting relationship but said nothing about the method of its formation. In this case, the DBMS itself must decide what kind of operations and in what order should be performed on the STUDENTS and GROUPS relationships. Both methods discussed in the example are actually equivalent and there are not very complex conversions from one to the other.

The basic concepts of relational calculus are the concepts of a variable with a certain area of ​​its value, and the concepts of a correctly constructed formula based on variables and special functions. Functions. What is the domain of definition of a variable differs between tuple calculus and domain calculus, that is, along or across. In tuple calculus, the domains of variable definition are the database relation, i.e. valid value Each variable is a tuple of some relation. In domain calculus, the domains of variable definition are the domains on which the attributes of database relationships are defined, that is, the valid value of each variable is the value of each variable.

Byte Integer String Char
M
N
K

The RANGE command is used to define tuples. For example, to define the STUDENT variable whose scope is STUDENTS, you need to use the RANGE STUDENT IS STUDENTS construction. From this definition it follows that at any moment in time the student variable represents a certain tuple of the STUDENTS relation. When you use tuple variables in formulas, you can reference variable attribute values. For example, in order to refer to the value of the STUDENT_NAME attribute of the STUDENT variable, you need to use the STUDENT.STUDENT_NAME construction.

Correctly constructed formulas are used to express conditions imposed on tuple variables. Such formulas are based on simple comparisons, which are operations comparing the values ​​of attributes of variables and literal constants. For example, the construction STUDENT.STUD_NOM=123456. Is a simple comparison. More difficult option compound formulas are formed using logical connections AND, OR, NOT, IF…THEN. Finally, it is possible to construct well-formed formulas using quantifiers. If F is a well-formed formula involving the variable var, then the construction EXIST (existence quantifier) ​​var (F) and FORALL (for all tuples) var (F) are correct.

Variables included in properly constructed formulas can be free or bound. All variables included in their composition in the construction of which no quantifiers were used are free. This means that if for some set of values ​​of free tuple variables the value “true” is obtained when calculating formulas, then these values ​​can be included in the resulting relation. If a quantifier is used when constructing formulas, then the variables are related. When calculating the value of such a correctly constructed formula, not a single value of the associated variable is used, but its entire domain of definition.

1)EXISTS STUD2 (STUD.1STUD_STIP> STUD2.STUD_STIP)

2)FORALL STUD2 (STUD.1STUD_STIP> STUD2.STUD_STIP)

Let STUD1 and STUD2 be two tuple variables defined on the relation students, then the formula for the current tuple of the variable STUD1 takes on the value true only if in the entire relation students there is such a tuple associated with the variable STUD2 such that the value of its attribute STUD_STIP satisfies the internal comparison condition. Correctly constructed formula No. 2 for the constructed tuple STUDENT 1 takes the value true if for all tuples the relation STUDENTS associated with the variable STUDENT 2, the value of the STUDENT.STIP attribute satisfies the internal condition.

Thus, well-formed formulas provide a means of expressing the conditions for sampling from a database relationship. To be able to use relational calculus to actually work with a database, another component is required that determines the set and names of the columns of the resulting relation. This component is called target list.

Target list has the form:

· Var.attr is the name of a free variable, attr is the name of the relation attribute on which the var variable is defined.

· Var which is equivalent to the relation from the list, Var.attr1, Var.attr1... Var.attr№ includes the names of all attributes of the defining relation.

· New_name = var.attr; the new name of the corresponding attribute of the resulting relation.

The last option is required in cases where the code in the formula uses several free variables with the same scope. In domain calculus, the domain of definition of domains is not relations but domains. In relation to the STUDENTS GROUP database, we can talk about domain variables NAME(Domain values ​​are valid names or NOM STUD). (Domain values ​​are valid student numbers).

The main difference between domain calculus and tuple calculus is the presence of an additional set of predicates that make it possible to express so-called membership conditions. If R is an n-ary relation with attributes (a1, a2, … an) then the membership condition has the form R(ai1:Vi1,ai2:Vi2,…aim:Vim) where (m<=n). Где в Vij это либо литерально заданная константа либо имя кортежной переменной. Условие членства принимает значение истина, только в том случае если в отношении R существует кортеж, содержащий следующие значения указанных атрибутов. Если от Vij константа то на атрибут aij накладывается жёсткое условие независящее от текущих доменных переменных. Если же Vij имя доменной переменной то условие членства может принимать различные значения при разных значениях этой переменной.

A predicate is a logical function that returns true or false for some argument. A relation can be considered as a predicate with arguments that are attributes of the relation in question. If a given specific set of tuples is present in the relation, then the predicate will produce a true result, otherwise it will produce a false result.

In all other respects, the formulas and expressions of domain calculus look similar to the formulas and expressions of tuple calculus. Relational domain reckoning is the basis for most form-based language queries.


Related information.


Relational Database - Basic Concepts

Often, when talking about a database, they simply mean some automated data storage. This idea is not entirely correct. Why this is so will be shown below.

Indeed, in the narrow sense of the word, a database is a certain set of data necessary for work (up-to-date data). However, data is an abstraction; no one has ever seen “just data”; they do not arise or exist on their own. Data is a reflection of objects in the real world. Let, for example, you want to store information about parts received at the warehouse. How will a real world object - a part - be displayed in the database? In order to answer this question, you need to know which features or aspects of the part will be relevant and necessary for the job. These may include the name of the part, its weight, dimensions, color, date of manufacture, material from which it is made, etc. In traditional terminology, real-world objects, information about which is stored in a database, are called entities (don’t let this word scare the reader - this is a generally accepted term), and their actual characteristics are called attributes.

Each attribute of a specific object is an attribute value. Thus, the engine part has a weight attribute value of 50, which reflects the fact that this engine weighs 50 kilograms.

It would be a mistake to think that only physical objects are reflected in the database. It is capable of absorbing information about abstractions, processes, phenomena - that is, about everything that a person encounters in his activities. For example, in a database you can store information about orders for the supply of parts to a warehouse (although it is not a physical object, but a process). The attributes of the "order" entity will be the name of the part being supplied, the number of parts, the name of the supplier, delivery time, etc.

Objects in the real world are connected to each other by many complex dependencies that must be taken into account in information activities. For example, parts are supplied to the warehouse by their manufacturers. Therefore, it is necessary to include the “manufacturer’s name” attribute among the part attributes. However, this is not enough, since additional information about the manufacturer of a particular part may be needed - his address, telephone number, etc. This means that the database must contain not only information about parts and purchase orders, but also information about their manufacturers. Moreover, the database must reflect the relationships between parts and manufacturers (each part is produced by a specific manufacturer) and between orders and parts (each order is issued for a specific part). Note that only relevant, significant connections need to be stored in the database.

Thus, in the broad sense of the word, a database is a set of descriptions of real-world objects and connections between them that are relevant for a specific application area. In what follows, we will proceed from this definition, clarifying it as we go along.

Relational data model

So now we have an idea of ​​what is stored in the database. Now we need to understand how entities, attributes, and relationships map to data structures. This is determined by the data model.

Traditionally, all DBMSs are classified depending on the data model that underlies them. It is customary to distinguish between hierarchical, network and relational data models. Sometimes they are supplemented with a data model based on inverted lists. Accordingly, they talk about hierarchical, network, relational DBMS or DBMS based on inverted lists.

In terms of prevalence and popularity, relational DBMSs today are unrivaled. They have become a de facto industrial standard, and therefore the domestic user will have to deal with a relational DBMS in their practice. Let's briefly look at the relational data model without delving into its details.

It was developed by Codd back in 1969-70 on the basis of the mathematical theory of relations and is based on a system of concepts, the most important of which are table, relation, row, column, primary key, foreign key.

A relational database is one in which all data is presented to the user in the form of rectangular tables of data values, and all operations on the database are reduced to manipulations with tables. A table consists of rows and columns and has a name that is unique within the database. The table reflects the type of real world object (entity), and each of its rows represents a specific object. Thus, the Part table contains information about all parts stored in the warehouse, and its rows are sets of attribute values ​​for specific parts. Each table column is a collection of values ​​for a specific attribute of an object. So, the Material column represents a set of values ​​​​"Steel", "Tin", "Zinc", "Nickel", etc. The Quantity column contains non-negative integers. The values ​​in the Weight column are real numbers equal to the weight of the part in kilograms.

These values ​​don't appear out of thin air. They are selected from the set of all possible values ​​for an object attribute, which is called the domain. Thus, the values ​​in the material column are selected from a set of names of all possible materials - plastics, wood, metals, etc. Therefore, it is fundamentally impossible for a value to appear in the Material column that does not exist in the corresponding domain, for example, “water” or “sand”.

Each column has a name, which is usually written at the top of the table ( Rice. 1). It must be unique within the table, but different tables can have columns with the same name. Any table must have at least one column; The columns are arranged in the table according to the order in which their names appeared when it was created. Unlike columns, rows do not have names; their order in the table is not defined, and their number is logically unlimited.

Figure 1. Basic database concepts.

Since the rows in the table are not ordered, it is impossible to select a row by its position - there is no "first", "second", or "last" among them. Any table has one or more columns, the values ​​of which uniquely identify each of its rows. This column (or combination of columns) is called a primary key. In the Part table, the primary key is the Part Number column. In our example, each part in the warehouse has a single number, by which the necessary information is retrieved from the Part table. Therefore, in this table, the primary key is the Part Number column. There cannot be duplicate values ​​in this column - there should be no rows in the Part table that have the same value in the Part Number column. If a table satisfies this requirement, it is called a relation.

The relationship of tables is the most important element of the relational data model. It is supported by foreign keys. Let's consider an example in which a database stores information about ordinary employees (Employee table) and managers (Manager table) in some organization ( Rice. 2). The primary key of the table Head is the Number column (for example, personnel number). The Last Name column cannot serve as a primary key, since two managers with the same last names can work in the same organization. Any employee is subordinate to a single manager, which must be reflected in the database. The Employee table contains a column Manager Number, and the values ​​in this column are selected from the Number column of the Manager table (see. Rice. 2). The Manager Number column is a foreign key in the Employee table.

Figure 2. Relationship between database tables.

Tables cannot be stored and processed if there is no "data about data" in the database, such as handles for tables, columns, etc. They are usually called metadata. Metadata is also presented in tabular form and stored in a data dictionary.

In addition to tables, the database can store other objects, such as displays, reports, views, and even application programs that work with the database.

For users of an information system, it is not enough for the database to simply reflect real-world objects. It is important that such a reflection is unambiguous and consistent. In this case, the database is said to satisfy the integrity condition.

In order to guarantee the correctness and mutual consistency of data, certain restrictions are imposed on the database, which are called data integrity constraints.

There are several types of integrity constraints. It is required, for example, that the values ​​in a table column be selected only from the corresponding domain. In practice, more complex integrity constraints are also taken into account, for example, referential integrity. Its essence is that a foreign key cannot be a pointer to a non-existent row in the table. Integrity constraints are implemented using special means, which will be discussed in Sec.Database server .

SQL language

The data itself in computer form is of no interest to the user if there are no means of accessing it. Data is accessed in the form of database queries that are formulated in a standard query language. Today, for most DBMSs, this language is SQL.

The emergence and development of this language as a means of describing database access is associated with the creation of the theory of relational databases. Prototype SQL language originated in 1970 as part of the System/R research project, work on which was carried out at IBM's Santa Teresa Laboratory. Nowadays SQL is the standard interface with relational DBMS. Its popularity is so great that developers of non-relational DBMSs (for example, Adabas) supply their systems with a SQL interface.

The SQL language has an official standard - ANSI/ISO. Most DBMS developers adhere to this standard, but often extend it to implement new data processing capabilities. New data management mechanisms that will be described in Sec.Database server , can only be used through special SQL statements, which are generally not included in the language standard.

SQL is not a traditional programming language. Not programs are written in it, but queries to the database. That's why SQL is a declarative language. This means that it can be used to formulate what needs to be obtained, but it cannot indicate how it should be done. In particular, unlike procedural programming languages ​​(C, Pascal, Ada), the SQL language does not have operators such as if-then-else, for, while, etc.

We will not go into detail about the syntax of the language. We will touch upon it only to the extent necessary to understand simple examples. With their help, the most interesting data processing mechanisms will be illustrated.

A SQL query consists of one or more statements, one after the other, separated by a semicolon. Table 1 below lists the most important operators that are included in the ANSI/ISO SQL standard.

Table 1. Basic SQL operators.

SQL queries use names that uniquely identify database objects. In particular, this is the table name (Detail), column name (Title), as well as the names of other objects in the database that belong to additional types (for example, names of procedures and rules), which will be discussed in Sec.Database server . Along with simple ones, complex names are also used - for example, a qualified column name determines the name of the column and the name of the table to which it belongs (Part.Weight). For simplicity, in the examples, names will be written in Russian, although in practice this is not recommended.

Each column in any table stores specific types of data. There are basic data types - fixed-length character strings, integers and real numbers, and additional data types - variable-length character strings, monetary units, date and time, logical data (two values ​​- "TRUE" and "FALSE"). In SQL, you can use numeric, string, character, and date and time constants.

Let's look at a few examples.

The query “determine the number of parts in stock for all types of parts” is implemented as follows:

SELECT Name, Quantity

FROM Part;

The result of the query will be a table with two columns - Name and Quantity, which are taken from the original Part table. Essentially, this query allows you to get a vertical projection of the original table (more strictly, a vertical subset of the set of table rows). From all rows of the Part table, rows are formed that include values ​​​​taken from two columns - Name and Quantity.

The query “What steel parts are in stock?” formulated in SQL looks like this:

FROM Part

WHERE Material = "Steel";

The result of this query will also be a table containing only those rows of the source table that have the value "Steel" in the Material column. This query allows you to get a horizontal projection of the Part table (the asterisk in the SELECT statement means selecting all columns from the table).

The request “to determine the name and quantity of parts in stock that are made of plastic and weigh less than five kilograms” will be written as follows:

SELECT Name, Quantity

FROM Part

WHERE Material = "Plastic"

AND Weight< 5;

The result of the query is a table with two columns - Name, Quantity, which contains the name and number of parts made of plastic and weighing less than 5 kg. In essence, the sampling operation is the operation of first forming a horizontal projection (find all rows of the Part table for which Material = "Plastic" and Weight< 5), а затем вертикальной проекции (извлечь Название и Количество из выбранных ранее строк).

One of the tools that provides quick access to tables is indexes. An index is a database structure that is a pointer to a specific row in a table. A database index is used in the same way as an index in a book. It contains values ​​taken from one or more columns of a particular table row and a reference to that row. The values ​​in the index are ordered, allowing the DBMS to quickly search the table.

Let's assume that a query is formulated to the Warehouse database:

SELECT Name Quantity, Material

FROM Part

WHERE Number = "T145-A8";

If there are no indexes for a given table, then to execute this query the DBMS must scan the entire Part table, sequentially selecting rows from it and checking the selection condition for each of them. For large tables, such a query will take a very long time to complete.

If an index was previously created on the Table Number Detail column, then the search time in the table will be reduced to a minimum. The index will contain the values ​​from the Number column and a link to the row with this value in the Part table. When executing a query, the DBMS will first find the value “T145-A8” in the index (and do this quickly, since the index is ordered and its rows are small), and then, using the link in the index, determine the physical location of the searched row.

An index is created with the SQL CREATE INDEX statement. In this example, the operator

CREATE UNIQUE INDEX Part index

ON Part(Number);

will create an index with the name "Part Index" on the column Number of the table Part.

For a DBMS user, it is not the individual SQL statements that are of interest, but a certain sequence of them, designed as a single whole and making sense from his point of view. Each such sequence of SQL statements implements a specific action on the database. It is carried out in several steps, at each of which certain operations are performed on the database tables. Thus, in the banking system, the transfer of a certain amount from a short-term account to a long-term account is carried out in several operations. Among them are withdrawing an amount from a short-term account and crediting it to a long-term account.

If there is a failure during this action, for example, when the first operation is completed but the second is not, then the money will be lost. Therefore, any action on the database must be performed entirely, or not performed at all. This action is called a transaction.

Transaction processing relies on a log, which is used to roll back transactions and restore the state of the database. More details about transactions will be discussed in Sec.Transaction Processing .

Concluding our discussion of the SQL language, let us once again emphasize that it is a query language. It is impossible to write any complex application program that works with a database. For this purpose in modern DBMS a fourth generation language (Forth Generation Language - 4GL) is used, which has both the basic capabilities of third generation procedural languages ​​(3GL), such as C, Pascal, Ada, and the ability to embed SQL statements into the program text, as well as user interface control tools (menu , forms, user input, etc.). Today, 4GL is one of the de facto standards for database application development tools.

Relational model

The relational database model was proposed in 1969 by mathematician and IBM researcher E.F. Codd (E.F. Codd). For some basic information about relational databases, see the overview article “ DB and DBMS" 2. Since relational databases are currently dominant, in this article (as well as in the articles " Data Description”, “Data processing" And " Database design” 2) the most essential concepts of the relational model are discussed in detail.

Let us immediately note that the theory of relational databases was initially formulated in strict mathematical language, and it is precisely strict, formally defined mathematical concepts that best describe the essence of things. At the same time, in most cases it is possible, without much damage, to sacrifice the rigor of terminology in favor of transparency of presentation, which is what we will try to do.

The main idea of ​​the relational model is as follows. The database consists of a series of unordered tables(in the simplest case - from one table). Tables can be manipulated through non-procedural (declarative) operations - requests, the results of which are also tables.

Often the word “relational” ( relational) in the term “relational model” is interpreted based on the fact that connections are established in a relational database ( relate) between tables. This explanation is convenient, but it is not accurate. In Codd's original system of terms, the connection terms ( relations), attributes ( attributes) and tuples ( tuples) were used where most of us use the more familiar terms tables, columns (fields) and rows (records).

When building an information model of a subject area (see “ DB and DBMS”, “Database design” 2) stand out essence(objects), describe them properties a (characteristics, attributes) that are essential for modeling purposes, and connections between entities are established. At the stage of transition from an infological to a datalogical relational model, tables appear. Typically, each entity is represented by one table. Each row of the table (one record) corresponds to one instance of the entity, and each field describes a certain property (attribute).

For example, if we need to store information about people, including each person’s last name, first name, patronymic, TIN, country of residence and date of birth, then the entity is the person, and the specified data is the attributes. The entity itself naturally becomes the name of the table.

Table “Man”

The relational model requires that each row in a table be unique, i.e. so that any two rows differ in the value of at least one attribute.

The traditional tabular form is useful when you need to present the data itself. If, as in the example above, you are only interested in structure- field names, then from the point of view of clarity, ease of use in diagrams and saving space, it is more convenient to depict it as follows:

Keys

Key tablesis a field or group of fields containing unique values ​​within a given table. The key uniquely identifies the corresponding table row. If the key consists of a single field, it is often called simple, if from several - composite. In the example above, the key is the TIN field (we assume it is known that TINs are unique within a country).

Let's look at an example of a table with a composite key. Weather forecast websites often present information as follows: for each date, they indicate the predicted temperature at night, in the morning, in the afternoon and in the evening. To store this information, you can use a table like this:

In this table, neither the Date, Time of Day, nor Temperature fields are keys - values ​​can be repeated in each of these fields. But the combination of the Date + Time of day fields is unique and uniquely identifies a table row. This is a composite key.

There are often situations in which the choice of key is not unambiguous. Let's go back to the first example. Let's say that in addition to the last name, first name, patronymic, TIN, date of birth, it is required to store the series and number of a general passport and the series and number of a foreign passport. The table will look like this:

You can select as many as three keys in this table. One of them is simple (TIN), the other two are composite (Series + General passport number and Series + Foreign passport number). In such a situation, the developer chooses the key that is most convenient from the point of view of organizing the database (in general, the key whose value takes the least time to find). The selected key in this case is often called the main key, or primary, key, and other combinations of columns from which a key can be made are possible, or alternative, keys. Note that there is always at least one possible key in a table, since rows cannot be repeated and, therefore, the combination of all columns is guaranteed to be a possible key.

When depicting tables, it is customary to highlight the primary keys of tables. For example, relevant fields are often underlined. And Microsoft Access highlights key fields in bold.

Even more often than with ambiguity in choosing a key, developers are faced with the absence of a key among the data that needs to be stored. A similar fact can be established in the process of analyzing the subject area. For example, if you need to store a simple list of people - first names, last names, patronymics and dates of birth, then there is no key in this set of attributes at all - a conceivable situation is when two different people have the same data completely. In this case, you have to artificially enter an additional field, for example, a unique person number. Such a key is sometimes called in the literature surrogate. Often a surrogate key is introduced for reasons of efficiency. If, for example, a table has a long composite key, then developers often introduce an additional short numeric surrogate key and make it the primary key. This is often done even if there is a simple key that has an “inconvenient” (ineffective for searching) data type, for example, a string. Such operations are no longer relevant to theory, but are often encountered in practice.

The attentive reader may note that the key can almost always be expanded (unless it includes all the fields of the table) by including redundant fields. Formally, such a key will remain a key, but from a practical point of view, this is just a game of concepts. Such keys are not even considered possible, since it is always necessary to strive to minimize the length (complexity) of the key.

Normal forms, normalization

Not every table that we can draw on paper or in Word can be a relational database table. And not every table that can be used in a relational database is correct from the point of view of the relational model requirement.

Firstly, requires that all data within the same column be of the same type(about types seeData Description” 2). From this point of view, the example below does not even make sense to discuss:

Secondly, requires the table to have a primary key assigned.

These requirements are necessary, but not sufficient. The theory of relational databases introduces the concepts of so-called “normal forms” - requirements for organizing data in tables. Normal forms are numbered sequentially as the requirements become more stringent. In a properly designed database, tables are in at least third normal form. Accordingly, we will consider the first three normal forms. Let us recall that we are dealing with tables that satisfy the two basic requirements formulated above.

First normal form (1NF)

First normal form dictates that all data contained in a table must be atomic(indivisible). The list of corresponding atomic data types is determined by the DBMS. The 1NF requirement is completely natural. It means that each field of each record should contain only one value, and not an array or any other data structure. Let's give a meaningful example of a table that is not in 1NF. Let us have lists of students' grades in a certain subject.

Since the value of the Ratings field is not atomic, the table does not meet the requirements of 1NF.

A possible way to present a list of ratings is described in the guidelines for the article. “DB Design” 2.

Second normal form (2NF)

A table is said to be in second normal form if it is in 1NF and every non-key column is completely dependent on the primary key. In other words, the value of each field must be determined entirely by the value of the primary key. It is important to note that dependence on a primary key is understood precisely as a dependence on the key as a whole, and not on its individual component (in the case of a composite key). Let's give an example of a table that is not in 2NF. To do this, let's return to the example of the weather forecast and add another column to the table - the time of sunrise (this is a completely plausible example; this kind of information is often provided on weather forecast sites).

As we remember, this table has a composite key Date+Time of day. The Temperature field is completely dependent on the primary key - there are no problems with it. But the Sunrise field depends only on the Date field. The time of day does not naturally affect the time of sunrise.

Here it is appropriate to ask the question: what is the practical meaning of 2NF? What is the benefit of these restrictions? It turns out it's big. Let's say that in the above example the developer ignores the 2NF requirements. Firstly, the so-called redundancy- storage of unnecessary data. After all, if for one record with a given date the sunrise time is already stored, then for all other records with a given date it should be the same and, generally speaking, there is no need to store it.

Let's pay attention to the words “should be”. What if it doesn't? After all, at the database level this is not controlled in any way - the key in the table is composite, there can be identical dates (and in terms of meaning most likely there will be). And no formal restrictions (and our understanding that “this cannot happen” is not one of them) prohibits indicating different sunrise times for the same date.

Third normal form (3NF)

A table is said to be in 3NF if it is 2NF and all non-key columns are mutually independent.

The mutual dependence of the columns is conveniently understood as follows: Columns are mutually dependent if it is impossible to change one of them without changing the other.

Let's give an example of a table that is not in 3NF. Let's consider an example of a simple notebook for storing home telephone numbers of people living, perhaps, in different regions of the country.

This table has a dependency between the non-key columns City and City Code, therefore the table is not in 3NF.

Note that the developer determines the presence of the above dependency by analyzing the subject area - such a collision cannot be seen by any formal methods. When changing the properties of the subject area, the dependency between the columns may disappear. For example, if different codes are entered within the same city (like 495 and 499 in Moscow), the corresponding columns are no longer related in terms of violation of 3NF requirements.

In the theory of relational databases, higher order forms are also considered - Boyce-Codd normal form, 4NF, 5NF and even higher. These forms do not have much practical significance, and developers, as a rule, always stop at 3NF.

Database normalization

Normalization is the process of reducing database tables to a selected normal form. Normalization to 2NF, as a rule, comes down to decomposition - splitting one table into several. Normalization to 3NF can usually be accomplished by removing dependent (calculated) columns. In some cases, when normalizing to 3NF, you also have to perform decomposition.

Multi-table databases, relationships between tables, foreign keys

In practice, single-table databases are quite rare, since from the point of view of modeling a domain database, the presence of one table means the presence of one entity. In turn, the presence of several entities usually means the presence of connections between them.

Without setting the goal of a complete database design, let's consider an example that allows us to demonstrate relationships in multi-table databases.

Suppose we are dealing with a school in which there are students grouped into classes and teachers teaching certain subjects. We immediately distinguish four entities: students, teachers, classes and objects. These entities already give us four tables.

Next, we need to resolve the issue of entity attributes - what kind of information we will store. Since our example is for demonstration purposes only, we will try to minimize the amount of stored information. We will agree to store the surname and first name for each student, for the class - the parallel number and the letter identifying the class within the parallel, for the teacher - the surname, first name and patronymic, for the subject - only its name.

Now we need to solve the issue with primary keys. The student and teacher tables basically do not have a key, so we will introduce a surrogate numeric key into them - a number. Tables of classes and items, generally speaking, have keys. In the table of classes, the key is composite, it is formed by the attributes Parallel Number + Letter, and in the table of objects, a simple key consists of a single field - the name of the object. Recall that when we talked about keys, we mentioned that surrogate keys are often added for efficiency reasons, trying to get rid of compound keys or key fields of inconvenient types, such as strings. That's what we'll do. Let's add a surrogate numeric key to each of the tables.

As a result, we will receive the following set of tables corresponding to the described entities.

Understanding what subject area we are dealing with, we know that our entities do not exist on their own - they are connected by certain relationships that we outlined above. But how to connect them technically? Here you cannot do without introducing additional fields and even additional tables. Let's look at the relationships between entities in order.

To assign a student to a certain class, we add an additional field Class number in the “Student” table. (It is clear that its type must completely coincide with the type of the Class Number field in the “Class” table.) Now we can link the “Student” and “Class” tables using the matching values ​​of the Class Number fields (it is not by chance that we named these fields the same, in practice This is often done to easily navigate the connecting fields). Note that one record in the “Class” table can correspond to many records in the “Student” table (and in practice most likely corresponds - it is difficult to imagine a class of one student). Such tables are said to be related by the relationship “ one to many" And the Class Number field in the “Student” table is called foreign key. As you can see, the purpose of foreign keys is to link tables. Note that a foreign key always refers to the primary key of the related table (i.e. the foreign key is on the “many” side). The primary key associated with a foreign is called parental, although this term is used less frequently.

Let’s illustrate this with a diagram in the Microsoft Access style (more information about the Access “Data Schema” is written in the article “Data Description” 2).

Now let's think about teachers and subjects. Analyzing the subject area (this is the only way - after all, it is impossible to extract the true state of affairs from the formal model itself), we notice that the type of connection between the entities “teacher” and “subject” is different from that discussed above. After all, not only can many teachers teach one subject, but one teacher can teach many subjects. Thus, there is a connection between these entities “ many to many" Here you can’t do without introducing additional fields (try it!). Many-to-many relationships are always resolved by introducing an additional table. Namely, we organize the “Teacher-Subject” table, which has the following structure:

Table “Teacher-Subject”

This table has a composite key formed from two of its fields. Both the “Teacher” table and the “Subject” table are related to this table in a one-to-many relationship (of course, in both cases the “many” is on the “Teacher-Subject” side). Accordingly, in the “Teacher-Subject” table there are two foreign keys (both are parts of a composite primary key, which is not prohibited), which serve to link with the corresponding tables.

In practice, in addition to the considered “one-to-many” and “many-to-many” relationships, there is also the relationship “ one to one" From a theoretical point of view, such a relationship is not of interest, since two tables connected by a one-to-one relationship can always be simply combined into one. However, in real databases, a one-to-one relationship is used to optimize data processing. Let us illustrate this with an example.

Let's say we store a lot of different information about people - data from all kinds of documents, telephone numbers, addresses, etc. Most likely, most of this data will be used very rarely. And often we only need a last name, first name, patronymic and telephone number. Then it makes sense to organize two tables and relate them in a one-to-one relationship. Store frequently used information in one (small) table, and the rest in another. Naturally, tables related by a one-to-one relationship have the same primary key.

Integrity Rules

The relational model defines two general rules for database integrity: object integrity and referential integrity.

Integrity Rule objects very simple. It requires that table primary keys do not contain null (null) values.

Referential integrity rule requires that foreign keys do not contain values ​​that are inconsistent with their parent keys. Returning to the example discussed above, we must require, for example, that students belong only to the class whose number is indicated in the “Classes” table.

Most DBMSs are able to monitor data integrity (of course, this requires corresponding efforts from the developer at the stage of describing data structures). In particular, mechanisms are used to maintain referential integrity cascading operations. Cascading implies, in particular, that when deleting a record from a “parent” table that is connected to another table by a one-to-many relationship, all related records are automatically deleted from the “many” table (by the DBMS itself, without user intervention). And this is natural, because such records “hang in the air”; they are no longer connected with anything.

Indexing

Indexing is extremely important from the point of view of practical application, but optional from the standpoint of pure theory. The main purpose of indexing is to optimize (speed up) search (and, accordingly, some other operations with the database). Indexing in any case requires additional resources (special index files are most often created at the physical level). Indexing may even slow down operations related to data modification; therefore, tables that are rarely modified and frequently searched are usually indexed.

An index file is very similar to a regular book index. For each index value, a list of table rows that contain that value is stored. Accordingly, to search, you do not need to look through the entire table - just look into the index. However, when modifying records, you may need to rebuild the index. And this takes extra time.

Of course, there is no question of presenting the theory of relational databases as part of a basic computer science course! Nevertheless, this article is very important for our encyclopedia, since in this case we are dealing with material that cannot be fully presented in lessons, but the teacher must master it. Why?

Firstly, because a number of concepts are studied within the framework of the basic course. This includes a table view of the data and table keys. And we all know that it is very difficult to correctly and accurately present only some concepts without presenting the overall picture.

Secondly, performing simple database queries with children (the relevant material is presented in the article "Data processing" 2), it is necessary to deal with tables that are correct from the point of view of relational theory. There is no need to explain to students that these tables are correct, but “if only..., then the table would be incorrect,” but it is unacceptable to use bad examples.

In a specialized computer science course, the situation may be fundamentally different. The most important and extremely productive form of work in specialized classes is project work. As part of educational projects, it is possible and necessary to develop simple databases, and here one cannot do without the foundations of the theory presented. However, the following must be taken into account:

The subject areas being modeled should not be too large;

They should be very familiar to students (in this sense, the “School” project, which everyone was pretty fed up with, is not the worst choice!);

It is naive to expect that after listening to the basics of the theory, students will be able to design something themselves. You must go through each step with them, justifying your actions in detail.







2024 gtavrl.ru.