Relational databases. Basic rules for normalizing a relational entity


Database (DB) – a named collection of structured data related to a specific subject area and intended for storage, accumulation and processing by computer.

Relational Database (RDB) – a set of relations whose names coincide with the names of the relation schemas in the database schema.

Basic concepts of relational databases:

· Data type – the type of the values of a specific column.

· Domain – the set of all valid values of an attribute.

· Attribute – a table column header that characterizes a named property of an object, for example, a student's last name, an order date, an employee's gender, etc.

· Tuple – a table row representing a set of values of logically related attributes.

· Relation – a table holding information about real-world objects, for example, students, orders, employees, residents, etc.

· Primary key – a field (or set of fields) of a table that uniquely identifies each of its records.

· Alternate key – a field (or set of fields) that is not the primary key but also uniquely identifies each record.

· Foreign key – a field (or set of fields) whose values match existing values of the primary key of another table. When two tables are linked, the primary key of the first table is linked to the foreign key of the second table.

· Relational Data Model (RDM) – the organization of data in the form of two-dimensional tables.
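The three kinds of key defined above can be tried out with a minimal sketch that uses Python's built-in sqlite3 module. The table and column names (study_group, student, passport_no and so on) are illustrative assumptions and do not come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE study_group (
    group_no TEXT PRIMARY KEY               -- primary key
);
CREATE TABLE student (
    gradebook_no INTEGER PRIMARY KEY,       -- primary key
    passport_no  TEXT NOT NULL UNIQUE,      -- alternate key
    last_name    TEXT NOT NULL,
    group_no     TEXT NOT NULL
                 REFERENCES study_group(group_no)  -- foreign key
);
INSERT INTO study_group VALUES ('G-101');
INSERT INTO student VALUES (1, 'AA 111', 'Ivanov', 'G-101');
""")


def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert((2, "BB 222", "Petrov", "G-101")))   # new keys, existing group
print(try_insert((2, "CC 333", "Sidorov", "G-101")))  # repeats the primary key
print(try_insert((3, "AA 111", "Sidorov", "G-101")))  # repeats the alternate key
print(try_insert((4, "DD 444", "Sidorov", "G-999")))  # group does not exist
```

Only the first call succeeds; the other three rows each violate one of the key rules.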

Each relational table must have the following properties:

1. Each table record is unique, i.e. the set of values in its fields is not repeated.

2. Each value written at the intersection of a row and a column is atomic (indivisible).

3. The values of each field must be of the same type.

4. Each field has a unique name.

5. The order of the entries is not significant.

Main elements of the database:

Field – an elementary unit of the logical organization of data. The following characteristics are used to describe a field:

· name, for example, Last Name, First Name, Patronymic, Date of Birth;

· type, for example, string, character, numeric, date;

· length, for example, in bytes;

· precision for numeric data, such as two decimal places to show the fractional part of a number.

Record – a set of values of logically related fields.

Index – a means of speeding up record searches; indexes are also used to establish relationships between tables. A table that has an index is called indexed. Indexes are classified by how they are organized. A simple index is built on a single field or on a logical expression over a single field. A composite index is built on several fields, to which various functions may be applied. Table indexes are stored in an index file.
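As a rough illustration of simple and composite indexes, the sketch below creates one of each with Python's sqlite3 module. The student table and the index names are assumptions made for the example. Note that SQLite keeps indexes inside the database file rather than in a separate index file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (gradebook_no INTEGER PRIMARY KEY, "
    "last_name TEXT, first_name TEXT, group_no TEXT)"
)

# Simple index: built on a single field.
conn.execute("CREATE INDEX idx_student_group ON student (group_no)")
# Composite index: built on several fields.
conn.execute("CREATE INDEX idx_student_name ON student (last_name, first_name)")

# PRAGMA index_list returns one row per index; the name is the second column.
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list('student')"))
print(index_names)

# Ask SQLite how it would run a search by group. The wording of the plan
# differs between SQLite versions, so it is only printed here.
for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE group_no = ?", ("G-101",)
):
    print(row)
```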


Data integrity – a means of protecting data through the linking fields that keeps the tables in a consistent state (that is, it does not allow records in the subordinate table that have no corresponding records in the parent table).

Query – a formulated question addressed to one or more interrelated tables, containing the criteria for selecting data. Queries are written in SQL (Structured Query Language). Retrieving data from one or more tables can produce a set of records called a view.

View – a named query stored in the database for retrieving data (from one or more tables).

A view is essentially a temporary table created as the result of a query. The query result itself can be directed to a separate file, a report, a temporary table, a table on disk, etc.
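A small sketch with Python's sqlite3 module shows that a view stores the query, not a copy of the rows. The student table and the view name group_101 are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (gradebook_no INTEGER PRIMARY KEY, last_name TEXT, group_no TEXT);
INSERT INTO student VALUES (1, 'Ivanov', 'G-101'), (2, 'Petrov', 'G-102'), (3, 'Sidorov', 'G-101');
-- The view keeps only the query text; its rows are produced on demand.
CREATE VIEW group_101 AS
    SELECT gradebook_no, last_name FROM student WHERE group_no = 'G-101';
""")

query = "SELECT last_name FROM group_101 ORDER BY gradebook_no"
before = conn.execute(query).fetchall()

# A new row in the underlying table shows up in the view automatically.
conn.execute("INSERT INTO student VALUES (4, 'Orlov', 'G-101')")
after = conn.execute(query).fetchall()

print(before)
print(after)
```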

Report – a system component whose main purpose is to describe and print documents based on information from the database.

General characteristics of working with an RDB:

The most common interpretation of the relational data model seems to be that of C. J. Date, who reproduces it (with various refinements) in almost all of his books. According to Date, the relational model consists of three parts that describe different aspects of the relational approach: the structural part, the manipulation part, and the integrity part.

The structural part of the model states that the only data structure used in relational databases is the normalized n-ary relation.

The manipulation part of the model establishes two fundamental mechanisms for manipulating relational databases: relational algebra and relational calculus. The first is based mainly on classical set theory (with some refinements), and the second on the classical logical apparatus of first-order predicate calculus. Note that the main function of the manipulation part of the relational model is to provide a measure of the relationality of any specific relational database language: a language is called relational if it has no less expressiveness and power than relational algebra or relational calculus.


28. ALGORITHMIC LANGUAGES. TRANSLATORS (INTERPRETERS AND COMPILERS). ALGORITHMIC LANGUAGE BASIC. PROGRAM STRUCTURE. IDENTIFIERS. VARIABLES. OPERATORS. PROCESSING OF ONE-DIMENSIONAL AND TWO-DIMENSIONAL ARRAYS. USER FUNCTIONS. SUBROUTINES. WORKING WITH DATA FILES.

High-level language – a programming language whose concepts and structure are convenient for human perception.

Algorithmic language – a programming language: an artificial (formal) language designed for writing algorithms. A programming language is defined by its description and implemented as a special program, a compiler or an interpreter. Examples of algorithmic languages are Borland Pascal, C++, Basic, etc.

Basic concepts of an algorithmic language:

Composition of the language:

Ordinary spoken language consists of four basic elements: symbols, words, phrases and sentences. An algorithmic language contains similar elements, except that words are called elementary constructions, phrases are called expressions, and sentences are called operators (statements).

Symbols, elementary constructions, expressions and operators form a hierarchical structure, since elementary constructions are formed from a sequence of symbols.

An expression is a sequence of elementary constructions and symbols.

An operator is a sequence of expressions, elementary constructions and symbols.

Language description:

The description of the symbols consists of listing the valid characters of the language. The description of the elementary constructions gives the rules for forming them. The description of expressions gives the rules for forming any expression that has meaning in the language. The description of operators covers all the types of operators allowed in the language. Each language element is described by its SYNTAX and SEMANTICS.

Syntactic definitions establish rules for constructing language elements.

Semantics defines the meaning and rules of use of those language elements for which syntactic definitions have been given.

Language symbols – the basic indivisible signs in which all texts in the language are written.

Elementary constructions – the minimal units of the language that have independent meaning. They are formed from the basic symbols of the language.

An expression in an algorithmic language consists of elementary constructions and symbols; it specifies a rule for calculating a certain value.

An operator gives a complete description of some action that needs to be performed. Describing a complex action may require a group of operators.

In this case, the operators are combined into a compound operator, or block. The actions specified by the operators are performed on data. Statements of an algorithmic language that provide information about data types are called declarations, or non-executable statements. A set of declarations and operators united by a single algorithm forms a program in the algorithmic language. When studying an algorithmic language, it is necessary to distinguish the language itself from the language in which it is described. The language being studied is usually called simply the language, and the language in whose terms it is described is called the metalanguage.

Translator – a program that converts a program written in a high-level language into a program consisting of machine instructions.

A program written in any high-level algorithmic language cannot be executed directly on a computer. The computer understands only the language of machine instructions. Consequently, a program in an algorithmic language must be translated into the instruction language of a specific computer. This translation is carried out automatically by special translator programs created for each algorithmic language and for each type of computer.

There are two main translation methods: compilation and interpretation.

1. Compilation: a compiler reads the entire program, translates it and creates a complete version of the program in machine language, which is then executed.

During compilation the entire source program is converted at once into a sequence of machine instructions. The resulting program is then executed by the computer on the available input data. The advantage of this method is that the translation is performed once, and the (repeated) execution of the resulting program can be fast. At the same time, the resulting program can take up a lot of memory, since a single language operator may be replaced by hundreds or even thousands of instructions during translation. In addition, debugging and modifying the translated program is very difficult.

2. Interpretation: an interpreter translates and executes the program line by line.

During interpretation the source program is stored in memory almost unchanged. The interpreter decodes the statements of the source program one at a time and immediately executes them on the available data. The interpreted program takes up little memory and is easy to debug and modify. However, execution is rather slow, since all the operators are interpreted anew on every run.

Compiled programs run faster, but interpreted ones are easier to fix and change.

Each specific language is oriented towards either compilation or interpretation, depending on the purpose for which it was created. For example, Pascal is usually used to solve rather complex problems in which program speed is important. That is why this language is usually implemented with a compiler.

On the other hand, BASIC was created as a language for novice programmers, for whom line-by-line execution of a program has undeniable advantages.

Sometimes there is both a compiler and an interpreter for the same language. In this case, you can use an interpreter to develop and test the program, and then compile the debugged program to improve its execution speed.
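The difference between translating once and translating on every run can be imitated, loosely, with Python's built-in compile() and eval(). This is only an analogy: Python translates source text to bytecode for its own virtual machine, not to machine instructions, and the expression and function names below are assumptions made for the example.

```python
source = "price * quantity"


def run_from_source(price, quantity):
    # Interpreter-like: the source text is translated again on every call.
    return eval(source, {"price": price, "quantity": quantity})


# Compiler-like: translate once and keep the result for repeated execution.
code_object = compile(source, "<expression>", "eval")


def run_compiled(price, quantity):
    return eval(code_object, {"price": price, "quantity": quantity})


print(run_from_source(2, 3), run_compiled(2, 3))
```

Both functions return the same value; the second one skips the translation step on each call.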

RELATIONAL DATABASE AND ITS FEATURES. TYPES OF RELATIONS BETWEEN RELATIONAL TABLES

Relational database is a collection of interconnected tables, each of which contains information about objects of a certain type. A table row contains data about one object (for example, a product, a customer), and the table columns describe various characteristics of these objects - attributes (for example, name, product code, customer information). Records, i.e. table rows, have the same structure - they consist of fields that store object attributes. Each field, i.e. column, describes only one characteristic of the object and has a strictly defined data type. All records have the same fields, only they display different information properties of the object.

In a relational database, each table must have a primary key - a field or combination of fields that uniquely identifies each row in the table. If a key consists of several fields, it is called composite. The key must be unique and uniquely identify the entry. Using the key value, you can find a single record. Keys also serve to organize information in the database.

Relational database tables must meet the requirements of normalization of relations. Normalization of relations is a formal apparatus of restrictions on the formation of tables that eliminates duplication, ensures the consistency of the data stored in the database, and reduces the labor costs of maintaining the database.

Let a Student table be created containing the following fields: group number, full name, student record number, date of birth, specialty name, faculty name. Such an organization of information storage will have a number of disadvantages:

  • duplication of information (the names of the specialty and faculty are repeated for each student), so the volume of the database increases;
  • updating information in the table is complicated by the need to edit every affected record.

Table normalization is designed to address these shortcomings. There are three normal forms of relations.

First normal form. A relational table is reduced to first normal form if and only if none of its rows contains more than one value in any of its fields and none of its key fields is empty. So, if you need to obtain information from the Student table by the student’s name, then the Full Name field should be divided into Last Name, First Name, and Patronymic parts.

Second normal form. A relational table is in second normal form if it satisfies the requirements of first normal form and all of its fields that are not part of the primary key are fully functionally dependent on the primary key. To reduce a table to second normal form, the functional dependence of the fields must be determined. A functional dependence of fields is a dependence in which, in an instance of an information object, a given value of the key attribute corresponds to only one value of a descriptive attribute.
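Whether a functional dependence holds in a given set of records can be checked mechanically. The function below is a minimal sketch; the name holds_fd, the field names and the sample rows are assumptions made for the example. It can only show that the dependence is not violated by the data at hand.

```python
def holds_fd(rows, determinant, dependent):
    """Return True if each value of `determinant` maps to one value of `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[field] for field in determinant)
        value = tuple(row[field] for field in dependent)
        # setdefault stores the first value seen for this key and returns it later.
        if seen.setdefault(key, value) != value:
            return False
    return True


students = [
    {"gradebook_no": 1, "full_name": "Ivanov I. I.", "group_no": "G-101", "headman": "Orlova"},
    {"gradebook_no": 2, "full_name": "Petrov P. P.", "group_no": "G-101", "headman": "Orlova"},
    {"gradebook_no": 3, "full_name": "Sidorov S. S.", "group_no": "G-102", "headman": "Belov"},
]

print(holds_fd(students, ["gradebook_no"], ["full_name"]))  # key determines the name
print(holds_fd(students, ["group_no"], ["full_name"]))      # a group has several students
```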

Third normal form. A table is in third normal form if it satisfies the requirements of second normal form and none of its non-key fields is functionally dependent on any other non-key field. For example, in the Student table (group number, full name, grade book number, date of birth, headman), three fields (grade book number, group number, headman) are in a transitive dependence: the group number depends on the grade book number, and the headman depends on the group number. To eliminate the transitive dependence, some of the fields of the Student table must be moved to a separate Group table. The tables then take the following form: Student (group number, full name, grade book number, date of birth), Group (group number, headman).
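The split into Student and Group can be sketched with plain Python sets of tuples. The sample rows are invented for illustration; the point is that the headman is stored once per group after the split, and that joining the two tables on the group number gives back the original rows.

```python
student_flat = [
    # (group_no, full_name, gradebook_no, date_of_birth, headman)
    ("G-101", "Ivanov I. I.", 1, "2004-03-12", "Orlova"),
    ("G-101", "Petrov P. P.", 2, "2004-07-30", "Orlova"),
    ("G-102", "Sidorov S. S.", 3, "2003-11-05", "Belov"),
]

# Student (group number, full name, grade book number, date of birth)
student = {(g, name, book, dob) for g, name, book, dob, _ in student_flat}
# Group (group number, headman): one row per group instead of one per student.
group = {(g, headman) for g, _, _, _, headman in student_flat}

# Joining the two tables on the group number restores the original rows.
rejoined = {
    (g, name, book, dob, headman)
    for g, name, book, dob in student
    for g2, headman in group
    if g == g2
}

print(len(group), rejoined == set(student_flat))
```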

The following operations are possible on relational tables:

  • Union (merging) of tables with the same structure. The result is a combined table: the records of the first table followed by those of the second (concatenation).
  • Intersection of tables with the same structure. The result contains the records that are present in both tables.
  • Difference (subtraction) of tables with the same structure. The result contains the records of the first table that are not in the subtracted one.
  • Selection (horizontal subset). The result contains the records that meet the given conditions.
  • Projection (vertical subset). The result is a relation containing some of the fields of the source table.
  • Cartesian product of two tables. The records of the resulting table are obtained by combining each record of the first table with each record of the second.
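The operations listed above map closely onto Python's set operations if a table is modeled as a set of tuples. This is a sketch under that assumption: because sets are used, a record present in both tables appears only once in the union, unlike a plain concatenation. The sample tables a and b are invented.

```python
from itertools import product

# Two tables with the same structure: (code, name).
a = {(1, "pen"), (2, "ink"), (3, "pad")}
b = {(2, "ink"), (3, "pad"), (4, "cap")}

union = a | b                 # records found in at least one table
intersection = a & b          # records found in both tables
difference = a - b            # records of a that are not in b

selection = {row for row in a if row[0] >= 2}      # horizontal subset
projection = {(name,) for _, name in a}            # vertical subset
cartesian = {r + s for r, s in product(a, b)}      # every record of a with every record of b

print(len(union), len(intersection), len(difference), len(cartesian))
```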

Relational tables can be related to each other, hence data can be retrieved from multiple tables simultaneously. Tables are linked to each other in order to ultimately reduce the size of the database. Each pair of tables is connected if they have identical columns.

The following types of relationships between tables exist:

  • one-to-one;
  • one-to-many;
  • many-to-many.

A one-to-one relationship means that one record of the first table corresponds to only one record of the second table and vice versa.

A one-to-many relationship means that one record of the first table corresponds to several records of the second table.

A many-to-many relationship means that one record of the first table corresponds to several records of the second table and vice versa.
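The three types can be told apart by counting how many partners each side has. The function below is a sketch: the name relationship_type and the sample links are assumptions, and it classifies only the links present in the data, whereas the type of a relationship in a real schema is a design decision about all permitted data.

```python
from collections import defaultdict


def relationship_type(pairs):
    """Classify a collection of (left_key, right_key) links between two tables."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    left_has_many = any(len(v) > 1 for v in rights_per_left.values())
    right_has_many = any(len(v) > 1 for v in lefts_per_right.values())
    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many or right_has_many:
        return "one-to-many"
    return "one-to-one"


print(relationship_type({("student 1", "passport 1"), ("student 2", "passport 2")}))
print(relationship_type({("group 1", "student 1"), ("group 1", "student 2")}))
print(relationship_type({("student 1", "course 1"), ("student 1", "course 2"),
                         ("student 2", "course 1")}))
```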

External level: the level of external models is the topmost level, where each model has its own view of the data. This level defines the view of the database held by individual applications.

Conceptual level: the central control link, where the database is presented in its most general form, combining the data used by all applications. In effect, the conceptual level reflects a generalized model of the subject area.

Physical level (the database): the data itself, located in files or in page structures on external storage media.


Data Models

The following data models are distinguished:

1. Infological

2. Datalogical

3. Physical

The database design process begins with the design of an infological model. An infological data model is a generalized informal description of the database being created, made using natural language, mathematical formulas, tables, graphs and other tools that are understandable to everyone working on the database design.


The infological model reflects the real world in concepts that people understand, completely independent of the data storage environment. The infological model therefore should not change until some change in the real world requires its definition to change so that the model continues to represent the subject area.

There are many approaches to building this model: graph models, semantic networks, the entity-relationship model and others.

Datalogical model

The infological model must be mapped to a datalogical model that the DBMS can understand. A datalogical model is a formal description of the infological model in the language of the DBMS.

Hierarchical model

This model is a collection of related elements that form a hierarchical structure. The basic concepts of hierarchy include level, node, and relationship.



A node is a collection of data attributes that describe an object. Each node is connected to one node at a higher level and to any number of nodes at a lower level. The exception is the node at the highest level. The number of trees in the database is determined by the number of root nodes. Each database record can be reached by a single path from the root record. A simple example is the Internet domain name system or a postal address: the first level (the root of the tree) is the planet Earth, the second the country, the third the region, and the fourth and lower levels the settlement, street, house and apartment. A typical representative is IMS, a DBMS from IBM.
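The address example can be sketched as a small tree in which every node has exactly one parent, so the path from the root is unique. The Node class and its method names are assumptions made for the illustration and are not how IMS stores data.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent    # exactly one node at the higher level (None for the root)
        self.children = []      # any number of nodes at the lower level
        if parent is not None:
            parent.children.append(self)

    def path_from_root(self):
        """Walk up through the parents and return the names from the root down."""
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]


earth = Node("Earth")
country = Node("Country", earth)
region = Node("Region", country)
settlement = Node("Settlement", region)

print(settlement.path_from_root())
```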

All instances of a given descendant type that share a common ancestor instance are called twins. A complete traversal order is defined for the database: from top to bottom and from left to right.

Physical model

A physical model is built based on the datalogical model. The physical organization of data has a major impact on database performance. DBMS developers are trying to create the most productive physical data models, offering users one or another tool to customize the model for a specific database.

Example: for a relational database in particular, the physical model takes into account:

1. Physical aspects of storing tables in specific files.

2. Creating indexes that optimize the speed of data operations using the application.

3. Performing various actions on data upon certain events defined by users using triggers and stored procedures.



At every level, and for any method of representing a subject area, concepts and the relationships between concepts have to be encoded. A key step in the development of any information system is system analysis:

Formalization of the subject area and representation of the system as a set of components.

Decomposition, the basis of system analysis, can be functional (building a hierarchy).

However, in most systems, when it comes to databases, data types are a more static element than the ways in which the data is processed. Therefore, methods of system analysis such as data flow diagrams have developed intensively. The development of relational databases stimulated the development of data design methodologies, in particular ER diagrams. The relational data model directly uses the concept of a relation as a mapping. It is the closest to the conceptual model of data representation and often lies at its heart.

Unlike graph-theoretic models, in the relational model the connections between relations are implemented implicitly, by means of relation keys. For example, relationships of a hierarchical type are implemented by the mechanism of primary and foreign keys, in which the key attributes of the main relation must also be present in the subordinate relation.

Such a linking attribute is called a primary key in the main relation and a secondary (foreign) key in the subordinate relation.

Progress in the development of programming languages, associated primarily with data typing and the emergence of object-oriented languages, has made it possible to approach the analysis of complex systems from the point of view of hierarchical representations, that is, using classes of objects with the properties of polymorphism, inheritance, and encapsulation.

A RELATION IS A TABLE.

Editing tables, records...

Deleting what you created and

Editing.


Relational database model

Relational data models are currently the most popular, precisely because they represent data in this way.

The relational model can be thought of as a special method of representing data that contains its own data (in the form of tables) and ways of working and manipulating them (in the form of relationships). The relational model assumes three conceptual elements: Structure, Integrity and Data Processing. These elements have their own mandatory concepts that need to be explained for further presentation.

The table is regarded as the direct data store. Traditionally, in relational systems a table is called a relation, a table row is called a tuple, and a column is called an attribute. Attributes have unique names (within the relation).

The number of tuples in a table is called its cardinality, and the number of attributes its degree. A unique identifier is established for a relation, that is, one or more attributes whose values are never all the same in two tuples; this identifier is called the primary key. A domain is the set of valid homogeneous values for a particular attribute. A domain can thus be regarded as a named set of data whose components are logically indivisible units (for example, the list of names of the employees of an institution can serve as a domain, although not all of the names need be present in the table).

Relation (the header lists the attributes):

KOD     NAME        SUMM
5216    Kireeva     25.50
…       Motyleva    17.05
…       …           …

The fields KOD, NAME, SUMM are table attributes contained in the header.

Pairs KOD 5216, NAME Kireeva, SUMM 25.50 are elements of the body of the relationship.
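The header and body of this relation, and the notions of degree and cardinality, can be sketched in a few lines of Python. The KOD value 9999 for the second tuple is a placeholder, since the source does not give it.

```python
# Header: the attribute names of the relation.
header = ("KOD", "NAME", "SUMM")

# Body: a set of tuples, so order does not matter and duplicates cannot occur.
body = {
    (5216, "Kireeva", 25.50),
    (9999, "Motyleva", 17.05),  # 9999 is a placeholder: this KOD is not in the source
}

degree = len(header)       # number of attributes
cardinality = len(body)    # number of tuples

# Each tuple can be read as a set of attribute/value pairs.
as_pairs = [dict(zip(header, row)) for row in body]

# Adding a tuple that is already present does not change the relation.
body_again = body | {(5216, "Kireeva", 25.50)}

print(degree, cardinality, len(body_again))
```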

In relational databases, unlike other models, the user specifies what data is needed, not how to obtain it. For this reason, navigation through the database in relational systems is automatic; this task is performed by the DBMS optimizer. Its job is to find the most efficient way to retrieve data from the database for a given query. The optimizer must therefore at least be able to determine which tables the data is selected from, how much information these tables hold, what the physical order of the records in the tables is, and how they are grouped.

In addition, a relational database also performs catalog functions. The catalog stores a description of all the objects that make up the database: tables, indexes, triggers, etc. A component such as the optimizer is obviously vital for the proper operation of the entire system, and it uses the information stored in the catalog. Interestingly, the catalog itself is a set of tables, so the DBMS can manipulate it in the usual ways, without resorting to any special techniques or methods.
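SQLite is one system where the catalog (directory) can be read with an ordinary query: its schema table is called sqlite_master. The sketch below uses Python's sqlite3 module; the employee table, index and view are assumptions made for the example, and other DBMSs name and structure their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (kod INTEGER PRIMARY KEY, name TEXT);
CREATE INDEX idx_employee_name ON employee (name);
CREATE VIEW employee_names AS SELECT name FROM employee;
""")

# The catalog is queried with a plain SELECT, like any other table.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()

for object_type, object_name in catalog:
    print(object_type, object_name)
```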

Domains and Relationships

Basic definitions: Domains, types of relations, predicates.

Relations have a number of basic properties:

1. In the most general case, a relation contains no identical tuples; this follows from the very definition of a relation. However, some DBMSs allow deviations from this property in certain cases. As long as the relation has a primary key, identical tuples are excluded.

2. Tuples are not ordered from top to bottom: a relation simply has no concept of a positional number. The tuples of a relation can be arranged in any order without losing information.

3. Attributes are not ordered from left to right. The attributes in the header of a relation can be arranged in any order without compromising the integrity of the data. Therefore, the concept of a positional number does not exist for attributes either.

4. Attribute values consist of logically indivisible units; this follows from the fact that the values are taken from domains. In other words, relations do not contain repeating groups, that is, they are normalized.

Relational systems support several types of relations:

1. A named relation is a relation variable defined in the DBMS by creation operators and, as a rule, needed for a more convenient presentation of information to the user.

2. A base relation is a direct and important part of the database, so it is given its own name at design time.

3. A derived relation is one defined through other, usually base, relations by means of the DBMS.

4. A view is in effect a named derived relation. A view is expressed exclusively through DBMS operators applied to named relations, so it does not physically exist in the database.

5. A query result is an unnamed derived relation containing data (the result of a specific query). The result is not stored in the database but exists for as long as the user needs it.

6. A stored relation is one that is physically maintained in storage; stored relations are most often base relations.

Based on the above, we can define a relational database as a set of interrelated relations.


A relationship in this case is an association of two or more relations.

A one-to-many relationship means that at any given time each element (tuple) of A corresponds to several elements (tuples) of B.

[Figure: binary relationship (1 : ∞; column labels KOD, ADRES) and ternary relationship (Students, Teachers, Timetable of classes)]


Data integrity

In relational models, the issue of data integrity is given a special place. Recall that a key, or candidate key, is a minimal set of attributes whose values can be used to find the required tuple unambiguously; minimality means that if any attribute is excluded from the set, the remaining attributes no longer identify the tuple.

Every relation has at least one candidate key. One of them is taken as the primary key.

When choosing a primary key, preference should be given to non-composite keys or keys made up of a minimal set of attributes. It is also undesirable to use keys with long text values (integer attributes are preferable as keys). For example, to identify an employee you can use a unique personnel number, a passport number, or the combination of last name, patronymic and department number. No attribute participating in the primary key of a relation may take undefined values. Otherwise a contradictory situation (a collision) arises: a non-unique primary key element appears. This should therefore be carefully monitored when designing a database.

About foreign keys. It is worth noting that since relation C links relations B and A, it must include foreign keys corresponding to the primary keys of relations A and B.
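A linking relation of this kind can be sketched with Python's sqlite3 module. The table names a, b and c follow the text; the columns and sample rows (students and teachers) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE a (a_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE b (b_id INTEGER PRIMARY KEY, name TEXT);
-- C links A and B: it carries one foreign key for each of them.
CREATE TABLE c (
    a_id INTEGER NOT NULL REFERENCES a(a_id),
    b_id INTEGER NOT NULL REFERENCES b(b_id),
    PRIMARY KEY (a_id, b_id)
);
INSERT INTO a VALUES (1, 'student 1'), (2, 'student 2');
INSERT INTO b VALUES (10, 'teacher 10'), (20, 'teacher 20');
INSERT INTO c VALUES (1, 10), (1, 20), (2, 10);
""")

rows = conn.execute("""
    SELECT a.name, b.name
    FROM c
    JOIN a ON a.a_id = c.a_id
    JOIN b ON b.b_id = c.b_id
    ORDER BY a.a_id, b.b_id
""").fetchall()

print(rows)
```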

A table's foreign keys are formed from the primary keys of other tables.

Thus, when choosing how to connect relations in a database, the question arises of what the foreign keys should be. For each foreign key it must also be decided whether undefined values are allowed (NULL values, which mark missing information). In other words, can a relation contain a tuple for which the corresponding tuple in the associated relation is not known?

On the other hand, it is necessary to decide in advance what will happen when tuples are removed from a relation that is referenced by a foreign key. The following options exist:

· The operation cascades – that is, deleting tuples in one relation leads to deleting the tuples associated with them in other relations. For example, deleting an employee's last name, first name, etc. from one relation leads to the deletion of his salary from another relation;

· The operation is restricted – that is, only those tuples are deleted for which there is no associated information in other relations. If such information exists, the deletion cannot be carried out, because removing a tuple that another relation still refers to would violate data integrity. For example, deleting an employee's first name, last name, etc. is possible only if there is no related information about his salary.

It must also be decided what happens on an attempt to update the primary key of a relation that is referenced by a foreign key. The options are the same as for deletion:

· The operation cascades – that is, when the primary key is updated, the foreign key in the related relation is updated as well. For example, updating the primary key in the relation that stores employee information leads to an update of the foreign key in the relation that stores salary information.

· The operation is restricted – that is, only those primary keys are updated for which there is no associated information in other relations. If such information exists, the update cannot be made. For example, updating the primary key in the relation that stores employee information is possible only if the related relation contains no information about his salary.
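Both behaviours can be tried out in SQLite through Python's sqlite3 module, where they are written as ON DELETE / ON UPDATE CASCADE and RESTRICT. The employee and salary tables and the helper make_db are assumptions made for this sketch.

```python
import sqlite3


def make_db(rule):
    """Build employee/salary tables whose foreign key uses the given rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(f"""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE salary (
        emp_id INTEGER REFERENCES employee(emp_id)
            ON DELETE {rule} ON UPDATE {rule},
        amount REAL
    );
    INSERT INTO employee VALUES (1, 'Kireeva');
    INSERT INTO salary VALUES (1, 25.50);
    """)
    return conn


# Cascading: changes to the employee row propagate to the salary row.
cascade = make_db("CASCADE")
cascade.execute("UPDATE employee SET emp_id = 7 WHERE emp_id = 1")
updated_fk = cascade.execute("SELECT emp_id FROM salary").fetchall()
cascade.execute("DELETE FROM employee WHERE emp_id = 7")
remaining = cascade.execute("SELECT COUNT(*) FROM salary").fetchone()[0]

# Restricted: the employee row cannot be deleted while a salary row refers to it.
restrict = make_db("RESTRICT")
try:
    restrict.execute("DELETE FROM employee WHERE emp_id = 1")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

print(updated_fk, remaining, delete_blocked)
```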


Relational algebra

The formal basis of the relational database model is relational algebra, which is based on set theory and considers special operators over relations, together with relational calculus, which is based on mathematical logic.


It should be noted that relational algebra is very powerful: complex database queries can be expressed in a single expression. It is for this reason that these mechanisms are included in the relational data model. Any query expressed by one relational algebra expression or one relational calculus formula can be expressed by a single statement of a relational database language.

Relational algebra has an important property: it is closed with respect to the concept of a relation. This means that relational algebra expressions operate on the relations of a relational database and that the results of evaluating them are also relations.

The main idea of relational algebra is that, since relations are sets, the means of manipulating them are based on the traditional set-theoretic operations, supplemented by some database-specific operations.

Let us describe the version of the algebra that was proposed by Codd. It consists of eight main operations:

· Selection (restriction) of a relation (unary operation)

· Projection of a relation (unary operation)

· Union of relations (binary operation)

· Intersection of relations (binary operation)

· Difference of relations (binary operation)

· Product of relations (binary operation)

· Join of relations (binary operation)

· Division of relations (binary operation)

These operations can be explained as follows:

· The result of selecting from a relation by some condition is a relation that includes only those tuples of the original relation that satisfy the condition.

· Projecting a relation onto a given set of its attributes yields a relation whose tuples are taken from the corresponding tuples of the original relation, keeping only the values of those attributes.

· The union of two relations is a relation that includes all tuples that belong to at least one of the operand relations.

· The intersection of two relations is a relation that includes all tuples that belong to both operand relations.

· The difference of two relations is a relation that includes all tuples of the first relation except those that also belong to the second relation.

· The direct product of two relations is a relation whose tuples are combinations of a tuple of the first relation and a tuple of the second relation.

· When two relations are joined by some condition, the result is a relation whose tuples are combinations of a tuple of the first relation and a tuple of the second relation that satisfy the condition.

· The relational division operation has two operands: a binary relation (with two attributes) and a unary relation (with one attribute). The result is a unary relation consisting of those values of the first attribute of the first relation for which the set of associated values of the second attribute includes every value of the second relation.
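The first five operations can be illustrated with ordinary Python sets. This is a minimal sketch, not a DBMS implementation: a relation is modelled as a set of rows, each row as a frozenset of (attribute, value) pairs, and the helper names row, select and project as well as the sample data are assumptions made for the example.

```python
def row(**attrs):
    """Build a hashable row from attribute=value pairs."""
    return frozenset(attrs.items())

def select(rel, pred):
    """Selection: keep the rows for which pred(row_as_dict) is true."""
    return {r for r in rel if pred(dict(r))}

def project(rel, attrs):
    """Projection: keep only the listed attributes; duplicate rows collapse."""
    return {frozenset((a, v) for a, v in r if a in attrs) for r in rel}

a = {row(name="Ivanov", group=1), row(name="Petrov", group=2)}
b = {row(name="Petrov", group=2), row(name="Sidorov", group=2)}

union = a | b          # tuples found in at least one relation
intersection = a & b   # tuples found in both relations
difference = a - b     # tuples of a that are not in b

group2 = select(union, lambda r: r["group"] == 2)
groups = project(union, {"group"})   # two rows remain after duplicates vanish
```

Because every result is again a set of rows, the output of one operation can be fed directly into another, as in select(union, ...) above.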

In addition to the above, there are a number of special operations specific to working with databases:

· The result of the rename operation is a relation whose body coincides with the body of the original relation but whose attribute names have been changed.

Since the result of any relational operation is a relation, relational expressions can be formed in which an operand is not a stored relation but a nested relational expression. This is possible because the operations of relational algebra really are closed with respect to the concept of a relation. Let us begin with the union operation; everything said here applies equally to intersection and difference. In relational algebra the result of a union must be a relation. If the algebra allowed the union of two arbitrary relations with different sets of attributes, the result would be a set, but a set of tuples of different types, that is, generally speaking, not a relation. Given the requirement that relational algebra be closed with respect to the concept of a relation, such a union is meaningless. This leads to the concept of union compatibility: two relations are union-compatible only if they have the same headers, that is, the same set of attribute names, with attributes of the same name defined on the same domain.

If two relations are union-compatible, then the usual union, intersection and difference operations on them produce a relation with a correctly defined header that coincides with the header of each operand. If two relations are not fully union-compatible, that is, they are compatible in everything except attribute names, they can be made fully union-compatible by applying the rename operation before the union-type operation is performed.
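A small sketch of the compatibility check and the rename operation follows. It assumes relations are sets of frozenset rows, compares attribute names only (domains are not modelled), and uses invented helper names and data.

```python
def row(**attrs):
    return frozenset(attrs.items())

def header(rel):
    """Attribute names of a relation; assumes it is non-empty with uniform rows."""
    return {a for a, _ in next(iter(rel))}

def rename(rel, old, new):
    """Rename attribute old to new in every row."""
    return {frozenset((new if a == old else a, v) for a, v in r) for r in rel}

def union(r1, r2):
    """Union that refuses operands with different headers."""
    if header(r1) != header(r2):
        raise ValueError("relations are not union-compatible")
    return r1 | r2

students = {row(name="Ivanov"), row(name="Petrov")}
prefects = {row(prefect="Petrov"), row(prefect="Sidorov")}

try:
    union(students, prefects)          # headers differ: {'name'} vs {'prefect'}
    compatible = True
except ValueError:
    compatible = False

# After renaming, the headers match and the union is a proper relation.
people = union(students, rename(prefects, "prefect", "name"))
```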

The direct product of two relations raises new problems. In set theory, the direct product is defined for any two sets, and the elements of the result are pairs made up of an element of the first set and an element of the second. Since relations are sets, a direct product can be obtained for any two relations. However, the result will not be a relation: its elements are not tuples but pairs of tuples. Relational algebra therefore uses a special form of the operation, the extended direct product of relations, in which each element of the result is a tuple formed by merging one tuple of the first relation with one tuple of the second. A second problem then arises: obtaining a correctly formed header for the resulting relation. This leads to the concept of compatibility of relations with respect to the extended direct product.

Two relations are compatible with respect to the direct product only if their sets of attribute names do not intersect. Any two relations can be made compatible with respect to the direct product by applying the rename operation to one of them.
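The extended direct product and its compatibility rule might be sketched as follows. The function name times, the row representation and the sample attributes are assumptions for illustration only.

```python
from itertools import product

def row(**attrs):
    return frozenset(attrs.items())

def header(rel):
    """All attribute names that occur in the relation."""
    return {a for r in rel for a, _ in r}

def times(r1, r2):
    """Extended direct product: merge every row of r1 with every row of r2."""
    if header(r1) & header(r2):
        raise ValueError("attribute names overlap; rename one operand first")
    # Each result element is one merged tuple, not a pair of tuples.
    return {x | y for x, y in product(r1, r2)}

students = {row(stud_name="Ivanov"), row(stud_name="Petrov")}
groups = {row(gr_nom=1), row(gr_nom=2)}

pairs = times(students, groups)   # 2 x 2 = 4 merged tuples
```

Calling times(students, students) raises ValueError, which is the situation the rename operation is meant to resolve.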

The selection (restriction) operation requires two operands: the relation being restricted and a simple restriction condition. The result is a relation whose header coincides with the header of the operand relation and whose body contains those tuples of the operand relation that satisfy the restriction condition.

Let us introduce some notation.

Let union denote the union operation, intersect the intersection operation and minus the difference operation. To denote selection we will use the construction A where B, where A is the operand relation and B is a simple comparison condition. Let C1 and C2 be two simple selection conditions. Then:

A where C1 AND C2 is identical to (A where C1) intersect (A where C2)

A where C1 OR C2 is identical to (A where C1) union (A where C2)

A where C1 AND NOT C2 is identical to (A where C1) minus (A where C2)
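The three identities can be checked on a small example. In this sketch the relation is a set of (name, group, scholarship) tuples with made-up values, where is a hypothetical helper, and the third identity is read as C1 AND NOT C2.

```python
students = {("Ivanov", 1, 900), ("Petrov", 2, 1200), ("Sidorov", 2, 700)}

def where(rel, cond):
    """Selection: the tuples of rel that satisfy cond."""
    return {t for t in rel if cond(t)}

def c1(t):
    return t[1] == 2        # group number is 2

def c2(t):
    return t[2] > 1000      # scholarship above 1000

and_direct = where(students, lambda t: c1(t) and c2(t))
and_by_sets = where(students, c1) & where(students, c2)      # intersect

or_direct = where(students, lambda t: c1(t) or c2(t))
or_by_sets = where(students, c1) | where(students, c2)       # union

andnot_direct = where(students, lambda t: c1(t) and not c2(t))
andnot_by_sets = where(students, c1) - where(students, c2)   # minus
```

Each pair of results is equal, so a compound condition never needs more than simple selections combined by set operations.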

Using these definitions, selection can be implemented for conditions that are arbitrary logical expressions built from simple conditions with the logical connectives AND, OR and NOT.

The projection of relation A onto the attribute list a1, a2, …, an is a relation whose header is the set of attributes a1, a2, …, an. Its body consists of the tuples <b1, b2, …, bn> for which relation A contains a tuple in which attribute a1 has the value b1, attribute a2 has the value b2, and so on up to attribute an with the value bn. In essence, projection cuts a "vertical" slice out of the operand relation and removes any duplicate tuples that arise.

The join operation, sometimes called a conditional join, requires two operands, the relations being joined, and a third operand, a simple condition. Let relations A and B be joined. As with selection, the join condition C has the form (a comp-op b) or (a comp-op const), where a and b are names of attributes of relations A and B, const is a literal constant, and comp-op is a comparison operation that is valid in this context. Then, by definition, the result of the join is the relation obtained by restricting the direct product of A and B by condition C.

There is an important special case of the join, the natural join. A join is called an equi-join if the join condition has the form (a = b), where a and b are attributes of different operands. This case is important because it is particularly common in practice and DBMSs have efficient algorithms for it. The natural join is applied to a pair of relations A and B that have a common attribute P, that is, an attribute with the same name defined on the same domain. Let ab denote the union of the headers of relations A and B. Then the natural join of A and B is the equi-join of A and B on the condition A.P = B.P, projected onto ab. The natural join is not directly included in the basic set of relational algebra operations, but it is of great practical importance.
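The definition of a join as a restricted direct product translates almost literally into code. The sketch below uses lists of dictionaries for brevity (so duplicates are not removed automatically); theta_join, natural_join and the sample attributes are assumptions for the example.

```python
from itertools import product

students = [{"stud_name": "Ivanov", "gr_nom": 1},
            {"stud_name": "Petrov", "gr_nom": 2}]
groups = [{"gr_nom": 1, "gr_col": 30},
          {"gr_nom": 2, "gr_col": 20}]

def theta_join(r1, r2, cond):
    """Conditional join: the direct product restricted by cond(x, y)."""
    return [(x, y) for x, y in product(r1, r2) if cond(x, y)]

def natural_join(r1, r2):
    """Equi-join on all common attribute names, each pair merged into one row."""
    common = set(r1[0]) & set(r2[0])   # assumes non-empty, uniform relations
    return [{**x, **y} for x, y in product(r1, r2)
            if all(x[a] == y[a] for a in common)]

# Join by an arbitrary comparison between attributes of the two operands.
lower = theta_join(students, groups, lambda s, g: s["gr_nom"] < g["gr_nom"])

# Natural join on the shared attribute gr_nom; it appears once in the result.
joined = natural_join(students, groups)
```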

The division of relations needs a more detailed explanation because it is hard to grasp. Let two relations be given: A (a1, a2, …, an, b1, b2, …, bm) and B (b1, b2, …, bm). We assume that attribute bi of relation A and attribute bi of relation B (i = 1, 2, …, m) are defined on the same domain. Let us call the set of attributes {aj} the composite attribute a and the set of attributes {bj} the composite attribute b. We can then speak of the relational division of the binary relation A (a, b) by the unary relation B (b).

The result of dividing A by B is the unary relation C (a) consisting of those tuples v for which relation A contains tuples <v, w> such that the set of values {w} includes the set of values of b in relation B.

Since division is the most difficult operation, let us explain it with an example. Suppose the student database contains two relations, STUDENTS (FULL NAME, NUMBER) and NAMES (FULL NAME), where the unary relation NAMES contains all the names that students of the institute have. Then dividing the relation STUDENTS by the relation NAMES yields a unary relation containing those NUMBER values that occur in STUDENTS together with every name listed in NAMES.
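Division is easier to follow when written out as code. In this sketch the binary relation is a set of (number, full_name) pairs and the unary relation a set of one-element tuples; the function name divide and all sample values are invented.

```python
def divide(a, b):
    """Divide binary relation a(x, y) by unary relation b(y).

    The result contains every x that is paired in a with all y values of b.
    """
    b_values = {y for (y,) in b}
    xs = {x for x, _ in a}
    return {(x,) for x in xs
            if b_values <= {y for x2, y in a if x2 == x}}

# Hypothetical data: NUMBER 1 occurs with both names, NUMBER 2 with only one.
students = {(1, "Ivanov"), (1, "Petrov"), (2, "Ivanov")}
names = {("Ivanov",), ("Petrov",)}

result = divide(students, names)   # only NUMBER 1 is paired with every name
```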


Relational calculus

Suppose we have a database with the relations STUDENTS (STUD_NOM, STUD_NAME, STUD_STIP, GR_NOM) and GROUPS (GR_NOM, GR_COL, GR_STAR), where GR_COL is the number of students in a group and GR_STAR is the student number of the group's prefect. Suppose we need to find the names and student card numbers of the students who are prefects of groups with more than 25 people. In relational algebra, such a query requires the following steps:

1. Join the relations STUDENTS and GROUPS by the condition STUD_NOM = GR_STAR.

2. Restrict the resulting relation by the condition GR_COL > 25.

3. Project the result of the previous operation onto the attributes STUD_NAME, STUD_NOM.

This is a step-by-step formulation of how the query is executed, with each step corresponding to one relational operation. If we formulate the same query in relational calculus, we get a formula that can be read as: return STUD_NAME and STUD_NOM for those students for whom there exists a group whose GR_STAR value equals the student's STUD_NOM and whose GR_COL value is greater than 25. In the second formulation we specified only the characteristics of the resulting relation and said nothing about how it is formed. In this case the DBMS itself must decide which operations to perform on the STUDENTS and GROUPS relations and in what order. The two approaches are in fact equivalent, and there are fairly simple rules for converting one into the other.
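The contrast between the two formulations can be sketched in Python. The attribute names (written in lower case) and the sample rows are assumptions; the first half follows the three algebra steps one by one, while the second half only states the condition the result must satisfy.

```python
students = [
    {"stud_nom": 1, "stud_name": "Ivanov", "stud_stip": 900, "gr_nom": 10},
    {"stud_nom": 2, "stud_name": "Petrov", "stud_stip": 1200, "gr_nom": 10},
    {"stud_nom": 3, "stud_name": "Sidorov", "stud_stip": 700, "gr_nom": 20},
]
groups = [
    {"gr_nom": 10, "gr_col": 30, "gr_star": 2},   # prefect is student 2
    {"gr_nom": 20, "gr_col": 20, "gr_star": 3},   # prefect is student 3
]

# Algebra style: join, then restrict, then project, one operation per step.
joined = [(s, g) for s in students for g in groups
          if s["stud_nom"] == g["gr_star"]]
restricted = [(s, g) for s, g in joined if g["gr_col"] > 25]
algebra = {(s["stud_name"], s["stud_nom"]) for s, g in restricted}

# Calculus style: describe the result ("there exists a group such that ...").
calculus = {(s["stud_name"], s["stud_nom"]) for s in students
            if any(g["gr_star"] == s["stud_nom"] and g["gr_col"] > 25
                   for g in groups)}
```

Both formulations produce the same relation; only the first one fixes the order of operations.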

The basic concepts of relational calculus are the concept of a variable with a defined range of permissible values and the concept of a well-formed formula built from variables, predicates and quantifiers. Depending on what serves as the range of a variable, tuple calculus and domain calculus are distinguished. In tuple calculus, the ranges of variables are database relations, that is, a permissible value of each variable is a tuple of some relation. In domain calculus, the ranges of variables are the domains on which the attributes of database relations are defined, that is, a permissible value of each variable is a value from some domain.


The RANGE command is used to define tuple variables. For example, to define the variable STUDENT whose range is the relation STUDENTS, use the construction RANGE STUDENT IS STUDENTS. It follows from this definition that at any moment the variable STUDENT represents some tuple of the relation STUDENTS. When tuple variables are used in formulas, the values of their attributes can be referenced. For example, to refer to the value of the STUD_NAME attribute of the variable STUDENT, use the construction STUDENT.STUD_NAME.

Well-formed formulas are used to express conditions imposed on tuple variables. Such formulas are built from simple comparisons, which compare the values of attributes of variables and literal constants. For example, the construction STUDENT.STUD_NOM = 123456 is a simple comparison. More complex formulas are built with the logical connectives AND, OR, NOT and IF … THEN. Finally, well-formed formulas can be constructed using quantifiers. If F is a well-formed formula involving the variable var, then the constructions EXISTS var (F) (existential quantifier) and FORALL var (F) (universal quantifier, "for all tuples") are also well-formed formulas.

Variables in well-formed formulas can be free or bound. All variables in a formula built without quantifiers are free. This means that if the formula evaluates to true for some set of values of the free tuple variables, then those values can be included in the resulting relation. If a quantifier is applied to a variable when the formula is constructed, that variable is bound. When such a formula is evaluated, not a single value of the bound variable is used but its entire range.

1) EXISTS STUD2 (STUD1.STUD_STIP > STUD2.STUD_STIP)

2) FORALL STUD2 (STUD1.STUD_STIP > STUD2.STUD_STIP)

Let STUD1 and STUD2 be two tuple variables defined on the relation STUDENTS. Then formula 1 evaluates to true for the current tuple of the variable STUD1 only if the relation STUDENTS contains at least one tuple (bound to the variable STUD2) whose STUD_STIP attribute value satisfies the inner comparison. Formula 2 evaluates to true for the current tuple of STUD1 only if the STUD_STIP value of every tuple of the relation STUDENTS (bound to STUD2) satisfies the inner comparison.
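Python's any() and all() behave like the existential and universal quantifiers, so the two formulas can be tried on sample data. The rows below are invented; stud1 plays the role of the free variable and stud2 the bound one.

```python
students = [
    {"stud_name": "Ivanov", "stud_stip": 900},
    {"stud_name": "Petrov", "stud_stip": 1200},
    {"stud_name": "Sidorov", "stud_stip": 700},
]

def formula_exists(stud1):
    """EXISTS STUD2 (STUD1.STUD_STIP > STUD2.STUD_STIP)"""
    return any(stud1["stud_stip"] > stud2["stud_stip"] for stud2 in students)

def formula_forall(stud1):
    """FORALL STUD2 (STUD1.STUD_STIP > STUD2.STUD_STIP)"""
    return all(stud1["stud_stip"] > stud2["stud_stip"] for stud2 in students)

# The free variable STUD1 ranges over the whole relation.
exists_result = [s["stud_name"] for s in students if formula_exists(s)]
forall_result = [s["stud_name"] for s in students if formula_forall(s)]
```

With this data the first formula selects everyone except the student with the smallest scholarship. The second selects nobody: the bound variable ranges over the entire relation, including the tuple currently held by STUD1, and no value is strictly greater than itself.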

Thus, well-formed formulas provide a means of expressing selection conditions on database relations. To use the calculus for real work with a database, one more component is needed, which determines the set and the names of the columns of the resulting relation. This component is called the target list.

The target list is made up of elements of the following forms:

· var.attr, where var is the name of a free variable and attr is the name of an attribute of the relation on which var is defined.

· var, which is equivalent to the list var.attr1, var.attr2, …, var.attrn that includes the names of all attributes of the defining relation.

· new_name = var.attr, where new_name is the new name of the corresponding attribute of the resulting relation.

The last form is needed when the formula uses several free variables with the same range. In domain calculus, the ranges of variables are not relations but domains. For the STUDENTS and GROUPS database, for example, we can speak of the domain variable NAME (its values are valid names) or STUD_NOM (its values are valid student numbers).

The main difference between domain calculus and tuple calculus is an additional set of predicates that make it possible to express so-called membership conditions. If R is an n-ary relation with attributes (a1, a2, …, an), then a membership condition has the form R (ai1:Vi1, ai2:Vi2, …, aim:Vim), where m <= n and each Vij is either a literal constant or the name of a domain variable. The membership condition is true only if relation R contains a tuple with the specified values of the specified attributes. If Vij is a constant, a fixed condition is imposed on attribute aij that does not depend on the current values of the domain variables. If Vij is the name of a domain variable, the membership condition may take different values for different values of that variable.

A predicate is a logical function that returns true or false for given arguments. A relation can be regarded as a predicate whose arguments are the attributes of the relation. If a given set of attribute values (a tuple) is present in the relation, the predicate yields true; otherwise it yields false.
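A membership condition can be modelled as a function that reports whether some tuple of the relation carries the given attribute values. The helper name member, the lower-case attribute names and the sample data are assumptions made for this sketch.

```python
students = [
    {"stud_nom": 1, "stud_name": "Ivanov", "gr_nom": 10},
    {"stud_nom": 2, "stud_name": "Petrov", "gr_nom": 10},
]

def member(rel, **values):
    """Membership condition R(a1: v1, ..., am: vm).

    True only if rel contains a tuple with all of the given attribute values.
    """
    return any(all(t[a] == v for a, v in values.items()) for t in rel)

# Both arguments are constants: a fixed condition on the relation.
const_check = member(students, stud_name="Ivanov", gr_nom=10)

# stud_name is bound to a domain variable that ranges over a domain of names;
# the condition is true for some values of the variable and false for others.
names_domain = ["Ivanov", "Petrov", "Sidorov"]
in_group_10 = [n for n in names_domain
               if member(students, stud_name=n, gr_nom=10)]
```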

In all other respects, the formulas and expressions of domain calculus look like those of tuple calculus. Domain relational calculus is the basis of most form-based query languages.




A relational database is a database based on the relational data model (RDM).

The RDM is based on the concept of a relation (hence the term relational database). Relational DBMSs are used to work with relational databases. The use of relational databases was proposed by Dr. Codd of IBM in 1970. These models are characterized by a simple data structure, a user-friendly tabular representation and the ability to use the formal apparatus of relational algebra and relational calculus for data processing.

In a relational database the main structural unit is the table (relation). The relational model is oriented toward organizing data as two-dimensional tables. Each relational table is a two-dimensional array with the following properties:

· Each table element is one data element;

· All columns in the table are homogeneous, i.e. all elements in a column have the same type (numeric, character, etc.) and length;

· Each column has a unique name;

· There are no identical rows in the table;

· The order of rows and columns can be arbitrary.

Relations are presented as tables whose rows correspond to records and whose columns correspond to the attributes of the relation (fields, with values drawn from domains). Each row stores data about one object, and each field characterizes one of the object's parameters. Each table must have a name that is unique within the database.

A field whose every value uniquely identifies the corresponding record is called a simple key (key field). If records are uniquely identified only by the values of several fields together, the table has a composite key. If no such field exists, one must be introduced artificially. To link two relational tables, the key of the first table must be included in the key of the second table (the keys may coincide); otherwise, a foreign key, that is, the key of the second table, must be added to the structure of the first table.

Basic database models (DBs)

A database (DB) is a structured set of information relating to one subject area or to several related areas. Databases can be built on different principles, which are captured by the concept of a database model.

A database model determines how objects in the database are linked to one another, how information is stored on a medium (in computer memory), and how data is retrieved and presented. Database models: 1) hierarchical, 2) network, 3) relational.

1) The hierarchical model (first half of the 1960s) was intended for storing databases on paper and magnetic tapes. The structure of the links between data is based on graph theory and is represented as an (inverted) tree. Different objects form the nodes of the tree, that is, they sit at different levels of the hierarchy. Links are described in terms of father and son, or ancestor and descendant. Each node at level i of the hierarchy (i > 1) relates to a node at level i-1 as a son to a father: a son can have only one father, while a father can have one or more sons, so an object at level i relates to objects at level i+1 as one to many (1:N, 1:∞). Drawbacks: 1) the user must know the structure of the tree, otherwise searching for data is difficult; 2) a search for the required data always starts at the root and then proceeds by navigating along the branches of the tree.
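The second drawback can be seen in a small sketch of hierarchical navigation. The tree below and the function find_path are invented for illustration: every lookup starts at the root and walks down the father-to-son links until the target is found.

```python
# Each father node maps to the list of its sons (a 1:N link).
children = {
    "University": ["Faculty A", "Faculty B"],
    "Faculty A": ["Group 1", "Group 2"],
    "Faculty B": ["Group 3"],
}

def find_path(node, target, tree):
    """Depth-first search from node; returns the path to target or None."""
    if node == target:
        return [node]
    for child in tree.get(node, []):
        path = find_path(child, target, tree)
        if path is not None:
            return [node] + path
    return None

# The search has to start at the root, whatever record is needed.
path = find_path("University", "Group 3", children)
```

Here the search visits the whole Faculty A branch before it reaches Group 3, which is why knowing the structure of the tree matters to the user.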



2) The network model (second half of the 1960s) was created to reduce the impact of the shortcomings of the previous model. Its main difference from the hierarchical model is that links can exist between objects at the same level of the hierarchy as well as between objects at different levels. This increased the speed of data retrieval. However, an essential drawback remains: the user must still know the structure of the links.

The main drawback of both models is a very weak mathematical basis.

3) The relational model is based on the well-developed apparatus of two branches of mathematics: the theory of relations (sets) and predicate theory. Predicate theory is concerned with formalizing the analysis of logical conditions; set theory supplies the two-dimensional set called a relation. In this model the basic structural unit is the table (relation). Each table must have a name, unique within the database, written in Russian or Latin letters.

A relational computer database, like any other database, is an information system (IS), which can be represented schematically.

A DBMS (database management system) is a specialized software tool (shell) or platform with which the user performs all the available functions (operations) on the data. Functions: input (insertion), modification (change), retrieval (selection) and deletion of data.

An important component of a database information system is the database administrator, who is responsible for the safety and value of the data, for setting the access rights of different users, and so on.

Each table consists of fields and rows. Each row stores data about one object, and each field characterizes one of the parameters (attributes) of that object. A single field can hold data of only one type. One of the attributes (fields) must identify each object in the table, which means that this field must not contain duplicate values (each value is unique). If this condition is met, the field is called a key (the key of the table). Every table must have a key field; this key is called the primary key. A key that consists of the values of more than one field is called a composite key. A simple key is preferred. If there is none, one is introduced artificially (for example, a number).

Relational Database - Basic Concepts

Often, when talking about a database, they simply mean some automated data storage. This idea is not entirely correct. Why this is so will be shown below.

Indeed, in the narrow sense of the word, a database is a set of data needed for work (up-to-date data). However, data is an abstraction; no one has ever seen "just data", and data does not arise or exist on its own. Data is a reflection of objects in the real world. Suppose, for example, that you want to store information about parts received at a warehouse. How will a real-world object, a part, be represented in the database? To answer this question, you need to know which features or aspects of the part are relevant and necessary for the job. These may include the name of the part, its weight, dimensions, color, date of manufacture, the material from which it is made, and so on. In traditional terminology, real-world objects about which information is stored in a database are called entities (the reader should not be put off by this word, it is a generally accepted term), and their relevant characteristics are called attributes.

Each characteristic of a specific object is an attribute value. Thus, the Engine part has a weight attribute value of 50, which reflects the fact that this engine weighs 50 kilograms.

It would be a mistake to think that only physical objects are reflected in the database. It is capable of absorbing information about abstractions, processes, phenomena - that is, about everything that a person encounters in his activities. For example, in a database you can store information about orders for the supply of parts to a warehouse (although it is not a physical object, but a process). The attributes of the "order" entity will be the name of the part being supplied, the number of parts, the name of the supplier, delivery time, etc.

Objects in the real world are connected to each other by many complex dependencies that must be taken into account in information activities. For example, parts are supplied to the warehouse by their manufacturers. Therefore, it is necessary to include the “manufacturer’s name” attribute among the part attributes. However, this is not enough, since additional information about the manufacturer of a particular part may be needed - his address, telephone number, etc. This means that the database must contain not only information about parts and purchase orders, but also information about their manufacturers. Moreover, the database must reflect the relationships between parts and manufacturers (each part is produced by a specific manufacturer) and between orders and parts (each order is issued for a specific part). Note that only relevant, significant connections need to be stored in the database.

Thus, in the broad sense of the word, a database is a set of descriptions of real-world objects and connections between them that are relevant for a specific application area. In what follows, we will proceed from this definition, clarifying it as we go along.

Relational data model

So now we have an idea of ​​what is stored in the database. Now we need to understand how entities, attributes, and relationships map to data structures. This is determined by the data model.

Traditionally, all DBMSs are classified depending on the data model that underlies them. It is customary to distinguish between hierarchical, network and relational data models. Sometimes they are supplemented with a data model based on inverted lists. Accordingly, they talk about hierarchical, network, relational DBMS or DBMS based on inverted lists.

In terms of prevalence and popularity, relational DBMSs today are unrivaled. They have become a de facto industrial standard, and therefore the domestic user will have to deal with a relational DBMS in their practice. Let's briefly look at the relational data model without delving into its details.

It was developed by Codd back in 1969-70 on the basis of the mathematical theory of relations and is based on a system of concepts, the most important of which are table, relation, row, column, primary key, foreign key.

A relational database is one in which all data is presented to the user in the form of rectangular tables of data values, and all operations on the database are reduced to manipulations with tables. A table consists of rows and columns and has a name that is unique within the database. The table reflects the type of real world object (entity), and each of its rows represents a specific object. Thus, the Part table contains information about all parts stored in the warehouse, and its rows are sets of attribute values ​​for specific parts. Each table column is a collection of values ​​for a specific attribute of an object. So, the Material column represents a set of values ​​​​"Steel", "Tin", "Zinc", "Nickel", etc. The Quantity column contains non-negative integers. The values ​​in the Weight column are real numbers equal to the weight of the part in kilograms.

These values do not appear out of thin air. They are selected from the set of all possible values of an object attribute, which is called the domain. Thus, the values in the Material column are selected from the set of names of all possible materials: plastics, wood, metals and so on. It is therefore fundamentally impossible for a value that does not exist in the corresponding domain, for example "water" or "sand", to appear in the Material column.

Each column has a name, which is usually written at the top of the table (Fig. 1). It must be unique within the table, but different tables can have columns with the same name. Any table must have at least one column; the columns are arranged in the table in the order in which their names appeared when it was created. Unlike columns, rows do not have names; their order in the table is not defined, and their number is logically unlimited.

Figure 1. Basic database concepts.

Since the rows in a table are not ordered, a row cannot be selected by its position: there is no "first", "second" or "last" row. Any table has one or more columns whose values uniquely identify each of its rows. Such a column (or combination of columns) is called a primary key. In our example, each part in the warehouse has a unique number, by which the necessary information is retrieved from the Part table. Therefore, in this table the primary key is the Part Number column. This column cannot contain duplicate values: no two rows of the Part table may have the same value in the Part Number column. If a table satisfies this requirement, it is called a relation.

Relationships between tables are the most important element of the relational data model. They are supported by foreign keys. Consider an example in which a database stores information about ordinary employees (the Employee table) and managers (the Manager table) in some organization (Fig. 2). The primary key of the Manager table is the Number column (for example, the personnel number). The Last Name column cannot serve as a primary key, since two managers with the same last name may work in the same organization. Every employee reports to exactly one manager, and this must be reflected in the database. The Employee table contains a Manager Number column, and the values in this column are taken from the Number column of the Manager table (see Fig. 2). The Manager Number column is a foreign key in the Employee table.

Figure 2. Relationship between database tables.

Tables cannot be stored and processed unless the database also holds "data about data", such as descriptors of tables, columns and so on. These are usually called metadata. Metadata is also represented in tabular form and stored in a data dictionary.

In addition to tables, other objects can be stored in the database, such as screen forms, reports, views, and even applications that work with the database.

For users of an information system, it is not enough for the database to simply reflect real-world objects. It is important that such a reflection is unambiguous and consistent. In this case, the database is said to satisfy the integrity condition.

In order to guarantee the correctness and mutual consistency of data, certain restrictions are imposed on the database, which are called data integrity constraints.

There are several types of integrity constraints. It is required, for example, that the values in a table column be selected only from the corresponding domain. In practice, more complex integrity constraints are also taken into account, for example referential integrity. Its essence is that a foreign key cannot point to a non-existent row of a table. Integrity constraints are implemented by special means, which are discussed in the section Database server.
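Primary-key uniqueness and referential integrity can be observed with Python's built-in sqlite3 module. This is a sketch only: the table layout follows the Employee and Manager example, but the column names, sample rows and the choice of SQLite are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE manager (number INTEGER PRIMARY KEY, last_name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " number INTEGER PRIMARY KEY,"
    " last_name TEXT,"
    " manager_number INTEGER NOT NULL REFERENCES manager(number))"
)
conn.execute("INSERT INTO manager VALUES (1, 'Ivanov')")
conn.execute("INSERT INTO employee VALUES (100, 'Petrov', 1)")

# Referential integrity: manager 99 does not exist, so the row is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (101, 'Sidorov', 99)")
    dangling_accepted = True
except sqlite3.IntegrityError:
    dangling_accepted = False

# Primary key: a second manager with number 1 is rejected as a duplicate.
try:
    conn.execute("INSERT INTO manager VALUES (1, 'Smirnov')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Both offending inserts fail, so the Employee table still holds only the one valid row.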

SQL language

The data itself in computer form is of no interest to the user if there are no means of accessing it. Data is accessed in the form of database queries that are formulated in a standard query language. Today, for most DBMSs, this language is SQL.

The emergence and development of this language as a means of describing database access is tied to the creation of relational database theory. The prototype of SQL originated in the 1970s as part of the System/R research project, which was carried out at IBM's Santa Teresa Laboratory. Today SQL is the standard interface to relational DBMSs. It is so popular that developers of non-relational DBMSs (for example, Adabas) supply their systems with an SQL interface.

The SQL language has an official standard, ANSI/ISO. Most DBMS developers adhere to this standard but often extend it to implement new data processing capabilities. The new data management mechanisms described in the section Database server can be used only through special SQL statements, which are generally not part of the language standard.

SQL is not a traditional programming language. It is used to write queries to a database, not programs, and for that reason SQL is a declarative language. This means that it can be used to state what needs to be obtained, but not how it should be obtained. In particular, unlike procedural programming languages (C, Pascal, Ada), the core SQL language has no control constructs such as if-then-else, for, while, etc.
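The difference between the two styles can be sketched as follows, assuming Python's built-in sqlite3 module and a hypothetical three-row Part table. The same result is obtained once by a declarative query and once by an explicit loop in the host language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Part (Name TEXT, Material TEXT, Quantity INTEGER)")
conn.executemany(
    "INSERT INTO Part VALUES (?, ?, ?)",
    [("Bolt", "Steel", 120), ("Cap", "Plastic", 40), ("Nut", "Steel", 300)],  # sample data
)

# Declarative: state WHAT is wanted; the DBMS decides how to find it.
declarative = conn.execute(
    "SELECT Name FROM Part WHERE Material = 'Steel' ORDER BY Name"
).fetchall()

# Procedural: spell out HOW, row by row, in the host language.
procedural = []
for name, material, _quantity in conn.execute("SELECT * FROM Part"):
    if material == "Steel":
        procedural.append((name,))
procedural.sort()
```

Both lists contain the same rows; only the declarative version leaves the choice of search strategy to the DBMS.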

We will not go into detail about the syntax of the language. We touch on it only to the extent necessary for understanding simple examples, which are used to illustrate the most interesting data processing mechanisms.

A SQL query consists of one or more statements, one after the other, separated by a semicolon. Table 1 below lists the most important operators that are included in the ANSI/ISO SQL standard.

Table 1. Basic SQL operators.

SQL queries use names that uniquely identify database objects. In particular, these are table names (Part), column names (Name), and the names of other objects in the database that belong to additional types (for example, names of procedures and rules), which will be discussed in the section "Database server". Along with simple names, compound names are also used: for example, a qualified column name specifies both the column and the table to which it belongs (Part.Weight). For simplicity, the examples use short descriptive names.

Each column in a table stores data of a specific type. There are basic data types (fixed-length character strings, integers and real numbers) and additional data types (variable-length character strings, monetary values, date and time, and Boolean data with the two values "TRUE" and "FALSE"). In SQL, you can use numeric, string, character, and date and time constants.

Let's look at a few examples.

The query “determine the number of parts in stock for all types of parts” is implemented as follows:

SELECT Name, Quantity

FROM Part;

The result of the query will be a table with two columns, Name and Quantity, which are taken from the original Part table. Essentially, this query produces a vertical projection of the original table (more strictly, a vertical subset of the set of table rows). From every row of the Part table, a result row is formed that contains the values taken from the two columns Name and Quantity.

The query “What steel parts are in stock?” formulated in SQL looks like this:

SELECT *

FROM Part

WHERE Material = 'Steel';

The result of this query will also be a table, containing only those rows of the source table that have the value "Steel" in the Material column. This query produces a horizontal projection of the Part table (the asterisk in the SELECT statement means that all columns of the table are selected).

The query "determine the name and quantity of parts in stock that are made of plastic and weigh less than five kilograms" is written as follows:

SELECT Name, Quantity

FROM Part

WHERE Material = 'Plastic'

AND Weight < 5;

The result of the query is a table with two columns, Name and Quantity, which contains the name and number of parts made of plastic and weighing less than 5 kg. In essence, the selection operation first forms a horizontal projection (find all rows of the Part table for which Material = 'Plastic' and Weight < 5) and then a vertical projection (extract Name and Quantity from the previously selected rows).
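The two stages can be made explicit in a short sketch. It assumes Python's built-in sqlite3 module and a hypothetical Part table with three sample rows; the column names follow the examples in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Part (Name TEXT, Material TEXT, Weight REAL, Quantity INTEGER)"
)
conn.executemany("INSERT INTO Part VALUES (?, ?, ?, ?)", [
    ("Bolt", "Steel", 0.2, 120),    # sample data, not taken from the text
    ("Cap", "Plastic", 0.1, 40),
    ("Crate", "Plastic", 7.5, 5),
])

# Step 1: horizontal subset, keep only the rows that satisfy the condition.
rows = conn.execute(
    "SELECT * FROM Part WHERE Material = 'Plastic' AND Weight < 5"
).fetchall()

# Step 2: vertical subset, keep only the Name and Quantity columns.
two_step = [(name, quantity) for name, _material, _weight, quantity in rows]

# The single query from the text yields the same table in one statement.
one_step = conn.execute(
    "SELECT Name, Quantity FROM Part WHERE Material = 'Plastic' AND Weight < 5"
).fetchall()
```

With this sample data only the row for "Cap" passes both conditions, and both approaches return it.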

Indexes are one of the means of providing fast access to tables. An index is a database structure whose entries point to specific rows in a table. A database index is used in the same way as an index in a book. Each entry contains the values taken from one or more columns of a particular table row and a reference to that row. The values in the index are ordered, which allows the DBMS to search the table quickly.

Let's assume that a query is formulated to the Warehouse database:

SELECT Name, Quantity, Material

FROM Part

WHERE Number = 'T145-A8';

If there are no indexes for a given table, then to execute this query the DBMS must scan the entire Part table, sequentially selecting rows from it and checking the selection condition for each of them. For large tables, such a query will take a very long time to complete.

If an index was previously created on the Number column of the Part table, then the search time in the table is reduced to a minimum. The index contains the values from the Number column and, for each value, a link to the row with this value in the Part table. When executing the query, the DBMS first finds the value "T145-A8" in the index (and does this quickly, since the index is ordered and its rows are small) and then, using the link in the index, determines the physical location of the row being searched for.

An index is created with the SQL CREATE INDEX statement. In this example, the statement

CREATE UNIQUE INDEX Part_index

ON Part(Number);

will create an index named Part_index on the Number column of the Part table.
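The effect of such an index on query execution can be observed in a small sketch. It assumes Python's built-in sqlite3 module, whose EXPLAIN QUERY PLAN statement is specific to SQLite and not part of the SQL standard; the generated rows and the helper function plan are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Part (Number TEXT, Name TEXT, Quantity INTEGER, Material TEXT)"
)
# 1000 generated sample rows with distinct part numbers T0-A8 ... T999-A8.
conn.executemany(
    "INSERT INTO Part VALUES (?, ?, ?, ?)",
    [(f"T{i}-A8", f"Part {i}", i, "Steel") for i in range(1000)],
)

QUERY = "SELECT Name, Quantity, Material FROM Part WHERE Number = 'T145-A8'"


def plan(sql):
    """Return SQLite's textual description of how it would run the query."""
    # The fourth column of each EXPLAIN QUERY PLAN row holds the description.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(QUERY)   # without an index: a scan of the whole table
conn.execute("CREATE UNIQUE INDEX Part_index ON Part(Number)")
plan_after = plan(QUERY)    # with the index: a search that uses Part_index
result = conn.execute(QUERY).fetchall()
```

The exact wording of the plan text differs between SQLite versions, so it is better inspected by eye than compared literally; the query result itself is the same with or without the index.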

For a DBMS user, what matters is not the individual SQL statements but a certain sequence of them, treated as a single whole and meaningful from the user's point of view. Each such sequence of SQL statements implements a specific action on the database. It is carried out in several steps, at each of which certain operations are performed on the database tables. For example, in a banking system, the transfer of a certain amount from a short-term account to a long-term account is carried out in several operations, among them withdrawing the amount from the short-term account and crediting it to the long-term account.

If a failure occurs during this action, for example when the first operation has completed but the second has not, the money will be lost. Therefore, any such action on the database must either be performed in its entirety or not be performed at all. Such an action is called a transaction.
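The all-or-nothing behaviour can be sketched with the bank transfer described above. The sketch assumes Python's built-in sqlite3 module; the Account table, the amounts and the fail_midway switch that simulates a failure between the two operations are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (Name TEXT PRIMARY KEY, Balance INTEGER)")
conn.executemany(
    "INSERT INTO Account VALUES (?, ?)",
    [("short_term", 100), ("long_term", 0)],  # sample starting balances
)
conn.commit()


def transfer(amount, fail_midway=False):
    """Move amount from short_term to long_term; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE Name = 'short_term'",
                (amount,),
            )
            if fail_midway:
                # Simulated failure after the withdrawal, before the credit.
                raise RuntimeError("simulated failure between the two steps")
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE Name = 'long_term'",
                (amount,),
            )
        return True
    except RuntimeError:
        return False


def balances():
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT Name, Balance FROM Account"))


ok = transfer(70)                        # both updates are applied
failed = transfer(20, fail_midway=True)  # the withdrawal is rolled back
```

After the failed second transfer the balances are the same as after the first one: the withdrawal that had already been executed is undone, so no money is lost.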

Transaction processing relies on a log, which is used to roll back transactions and restore the state of the database. Transactions are discussed in more detail in the section "Transaction Processing".

Concluding our discussion of the SQL language, let us once again emphasize that it is a query language. A complex application program that works with a database cannot be written in SQL alone. For this purpose, modern DBMSs use a fourth-generation language (Fourth Generation Language, 4GL), which combines the basic capabilities of third-generation procedural languages (3GL) such as C, Pascal and Ada with the ability to embed SQL statements in the program text and with user interface controls (menus, forms, user input, etc.). Today, 4GL is one of the de facto standards for database application development tools.






