Relational algebra, relational algebra operations. Operations on data in a network database model


The relationship between an owner record and a member record is also 1:N.

The main difference between these models is that in network model a record can be a member of more than one group relationship. According to this model, each group relation is named and a distinction is made between its type and its instance. The group relationship type is specified by its name and defines properties common to all instances of this type. A group relationship instance is represented by an owner record and a set of (possibly empty) subordinate records. There is the following limitation: record instance cannot be a member of two instances of group relationships of the same type (i.e., the employee from the example in paragraph 1, for example, cannot work in two departments).

  • trees (a) and (b) shown in Fig. 4.2 are replaced by one network structure in which the EMPLOYEE record is included in two group relationships;
  • to display the M:N type, enter the EMPLOYEE_CONTRACT record, which has no fields and serves only to link the CONTRACT and EMPLOYEE records (see Fig. 4.3). Note that this record may also contain helpful information, for example, share this employee in the total remuneration under this contract.


Rice. 4.3.

Each instance of a group relationship is characterized by the following characteristics:

How to arrange subordinate records:

  • arbitrary,
  • chronological /queue/,
  • reverse chronological /stack/,
  • assorted.

If a record is declared subordinate to several group relationships, then each of them can have its own ordering method assigned.

Mode for enabling subordinate records:

  • automatic - it is impossible to enter a record into the database without it being immediately assigned to a certain owner;
  • manual - allows you to remember a subordinate record in the database and not immediately include it in an instance of a group relationship. This operation is later initiated by the user.

Exception mode.

It is customary to distinguish three classes of membership of subordinate records in group relations:

  • Fixed. A subordinate record is tightly linked to an owner record and can only be removed from the group relationship by deleting it. When you delete an owner record, all subordinate records are automatically deleted. In the example above, fixed membership assumes a group relationship of CONCLUSION between the CONTRACT and CUSTOMER records, since a contract cannot exist without a customer.
  • Mandatory. It is possible to switch a subordinate record to another owner, but it cannot exist without an owner. To delete an owner record, it must have no subordinate records with required membership. The records “EMPLOYEE” and “DEPARTMENT” are connected by this relationship. If a department is disbanded, all of its employees must either be transferred to other departments or fired.
  • Optional. You can exclude a record from a group relationship, but keep it in the database without assigning it to a different owner. When an owner record is deleted, its subordinate records - optional members - are retained in the database, no longer participating in a group relationship of this type. An example of such a group relationship is “PERFORM” between “EMPLOYEES” and “CONTRACT”, since there may be employees in the organization whose activities are not related to the fulfillment of any contractual obligations to customers.

Operations on data in a network database model

Add - make a record in the database and, depending on the inclusion mode, either include it in a group relationship, where it is declared subordinate, or not include it in any group relationship.
Include in group relation - link an existing subordinate record to the owner record.
Switch - link an existing subordinate record to another owner record in the same group relationship.
Update - change the value of the elements of a previously extracted record.
Extract - extract records sequentially by key value, as well as using group relationships - from the owner you can go to member records, and from a subordinate record to the owner of the set.
Delete - remove a record from the database. If this record is the owner of a group relationship, then the membership class of the subordinate records is analyzed. Mandatory members must first be excluded from the group relationship, fixed members must be deleted along with the owner, optional members will remain in the database.
Exclude from group relationship - break the connection between the owner record and the member record.

Integrity Constraints

As in hierarchical model Only the maintenance of referential integrity is ensured (the owner of the relationship is a member of the relationship).

Advantages and disadvantages of early DBMSs

Advantages of early DBMS:

  • advanced means of managing data in external memory at a low level;
  • the ability to manually build effective application systems;
  • the ability to save memory by separating subobjects (in network systems)

Disadvantages of early DBMS:

  • difficulty of use;
  • high level of knowledge requirements about the physical organization of the database;
  • dependence of application systems on the physical organization of the database;
  • overloading the logic of application systems with details of organizing access to the database.

Both hierarchical and network data model requires the presence of highly qualified programmers. And even in such cases, the implementation of user requests is often delayed for long term.

Object-oriented DBMS

The emergence of object-oriented DBMSs was caused by the needs of programmers in OO languages, who needed tools for storing objects that did not fit in random access memory computer. Also important was the task of saving the state of objects between repeated launches of the application program. Therefore, most OODBMSs are a library whose data management procedures are included in the application program. Examples of OODBMS implementations like dedicated server databases are extremely rare.

It should immediately be noted that the generally accepted definition of " object-oriented data model"does not exist. Now we can only talk about a certain “object” approach to the logical representation of data and about various object-oriented ways of implementing it.

We know that any data model must include three aspects: structural, holistic and manipulation. Let's see how they are implemented on an object-oriented basis programming paradigms.

Structure

Structure object model is described using three key concepts:

encapsulation - each object has some internal state (stores a record of data inside itself), as well as a set of methods - procedures, with the help of which (and only in this way) you can access the data that determines the internal state of the object, or change them. Thus, objects can be considered as independent entities, separated from the external world;
inheritance - implies the ability to create new classes of objects from classes of objects that inherit the structure and methods of their ancestors, adding to them features that reflect their own individuality. Inheritance can be simple (one ancestor) or multiple (several ancestors);
polymorphism - different objects can react differently to the same external events depending on how their methods are implemented.

Data integrity

To maintain integrity, the object-oriented approach suggests using the following tools:

  • automatic maintenance of inheritance relationships; the ability to declare some data fields and methods of an object as “hidden”, not visible to other objects; such fields and methods are used only by methods of the object itself; creating integrity control procedures within the object

Data manipulation tools

Unfortunately, object-oriented programming lacks common data manipulation tools such as relational algebra or relational calculus. Data is processed using one of the object-oriented programming languages general purpose, usually SmallTalk, C++ or Java.

Let us now summarize some results

Object-oriented databases, unlike relational ones, store objects rather than records. The object-oriented approach provides more advanced means for displaying the real world than relational model, natural representation of data. In the relational model, all relationships belong to the same level, which is what complicates the transformation of the hierarchical relationships of the entity-relationship model into the relational model. The OO model can be viewed layer by layer, at different levels of abstraction. It is possible to define new data types and operations with them.

At the same time, OO - The model also has a number of disadvantages:

  • There are no powerful non-procedural means of retrieving objects from the database. All queries have to be written in procedural languages, the problem of optimizing them is assigned to the programmer;
  • instead of purely declarative integrity constraints(such as explicit declaration of primary and foreign keys relational tables using keywords PRIMARY KEY And REFERENCES) or semi-declarative triggers, you have to write procedural code to ensure internal integrity.

Obviously, both of these shortcomings are associated with the lack of developed means of data manipulation. This problem is solved in two ways - expanding OO languages ​​towards data management (ODMG standard), or adding object properties into relational DBMSs (SQL-3, as well as so-called object-relational DBMSs).

Relational algebra is a language of operations performed on relationships - tables of a relational database. The operations of relational algebra allow you to create another relation based on one or more relations without changing the original relations themselves. The resulting other relation is usually not written to the database, but exists as a result of executing the SQL query - an array, created by functions for working with databases in programming languages. For each relational algebra operation, its implementation will be given in the form of queries in the SQL language.

Let's consider the operations of relational algebra. So that you are not distracted by the contents of tables that are not your databases, such as “Products”, “Drivers”, “plums”, “pears”, “tea”, “coffee”, Vladimir, Sergei, etc. We will perform operations on relationships (tables) with abstract data, such as R1, R2 (names of tables - relationships), etc. and A1, A2, A3 (names of attributes - columns) and h15, w11, etc. (contents of database table records).

Priorities for performing relational algebra operations (in descending order of list items, and in one item - operations with equal priorities):

  • selection, projection
  • Cartesian product, connection, intersection, division
  • union, difference.

Sampling operation

The select operation operates on one relation and determines the resulting relation R, which contains only those tuples (or rows, or records), relations that satisfy a given condition (predicate P ).

Thus, the fetch operation is a unary operation and is written as follows:

Where P- predicate (logical condition).

SQL Query

Now let's see what happens as a result of performing this relational algebra operation and its corresponding SQL query. The table below shows one relation that this operation works on.

R3
A1A2A3A4
3 hhylms
4 ppa1sr
1 rrylms

We look at column A3 and establish that the predicate A3>"d0" is satisfied by the entries in the first and third rows of the original relation (since the number of the letter y in the alphabet is greater than the number of the letter d). As a result, we get the following new relation, which contains two lines:

R
A1A2A3A4
3 hhylms
1 rrylms

This material will help you combine all kinds of logical conditions for selections. "Boolean algebra (algebra of logic)" .

SQL Query

SELECT A1, A2, A3 from R1 UNION SELECT A1, A2, A3 from R2

Now let's see what happens when we run this relational algebra operation and its corresponding SQL query. Now two relations are given, since the union operation is a binary operation:

R1 R2
A1A2A3A1A2A3
Z7aaw11X8ppk21
B7hhh15Q2eeh15
X8ppw11X8ppw11

We combine the lines of the first and second relation and see that the third line, which is the third in both the first and second relation, is identical, so we include it in the new relation only once. We get the following relation:

R
A1A2A3
Z7aaw11
B7hhh15
X8ppw11
X8ppk21
Q2eeh15

The following is important: a join operation can be performed only when two relations have the same number and names of attributes (columns), or, formally speaking, are join compatible.

Intersection operation

The result of the intersection of two sets (relations) A and B () will be a set (relation) C that includes those and only those elements that are in both set A and set B. The operation of intersection of relational algebra is identical to the operation.

SQL Query

SELECT A1, A2, A3 from R1 INTERSECT SELECT A1, A2, A3 from R2

SQL is not available in some dialects keyword INTERSECT. Its replacement, for example, in MySQL and others, is INNER JOIN. About how it works SQL statement JOIN in general and its varieties INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN and FULL OUTER JOIN - in the lesson SQL JOIN - joining database tables .

MySQL Query

Now let's see what happens when we run this relational algebra operation and its corresponding SQL query. Again, two relations R1 and R2 are given:

R1 R2
A1A2A3A1A2A3
Z7aaw11X8ppk21
B7hhh15Q2eeh15
X8ppw11X8ppw11

We look through all the records in two relations, and find that in both the first and second relations there is one row - the one that is third in both the first and second relations. We get a new relation:

R
A1A2A3
X8ppw11

Difference operation

The difference of two relations R1 and R2 () consists of tuples (or records, or rows) that are present in relation R1, but not in relation R2. The relations R1 and R2 must be join compatible. The relational algebra difference operation is identical to the operation.

SQL Query

SELECT A1, A2, A3 from R2 EXCEPT
SELECT A1, A2, A3 from R1

Let's establish what happens as a result of executing this relational algebra operation and the corresponding SQL query. Again, two relations R1 and R2 are given:

R1 R2
A1A2A3A1A2A3
Z7aaw11X8ppk21
B7hhh15Q2eeh15
X8ppw11X8ppw11

From the relation R2 we exclude the row that also exists in the relation R2 - the third - and we obtain a new relation:

R
A1A2A3
X8ppw11
Q2eeh15

Cartesian product operation

The Cartesian product operation () defines a new relation R, which is the result of concatenating each tuple of relation R1 with each tuple of relation R2.

SQL Query

SELECT * from R3, R4

Let's establish what happens as a result of executing this relational algebra operation and the corresponding SQL query. Given two relations R3 and R4:

R3 R4
A1A2A3A4A5A6
3 hhylms3 hh
4 ppa1sr4 pp
1 rrylms

The new relation must contain all the attributes (columns) of the two relations. First, the first row of relation R3 is concatenated with each of the two rows of relation R4, then the second row of relation R3, then the third. The result should be 3 X 2 = 6 tuples (rows). We get this new relation:

R
A1A2A3A4A5A6
3 hhylms3 hh
3 hhylms4 pp
4 ppa1sr3 hh
4 ppa1sr4 pp
1 rrylms3 hh
1 rrylms4 pp

Typically, database systems are equipped with a query language that can help its users query instances. There are two such types - relational algebra and relational calculus. The first is procedural which takes relation instances as input and outputs relation instances as output. Uses unary or binary calculus for this. Relational algebra is performed recursively and treated as relations.

Cartesian product (Χ)

Combines information from two different relationships into one.

Designations - r Χ s,

where r and s are ratios, and their output will be defined as

r Χ s = (qt | q ∈ r and t ∈ s).

Conclusion. Sets up a relationship that shows all books and articles written using the textbook.

Rename operation (ρ).

Relational algebra relations are results, but without any name. The rename operation allows you to change the output value, indicated by a small Greek letter ρ .

Designation - ρ x(E),

where the result of expression E is stored with the name x.

Additional operations:

  • set intersection;
  • assignment;
  • natural connection.

Relational calculus

A query language is non-procedural, meaning it tells you what to do, but does not explain how to implement it. Relational calculus comes in two forms:

  • tuple correlation calculus;
  • filtering variable ranges.

Notation - T/Condition: Returns all tuples of T that satisfy a condition. Result. Returns tuples with name. TRC can be quantified. We can use existential (∃) and universal quantifiers (∀). Conclusion. The above query will give the same result as the previous one.

Domain Relational Calculus DRC

The filter variable uses the attribute domain instead of integer tuple values ​​(as is done in the TRC mentioned above).

Designations - (a 1, a 2, a 3, ..., a n | P (a 1, a 2, a 3, ..., a n)),

where a1, a2 are attributes, and P denotes formulas built by internal values.

Conclusion. Sets the article, page, and topic from the TutorialsPoint relationship, where subject is the database.

Like TRC, DRC can also be written using existential and universal quantifiers. DRC also includes relational algebra operators. The power of expression of calculation, calculus and correlation of relationships between points is equivalent.

Variations and schemes of relational calculus and algebra

The ER model, when conceptualized in diagrams, gives good review essential relationships that are easier to understand. Schematic representations can be mapped to a relational diagram, that is, they can be created in conjunction with each other. It is not possible to import all ER constraints into a relational model, but an approximate structure can be generated. There are several processes and algorithms available to convert charts to this system. Some of them are automated, while others are created manually. ER diagrams mainly consist of the following criteria:

  • essence and its attributes;
  • connection, which is the association between the above values.

The comparison of objects and relationships occurs in different ways and patterns. For example, an entity is a real world object with some attributes. The matching process, the algorithm is as follows:

  • create a table for each object;
  • attributes should become table fields with corresponding data types;
  • declare a primary key.

A relationship is an association between entities. The compilation process is as follows:

  • create a table for relationships;
  • add primary keys all participating entities as table fields with corresponding data types;
  • if the relation has any attribute, set each attribute as a table field;
  • combine the primary key that makes up all the others for the participating objects;
  • specify all foreign key constraints.

Weak sets and hierarchical objects are displayed according to specific system. First of all, it is necessary to understand the essential principles and meanings. A weak set of objects is one that does not have any primary key associated with it. The mapping process is as follows:

  • create a table for a weak set of objects;
  • add all attributes to the schema as a field;
  • specify the primary key for identification;
  • set all foreign key constraints.

The mapping of hierarchical objects is based on specialization or generalization of the language of relational algebra and occurs in the form of sequential entities. The algorithm is as follows:

  • create tables for all objects at a higher lower level;
  • add primary keys;
  • at a low level, implement all other attributes of lower-level objects;
  • declare the primary keys of the table;
  • set foreign key constraints.

Existing options for describing, storing, changing information

SQL is a programming language for relational databases data. It is developed on algebra and correlation calculus of tuples. SQL is supplied as a package with all major database management system distributions. Contains both data and languages ​​for manipulating them. Using Definition Properties SQL data relational algebra, you can design and modify the database schema, while the control and adjustment properties, as well as data modification, allow you to store and retrieve information installed in the system. Uses the following set of commands to define structure and system:

  • creates new tables and views from the DBMS.
  • throws out commands.
  • changes the database schema.
  • this command adds an attribute to an object of type string.

SQL is equipped with a Data Manipulation Language (DML). It modifies the database instance by inserting, updating, and deleting information. DML is responsible for changing all data. SQL contains the following set of commands in the DML section:

  1. SELECT is one of main commands request. It is similar to the projection operation of relational algebra. It selects attributes based on the condition described in the WHERE clause.
  2. FROM - This section takes a name as an argument from which attributes should be selected/projected. If more than one name is given, this item corresponds to the Cartesian product.
  3. WHERE - This section specifies the predicate or conditions that must be met to qualify the projected attribute.

There are also commands:

  • insert;
  • changing values;
  • deletion.

Creating Relational Algebra Queries

When constructing a search, the task is to find a structure of operations that will lead to the correct conclusion. The basic operations of relational algebra are simple operations with one or two relations as operands. Combined sequence effects determine final result. Since the relational algebra system in databases is quite simple, many intermediate results can be obtained before reaching the final output, these are also used as operands that produce new output data.

For most operators, the order of queries and their execution does not matter, meaning that the same output can be achieved by constructing and combining intermediate data in different ways. In practice, searching the database is quite easy. The system for executing operations and intermediate results is determined by the query optimizer. When forming questions and requirements, you need
first select what relationships are needed to achieve the answer, and then specify the operations and intermediate results. The structure of a relational algebra query in a database with the results can be represented as a diagram. Requirements optimizers attempt to achieve the most efficient execution possible. In practice, this usually means that they try to minimize intermediate results as quickly as possible. Common examples of relational algebra will help with this.

Information about 1996 model vehicles where deficiencies were found during the 1999 inspection.

First, information about the machines is displayed to understand the meanings of all attributes of the relationship. Inspection information is stored in the Inspection table, and if faults are found, they are recorded in the Problem table. So these three tables are needed to get the required information.

Only cars from 1996 are interesting. The lineup car is represented as a value set attribute in the machine information table row. The first intermediate result consists of tuples representing the 1996 variants.

Thus, only rows that cover this period are needed. You must use selection to extract them. Now there are cars and inspections that were required. The rows are then joined using the union operation. A common register number must be connected to them, since it is the only common column, a natural join is used.

To find out whether faults were found during the tests, you need to associate problem lines with the test. After connecting the control rows to the cars, you can connect this result to the fault table. Accession must be based on common registration number and verified date. These are the only common columns in the tables, so a natural join is used.

Calculus options without intermediate results

Required information: Driver's name for model year 1995 or older vehicles that have not been tested for 2000. The name is in the Driver table. Law enforcement are described in the table "Inspection and cars in the table car". So these three tables are needed. First, you need to recognize vehicles that have not been inspected for 2000. It is not possible to solve this problem using only the inspections listed in the table, since it contains data about those inspections that were done, not those that were not implemented. This problem is solved by finding complementary vehicles that are tested before the year 2000. In fact, only their registration numbers are needed.

There are other examples besides the ones above that show how information can be changed or found. Query variants can be optimized using special operations. Essentially, to make searching and finding data as easy and simple as possible, there is a relational model of calculus.

Where is information secured and protected?

Algebras are stored in file formats containing records. At the physical level, factual information is fixed in an electromagnetic format on some device. These storage devices can be divided into three categories:

  1. Primary. This category includes memory that is directly accessible to the CPU. Registers, fast memory(cache) and main (RAM) are directly accessible to the central one, since they are all located on the motherboard or chipset. This storage is typically very small, ultra-fast and volatile. To maintain the condition it is required permanent source nutrition. In the event of a failure, all of its data is lost.
  2. Secondary. Used to store information for future use or Reserve copy. Includes memory devices that are not part of the chipset or motherboard processor, for example magnetic disks, optical discs (DVD, CD, etc.), hard disks, flash drives and magnetic tapes.
  3. Tertiary. Used to store huge amounts of data. Since such storage devices are external to the computer system, they are the slowest in speed. These storage gadgets are mainly used to backup the entire system. Optical discs and magnetic tapes are widely used for tertiary storage.

Special operations of relational algebra are important for query efficiency.

Storage structure

The computer system has clearly a certain hierarchy memory. The CPU has direct access to the main system as well as on-chip registers. Main memory access time is obviously less than CPU speed. To minimize this discrepancy, a cache is introduced. Cache memory provides the most fast time access and contains data that is most frequently accessed by the CPU.

The memory with the fastest access is the most expensive. Large devices Data storage devices provide low speed and are cheaper, but they can store huge amounts of data compared to a processor register or cache.

Magnetic and hard drives are the most common secondary storage devices in modern computer systems. They are called magnetic and consist of a metal base. These disks are placed vertically on the spindle. The read/write head moves between them and is used to magnetize or remove such spot underneath. It can be recognized as 0 (zero) or 1 (one).

Hard drives are formatted in a clearly defined order for efficient data storage. It has many concentric circles called tracks. Each track is further divided into sectors, which typically store 512 bytes of data.

File operations

Operations on a relational algebra language system and its database can be broadly classified into two categories:

  • update;
  • search.

The first category modifies data values ​​by inserting, deleting, or updating. On the other hand, search operations do not edit information, but retrieve it after optional conditional filtering. In both types of operations, selection plays a significant role. In addition to creating and deleting a file, there are several operations that can be performed in them:

  1. Open - exists in one of two read or write modes. In the first case, the operating system does not allow anyone to change the data. In other words, the data is only read. Files opened in Read mode can be shared by multiple objects. Recording mode allows you to change data. The files can be read but cannot be shared.
  2. Close is the most important operation from the point of view operating system, since it removes all locks (if in mode public access), saves the data (if modified) to secondary media and frees all buffers and handlers associated with the file.
  3. Indexing is an information structure technique for efficiently retrieving records from the files of a system based on some attributes where the system was executed. Determined based on attributes.

Indexing can be of the following type:

  1. Primary is defined in the ordered data file. The information file is organized in a key field.
  2. A secondary index is generated from a field that is a candidate key and has a unique value in each record or is not a key with duplicate values.
  3. Clustering is defined in an ordered data file, in a non-key field.

Database Management System or DBMS refers to the technology of storing and retrieving user information with maximum efficiency along with appropriate security measures. A detailed consideration of this issue leads to the conclusion that relational algebra is a language of operators that take relations as arguments and return them as a result.

Each operation includes data selection (selection) and the actions that will be performed on the selected data. The main operations in a relational database are database update operations and relationship processing operations.

TO database update operations include those operations that insert new tuples, remove unnecessary ones, and adjust the attribute values ​​of existing tuples, namely: these are operations Turn on, Delete, Update.

Operation Turn on requires specifying the name of the relationship and preliminary generation of the attribute values ​​of the new tuple. The tuple key must be specified.

Operation Delete requires the name of the relation, as well as the identification of the tuple or group of tuples to be deleted.

Operation Update is executed for the named relation and can correct both one and several tuples. For example, if the company’s management decided to increase all employee salaries by the same amount, then several tuples will be adjusted at once with one Update operation.

Concerning processing operations, then they are borrowed from relational algebra. According to E. Codd’s approach, relational algebra includes eight operations, five of which are basic: Sample, Projection, Multiplication, Union, Subtraction.

Sample- select from the relation only those tuples that satisfy the given condition.

At Projections relation to a given set of its attributes, a new relation is obtained, created by extracting tuples containing the specified attributes from the original relation.

At Multiplication(the Cartesian product) of two relations, a new relation is obtained, the tuples of which are a concatenation of the tuples of the first and second relations.

As a result Associations two relations, a third one is obtained, including tuples included in at least one relation, that is, containing all the elements of the original relations.

At Subtraction Only those tuples of the first relation are returned that remain after subtracting the second relation, that is, all tuples of the second relation are discarded from the first relation.

The remaining three operations are derivatives; they can be obtained from the main operations; they are called additional: Connection, Intersection, Division.

Operation Compound applies to two relationships that have a common attribute. The result of this operation for two relations under some condition is a relation of tuples that are a combination of the first and second relations that satisfy the specified condition.

Intersection two relations is a relation that includes all tuples included in both relations.

Operation Divisions assumes that there are two relations: one is binary (containing two attributes), the other is unary (containing one attribute). The result is a relation consisting of tuples that include the values ​​of the first attribute of the tuples of the first relation, but only those for which the set of values ​​of the second attribute of the first relation coincides with the set of values ​​of the attributes of the second relation.

Distinctive feature relation processing operations is that the unit of processing in them is not tuples, but relations: one or two relations are used at the input of each operation, and the result of the operations is a new relation.

Let's look at some of the most commonly used operations of relational algebra in more detail.

Operation An association- the input is given two compatible relations of the same dimension: A and B. The result is a relation of the same structure, containing all tuples A and all tuples B

Intersection assumes the presence at the input of two relations of the same dimension: A and B. At the output, a relation of the same structure is created, containing only those tuples A that are in B.

Division. At the input of the operation, two relations are used: A and B. Let relation A, called divisible, contain attributes (A 1, A 2, A 3,..., A n). Relation B is a divisor and contains a subset of attributes A: (A 1, A 2, ..., A k), where k

In general, the operations of the relational data model provide the ability to manipulate relationships, allowing you to update the database, as well as select subsets of stored data and present them in the desired form.

When designing and working with databases, these eight operations are usually not enough. Therefore, operations such as renaming attributes, creating new calculated attributes, assignment operations, comparisons, etc. are added.

Modification Anomalies

Modification Anomalies manifest themselves in the fact that a change in the value of one data may entail scanning the entire table and a corresponding change in some other table records.

Deletion anomalies are that when you delete any data from the table, other information that is not directly related to the deleted data may also disappear.

Addition anomalies arise in cases where information cannot be placed in a table until it is incomplete, or inserting a new record requires additional scanning of the table

Let there be a relation that stores information about students, the courses they take, and the cost of these courses. From this relation, a tuple is removed that contains (in addition to information about the student) information about the name and cost of the course attended by this student. If information about the name and cost of the course was stored in a single copy only in this tuple, it will irrevocably disappear from the relation. This situation is called deletion anomaly. Performing a delete operation results in the loss of information about two entities.

This same relationship can be illustrated by an example insertion anomaly. Let's say we need to add information about the name and cost of a certain course, but we will not be able to add this information until no students are enrolled in the course. You can get rid of both anomalies by splitting the existing relationship into two, each of which will contain data from only one entity. Then deleting student information will not affect course data.

When splitting a relationship into two, problems also arise. For example, is it possible to enroll a student in a course that does not yet exist? These problems must be resolved by discussing business rules. If business rules require that course and cost information be available when a student enrolls in that course, then a check will be made to ensure that the required course exists when the student enrolls in the course. These types of checks are called referential integrity constraints or foreign key integrity constraints.

Integrity of entities - no primary key value must contain null.

Design stages:

Conceptual design - the database development process begins with requirements analysis. The designer at this stage of development must find answers to the following questions: what data elements should be stored, who will access them and how. A Spanish model is created. Information that does not depend on physical aspects, target database and programming languages

Boolean - the logical structure of the database is created. To do this, determine how the data will be grouped logically. The structure of the database at this stage is expressed in terms of application objects and relationships between them. Depends on the target DBMS, redundancy checking, normalization.

Physical - The logical structure of the database is converted into a physical structure taking into account performance aspects. Data elements at this stage receive attributes and are defined as columns in the tables of the DBMS chosen for the implementation of the database. Basic file and index organization relationships, integrity constraints, and security measures.

Just in case - transactions- an indivisible aftereffect of operations that transfer the database from one stable state to another. Properties - atomicity (indivisibility), consistency (from one state to another), isolation (user transactions do not interfere with each other), durability (the result must be recorded in the database after the release, even if it crashed at the next moment).

Entity-relationship method.

Entity-relationship modeling method gives an abstract model subject area using the following basic concepts: essence(entities), relationships(relationships) between entities and attributes(attributes) to represent properties of entities and relationships.

Any fragment of the subject area can be represented as set of entities, between which there is some many connections. Let's give definitions:

Essence is an object that can be identified in some way that distinguishes it from other objects. Examples: a specific person, enterprise, event, etc.

Entity set- a set of entities of the same type (having the same properties). Examples: all people, businesses, holidays, etc. Entity sets do not have to be disjoint. For example, an entity that belongs to the set MAN also belongs to the set PEOPLE.

An entity is actually a set attributes, which describe the properties of all members of a given entity set. Domain was already higher.

Entity Key- is one or more attributes that uniquely define a given entity.

Connection is an association established between several entities. Examples:

  • since each employee works in some department, there is a “works in” or DEPARTMENT-EMPLOYEE relationship between the EMPLOYEE and DEPARTMENT entities;

Unfortunately, there are no general rules for determining what is considered an entity and what is a relationship. In the example discussed above, we assumed that “leading” is a connection. However, we can consider the entity “manager”, which has connections “manages” with the entity “department” and “is” with the entity “employee”.

A relationship can also have attributes. For example, for the DEPARTMENT-EMPLOYEE relationship, you can set the attribute WORK_TERRENCE_IN_DEPARTMENT.

The role of the entity in the relationship- the function that an entity performs in a given connection. For example, in a PARENT-CHILD relationship, PERSON entities may have the roles "parent" and "child". Specifying roles in the entity-relationship model is optional and serves to clarify the semantics of the relationship.

Set of connections- this is the relationship between n(and n no less than 2) entities, each of which belongs to a certain set of entities.

Although, strictly speaking, the concepts of “connection” and “set of connections” are different (the first is an element of the second), they are nevertheless very often confused.

When n=2, i.e. when a relationship combines two entities, it is called binary. It has been proven that n-ary set of connections ( n>2) can always be replaced by many binary ones, but the former better reflect the semantics of the subject area.

The number of entities that can be associated through a set of connections with another entity is called degree of connection. Considering degrees is especially useful for binary relationships. The following degrees of binary bonds can exist:

  • one to one (denoted 1: 1 ). This means that in such a relationship, entities with one role always correspond to at most one entity with another role.

Another important characteristic connection in addition to its degree is membership class entities included in it or cardinality communications.

"EMPLOYEE" has mandatory membership class(this fact is also indicated by indicating the interval of the number of possible occurrences of an entity in a relationship, in in this case is 1,1), and the entity "DEPARTMENT" has optional membership class(0,1). Now this connection we can describe it as 0,1:1,1 .

  • one to many ( 1:n). In this case, an entity with one role can correspond to any number of entities with another role.

This figure further illustrates the fact that multiple sets of relationships can be defined between two entities.

  • many to one ( n: 1). This relationship is similar to mapping 1:n.

In this case, for completely obvious reasons (each contract is concluded with a specific customer, and each customer has at least one contract, otherwise it would not be such), each entity has a mandatory membership class.

  • many to many ( n: n). In this case, each of the associated entities can be represented by any number of instances.

If the existence of an entity x depends on the existence of an entity y, then x is called dependent entity(sometimes entity x is called "weak" and "entity" y is called strong). As an example, consider the relationship between the previously described entities WORKING_GROUP and CONTRACT. Working group is created only after a contract is signed with the customer, and ceases to exist upon completion of the contract. Then WORKING_GROUP is dependent on the CONTRACT entity. We will denote a dependent entity by a double rectangle, and its connection with a strong entity by a line with an arrow ( we had an oval for the dependent)

The cardinality of the connection for a strong entity will always be (1,1). The membership class and degree of relationship for a dependent entity can be anything.

12. Hierarchical and network data models.

Hierarchical model is a collection of elements arranged in the order of their subordination from general to specific and forming a tree (graph) inverted in structure.

The basic concepts of a hierarchical structure include level, node, and relationship. Knot is a set of data attributes that describe an object. In a hierarchical tree diagram, nodes are represented as vertices in the graph. Each node at a lower level is connected to only one node at a higher level. high level. A hierarchical tree has only one vertex, not subordinate to any other vertex and located at the topmost - first level. Dependent (slave) nodes are at the second, third, etc. levels. The number of trees in the database is determined by the number of root records. For each database record, there is only one hierarchical path from the root record.

The organization of data in a hierarchical type DBMS is defined in terms of: element, aggregate, record (group), group relation, database.

  • Attribute (data element)- the smallest unit of a data structure. Typically, each element in a database description is given a unique name. It is referred to by this name during processing. A data element is also often called a field.
  • Record- a named collection of attributes. Using records allows you to obtain some logically connected set of data in one access to the database. It is the records that are changed, added and deleted. The type of a record is determined by the composition of its attributes. Record instance - a specific record with a specific element value
  • Group attitude- hierarchical relationship between records of two types. The parent record (the owner of the group relationship) is called the source record, and the child records (members of the group relationship) are called subordinate records. A hierarchical database can only store such tree structures.

The root record of each tree must contain a key with a unique value. The keys of non-root records must have a unique value only within the group relationship. Each record is identified by a complete concatenated key, which is the set of keys of all records from the root along the hierarchical path.

At graphic representation group relationships are represented by arcs of a directed graph, and record types are represented by vertices.

For group relationships in the hierarchical model, it is provided auto mode inclusions and fixed membership. This means that for any non-root record to be remembered in the database, its parent record must exist. When a parent record is deleted, all subordinate records are automatically deleted.

Example: An enterprise consists of departments in which employees work. Each department can have multiple employees, but an employee cannot work in more than one department.

Therefore, for information system For personnel management, it is necessary to create a group relationship consisting of a parent record DEPARTMENT (NAME OF DEPARTMENT, NUMBER OF EMPLOYEES) and a child record EMPLOYEE (LAST NAME, POSITION, SALARY). (For simplicity, we assume that there are only two child records.) - rice a (further)

To automate the accounting of contracts with customers, it is necessary to create another hierarchical structure: customer - contracts with him - employees involved in working on the contract. This tree will include the records CUSTOMER(CUSTOMER_NAME, ADDRESS), CONTRACT(NUMBER, DATE, AMOUNT), CONTRACTOR (SURNAME, POSITION, DEPARTMENT_NAME) - fig. b.

Flaws hierarchical databases:

  • Information between records is partially duplicated (such records are called paired), and the hierarchical data model does not provide support for correspondence between paired records.
  • The hierarchical model implements the relationship between the source and child entry according to the 1:N scheme, that is, one parent record can correspond to any number of children. Let us now assume that the performer can take part in more than one contract (i.e., an M:N type relationship arises). In this case, it is necessary to enter another group relationship into the database, in which CONTRACTOR will be the original record, and CONTRACT will be the child. Thus, we are again forced to duplicate the information. (Figure C)
  • quite complex logical connections and corresponding cumbersomeness in data processing

Advantages:

Is the most simple enough efficient use memory and good time performance for performing operations on data. However, this model is convenient mainly for working with hierarchically organized information.

Operations on data defined in the hierarchical model:

  • ADD to the database new entry. For the root record, it is necessary to generate a key value.
  • CHANGE the data value of the previously retrieved record. Key data should not be changed.
  • DELETE some record and all records subordinate to it.
  • EXTRACT:
    • extract root record by key value, sequential viewing of root records is also allowed
    • retrieve next record (next record is retrieved in left traversal order)

In the EXTRACT operation, it is possible to specify selection conditions.

All modification operations are applied only to one “current” record (which was previously retrieved from the database). This approach to data manipulation is called “navigational”.

Integrity constraints.

Only the integrity of the relationships between the owners and members of the group relationship is maintained (no descendant can exist without an ancestor). The correspondence of paired records belonging to different hierarchies is not automatically maintained.

The first database management systems, which appeared in the mid-60s, made it possible to work with a hierarchical database. The most famous was the hierarchical IMS system from IBM. Other systems are also known: PC/Focus, Team-Up, Data Edge and ours: Oka, INES, MIRIS.

Network data model.

Network model- a structure in which any element can be associated with any other element. A network database consists of sets of records that are related to each other so that records can contain explicit links to other sets of records. Thus, sets of records form a network. Relationships between records can be arbitrary, and these relationships are explicitly present and stored in the database.

A network data model is defined in the same terms as a hierarchical one. It consists of many records that can be owners or members of a group relationship. The relationship between an owner record and a member record is also of the form 1:N.

The main difference between these models is that in the network model, a record can be a member more than one group relationship. According to this model, each group relation is named and a distinction is made between its type and its instance. A group relationship type is specified by its name and defines properties common to all instances of this type. A group relationship instance is represented by an owner record and a set of (possibly empty) subordinate records. However, there is the following limitation: a record instance cannot be a member of two instances of group relationships of the same type (an employee cannot work in two departments)

Hierarchical structure from the picture above. is converted to network as follows

Trees (a) and (b) are replaced by a single network structure in which the EMPLOYEE entry is included in two group relationships; to display the M:N type, enter the EMPLOYEE_CONTRACT record, which has no fields and serves only to link the CONTRACT and EMPLOYEE records

Each instance of a group relationship is characterized by the following characteristics:

  • way to organize subordinate records:

arbitrary,

chronological /queue/,

reverse chronological /stack/,

assorted.

If a record is declared subordinate to several group relationships, then each of them can have its own ordering method assigned.

  • mode for enabling subordinate records:

automatic - it is impossible to enter a record into the database without it being immediately assigned to a certain owner;

manual - allows you to remember a subordinate record in the database and not immediately include it in an instance of a group relationship. This operation is later initiated by the user).

  • exclusion mode It is customary to distinguish three classes of membership of subordinate records in group relations:

Fixed. A subordinate record is tightly linked to an owner record and can only be removed from the group relationship by deleting it. When you delete an owner record, all subordinate records are automatically deleted. In the example, fixed membership assumes the group relationship "CONCLUDES" between the records "CONTRACT" and "CUSTOMER" because a contract cannot exist without a customer.

Mandatory. It is possible to switch a subordinate record to another owner, but it cannot exist without an owner. To delete an owner record, it must have no subordinate records with required membership. The records “EMPLOYEE” and “DEPARTMENT” are connected by this relationship. If a department is disbanded, all of its employees must either be transferred to other departments or fired.

Optional. You can exclude a record from a group relationship, but keep it in the database without assigning it to a different owner. When an owner record is deleted, its subordinate records - optional members - are retained in the database, no longer participating in a group relationship of this type. An example of such a group relationship is “PERFORM” between “EMPLOYEES” and “CONTRACT”, since the organization may have employees whose activities are not related to the fulfillment of any contractual obligations to customers.

Operations on data.

ADD- make a record in the database and, depending on the inclusion mode, either include it in a group relationship, where it is declared subordinate, or not include it in any group relationship.

INCLUDE INTO GROUP RELATIONSHIP- link an existing subordinate record to the owner record.

SWITCH- link an existing subordinate record to another owner record in the same group relationship.

UPDATE- change the value of the elements of a previously extracted record.

EXTRACT- extract records sequentially by key value, as well as using group relationships - from the owner you can go to member records, and from a subordinate record to the owner of the set.

DELETE- remove a record from the database. If this record is the owner of a group relationship, then the membership class of the subordinate records is analyzed. Mandatory members must first be excluded from the group relationship, fixed members must be deleted along with the owner, optional members will remain in the database.
EXCLUDE FROM GROUP RELATIONSHIP- break the connection between the owner record and the member record.

Integrity constraints.

As in the hierarchical model, only the maintenance of referential integrity is ensured (the owner of the relationship is a member of the relationship).

Basics dignity network model is high memory efficiency and efficiency. Flaw– the complexity and rigidity of the base scheme, as well as the difficulty of understanding. In addition, integrity control is weakened in this model, since it allows arbitrary connections to be established between records. The complexity of the DBMS implementation, the complexity of the data access mechanism, and the need to clearly define data connections at the physical level

To the famous network systems database management include: DBMS, IDMS, TOTAL, VISTA, NETWORK, SETOR, COMPASS, etc.

Comparing hierarchical and network databases, we can say the following. In general, hierarchical and network models provide sufficient fast access to the data. But since in network databases the main structure of information presentation has the form of a network in which each vertex (node) can have a connection with any other, then the data in network database more equal than in a hierarchical one, since information can be accessed starting from any node.

Graph (hierarchical and network) models are implemented as data models in database management systems running on mainframe computers. For personal computers Relational databases are more common, although there are also database management systems that support the network model.


Introduction. 4

1. Databases and DBMS 6

2. Relational databases 20

3. Operations on relational database tables 29

4. Development of information models 49

5. Organization of access to data 64

6. Principles for building systems focused on data analysis 96

Conclusion. 106

List of the most common abbreviations. 107

Introduction.

There is very little literature in Russian on the topic of DBMS. It is impossible to recommend one or more books whose content would cover the material in the Databases course. Some of the best include K. Data's "Introduction to Database Systems" (Science, 1980) and "DB2 Relational Database Management Guide" (Finance and Statistics, 1988), as well as J. Ullman's book "Fundamentals of Database Systems" (Finance) and Statistics, 1983). Although these books are somewhat outdated (by English language Several updated editions have already been published), they are worth reading.

Given tutorial in our opinion, it is intended to systematize and present methodically in a form accessible for initial study and development of material in a volume and content that meets the requirements of the Databases course program. It consists of six interrelated sections, which address the following issues step by step:


    1. concept of databases, DBMS architecture (infological data model, datalogical data model, physical data model, types of datalogical data models, hierarchical datalogical model, network datalogical model, datalogical model based on inverted lists, relational datalogical model, object-relational datalogical model);

    2. relational databases (basic concepts of relational databases, data type, domain, relation schema, database schema, tuple, relation, integrity of relational databases, basic properties of relations of relational databases);

    3. operations on tables of relational databases (operations of set theory, normalization of relations of relational databases);

    4. using the language of ER diagrams to build information models (entity-relationship diagrams, information modeling, IDEF1X methodology, stages of developing an infological data model);

    5. organization of access to data (means of accelerated access to data, query language, transaction processing, means of recovery after failures);

    6. principles of building systems focused on data analysis (data warehouses; data models used in building data warehouses).
The textbook is intended for students of all specialties and forms of study.

1. Databases and DBMS

1.1. Data and computer

The perception of the real world can be correlated with a sequence of different, although sometimes interrelated, phenomena. Since ancient times, people have tried to describe these phenomena (even when they could not understand them). This description is called data.

Traditionally, data is captured using specific means communication (for example, using natural language or images) on a specific medium (for example, stone or paper). Typically, data (facts, phenomena, events, ideas or objects) and their interpretation (semantics) are recorded together, since natural language is flexible enough to represent both. An example would be the statement “The cost of an air ticket is 128.” Here “128” is a given, and “Air ticket price” is its semantics.

Often data and interpretation are separated. For example, the “Aircraft schedule” can be presented in the form of a table (Fig. 1.1.1), in the upper part of which (separate from the data) their interpretation is given. This division makes it difficult to work with the data (try quickly getting information from the bottom of the table).


Interpretation

Flight number

Days of the week

Point of departure

Departure time

Destination

Arrival time

Aircraft type

Ticket price

Data

138

2_4_7

Baku

21.12

Moscow

0.52

IL-86

115.00

57

3_6

Yerevan

7.20

Kyiv

9.25

TU-154

92.00

1234

2_6

Kazan

22.40

Baku

23.50

TU-134

73.50

242

1 to 7

Kyiv

14.10

Moscow

16.15

TU-154

57.00

86

2_3_5

Minsk

10.50

Sochi

13.06

IL-86

78.50

137

1_3_6

Moscow

15.17

Baku

18.44

IL-86

115.00

241

1 to 7

Moscow

9.05

Kyiv

11.05

TU-154

57.00

577

1_3_5

Riga

21.53

Tallinn

22.57

AN-24

21.50

78

3_6

Sochi

18.25

Baku

20.12

TU-134

44.00

578

2_4_6

Tallinn

6.30

Riga

7.37

AN-24

21.50

Rice. 1.1.1. Data and their interpretation.

The use of computers for conducting I (accompaniment, support) and data processing usually leads to even greater separation of data and interpretation. A computer deals primarily with data as such. Much of the interpretive information is not explicitly captured at all (the computer does not “know” whether “21.50” is the airfare or the departure time). Why did this happen?

There is a at least two historical reasons why the use of computers led to the separation of data from interpretation. Firstly, computers did not have sufficient capabilities for processing texts in natural language - the main language for interpreting data. Secondly, the cost of computer memory was initially very high. Memory was used to store the data itself, and interpretation was traditionally left to the user. The user put the interpretation of the data into his program, which “knew”, for example, that the sixth input value was associated with the time of arrival of the aircraft, and the fourth – with the time of its departure. This significantly increased the role of the program, since outside of interpretation, the data is nothing more than a collection of bits on a storage device. The strict dependence between data and the programs that use them creates serious problems in data management and makes their use less flexible.1.2. Database concept. DBMS architecture

Active work to find acceptable ways to socialize the continuously growing volume of information led to the creation in the early 60s of special software systems called " Database Management Systems"(DBMS). DBMSsoftware, which creates databases, maintains them in working order and ensures effective access to database data for users and applications. The main feature of a DBMS is the presence of procedures for entering and storing not only the data itself, but also descriptions of its structure. Files equipped with a description of the data stored in them and controlled by a DBMS began to be called data banks, and then " Database" (BD). Thus, Database(DB) – reflection of the subject area in the form of a structured collection of data. The data stored in it characterizes the composition of objects in the subject area, their properties and relationships.

The DBMS must provide access to data to any users, including those who have virtually no and (or) do not want to have any idea about:


  • physical placement of data and their descriptions in memory;

  • mechanisms for searching the requested data;

  • problems that arise when simultaneous request the same data by many users ( application programs);

  • ways to ensure data protection from incorrect updates and (or) unauthorized access;

  • keeping databases up to date
and many other DBMS functions.

When performing the main of these functions, the DBMS must use various descriptions data. How to create these descriptions?

Naturally, a database project must begin with an analysis of the subject area and identifying the requirements for it of individual users (employees of the organization for whom the database is being created). Design is usually entrusted to a person (group of persons) - database administrator(ABD). It can be either a specially designated employee of the organization or future user database, quite familiar with machine data processing.

1.2.1. Infological data model

By combining private views of the database content obtained from user interviews with its views of the data that may be needed in future applications, the DBA first creates a generalized informal description created base data. This description, made using natural language, mathematical formulas, tables, graphs and other tools that are understandable to all people working on database design are called infological data model(Fig. 1.2.1).

Rice. 1.2.1. Data Model Layers

This human-centric model is completely independent of the physical parameters of the data storage environment. In the end, this medium may be human memory, and not a computer. Therefore, the information model should not change until some changes in the real world require some definition to be changed in it so that this model continues to reflect the subject area.

The remaining models shown in Fig. 1.2.1 are computer-oriented. With their help, the DBMS allows programs and users to access stored data only by their names, without worrying about the physical location of this data. The necessary data is found by the DBMS on external storage devices using physical data model.

1.2.2. Datalogical data model

Since the specified access is carried out using a specific DBMS, the models must be described in data description language this DBMS. Such a description created by the DBA using the infological data model is called datalogical data model.

These changes to the physical and datalogical models will not be noticed by existing users of the system (they will be “transparent” to them), just as new users will not be noticed. Therefore, data independence enables the development of a database system without disrupting existing applications.

1.2.3. Physical data model

Unlike the infological data model, the physical model is completely dependent on the specific DBMS. It should take into account

  • restrictions on the length of names of database objects (tables, columns, indexes),

  • usage special characters in names

  • acceptable data types and their internal representation on data storage devices in a computer.
The same infological data model can correspond to several different physical models.

Three-level architecture (infological, datalogical and physical layers) allows you to provide independence of stored data from the programs that use them. The DBA can, if necessary, rewrite the stored data to other storage media and (or) reorganize its physical structure, changing only the physical data model. The DBA can connect any number of new users (new applications) to the system, adding, if necessary, to the datalogical model.







2024 gtavrl.ru.