Relational databases. Data creates problems


The appearance of computer equipment in modern times has brought about an information revolution in all spheres of human activity. To keep all this information from turning into useless clutter on the global Internet, database systems were invented, in which materials are sorted and systematized so that they are easy to find and pass on for further processing. There are three main types: relational, hierarchical, and network databases.

Fundamental Models

The emergence of databases was a complex process that began with the development of programmable information-processing equipment. It is therefore not surprising that more than 50 database models exist today, but the main ones, which are still widely used in practice, are the hierarchical, relational and network models. What are they?

The hierarchical model has a tree structure made up of data at different levels, with connections between the levels. The network model is more complex: its structure resembles the hierarchical one, but its scheme is extended and improved. The difference between them is that in the hierarchical model a descendant record can be connected to only one ancestor, while in the network model it can have several. The relational model is more elaborate still, so it should be analyzed in more detail.
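The difference between the two models can be sketched with plain Python dictionaries. The record names are hypothetical and chosen only for illustration: in the hierarchical sketch each record stores exactly one parent, while in the network sketch a record may list several.

```python
# Hierarchical model: every descendant has exactly one ancestor,
# so a single parent reference per record is enough (a tree).
hierarchical = {
    "Faculty": None,              # the root record has no parent
    "Group IN-41": "Faculty",
    "Group IN-72": "Faculty",
    "Student A": "Group IN-41",
}

# Network model: a descendant may be linked to several ancestors,
# so each record keeps a list of parents (a general graph).
network = {
    "Faculty": [],
    "Group IN-41": ["Faculty"],
    "Course Databases": ["Faculty"],
    "Student A": ["Group IN-41", "Course Databases"],
}

def ancestors_hierarchical(record):
    """Walk up the single chain of parents to the root."""
    chain = []
    parent = hierarchical[record]
    while parent is not None:
        chain.append(parent)
        parent = hierarchical[parent]
    return chain

def parents_network(record):
    """Return every direct parent of a record in the network model."""
    return network[record]

print(ancestors_hierarchical("Student A"))  # ['Group IN-41', 'Faculty']
print(parents_network("Student A"))         # ['Group IN-41', 'Course Databases']
```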

Basic concept of a relational database

This model was developed in the 1970s by Dr. Edgar Codd. It organizes data into logically structured tables with fields and describes the data, their relationships with each other, the operations performed on them and, most importantly, the rules that guarantee their integrity. Why is the model called relational? The name comes from the mathematical notion of a relation (from the Latin relatio), which in this model is represented by a table of data. There are many definitions of this type of database. Information in relational tables is much easier to systematize and process than in a network or hierarchical model. How is this done? It is enough to know the features and structure of the model and the properties of relational tables.

The process of modeling and compiling basic elements

To create your own database, use one of the modeling tools: decide what information you need to work with, design the tables and the one-to-one and one-to-many relationships between the data, fill in the entity cells, and set the primary and foreign keys.

Tables are modeled and relational databases are designed with tools such as MySQL Workbench, phpMyAdmin, CASE Studio and dbForge Studio. Once the detailed design is complete, save the finished graphical relational model and translate it into SQL code. At this stage, you can begin sorting, processing and systematizing the data.
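As a rough illustration of that last step, the sketch below turns a hypothetical, hand-written description of two tables into CREATE TABLE statements and runs them in Python's built-in sqlite3 module, which stands in here for a real DBMS. The table and column names are assumptions, and real modeling tools generate richer, DBMS-specific code.

```python
import sqlite3

# Hypothetical model description: table name -> list of (column, type, constraint)
model = {
    "deans_office": [
        ("student_id", "INTEGER", "PRIMARY KEY"),
        ("full_name", "VARCHAR(100)", "NOT NULL"),
        ("group_name", "VARCHAR(10)", "NOT NULL"),
    ],
    "students": [
        ("student_id", "INTEGER", "PRIMARY KEY REFERENCES deans_office(student_id)"),
        ("phone", "VARCHAR(20)", ""),
        ("gpa", "REAL", ""),
    ],
}

def to_ddl(model):
    """Translate the model description into CREATE TABLE statements."""
    statements = []
    for table, columns in model.items():
        cols = ",\n  ".join(
            f"{name} {col_type} {extra}".rstrip() for name, col_type, extra in columns
        )
        statements.append(f"CREATE TABLE {table} (\n  {cols}\n);")
    return statements

ddl = to_ddl(model)
conn = sqlite3.connect(":memory:")
for statement in ddl:
    print(statement)
    conn.execute(statement)  # the generated SQL is accepted by the DBMS

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['deans_office', 'students']
```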

Features, structure and terms associated with the relational model

Different sources name these elements in their own way, so to reduce confusion, here is a short glossary:

relational table = entity;
layout = attributes = field names = entity column headers;
entity instance = tuple = record = table row;
attribute value = entity cell = field.

To move on to the properties of a relational database, you should know what basic components it consists of and what they are intended for.

Entity. A relational database can contain one table or a whole set of tables that describe objects through the data stored in them. Tables have a fixed number of fields and a variable number of records. A table in the relational model is made up of rows, attributes and a layout.
Record. A table holds a variable number of records (rows), each containing data that characterizes the object being described. Records are numbered automatically by the system.
Attribute. Data that describe the columns of an entity.
Field. An entity column. The number of fields is fixed and is set when the table is created or modified.

Now, knowing the constituent elements of the table, you can move on to the properties of the relational database model:

Relational database entities are two-dimensional. Thanks to this property, it is easy to perform various logical and mathematical operations with them.
The order of attribute values and records in a relational table can be arbitrary.
A column within one relational table must have its own individual name.
All data in an entity column has a fixed length and the same type.
Any record is essentially considered one data item.
Every row is unique: a relational entity contains no identical rows.
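Two of these properties can be observed directly in SQL. A minimal sketch using Python's sqlite3 module and a hypothetical study_groups table: a primary key makes the DBMS reject an identical row, and because the stored order of rows is not guaranteed, a query that needs a particular order has to request it with ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE study_groups (code TEXT PRIMARY KEY, size INTEGER NOT NULL)")
conn.executemany("INSERT INTO study_groups VALUES (?, ?)", [("IN-72", 25), ("IN-41", 30)])

# No identical rows: inserting the same key again violates the primary key.
try:
    conn.execute("INSERT INTO study_groups VALUES ('IN-41', 30)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True

# Row order is arbitrary, so ask for an explicit order when it matters.
ordered = [code for (code,) in conn.execute("SELECT code FROM study_groups ORDER BY code")]
print(ordered)  # ['IN-41', 'IN-72']
```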

These properties make it clear that attribute values must be of the same type and length. Let's look at the features of attribute values.

Main characteristics of relational database fields

Field names must be unique within one entity. The type of a relational database attribute (field) describes what category of data is stored in it. A relational database field must have a fixed size, measured in characters. The parameters and format of attribute values determine how the data in them is edited. There is also the concept of a "mask", or "input template", which defines the format in which data is entered into an attribute; if an incorrect value is entered in the field, an error message must be issued. In addition, restrictions can be imposed on fields: conditions that check data entry for accuracy and errors. Some attributes are required and must always be filled in, while others may contain NULL values, meaning that blank data is allowed. Some values are filled in automatically by the system: these are default values. An indexed field is designed to speed up searching for data.
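Most of these field settings correspond to standard SQL column constraints. The sketch below, with a hypothetical students table in Python's sqlite3 module, uses NOT NULL for a required attribute, a nullable column for blank data, DEFAULT for a system-filled value, CHECK as a validation rule and CREATE INDEX for an indexed field. Input masks belong to form-based tools and have no direct counterpart here, and the 0 to 5 GPA range is an assumption. Note that SQLite does not enforce declared VARCHAR lengths, unlike many other DBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        full_name  VARCHAR(100) NOT NULL,                 -- required attribute
        phone      VARCHAR(20),                           -- NULL (blank) allowed
        gpa        REAL CHECK (gpa BETWEEN 0 AND 5),      -- validation rule
        status     VARCHAR(10) NOT NULL DEFAULT 'active'  -- default value
    )
""")
conn.execute("CREATE INDEX idx_students_name ON students (full_name)")  # faster search

conn.execute("INSERT INTO students (student_id, full_name, gpa) VALUES (1, 'Ivanov I. I.', 4.5)")
print(conn.execute("SELECT phone, status FROM students WHERE student_id = 1").fetchone())
# (None, 'active')

# An invalid entry is refused with an error message instead of being stored.
error_message = None
try:
    conn.execute("INSERT INTO students (student_id, full_name, gpa) VALUES (2, 'Petrov P. P.', 7)")
except sqlite3.IntegrityError as err:
    error_message = str(err)
print(error_message)  # wording depends on the SQLite version
```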

Diagram of a two-dimensional relational database table

To understand the model in detail in terms of SQL, it is best to look at a diagram of an example table. We already know what a relational database is, and that each record in a table is one data element. To prevent data redundancy, the tables must be normalized.

Basic rules for normalizing a relational entity

1. Each field of a relational table must hold a single, indivisible (atomic) value, and the table must have no repeating groups of columns (first normal form, 1NF).

2. For a table that is already in 1NF, every non-key column must depend on the whole of the table's unique identifier (the primary key), not on just part of it (second normal form, 2NF).

3. For a table that is already in 2NF, no non-key field may depend on another non-key field; every non-key field depends only on the primary key (third normal form, 3NF).
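A minimal sketch of the third rule on hypothetical data, in plain Python: the curator of a group (an invented column) depends on the group rather than on the student ID, which is a transitive dependency. Moving it into a separate table brings the data to 3NF, and joining the tables back shows that no information was lost.

```python
# Hypothetical table that violates 3NF: "curator" depends on "group",
# not on the key "student_id", so it is repeated for every student in a group.
wide = [
    {"student_id": 1, "full_name": "Ivanov", "group": "IN-41", "curator": "Sidorova"},
    {"student_id": 2, "full_name": "Petrov", "group": "IN-41", "curator": "Sidorova"},
    {"student_id": 3, "full_name": "Smirnov", "group": "IN-72", "curator": "Orlov"},
]

# 3NF decomposition: move the transitively dependent column into its own table.
students = [{k: row[k] for k in ("student_id", "full_name", "group")} for row in wide]
groups = {row["group"]: row["curator"] for row in wide}  # group -> curator, stored once

# The decomposition is lossless: joining the two tables restores the original rows.
rejoined = [{**s, "curator": groups[s["group"]]} for s in students]
print(rejoined == wide)  # True
print(groups)            # {'IN-41': 'Sidorova', 'IN-72': 'Orlov'}
```

Each curator is now stored once, so changing a curator means updating a single entry instead of every student row.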

Databases: relational relationships between tables

There are two main types of relationships between relational tables:

"One-to-many." It occurs when one key record of the first table corresponds to several instances of the second entity. In a diagram, a key icon at one end of the connecting line marks the entity on the "one" side; the other end of the line is often marked with an infinity symbol.

A "many-to-many" relationship forms when several rows of one entity are logically linked with several records of another table.
If a one-to-one relationship exists between two entities, meaning that each key identifier of one table appears exactly once in the other, one of the tables is usually redundant and can be removed. Sometimes, however, programmers deliberately keep the two entities separate for security reasons, so a one-to-one relationship can still exist.
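The two main relationship types can be sketched in SQL as follows, using Python's sqlite3 module and hypothetical tables. The one-to-many link is a foreign key column on the "many" side; the many-to-many link needs a separate junction table whose rows pair keys from both entities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One-to-many: each student belongs to one group; a group has many students.
    CREATE TABLE study_groups (code TEXT PRIMARY KEY);
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        full_name  TEXT NOT NULL,
        group_code TEXT NOT NULL REFERENCES study_groups(code)
    );
    -- Many-to-many: a student takes many courses and a course has many students,
    -- so the pairs are stored in a separate junction table.
    CREATE TABLE courses (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(student_id),
        course_id  INTEGER REFERENCES courses(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO study_groups VALUES ('IN-41');
    INSERT INTO students VALUES (1, 'Ivanov', 'IN-41'), (2, 'Petrov', 'IN-41');
    INSERT INTO courses VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

per_group = conn.execute(
    "SELECT group_code, COUNT(*) FROM students GROUP BY group_code").fetchall()
print(per_group)  # [('IN-41', 2)]

databases_students = conn.execute("""
    SELECT s.full_name
    FROM students AS s
    JOIN enrollments AS e ON e.student_id = s.student_id
    JOIN courses AS c ON c.course_id = e.course_id
    WHERE c.title = 'Databases'
    ORDER BY s.full_name
""").fetchall()
print(databases_students)  # [('Ivanov',), ('Petrov',)]
```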

Keys in a relational database

Primary and foreign keys define the potential relationships in a database. A table in the relational model can have only one primary key, chosen from its candidate (potential) keys. What is it? A primary key is a column or set of columns through which a specific row can be accessed. It must be unique, and its fields cannot contain empty (NULL) values. A primary key that consists of a single attribute is called simple; otherwise it is called composite.
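A composite key can be illustrated with a hypothetical timetable, again in Python's sqlite3 module: no single column identifies a lesson, but the combination of group, weekday and lesson number does. The NOT NULL clauses are spelled out because SQLite, unlike the SQL standard, otherwise accepts NULL in most primary key columns of ordinary tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite key: no single column is unique, but the combination of three is.
conn.execute("""
    CREATE TABLE timetable (
        group_code TEXT    NOT NULL,
        weekday    TEXT    NOT NULL,
        lesson_no  INTEGER NOT NULL,
        subject    TEXT,
        PRIMARY KEY (group_code, weekday, lesson_no)
    )
""")
conn.executemany("INSERT INTO timetable VALUES (?, ?, ?, ?)", [
    ("IN-41", "Mon", 1, "Databases"),
    ("IN-41", "Mon", 2, "Networks"),  # same group and day, different lesson
    ("IN-72", "Mon", 1, "Physics"),   # same day and lesson, different group
])

def rejected(row):
    """Return True if the DBMS refuses the row because it breaks the key rules."""
    try:
        conn.execute("INSERT INTO timetable VALUES (?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True

print(rejected(("IN-41", "Mon", 1, "Math")))  # True: this key combination exists
print(rejected((None, "Tue", 1, "Math")))     # True: a key field is empty
print(rejected(("IN-41", "Tue", 1, "Math")))  # False: a new combination is accepted
```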

In addition to the primary key, there is also a foreign key. Many people don't understand the difference between them, so let's look at an example. There are two tables: "Dean's Office" and "Students". The "Dean's Office" entity contains the fields "Student ID", "Full name" and "Group". The "Students" table has attributes such as "Name", "Group" and "GPA". Since no two students can have the same student ID, this field becomes the primary key. "Full name" and "Group" can be the same for several people, so they cannot identify a student. To link the tables, each record of "Students" refers to a student ID from the "Dean's Office" entity, and the column that stores this reference is the foreign key.

Example relational database model

For clarity, we give a simple example of a relational database model consisting of two entities. There is a table called "Dean's Office".

To create a full-fledged relational database, the tables must be linked. The entry "IN-41", like "IN-72", may appear more than once in the "Dean's Office" table, and in rare cases students' last names, first names and patronymics may coincide, so these fields cannot be made the primary key. Now let's show the "Students" entity.

As we can see, the field types in relational databases differ widely: there are both numeric and character values. Therefore, the attribute settings should specify types such as integer, char, varchar, date and others. In the "Dean's Office" table, the only unique value is the student ID, so this field can serve as the primary key. The "Students" entity (full name, group and phone number) is linked to "Dean's Office" through a column holding the student ID, which acts as a foreign key referencing that primary key. The connection has been established. This is an example of a one-to-one relationship. Hypothetically, one of the tables is redundant, and they could easily be combined into one entity, but to prevent student ID numbers from becoming publicly known, it is entirely reasonable to keep two tables.
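A minimal sketch of the two entities in Python's sqlite3 module. The column names, sample data and phone format are hypothetical. The student ID is the primary key of deans_office and also the key of students, where it is declared as a foreign key; SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE deans_office (
        student_id INTEGER PRIMARY KEY,   -- the only unique value: primary key
        full_name  VARCHAR(100) NOT NULL,
        group_code CHAR(5) NOT NULL
    );
    CREATE TABLE students (
        -- same ID as in deans_office: a foreign key, one row per student (one-to-one)
        student_id INTEGER PRIMARY KEY REFERENCES deans_office(student_id),
        phone      VARCHAR(20),
        admitted   DATE
    );
    INSERT INTO deans_office VALUES (1001, 'Ivanov Ivan Ivanovich', 'IN-41');
    INSERT INTO students VALUES (1001, '555-0101', '2020-09-01');
""")

# A row that points to a non-existent student ID is refused.
try:
    conn.execute("INSERT INTO students VALUES (9999, '555-0199', '2021-09-01')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True

# The key links the two halves of the student's data back together.
row = conn.execute("""
    SELECT d.full_name, d.group_code, s.phone
    FROM deans_office AS d JOIN students AS s ON s.student_id = d.student_id
""").fetchone()
print(row)  # ('Ivanov Ivan Ivanovich', 'IN-41', '555-0101')
```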

In relational databases, data is stored in tables consisting of rows and columns. Each table has its own predefined set of named fields. Columns in relational database tables can contain scalar data of a fixed type, such as numbers, strings, or dates. Tables in a relational database can be related in a one-to-one or one-to-many relationship. The number of rows in a table is unlimited, and each record corresponds to a separate entity.

Relational databases now occupy a dominant position. Hierarchical and network database structures are a thing of the past, having given way to relational databases, on which most modern DBMSs are built (MS SQL Server, MS Access, InterBase, FoxPro, PostgreSQL, Paradox and others).

Details

The relational model organizes data in two-dimensional tables. Each relational table is a two-dimensional array and has the following properties:

Each table element is one data element
Each column has its own unique name
There are no identical rows in the table
All columns in the table are homogeneous, that is, all elements in the column are of the same type
The order of rows and columns can be arbitrary

Relational DBMSs, designed for operational data processing, are less effective at analytical processing than multidimensional databases. This is due, first of all, to the fairly strict restrictions imposed by existing implementations of the SQL language. One real-life example of such a restriction is the assumption that data in a relational database is unordered (or, more precisely, randomly ordered), so putting it in order requires extra time for sorting every time the database is accessed. In analytical systems, data is entered and retrieved in large batches, and once it enters the database it remains unchanged for a long time. In this case it is more efficient to store data in partially denormalized tables, which can hold not only detailed values but also pre-calculated aggregates to increase performance. For navigation and retrieval, specialized addressing and indexing methods based on the assumption that the data changes and moves little can be used. This way of organizing data is sometimes called pre-computed, which emphasizes its difference from the normalized relational approach, where various kinds of results are calculated dynamically (aggregation) and details from different tables are linked at query time (join operations).
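The contrast between dynamic and pre-computed aggregation can be sketched in a few lines of Python with sqlite3. The sales table is hypothetical; the summary table is calculated once and would have to be rebuilt whenever the detailed data changes, which is acceptable only because analytical data changes rarely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, sale_date DATE, amount REAL);
    INSERT INTO sales VALUES
        ('North', '2024-01-05', 100), ('North', '2024-01-20', 50),
        ('South', '2024-01-07', 70),  ('South', '2024-02-02', 30);
""")

# Normalized approach: the aggregate is computed dynamically on every query.
dynamic = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region").fetchall()

# Pre-computed approach: the aggregate is calculated once, when the data is
# loaded, and stored in a denormalized summary table that is simply read later.
conn.execute("""
    CREATE TABLE sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region
""")
precomputed = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region").fetchall()

print(dynamic)                 # [('North', 150.0), ('South', 100.0)]
print(precomputed == dynamic)  # True
```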

Main disadvantages

Besides the low efficiency mentioned earlier, another disadvantage of traditional relational DBMSs is that the main, and often the only, mechanism for quickly finding and selecting individual rows in a table (or in tables linked through foreign keys) is usually some variant of B-tree-based indexes. This solution is effective only when processing small groups of records under a high intensity of data modification.
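SQLite, used here through Python's sqlite3 module, is one DBMS whose indexes are B-trees, and its EXPLAIN QUERY PLAN command shows whether a query scans the whole table or searches an index. The subscribers table is hypothetical, and the exact wording of the plan text differs between SQLite versions, so the commented outputs are only examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (phone TEXT, full_name TEXT)")
conn.executemany(
    "INSERT INTO subscribers VALUES (?, ?)",
    [(f"555-{i:04d}", f"Subscriber {i}") for i in range(1000)],
)

def plan(sql):
    """Return the 'detail' column (the last one) of EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT full_name FROM subscribers WHERE phone = '555-0042'"
before = plan(query)  # without an index every row has to be read

# With a B-tree index on the searched column, one row is found by
# descending the tree instead of scanning the whole table.
conn.execute("CREATE INDEX idx_subscribers_phone ON subscribers (phone)")
after = plan(query)

print(before)  # e.g. ['SCAN subscribers']
print(after)   # e.g. ['SEARCH subscribers USING INDEX idx_subscribers_phone (phone=?)']
```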

Relational database management systems may never go away, but their days of dominance are certainly numbered, says Paul Krill in an InfoWorld article published in September 2011. He quotes analyst Robin Bloor, who argues that the architecture of relational DBMSs is obsolete because it was created in a bygone era and does not meet modern requirements.

Relational DBMSs still dominate financial transaction processing systems, but companies are increasingly using DBMSs with the new NoSQL architecture: horizontally scalable, distributed and open source. Examples of such systems are Hadoop, MapReduce and VoltDB. According to Forrester analysts, about 75% of enterprise data is either semi-structured (XML, email and EDI) or unstructured (text, images, audio and video); only 5% of this data is stored in relational DBMSs, while the rest is kept in other types of databases or as files and is not processed by relational systems.

According to Bloor, relational DBMSs "may die without anyone noticing", for example if Oracle simply replaces the SQL engine in its DBMS with a NoSQL one. The analyst believes that one of the existing columnar DBMSs could become such an engine.

Relational databases store information in several "flat" (two-dimensional) tables linked through shared data fields called keys. They make it easier to produce reports on the fly (typically via SQL) and improve data reliability and integrity by eliminating redundant information.

Everyone knows what a simple database is: telephone directories, product catalogs and dictionaries are all databases. They may be structured or otherwise organized: as flat files, as hierarchical or network structures, or as relational tables. Most often, organizations use relational databases to store information.

A database is a collection of tables made up of columns and rows, similar to a spreadsheet. Each row contains one entry; each column contains all instances of a particular piece of data across all rows. For example, a typical telephone directory consists of columns containing telephone numbers, subscriber names and subscriber addresses, and each row contains one number, name and address. This simple form is called a flat file because of its two-dimensional nature and the fact that all data is stored in a single file.

Ideally, each database has at least one column with a unique identifier, or key. Consider a phone book. There may be several entries for a subscriber named John Smith, but no phone number is repeated, so the phone number serves as the key.

In reality, things are not so simple. Two or more people who share the same phone number can be listed separately in the directory, so the phone number appears in two or more places and the key values are no longer unique.
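Whether a column can serve as a key can be checked mechanically: no value (or combination of values) may occur twice. A small Python sketch on hypothetical phone-book rows; such a check only describes the current data and cannot prove that a key will stay unique in the future.

```python
from collections import Counter

# Hypothetical phone-book rows: (phone number, subscriber name, address)
phone_book = [
    ("555-0101", "John Smith", "1 Oak St"),
    ("555-0102", "John Smith", "9 Elm St"),
    ("555-0103", "Mary Jones", "5 Pine St"),
    ("555-0103", "Peter Jones", "5 Pine St"),  # two people share one number
]

def is_candidate_key(rows, columns):
    """A set of columns can act as a key only if no two rows share its values."""
    counts = Counter(tuple(row[i] for i in columns) for row in rows)
    return all(n == 1 for n in counts.values())

print(is_candidate_key(phone_book, [1]))     # False: the name repeats
print(is_candidate_key(phone_book, [0]))     # False: a shared phone number
print(is_candidate_key(phone_book, [0, 1]))  # True: number + name is unique here
```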

Data creates problems

In the simplest databases, each record occupies one row, so the telephone company has to create a separate column for each piece of account information: one for the second subscriber of a "paired" phone, another for the third, and so on, depending on how many additional subscribers are needed.

This means that every record in the database must have all of these additional columns, even where they are not used. It also means that the database must be reorganized whenever the company offers a new service. When tone dialing is introduced, the database structure changes because a new column is added. When caller ID, call waiting and so on are introduced, the database is rebuilt again and again.

In the 1960s, only the largest companies could afford computers to manage their data. Moreover, databases built with static data models and procedural programming languages such as Cobol were expensive to maintain and not always reliable. Procedural languages define the sequence of steps a computer must go through to complete a task. Programming such sequences was difficult, especially when the database structure had to be changed or a new kind of report was needed.

Powerful connections

Edgar Codd, a researcher at IBM's San Jose Research Laboratory, essentially created and described the concept of relational databases in his seminal paper "A Relational Model of Data for Large Shared Data Banks" (Communications of the ACM, June 1970).

Codd proposed a model that allows developers to partition their databases into separate but related tables, which improves performance while maintaining the same appearance as the original database. Since then, Codd has been considered the founding father of the relational database industry.

The model works as follows. The telephone company can create a master table that uses the telephone number as the primary key and stores other basic customer information. It can define a separate table with columns for this primary key and for additional services such as caller ID and call waiting. It can also create another table to track call billing, where each entry consists of a phone number and call charge data.

End users can easily get the information they want, the way they want it, even though the data is stored in different tables. Therefore, a telephone company customer service representative can display information about a subscriber's billing, as well as the status of special services or when the last payment was received, on the same screen.
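A minimal sketch of the telephone company's design in Python's sqlite3 module. The table and column names are hypothetical. Services are stored as rows rather than columns, so introducing a new service adds a row and leaves the structure unchanged, and ordinary queries gather billing and service data for one subscriber from separate tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Master table: one row per line, keyed by the phone number.
    CREATE TABLE customers (phone_number TEXT PRIMARY KEY, name TEXT, address TEXT);
    -- Services: one row per (line, service) instead of one column per service.
    CREATE TABLE services (
        phone_number TEXT REFERENCES customers(phone_number),
        service      TEXT,
        PRIMARY KEY (phone_number, service)
    );
    -- Billing: one row per charge.
    CREATE TABLE charges (
        phone_number TEXT REFERENCES customers(phone_number),
        charged_on   DATE,
        amount       REAL
    );
    INSERT INTO customers VALUES ('555-0101', 'John Smith', '1 Oak St');
    INSERT INTO services VALUES ('555-0101', 'caller ID'), ('555-0101', 'call waiting');
    INSERT INTO charges VALUES ('555-0101', '2024-03-01', 12.5), ('555-0101', '2024-04-01', 9.0);
""")

# A new service needs a new row, not a new column: the structure stays the same.
conn.execute("INSERT INTO services VALUES ('555-0101', 'tone dialing')")

# Data for the "single screen" comes from three tables linked by the phone number.
name, total = conn.execute("""
    SELECT c.name, SUM(ch.amount)
    FROM customers AS c JOIN charges AS ch ON ch.phone_number = c.phone_number
    WHERE c.phone_number = '555-0101'
    GROUP BY c.phone_number
""").fetchone()
service_list = [s for (s,) in conn.execute(
    "SELECT service FROM services WHERE phone_number = '555-0101' ORDER BY service")]

print(name, total)   # John Smith 21.5
print(service_list)  # ['call waiting', 'caller ID', 'tone dialing']
```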

Codd formulated 12 rules for relational databases, most of which concern data integrity, updating, and access. The first two are fairly clear even to non-technical users.

Rule 1, the information rule, specifies that all information in a relational database is represented as a set of values stored in tables.

Rule 2, the Access Guarantee Rule, specifies that every data element in a relational database can be accessed using a table name, a primary key, and a column name. In other words, all data is stored in tables, and if you know the table name, primary key and column where the required data item is located, it can always be retrieved.

The essence of Codd's work was the proposal to use declarative rather than procedural programming languages with relational databases. Declarative languages such as SQL (Structured Query Language) let users essentially tell the computer, "I want the following pieces of data from all records that meet a certain set of criteria." The computer itself works out what steps are needed to obtain this information from the database.
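The contrast can be sketched in Python: a loop stands in for a procedural language such as Cobol and spells out each step, while the SQL query (run with the built-in sqlite3 module) only states which rows are wanted. The account data is hypothetical.

```python
import sqlite3

rows = [("John Smith", "555-0101", 120.0), ("Mary Jones", "555-0103", 35.5),
        ("Peter Brown", "555-0107", 80.0)]

# Procedural style: the program spells out every step of the search.
procedural = []
for name, phone, balance in rows:
    if balance > 50:
        procedural.append(name)
procedural.sort()

# Declarative style: state which data is wanted; the DBMS chooses the steps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, phone TEXT, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
declarative = [name for (name,) in conn.execute(
    "SELECT name FROM accounts WHERE balance > 50 ORDER BY name")]

print(procedural == declarative, declarative)  # True ['John Smith', 'Peter Brown']
```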

Large, actively used databases are managed with relational database management systems from established vendors such as Oracle, Sybase, IBM, Informix and Microsoft.

Although most SQL implementations are interoperable only to a certain approximation, SQL, approved as an international standard, makes it possible to build complex database-driven systems. An easy-to-program interface between websites and relational databases lets end users add new records, update existing ones and generate reports for a variety of services, such as online trading and online library catalogs.

Relational model

A relational database uses a set of tables related to each other through a specific key (in this case, the PhoneNumber field)

A database (DB) is a collection of information about objects, processes, events or phenomena related to a certain subject area, topic or task, organized in accordance with certain rules and maintained in computer memory. It is organized in such a way as to provide the information needs of users, as well as convenient storage of this collection of data, both as a whole and any part of it.

A relational database is a set of interconnected tables, each of which contains information about objects of a certain type. Each row of the table contains data about one object (for example, a car, a computer, a client), and the columns of the table contain various characteristics of these objects - attributes (for example, engine number, processor brand, telephone numbers of companies or clients).

The rows of a table are called records. All table records have the same structure - they consist of fields (data elements) in which object attributes are stored (Fig. 1). Each record field contains one characteristic of the object and represents a specified data type (for example, text string, number, date). A primary key is used to identify records. A primary key is a set of table fields whose combination of values uniquely identifies each record in the table.

Fig. 1. Names of objects in the table

Database management systems (DBMS) are used to work with data. Main functions of the DBMS:

Data definition (description of database structure);

Data processing;

Data management.

Developing the database structure is the most important task in database design. The structure of a database (the set, form and relationships of its tables) is one of the main design decisions when creating database applications. The structure created by the developer is described in the DBMS's data definition language.

Any DBMS allows you to perform the following operations with data:

Adding records to tables;

Removing records from a table;

Updating the values of some fields in one or more records in database tables;

Searching for one or more records that meet a specified condition.
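The four operations map onto the SQL statements INSERT, DELETE, UPDATE and SELECT. A minimal sketch with a hypothetical clients table in Python's sqlite3 module; the ? placeholders pass values separately from the query text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

# Adding records to a table
conn.executemany("INSERT INTO clients (name, phone) VALUES (?, ?)",
                 [("John Smith", "555-0101"), ("Mary Jones", "555-0103"), ("Peter Brown", None)])
# Updating a field value in the records that match a condition
conn.execute("UPDATE clients SET phone = ? WHERE name = ?", ("555-0109", "Peter Brown"))
# Removing records from a table
conn.execute("DELETE FROM clients WHERE name = ?", ("Mary Jones",))
# Searching for records that meet a specified condition
found = conn.execute(
    "SELECT name, phone FROM clients WHERE phone LIKE '555-%' ORDER BY name").fetchall()
conn.commit()

print(found)  # [('John Smith', '555-0101'), ('Peter Brown', '555-0109')]
```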

To perform these operations, a query mechanism is used. The result of executing queries is either a set of records selected according to certain criteria, or changes in tables. Queries to the database are formed in a language specially created for this purpose, which is called “structured query language” (SQL - Structured Query Language).

Data management typically refers to protecting data from unauthorized access, supporting multi-user data processing, and ensuring data integrity and consistency.
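Integrity and consistency are usually protected with constraints and transactions. In this sketch (Python's sqlite3 module, hypothetical accounts table), a transfer consists of two updates run as one transaction; if the second update breaks the CHECK constraint, both are rolled back, so the data is never left half-changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        phone   TEXT PRIMARY KEY,
        balance REAL NOT NULL CHECK (balance >= 0)  -- integrity rule
    );
    INSERT INTO accounts VALUES ('555-0101', 100), ('555-0103', 20);
""")

def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE phone = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE phone = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

first = transfer(conn, "555-0101", "555-0103", 30)    # succeeds
second = transfer(conn, "555-0103", "555-0101", 500)  # would make a balance negative
balances = conn.execute("SELECT phone, balance FROM accounts ORDER BY phone").fetchall()

print(first, second)  # True False
print(balances)       # [('555-0101', 70.0), ('555-0103', 50.0)]
```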
