Infological data model entity-relationship.


Infological model of databases "Entity-relationship" Basic concepts

The purpose of information modeling is to provide the most natural ways for humans to collect and present the information that is supposed to be stored in the database being created data. Therefore, they are trying to build an infological data model by analogy with natural language (the latter cannot be used in its pure form due to the complexity computer processing texts and ambiguity of any natural language). The main constructive elements of information models are entities, connections between them and their properties (attributes).

An entity is any distinguishable object (an object that we can distinguish from another), information about which must be stored in a database. Entities can be people, places, planes, flights, taste, color, etc. It is necessary to distinguish between concepts such as entity type and entity instance. The concept of entity type refers to a set of homogeneous individuals, objects, events or ideas acting as a whole. An entity instance refers to a specific thing in a set. For example, the entity type can be CITY, and the instance can be Moscow, Kyiv, etc.

An attribute is a named characteristic of an entity. Its name must be unique for a specific entity type, but can be the same for various types entities (for example, COLOR can be defined for many entities: DOG, CAR, SMOKE, etc.). Attributes are used to define what information should be collected about an entity. Examples of attributes for the CAR entity are TYPE, MAKE, LICENSE PLATE, COLOR, etc. Here too there is a distinction between type and instance. The COLOR attribute type has many instances or values:

Red, Blue, Banana, White Night, etc.,

However, each entity instance is assigned only one attribute value.

There is no absolute difference between entity types and attributes. An attribute is such only in relation to the entity type. In another context, an attribute may act as an independent entity. For example, for an automobile plant, color is only an attribute of the production product, but for a paint and varnish factory, color is a type of entity.

A key is a minimum set of attributes whose values ​​can be used to uniquely find the required instance of an entity. Minimality means that excluding any attribute from the set does not allow the entity to be identified by the remaining ones. For the Schedule entity (clause 1.2), the key is the Flight_number attribute or the set of: Departure_point, Departure_time and Destination_point (provided that one plane flies from point to point at each time).

A relationship is an association of two or more entities. If the purpose of the database was only to store individual, unrelated data, then its structure could be very simple. However, one of the main requirements for organizing a database is to ensure the possibility of finding some entities by the values ​​of others, for which it is necessary to establish between them certain connections. And since real databases often contain hundreds or even thousands of entities, theoretically more than a million connections can be established between them. The presence of such a multitude of connections determines the complexity of information models.

Characteristics of connections and modeling language

When building information models, you can use the language of ER diagrams (from the English Entity-Relationship, i.e. entity-relationship). In them, entities are depicted as marked rectangles, associations as marked diamonds or hexagons, attributes as marked ovals, and connections between them as non-directional edges, above which the degree of connection (1 or a letter replacing the word “many”) and the necessary explanation can be indicated.

Between two entities, for example, A and B, four types of connections are possible.

First type– ONE-TO-ONE relationship (1:1): at each moment of time, each representative (instance) of entity A corresponds to 1 or 0 representatives of entity B:

A student may not “earn” a scholarship, receive a regular scholarship, or receive one of the enhanced scholarships.

Second type– ONE-TO-MANY relationship (1:M): one representative of entity A corresponds to 0, 1 or several representatives of entity B.

The apartment may be empty; one or more residents may live in it.

Since connections in both directions are possible between two entities, there are two more types of relationships: MANY-TO-ONE (M:1) and MANY-TO-MANY (M:N).

Example 2.1. If the connection between the entities of MAN and WOMAN is called MARRIAGE, then there are four possible representations of such a connection:

The nature of connections between entities is not limited to those listed. There are also more complex connections:

Many relationships between the same entities

(a patient, having one attending physician, may also have several consulting physicians; a physician may be the attending physician of several patients and may simultaneously consult several other patients);

Training connections

(a doctor can order more than one patient for more than one test, a test can be ordered by more than one doctor for more than one patient, and a patient can be ordered for more than one test by more than one doctor);

Connections of higher orders, the semantics (meaning) of which is sometimes very complex.

In the examples given, to improve the illustrative nature of the relationships under consideration, the attributes of entities and associations in all ER diagrams are not shown. Thus, entering only a few basic attributes into the description of marriage ties will significantly complicate the ER diagram (Fig. 2.1a). In this regard, the language of ER diagrams is used to build small models and illustrate individual fragments of large ones. More often, a less visual but more meaningful information modeling language (IML) is used, in which entities and associations are represented by sentences of the form:

ENTITY (attribute 1, attribute 2, ..., attribute n) ASSOCIATION [ENTITY S1, ENTITY S2, ...] (attribute 1, attribute 2, ..., attribute n)

where S is the degree of connection, and the attributes included in the key must be marked with an underscore.

Thus, the above example of a set of connections between entities can be described in NAM as follows:

Doctor (Doctor_number, Last name, First name, Patronymic, Specialty)Patient (Registration_number, Bed number, Last name, First name, Patronymic, Address, Date of birth, Gender)Attending_doctor [Doctor 1, Patient M] (Doctor_number, Registration_number)Consultant [Doctor M, Patient N] (Doctor_number, Registration_number).

Infological data model Entity-relationship. The purpose of information modeling is to provide the most natural ways for humans to collect and present the information that is supposed to be stored in the database being created.

Therefore, they are trying to build an infological data model by analogy with natural language; the latter cannot be used in its pure form due to the complexity of computer text processing and the ambiguity of any natural language. The main constructive elements of information models are entities, connections between them and their properties and attributes. An entity is any distinguishable object, an object that we can distinguish from another, information about which must be stored in a database.

Entities can be people, places, planes, flights, taste, color, etc. It is necessary to distinguish between concepts such as entity type and entity instance. The concept of entity type refers to a set of homogeneous individuals, objects, events or ideas acting as a whole.

An entity instance refers to a specific thing in a set. For example, the entity type could be CITY, and the instance could be Moscow. An attribute is a named characteristic of an entity. Its name must be unique for a particular entity type, but can be the same for different entity types for example, COLOR can be defined for many entities DOG, CAR, SMOKE, etc. Attributes are used to define what information should be collected about the entity.

There is no absolute difference between entity types and attributes. An attribute is such only in relation to the entity type. In another context, an attribute may act as an independent entity. For example, for an automobile plant, color is only an attribute of the production product, but for a paint and varnish factory, color is an entity type. A key is a minimum set of attributes whose values ​​can be used to uniquely find the required instance of an entity.

Minimality means that excluding any attribute from the set does not allow the entity to be identified by the remaining ones. Relationship - association of two or more entities. If the purpose of the database was only to store individual, unrelated data, then its structure could be very simple. However, one of the main requirements for organizing a database is to ensure the ability to find some entities by the values ​​of others, for which it is necessary to establish certain connections between them.

And since real databases often contain hundreds or even thousands of entities, theoretically more than a million connections can be established between them. The presence of such a multitude of connections determines the complexity of information models. 1.2.4

End of work -

This topic belongs to the section:

Structure of the working Internet network

Traditionally, information is stored on paper. At the same time, it is difficult to quickly select the necessary data when applying for a job. The issue of reliability of storage and confidentiality of personal data about employees at the enterprise is also important.

If you need additional material on this topic, or you did not find what you were looking for, we recommend using the search in our database of works:

What will we do with the received material:

If this material was useful to you, you can save it to your page on social networks:

All topics in this section:

Analysis of the principles of PC-based workstations
Analysis of the principles of PC-based workstations. Automated workplace AWP, or, in foreign terminology, work station work-station, represents the place of the user-specialist of one or another

Classification of workstations
Classification of automated workstations. Workplaces can be individual, group, or collective. In relation to group and collective workstations for the purpose of effective functioning of the computer system - team specialists

Database concept
Database concept. Active efforts to find acceptable ways to socialize the continuously growing volume of information led to the creation of special software programs in the early 60s.

DBMS architecture
DBMS architecture. The DBMS must provide access to data to any users, including those who have little or no idea about the physical location of the data in memory.

Relational data structure
Relational structure data. At the end of the 60s, works appeared in which the possibilities of using various tabular datalogical data models were discussed, i.e. Possibility of use for

Basic Concepts
Basic concepts. The main idea behind this technology is to minimize the amount of data transmitted over the network, since the main time losses and failures occur precisely due to insufficiently high

Advantages of client-server architecture
Advantages of client-server architecture. At first, the question may arise: why such difficulties? Here are some thoughts in favor of this approach. The undoubted advantage is approx.

Components of Client-Server Architecture
Components of Client-Server architecture. There are three main software components of the client-server end-user software architecture. middleware. Server software. To the final software

Purpose and principle of operation of SQL
Purpose and principle SQL work. SQL is often pronounced as a sequel, an abbreviation for Structured Query Language to symbolize the Structured Query Language. SQL is a tool

Advantages of the SQL language
Advantages of the SQL language. The query language provides undoubted advantages. Firstly, it continues the ideology of the Client-Server architecture. The client part of the application prepares a request for processing information

"Basic Concepts

The purpose of information modeling is to provide the most natural ways for humans to collect and present the information that is supposed to be stored in the database being created. Therefore, they are trying to build an infological data model by analogy with natural language (the latter cannot be used in its pure form due to the complexity of computer text processing and the ambiguity of any natural language). The main constructive elements of information models are entities, connections between them and their properties (attributes).

An entity is any distinguishable object (an object that we can distinguish from another), information about which must be stored in a database. Entities can be people, places, planes, flights, taste, color, etc. It is necessary to distinguish between concepts such as entity type and entity instance. The concept of an entity type refers to a set of homogeneous individuals, objects, events or ideas acting as a whole. An entity instance refers to a specific thing in a set. For example, the entity type can be CITY, and the instance can be Moscow, Kyiv, etc.

An attribute is a named characteristic of an entity. Its name must be unique for a specific entity type, but can be the same for different entity types (for example, COLOR can be defined for many entities: DOG, CAR, SMOKE, etc.). Attributes are used to define what information should be collected about an entity. Examples of attributes for the CAR entity are TYPE, MAKE, LICENSE PLATE, COLOR, etc. Here too there is a distinction between type and instance. The COLOR attribute type has many instances or values:

Red, Blue, Banana, White Night, etc.,

However, each entity instance is assigned only one attribute value.

There is no absolute difference between entity types and attributes. An attribute is such only in relation to the entity type. In another context, an attribute may act as an independent entity. For example, for an automobile plant, color is only an attribute of the production product, but for a paint and varnish factory, color is a type of entity.

A key is a minimum set of attributes whose values ​​can be used to uniquely find the required instance of an entity. Minimality means that excluding any attribute from the set does not allow the entity to be identified by the remaining ones. For the Schedule entity (clause 1.2), the key is the Flight_number attribute or the set of: Departure_point, Departure_time and Destination_point (provided that one plane flies from point to point at each time).

A relationship is an association of two or more entities. If the purpose of the database was only to store individual, unrelated data, then its structure could be very simple. However, one of the main requirements for organizing a database is to ensure the ability to find some entities by the values ​​of others, for which it is necessary to establish certain connections between them. And since real databases often contain hundreds or even thousands of entities, theoretically more than a million connections can be established between them. The presence of such a multitude of connections determines the complexity of information models.

Characteristics of connections and modeling language

When building information models, you can use the language of ER diagrams (from the English Entity-Relationship, i.e. entity-relationship). In them, entities are depicted as marked rectangles, associations as marked diamonds or hexagons, attributes as marked ovals, and connections between them as non-directional edges, above which the degree of connection (1 or a letter replacing the word “many”) and the necessary explanation can be indicated.

Between two entities, for example, A and B, four types of connections are possible.

First type– ONE-TO-ONE relationship (1:1): at each moment of time, each representative (instance) of entity A corresponds to 1 or 0 representatives of entity B:

A student may not “earn” a scholarship, receive a regular scholarship, or receive one of the enhanced scholarships.

Second type– ONE-TO-MANY relationship (1:M): one representative of entity A corresponds to 0, 1 or several representatives of entity B.

The apartment may be empty; one or more residents may live in it.

Since connections in both directions are possible between two entities, there are two more types of relationships: MANY-TO-ONE (M:1) and MANY-TO-MANY (M:N).

Example 2.1. If the connection between the entities MAN and WOMAN is called MARRIAGE, then there are four possible representations of such a connection:

The nature of connections between entities is not limited to those listed. There are also more complex connections:

many connections between the same entities

(a patient, having one attending physician, may also have several consulting physicians; a physician may be the attending physician of several patients and may simultaneously consult several other patients);

Training connections

ENTITY (attribute 1, attribute 2, ..., attribute n) ASSOCIATION [ENTITY S1, ENTITY S2, ...] (attribute 1, attribute 2, ..., attribute n)

where S is the degree of connection, and the attributes included in the key must be marked with an underscore.

Thus, the above example of a set of connections between entities can be described in NAM as follows:

Doctor (Doctor_number, Last name, First name, Patronymic, Specialty)Patient (Registration_number, Bed number, Last name, First name, Patronymic, Address, Date of birth, Gender)Attending_doctor [Doctor 1, Patient M] (Doctor_number, Registration_number)Consultant [Doctor M, Patient N] (Doctor_number, Registration_number).

Rice. 2.1. Examples of ER diagrams

To identify relationships between entities, it is necessary, at a minimum, to define the entities themselves. But it is not simple task, since in different subject areas the same object can be an entity, attribute or association. Let us illustrate this statement with examples related to the description of marital ties (see example 2.1).

Example 2.2. The Civil Registration Office (ZAGS) does not deal with all people, but only with those who have applied to register a marriage, birth or death. Therefore, in countries where only traditional marriages are allowed, civil registry offices can post information about registered marriages in a single entity:

Marriage (Certificate_number, Husband's last name, Husband's first name, Husband's patronymic, Husband's date of birth, Wife's last name, ... , Registration_date, Place of registration, ...),

The ER diagram of which is shown in Fig. 2.1, b.

Example 2.3. Now consider a situation where the civil registry office is located in a country that allows polygamy. If you use the “Marriage” entity of Example 2.2 to register marriages, then information about husbands who have several wives will be duplicated (see Table 2.1).

Table 2.1

Certificate number Husband's last name ... Wife's last name ... Registration date
1-YUB 154745 Petukhov ... Kurochkina ... 06/03/1991
1-YUB 163489 Petukhov ... Pestrushkina ... 11/08/1991
1-UB 169887 Petukhov ... Ryabova ... 12/12/1992
1-UB 169878 Seleznev ... Utochkina ... 12/12/1992
1-YUB 154746 Parasyuk ... Svinyushkina ... 06/03/1991
1-UB 169879 Parasyuk ... Khavroniya ... 12/12/1992
... ... ... ... ... ...

Duplication can be eliminated by creating an additional entity "Husbands"

Husbands (Code_M, Last name, First name, Patronymic, Date of birth, Place of birth)

and replacing the “Marriage” entity with a characteristic (see clause 2.3) with reference to the corresponding description in the “Husbands” entity.

Marriage (Certificate number, Code_M, Wife's last name, ..., Registration date, ...) (Husbands).

The ER diagram of the connection between these entities is shown in Fig. 2.1,c, and an example of their copies is in table. 2.2 and 2.3.

Table 2.2

Table 2.3

Certificate number Code_M Wife's last name Wife's name Registration date ...
1-YUB 154745 111 Kurochkina Augustine 06/03/1991 ...
1-YUB 163489 111 Pestrushkina Mariana 11/08/1991 ...
1-UB 169877 111 Ryabova Milan 12/12/1992 ...
1-UB 169878 112 Utochkina Veronica 12/12/1992 ...
1-YUB 154746 113 Svinyushkina Elvira 06/03/1991 ...
1_UB 169879 113 Khavroniya Rufina 12/12/1992 ...
... ... ... ... ... ...

Example 2.4. Finally, consider the case when an organization needed data about the presence of married couples in it, and there was already an entity to store information about employees

Employees (Personnel_number, Last name, First name, ...).

The use of the “Marriage” entity discussed in example 2.2 is inappropriate: “Employees” already contains the surnames, first names, and patronymics of the spouses. Therefore, let's create an association

Marriage [Employee 1, Employee 1] (Personnel_number of husband, Personnel_number of wife, ...),

connecting certain instances of the “Employees” entity (Fig. 2.1,d).

In conclusion, we note that the ER diagram Fig. 2.1a describes the structure of placing data on marriages in the civil registry offices of countries that allow group marriages, and the ER diagrams of example 2.1 describe any types of marriages in organizations where there are entities “men” and “women”, including single and unmarried people.

What is “connection”? In ER diagrams, this is a line connecting geometric shapes depicting entities, attributes, associations and other information objects. In the text, this term is used to indicate the interdependence of entities. If this interdependence has attributes, then it is called an association.

Entity classification

The time has come to understand the terminology. K. Date defines three main classes of entities: core, associative and characteristic, as well as a subclass of associative entities - designations.

A core entity (pivot) is an independent entity (it will be defined in more detail below).

In the examples discussed earlier, the rods are “Student”, “Apartment”, “Men”, “Doctor”, “Marriage” (from example 2.2) and others, the names of which are placed in rectangles.

An associative entity (association) is a “many-to-many” (“-to-many”, etc.) relationship between two or more entities or instances of an entity (as in example 2.4). Associations are treated as full-fledged entities:

they can participate in other associations and designations just like core entities;

may have properties, i.e. have not only a set of key attributes necessary to indicate relationships, but also any number of other attributes that characterize the relationship. For example, the associations "Marriage" from examples 2.1 and 2.4 contain key attributes“Code_M”, “Code_Zh” and “Husband’s personnel number”, “Wife’s personnel number”, as well as clarifying attributes “Certificate number”, “Registration date”, “Registration_place”, “Registration number in the registry office book”, etc.

A characteristic entity (characteristic) is a many-to-one or one-to-one relationship between two entities ( special case associations). The only purpose of the characteristic within the framework of the considered subject area consists in describing or clarifying some other entity. The need for them arises due to the fact that entities of the real world sometimes have multi-valued properties. A husband can have several wives (example 2.3), a book can have several characteristics of a reprint (corrected, expanded, revised, ...), etc.

The existence of a characteristic depends entirely on the entity being characterized: women lose their status as wives if their husband dies.

To describe the characteristic, a new JIM proposal is used, which in general has the form:

CHARACTERISTIC (attribute 1, attribute 2, ...) (LIST OF CHARACTERIZED ENTITIES).

Let's also expand the language of ER diagrams by introducing a trapezoid to represent the characteristics (Fig. 2.2).

Rice. 2.2. Elements of the extended ER diagram language

A designating entity or designation is a many-to-one or one-to-one relationship between two entities and differs from a characteristic in that it does not depend on the designated entity.

Let's consider an example related to the enrollment of employees in various departments of the organization.

In the absence of strict rules (an employee can be simultaneously enrolled in several departments or not be enrolled in any department), it is necessary to create a description with the association Enrollment:

Departments (Department Number, Department Name, ...)Employees (Personnel Number, Last Name, ...)Enrollment [Departments M, Employees N] (Department Number, Personnel Number, Enrollment Date).

However, provided that each employee must be enrolled in one of the departments, you can create a description with the designation Employees:

Departments (Department number, Department name, ...)Employees (Personnel number, Last name, ..., Department number, Enrollment date)[Departments]

IN in this example employees have an independent existence (if a department is deleted, it does not follow that the employees of that department must also be deleted). Therefore, they cannot be characteristics of departments and are called designations.

Notations are used to store repeating values ​​of large text attributes: “codifiers” of disciplines studied by students, names of organizations and their departments, lists of goods, etc.

The description of a designation differs externally from the description of a characteristic only in that the designated entities are not braces, and in square ones:

DESIGNATION (attribute 1, attribute 2, ...) [LIST OF DESIGNATED ENTITIES].

Typically, designations are not treated as full entities, although this would not lead to any error.

Designations and characteristics are not completely independent entities, since they presuppose the existence of some other entity that will be “designated” or “characterized.” However, they still represent special cases of essence and can, of course, have properties, can participate in associations, designations and have their own (more low level) characteristics. We also emphasize that all instances of a characteristic must be associated with some instance of the characterized entity. However, it is allowed that some instances of the characterized entity do not have relationships. True, if this concerns marriages, then the essence of “Husbands” should be replaced by the essence of “Men” (there is no husband without a wife).

Let us now redefine the core entity as an entity that is neither an association, nor a designation, nor a characteristic. Such entities have independent existence, although they can designate other entities, such as employees designate departments.

In conclusion, let us consider an example of constructing an information model of the “Nutrition” database, where information about dishes (Fig. 2.3), their daily consumption, the products from which these dishes are prepared, and the suppliers of these products should be stored. The information will be used by the chef and manager of a small catering establishment, as well as its visitors.

Rice. 2.3. Example of a recipe

With the help of these users, the following objects and characteristics of the designed base were identified:

Dishes that require data included in their culinary recipes to describe: dish number (for example, from a book of culinary recipes), name of the dish, type of dish (appetizer, soup, main course, etc.), recipe (technology for preparing the dish), yield (portion weight), name, calorie content and weight of each product included in the dish. For each product supplier: name, address, name of product supplied, delivery date and price at the time of delivery. Daily food consumption (consumption): dish, number of servings, date.

Analysis of objects allows us to highlight:

Rods Dishes, Products and Cities; associations Composition (links Dishes with Products) and

Supplies (links Suppliers with Products);

Designation Suppliers; characteristics Recipes and Consumption.

The ER diagram of the model is shown in Fig. 2.4. and the model in the YAM language has the following form:

Dishes (BL, Dish, Type)Products (PR, Product, Calorie content)Suppliers (POS, City, Supplier) [City]Composition [Dishes M, Products N] (BL, PR, Weight (g))Supplies [Suppliers M, Products N] (POS, PR, Date_P, Price, Weight (kg))Cities (City, Country) Recipes (BL, Recipe) (Dishes)Consumption (BL, Date_P, Portions) (Dishes)

In these models, Dish, Product and Supplier are names, and BL, PR and POS are digital codes dishes, products and organizations that supply these products.

Rice. 2.4. Infological model of the "Nutrition" database

About primary and foreign keys

Recall that a key or possible key is a minimum set of attributes whose values ​​can be used to uniquely find the required instance of an entity. Minimality means that excluding any attribute from the set does not allow the entity to be identified by the remaining ones. Every entity has at least one possible key. One of them is taken as the primary key. When choosing primary key preference should be given to non-composite keys or keys composed of a minimum number of attributes. It is also inadvisable to use keys with long text values(it is preferable to use integer attributes). So, to identify a student you can use either unique number grade book, or a set of last name, first name, patronymic, group number and maybe additional attributes, since it is possible that two students (and more often female students) with the same last names, first names and patronymics will appear in the group. It is also bad to use as a key not the number of the dish, but its name, for example, “Appetizer of processed cheese “Friendship” with ham and pickled cucumber” or “Hare in sour cream with potato croquettes and red cabbage salad.”

The primary key of a core entity (any attribute participating in the primary key) is not allowed to have an undefined value. Otherwise, a contradictory situation will arise: a non-individual, and therefore non-existent, instance of the core essence will appear. For the same reasons, it is necessary to ensure the uniqueness of the primary key.

Now about foreign keys:

If entity C links entities A and B, then it must include foreign keys corresponding to the primary keys of entities A and B. If entity B refers to entity A, then it must include a foreign key corresponding to the primary key of entity A.

In paragraph 2.3, an example was considered where “Employees” denoted “Departments” and included a foreign key “Department Number” corresponding to the primary key of the “Departments” entity.

The relationship between primary and foreign keys of entities is illustrated in Fig. 2.5.

Rice. 2.5. Structures: a - associations; b - designations (characteristics)

Here, to designate any of the associated entities (cores, characteristics, designations or even associations), a new generalizing term “Goal” or “Target Entity” is used.

Thus, when considering the problem of choosing a way to represent associations and notations in a database, the main question that needs to be answered is: “What are foreign keys?” And then, for each foreign key, three questions need to be resolved:

1. Can this foreign key accept undefined values ​​(NULL values)? In other words, can there be some instance of an entity of this type, for which the target entity indicated by the foreign key is unknown? In the case of supplies, this is probably not possible - a supply from an unknown supplier or a supply of an unknown product does not make sense. But in the case of employees, such a situation could however make sense - it is quite possible that any employee in this moment not enrolled in any department at all. Note that the answer to this question does not depend on the whim of the database designer, but is determined by the actual course of action adopted in that part of the real world that is to be represented in the database in question. Similar remarks are relevant to the issues discussed below.

2. What should happen when you try to DELETE a target entity that is referenced by a foreign key? For example, when deleting a supplier who has implemented at least one delivery. There are three possibilities:

3. What should happen when you try to UPDATE the primary key of a target entity that is referenced by some foreign key? For example, an attempt may be made to update the number of a supplier for which there is at least one corresponding delivery. For clarity, we will again consider this case in more detail. You have the same three options as when deleting:

CASCADES The update operation is "cascaded" to also update the foreign key in that provider's supplies.
LIMITED. The primary keys of only those suppliers who have not yet made deliveries are updated. Otherwise, the update operation is rejected.
INSTALLED For all deliveries of that supplier, the NULL value of the foreign key is set to undefined, and then the supplier's primary key is updated. This feature, of course, is not applicable if the foreign key must not contain NULL values.

Thus, for each foreign key in a design, the database designer must specify not only the field or combination of fields that makes up that foreign key and the target table that is identified by that key, but also the answers to the questions above (the three constraints that apply to this foreign key).

Finally, about characteristics - denoting entities, the existence of which depends on the type of denoted entities. The designation is represented by a foreign key in the table corresponding to that characteristic. But the three foreign key restrictions discussed above for this case should be specified as follows:

NULL values ​​are not allowed DELETE FROM (target) CASCADED UPDATE (target primary key) CASCADED

The specified specifications represent dependence on the existence of characteristic entities.

Integrity Constraints

Integrity (from the English integrity - intactness, inviolability, safety, integrity) is understood as the correctness of data at any time. But this goal can only be achieved within certain limits: the DBMS cannot control the correctness of every single value entered into the database (although each value can be checked for plausibility). For example, it cannot be discovered that the input value 5 (representing the day of the week) should actually be 3. On the other hand, the value 9 would clearly be an error and should be rejected by the DBMS. However, to do this, she should be told that the numbers of the days of the week must belong to the set (1,2,3,4,5,6,7).

Maintaining database integrity can be thought of as protecting data from unauthorized changes or destruction (not to be confused with unauthorized changes and destruction, which are a security issue). Modern DBMS have a number of means to ensure that integrity is maintained (as well as means to ensure that security is maintained).

There are three groups of integrity rules:

Entity integrity. Referential integrity. User-defined integrity.

In section 2.4, the motivation for two integrity rules that are common to any relational databases was considered.

Any attribute participating in a primary key is not allowed to have an undefined value. The value of the foreign key must either: be equal to the value of the target's primary key; be completely uncertain, i.e. each attribute value involved in foreign key must be undefined. For any specific database, there are a number of additional specific rules that apply to it alone and are determined by the developer. Most often controlled:

the uniqueness of certain attributes,
range of values ​​(exam score from 2 to 5),
belonging to a set of values ​​(gender "M" or "F").

About building an information model

A reader who has become acquainted only with the material in this and the previous chapters will not be able to correctly perceive and evaluate those tips and recommendations for building a good information model, which have been developed over decades by the largest specialists in the field of data processing. To do this, you need to at least study the following materials. Ideally, it is necessary for the reader to first implement at least one information system project and propose it real users and have been a database and application administrator long enough to recognize at least a small fraction of the problems that arise from poorly thought out designs. The experience of the author and all the information systems specialists he knows shows that any theoretical recommendations are taken seriously only after several unsuccessful attempts to revive poorly designed systems. (Although there are also designers who continue to believe that they can revive a dying project by changing programs, rather than by changing the database infological model.)

Indeed, to determine the list and structure of stored data, it is necessary to collect information about real and potential applications, as well as about database users, and when building an information model, you should only care about the reliability of storing this data, completely forgetting about the applications and users for which the database is being created data

This is due to the completely different requirements for the database of application programmers and the database administrator. The former would like to have in one place (for example, in one table) all the data they need to implement a query from application program or from the terminal. The latter take care of eliminating possible distortions of stored data when entering new information into the database and updating or deleting existing information. To do this, they remove duplicates and unwanted functional relationships between attributes from the database, breaking the database into many small tables (see section 4.6). Since many years of global experience in using information systems built on the basis of databases shows that design flaws cannot be eliminated by any tricks in application programs, experienced designers do not allow themselves to meet application programmers halfway (even when they themselves are such).

Clearly distinguish between such concepts as requesting data and maintaining data (entering, changing and deleting); remember that, as a rule, the database is information basis not one, but several applications, some of which will appear in the future; a bad database design cannot be corrected by any (even the most sophisticated) applications. LITERATURE Atre Sh. Structural approach to organizing databases. – M.: Finance and Statistics, 1983. – 320 p. Boyko V.V., Savinkov V.M. Design of information systems databases. – M.: Finance and Statistics, 1989. – 351 p. Date K. Guide to relational DBMS DB2. – M.: Finance, 1990. – 386 p. Hubbard J. Automated database design. – M.: Mir, 1984. – 294 p. Tsikritisis D., Lochowski F. Data models. – M.: Finance and Statistics, 1985. – 344 p.

Basic Concepts

The purpose of information modeling is to provide the most natural ways for humans to collect and present the information that is supposed to be stored in the database being created. Therefore, they are trying to build an infological data model by analogy with natural language (the latter cannot be used in its pure form due to the complexity of computer text processing and the ambiguity of any natural language). The main constructive elements of information models are entities, connections between them and their properties (attributes).Essence– any distinguishable object (an object that we can distinguish from another), information about which must be stored in a database. Entities can be people, places, planes, flights, taste, color, etc. It is necessary to distinguish between concepts such as entity type And entity instance. The concept of entity type refers to a set of homogeneous individuals, objects, events or ideas acting as a whole. An entity instance refers to a specific thing in a set. For example, the entity type can be CITY, and the instance can be Moscow, Kyiv, etc.Attribute– a named characteristic of an entity. Its name must be unique for a particular entity type, but can be the same for different entity types (for example, COLOR can be defined for many entities: DOG, CAR, SMOKE, etc.). Attributes are used to define what information should be collected about an entity. Examples of attributes for the CAR entity are TYPE, MAKE, LICENSE PLATE, COLOR, etc. Here too there is a distinction between type and instance. The COLOR attribute type has many instances or values:

Red, Blue, Banana, White Night, etc.,

however, each entity instance is assigned only one attribute value.

There is no absolute difference between entity types and attributes. An attribute is such only in relation to the entity type. In another context, an attribute may act as an independent entity. For example, for an automobile plant, color is only an attribute of the production product, but for a paint and varnish factory, color is a type of entity.Key– a minimum set of attributes whose values ​​can be used to uniquely find the required instance of an entity. Minimality means that excluding any attribute from the set does not allow the entity to be identified by the remaining ones. For the Schedule entity (clause 1.2), the key is the Flight_number attribute or the set of: Departure_point, Departure_time and Destination_point (provided that one plane flies from point to point at each time).Connection– association of two or more entities. If the purpose of the database was only to store individual, unrelated data, then its structure could be very simple. However, one of the main requirements for organizing a database is to ensure the ability to find some entities by the values ​​of others, for which it is necessary to establish certain connections between them. And since real databases often contain hundreds or even thousands of entities, theoretically more than a million connections can be established between them. The presence of such a multitude of connections determines the complexity of information models.

Characteristics of connections and modeling language

When building information models, you can use the language ER diagrams(from the English Entity-Relationship, i.e. entity-relationship). In them, entities are depicted as marked rectangles, associations as marked diamonds or hexagons, attributes as marked ovals, and connections between them as non-directional edges, above which the degree of connection (1 or a letter replacing the word “many”) and the necessary explanation can be indicated.Between two entities, for example, A and B, four types of connections are possible.First type– ONE-TO-ONE relationship (1:1): at each moment of time, each representative (instance) of entity A corresponds to 1 or 0 representatives of entity B: A student may not “earn” a scholarship, receive a regular scholarship, or receive one of the enhanced scholarships.Second type– ONE-TO-MANY relationship (1:M): one representative of entity A corresponds to 0, 1 or several representatives of entity B. The apartment may be empty; one or more residents may live in it.Since connections in both directions are possible between two entities, there are two more types of relationships: MANY-TO-ONE (M:1) and MANY-TO-MANY (M:N).Example 2.1. If the connection between the entities of MAN and WOMAN is called MARRIAGE, then there are four possible representations of such a connection: The nature of connections between entities is not limited to those listed. There are also more complex connections:
    many connections between the same entities
(a patient, having one attending physician, may also have several consulting physicians; a physician may be the attending physician of several patients and may simultaneously consult several other patients);
    training connections
(a doctor can order more than one patient for more than one test, a test can be ordered by more than one doctor for more than one patient, and a patient can be ordered for more than one test by more than one doctor);
    connections of higher orders, the semantics (meaning) of which is sometimes very complex.
In the examples given, to improve the illustrative nature of the relationships under consideration, the attributes of entities and associations in all ER diagrams are not shown. Thus, entering only a few basic attributes into the description of marriage ties will significantly complicate the ER diagram (Fig. 2.1a). In this regard, the language of ER diagrams is used to build small models and illustrate individual fragments of large ones. Less visual, but more meaningful, is more often used. information modeling language(YIM), in which entities and associations are represented by sentences of the form:ENTITY (attribute 1, attribute 2, ..., attribute n)ASSOCIATION [ENTITY S1, ENTITY S2, ...] (attribute 1, attribute 2, ..., attribute n)where S is the degree of connection, and the attributes included in the key must be marked with an underscore.Thus, the above example of a set of connections between entities can be described in NAM as follows:Doctor (Doctor_number , Last name, First name, Patronymic, Specialty)Patient (Registration number , Bed number, Last name, Name, Patronymic, Address, Date of Birth, Gender)Attending_doctor [Doctor 1, Patient M] ( Doctor_number , Registration number ) Consultant [Doctor M, Patient N] ( Doctor_number , Registration number ).

Rice. 2.1. Examples of ER diagrams

To identify relationships between entities, it is necessary, at a minimum, to define the entities themselves. But this is not a simple task, since in different subject areas the same object can be an entity, attribute or association. Let us illustrate this statement with examples related to the description of marital ties (see example 2.1).Example 2.2. The Civil Registration Office (ZAGS) does not deal with all people, but only with those who have applied to register a marriage, birth or death. Therefore, in countries where only traditional marriages are allowed, civil registry offices can post information about registered marriages in a single entity:Marriage (Certificate number , Husband's last name, Husband's first name, Middle_name of husband, Date of birth_of husband, Last name of wife, ... , Registration_date, Registration_place, ...),The ER diagram of which is shown in Fig. 2.1, b.Example 2.3. Now consider a situation where the civil registry office is located in a country that allows polygamy. If you use the “Marriage” entity of Example 2.2 to register marriages, then information about husbands who have several wives will be duplicated (see Table 2.1).

Table 2.1

Certificate number

Duplication can be eliminated by creating an additional entity "Husbands"Husbands (Code_M , Last name, First name, Patronymic, Date of birth, Place of birth)and replacing the “Marriage” entity with a characteristic (see clause 2.3) with reference to the corresponding description in the “Husbands” entity.Marriage (Certificate number , Code_M , Wife's last name, ..., Date of registration, ... (Husbands).The ER diagram of the connection between these entities is shown in Fig. 2.1,c, and an example of their copies is in table. 2.2 and 2.3.

Table 2.2

Code_M

Table 2.3

Certificate number

Example 2.4. Finally, consider the case when an organization needed data about the presence of married couples in it, and there was already an entity to store information about employeesEmployees (Personnel Number , Last name, First name, ...).Using the “Marriage” entity discussed in example 2.2 is inappropriate: “Employees” already contains the surnames, first names, and patronymics of the spouses. Therefore, let's create an associationMarriage [Employee 1, Employee 1] (Husband_personnel_number, Wife_personnel_number, ...),connecting certain instances of the “Employees” entity (Fig. 2.1,d).In conclusion, we note that the ER diagram Fig. 2.1a describes the structure of placing data on marriages in the registry offices of countries that allow group marriages, and the ER diagrams of example 2.1 describe any types of marriages in organizations where there are entities “men” and “women”, including single and unmarried people.What is “connection”? In ER diagrams, this is the line connecting geometric figures, depicting entities, attributes, associations and other information objects. In the text, this term is used to indicate the interdependence of entities. If this interdependence has attributes, then it is called an association.

Entity classification

The time has come to understand the terminology. K. Date defines three main classes of entities: rod, associative And characteristic, as well as a subclass of associative entities – designations. Core Essence (kernel) is an independent entity (it will be defined in more detail below).In the examples discussed earlier, the rods are “Student”, “Apartment”, “Men”, “Doctor”, “Marriage” (from example 2.2) and others, the names of which are placed in rectangles.Associative Entity (association) is a many-to-many relationship ("-to-many", etc.) between two or more entities or instances of an entity (as in Example 2.4). Associations are treated as full-fledged entities:they can participate in other associations and designations just like core entities;may have properties, i.e. have not only a set of key attributes necessary to indicate relationships, but also any number of other attributes that characterize the relationship. For example, the “Marriage” associations from examples 2.1 and 2.4 contain key attributes “Code_M”, “Code_Zh” and “Husband’s Personnel Number”, “Wife’s Personnel Number”, as well as clarifying attributes “Certificate Number”, “Registration Date”, “Place of Registration ", "Entry number in the registry office book", etc.Characteristic entity (characteristic) is a many-to-one or one-to-one relationship between two entities (a special case of association). The sole purpose of a characteristic within the subject area under consideration is to describe or clarify some other entity. The need for them arises due to the fact that entities of the real world sometimes have multi-valued properties. A husband can have several wives (example 2.3), a book can have several characteristics of a reprint (corrected, expanded, revised, ...), etc.The existence of a characteristic depends entirely on the entity being characterized: women lose their status as wives if their husband dies.To describe the characteristic, a new JIM proposal is used, which in general has the form:CHARACTERISTIC (attribute 1, attribute 2, ...) (LIST OF CHARACTERIZED ENTITIES).Let's also expand the language of ER diagrams by introducing a trapezoid to represent the characteristics (Fig. 2.2).

Rice. 2.2. Elements of the extended ER diagram language

Designating entity or designation is a “many-to-one” or “one-to-one” relationship between two entities and differs from a characteristic in that it does not depend on the designated entity.Let's consider an example related to the enrollment of employees in various departments of the organization.In the absence of strict rules (an employee can be simultaneously enrolled in several departments or not be enrolled in any department), it is necessary to create a description with the association Enrollment:Employees (Personnel number, Last name, ...)Enrollment [Departments M, Employees N] (Department number, Personnel number, Date of enrollment).However, provided that each employee must be enrolled in one of the departments, you can create a description with the designation Employees:Departments (Department number, Department name, ...)Employees (Personnel number, Last name, ..., Department number, Date of enrollment)[Departments]In this example, the employees have an independent existence (if a department is deleted, it does not follow that the employees of that department should also be deleted). Therefore, they cannot be characteristics of departments and are called designations.Notations are used to store repeating values ​​of large text attributes: “codifiers” of disciplines studied by students, names of organizations and their departments, lists of goods, etc.The description of a designation differs externally from the description of a characteristic only in that the designated entities are enclosed not in curly brackets, but in square brackets:NOTE (attribute 1, attribute 2, ...)[LIST DESIGNATED ENTITIES].Typically, designations are not treated as full entities, although this would not lead to any error.Designations and characteristics are not completely independent entities, since they presuppose the existence of some other entity that will be “designated” or “characterized.” However, they still represent special cases of an entity and can, of course, have properties, can participate in associations, designations and have their own (lower level) characteristics. We also emphasize that all instances of a characteristic must be associated with some instance of the characterized entity. However, it is allowed that some instances of the characterized entity do not have relationships. True, if this concerns marriages, then the essence of “Husbands” should be replaced by the essence of “Men” (there is no husband without a wife).Let us now redefine the core entity as an entity that is neither an association, nor a designation, nor a characteristic. Such entities have independent existence, although they can designate other entities, such as employees designate departments.In conclusion, let us consider an example of constructing an information model of the “Nutrition” database, where information about dishes (Fig. 2.3), their daily consumption, the products from which these dishes are prepared, and the suppliers of these products should be stored. The information will be used by the chef and manager of a small catering establishment, as well as its visitors.1. Lobio in Georgian
Add salt, chopped peeled beans, chopped onions, pepper and simmer in oil with a small amount of broth; add cilantro, parsley, regan (basil) and cook until done. Then bake in the oven.
Green beans (fresh or canned) 200,
Green onions 40, Butter 30, Greens 10.
Yield 210. Calories 725.

Rice. 2.3. Example of a recipe

With the help of these users, the following objects and characteristics of the designed base were identified:
    Dishes that require data included in their culinary recipes to describe them: dish number (for example, from a book culinary recipes), name of the dish, type of dish (appetizer, soup, main course, etc.), recipe (technology for preparing the dish), yield (portion weight), name, calorie content and weight of each product included in the dish. For each product supplier: name, address, name of product supplied, delivery date and price at the time of delivery. Daily food consumption (consumption): dish, number of servings, date.
Analysis of objects allows us to highlight:
    stems Dishes, Products and Cities; associations Composition (links Dishes with Products) and
Supplies (links Suppliers with Products);
    designation Suppliers; characteristics Recipes and Consumption.
The ER diagram of the model is shown in Fig. 2.4. and the model in the YAM language has the following form:Dishes (BL, Dish, View)Products (PR, Product, Calorie content)Suppliers (POS, City, Supplier) [City]Composition [Dishes M, Products N] (BL, PR, Weight (g))Supplies [Suppliers M, Products N] (POS, PR, Date_P, Price, Weight (kg))Cities (City, Country)Recipes (BL, Recipe) (Dishes)Consumption (BL, Date_R, Portions) (Dishes)In these models, Dish, Product and Supplier are names, and BL, PR and POS are digital codes of dishes, products and organizations supplying these products.

Rice. 2.4. Infological model of the "Nutrition" database

About primary and foreign keys

Let us recall that key or possible clue is a minimum set of attributes whose values ​​can be used to unambiguously find the required instance of an entity. Minimality means that excluding any attribute from the set does not allow the entity to be identified by the remaining ones. Every entity has at least one possible key. One of them is taken as primary key. When choosing a primary key, you should give preference to non-composite keys or keys made up of a minimum number of attributes. It is also inappropriate to use keys with long text values ​​(integer attributes are preferable). Thus, to identify a student, you can use either a unique record book number, or a set of last name, first name, patronymic, group number and possibly additional attributes, since it is possible that two students (and more often female students) with the same last names, first names and patronymics. It is also bad to use as a key not the number of the dish, but its name, for example, “Appetizer of processed cheese “Friendship” with ham and pickled cucumber” or “Hare in sour cream with potato croquettes and red cabbage salad.”The primary key of a core entity (any attribute participating in the primary key) is not allowed to have an undefined value. Otherwise, a contradictory situation will arise: a non-individual, and therefore non-existent, instance of the core essence will appear. For the same reasons, it is necessary to ensure uniqueness primary key. Now about foreign keys:
    If entity C links entities A and B, then it must include foreign keys corresponding to the primary keys of entities A and B. If entity B refers to entity A, then it must include a foreign key corresponding to entity A's primary key.
In paragraph 2.3, an example was considered where “Employees” denoted “Departments” and included a foreign key “Department Number” corresponding to the primary key of the “Departments” entity.The relationship between primary and foreign keys of entities is illustrated in Fig. 2.5.

Rice. 2.5. Structures: a - associations; b - designations (characteristics)

Here, to designate any of the associated entities (cores, characteristics, designations or even associations), a new generalizing term “Goal” or “Target Entity” is used.Thus, when considering the problem of choosing how to represent associations and notations in a database, the main question that needs to be answered is: “What are foreign keys?” And then, for each foreign key, three questions need to be resolved:1. Can this foreign key accept undefined values ​​(NULL values)? In other words, can there be some instance of an entity of a given type for which the target entity indicated by the foreign key is unknown? In the case of supplies, this is probably not possible - a supply from an unknown supplier or a supply of an unknown product does not make sense. But in the case of employees, such a situation could however make sense - it is quite possible that any employee is not currently enrolled in any department at all. Note that the answer to this question does not depend on the whim of the database designer, but is determined by the actual course of action adopted in that part of the real world that is to be represented in the database in question. Similar remarks are relevant to the issues discussed below.2. What should happen when you try to DELETE a target entity that is referenced by a foreign key? For example, when deleting a supplier who has made at least one delivery. There are three possibilities: CASCADES 3. What should happen when you try to UPDATE the primary key of a target entity that is referenced by some foreign key? For example, an attempt may be made to update the number of a supplier for which there is at least one corresponding delivery. For clarity, we will again consider this case in more detail. You have the same three options as when deleting: CASCADES Thus, for each foreign key in a design, the database designer must specify not only the field or combination of fields that makes up that foreign key and the target table that is identified by that key, but also the answers to the questions above (the three constraints that apply to this foreign key).Finally, about characteristics - denoting entities, the existence of which depends on the type of denoted entities. The designation is represented by a foreign key in the table corresponding to that characteristic. But the three foreign key constraints discussed above for this case should be specified as follows:NULL values ​​are not allowedREMOVAL FROM (target) CASCADESUPDATE (target primary key) CASCADESThe specified specifications represent dependence on the existence of characteristic entities.

Integrity Constraints

Integrity(from the English integrity - intactness, inviolability, safety, integrity) - is understood as the correctness of data at any time. But this goal can only be achieved through within certain limits: The DBMS cannot control the correctness of every single value entered into the database (although each value can be checked for plausibility). For example, it cannot be discovered that the input value 5 (representing the day of the week) should actually be 3. On the other hand, the value 9 would clearly be an error and should be rejected by the DBMS. However, to do this, she should be told that the numbers of the days of the week must belong to the set (1,2,3,4,5,6,7).Maintaining database integrity can be thought of as protecting data from unauthorized changes or destruction (not to be confused with unauthorized changes and destruction, which are a security issue). Modern DBMSs have a number of means to ensure that integrity is maintained (as well as means to ensure that security is maintained).There are three groups of integrity rules:
    Entity integrity. Referential integrity. User-defined integrity.
In section 2.4, the motivation for two integrity rules that are common to any relational databases was considered.
    Any attribute participating in a primary key is not allowed to have an undefined value. The foreign key value must either:
      be equal to the value of the target's primary key; be completely uncertain, i.e. Each attribute value participating in a foreign key must be null.
    For any specific database, there are a number of additional specific rules that apply to it alone and are determined by the developer. Most often controlled:
the uniqueness of certain attributes,
range of values ​​(exam score from 2 to 5),
belonging to a set of values ​​(gender "M" or "F").

About building an information model

A reader who has become acquainted only with the material in this and the previous chapters will not be able to correctly perceive and evaluate those tips and recommendations for building a good information model, which have been developed over decades by the largest specialists in the field of data processing. To do this, you need to at least study the following materials. Ideally, it is necessary for the reader to first implement at least one information system project, offer it to real users, and be a database and application administrator long enough to realize at least a small fraction of the problems that arise from an insufficiently thought-out design. The experience of the author and all the information systems specialists he knows shows that any theoretical recommendations are taken seriously only after several unsuccessful attempts to revive poorly designed systems. (Although there are also designers who continue to believe that they can revive a dying project by changing programs, rather than by changing the database infological model.)The main difficulty in perceiving the recommendations given in the fourth chapter and Appendix B is purely psychological.Indeed, to determine the list and structure of stored data, it is necessary to collect information about real and potential applications, as well as about database users, and when building an information model, you should only care about the reliability of storing this data, completely forgetting about the applications and users for which the database is being created data.This is due to the completely different requirements for the database of application programmers and the database administrator. The former would like to have in one place (for example, in one table) all the data they need to implement a request from an application program or from a terminal. The latter take care of eliminating possible distortions of stored data when entering new information into the database and updating or deleting existing information. To do this, they remove duplicates and unwanted functional relationships between attributes from the database, breaking the database into many small tables (see section 4.6). Since many years of global experience in using information systems built on the basis of databases shows that design flaws cannot be eliminated by any tricks in application programs, experienced designers do not allow themselves to meet application programmers halfway (even when they themselves are such).And although the author recognizes that most people prefer to learn from their own mistakes, he still once again advises inexperienced database designers:
    clearly distinguish between such concepts as requesting data and maintaining data (entering, changing and deleting); remember that, as a rule, a database is the information basis of not one, but several applications, some of which will appear in the future; a bad database design cannot be corrected by any (even the most sophisticated) applications.

LITERATURE

    Atre S. Structural approach to organizing databases. – M.: Finance and Statistics, 1983. – 320 p. Boyko V.V., Savinkov V.M. Design of information systems databases. – M.: Finance and Statistics, 1989. – 351 p. Date K. Guide to the DB2 relational DBMS. – M.: Finance and Statistics, 1988. – 320 p. Jackson G. Designing relational databases for use with microcomputers. -M.: Mir, 1991. – 252 p. Kirillov V.V. Structured Query Language (SQL). – St. Petersburg: ITMO, 1994. – 80 p. Martin J. Planning the development of automated systems. – M.: Finance and Statistics, 1984. – 196 p. Meyer M. Theory of relational databases. – M.: Mir, 1987. – 608 p. Tiori T., Fry J. Design of database structures. In 2 books, - M.: Mir, 1985. Book. 1. – 287 pp.: Book. 2. – 320 s. Ullman J. Databases in Pascal. – M.: Mechanical Engineering, 1990. – 386 p. Hubbard J. Automated database design. – M.: Mir, 1984. – 294 p. Tsikritisis D., Lochowski F. Data models. – M.: Finance and Statistics, 1985. – 344 p.

Infological model (information-logical model)- a human-oriented and independent of the DBMS type domain model that defines sets of information objects, their attributes and relationships between objects, the dynamics of changes in the subject domain, as well as the nature of the information needs of users. The information model of the subject area can be described by the “entity-relationship” model (Chen’s model), which is based on the division of the real world into separate distinguishable entities that are in certain connections with each other, and both categories - essence and connection are considered primary, indefinite concepts .

The purpose of information modeling

  • providing the most natural ways for humans to collect and present the information that is supposed to be stored in the created database. Therefore, they are trying to build an infological data model by analogy with natural language (the latter cannot be used in its pure form due to the complexity of computer text processing and the ambiguity of any natural language). The main constructive elements of information models are entities, connections between them and their properties (attributes).

Basic Concepts

  • Essence– any distinguishable object (an object that we can distinguish from another), information about which must be stored in a database. Entities can be people, places, planes, flights, taste, color, etc. It is necessary to distinguish between concepts such as entity type and entity instance. The concept of entity type refers to a set of homogeneous individuals, objects, events or ideas acting as a whole. An entity instance refers to a specific thing in a set. For example, the entity type can be CITY, and the instance can be Moscow, Kyiv, etc.
  • Attribute– a named characteristic of an entity. Its name must be unique for a particular entity type, but can be the same for different entity types (for example, COLOR can be defined for many entities: DOG, CAR, SMOKE, etc.). Attributes are used to define what information should be collected about an entity. Examples of attributes for the CAR entity are TYPE, MAKE, LICENSE PLATE, COLOR, etc. Here too there is a distinction between type and instance. The COLOR attribute type has many instances or values: Red, Blue, Banana, White Night, etc., but each entity instance is assigned only one attribute value.

There is no absolute difference between entity types and attributes. An attribute is such only in relation to the entity type. In another context, an attribute may act as an independent entity. For example, for an automobile plant, color is only an attribute of the production product, but for a paint and varnish factory, color is a type of entity.

  • Key– a minimum set of attributes whose values ​​can be used to uniquely find the required instance of an entity. Minimality means that excluding any attribute from the set does not allow the entity to be identified by the remaining ones. For the Schedule entity, the key is the Flight_number attribute or the set of: Departure_point, Departure_time and Destination_point (provided that one plane flies from point to point at any given time).
  • Connection– association of two or more entities. If the purpose of the database was only to store individual, unrelated data, then its structure could be very simple. However, one of the main requirements for organizing a database is to ensure the ability to find some entities by the values ​​of others, for which it is necessary to establish certain connections between them. And since real databases often contain hundreds or even thousands of entities, theoretically more than a million connections can be established between them. The presence of such a multitude of connections determines the complexity of information models.

Requirements for the information model

  • Adequate mapping of the subject area
  • Avoiding ambiguous interpretation of the model
  • Clear definition of the modeled subject area (finiteness of the model)
  • Easy extensibility, ensuring the entry of new data without changing previously defined ones, the same applies to deleting data
  • Possibility of composition and decomposition of the model due to the large dimension of real information models
  • Easy to understand by different categories of users; It is desirable that the information model be built (or at least participated in its creation) by a specialist working in a given subject area, and not just by a designer of computer data processing systems
  • Applicability of the model specification language in both manual and computer-aided design information systems

Components of an information model

  • Description of objects and relationships between them, called the ER-model (stands for the “Entity-Relationship” model)
  • Description of user information needs
  • Algorithmic attribute relationships
  • Linguistic relations determined by the peculiarities of representing the subject area in the linguistic environment
  • Integrity Constraints

Building the "Object - Property - Relationship" model

Object classes

In the subject area, in the process of its examination and analysis, they distinguish object classes. An object class is a collection of objects that have the same set of properties. For example, if we consider a university as a subject area, then the following classes of objects can be distinguished: students, teachers, audiences, etc. Objects can be real, as mentioned above, or they can be abstract, such as objects, which students study.

When reflected in information system each object is represented by its identifier, which distinguishes one object of a class from another, and each class of objects is represented by the name of that class. So, for objects of the class “SUBJECTS STUDYED”, the identifier of each object will be “NAME OF THE SUBJECT”. The ID must be unique.

Each object has a specific set of properties. For objects of the same class, the set of these properties is the same, but their values, naturally, may differ. For example, for objects of the “STUDENT” class, such a set of properties describing objects of the class may be “YEAR OF BIRTH”, “GENDER”, etc.

When describing a subject area, it is necessary to depict each of the existing classes of objects and the set of properties fixed for objects of this class.

We will use the following notation to display objects and their properties.

Each object class in an information model is assigned a unique name. The name of an object class is a grammatical phrase of a noun (a noun that can have adjectives and prepositions). If the name consists of several words, then it is desirable that the noun come first. The noun must be used in the singular, not the plural. Therefore, for the class of objects “STUDYED DISCIPLINES” discussed above, it is better to give the name “DISCIPLINE STUDYED”. If in a subject area different names are traditionally used to designate any class of objects (i.e. there is synonymy), then all of them should be recorded when describing the system, then one of them is chosen as the main one, and only this should be used in the future at ILM. In addition to the name of the object class, the ILM can use its short code designation.

When constructing an information model, it is advisable to give a verbal interpretation of each entity, especially if an ambiguous interpretation of the concept is possible.

Relationships between an object and its properties

When describing a subject area, it is necessary to reflect the connections between the object and the properties that characterize it. This is depicted simply as a line connecting the designation of an object and its properties.

The relationship between an object and its property can be different. An object can have only one value for a property. For example, each person can only have one date of birth. Let's call these properties single. For other properties, it is possible for one object to have several values ​​at the same time. Let, for example, when describing an “EMPLOYEE”, the “Foreign LANGUAGE” that he speaks is recorded as his property. Since an employee can know several foreign languages, we will call this property multiple. When depicting the connection between an object and its properties, we will use a single arrow for single properties, and a double arrow for multiple properties.

In addition, some properties are permanent; their value cannot change over time. Let's call these properties static, and those properties whose value can change over time will be called dynamic.

Another characteristic of the connection between an object and its property is a sign of whether this property is present in all objects of a given class or absent in some objects. For example, for individual employees the property “ASCHARGE DEGREE” may exist, but other objects of this class may not have the specified property. Let us call such properties conditional.

When depicting the connection between a conditional property and an object, we will use a dotted line, and to indicate dynamic and static properties we will use the letters D and S above the corresponding line.

Sometimes it is useful to introduce the concept of a “composite property” in an infological model. Examples of such properties are “ADDRESS”, consisting of “CITY”, “STREET”, “HOUSE” and “APARTMENT”, and “BIRTH DATE”, consisting of “DAY”, “MONTH” and “YEAR”. In the ILM, to denote a composite property, we use a square, from which emanate lines connecting it with the designations of its constituent elements.

Relationships between objects

In addition to the connection between an object and its properties, the infological model captures connections between objects of different classes. There are connections of the following types:

  • “one to one” (1:1): at each moment of time, each representative (instance) of entity A corresponds to 1 or 0 representatives of entity B:
A student may not “earn” a scholarship, receive a regular scholarship, or receive one of the enhanced scholarships.
  • “one-to-many” (1:M): one representative of entity A corresponds to 0, 1 or several representatives of entity B.
The apartment may be empty; one or more residents may live in it.
  • "many to one" (M:1)

Sometimes these types of connections are called the degree of connection. In addition to the degree of connection in the infological model, to characterize the connection between different entities, it is necessary to indicate the so-called “membership class”, which shows whether there may be no connection between an object of a given class and any object of another class. An entity's membership class must be either required or optional.

Let's explain what was said on specific examples. As mentioned above, the infological model is not built for separate object, but displays classes of objects and relationships between them. A corresponding diagram that displays this is called an ER-type diagram (this name is due to the fact that in English the word “entity” is written “Entity”, and relationship is “Relationship”). However, sometimes, in addition to ER-type diagrams, ER-instance diagrams are used.

Let's assume that the infological model displays the connection between two classes of objects: “PERSONALITY” and “Foreign LANGUAGE”. -

Let's assume that the domain is a factory where some employees know a foreign language, but none of them speaks more than one language. Naturally, there are many languages ​​that none of the employees speak, and also that some of the employees speak the same foreign language.

Let us further assume that the subject area is an institute, and the PERSON object represents applicants entering this institute. Each of the applicants must be proficient in some foreign language, but no one speaks more than one language.

In both the first and second cases considered, the relation M:1 is observed between the entities. In the diagram, this is displayed from the side of the “PERSONALITY” object by a double arrow, and from the side of the “Foreign LANGUAGE” object by a single arrow on a line depicting the connection between these entities.

The difference in the situations under consideration is that in the first case, the membership class is optional for both entities, and in the second case, for the “PERSONALITY” entity, the membership class is mandatory. In the diagram, this is represented by a dot in the rectangle corresponding to the “PERSONALITY” object.

Let the subject area be the same as in the previous case, but there are situations where some applicants know several foreign languages. In this case, the connection between objects will be of type M: M.

Let us assume that the subject area is a certain linguistic institute, in which each of the employees necessarily knows several foreign languages, and for each of the languages ​​known to science in this institute there is at least one specialist who speaks it.

In this case, the relationship between the entities will be M:M, and the membership class of both entities is mandatory.

Simple and complex objects

An object is said to be simple if it is treated as indivisible. A complex object is a combination of other objects, simple or complex, also displayed in the information system. The concept of “simple” and “complex” object is relative. In one consideration, an object may be considered simple, and in another, the same object may be considered complex. For example, the “chair” object in the material assets accounting subsystem will be considered as a simple object, but for an enterprise that produces chairs, it will be a composite object (including “legs”, “backrest”, “seat”, etc.).

There are several varieties complex objects: Composite objects, generic objects and aggregated objects.

A composite object corresponds to the mapping of the “whole-part” relationship. Examples of composite objects are NODES - PARTS, CLASS - STUDENTS, etc.

To display composite objects in an information model, no special symbols are usually used. The relationship between a composite and its constituent objects is displayed in the same way as described above. Moreover, the nature of the connection can also be different: for example, “DETAILS” and “NODES” are connected by a relationship of type M: M, and “GROUP” and “STUDENTS” are connected by a relationship of 1: M.

A generalized object reflects the presence of a “genus-species” relationship between objects in the subject area. For example, the objects STUDENT, SCHOOLBOY, GRADUATE, TECHNICAL SCHOOL STUDENT form the generalized object STUDENTS. The objects that make up a generalized object are called its categories.

Both a “generic” object and “species” objects can have a certain set of properties. Moreover, the so-called inheritance of properties is observed, i.e., a “species” object has all the properties that a “generic” object has, plus properties inherent only to objects of this type.

Aggregated objects usually correspond to some process in which other objects are “involved.” For example, the aggregated object “DELIVERY” combines the objects “SUPPLIER”, which supplies products, “CONSUMER”, which receives these products, as well as the supplied “PRODUCTS” itself. A unique object is the “DELIVERY DATE”. An aggregated object, like a simple object, can have properties that characterize it. In the example under consideration, such a property could be the size of the delivery.

Comparison of methods for constructing ER models

ER models are very widely used in database design practice. Moreover, they are used both in manual and computer-aided design. Techniques for graphical representation of ER models differ somewhat in different systems design automation and in various literary sources.

Next, we will look at the features of representing ER models in the three most known systems design automation (CASE systems): Prokit*WORKBENCH, Design/IDEF and CASE ORACLE, as well as in some literary sources.

Several categories of differences in the depiction of ER models can be identified.

1. Minor differences associated with the use of different symbols to display the same entities. Thus, rectangles, blocks with rounded corners, ovals, etc.

The next set of differences is related to the way of depicting connections between objects and specifying the names of connections. Thus, in some techniques, to depict a connection in the connector of a line representing this connection, it is proposed to depict a rhombus and write the name of the connection inside or next to it (Chen’s model). Since connections are two-way, the name of the connection will change depending on which side it is viewed from. Therefore, it is often proposed to indicate both of these names in the ILM (for example, in CASE systems ORACLE, Prokit). Moreover, in order to make it clear which direction of communication which name refers to, certain agreements are adopted on how to place these names on the diagrams. For example, above the line place names related to the left side of the connection, and below the line - to the right. The presence of such large number designations and signatures clutter the model. In addition, the naming itself often presents some difficulty, which increases the complexity of information modeling. Therefore, in cases where this does not lead to ambiguities and ambiguities, if the system allows it, it can be recommended not to use special designations and names for relationships.

Different symbols are also used to depict the type of connection (1:1, 1: M, M:M). Some design automation systems, such as Prokit, give the user the opportunity to choose from a variety of possible notations those that he likes or are more familiar with. In this system, the following conventions can be used to indicate the type of connections between objects.

To display the mandatory occurrence of objects in a relationship (“membership/membership class”), different symbols are also used. Thus, in CASE ORACLE the membership class is passed as follows; on the side of the relationship with which the element may not necessarily be associated, a dotted line is used, and where membership is mandatory, a solid line is used. Given the membership class, the types of relationships shown in the figure are possible.

The notation used in CASE ORACLE is more convenient, since if an object is involved in large quantities connections, then additional rectangles with dots become inconvenient to place in the figure.

In Desing IDEF, the nature of membership in a relationship is depicted as shown in the figure.

2. Differences, also related to the way of depicting certain situations, but more significant, leading to differences in the models themselves. For example, in the 3RACLE system, a generalized object is depicted by “nesting” blocks denoting “species” objects inside a block depicting a “generic” object. The figure shows an image of the PERSON object discussed above in the convention used in CASE ORACLE.

As follows from the comparison of the figures, the image of generalized objects in the compared methods differs not only in the form of presentation. So, if an object is classified according to different criteria, then when using the first of the considered methods of depicting generalized objects, it is clearly visible by what criterion the classification is carried out. The second image method does not provide this. In other words, the method of depicting generalized objects proposed at the beginning of the chapter is semantically more meaningful and informative.

The figure shows the same generalized PERSON object using the syntax of the IDEF1X system. In its semantics, this method of representation is closer to the one we proposed basic method ILM images. The difference is that IDEF1X uses the same notation for category entities and “general” entities -

3. In addition to differences in the depiction of certain entities, in the theory of information modeling there is a discrepancy in the terminology used. For example, in CASE ORACLE, a generic object is called a supertype (syper-type), and a species object is called a sub-type. There are many such differences in terminology that can be cited, but this is not our goal now.

4. The next circle of differences is related to the spatial image of certain components of the ILM. For example, object properties are sometimes not shown on the same diagram as objects and the relationships between them, and their descriptions are done separately. Often “the writing of properties is presented in tabular or other analytical form, rather than in graphical form.

The ILM, even for a small and simple subject area, includes a description of a significant number of components and connections between them. This raises the problem of visibility. general scheme. This problem is solved differently in manual and automated construction of an information model. IN automated systems Most often, a single image of the ER model is constructed and the scaling technique is used, when, by decreasing or increasing the scale of the image, you can view both the entire diagram and its individual fragment on the screen.

Various techniques are also used to reduce the number of intersections of lines in the diagram. Thus, in the Prokit system, for these purposes, it is possible to duplicate the image of an object and place this duplicate next to the object with which it needs to be associated. To show that this is not a new object, some kind of symbol, for example, the corner of the corresponding blocks is crossed out.

When designing manually, it is usually not possible to depict the entire ER model in the form of a single diagram. In this case, we can recommend the following technique: depict and describe each object independently, assign a short code to each object. Using these codes, for each object indicate its relationships with other objects.

5. Some capabilities available in some systems or methods are not available in others. In these cases it is possible various options: a) to depict the situation, the possibilities provided by the model are used, but this requires the use of certain techniques, often somewhat artificial, to represent them; b) the situation is simply not reflected in the model.

For example, in many information modeling systems it is assumed that an object can have only single properties. In this case, each multiple property should be represented as an independent object and the relationship between this newly introduced object and the original object should be depicted.

In IDEF, object properties can only be single and always defined (not conditional). If a property may not be present in any objects, then it is necessary to select separate entities, for example, a STAFF EMPLOYEE with the SALARY attribute and an HOURLY EMPLOYEE who does not have such an attribute. This will lead to the need to select a large number of objects and connections in the ILM and to a decrease in the visibility of the model. For example, individual instances of the PERSONALITY object may or may not have an academic title, academic degree, year of graduation from a university, and many other characteristics. For each of these characteristics it will be necessary to distinguish subclasses.

Some methods do not introduce the aggregated object as an independent category. In this case, the aggregated object is depicted as simple, and the user must first determine its identifier and properties. If the model allows only binary relationships to be depicted, then the designer must convert the n-ary relationship into a set of binary ones.

In addition to these difficulties when determining the identifier of an aggregated entity, problems may also arise when moving from an ILM to a datalogical model.

The option when the situation cannot be reflected in the ILM can be illustrated by the following: if the model construction technique does not involve fixing the membership class in the relationship, then this information will simply be lost.

In some CASE systems, there is a situation where some construction is allowed in the system as an intermediate one. For example, in IDEF and CASE ORACLE the relation M:M is allowed as a non-specific relation. Its presence is allowed in the early stages of project development, and later it must be replaced by a specific relationship through the introduction of a third entity. This is a disadvantage of the system, since, firstly, not all DBMS require such a transformation (some systems support the M:M relation explicitly), and, secondly, if such a transformation is required, the design automation system could perform it automatically at the datalogical design stage. Even if “manual” design is performed, the specified transformation must be performed by the designer at the stage of datalogical design, and not when describing the subject area. In addition, during the transformation under consideration at the information design stage, IDEF is introduced new category entities - intersection entities or associative entities. The introduction of new entities entails the introduction into the ILM and additional connections. All this taken together complicates the already difficult task of information design.

There may be entities in the subject area whose identifiers are dependent on the identifier of some other object. For example, if sections of an enterprise are numbered within a workshop, then the section identifier will be composite, including the workshop code and the section code. In the infological model, we can limit ourselves to indicating this composite identifier. Some methods for constructing ER models (for example, the IDEFIX, Prokit methodology) provide for the introduction of special types of entities and special types of relationships to display such situations. Thus, in IDEF, an entity, to identify which one must consider its relationship with other entities; is called an identifier-dependent entity and uses a rounded box to represent it. To depict an identity-independent entity, a rectangle is used. To connect objects, one of which is needed to fully identify the other, the concept of an identifying relationship is introduced. It also has its own symbol. IDEF uses a solid line for an identifying relationship and a dotted line for a non-identifying relationship.

6. As noted above when considering the principles of infological modeling, the concepts of “object”, “property”, “relationship” are relative. Thus, in the basic information model we propose, we distinguish different types objects: simple, composite, aggregated, generalized. Some systems, such as IDEF, do not classify objects in this way and instead use varieties of relationships.

Both approaches have a right to exist. There are no fundamental differences entailing any significant consequences in the compared approaches.







2024 gtavrl.ru.