2 model essence connection basic concepts. Designing Relational Databases Using Semantic Models: ER Diagrams


Entity-relationship data model

The entity-relationship data model was introduced in 1976 by P. P. Chen. It has much in common with hierarchical and network models data and due to its orientation towards the design process can be considered as a generalization and development of previously discussed models. The described model allows direct representation of connections of the type M: N.

Basic Concepts. Entity model-connection" is based on the idea that the real world consists of various entities connected by certain relationships. The categories “essence” and “connection” are declared fundamental, and their division is made at the stage of creating specific representations of some subject area.

Each entity belongs to a certain class or has a corresponding type. There are connections between entities, to which the user assigns some class (type). Thus, entity class And bond class define sets of specific objects and connections between them. Note that an entity can belong to more than one class (for example, a supplier can also be a consumer). At each moment of time, the state of communication S between entity classes E 1, E 2 ..., E n defined by the relationship between DOM sets E 1, DOM E2,..., DOM E n, where DOM E i, i= - set of objects of type E i.

The set of connections in the “entity-relationship” model can be represented as a mathematical relation P object classes:

Where e i- an entity belonging to a set of entities E i, motorcade<e 1 e 2 ... e n >- connection from many connections R. It is not necessary that everyone E i, on which it is determined R, were different. The collection of entities and relationship classes forms top level models.

Entities and relationships are described by their characteristic attributes. Among the attributes of an entity or relationship, a sublist is distinguished whose attribute values ​​uniquely identify the entity or relationship within the type. Entities, relationships and attributes form lower level of the model.

Graphically, the “entity-relationship” model is presented in the form of a diagram in which each class of objects corresponds to a rectangle, and to the class of connections - a hexagon (Fig. 2.7). Under the rectangle and hexagon, the names of the attributes of entities and relationships are indicated.

Rice. 2.7. Graphical representation of the entity-relationship model:

a) entity class; b) class of connections;

When depicting an entity class, we will adhere to the following notation: key attributes are underlined; two different entity classes cannot have the same name.

The following restrictions apply to communications:

types of connections between classes are specified in pairs (1:1, 1: N, N: 1, M: N). When values M and N specified, the maximum value is taken;

one relationship can relate to many entities and one entity can have many relationships. In case of connections of type 1:1, 1: N, N: 1 it is not always necessary to specify the connection name.

Let's consider an example of representing a conceptual database diagram using the “entity-relationship” model (Fig. 2.8). Let there be the following applications: management of supplies, warehouse, production and contracts. These applications can use the following entity classes: SUPPLIER (suppliers), BAZ-DET (basic parts), IZD-UNIT (products and assemblies), CONTRACT (contracts), EMPLOYEE (employees), DEPARTMENT (departments).

Rice. 2.8. An example of an entity-relationship model diagram

To meet the requirements of the above applications, the following relationships between entities are used:

SELECT - allows you to select a supplier of the base product depending on the terms of sale and delivery (these conditions are specified in the diagram);

ASSEMBLY-DB - indicates the basic parts (materials) that are directly used for the production of a product or assembly, as well as their number;

ASSEMBLY-UNIT - indicates units directly included in other units or products, as well as their number;

POST-BAZ - connects suppliers with basic details in the contract;

DESIGNATE - characterizes products and components in the contract;

RESPONSIBLE - indicates the person responsible for the contract;

PARTICIPATES - binds the agreement and the people who participate in its implementation;

WORKS - connects the department and the people who work in it;

MANAGES - indicates the head of this department.

The entity-relationship model diagram can be described as shown in Fig. 2.8.

Entity classes:

E1/SUPPLIER [NOM-POST, FAM POST, ADDRESS];

E2/BAZ-DET [NOM-BDZ-DET, NAIM-BASE-DET, QUANTITY-IN-STOCK, MINIM-QUANTITY];

E3/AGREEMENT [NOM-DOG, DATE];

Link classes:

L 1/POST-BAZ L2/CHOOSE L3/ASSEMBLY-DB

[SUPPLIER, BAZ-DET, CONTRACT];

[SUPPLIER, BASE-DET: PRICE, DEADLINE];

[BASE-DET, ED-NODE: QUANTITY-DB];

Relationship attribute names are separated by a colon from entity class names.

The entity-relationship model includes various characteristics subject area.

1. A relationship can relate to several classes of entities, for example, the POST-BAZ relationship connects the entity classes SUPPLIER, BAZ-DET, CONTRACT.

2. A relationship can refer multiple times to the same entity class, for example an ASSEMBLY-NODE relationship.

3. Many relationships can belong to the same entity class, for example, the WORKS and MANAGES relationships between the EMPLOYEE and DEPARTMENT entities.

4. The model displays various connections such as 1:1, 1: N, M: N.

5. The presence of two classes of entities for parts BAZ-DET and IZD-UZEL allows you to manage: supplies of parts and find suppliers based on the BAZ-DET class; the process of manufacturing products using the IZD-UZEL class.

6. Two classes of entities BAZ-DET and IZD-NODE have common and specific attributes. The presence of common attributes leads to some data redundancy. Specific attributes are required by the scope of the objects.

In the diagram (Fig. 2.8) one could consider only the function of selling products. In this case, the external diagram would include only entities: basic parts, units, products that need to be extracted along with the connections between them from the conceptual diagram. In the external diagram, a distinction must be made between products and assemblies, since some information required for sales is relevant only to products.

To solve the problems of managing a warehouse and production of products, it is necessary to describe the range of products and assemblies, indicating: the composition of products from assemblies and base parts, the composition of assemblies from subassemblies and base parts.

To indicate more specific relationships between entities, there are direct And feedback. Each such connection has a name and a pair of numbers.

Rice. 2.9. Scheme of direct and feedback connections

For example, in the relationship between the entities EMPLOYEE and DEPARTMENT (Figure 2.9), the direct relationship WORKS indicates that the employee works in only one department; Feedback CONTAINS indicates that the department contains at least one employee (usually many employees). In other words, connection L between two entity classes A And IN indicates that the entity A associated at least with M and, as a maximum, with N entities IN. Sometimes N may not be determined.

The entity-relationship model emerged in connection with the needs of database design. It satisfies two important criteria: firstly, the power of its tools allows it to represent the structures and constraints inherent in the real world, and secondly, the gap between the capabilities of the model and industrial DBMSs is not too large. These models help designers communicate with users during the process of analyzing and designing a database.

Relational model

Basic Concepts

In a relational data model, information is stored in one or more related tables. A single table usually represents a collection (group) of either real objects, or some abstract concepts, or events of the same type. Each entry in the table identifies one group object. A table consists of rows and columns, called records and fields, respectively. Tables have the following properties:

1. Each table element represents one data element, i.e. a group of values ​​in one column of one row is not allowed;

2. All columns in the table are homogeneous. This means that the elements of the column are of the same nature. The columns are given names;

3. There are no two identical rows in the table;

4. The order of rows and columns in the table can be arbitrary. In operations with such a table, its rows and columns can be viewed in any order, regardless of their information content or meaning.

Tables with such properties are an exact prototype of a mathematical two-dimensional set - a relationship. But these two concepts are not equivalent. A relation is an abstract mathematical object, and a table is a concrete representation of this abstract object. The difference is manifested in their properties. In a relation, the rows and columns can be unordered, but in a table, the rows are ordered from top to bottom and the columns from left to right. Rows in a table can be repeated rows, but not in a relation.

In the relational model, each row in a table is unique. This is achieved by using keys that contain one or more table fields. Keys are stored in an organized manner that allows direct access to table records during searches. The connection between tables is carried out through the values ​​of one or more matching fields (mostly key ones).

Here are a number of terms used in the relational model:

· Attitude (relation) is a two-dimensional set - a table that meets the above requirements;

· Attribute is a property that characterizes an object. In the table structure, each attribute has a name and a corresponding table column heading. The number of attributes is called degree of relationship ;

· By motorcade (tuple) is a table row. In general, tuples are a set of pairs<атрибут>, <значение>. Each value must be atomic, i.e. cannot be multi-valued or compound. Therefore, multivalued and composite attributes are not supported in the relational model. The number of tuples is called cardinal number ;

· Domain represents the set of all possible values ​​of a particular attribute of a relation.

· Primary key is an attribute of a relation that uniquely identifies each of its tuples. The key can be compound (complex), i.e., consist of several attributes.

· Potential Key is a subset of relation attributes that has the following properties:

The property of uniqueness. There are no identical tuples with the same candidate key values;

The property of non-redundancy. No subset of a potential key is unique.

Every relationship necessarily has a combination of attributes that can serve as a key. Its existence is guaranteed by the fact that a relation is a mathematical set that cannot contain identical tuples, i.e. By at least the entire set of attributes has the property of uniquely identifying tuples of a relation. There may be cases where a relation has several combinations of attributes, each of which uniquely identifies all tuples of the relation. All of these attribute combinations are potential or possible relation keys. One candidate key is selected as the primary key, the rest will be called secondary (alternative). There may even be situations where any of the candidate keys could be chosen as the primary key. An example is the periodic table containing the fields Name, Symbol And Atomic number. Potential keys have very great importance in relational theory. They are used to address tuples. By specifying the value of a potential key, we are guaranteed to receive no more than one tuple. For relationships that are related to other "base" relationships, there are also foreign keys that are used to establish the relationship.

· External key - This is an attribute of a subordinate relationship that is used to establish a connection with the base relationship. It contains values ​​that always match some of the candidate key values ​​of the underlying relation.

Based on the above concepts, mathematically the relationship can be described as follows. Let them be given n sets Dl, D2, D3,..., Dn. Then the attitude R there are many ordered tuples<d1, d2, d3 ,..., dn>, Where dk Î Dk, dk – attribute , a Dk – relationship domain R.

In the mid-70s, IBM engineer Codd proposed a data model based on mathematical operations calculus of relations and relational algebra. The main structural unit of this model was the relation. Therefore, this data model is called relational. Codd also developed a language for manipulating data represented as relationships. He proposed two versions of the data manipulation language that are equivalent in their expressive capabilities:

5. Relational algebra. It is a procedural language because the relation resulting from a query against a relational database is evaluated by executing a sequence of relational operators applied to the relations. Operators consist of operands, which are relations, and relational operations. The result of a relational operation is a relation. The operations of relational algebra can be divided into two groups. The first group consists of operations on sets, which include the operations of union, intersection, difference, division and Cartesian product. The second group consists of special operations on relations: projection, selection and connection.

6. Relational calculus. It is a non-procedural language of a descriptive or declarative nature, containing only information about the desired result. The process of obtaining this result is hidden from the user. Languages ​​of this type include SQL and QBE. The first is based on the tuple relational calculus, the second on the domain relational calculus.

With these languages, you can retrieve a subset of a table's columns and rows, creating smaller tables, and you can combine related data from multiple tables, creating larger tables. Therefore, different users can allocate in a relational database various sets data and connections between them. This way of presenting data is the most natural and understandable for the end user. The relational data model is very flexible because any representation of data with some redundancy can be reduced to two-dimensional tables.

Relationship

The theoretical foundation of the relational approach to databases is the mathematical theory of relations. Basic concepts and operations on relationships are used in relational databases.

Basic concepts and ways of representing relationships. Any system (mathematical, informational) is directly related to a multitude of some objects, or elements. Thus, in mathematics, sets of numbers are used: natural, positive, real, etc. In algebra, elements that can be added, subtracted, multiplied, etc. are considered, and in geometry - sets of points: straight lines, lines, planes, etc. . Facility information system, e.g. educational institution, contains information about teachers, students, departments, faculties, laboratories, class schedules, etc.

In addition to elements, the system includes connections and relationships between them. Yes, numbers A And b may be equal ( A = b) , not equal ( Ab), A more or equal b (Ab); figures A And IN may be congruent ( A = IN), A may contain IN (A B); two straight A And IN may be parallel ( A || IN), perpendicular (). Student A belongs to the set A(students of the department).

All of the above relationships concern two objects and are therefore called binary relations or simply relationships. The relationship between three objects is called ternary, and between n objects - n-ary. Thus, the relationship between the objects CUSTOMER, SUPPLIER, PRODUCT is ternary.

Binary relation R between sets A And IN(denoted R(A, IN)) is any set of ordered pairs ( A, b), Where A A, b IN. If ( A,b) R, then they say that A is in relation R To b, and write down aRb, Since the set of ordered pairs ( A, b), Where A A, b IN, is a Cartesian product A× IN, then any subset of this product will be a binary relation.

Example 2.1. Consider the many suppliers and the many products offered. Any subset of SUPPLIER - PRODUCT relationships is a binary relationship.

Example 2.2. Let given sets A= (1, 2, 3) and IN= (2, 3, 4, 5, 6). Cartesian product A× IN- this is a set of pairs:

(1, 2), (1, 3), …, (1, 6),

(2, 2), (2, 3), …, (2, 6),

(3, 2), (3, 3), …, (3, 6).

Let's construct a binary relation R, whose first element is a divisor of the second. We get the following binary relation: R={(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 2), (2, 4), (2, 6), (3, 3), (3,6)}.

Example 2.3. Let Olga (O), Pavel (P), Ivan (I) be the names of the children in the family. Attitude A- Brother b will:

R= ((P, O), (I, O), (P, I), (I, P)).

In a relationship R(A, IN) a bunch of A, i.e. the set of all first coordinates is called domain of definition of the relation R, and the set B, i.e. the set of all second coordinates, - range of its values. So, for example 3.3, the domain of definition is the set (P, I), and the domain values ​​- set(O, P, I).

Complement of a binary relation R we will call the relation that defines the subset

= (A× B)\R,

those. a b if and only if ( a, b) R. So, for example 2.2

= {(2, 3), (2, 5), (3, 2), (3, 4), (3, 5)}.

Binary relations can be specified different ways: matrices, graphs, tables (sections). Attitude R(A, IN), Where A = {A 1, A 2 , ..., a m}; B = {b 1, b 2 , ..., b n), can be represented by an adjacency matrix, the rows of which correspond to the elements A, and columns - to elements IN; at intersection and i th line and b j the th column is written 1 if a i Rb j, and 0 if a i Rb j. Adjacency matrices for relationships R and for example 2.2 they look like

R

Binary relation R(A, IN) can be represented as a directed graph. Elements of the set A And IN- vertices of the graph, and those and only those elements are connected by an edge A A, b IN, for which ( a, b) R. So, in the form of a graph in Fig. 2.10 provides an example relationship:

Rice. 2.10. Representation of relation R as a graph

Let three sets be given A, IN, WITH and two relationships R(A, IN) And S(B, WITH). Composition, or multiplication, relationships R And S called a binary relation R.S.(or R*S) between elements of sets A And WITH such that aRSc if and only if there is at least one element b IN, at which true aRb And bSc.

Example 2.4. Let's consider the sets

A = {A 1, A 2 , A 3 }, IN = {b 1 , b 2 , b 3 }, WITH = {With 1 , c 2 , c 3 , c 4 }

and relationships

R (A, B) = {(a 1 , b 2), (a 2 , b 1), (a 2 , b 3), (a 3 , b 4)},

S(B, C) = {(b 1 , c 2), (b 2 , c 1)}.

Multiplying ratios R.S. can be represented in the form of a graph (Fig. 2.11.).

Multiplication of binary relations is associative, i.e. ( R.S.)T= R(ST). Let the relationship be given R(A, IN), S(B, WITH) And T (WITH, D). Then a(R.S.)Td= aR(ST)d, i.e. element A A if and only then is in each of the relations ( R.S.)T And R(ST) to element d D when such elements exist b IN And c WITH, What aRb, bSc, cTd. Multiplication of relations, however, is not in the general case commutative (commutative), i.e. R.S.S.R.. This operation takes place only in special cases (in this case it is said that R And S commutable).

Example 2.5. Let given sets

A = (a, b), B = (a, b, c), C = (b, c)

and relationships R (A, IN) = {(A, b), (b, With)}, S (B, C) = {(b, With), (A, b)). Then aRSc = aSRc for any A A And c WITH.

Multiplication k relations R on a set H, i.e. k-th degree R, denoted Rk, is recursively defined as follows:

1) aR l b true when true aRb;

2) aR i b For i>0 is true when such exists With A,
What aRc And cR i- l b are true.

Let us have aR 3 b. Then there is such With 1, what aRc 1 and c 1 R 2 b. For c 1 R 2 b there is something like this With 2 what c 1 Rc 2 and c 2 Rb, i.e. for аR 3 b there is such a thing With 1, With 2 A, What аRс 1 , c 1 Rc 2 and With 2 Rb.

Let relations be given in one or more sets R i (i runs through many indexes I) And S. Then

, (2.1)

According to a[(U R i)S]With there is such an element b, What a(Ri)b And bSc. And this, in turn, is equivalent to the existence of such an index i 0 what a R b And bSc, i.e.

Rice. 2.11.Representation of the operation of multiplying relations RS in the form of a graph

a(R S) c and therefore a(R i S)c. Note that in equalities (3.1) the union cannot be replaced by intersection. From (3.1) it follows that if given relations R, R" And S, and R R", That

RS R"S, SR SR". (2.2)

Indeed, since R R' That R R" = R", which leads to equality ( R R’) S = RS RS = RS, which is equivalent to including RS R"S.a, if for a functional relationship R its symmetrical relationship is also functional.

Every attitude R(A, IN) can be associated with the function f(x), if its cross section for each X A either empty or an element of the set IN. If f(x) is defined everywhere, i.e. the domain of definition of the function coincides with A, then they say that the ratio R(A, IN) is a map of the set A V IN. Functional relation R(A, IN) is called display A V IN, if for each A A there is one and only one element

Rice. 2.12. Representation of the functional relation R(A, B) in the form of a graph

b B, satisfying the relation aRb. Element b called way element A and is designated aR, and the element A- prototype of an element b when displayed R. The set of all prototypes of an element b V A when displayed R called a complete prototype this element in A.

The display can be specified by a table consisting of two rows. IN top line elements are recorded A A, and below them are the prototypes from the set corresponding to the named elements IN. For example, table

defines a mapping from the set (1, 2, 3, 4) to the set (2, 5, 1, 4). At the same time 1 R = 2, 2R = 5, 3R=1, 4R = 4.

Let R- display A V IN, Q- display IN V WITH. Mapping multiplication PQ will be a mapping A V WITH, and for anyone x?A fair x(PQ) = (xP)Q. Indeed, let x(PQ)=c. Then for some at IN we have xRu And yQc, where xP = at and therefore With = (xP)Q. Back, from ( xP)Q should x(PQ).

We will show the multiplication of mappings specified by tables using an example:

Display R called surjective (surjection) or displaying a set A for many IN, when each element b?IN has at least one prototype of A.

Example 2.6. Let A And IN- sets real numbers. Mapping (surjective) A on IN could be a function defined by the formula X→ Z X+ 5, i.e. X goes into y = 3x + 5.

Function Xat=X 2 defines the set display A V B, which is not surjective, since negative numbers from IN are not images of elements from A.

Display R sets A into the multitude IN is called one-to-one if the inverse relation R- l is a mapping IN V A. For a one-to-one mapping defined using sections, it is necessary and sufficient that each element of IN appeared in the bottom row of the table once and only once. Thus, the three tables given earlier as an example of multiplication of mappings correspond to one-to-one mappings.

A one-to-one mapping for which R defined everywhere, called injective (injection).

Example 2.7. Let A- a bunch of real numbers, IN- the set of positive real numbers. Display Xat = e x is one-to-one, since each at corresponds X= log y. Thus, we have an injective mapping, the inverse of which is the mapping atX=ln y.

One-to-one mapping R between elements of one set, for which R And R- l are defined everywhere, called self-referential or bijective mapping. A bijective mapping is both surjective and injective.

When mapping a certain set into itself, we say that the mapping aRb translates point A exactly b. When aRa point a is called fixed point display R. If all points of the set A are motionless during the mapping, then the mapping is called identical and denote E A. It's obvious that E -1 =E and for any display R RE=ER = R. When specifying a mapping to itself using sections, the bottom row of the table will have the same elements as the top row (possibly in a different order), and each of them appears once and only once:

The adjacency matrix corresponding to the mapping into itself is square:

R

The graph representation of a mapping to itself consists of cycles (finite or infinite).

In actual database structure design, a method is used - the so-called semantic modeling. Semantic modeling is the modeling of data structure based on the meaning of that data. Used as a semantic modeling tool various options entity-relationship diagrams(ER - Entity-Relationship).

The first version of the entity-relationship model was proposed in 1976 by Peter Ping-Sheng Chen. Subsequently, many authors developed their own versions of similar models (Martin notation, IDEF1X notation, Barker notation, etc.). In addition, various software, implementing the same notation may differ in their capabilities. In fact, all variants of entity-relationship diagrams are based on the same idea - a drawing is always clearer than a text description. All such diagrams use graphic image entities of the subject area, their properties (attributes), and relationships between entities.

We will describe working with ER diagrams close to Barker's notation as fairly easy to understand the basic ideas. This chapter is more of an illustration of semantic modeling techniques than a full introduction to the field.

Basic concepts of ER diagrams

Definition 1: Essence is a class of objects of the same type, information about which must be taken into account in the model.
Each entity must have a name expressed by a singular noun. Examples of entities can be such object classes as “Supplier”, “Employee”, “Invoice”. Each entity in the model is depicted as a rectangle with a name:

Rice. 1

Definition 2: Entity instance is a specific representative of a given entity.
For example, a representative of the “Employee” entity can be “Employee Ivanov”. Entity instances must be distinguishable, i.e. entities must have some properties that are unique to each instance of that entity.

Definition 3: Entity attribute is a named characteristic that is some property of an entity.
The name of the attribute must be expressed as a singular noun (possibly with characterizing adjectives). Examples of attributes of the “Employee” entity can be such attributes as “Personnel number”, “Last name”, “First name”, “Patronymic”, “Position”, “Salary”, etc. Attributes are depicted within a rectangle defining the entity:

Rice. 2

Definition 4: Entity Key is a non-redundant set of attributes, the values ​​of which together are unique for each instance of the entity. Non-redundancy is that removing any attribute from a key breaks its uniqueness. An entity can have several different keys. Key attributes are shown underlined on the diagram:

Rice. 3

Definition 5: Connection- this is some association between two entities. One entity can be connected to another entity or to itself.
Relationships allow one entity to find other entities related to it. For example, connections between entities can be expressed by the following phrases - “An EMPLOYEE can have several CHILDREN”, “Each EMPLOYEE must be enrolled in exactly one DEPARTMENT”. Graphically, the relationship is depicted by a line connecting two entities:

Rice. 4

Each link has two ends and one or two names. The name is usually expressed in an indefinite verbal form: “to have”, “to belong”, etc. Each name refers to its own end of the connection. Sometimes names are not written because they are obvious.

Each link can have one of the following types of communication:

Rice. 5

Communication type one to one means that one instance of the first entity (left) is associated with one instance of the second entity (right). A one-to-one relationship most often indicates that we actually have only one entity, incorrectly divided into two.

Communication type one-to-many means that one instance of the first entity (left) is associated with several instances of the second entity (right). This is the most commonly used type of communication. The left entity (on the "one" side) is called parental, right (from the “many” side) - subsidiary. A typical example of such a connection is shown in Fig. 4.

Communication type many-to-many means that each instance of the first entity can be associated with multiple instances of the second entity, and each instance of the second entity can be associated with multiple instances of the first entity. The many-to-many relationship type is a temporary relationship type that is acceptable during the early stages of model development. In the future, this type of relationship must be replaced by two one-to-many relationships by creating an intermediate entity.

Each connection can have one of two communication modalities:

Rice. 6

Modality" Maybe" means that an instance of one entity may or may not be associated with one or more instances of another entity.
Modality" must" means that an instance of one entity must be associated with at least one instance of another entity.
The connection may have different modalities at different ends (as in Fig. 4). The described graphic syntax allows you to unambiguously read diagrams using the following phrase structure:

<Каждый экземпляр СУЩНОСТИ 1> <МОДАЛЬНОСТЬ СВЯЗИ> <НАИМЕНОВАНИЕ СВЯЗИ> <ТИП СВЯЗИ> <экземпляр СУЩНОСТИ 2>

Each link can be read either from left to right or from right to left. Connection in Fig. 4 reads like this:

From left to right: "each employee can have several children."
From right to left: “Each child must belong to exactly one employee.”

An example of developing a simple ER model

When developing ER models, we must obtain the following information about the subject area:

  1. List of domain entities.
  2. List of entity attributes.
  3. Description of the relationships between entities.

ER diagrams are convenient because the process of identifying entities, attributes and relationships is iterative. Having developed the first approximate version of the diagrams, we refine them by interviewing subject matter experts. At the same time, the documentation in which the results of the conversations are recorded are the ER diagrams themselves.

Let's assume that we are faced with the task of developing information system by order of some wholesale trading company. First of all, we must study the subject area and the processes occurring in it. To do this, we interview company employees, read documentation, study order forms, invoices, etc.

For example, during a conversation with a sales manager, it turned out that he (the manager) believes that the system being designed should perform the following actions:

  • Store customer information.
  • Print invoices for goods released.
  • Monitor the availability of goods in the warehouse.

Let's select all the nouns in these sentences - these will be potential candidates for entities and attributes, and analyze them (we will highlight unclear terms with a question mark):

  • The buyer is a clear candidate for the entity.
  • The invoice is a clear candidate for an entity.
  • Product is a clear candidate for entity
  • (?)Warehouse - in general, how many warehouses does the company have? If there are several, then it will be a candidate for a new entity.
  • (?) Availability of a product is most likely an attribute, but an attribute of which entity?

An obvious connection immediately arises between the entities - “buyers can buy many goods” and “goods can be sold to many buyers.” The first version of the diagram looks like this:

Rice. 7

After asking additional questions to the manager, we found out that the company has several warehouses. Moreover, each product can be stored in several warehouses and be sold from any warehouse.

Where should I place the entities “Invoice” and “Warehouse” and what should I link them to? Let us ask ourselves, how are these entities related to each other and to the entities “Buyer” and “Product”? Buyers purchase goods and receive invoices containing data on the quantity and price of the purchased goods. Each buyer can receive several invoices. Each invoice must be issued to one buyer. Each invoice must contain several goods (there are no empty invoices). Each product, in turn, can be sold to several buyers through several invoices. In addition, each invoice must be issued from a specific warehouse, and many invoices can be issued from any warehouse. Thus, after clarification, the diagram will look like this:

Rice. 8

It's time to think about entity attributes. Speaking with employees of the company, we found out the following:

  • Every buyer is legal entity and has a name, address, and bank details.
  • Each product has a name, price, and is also characterized by units of measurement.
  • Each invoice has unique number, date of issue, list of goods with quantities and prices, as well as the total amount of the invoice. The invoice is issued from a specific warehouse and to a specific buyer.
  • Each warehouse has its own name.
  • Let's write down again all the nouns that will be potential attributes, and let's analyze them:
  • Legal entity is a rhetorical term; we do not work with individuals. We don't pay attention.
  • The name of the buyer is a clear characteristic of the buyer.
  • The address is a clear characteristic of the buyer.
  • Bank details are a clear characteristic of the buyer.
  • The name of the product is a clear characteristic of the product.
  • (?) The price of the product - it seems that this is a characteristic of the product. Does this characteristic differ from the price on the invoice?
  • A unit of measurement is a clear characteristic of a product.
  • The invoice number is a clear unique characteristic of the invoice.
  • The invoice date is a clear characteristic of the invoice.
  • (?) List of goods in the invoice - the list cannot be an attribute. You probably need to separate this list into a separate entity.
  • (?) The quantity of goods in the invoice is an obvious characteristic, but a characteristic of what? This is a characteristic of not just a “product”, but a “product in the invoice”.
  • (?) The price of the product in the invoice - again, this should not just be a characteristic of the product, but a characteristic of the product in the invoice. But the price of the product has already been seen above - is it the same thing?
  • The invoice amount is an explicit characteristic of the invoice. This characteristic is not independent. The amount of the invoice is equal to the sum of the costs of all goods included in the invoice.
  • The name of the warehouse is a clear characteristic of the warehouse.

During an additional conversation with the manager, it was possible to clarify various concepts prices It turned out that each product has a certain current price. This is the price at which the product is currently being sold. Naturally, this price may change over time. The price of the same product in different invoices issued at different times may be different. Thus, there are two prices - the price of the goods in the invoice and the current price of the goods.

With the emerging concept of “List of goods in the invoice” everything is quite clear. The entities "Invoice" and "Product" are related to each other by a many-to-many relationship. Such a relationship, as we noted earlier, must be split into two one-to-many relationships. This requires an additional entity. This entity will be the “List of goods in the invoice” entity. Its connection with the entities “Invoice” and “Product” is characterized by the following phrases - “each invoice must have several entries from the list of goods in the invoice”, “each entry from the list of goods in the invoice must be included in exactly one invoice”, “each product can be included into several entries from the list of goods in the invoice", "each entry from the list of goods in the invoice must be associated with exactly one product." The attributes "Quantity of goods in the invoice" and "Price of the goods in the invoice" are attributes of the entity "List of goods in the invoice".

We will do the same with the connection connecting the entities “Warehouse” and “Product”. Let's introduce an additional entity "Item in warehouse". The attribute of this entity will be “Quantity of goods in stock”. Thus, the product will be listed in any warehouse and its quantity in each warehouse will be different.

Now you can put all this into a diagram:

Rice. 9

Conceptual and physical ER models

The example ER diagram developed above is an example concept diagram. This means that the diagram does not take into account the features of a particular DBMS. From this conceptual diagram you can construct physical diagram, which will already take into account such features of the DBMS as permissible types and names of fields and tables, integrity restrictions, etc. Physical option diagram shown in Fig. 9 might look like this, for example:

Rice. 10

In this diagram, each entity represents a database table, each attribute becomes a column of the corresponding table. Please note that in many tables, for example, "CUST_DETAIL" and "PROD_IN_SKLAD", corresponding to the entities "Invoice list record" and "Item in warehouse", new attributes have appeared that were not in the conceptual model - these are the key attributes of the parent tables , migrated into child tables in order to provide relationships between tables using foreign keys.

It is easy to notice that the resulting tables are immediately in 3NF.

conclusions

The real means of data modeling is not the formal method of normalizing relations, but the so-called semantic modeling.

Various options are used as a semantic modeling tool entity-relationship diagrams(ER - Entity-Relationship).

Entity-relationship diagrams allow you to use visual graphic symbols to model entities and their relationships.

Distinguish conceptual And physical ER diagrams. Conceptual diagrams do not take into account the specific features of specific DBMSs. Physical diagrams are built on conceptual ones and represent a prototype of a specific database. Entities defined in the conceptual diagram become tables, attributes become table columns (taking into account the data types and column names allowed for a given DBMS), connections are implemented by migration key attributes of parent entities and creating foreign keys.

If entities are correctly defined, the resulting tables will immediately be in 3NF. The main advantage of the method is that the model is built by successive refinements of the initial diagrams.

This chapter, which is an illustration of ER modeling methods, does not cover more complex aspects of diagramming, such as subtypes, roles, exclusive relationships, non-transferable relationships, identifying relationships, etc.

ER models in isolation from the topic of designing relational databases.

But if you need to simultaneously use the terms of the ER model and the relational data model, then, of course, you need to use the terms relation And relationship various Russian equivalents. There are very different concepts behind these terms. In the relational model, relation is the only generic data structure. This same mechanism is used to represent “related” entities (think, for example, about foreign keys). As we will see a little later, the ER model uses two equal concepts to represent the database schema - entity and relationship. Relationships in the ER model play a different role than relationships in the relational data model.

In addition, pure transliteration of the term has also entered Russian-language terminology relation exactly in the sense attitude. We are talking, for example, about the relational data model, relational algebra etc., understanding a relational data model, relational algebra, etc. On this subject, at least in the context of databases, it is wise to finally reserve the terms relation And attitude to denote the concepts of the relational data model, and for the term relationship use another acceptable Russian-language equivalent - communication.

Most modern approaches to database design (mainly relational) are based on the use of different versions of the ER model. The model was proposed by Peter Chen in 1976. Modeling subject area is based on the use of graphic diagrams that include a small number of heterogeneous components. Simplicity and clarity of presentation conceptual database diagrams in the ER model led to its widespread use in CASE systems that support automated relational database design. Among the many varieties of ER models, one of the most popular and developed was used in CASE system Oracle company. We will discuss some simplified version of this model. To be more precise, let's focus on its structural and integral parts.

Basic concepts of the ER model

The main concepts of the ER model are entity, relationship and attribute. Essence is a real or imaginable object, information about which must be stored and accessible. 4 It is clear that this “definition” is actually a tautology, since, firstly, we are trying to define the term entity through the undefined term object, and secondly, attempts to define the term object are just as hopeless. Usually, authors try to justify themselves by saying that in such a context they mean the “everyday” and not some formalized concept of an object. Of course, this does not make it any easier, since the concept of essence must be understood in a fairly precise sense. But this tautology was not invented by the author of this course; it is traditional for the field of semantic modeling. In this area, they try to avoid formalities as much as possible. In ER model diagrams, an entity is represented as a rectangle containing the entity name. In this case, the name of the entity is the name of the type, and not of some specific instance of this type. 5 Although it would be more correct to always use the terms entity type and entity type instance, to avoid verbosity (and to follow tradition) in cases where it does not lead to ambiguity, we will use the term entity to mean entity type. For greater expressiveness and better understanding the entity name may be followed by examples of specific instances of that type.


Rice. 9.1.

In Fig. Figure 9.1 shows the essence of the AIRPORT with exemplary instances of Sheremetyevo and Heathrow. This primitive diagram nevertheless carries important information. Firstly, it shows that the database will contain the same type of data structures ( entity instances), describing airports. Secondly, since in life there are several points of view at airports (for example, the point of view of a pilot, the point of view of a passenger, the point of view of an administrator) and different data structures correspond to these points of view, the given examples of airports allow us to somewhat narrow the acceptable set of points of view. In our case, examples of international airports are given, so, most likely, there is a point of view of a passenger or pilot of international flights.

When defining an entity type, you must ensure that each entity instance can be distinguished from any other instance of the same entity. This requirement is somewhat similar to the requirement that there be no duplicate tuples in relational tables.

Connection is a graphically depicted association established between two entity types. Like an entity, a relationship is a generic concept; all instances of both associated types of entities are subject to established association rules. Therefore, it is more correct to talk about the type of connection established between entity types, and about connection type instances, installed between instances of the entity type. 6However, as with entity type, we will often use the term relationship to mean relationship type. In the version of the ER model discussed here, this association is always binary and can exist between two different entity types or between the type of entity and itself ( recursive connection). In any connection, two ends are identified (in accordance with the existing pair of connected entities), on each of which the name of the end of the connection is indicated, degree of end of connection(how many copies of this type entities must be present in every instance of this type of relationship), mandatory communication(i.e., whether any instance of a given entity type must participate in some instance of a given relationship type). 7 In some versions of the ER model, the end of the connection is called role of communication in this entity. Then we can talk about the name of the role, the degree of the role and the obligation of the communication role in this entity.

A connection is represented as an undirected line connecting two entities or leading from an entity to itself. In this case, at the point of connection with the entity, the following is used:

  • three-point entry into an entity rectangle if for that entity the relationship can (or should) use many ( many) entity instances ;
  • single-point entry, if only one can (or should) participate in the connection entity instance.

The required end of the connection is depicted with a solid line, and the optional end with a dashed line.

The relationship between the entities TICKET and PASSENGER, shown in Fig. 9.2, connects tickets and passengers. A link end named "for" allows more than one ticket to be associated with the same passenger, with each ticket being associated with a different passenger. The end of the link with the name "has" indicates that each ticket can only be owned by one passenger, and the passenger is not required to have at least one ticket.


Rice. 9.2.
  • Each TICKET is intended for one and only one PASSENGER;
  • Each PASSENGER may have one or more TICKETS or none at all.

On following example(Fig. 9.3) is shown recursive connection, connecting the essence of MAN with itself. The end of the connection with the name "son" defines the fact that several people can be sons of the same father. The end of the connection with the name "father" means that not every man should have sons.

A laconic oral interpretation of the diagram depicted is as follows:

  • every MAN is the son of one and only one MAN;
  • every MAN can be the father of one or more MEN.

Entity attribute is any detail that serves to clarify, identify, classify, numerically characterize or express the state of an entity. Attribute names are entered in a rectangle depicting

The conceptual database model is

A conceptual model is a kind of visual diagram, drawn in accepted notation and showing in detail the relationship between objects and their characteristics. A conceptual model is created for further database design and its translation, for example, into a relational database. The conceptual model depicts connections between data objects and their characteristics in a visually convenient form.

Accepted definitions in the conceptual database

For uniformity in database programming, the following concepts have been introduced for conceptual databases:

  1. Object or entity. This is the actual thing or object (for people) that the user (customer) wants to observe. For example, Ivanov Ivan Ivanovich;
  2. Attribute this is a characteristic of an object corresponding to its essence. For example. We ask ourselves the question: What information should be stored about Ivan Ivanovich Ivanov? The answers to this question will be the attributes of the object Ivan Ivanovich Ivanov;
  3. The third concept in conceptual database design is connection or relationship between objects.

Lexically it is more correct to say the relationship between CBD objects and the relationship between CBD entities (conceptual database), but you can find the most various combinations essence, object, connection and relationship (translation errors).


Conceptual Database Model symbols

Conceptual Database Model: Common Graphic Notations

An entity/relationship diagram (object/relationship) is called an ER diagram or EDR (entity-relationship diagram). The entity-relationship model itself was proposed by Professor Peter Pin-Shen Chen in 1976. The writing rules and conventions of an ER diagram are called notation. There are two common notations for ER diagrams:

  • Notation by Peter Chen;
  • Gordon Everest notation. Called Crow's Foot or Fork.

ER diagram notation according to Peter Chen

Chen proposed and adopted the following conventions for ER diagrams:

  • An entity or object is denoted by a rectangle;
  • Relationships are denoted by a diamond;
  • Attributes of objects are indicated by an oval;
  • If an entity is associated with a relation, then their relationship is indicated by a straight line with an arrow. An optional relationship is indicated by a dotted line. A powerful connection is indicated by a double line.

Each attribute can be associated with one object (entity).

Gordon Everest notation

Gordon Everest introduced a new designation for connections, which were called forks or crow's feet. He also introduced that an object should be represented by a rectangle with the name of the object type as a noun inside the rectangle. Moreover, this name must be unique within created base data.

Attributes are not separated into a separate figure, but fit into the rectangle of the object as a noun with a qualifying word.

The connection between objects is indicated by a straight line. Multiple bonds are indicated by a fork at the end. The connection itself is signed with a verb, such as “Includes” or “Belongs.”


conceptual model of the ERD Fork database

Add-ons

Attributes in an ER diagram can have their own attributes (composite) attribute.

A simple ER diagram is quite easy to draw. A rich, three-dimensional ER diagram is another matter. Below are some tips to help you build effective ER diagrams:

  • Identify all the objects in a given system and define the relationships between these objects;
  • An object must appear only once at a specific location in the diagram;
  • Define an accurate and appropriate name for each object, attribute, and relationship in the diagram. Choose simple and understandable words. Terms that are simple and familiar always win over vague, technical sounding words. For objects, nouns, for connections, verbs (with explanations possible). Don't forget about the uniqueness of object names;
  • Remove implicit, redundant or unnecessary relationships between objects;
  • Never connect a relationship to another relationship;
  • Use colors to categorize similar objects or highlight key areas in a diagram.

Typical form of documentation logical model subject area in ER modeling are entity-relationship diagrams, or ER diagrams (Entity Relationship Diagram). An ER diagram allows you to graphically represent all elements of a logical model according to simple, intuitive, but strictly defined rules - notations.

To create ER diagrams, one of the two most common notations is usually used.

  • Integration DEFinition for Information Modeling (IDEF1X). This notation was developed for the US Army and became a US federal standard. In addition, it is a standard in a number of international organizations (NATO, International Monetary Fund, etc.).
  • Information Engineering (IE). The notation developed by Martin, Finkelstein and others is used primarily in industry.

ER diagrams are usually constructed using CASE tools. In this lecture, all examples, unless otherwise noted, will use MS Office Visio 2007 notation.

An entity on an ER diagram is represented by a rectangle with a name at the top (Fig. 6.3).


Rice. 6.3. Representation of the "Employee" entity on an ER diagram

The rectangle lists entity attributes, while the attributes that make up unique entity identifier, are underlined (Fig. 6.4).


Rice. 6.4. Representation of the "Employee" entity with attributes and unique entity identifier

Every entity instance must be unique and different from other attributes. One of the main computer methods Entity recognition in an IS is the assignment of identifiers to entities (entity identifier). Since an entity is defined by a set of its attributes, for each entity it is advisable to select a subset of attributes that uniquely identifies this entity. Often entity identifier called the primary key.

Primary key is an attribute or group of attributes that uniquely identifies an instance of an entity. Attributes primary key not required on the diagram special designation– these are the attributes that are in the list of attributes above horizontal line(Fig. 6.3).

Choosing a primary key can be a difficult task, the solution of which can affect the effectiveness of future information systems. One entity may contain several attributes or sets of attributes that claim to be the primary key. Such applicants are called potential keys(candidate key).

The keys may be complex, i.e. containing several attributes. Complex primary keys do not require special notation - they are a list of attributes above a horizontal line.

Let's consider candidates for the primary key of the "employee" entity (Fig. 6.5).


Rice. 6.5. Defining a primary key for the "employee" entity

The following potential keys can be identified here.

  1. Personnel Number.
  2. Passport ID.
  3. Last name + First name + Patronymic.

To become primary potential clue must satisfy a number of requirements.

Uniqueness. No two instances must have the same candidate key values. Potential Key (Last name + First name + Patronymic) is a bad candidate, since the organization may have full namesakes working in it.

Compactness. A complex candidate key must not contain any attribute whose removal would not result in the loss of uniqueness. To ensure the uniqueness of the key ( Last name + First name + Patronymic) let's supplement it with attributes Date of Birth And Eye color. If business rules say that combinations of attributes Last name + First name + Patronymic + Date of birth sufficient to uniquely identify an employee, then Eye color turns out to be superfluous, i.e. the key Last name + First name + Middle name + Date of birth + Eye color is not compact.

When choosing a primary key, preference should be given to simpler keys, that is, keys containing fewer attributes. In the example, keys #1 and #2 are preferred over key #3.

Key attributes must not contain null values. If it is possible that an employee may not have a passport or have some other identity card instead of a passport, then key No. 2 will not be suitable as a primary key. If to ensure uniqueness it is necessary to supplement potential clue additional attributes, then they must not contain zero values. When adding key No. 3 with the attribute Date of Birth You need to make sure that dates of birth are known for all employees.

The value of the key attributes should not change during its entire existence entity instance. An employee of an organization can get married and change both her last name and her passport. Therefore, keys No. 2 and 3 are not suitable as a primary key.

Every entity must have at least one potential clue. Many entities have only one potential clue. This becomes the primary key. Some entities may have more than one possible key. Then one of them becomes primary, and the rest - alternative keys. Alternate Key is a potential key that has not become the primary key..

Some entities have natural (natural) keys. For example, the natural identifier of an invoice is its number. Otherwise, the designer can create a surrogate key (Surrogate Key) - an attribute whose value is created artificially and is not related to the subject area. When modeling data structures for a data warehouse, surrogate keys are preferable in many situations.

Domains are assigned by analysts and recorded in a special document - data dictionary(Data Dictionary). When creating a logical model, domains can be specified in entities in the ER diagram.

Each attribute has a domain. A domain can be defined as an abstract attribute from which ordinary attributes can be created, and the created attributes will have all the properties of the parent domain. Each attribute can be defined on only one domain, but multiple attributes can be defined on each domain. The concept of a domain includes not only the data type, but also the range of data values. For example, you could define the domain "Age" as a positive integer and define the attribute Employee age as belonging to this domain.

At the level logic modeling Data assignment of a domain to an attribute is general. For example, the attribute is text, numeric, binary, date, or "undefined". In the latter case, the analyst must provide a description of the domain. At subsequent stages, the type of domain is specified, the meaning of the concept of domain is physical model HD is narrower than an analyst can understand it. This is due to the fact that within the physical model, the domain is implemented through a domain restriction mechanism; the DBMS does not understand undefined domains.

The designer must carefully study the domains of each attribute from the point of view of their feasibility in the DBMS, with the participation of analysts, make changes to them if the feasibility condition is not met. In this case, the designer is guided by the following:

  • to implement a relational database, you need to use a relational or object-relational DBMS, for example, MS SQL Server 2008;
  • in the majority relational DBMS SQL is used as a language for manipulating and describing data, supporting certain standards, for example, ANSI SQL-92.

Attitude (connection) entities on an ER diagram is represented by a line connecting these entities. The ratio is read along a line either from left to right or from right to left. In Fig. 6.6 the following relationship is presented: each educational specialty must be registered with a specific individual (person), individual may have one or more specialties in education.


Rice. 6.6.

In MS Office Visio, the link name, link strength (power) and entity membership class to the connection is defined on the "Database Properties" tab, as shown in Fig. 6.7. The arrow on the communication line points to parent table.

When identifying connections, the emphasis is on identifying their characteristics. A relationship is a relationship between two or more entities. Each connection is realized through values entity attributes, For example, entity instance"Employee" (Fig. 6.6) is associated with entity instance"Education" by same values attributes Personnel Number. In other words, when you create a relationship in one of the entities, called a child entity, a new attribute called foreign key(Foreign Key, FK) (in Fig. 6.6 this is the attribute Personnel Number). Foreign key attributes are sometimes indicated by a symbol (FK) after their name.

A relationship is a logical relationship between entities. Each relationship must be named as a verb or verb phrase Relationship name (Verb Phrase) – a phrase characterizing the relationship between parent and child entities. The relationship name expresses some constraint or business rule and makes the diagram easier to read. In Fig. Figure 6.8 shows how to assign a name to a relationship.

There are different types of relationships: one-to-many identifying relationship, many-to-many relationship, and one-to-many non-identifying relationship. Various types of entities are also associated with relationship types.

There are two types of entities: dependent(Dependent entity) and independent(Independent entity). The type of an entity is determined by its relationship to other entities. An identifying relationship is established between an independent (parent end of the relationship) and dependent (child end of the relationship) entities.

An instance of a dependent entity is defined only through its relationship to the parent entity, that is, in the structure in Fig. 6.8 information about the specialty cannot be entered and does not make sense without information about the employee who has a specialty with an education diploma. When an identifying relationship is established (continuous line in the figure), the attributes of the primary key of the parent entity are automatically transferred to the primary key of the child entity (continuous line). This addition operation attributes of a child entity when creating a relationship is called attribute migration. In a child entity, such an attribute is considered a foreign key.

If the model is created using CASE tools, then when generating the database schema, the primary key attributes will receive the NOT NULL attribute, which means it is impossible to make an entry in the “Employees” table without information about the employee’s personnel number.

When a non-identifying relationship is established (Figure 6.9, dotted line), the child entity remains independent, and the primary key attributes of the parent entity migrate to the non-key components of the parent entity. A non-identifying relationship is used to link independent entities (Fig. 6.9).

Entity instance"Employee" may exist without reference to any entity instance“Department”, i.e. an employee can work in an organization and not be listed in any department.

An identifying connection is shown on the diagram as a solid line with a bold dot at the child end of the connection (see Fig. 6.8), a non-identifying one is shown as a dotted line (see Fig. 6.9).

Many-to-many relationship(many-to-many relationship) can only be created at the logical model level. In Fig. Figure 6.10 shows an example of defining a many-to-many relationship. A doctor can see many patients, a patient can be treated by several doctors. Such a connection is indicated by a solid line with two arrows at the ends.

A “many to many” relationship should be named in two phrases – in both directions (in the example “accepts/is being treated”). This makes the diagram easier to read. Contact us







2024 gtavrl.ru.