Architecture and types of databases. Three-level database architecture


In the course of scientific research into how exactly a DBMS should be structured, various implementation approaches were proposed. The most viable of them turned out to be the three-level database organization proposed by the American National Standards Institute (ANSI), shown in Fig. 1:

Fig. 1. The ANSI three-level database management system model

The architecture includes three levels: internal, conceptual and external. In general terms, they are as follows:

Internal – the level closest to physical storage, i.e. concerned with the methods of storing information on physical storage devices.

External – the level closest to users, i.e. concerned with the ways in which data is presented to individual users.

Conceptual level – an intermediate level between the first two; in other words, the central control link, where the database is presented in the most general form, combining the data used by all applications working with this database.

1. The level of external models – the highest level, where each model has its own “vision” of the data. This level defines the point of view of individual users (applications) on the database. Each application sees and processes only the data that this particular application needs. For example, a work-distribution system uses information about an employee's qualification, but is not interested in the employee's salary, home address or telephone number; the HR subsystem, on the contrary, uses precisely that information.

2. Conceptual level – the central control link, where the database is presented in the most general form, combining the data used by all applications working with this database. In effect, the conceptual level reflects a generalized model of the subject area (real-world objects) for which the database was created. Like any model, the conceptual model reflects only those features of real-world objects that are significant from the point of view of processing. A conceptual schema is a definition of the conceptual representation. In the majority of existing systems, the conceptual schema is actually little more than a simple union of all the individual external schemas, supplemented with security facilities and integrity rules.

3. The internal representation is the lowest-level representation of the entire database. The internal representation, like the external and conceptual ones, is not tied to the physical level. It assumes an infinite linear address space. The internal representation is described using an internal schema, which defines not only the different types of stored records but also the existing indexes, the way stored fields are represented, the physical sequence of stored records, etc.

This architecture allows for logical (between layers 1 and 2) and physical (between layers 2 and 3) independence when working with data. Logical independence implies the ability to change one application without adjusting other applications that work with the same database. Physical independence implies the ability to transfer stored information from one media to another while maintaining the functionality of all applications working with a given database. This is exactly what was missing when using file systems.
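As a rough illustration (not from the original text), the external and conceptual levels can be sketched with Python's standard sqlite3 module: the employee table plays the role of the conceptual schema, and SQL views play the role of external schemas. All table, view and column names here are invented for the example.

```python
import sqlite3

# Conceptual level: one integrated schema shared by all applications.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT,
    qualification TEXT,
    salary REAL,
    home_address TEXT)""")
conn.execute("INSERT INTO employee VALUES (1, 'Ivanov', 'engineer', 500.0, 'Main St 1')")

# External level: each application gets its own view of the same data.
conn.execute("CREATE VIEW work_distribution AS SELECT id, name, qualification FROM employee")
conn.execute("CREATE VIEW hr_subsystem AS SELECT id, name, salary, home_address FROM employee")

# The work-distribution 'user' never sees salary or address.
row = conn.execute("SELECT * FROM work_distribution").fetchone()
print(row)  # (1, 'Ivanov', 'engineer')
```

Logical independence then means the HR view can change without the work-distribution view noticing, and physical independence means the table's storage can change under both views.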

9. Relational database model.

The theoretical basis of this model was the theory of relations, the foundations of which were laid by two logicians: the American Charles Sanders Peirce (1839–1914) and the German Ernst Schröder (1841–1902). Works on the theory of relations showed that the set of relations is closed under certain special operations, i.e. together with these operations it forms an abstract algebra. This most important property of relations was used in the relational model to develop a data manipulation language based on that algebra. The American mathematician E. F. Codd in 1970 first formulated the basic concepts and constraints of the relational model, limiting its set of operations to seven basic and one additional operation.

The main data structure in the model is the relation, which is why the model is called relational (from the English word relation).

Any data used in programming has its own data types.

The relational model requires that the data types used be simple.

To clarify this statement, let's consider what types of data are usually considered in programming. Typically, data types are divided into three groups:

Simple data types.

Structured data types.

Reference data types.

Simple, or atomic, data types have no internal structure. Data of this type are called scalars. Simple data types include the following types: Logical, String, Numeric.

Various programming languages can extend and refine this list by adding types such as:

Real.

Monetary.

Enumerable.

Interval.

Of course, the concept of atomicity is quite relative. Thus, the string data type can be considered a one-dimensional array of characters, and the integer type a set of bits. The only important thing is that at such a low level the semantics (meaning) of the data is lost. If a string expressing, for example, the last name of an employee is decomposed into an array of characters, the meaning of the string as a single whole is lost.

Structured data types are designed to define complex data structures. Structured data types are constructed from constituent elements called components, which in turn may have structure. The following data types can be considered as structured data types:

Arrays

Records (Structures)

From a mathematical point of view, an array is a function with a finite domain. For example, consider a finite set of natural numbers I = {1, 2, ..., n}, called an index set. A mapping A: I → R from I to the set of real numbers R specifies a one-dimensional real array. The value of this function for some index value i is called the array element corresponding to i. Multidimensional arrays can be defined similarly.

A record (or structure) is a tuple of some Cartesian product of sets. Indeed, a record is a named, ordered set of elements e1, e2, ..., en, each of which belongs to a type Ti. Thus, a record is an element of the set T1 × T2 × ... × Tn. By declaring new record types based on existing types, the user can construct arbitrarily complex data types.

What structured data types have in common is that they have an internal structure that is used at the same level of abstraction as the data types themselves.

When working with arrays or records, you can manipulate the array or record both as a single whole (create, delete, copy entire arrays or records) and element by element. For structured data types there are special functions - type constructors, which allow you to create arrays or records from elements of simpler types.
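A minimal Python sketch of these type constructors (all names are illustrative): a list literal builds an array from elements, and a dataclass declaration builds a record type; both can then be handled element by element or as a single whole.

```python
from dataclasses import dataclass

# Record (structure): a named, ordered set of typed components --
# an element of the Cartesian product str x str x int.
@dataclass
class Employee:
    last_name: str
    position: str
    year_hired: int

# Array: a mapping from a finite index set {0, ..., n-1} to values.
salaries = [450.0, 520.0, 610.0]        # constructed element by element
copy_of_salaries = list(salaries)       # manipulated as a single whole

e = Employee("Smith", "engineer", 2019) # record built by a type constructor
print(e.position, salaries[1])          # engineer 520.0
```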

When working with simple data types, for example numeric ones, we manipulate them as indivisible whole objects. To "see" that a numeric data type is actually complex (a collection of bits), we have to move to a lower level of abstraction. At the level of program code this looks like assembly inserts in high-level language code or the use of special bitwise operations.

A reference data type (pointers) is designed to provide the ability to point to other data. Pointers are typical for procedural languages, which have the concept of a memory area for storing data. A reference data type is designed for processing complex changing structures, such as trees, graphs, and recursive structures.

For the relational data model, the kind of data used is not important. The requirement that the data types be simple should be understood to mean that relational operations must not take the internal structure of the data into account. Of course, the actions that can be performed on the data as a single whole must be described: for example, numeric data can be added, strings can be concatenated, etc.

From this point of view, if we consider an array, for example, as a single whole and do not use element-by-element operations, then the array can be considered a simple data type. Moreover, you can create your own data type, however complex, and describe the possible actions with it; if those operations do not require knowledge of the internal structure of the data, then this type will also be simple from the point of view of relational theory. For example, you can create a new type, complex numbers, as a record of the form (re, im), where re is the real part and im the imaginary part. You can then define functions for addition, multiplication, subtraction and division, performing all operations on the components only inside these functions. If only the defined operations are used in actions with this type, the internal structure plays no role, and from the outside the data type looks atomic.

This is exactly how some post-relational DBMSs implement work with arbitrarily complex data types created by users.
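The complex-number example above can be sketched in Python (an illustrative assumption, not a prescribed implementation): the components re and im are touched only inside the defined operations, so from the outside the type behaves atomically.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Complex:
    re: float
    im: float

    # The operations work with the components internally, but callers
    # only ever pass around whole Complex values -- the type appears atomic.
    def __add__(self, other):
        return Complex(self.re + other.re, self.im + other.im)

    def __mul__(self, other):
        return Complex(self.re * other.re - self.im * other.im,
                       self.re * other.im + self.im * other.re)

a = Complex(1.0, 2.0)
b = Complex(3.0, -1.0)
print(a + b)   # Complex(re=4.0, im=1.0)
print(a * b)   # Complex(re=5.0, im=5.0)
```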

Domains

In the relational data model, the concept of data type is closely related to the concept of domain, which can be considered a clarification of the data type.

A domain is a semantic concept. A domain can be thought of as a subset of the values of some data type that have a specific meaning. A domain is characterized by the following properties:

A domain has a unique name (within the database).

A domain is defined over some simple data type or over another domain.

A domain may have a logical condition that describes the subset of data valid for that domain.

A domain carries a certain semantic load.

For example, a domain with the meaning "employee age" can be described as a subset of the set of natural numbers bounded above and below, for example {n ∈ N : 18 ≤ n ≤ 65}.

The difference between a domain and the concept of a subset is precisely that the domain reflects the semantics defined by the subject area. There may be several domains that coincide as subsets, but carry different meanings. For example, the domains "Part Weight" and "Available Quantity" can equally be described as a set of non-negative integers, but the meaning of these domains will be different, and they will be different domains.

The main significance of domains is that they restrict comparisons. It is not logically correct to compare values from different domains, even if they have the same type; this is where the semantic restriction imposed by domains shows itself. The syntactically correct query "give a list of all parts whose part weight is greater than the available quantity" does not correspond to the meaning of the concepts "quantity" and "weight".

The concept of a domain helps to model the subject area correctly. When working with a real system, a situation is in principle possible in which it is necessary to answer the query given above. The system will give an answer, but it will probably be meaningless.

Not all domains have a logical condition restricting the possible values of the domain. In this case, the set of possible values of the domain coincides with the set of possible values of its data type.
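The properties listed above can be sketched as a small Python structure: a domain is a name, a base type, and an optional predicate. The age limits 18–65 and all names here are purely illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Domain:
    name: str                            # unique within the database
    base_type: type                      # defined over a simple data type
    condition: Callable[[object], bool]  # optional logical condition

    def contains(self, value) -> bool:
        return isinstance(value, self.base_type) and self.condition(value)

# Illustrative domains: same base type, overlapping value sets,
# but different names and different meanings -- different domains.
employee_age = Domain("EmployeeAge", int, lambda n: 18 <= n <= 65)
part_weight  = Domain("PartWeight",  int, lambda n: n >= 0)

print(employee_age.contains(30))   # True
print(employee_age.contains(90))   # False
print(employee_age.name == part_weight.name)  # False
```

Because the two domains are distinct, a layer built on top of this could refuse to compare an EmployeeAge value with a PartWeight value even though both are integers.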

The third level of the architecture is the internal level. An internal view is a low-level view of the entire database; it consists of many instances of each internal record type. The term "internal record" belongs to ANSI/SPARC terminology and refers to a construct called a stored record. The internal representation, like the external and conceptual representation, is not related to the physical layer because it does not consider the physical areas of the storage device, such as cylinders and tracks. In other words, the internal representation assumes an infinite linear address space; the details of how the address space is mapped to the physical storage device are very system-specific and are intentionally not included in the overall architecture.

The internal representation is described using an internal schema, which defines not only the different types of stored records but also the existing indexes, the way stored fields are represented, the physical sequence of stored records, etc. The internal schema is written using yet another data definition language, the internal one.

In conclusion, we note that in some exceptional situations application programs, in particular the ones called utilities, can operate directly at the internal rather than the external level. This practice is, of course, not recommended: it carries risks for security (security rules are ignored) and integrity (integrity rules are ignored too), and the program becomes dependent on the loaded data; but sometimes it may be the only way to achieve the required function or performance, just as a user of a high-level language sometimes has to resort to assembly language for the same reasons.

Applications that use databases are usually classified into one of the software architectures, which have their own pros and cons.

Databases and the software for their creation and maintenance (DBMSs) have a multi-level architecture.

There are conceptual, internal and external levels of database data representation, which correspond to models of similar purposes.

The conceptual level corresponds to the logical aspect of presenting domain data in an integrated form. The conceptual model consists of many instances of different data types, structured in accordance with the DBMS requirements for the logical structure of the database.

The internal layer represents the required organization of data in the storage environment and corresponds to the physical aspect of data presentation. The internal model consists of individual record instances physically stored in external media.

The external layer supports the private data views required by specific users. An external model is a subset of the conceptual model, and external models may overlap in the data they cover. The private logical data structure of a separate application (task) or user corresponds to an external model, or database subschema. External models support authorized access to application data: the composition and structure of the conceptual-model data available to the application is limited, and the permissible modes of processing this data are specified (input, editing, deletion, search).

New information needs, or changes in the needs of existing applications, require defining appropriate external models for them, while nothing changes at the level of the conceptual and internal data models. Changes in the conceptual model caused by the appearance of new data types or changes in structures need not affect all applications, i.e. a certain independence of programs from data is ensured. Changes in the conceptual model must be reflected in the internal model; conversely, if the conceptual model remains unchanged, the internal database model can be modified independently in order to improve its characteristics (data access time, consumption of external-device memory, etc.). Thus, the database implements the principle of relative independence of the logical and physical organization of data.

Speaking about what such a complex software product as a DBMS should be like, first of all it is necessary to clearly define the basic concept of the system, which determines all subsequent stages of its development.

The DBMS architecture must ensure, first of all, the distinction between the user and system levels;

It is necessary to give each user the opportunity to have his own idea, different from the others', of the properties of the stored data.

Thus, the initial stage of designing any specific information system should be abstract descriptions of the information needs of each user group, on the basis of which an abstract description of the stored data structures, common to the entire organization, is generated; the DBMS with which this IS will be created and maintained must provide certain capabilities for this.

Three-tier database architecture

One of the most important aspects of DBMS development is the idea of separating the logical structure of the database and the data-manipulation facilities needed by users from the physical representation required by the computer hardware.

The same database, depending on the point of view, can have different levels of description. Based on the number of levels of data description supported by the DBMS, one-, two- and three-level systems are distinguished. Currently the most common is the three-level architecture of database description (Fig. 2.1), with three levels of abstraction at which the database can be considered.

This architecture includes:

· the external level, at which users perceive the data, and where individual groups of users have their own view of the database;

· the internal level, at which the DBMS and the operating system perceive the data;

· the conceptual level of data presentation, designed to map the external level onto the internal level and to ensure their necessary independence from each other; it is associated with a generalized view of the users.

The description of a data structure at any level is called a schema. There are three different types of database schema, defined according to the abstraction levels of the three-level architecture. At the highest level there are several external schemas, or subschemas, corresponding to different views of the data. At the conceptual level the description of the database is called the conceptual schema, and at the lowest level of abstraction it is called the internal schema.

The main purpose of the three-tier architecture is to ensure data independence. The essence of this independence is that changes at lower levels do not affect the upper levels. There are two types of data independence: logical and physical.

Logical data independence means that external schemas are completely protected from changes made to the conceptual schema. Changes to the conceptual schema, such as adding or deleting entities, attributes, or relationships, should be possible without requiring changes to the existing external schemas of other user groups.

Physical data independence means that the conceptual schema is protected from changes made to the internal schema. Internal-schema changes, such as using a different file system or storage structure, different storage devices, index modification, or hashing, should be possible without changes to the conceptual or external schemas.
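Physical data independence can be illustrated with sqlite3 (a hedged sketch; the table and index are invented for the example): adding an index is an internal-schema change, yet the application's query and its result are untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT, weight INTEGER)")
conn.executemany("INSERT INTO part VALUES (?, ?, ?)",
                 [(1, "bolt", 10), (2, "nut", 5), (3, "washer", 2)])

query = "SELECT name FROM part WHERE weight > 4 ORDER BY name"
before = conn.execute(query).fetchall()

# Internal-schema change: add an index. No application query changes.
conn.execute("CREATE INDEX part_weight_idx ON part(weight)")
after = conn.execute(query).fetchall()

print(before == after)   # True: same logical answer, new physical access path
```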

The external layer is the user layer. A user can be a programmer, an end user, or a database administrator. The view of the database from the users' point of view is called an external view. Each user group identifies, in the subject area modeled for the whole organization, those entities, attributes and relationships that are of interest to it. These partial or redefined descriptions of the database, made for separate groups of users or focused on specific aspects of the problem domain, are called subschemas.

The conceptual layer is the intermediate layer in the three-layer architecture and provides an abstract representation of all database information. The description of the database at this level is called a conceptual schema, which is the result of conceptual design.

Conceptual database design involves analyzing the information needs of users and identifying the data elements they need. Thus, a conceptual schema is a single logical description of all data elements and the relationships between them, the logical structure of the entire database. There is only one conceptual schema for each database.

The conceptual schema must contain:

· entities and their attributes;

· relationships between entities;

· restrictions imposed on the data;

· semantic information about the data;

· provisions for security and data integrity.
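A hypothetical fragment of such a conceptual schema, expressed as DDL through Python's sqlite3 (the entities, attributes, the age restriction and all names are assumptions for illustration only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Entities and their attributes
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE);

CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    age     INTEGER CHECK (age BETWEEN 18 AND 65),   -- restriction on the data
    dept_id INTEGER REFERENCES department(dept_id)); -- relationship between entities
""")
conn.execute("INSERT INTO department VALUES (1, 'HR')")
conn.execute("INSERT INTO employee VALUES (1, 'Petrov', 40, 1)")
print(conn.execute("SELECT name FROM employee").fetchone())  # ('Petrov',)
```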

The internal level is the third level of the database architecture. The internal representation is not related to the physical layer, since the physical layer of information storage is highly system-specific.

At the lowest level is the internal schema, which is a complete description of the internal data model. There is only one internal schema for each database.

The internal schema describes the physical implementation of the database and is designed to achieve optimal performance and economical use of disk space. The following information is stored at the internal level:

· allocation of disk space for storing data and indexes;

· a description of the details of record storage (indicating the actual sizes of the stored data elements);

· information about record placement;

· information about data compression and selected encryption methods.

The DBMS is responsible for establishing correspondence between all three types of schemas at different levels, as well as checking their consistency.

Below the internal layer is the physical layer, which is controlled by the operating system under the guidance of the DBMS. The physical layer considers how the data is represented in the machine. It provides the physical view of the database: disk drives, physical addresses, indexes, pointers, etc. This layer is the responsibility of physical database designers, who work only with elements known to the operating system. Their areas of interest are pointers, the implementation of sequential allocation, and the methods of storing the fields of internal records on disk. However, the functions of the DBMS and the operating system at the physical level are not clearly separated and may vary from system to system.

DBMS functions

REVIEW LECTURE NOTES

For students of specialty
T1002 “Information Technology Software”

(L.V. Rudikova, Ph.D., Associate Professor)

Question 31. DBMS ARCHITECTURE. RELATIONAL DATA MODEL

1. The concept of a database.

2. Three-tier database architecture.

3. Database life cycle.

4. DBMS architecture.

5. Relational data model.

6. Design of relational databases.

7. Normal forms of relations.

8. Relational algebra.

1. The concept of a database.

A database system is any computer-based information system in which data can be shared among many applications.

Information system – an automatic system that organizes data and provides information.

Information and management system – a system that provides information support for management.

Data – scattered facts.

Information – organized and processed data.

A database is understood as a set of interconnected elementary groups of data (information) that can be processed by one or more application systems. A database system consists of a database; general-purpose software, called a database management system (DBMS), which serves to manage the database; the appropriate hardware; and people.

Each DBMS must meet the following requirements:

· provide the user with the ability to create new databases and define their schema (logical data structure) using a special language, the data definition language; support multiple views of the same data;

· allow data to be queried and changed using a query language, or data manipulation language; allow integration and sharing of data from different applications;

· support the storage of very large amounts of data, measured in gigabytes or more, over a long time, protecting it from accidental damage and unauthorized use, and provide for database modification and access to data through queries, i.e. guarantee the security and integrity of the data;

· control access to data by many users simultaneously; exclude the influence of one user's request on another's and prevent simultaneous access that could corrupt the data, i.e. ensure concurrency control of data access.
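The integrity requirement can be illustrated with a small sqlite3 sketch (the account table and its CHECK rule are invented for the example): a modification that would corrupt the data is rolled back as a whole, so the stored data is never left damaged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account (
    id INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

# A failed modification must not leave the data damaged: the whole
# transaction is rolled back, preserving integrity.
try:
    with conn:  # transaction: commits on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance - 150 WHERE id = 1")
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT balance FROM account").fetchone())  # (100,)
```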

The database system consists of the following components:

· Users, i.e. people who use data.

· Applications, i.e. user programs that require data from the system.

· The DBMS is the software that manages access to the data and provides the specified functionality of the database system.

· Data, i.e. strings stored in files.

· The host system is the computer system on which the files are stored. Data rows are accessed through the host system. The role of the DBMS is to generate queries that use the functionality of the host system's file management system to serve various applications. A DBMS is an additional layer of software built on top of the host system software.

Thus, a system with a database can be represented as the following sequence of levels:

At the lowest level is the data stored in physical files (the physical memory of the database). At the upper level are the applications with their own representations of the same physical data. Each database view is a specific logical structure built from the underlying physical data. To provide an interface between the physical memory of the database and its various logical versions (the multiple supported views), the DBMS must in turn consist of several levels.

2. Three-level database architecture.

The distinction between the logical and physical representation of data was formally recognized in 1978, when the ANSI/SPARC committee proposed a generalized structure for database systems. This structure is called the three-level architecture. Its three levels are: internal, conceptual and external.

Internal level – the level that determines the physical form of the database; it is closest to physical storage and is concerned with the methods of storing information on physical storage devices. Associated with this level are disk drives, physical addresses, indexes, pointers, etc. This level is the responsibility of physical database designers, who decide which physical devices will store the data, what access methods will be used to retrieve and update the data, and what measures should be taken to maintain or improve the performance of the database management system. Users do not deal with this level.

Conceptual level – the structural level that defines the logical schema of the database. At this level, conceptual database design is performed, which includes analysis of the information needs of users and identification of the data elements they need. The result of conceptual design is the conceptual schema, a logical description of all data elements and the relationships between them.

External level – the structural level of the database that defines user views of the data. Each user group receives its own view of the data in the database. Each such data view gives a user-centered description of the data elements that make it up and of the relationships between them. It can be derived directly from the conceptual schema. The collection of such user data views constitutes the external level.

Fig. DBMS levels: user and application views (the external level) are mapped onto the conceptual schema (the conceptual level), which in turn is mapped onto the internal level, the host system and the stored data.

3. Database life cycle.

The process of designing, implementing and maintaining a database system is called the database life cycle (LCBD). The procedure for creating a system is called the system life cycle (SLC).

Understanding the LCBD and approaching it correctly is very important and requires detailed consideration, because the life cycle is based on a data-centric approach. Data elements are more stable than the system functions performed on them. Creating a correct data structure requires a thorough analysis of the classes of data units and the relationships between them. Once a logical database schema is built, any number of functional systems using this schema can be created later. The function-oriented approach is suitable only for temporary systems designed for a short period of operation.

LCBD consists of the following stages:

1. Pre-planning – database planning carried out while developing a strategic database plan. During planning, the following information is collected:

· what application programs are used and what functions they perform;

· what files are associated with each of these applications;

· what new applications and files are in the works.

This information helps determine how application information is used and determine future requirements for the database system.

The information from this stage is documented in the form of a generalized data model.

2. Feasibility check. Here the technological, operational and economic feasibility of the database creation plan is determined, i.e.:

· technological feasibility – is the technology available to implement the planned database?

· operational feasibility – are there the funds and experts needed to successfully implement the database plan?

· economic feasibility – can the benefits be assessed? Will the planned system pay for itself? Is it possible to estimate costs and benefits?

3. Defining requirements includes selecting the database goals and clarifying the information requirements for the system as well as the hardware and software requirements. Thus, at this stage of data collection and requirements definition, a general information model is created, expressed in the following tasks:

· The goals of the system are determined by analyzing the information needs. This also necessarily indicates what kind of database should be created (distributed or integrated) and what communication tools are needed. The output document is a note describing the goals of the system.

· Determination of user requirements: documentation in the form of generalized information (notes, reports, surveys, questionnaires, etc.); recording the system functions and defining the application systems that will fulfill these requirements. The data is presented in the form of the relevant documents.

· Determination of general hardware and software requirements related to maintaining the desired level of performance. (Find out the number of system users, the number of input messages per day, the number of printouts). This information is used to select types of computers and DBMS, disk capacity, and number of printers. The data from this stage is presented in a report containing sample hardware and software configurations.

· Developing a plan for the phased creation of the system, including the selection of initial applications.

4. Conceptual design – creation of the conceptual database schema. Specifications are developed to the extent necessary for moving on to implementation.

The main output document is a single information model (or the database schema at the conceptual level). In developing this model, the information and functions that the system must support, determined at the stage of collecting and defining system requirements, are used. At this stage it is also desirable to define: 1) rules for the data; 2) rules for the processes; 3) rules for the interface.

5. Implementation – the process of turning the conceptual model into a functional database. It includes the following steps.

1) Selecting and purchasing the necessary DBMS.

2) Converting the conceptual (infological) database model into a logical and physical data model:

· based on the infological data model, a data schema is built for the specific DBMS; if necessary, the database is denormalized in order to speed up query processing in all time-critical applications;

· it is determined which application processes need to be implemented in the data schema as stored procedures;

· implement constraints designed to ensure data integrity and enforce data rules;

· design and generate triggers to implement all centrally defined data rules and data integrity rules that cannot be specified as constraints;

· develop an indexing and clustering strategy; estimate the sizes of all tables, clusters and indexes;

· determine user access levels, develop and implement security and audit rules. Create roles and synonyms to provide multi-user access with consistent levels of access permissions.

· develop a network topology of the database and a mechanism for seamless access to remote data (replicated or distributed database).
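As a hedged illustration of the constraint and trigger steps above (sqlite3; the tables and the logging rule are invented for the example): a declarative CHECK constraint enforces one data rule, while a trigger implements a centrally defined rule that cannot be expressed as a simple constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Declarative constraint enforcing a data-integrity rule.
CREATE TABLE part (id INTEGER PRIMARY KEY, weight INTEGER CHECK (weight >= 0));
CREATE TABLE part_log (part_id INTEGER, note TEXT);

-- A centrally defined data rule implemented as a trigger rather than
-- a constraint: log every newly inserted part.
CREATE TRIGGER log_part AFTER INSERT ON part
BEGIN
    INSERT INTO part_log VALUES (NEW.id, 'created');
END;
""")
conn.execute("INSERT INTO part VALUES (1, 10)")
print(conn.execute("SELECT * FROM part_log").fetchall())  # [(1, 'created')]
```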

3) Building a data dictionary that stores the definitions of the database data structures. The data dictionary also contains information about access permissions, data protection rules and data control.

4) Filling the database.

5) Creating application programs and management controls.

6) User training.

6. Evaluation and improvement of the database schema. This involves surveying users to identify unmet functional needs. Changes are made as necessary, with new programs and data elements added as needs change and expand.

Thus, the database life cycle includes:

· Studying the subject area and producing the relevant documentation (1-3).

· Construction of an information model (4).

· Implementation (5).

· Performance evaluation and database support (6).

4. DBMS architecture.



Fig. Main components of a DBMS

Data and metadata – the store contains not only the data itself but also information about its structure (metadata). In a relational DBMS, metadata includes the system tables (relations), the names of relations, the names of the attributes of those relations, and the data types of those attributes.

Often the DBMS maintains indexes on the data. An index is a data structure that helps find data elements quickly given part of their value (for example, an index that finds the tuples of a particular relation having a given value of one of the attributes). Indexes are part of the stored data, while the descriptions indicating which attributes are indexed are part of the metadata.
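The effect of an index, and its presence in the metadata, can be observed directly in SQLite. The sketch below (table and index names are illustrative) shows the query plan switching from a full table scan to an index search, and the index appearing in the system catalog:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, salary REAL)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(i, f"d{i % 10}", 100.0 * i) for i in range(1000)])

# Without an index, the search scans the whole table.
plan = con.execute("EXPLAIN QUERY PLAN SELECT * FROM emp WHERE dept = 'd3'").fetchall()
print(plan)  # the plan detail mentions a SCAN of emp

con.execute("CREATE INDEX idx_dept ON emp(dept)")
plan = con.execute("EXPLAIN QUERY PLAN SELECT * FROM emp WHERE dept = 'd3'").fetchall()
print(plan)  # now a SEARCH using idx_dept

# The index description is visible in the system catalog (metadata).
print(con.execute("SELECT name FROM sqlite_master WHERE type='index'").fetchall())
```

The data itself is unchanged by the index; only the access path chosen by the query processor differs.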

Memory manager – fetches the required information from the data store and changes the information in it at the request of higher levels of the system.

In simple database systems, the memory manager may simply be the operating system's file system. However, to improve efficiency, the DBMS usually controls memory directly. The memory manager consists of two components:

· The file manager keeps track of the location of files on disk and, at the buffer manager's request, fetches the block or blocks containing the requested files (the disk is generally divided into disk blocks – contiguous memory areas of roughly 4,000 to 16,000 bytes).

· The buffer manager manages main memory. It receives blocks of data from disk through the file manager and selects a main-memory page in which to store each block. It can keep a disk block in main memory temporarily, but writes it back to disk when its page is needed for another block. Pages are also written back to disk at the request of the transaction manager.
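The core of a buffer manager is a page-replacement policy. The toy sketch below (not any real DBMS's implementation) caches disk blocks in a fixed number of main-memory pages and evicts the least recently used block when the pool is full; the "file manager" is simulated by a plain dictionary:

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer manager: a fixed pool of main-memory pages with LRU eviction."""

    def __init__(self, n_pages, read_block):
        self.n_pages = n_pages
        self.read_block = read_block        # callback standing in for the file manager
        self.pages = OrderedDict()          # block_no -> block data, in LRU order

    def get(self, block_no):
        if block_no in self.pages:          # hit: refresh the block's LRU position
            self.pages.move_to_end(block_no)
            return self.pages[block_no]
        if len(self.pages) >= self.n_pages: # pool full: evict the least recently used
            self.pages.popitem(last=False)
        data = self.read_block(block_no)    # miss: ask the "file manager" for the block
        self.pages[block_no] = data
        return data

disk = {n: f"block-{n}" for n in range(10)}  # stand-in for blocks on disk
bm = BufferManager(2, disk.__getitem__)
bm.get(0); bm.get(1); bm.get(2)              # only 2 pages, so block 0 is evicted
print(list(bm.pages))                         # [1, 2]
```

A real buffer manager also pins pages in use and coordinates write-back with the log, which this sketch omits.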

"Request" processor - processes requests and requests changes to data or metadata. It suggests the best way to perform the required operation and issues appropriate commands to the memory manager.

The query processor (query manager) turns a query or database action expressed at a very high level (for example, an SQL query) into a sequence of requests for stored data, such as individual tuples of a relation or parts of an index on a relation. Often the hardest part of processing a query is planning it, i.e. choosing a good query plan – the sequence of requests to the memory system that answers the query.

Transaction manager – is responsible for the integrity of the system: it must allow many queries to be processed simultaneously, prevent them from interfering with one another, and protect the data in case of a system failure. It interacts with the query manager because it needs to know which data is affected by current queries (to avoid conflicts), and it may defer some queries and operations to prevent conflicts. The transaction manager also interacts with the memory manager, because data-protection schemes typically involve keeping a log of data changes. If operations are logged correctly, the log contains a record of the changes, making it possible to re-execute even those changes that did not reach the disk because of a system failure.

Typical DBMSs allow the user to group several queries and/or changes into a single transaction. A transaction is a group of operations that must be performed sequentially as a single whole.

Typically, a database system supports many transactions at once, and it is the transaction manager that ensures they are all executed correctly. Correct execution of transactions is guaranteed by the ACID properties (atomicity, consistency, isolation, durability):

· atomicity – either all of a transaction's operations are executed or none of them are (for example, withdrawing money from an ATM and debiting the client's account must form a single atomic transaction; neither operation is allowed to happen without the other);

· consistency – a state in which the data meets all expected conditions (for example, a consistency condition for an airline database is that no seat on a plane is booked for two passengers);

· isolation – when two or more transactions execute in parallel, their effects must be isolated from each other. Running two transactions at the same time must not produce a result that could not have occurred if they ran sequentially (for example, when the last seat on a flight is requested by two agents at once, one agent's request must succeed and the other's must fail);

· durability – once a transaction completes, its result must not be lost in the event of a system failure, even if the failure occurs immediately after the transaction completes.
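Atomicity can be demonstrated with the classic funds-transfer example. In the sketch below (account data is invented), a CHECK constraint forbids negative balances; when a transfer would violate it, the whole transaction is rolled back, so the debit and credit either both happen or neither does:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY,"
            " balance REAL CHECK (balance >= 0))")
con.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
con.commit()

def transfer(src, dst, amount):
    """Debit and credit as one atomic transaction."""
    try:
        with con:  # commits on success, rolls back on any exception
            con.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                        (amount, src))
            con.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                        (amount, dst))
    except sqlite3.IntegrityError:
        pass  # CHECK violated: the whole transfer is rolled back

transfer(1, 2, 30.0)    # succeeds: both updates commit together
transfer(1, 2, 500.0)   # would overdraw account 1 -> entire transfer undone
print(con.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# [(1, 70.0), (2, 80.0)]
```

Note that after the failed transfer neither account changed: money was neither lost nor created, which is exactly the consistency condition the transaction manager protects.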

Let's also consider three types of access to a DBMS:

1. Queries – questions about the data, which can be posed in two ways:

a) through a common query interface (for example, a relational DBMS accepts SQL queries, which are passed to the query processor, and returns the answers to them);

b) through an application program interface – queries are passed through a special interface (arbitrary queries cannot be issued through such an interface);

2. Modifications – operations that change the data. They too can be executed either through the common interface or through an application program interface;

3. Schema modifications – commands of the database administrator, who has the right to change the database schema or create a new database.

Client/server architecture. Much modern software follows the client/server architecture: one process (the client) sends a request to another process (the server) for execution. A DBMS is typically split into a server process and several client processes.

In the simplest client/server architecture, the entire DBMS is the server, except for the query interfaces, which interact with the user and send queries or other commands to the server. For example, a relational DBMS often uses SQL to convey requests from client to server; the database server then returns the answer to the client as a table (relation). There is a tendency to shift more of the load onto the client, since with many simultaneous database users the server can become a bottleneck.

5. Relational data model.

The relational data model (RDM) of a given subject area is a set of relations that change over time. When creating an information system, this set of relations stores data about the objects of the subject area and models the connections between them.

A relation is a two-dimensional table containing some data. Mathematically, an n-ary relation R is understood as a subset of the Cartesian product D1 × D2 × … × Dn of the sets (domains) D1, D2, …, Dn, which need not be distinct:

R ⊆ D1 × D2 × … × Dn,

where D1 × D2 × … × Dn is the complete Cartesian product, i.e. the set of all possible tuples of n elements in which each element is taken from its own domain.
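These definitions can be checked directly. The sketch below (domain contents are illustrative) builds the complete Cartesian product of two small domains with Python's itertools and treats a relation as a subset of it, also showing the degree and cardinality introduced later:

```python
from itertools import product

# Two domains, not necessarily distinct
D1 = {"Ivanov", "Petrov"}          # employee names
D2 = {101, 102}                    # department numbers

# The complete Cartesian product D1 x D2: every possible combination
full = set(product(D1, D2))
print(len(full))                   # 4 possible tuples

# A relation R is any subset of the complete Cartesian product
R = {("Ivanov", 101), ("Petrov", 102)}
assert R <= full                   # R is a subset of D1 x D2

print(len(R))                      # cardinality of R: number of tuples
print(len(next(iter(R))))          # degree (arity) of R: number of attributes, 2
```

Unlike the full product, the relation records only the combinations that actually hold in the subject area.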

A domain is a semantic concept. A domain can be thought of as a subset of the values of some data type that share a specific meaning. A domain is characterized by the following properties:

· A domain has a unique name (within the database).

· A domain is defined on some simple data type or on another domain.

· A domain may have a logical condition that describes the subset of values valid for that domain.

· The domain carries a certain semantic load.

An attribute of a relation is a pair of the form <AttributeName: DomainName>. Attribute names must be unique within a relation. Often the attribute names of a relation coincide with the names of the corresponding domains.

A relation defined on a set of domains consists of two parts: a header and a body.

The relation header is a fixed set of relation attributes:

(<A1: D1>, <A2: D2>, …, <An: Dn>).

The header describes the Cartesian product of the domains on which the relation is defined. The header is static: it does not change while the database is in use. If attributes are changed, added or deleted in a relation, the result is a different relation (even if it keeps the same name).

The relation body contains the set of tuples of the relation. Each tuple of the relation is a set of pairs of the form <AttributeName: AttributeValue>:

(<A1: v1>, <A2: v2>, …, <An: vn>),

such that each value vi belongs to the domain Di. The body of a relation is a set of tuples, i.e. a subset of the complete Cartesian product of the domains; thus the body is a relation in the mathematical sense of the word. The body can change while the database is in use: tuples can be changed, added and deleted.

A relation is usually written as

R(<A1: D1>, <A2: D2>, …, <An: Dn>),

or, more briefly, R(A1, A2, …, An), or simply R.

The number of attributes in a relation is called the degree (or arity) of the relation. The cardinality of the set of tuples of a relation is called the cardinality of the relation.

A relation schema is a list of the attribute names of a given relation together with the domains they belong to:

SR = (A1: D1, A2: D2, …, An: Dn).

If attributes take values from the same domain, they are called θ-comparable, where θ is the set of comparison operations valid for that domain. For example, if a domain contains numeric data, all comparison operations are valid for it. For domains containing character data, however, more than just equality and inequality comparisons can be specified: if a lexicographic ordering is defined on the domain, it too has the full range of comparison operations.

The schemas of two relations are called equivalent if they have the same degree and it is possible to order the attribute names in the schemas so that comparable attributes, that is, attributes taking values from the same domain, occupy the same positions:

Let SR1 be the schema of relation R1, and SR2 the schema of relation R2 after its attribute names have been reordered. If the schemas then coincide domain by domain, they are equivalent: SR1 ~ SR2.

Thus, for equivalent relations the following conditions are satisfied:

· The tables have the same number of columns.

· The tables contain columns with the same names.

· Columns with the same names contain data from the same domains.

· The tables have the same rows, but the order of the columns may vary.

All such tables are different images of the same relation.

Properties of relations. These properties follow directly from the definition of a relation given above; they are also the main differences between relations and tables.

· There are no identical tuples in a relation.

· Tuples are not ordered (top to bottom).

· Attributes are not ordered (left to right).

· All attribute values are atomic.

Fig. Schematic representation of a relation

A relational database is a set of interrelated relations. In each connection one relation acts as the main relation and another as the subordinate relation, so that one tuple of the main relation may be associated with several tuples of the subordinate relation. To support such a connection, both relations must contain the set of attributes over which they are linked. In the main relation this is its primary key, which uniquely identifies its tuples. To model the connection, the subordinate relation must carry a set of attributes matching the primary key of the main relation; here, however, this set of attributes is a foreign key, i.e. it determines the set of tuples of the subordinate relation that are linked to a single tuple of the main relation.
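A primary-key/foreign-key connection can be sketched in SQLite as follows (the department/employee tables are illustrative; note that SQLite enforces foreign keys only when the pragma is enabled):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")        # enable FK enforcement in SQLite
con.executescript("""
CREATE TABLE dept (                             -- main relation
  dept_id INTEGER PRIMARY KEY,
  name    TEXT
);
CREATE TABLE emp (                              -- subordinate relation
  emp_id  INTEGER PRIMARY KEY,
  dept_id INTEGER REFERENCES dept(dept_id)      -- foreign key
);
""")
con.execute("INSERT INTO dept VALUES (1, 'Sales')")
con.execute("INSERT INTO emp VALUES (10, 1)")   # one dept tuple ...
con.execute("INSERT INTO emp VALUES (11, 1)")   # ... linked to many emp tuples
try:
    con.execute("INSERT INTO emp VALUES (12, 99)")  # no such department
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The DBMS thus guarantees that every subordinate tuple refers to an existing tuple of the main relation.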

6. Design of relational databases.

When designing a relational database, the following problems must be solved:

1) Taking the semantics of the subject area into account, represent its objects in the best way as an abstract data model (datalogical design), i.e. decide on the database schema: what relations the database should consist of, what attributes those relations should have, and what the connections between the relations are.

2) Ensure the efficiency of executing database queries (physical database design).

After the datalogical design stage, the following documents should be produced:

· A correct data schema based on the relational data model.

· Description of the database schema in terms of the selected DBMS.

· Description of external models in terms of the selected DBMS.

· Description of declarative rules for maintaining database integrity.

· Procedures for maintaining the semantic integrity of the database.

So, the task of designing a relational database is to select a database schema from many alternative options.

A database schema is correct if there are no unwanted dependencies between relation attributes. The process of developing a correct database schema is called logical design.

Designing a database schema can be done in two ways:

· In the decomposition (partition) method, the original set of relations in the database schema is replaced by another set of relations that are projections of the original ones. The number of relations thereby increases.

· In the synthesis method, the database schema is assembled from given elementary dependencies between objects of the subject area.

Classical database design is associated with normalization theory, which is based on the analysis of functional dependencies between relation attributes. Functional dependencies capture stable relationships between objects and their properties in the subject area under consideration.

The decomposition method is a process of sequential normalization of relation schemas: each new iteration corresponds to a higher-order normal form and has better properties than the previous one. One initially assumes the existence of a universal relation containing all the attributes of the database; then, based on an analysis of the connections between attributes, the universal relation is decomposed (or an attempt is made to decompose it), i.e. replaced by several relations of lower dimension, such that the original relation can be restored by a natural join.

So, each normal form corresponds to a certain set of constraints, and a relation is in a certain normal form if it satisfies its inherent set of constraints.

In the theory of relational databases, the following normal forms are usually distinguished:

· first normal form (1NF);

· second normal form (2NF);

· third normal form (3NF);

· Boyce-Codd normal form (BCNF);

· fourth normal form (4NF);

· fifth normal form, or projection-join normal form (5NF or PJ/NF).

Basic properties of normal forms:

· each successive normal form is in some sense better than the previous one;

· when moving to the next normal form, the properties of the previous normal forms are preserved.

Database schemas are called equivalent if the contents of the source database can be obtained by a natural join of the relations in the resulting schema, with no new tuples appearing in the source database.

7. Normal forms of relationships.

The normalization process rests on an adequate reflection of the subject area in tables containing data about the modeled objects, and on the database's ability to change state over time. As a rule, a mismatch between the model and the subject area gives rise to anomalies, which appear when the corresponding operations are performed:

· Insertion anomalies (INSERT) – because heterogeneous information is stored in one relation, a fact cannot be inserted until other, unrelated facts are known.

· Update anomalies (UPDATE) – data redundancy caused by heterogeneous storage forces the same fact to be changed in many tuples at once.

· Deletion anomalies (DELETE) – deleting a tuple also removes unrelated information stored with it in the same relation.

It is also necessary to take account of undefined (NULL) values. In different DBMSs, when various operations are performed (comparison, merging, sorting, grouping, etc.), two NULL values may or may not be treated as equal to each other, and they affect in different ways the computation of averages and the counting of values. To avoid errors, many DBMSs make it possible to replace NULL values with zero in calculations, to declare all NULL values equal to each other, and so on.
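SQLite's behavior illustrates these points concretely (the table is a made-up example): a NULL compared with anything, including another NULL, yields neither true nor false; aggregates silently skip NULLs; and COALESCE substitutes a default for calculations.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# NULL = NULL is not true: the comparison itself yields NULL, so no row matches
print(con.execute("SELECT x FROM t WHERE x = NULL").fetchall())   # []
print(con.execute("SELECT x FROM t WHERE x IS NULL").fetchall())  # [(None,)]

# Aggregates skip NULLs: AVG is (1+3)/2, COUNT(x) counts non-NULL values only
print(con.execute("SELECT AVG(x), COUNT(x), COUNT(*) FROM t").fetchone())  # (2.0, 2, 3)

# COALESCE replaces NULL with a chosen default (here zero) for calculations
print(con.execute("SELECT COALESCE(x, 0) FROM t").fetchall())     # [(1,), (0,), (3,)]
```

Note the difference between COUNT(x) and COUNT(*): whether NULLs are counted depends on how the query is phrased, which is exactly the kind of subtlety the text warns about.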

Normalization is the splitting of a table into several tables that have better properties when data is updated, inserted and deleted. That is, normalization is the process of successively replacing a table with its complete decompositions until they are all in 5NF; in practice, however, it is sufficient to bring the tables to BCNF.

The normalization procedure rests on the idea that the only functional dependencies in any table should be dependencies of the form K → F, where K is the primary key and F is some other field. Therefore, during normalization, one should get rid of all "other" functional dependencies, i.e. those whose form differs from K → F.

When primary (foreign) keys are singled out during normalization, two cases should be considered:

1. The table has a composite primary key, say (K1, K2), and a field F that functionally depends on part of this key, for example on K2 (and does not depend on the full key). It is recommended to create another table containing K2 and F (with K2 as primary key) and to delete F from the original table:

replace T(K1, K2, F), primary key (K1, K2), FD K2 → F

with T1(K1, K2), primary key (K1, K2),

and T2(K2, F), primary key K2.

2. The table has a primary (candidate) key K, a field F1 that is not a candidate key but functionally depends on K, and another non-key field F2 that functionally depends on F1: K → F1 → F2. It is recommended to create a table containing F1 and F2 (with F1 as primary key) and to delete F2 from the original table. Note that to carry out such operations, one should initially have some "large" (universal) relations as input data.
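The first case can be played through in SQLite. In the illustrative table below, (student, course) is the composite key, but course_hours depends only on course; the decomposition moves it out, and a natural join restores the original relation exactly (a lossless decomposition):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Unnormalized: course_hours depends only on part of the key (course)
rows = [("Ivanov", "DB", 72), ("Petrov", "DB", 72), ("Ivanov", "Math", 90)]
con.execute("CREATE TABLE enrollment (student TEXT, course TEXT, course_hours INTEGER)")
con.executemany("INSERT INTO enrollment VALUES (?, ?, ?)", rows)

# Decomposition: move the partially dependent field into its own relation
con.executescript("""
CREATE TABLE course AS SELECT DISTINCT course, course_hours FROM enrollment;
CREATE TABLE takes  AS SELECT DISTINCT student, course FROM enrollment;
""")

# The original relation is recoverable by a natural join
back = con.execute("""
  SELECT t.student, t.course, c.course_hours
  FROM takes t JOIN course c ON t.course = c.course
  ORDER BY t.student, t.course
""").fetchall()
print(back)  # the same three rows as the original
```

After the decomposition, the hours of a course are stored once, so updating them no longer requires touching every enrollment row: the update anomaly disappears.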

Def. 1. A relation is in first normal form (1NF) if and only if each of its fields contains exactly one value in every row and none of the key fields of the relation is empty.

By Definition 1, any relation satisfying the properties of relations is in 1NF: there are no identical tuples; tuples are not ordered; attributes are not ordered and are distinguished by name; all attribute values are atomic.

Def. 2. A relation is in second normal form (2NF) if and only if it is in 1NF and no non-key attribute depends on only part of a composite key (i.e. every field not included in the primary key is fully functionally dependent on the primary key).

If the candidate key is simple (a single attribute), the relation is automatically in 2NF.

To eliminate the dependence of attributes on part of a composite key, the relation must be decomposed into several relations: the attributes that depend on part of the composite key are moved into a separate relation.

Attributes of a relation are called mutually independent if neither of them is functionally dependent on the other.

Def. 3. A relation is in third normal form (3NF) if and only if it is in 2NF and all its non-key attributes are mutually independent (that is, no non-key field functionally depends on any other non-key field).

To eliminate dependencies between non-key attributes, the relation must be decomposed into several relations; the dependent non-key attributes are moved into a separate relation.

When relations are reduced to 3NF by the normalization algorithm, it is assumed that each relation contains a single candidate key. This is not always true: a relation may contain several candidate keys.

Def. 4. A relation is in Boyce-Codd normal form (BCNF) if and only if the determinants of all its functional dependencies are candidate keys (or, equivalently, if every functional dependency between its fields reduces to a full functional dependency on a candidate key).

If a relation is in BCNF, then it is automatically in 3NF, as follows from Definition 4. To eliminate dependence on determinants that are not candidate keys, the relation should be decomposed, moving those determinants and the parts that depend on them into a separate relation.

There are cases when a relation contains no functional dependencies at all, i.e. the relation is all-key: its key is the entire set of attributes. Even then a multivalued dependency may hold, because some relationship between the attributes still exists.

Def. 5. A relation is in fourth normal form (4NF) if and only if it is in BCNF and contains no non-trivial multivalued dependencies.

Relations with non-trivial multivalued dependencies typically arise from the natural join of two relations over a common field that is not a key in either of them. In effect, this stores information about two independent entities in one relation.

To eliminate non-trivial multivalued dependencies, you can decompose the original relation into several new ones.

Def. 6. A relation is in fifth normal form (5NF) if and only if every join dependency in it is trivial.

An equivalent formulation follows from Definition 6:

Def. 7. A relation is not in 5NF if it has a non-trivial join dependency.

Thus, if in every complete decomposition all projections of the original relation contain a candidate key, we can conclude that the relation is in 5NF. A relation that has no complete decomposition at all is also in 5NF.

Without knowing what candidate keys exist in a relation and how its attributes are interrelated, one cannot assert that the relation is in 5NF or in any other normal form.

A candidate key of a relation is a set of its attributes that completely and uniquely (fully functionally) determines the values of all its other attributes. In general, a relation may have several candidate keys. Among all the candidate keys of a relation, one is usually chosen as the main one, and it is called the primary key of the relation.

Mutually independent attributes are attributes that do not depend on one another. If a relation contains several functional dependencies, then each attribute or set of attributes on which another attribute depends is called a determinant of the relation.

9. Relational algebra.

Relational algebra provides a framework for accessing relational data. Its main purpose is to provide expressions that can be written down. Such expressions can be used for:

· defining the scope of retrieval, i.e. the data to be selected by a retrieval operation;

· defining the scope of updates, i.e. the data to be inserted, modified or deleted by an update operation;

· defining (named) virtual relations, i.e. presenting data for visualization through views;

· defining snapshots, i.e. the data to be stored as a "snapshot" of a relation;

· defining security rules, i.e. the data to which access is controlled;

· defining stability requirements, i.e. the data included in the scope of certain concurrency-control operations;

· defining integrity rules, i.e. special conditions that the database must satisfy, alongside the general rules that are part of the relational model and apply to every database.

In implementations of actual relational DBMSs, neither relational algebra nor relational calculus is currently used in pure form. The de facto standard for accessing relational data is the SQL language (Structured Query Language).

Relational algebra, as defined by Codd, consists of 8 operators in 2 groups:

  • traditional set operations (union, intersection, subtraction, Cartesian product);
  • special relational operations (selection, projection, join, division).

In addition, the algebra includes an assignment operation, which allows the results of algebraic expressions to be saved in the database, and an attribute-renaming operation, which makes it possible to correctly form the header (schema) of the resulting relation.

A brief overview of relational algebra operators.

Selection – returns a relation that contains all tuples of a given relation satisfying some condition. The selection operation is also called restriction (restrict; the term selection, as in SELECT, is now more common).

Projection – returns a relation containing all the tuples (more precisely, sub-tuples) of a given relation after certain attributes have been excluded from it.

Product – returns a relation containing all possible tuples that are combinations of two tuples belonging respectively to the two given relations.

Union – returns a relation containing all tuples that belong to either or both of the two given relations.

Intersection – returns a relation containing all tuples that belong to both of the two given relations.

Subtraction – returns a relation containing all tuples that belong to the first of the two given relations but not to the second.

Join (natural) – returns a relation whose tuples are combinations of two tuples (belonging respectively to the two given relations) that have a common value for one or more common attributes of the two relations (such common values appear only once in the resulting tuple, not twice).

Division – for two relations, a binary and a unary one, returns a relation containing all values of one attribute of the binary relation that match (in the other attribute) all the values in the unary relation.
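Several of these operators are small enough to implement directly over in-memory relations. The sketch below (relation contents are invented) represents a relation as a header (list of attribute names) plus a body (set of tuples) and implements selection, projection and natural join:

```python
# A relation is modeled as (header, body): attribute names plus a set of tuples.

def select(header, body, pred):
    """Selection: keep the tuples that satisfy a condition."""
    return header, {t for t in body if pred(dict(zip(header, t)))}

def project(header, body, attrs):
    """Projection: keep only the listed attributes (duplicate tuples collapse)."""
    idx = [header.index(a) for a in attrs]
    return attrs, {tuple(t[i] for i in idx) for t in body}

def natural_join(h1, b1, h2, b2):
    """Natural join: combine tuples that agree on all common attributes;
    each common attribute appears only once in the result."""
    common = [a for a in h1 if a in h2]
    h = h1 + [a for a in h2 if a not in h1]
    out = set()
    for t1 in b1:
        for t2 in b2:
            d1, d2 = dict(zip(h1, t1)), dict(zip(h2, t2))
            if all(d1[a] == d2[a] for a in common):
                merged = {**d1, **d2}
                out.add(tuple(merged[a] for a in h))
    return h, out

emp = (["name", "dept"], {("Ivanov", 101), ("Petrov", 102)})
dept = (["dept", "city"], {(101, "Minsk"), (102, "Gomel")})

h, b = natural_join(*emp, *dept)
print(h, b)                       # joined over the common attribute 'dept'
h2, b2 = project(h, b, ["city"])
print(b2)                         # {('Minsk',), ('Gomel',)}
```

Because bodies are Python sets, the relational properties (no duplicate tuples, no tuple order) hold automatically, mirroring the definition of a relation given earlier.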


Software designed to work with databases is called a database management system (DBMS). DBMSs are used for the orderly storage and processing of large volumes of information.

The DBMS organizes the storage of information in such a way that it is convenient:

    browse,

    replenish,

    change,

    look for the information you need,

    make any selections

    sort in any order.

Main functions of the DBMS:

    data management in external memory (on disks);

    managing data in RAM using disk cache;

    logging changes, backing up and restoring the database after failures;

    support for database languages ​​(data definition language, data manipulation language).

Typically, a modern DBMS contains the following components:

    core, which is responsible for managing data in external and main memory and for journaling,

    a database-language processor, which optimizes queries for retrieving and changing data and, as a rule, produces machine-independent executable internal code,

    a run-time support subsystem, which interprets the data-manipulation programs that create the user interface to the DBMS,

    as well as service programs (external utilities) that provide a number of additional capabilities for maintaining the information system.

DBMS classifications

According to the data model:

    Hierarchical

  • Relational

    Object-oriented

    Object-relational

Three levels of a DBMS:

Fig.1.4. Data presentation levels

The conceptual level of the ANSI/SPARC architecture supports a single view of the database, common to all its applications and independent of them. The conceptual level is a formalized information-logical model of the subject area. The description of this representation is called the conceptual schema.

The internal level of the architecture supports the presentation of the database in the storage environment – the stored database. At this level the database is represented in fully "materialized" form, whereas at the other levels work proceeds at the level of individual records or sets of records. The description of the database at the internal level is called the internal schema or storage schema.

The external level of the database architecture serves different groups of users. Descriptions of such representations are called external schemas. A database system can simultaneously support several external schemas for different user groups or tasks.

The set of schemas at all levels is called the database schema.

Each of these levels can be considered managed if it has an external interface supporting data-definition capabilities. In that case it becomes possible to form and maintain an independent view of the database for any group of personnel or users interacting with the database through that level's interface.

The ANSI/SPARC architectural model assumes that the DBMS contains mechanisms providing the inter-level data mappings "external – conceptual" and "conceptual – internal". These mechanisms provide data abstraction and determine the degree of data independence at all levels.
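The "external – conceptual" mapping is commonly realized through views. The sketch below (table, column and view names are illustrative) defines one conceptual relation and two external schemas over it, echoing the work-distribution and HR examples from the beginning of the chapter: each application sees only the attributes it needs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Conceptual level: the full employee relation
con.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,"
            " qualification TEXT, salary REAL, phone TEXT)")
con.execute("INSERT INTO employee VALUES (1, 'Ivanov', 'engineer', 1500, '555-01')")

# External level: each application's view exposes only what it needs
con.execute("CREATE VIEW work_distribution AS"
            " SELECT id, name, qualification FROM employee")
con.execute("CREATE VIEW hr_contacts AS"
            " SELECT id, name, salary, phone FROM employee")

print(con.execute("SELECT * FROM work_distribution").fetchall())  # no salary visible
print(con.execute("SELECT * FROM hr_contacts").fetchall())
```

If the conceptual schema later gains new attributes, the views (and the applications built on them) are unaffected, which is exactly the data independence the ANSI/SPARC mappings are meant to provide.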






