Typical organization of a modern DBMS


Naturally, the organization of a typical DBMS and the composition of its components correspond to the set of functions we have considered. Let us recall that we identified the following main functions of a DBMS:

· data management in external memory;

· management of RAM buffers;

· transaction management;

· logging and database recovery after failures;

· maintaining database languages.

Logically, a modern relational DBMS can be divided into an innermost part, the DBMS kernel (often called the Data Base Engine), the database language compiler (usually SQL), the runtime support subsystem, and a set of utilities. In some systems these parts are distinguished explicitly, in others they are not, but logically such a division can be drawn in any DBMS.

The DBMS kernel is responsible for managing data in external memory, managing RAM buffers, managing transactions, and logging. Accordingly, one can distinguish such kernel components (at least logically, although some systems make them explicit) as the data manager, the buffer manager, the transaction manager, and the log manager. As should be clear from the first part of this lecture, the functions of these components are interrelated, and to ensure correct operation of the DBMS they must all interact according to carefully designed and verified protocols. The DBMS kernel has its own interface, which is not directly accessible to users and is used by the programs produced by the SQL compiler (or by the subsystem supporting the execution of such programs) and by the database utilities. The kernel is the main resident part of the DBMS; in a client-server architecture it is the main component of the server part of the system.

The main function of a database language compiler is to compile database language statements into some executable program. The main problem of relational DBMSs is that the languages of these systems (usually SQL) are non-procedural: a statement of such a language specifies some action on the database, but this specification is not a procedure; it only describes, in some form, the conditions for performing the desired action (recall the examples from the first lecture). The compiler must therefore decide how to execute a statement before producing a program. Quite sophisticated statement optimization methods are applied, which we will consider in detail in later lectures. The result of compilation is an executable program, represented in some systems in machine code but more often in an executable, internal, machine-independent code. In the latter case the statement is actually executed with the involvement of the runtime support subsystem, which is in essence an interpreter of this internal language.
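A short illustration of this gap between a declarative statement and the plan the compiler must construct may help. The sketch below is PostgreSQL-flavoured SQL with hypothetical table and column names; EXPLAIN (whose exact syntax varies between systems) displays the executable plan the optimizer chose:

```sql
-- A declarative request: it states WHAT to retrieve, not HOW.
SELECT e.emp_name, d.dept_name
FROM employees e
JOIN departments d ON d.dept_no = e.dept_no
WHERE e.emp_salary > 50000;

-- Most DBMSs let you inspect the access plan the optimizer chose
-- for the statement (PostgreSQL syntax shown; other systems differ):
EXPLAIN
SELECT e.emp_name, d.dept_name
FROM employees e
JOIN departments d ON d.dept_no = e.dept_no
WHERE e.emp_salary > 50000;
```

The same statement may be executed by an index scan, a full table scan, or different join methods; choosing among them is exactly the optimization work described above.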

Finally, separate database utilities usually implement procedures that would be too expensive to perform through the database language, for example loading and unloading a database, gathering statistics, global database integrity checking, and so on. Utilities are programmed using the DBMS kernel interface, and sometimes even reach inside the kernel.

2.3. Example: System R

The main goals of the System R developers were the following:

· provide a high-level, non-navigational user interface to the system, making it possible to achieve data independence and to let users work as productively as possible;

· provide a variety of acceptable ways to use the DBMS, including programmable transactions, interactive transactions and report generation;

· support a dynamically changing database environment, in which relations, indexes, views, transactions, and other objects can easily be added and destroyed without interrupting the normal functioning of the system;

· provide the ability for many users to work in parallel with one database, allowing for parallel modification of database objects, provided that the necessary means of protecting the integrity of the database are available;

· provide a means of restoring a consistent database state after various kinds of hardware or software failures;

· provide a flexible mechanism for defining different views of the stored data, and for restricting users' selection and modification of the database to these views through an authorization mechanism;

· ensure, when performing the functions above, performance comparable to that of existing low-level DBMSs.

The structural organization of System R fully matches the goals set during its development. Its main structural components are the Relational Storage System (RSS) and the SQL query compiler. RSS provides a fairly low-level interface for access to the stored database, but one sufficient for implementing SQL. Transaction synchronization, change logging, and database recovery after failures are also among the functions of RSS. The query compiler uses the RSS interface to access various reference information (catalogs of relations, indexes, access rights, integrity constraints, conditional actions, etc.) and produces working programs that are then executed, again through the RSS interface. The system thus naturally splits into two levels: the level of memory and synchronization management, which in fact does not depend on the system's query language, and the language (SQL) level, at which most of System R's problems are solved. Note that this independence is more conditional than absolute: the SQL language could be replaced by another, but it would have to have roughly the same semantics.



Early approaches to database organization: systems based on inverted lists, hierarchical and network DBMSs

Strengths and weaknesses of early systems

Before moving on to a detailed and systematic study of relational database systems, let us dwell briefly on early (pre-relational) DBMSs. This makes sense for three reasons: first, these systems historically preceded relational ones, and to understand the reasons for the widespread move to relational systems one needs to know at least something about their predecessors; second, the internal organization of relational systems is largely based on methods used in the early systems; third, some knowledge of the early systems is useful for understanding the development of post-relational DBMSs.

In this lecture we restrict ourselves to general approaches to the organization of three types of early systems: inverted list systems, hierarchical DBMSs, and network DBMSs. We will not touch on the features of any specific systems; that would involve many technical details which, although interesting, lie somewhat aside from the main goal of this course. Details can be found in the recommended literature.

Let us start with the most general characteristics of early systems:

a) These systems were in active use for many years, longer than any relational DBMS has existed. In fact, some of the early systems are in use even today; huge databases have accumulated in them, and one of the pressing problems of information systems is using them in conjunction with modern systems.

b) Early systems were not based on any abstract model. The concept of a data model actually came into use among database specialists only together with the relational approach. Abstract descriptions of early systems appeared later, based on analysis and identification of features common to the various specific systems.

c) In early systems, database access was performed at the record level. Users of these systems performed explicit navigation through the database, using programming languages extended with DBMS functions. Interactive access to the database was supported only by writing appropriate application programs with their own interfaces.

d) One can say that the level of early DBMS facilities relates to the level of file systems roughly as the Cobol language relates to Assembly language. (On this view, the level of relational systems corresponds to the level of the Ada or APL languages.)

e) The navigational nature of early systems and record-level data access forced the user to perform all optimization of database access, without any support from the system.

f) After the advent of relational systems, most early systems were equipped with "relational" interfaces. In most cases, however, this did not make them truly relational systems, since it remained possible to manipulate the data in the original, navigational way.

3.1. Main features of systems based on inverted lists

Among the best-known and most typical representatives of such systems are Datacom/DB from Applied Data Research, Inc. (ADR), aimed at IBM mainframes, and Adabas from Software AG.

Organizing data access on the basis of inverted lists is used in virtually all modern relational DBMSs, but in those systems users do not have direct access to the inverted (index) lists. Incidentally, when we examine the internal interfaces of relational DBMSs, you will see that they are quite close to the user interfaces of systems based on inverted lists.

3.1.1. Data structures

A database organized with inverted lists resembles a relational database, with the difference that the stored tables and the access paths to them are visible to the users. In such a database:

a) The rows of tables are ordered by the system in some physical sequence.

b) The physical ordering of the rows of all tables can be defined for the entire database (this is done, for example, in Datacom/DB).

c) For each table, an arbitrary number of search keys can be defined, for which indexes are built. The indexes are maintained automatically by the system, but are fully visible to users.

3.1.2. Data manipulation

Two classes of operators are supported:

a) operators that establish the address of a record, including direct search operators (for example, find the first record of a table along some access path) and operators that find a record by its position relative to a previous record along some access path;

b) operators over addressed records.

A typical operator set:

LOCATE FIRST - find the first record of table T in physical order; returns the record's address;

LOCATE FIRST WITH SEARCH KEY EQUAL - find the first record of table T with the given value of search key K; returns the record's address;

LOCATE NEXT - find the record following the record with the given address along the given access path; returns the record's address;

LOCATE NEXT WITH SEARCH KEY EQUAL - find the next record of table T in the order of the path through search key K with the given value of K (the scanning method in use must match the key K); returns the record's address;

LOCATE FIRST WITH SEARCH KEY GREATER - find the first record of table T, in the order of search key K, whose key field value is greater than the given value of K; returns the record's address;

RETRIEVE - fetch the record at the given address;

UPDATE - update the record at the given address;

DELETE - delete the record at the given address;

STORE - insert a record into the given table; the operation generates and returns the record's address.
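The record-at-a-time style of these operators survives in relational DBMSs in the form of SQL cursors. A minimal sketch in PostgreSQL-flavoured SQL (the table and column names are invented for illustration); unlike LOCATE, here the system rather than the programmer chooses the access path:

```sql
BEGIN;
-- Declarative analogue of "LOCATE FIRST WITH SEARCH KEY GREATER":
DECLARE emp_cur CURSOR FOR
    SELECT emp_no, emp_name
    FROM employees
    WHERE emp_no > 1500        -- the "search key" condition
    ORDER BY emp_no;           -- the order of the search key

FETCH NEXT FROM emp_cur;       -- analogue of LOCATE NEXT + RETRIEVE
FETCH NEXT FROM emp_cur;       -- and the record after it
CLOSE emp_cur;
COMMIT;
```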

3.1.3. Integrity constraints

No general database integrity rules are defined. Some systems support uniqueness constraints on the values of certain fields, but in general everything is left to the application program.

3.2. Hierarchical systems

A typical representative (the best known and most widespread) is IBM's Information Management System (IMS). The first version appeared in 1968. Many databases built on it are supported to this day, which creates significant problems for migrating both to new database technology and to new hardware.

3.2.1. Hierarchical data structures

A hierarchical database consists of an ordered set of trees; more precisely, of an ordered set of multiple instances of a single tree type.

A tree type consists of a single root record type and an ordered set of zero or more subtree types, each of which is itself a tree type. A tree type is thus a hierarchically organized collection of record types.

The original gives an example of a hierarchical database schema (a tree type): Department is the ancestor of Chief and Employees, and Chief and Employees are the children of Department. Links are maintained between record types.

A database with such a schema consists of tree instances (the original shows one such instance). All instances of a given descendant type that share a common instance of the ancestor type are called twins.

The complete traversal order is defined for the database - top-to-bottom, left-to-right.

IMS used its own non-standard terminology: "segment" instead of "record", while a "database record" meant an entire tree of segments.

3.2.2. Data manipulation

Typical operators for manipulating hierarchically organized data include:

- find the specified tree in the database (for example, department 310);
- move from one tree to another;
- move from one record to another within a tree (for example, from a department to its first employee);
- move from one record to another in hierarchy traversal order;
- insert a new record at a specified position;
- delete the current record.

3.2.3. Integrity constraints

Referential integrity between ancestors and descendants is maintained automatically.

The basic rule is that no child can exist without its parent.

Note that similar integrity maintenance is not supported for links between records that do not belong to the same hierarchy (an example of such an external link would be the contents of a Department Number field in an instance of a Curator record type). Hierarchical systems supported some form of database views based on restricting the hierarchy; an example view of the database above would be such a restricted hierarchy.
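In relational terms, this ancestor-descendant rule corresponds to a foreign key with cascading delete. A minimal sketch in standard SQL, reusing the Department/Employees example above (the column names are assumed for illustration):

```sql
CREATE TABLE departments (
    dept_no   INTEGER PRIMARY KEY,
    dept_name VARCHAR(50) NOT NULL
);

CREATE TABLE employees (
    emp_no   INTEGER PRIMARY KEY,
    emp_name VARCHAR(50) NOT NULL,
    -- "No child without a parent": every employee row must reference
    -- an existing department, and deleting a department deletes its
    -- employees, as a hierarchical DBMS would.
    dept_no  INTEGER NOT NULL
             REFERENCES departments(dept_no) ON DELETE CASCADE
);
```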

3.3. Network systems

A typical representative is the Integrated Database Management System (IDMS) from Cullinet Software, Inc., designed for IBM mainframes running most of their operating systems.

The system's architecture is based on proposals of the Data Base Task Group (DBTG) of the CODASYL (Conference on Data Systems Languages) Programming Language Committee, the organization responsible for defining the Cobol programming language. The DBTG report was published in 1971, and several systems, including IDMS, appeared in the 1970s.

3.3.1. Network data structures

The network approach to data organization is an extension of the hierarchical approach.

In hierarchical structures a child record has exactly one ancestor; in a network data structure a child record can have any number of ancestors. A network database consists of a set of records and a set of links between these records; more precisely, of a set of instances of each record type from the set of record types specified in the database schema, and of a set of instances of each link type from the specified set of link types.

A link type is defined for two record types: an ancestor and a descendant. An instance of a link type consists of one instance of the ancestor record type and an ordered set of instances of the descendant record type. For a given link type L with ancestor record type P and descendant record type C, the following two conditions must hold: each instance of type P is the ancestor in only one instance of L; each instance of type C is the descendant in at most one instance of L.

No special restrictions apply to the formation of link types; for example, the following situations are possible:

a) a record type that is the descendant in one link type L1 can be the ancestor in another link type L2 (as in a hierarchy);

b) a given record type P can be the ancestor record type in any number of link types;

c) a given record type P can be the descendant record type in any number of link types;

d) there can be any number of link types with the same ancestor record type and the same descendant record type; and if L1 and L2 are two link types with the same ancestor record type P and the same descendant record type C, the rules by which the links are formed may differ between them;

e) record types X and Y can be ancestor and descendant in one link and descendant and ancestor in another;

f) the ancestor and the descendant can be of the same record type.

(The original illustrates this with a simple example of a network database schema.)

3.3.2. Data manipulation

An approximate set of operations:

- find a specific record in a set of records of the same type (find engineer Sidorov);
- move from an ancestor to the first descendant along some link (to the first employee of department 310);
- move to the next descendant along some link (from Sidorov to Ivanov);
- move from a descendant to the ancestor along some link (find Sidorov's department);
- create a new record;
- destroy a record;
- modify a record;
- include a record in a link;
- exclude a record from a link;
- move a record to another link; and so on.

3.3.3. Integrity constraints

In principle none are required to be maintained, though referential integrity (as in the hierarchical model) is sometimes supported.

3.4. Advantages and disadvantages

Strengths of the early DBMSs:

- well-developed low-level tools for managing data in external memory;
- the ability to build efficient application systems by hand;
- the ability to save memory by sharing subobjects (in network systems).

Disadvantages:

- too difficult to use;
- knowledge of the physical organization is effectively required;
- application systems depend on this organization, and their logic is overloaded with the details of organizing database access.

Theoretical foundations

We now begin the study of relational databases and relational database management systems.

This approach is the most common at present, although along with generally recognized advantages it also has a number of disadvantages.

The advantages of the relational approach include:

- a small set of abstractions that allow relatively simple modeling of most common subject areas and permit precise formal definitions while remaining intuitive;
- a simple and at the same time powerful mathematical apparatus, based mainly on set theory and mathematical logic, which provides the theoretical foundation of the relational approach to database organization;
- the possibility of non-navigational data manipulation without the need to know the specific physical organization of the database in external memory.

Relational systems did not immediately become widespread.

Although the main theoretical results in this area were obtained back in the 1970s, when the first prototypes of relational DBMSs also appeared, for a long time an efficient implementation of such systems was considered unattainable.

However, the advantages noted above, together with the gradual accumulation of methods and algorithms for organizing and managing relational databases, led to relational systems practically displacing the early DBMSs from the world market by the mid-1980s. Currently, the main target of criticism of relational DBMSs is not their lack of efficiency but the limitations inherent in these systems (a direct consequence of their simplicity) when they are used in so-called non-traditional areas; the most common examples are design automation systems, which require extremely complex data structures.

Another frequently noted disadvantage of relational databases is their inability to adequately reflect the semantics of the subject area.

In other words, the ability to represent knowledge about the semantic specifics of a domain is very limited in relational systems. Modern research into post-relational systems is mainly devoted to eliminating these shortcomings.

General concepts of the relational approach to database organization: basic concepts and terms

In this lecture we introduce, at a relatively informal level, the basic concepts of relational databases and define the essence of the relational data model.

The main purpose of the lecture is to demonstrate the simplicity of these concepts and the possibility of interpreting them intuitively. Later lectures will give the more formal definitions on which the mathematical theory of relational databases is based.

4.1. Basic concepts of relational databases

The basic concepts of relational databases are data type, domain, attribute, tuple, primary key, and relation.

To begin with, let us illustrate these concepts using the example of an EMPLOYEES relation, which contains information about the employees of some organization.

4.1.1. Data type

The concept of a data type in the relational data model is fully adequate to the concept of a data type in programming languages.

Typically, modern relational databases allow the storage of character and numeric data, bit strings, specialized numeric data (such as money), and special temporal data (date, time, time interval). Extending relational systems with abstract data types is an actively developing area (the systems of the Ingres/Postgres family, for example, have such capabilities). In our example we deal with three data types: character strings, integers, and money.

4.1.2. Domain

The concept of a domain is more specific to databases, although it has some analogies with subtypes in certain programming languages.

In its most general form, a domain is defined by specifying some basic data type to which the elements of the domain belong, together with an arbitrary logical expression applied to elements of that data type.

If the evaluation of this Boolean expression yields true, the data element is an element of the domain. The most accurate intuitive interpretation of a domain is as the admissible potential set of values of a given type. For example, the Names domain in our example is defined over a basic type of character strings, but its values may include only those strings that can represent a name (in particular, such strings cannot begin with a soft sign). Note also the semantic significance of the domain concept: data are considered comparable only when they belong to the same domain.

In our example, the values of the domains Pass Numbers and Group Numbers are both of type integer, but they are not comparable. Note that most relational DBMSs do not use the concept of a domain, although Oracle V.7 already supports it.
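Where domains are supported, they are declared roughly as follows. The sketch uses PostgreSQL's CREATE DOMAIN syntax, and the CHECK conditions are only illustrative stand-ins for the rules described above:

```sql
-- A domain = a base type + a Boolean condition on its values.
CREATE DOMAIN names AS VARCHAR(50)
    CHECK (VALUE ~ '^[A-Za-z]');   -- e.g. a name must start with a letter

CREATE DOMAIN pass_numbers  AS INTEGER CHECK (VALUE > 0);
CREATE DOMAIN group_numbers AS INTEGER CHECK (VALUE > 0);
-- Both are integers, but attributes declared over different domains
-- are, conceptually, not comparable with one another.
```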

4.1.3. Relation schema, database schema

A relation schema is a named set of pairs (attribute name, domain name), or (attribute name, type name) if the concept of a domain is not supported. The degree, or arity, of a relation schema is the cardinality of this set. The degree of the EMPLOYEES relation is four; that is, it is 4-ary. If all attributes of one relation are defined over different domains, it is convenient to use the names of the corresponding domains as attribute names (not forgetting, of course, that this is merely a convenient naming convention and does not remove the distinction between domain and attribute). A database schema, in the structural sense, is a set of named relation schemas.

4.1.4. Tuple, relation

A tuple corresponding to a given relation schema is a set of pairs (attribute name, value) containing exactly one occurrence of each attribute name from the schema. The value must be a valid value of the attribute's domain (or data type, if domains are not supported). Thus, the degree (arity) of a tuple, i.e. the number of elements in it, coincides with the arity of the corresponding relation schema.

Simply put, a tuple is a collection of named values of given types. A relation is a set of tuples corresponding to one relation schema.

Sometimes, to avoid confusion, one speaks of a relation schema and a relation instance; the relation schema is also called the head of the relation, and the relation as a set of tuples its body. In fact, the concept of a relation schema is closest to the concept of a structured data type in programming languages.

It would be quite logical to allow a relation schema to be defined separately, and then one or more relations with that schema. However, this is not common practice in relational databases: the name of the relation schema always coincides with the name of the corresponding relation instance. In classical relational databases, once the database schema is defined, only the relation instances change: new tuples may appear in them, and existing tuples may be deleted or modified.

However, many implementations also allow the database schema to be changed: new relation schemas can be defined and existing ones modified. This is commonly called database schema evolution. The usual everyday representation of a relation is a table whose header is the relation schema and whose rows are the tuples of the relation instance; the attribute names then name the columns of the table. Hence one sometimes says "table column", meaning "relation attribute". When we move on to the practical questions of organizing relational databases and their management tools, we will use this everyday terminology.

This terminology is followed in most commercial relational DBMSs. A relational database is a set of relations whose names coincide with the names of the relation schemas in the database schema. As you can see, the main structural concepts of the relational data model (with the exception of the domain concept) have a very simple intuitive interpretation, although in relational database theory they are all defined completely formally and precisely.
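In SQL terms, the 4-ary EMPLOYEES relation discussed above might be declared as follows; the attribute names and the choice of NUMERIC for the money type are assumptions made for illustration:

```sql
CREATE TABLE employees (
    emp_no     INTEGER PRIMARY KEY,   -- primary key attribute
    emp_name   VARCHAR(50) NOT NULL,  -- character-string attribute
    emp_salary NUMERIC(12, 2),        -- "money" attribute
    group_no   INTEGER                -- group number attribute
);
-- The schema (head) is the set of four attribute/type pairs;
-- the rows inserted later form the instance (body) of the relation.
INSERT INTO employees VALUES (2934, 'Ivanov', 22400.00, 310);
```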

Methods used to solve the problem. Delphi was chosen as the main development tool for this project. The open architecture of Delphi: in developing its object-oriented tools, Borland clearly concluded that code reuse and object orientation are not the only means of increasing programmer productivity. With the advent of Delphi, a developer can not only create ready-to-use components and give them to colleagues, but can also extend the functionality of the environment itself using so-called open interfaces. This approach makes it possible to use Delphi as the common core of a tool set at all stages of building application systems, from CASE systems to generating documentation for the projects created, with full integration into the holy of holies of any programming environment, the IDE. Let us look at the main options for extending the functionality of the Delphi environment in order to assess how open the architecture of this tool is.

The building blocks of applications are components. As is well known, the foundation of Delphi's visual tools is the component approach.

What does it consist of? Delphi is built around a compiler for the object-oriented language Object Pascal, which continues the line of Pascal dialects Turbo Pascal and Borland Pascal. As it evolved, each Borland Pascal implementation added new syntax extensions reflecting the latest advances in programming languages.

Assessing the qualitative stages of Pascal's development, three should be noted in particular, all aimed at supporting the concept of code reuse:

- a modular architecture, with the ability to separate the interface and implementation parts of a module (Turbo Pascal 4.0);
- object orientation, with all its inherent characteristics: inheritance, encapsulation and polymorphism (Turbo Pascal 5.5);
- support for RTTI (Run-Time Type Information) mechanisms, which allow information about the basic characteristics of object class types and their instances to be obtained using language facilities built directly into the system library and into the structure of class descriptions (Delphi 1.0, Object Pascal).

A consequence of introducing RTTI support was the possibility of creating a visual application development tool, which is what Delphi is.

At a certain level of the inheritance hierarchy of Delphi's base class library, the TPersistent class appears, providing the necessary level of abstraction for streaming instances of its descendant classes. Its successor is the TComponent class, which defines the basic behavior of Delphi VCL (Visual Component Library) components at design time.


Database management systems

File systems.

The first AIS (automated information systems) appeared in the 1960s. They were based on file systems: sets of programs, each designed to solve a particular problem over its own files. Over time, the following disadvantages of file systems became apparent:

1) Separation and isolation of data. Data is isolated in separate files, so retrieving information requires significant effort and often synchronized processing of several files.

2) Data duplication. Files of different systems could overlap, i.e. contain the same data. This led to wasteful use of disk memory or to violations of data integrity (for example, information about an employee maintained by the human resources department and by the accounting department may become contradictory).

3) Data dependence. Programs in an algorithmic language contain descriptions of the data; when the data structure changed, the source texts of the programs had to be changed as well.

4) Incompatibility of file formats. Since a file's structure is determined by the application code, it also depends on the programming language of that application. For example, the structure of a file created by a COBOL program may be completely different from that of a file created by a C program. Such direct incompatibility makes joint processing of the files difficult.

5) Fixed queries and rapid growth in the number of applications. As users work, their requirements for new queries over the data stored in files constantly grow. Each query is implemented by a programmer as an application, which accordingly increases the number of applications. This process often violates security or data-integrity measures, provides no recovery facilities in the event of hardware or software failure, and access to the files is often limited to a single user.

History of development.

Developers (and users) realized that building systems on files alone was too expensive, and began looking for ways to solve the resulting problems. For this purpose a hierarchical data model was developed first. In the mid-1960s, IBM, together with NAA (North American Aviation, now Rockwell International), developed the first DBMS: the hierarchical IMS (Information Management System). The essence of a hierarchical system is as follows: there is a node, a collection of data describing an object, and all nodes are connected to one another in a strictly hierarchical order. The system had its advantages (simplicity of description) and disadvantages (duplication of data).

Then came the network data model. The basic concepts are the same, but more varied connections between records became possible.

In 1970, the British scientist Edgar Codd published the paper "A Relational Model of Data for Large Shared Data Banks", which is considered the first work on relational data storage. After its release, active development of this way of storing information began. At present, object-oriented databases are also being actively developed.

DBMS functions

Direct data management in external memory.

This function includes providing the necessary external-memory structures both for storing the data directly included in the database and for service purposes, for example to speed up access to the data in some cases (indexes are usually used for this). Some DBMS implementations actively use the capabilities of existing file systems, while others work down to the level of external memory devices. We emphasize, however, that in a developed DBMS users are in any case not required to know whether the DBMS uses a file system and, if it does, how the files are organized. In particular, the DBMS maintains its own naming system for database objects.
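As a hedged illustration of the service structures just mentioned, here is how an index is created in standard SQL (the table and column names are hypothetical):

```sql
-- An index is a service structure in external memory that the DBMS
-- maintains to speed up access; it does not change query results.
CREATE INDEX employees_dept_idx ON employees (dept_no);

-- The same query works with or without the index; with it, the DBMS
-- can locate the matching rows without scanning the whole table.
SELECT emp_name FROM employees WHERE dept_no = 310;
```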

Managing RAM buffers.

DBMSs usually work with databases of considerable size; at the least, the database is usually much larger than the available amount of RAM. Clearly, if every access to a data element required an exchange with external memory, the whole system would run at the speed of the external memory device. Practically the only way to really increase this speed is to buffer the data in RAM. Even if the operating system performs system-wide buffering (as UNIX does), this is not enough for the DBMS, which has far more information about how useful it is to buffer a particular part of the database. Therefore, developed DBMSs maintain their own set of RAM buffers with their own buffer replacement discipline. Note that there is a separate line of DBMS development oriented toward keeping the entire database permanently in RAM, based on the assumption that RAM sizes will eventually be so large that buffering will no longer be a concern. This work is currently at the research stage.

Transaction management.

A transaction is a sequence of database operations that the DBMS treats as a single whole. Either the transaction completes successfully and the DBMS commits (COMMIT) the changes it made to external memory, or none of its changes affect the state of the database. The concept of a transaction is needed to maintain the logical integrity of the database. Support for the transaction mechanism is a prerequisite even for single-user DBMSs (if, of course, such a system is to deserve the name DBMS at all), but the concept is far more important in multi-user DBMSs. The property that every transaction starts from a consistent database state and leaves it consistent on completion makes the transaction a very convenient unit of user activity with respect to the database. With appropriate management of concurrently executing transactions, each user can in principle feel like the only user of the DBMS (this is a somewhat idealized view, since in some cases users of multi-user DBMSs can sense the presence of their colleagues).

Related to transaction management in a multi-user DBMS are the important concepts of transaction serialization and a serial plan for executing a mix of transactions. Serialization of concurrently executing transactions means scheduling their work so that the total effect of the mix is equivalent to the effect of some sequential execution of them; a serial execution plan is one that produces such serialization. Clearly, if truly serial execution of a mix of transactions can be achieved, then for each user on whose initiative a transaction was started, the presence of other transactions is invisible (apart from some slowdown compared with single-user mode).

There are several basic serialization algorithms. In centralized DBMSs, the most common algorithms are based on synchronized capture of database objects. With any serialization algorithm, conflicts between two or more transactions over access to database objects are possible; in that case, to maintain serialization, one or more transactions must be rolled back (all changes they made to the database are undone). This is one of the situations in which a user of a multi-user DBMS can actually (and rather unpleasantly) sense the presence of other users' transactions in the system.
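A minimal sketch of a transaction, and of requesting serializable execution, in standard SQL (the accounts table and its columns are hypothetical):

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;  -- ask for serializable execution
-- All-or-nothing: either both updates are committed, or neither is.
UPDATE accounts SET balance = balance - 100 WHERE acc_no = 1;
UPDATE accounts SET balance = balance + 100 WHERE acc_no = 2;
COMMIT;  -- or ROLLBACK to undo every change made by the transaction
```

Under the serializable level, if this transaction conflicts with a concurrent one, the DBMS may roll one of them back to preserve an effect equivalent to some sequential execution.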



Journaling.

One of the main requirements for a DBMS is reliability of data storage in external memory. Reliability means that the DBMS must be able to restore the last consistent state of the database after any hardware or software failure. Two kinds of hardware failure are usually considered: soft failures, which can be interpreted as a sudden halt of the computer (for example, an emergency power-off), and hard failures, characterized by the loss of information on external storage media. Examples of software failures are a crash of the DBMS itself (caused by a program error or by some hardware failure) and a crash of a user program, as a result of which some transaction remains incomplete. The first situation can be viewed as a special kind of soft hardware failure; when the second occurs, the consequences of only one transaction need to be eliminated.

Clearly, in either case, restoring the database requires some additional information. In other words, reliable data storage requires redundancy, and the part of the data used for recovery must be stored especially reliably. The most common way of maintaining such redundant information is to keep a log of database changes. The log is a special part of the database, inaccessible to DBMS users and maintained with special care (sometimes two copies of the log are kept on different physical disks), into which records of all changes to the main part of the database are written. Different DBMSs log changes at different levels: sometimes a log record corresponds to a logical database change operation (for example, deleting a row from a relational table), sometimes to a minimal internal operation modifying a page of external memory; some systems use both approaches at once.

In all cases, a write-ahead strategy is followed (the so-called Write Ahead Log, WAL, protocol). Roughly speaking, the record of a change to any database object must reach the external memory of the log before the changed object reaches the external memory of the main part of the database. It is known that if the WAL protocol is observed correctly, the log can be used to solve all problems of restoring the database after any failure.

The simplest recovery situation is rolling back an individual transaction. Strictly speaking, this does not require a system-wide log: it would suffice to keep, for each transaction, a local log of its modification operations and roll the transaction back by performing the inverse operations from the end of the local log. Some DBMSs do this, but in most systems local logs are not kept; individual rollback uses the system-wide log, in which all records of one transaction are linked in a backward list (from the end to the beginning).

During a soft failure, the external memory of the main part of the database may contain objects modified by transactions that had not finished at the moment of failure, and may lack objects modified by transactions that had completed successfully (because of the RAM buffers, whose contents are lost in a soft failure). If the WAL protocol is followed, the external memory of the log is guaranteed to contain records of the modification operations of both kinds of objects. The goal of recovery after a soft failure is a state of the main part of the database in external memory that would have arisen if the changes of all completed transactions had been written there, and that contains no traces of unfinished transactions. To achieve this, unfinished transactions are first rolled back (undo), and then those operations of completed transactions whose results are not reflected in external memory are replayed (redo). This process involves many subtleties connected with the overall organization of buffer and log management; we will consider it in more detail in the corresponding lecture.

To restore the database after a hard failure, the log and an archival copy of the database are used. Roughly speaking, the archival copy is a full copy of the database made at the moment the log began to fill (there are many options for interpreting the archival copy more flexibly). For normal recovery after a hard failure, of course, the log itself must survive; as already noted, especially stringent safety requirements are placed on the log in external memory. Recovery then consists of replaying, from the archival copy, the work of all transactions that had finished by the moment of failure. In principle one could even replay unfinished transactions and continue their work after recovery, but in real systems this is usually not done, since recovery after a hard failure is already rather lengthy.
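At the SQL level, the individual rollback discussed above is visible as ROLLBACK and, within a transaction, as savepoints. A small sketch in standard SQL (all names are hypothetical); internally the DBMS performs the undo by reading the transaction's log records backwards:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE acc_no = 1;
SAVEPOINT transfer_step;              -- mark a point inside the transaction
UPDATE accounts SET balance = balance + 100 WHERE acc_no = 2;
ROLLBACK TO SAVEPOINT transfer_step;  -- undo only the second update
ROLLBACK;                             -- undo the entire transaction
```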

Support for database languages.

Special languages, generally called database languages, are used to work with databases. Early DBMSs supported several languages, each specialized in its function. Most often two languages were distinguished: a schema definition language (SDL) and a data manipulation language (DML). The SDL served mainly to define the logical structure of the database, i.e. the structure of the database as it appears to users. The DML contained a set of data manipulation operators: operators for entering data into the database and for deleting, modifying, or selecting existing data.

Modern DBMSs usually support a single integrated language that contains all the facilities needed to work with a database, from its creation onward, and that provides the basic user interface to the database. The standard language of today's most common relational DBMSs is SQL (Structured Query Language). Let us list the main functions of a relational DBMS that are supported at the language level, i.e. when the SQL interface is implemented.

First of all, SQL combines the facilities of SDL and DML: it allows both defining a relational database schema and manipulating the data. Naming of database objects (for a relational database, naming tables and their columns) is supported at the language level, in the sense that the SQL compiler converts object names into internal identifiers using specially maintained service catalog tables. The internal part of the DBMS (the kernel) does not work with table or column names at all.

SQL contains special facilities for defining database integrity constraints. Again, the constraints are stored in special catalog tables, and integrity control is provided at the language level: when compiling database modification statements, the SQL compiler generates the corresponding program code based on the integrity constraints defined in the database.

Special SQL statements define so-called database views, which are in effect queries stored in the database (the result of any query to a relational database is a table) with named columns. To the user, a view is a table like any base table stored in the database, but views can be used to restrict, or on the contrary to extend, the part of the database visible to a particular user. Views, too, are maintained at the language level.

Finally, authorization of access to database objects is also based on a special set of SQL statements. The idea is that executing SQL statements of different kinds requires different privileges. The user who created a database table has the full set of privileges for working with it; these include the right to pass all or some of the privileges on to other users, including the right of further transfer. Users' privileges are described in special catalog tables, and privilege control is supported at the language level. A more precise description of possible SQL-based implementations of these functions is given in the sections on the SQL language and its implementation.
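A hedged sketch of the view and authorization facilities just described, in standard SQL (all table, view, and role names are invented for illustration):

```sql
-- A view is a stored query; to users it looks like a table.
CREATE VIEW dept310_employees AS
    SELECT emp_no, emp_name
    FROM employees
    WHERE dept_no = 310;

-- Authorization: expose only the view, not the base table.
GRANT SELECT ON dept310_employees TO clerk;
-- The creator may also delegate the right to grant further:
GRANT SELECT ON dept310_employees TO manager WITH GRANT OPTION;
```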


Questions for testing

File, file system. Classification of file systems. Basic approaches to protecting file systems.

From an application's point of view, a file is a named area of external memory that can be written to and read from. The file management system handles allocating external memory, mapping file names to the corresponding external memory addresses, and providing access to the data.

The term file system denotes both the software system that manages files and the file archive stored in external memory.

Isolated file systems - each file archive (a complete directory tree) resided entirely on one disk pack or logical drive (as in Windows); the full file name begins with the name of the disk device.

Centralized (as in Multics) - the entire collection of directories and files could be presented as a single tree. The full file name began with the name of the root directory, and the user did not have to worry about mounting specific disk packs on a drive (that is, a single file tree could be spread across a disk pack, a floppy disk, and two hard drives).

Mixed (*nix) - one file system is created, and then mount is used to attach other devices with their own file directories.

Multi-user protection: mandatory and discretionary approaches. Mandatory: for each file, each user either has or does not have a separate credential to work with it, plus a lot of additional information.

Discretionary - each registered user corresponds to a pair of integer identifiers: the identifier of the group to which the user belongs and the user's own identifier. Accordingly, for each file the full identifier of the user who created it (own identifier plus group identifier) is stored, and it is recorded which actions the owner may perform on the file, which actions are available to other users of the same group, and what users of other groups may do with it. For each file, three actions are controlled: read, write, and execute. This is, in fact, how it works in *nix.


DBMS. Basic functions of the DBMS. Typical organization of a modern DBMS.

DBMS (database management system) - an information system that gives third-party programs access to structured data while providing certain functionality:

  • management of logical consistency and data integrity in external memory;
  • management of RAM buffers (direct exchanges with an external disk device are slow);
  • transaction management (with certain properties; see question 3);
  • logging and database recovery after failures;
  • maintenance of database languages (a universal query language, such as SQL, that can be used from an external program without delving into the internal structure of data storage).

Typical organization of a modern DBMS. Logically, in a modern relational DBMS one can distinguish the innermost part, the DBMS kernel (often called the Data Base Engine), the database language compiler (usually SQL), the runtime support subsystem, and a set of utilities. In some systems these parts are distinguished explicitly, in others they are not, but logically such a division can be drawn in any DBMS.

The DBMS kernel is responsible for managing data in external memory, managing RAM buffers, managing transactions, and logging.

The main function of the database language compiler is to compile database language statements into some executable program. The result of compilation is an executable program, represented in some systems in machine code but more often in an executable, internal, machine-independent code. In the latter case the statement is actually executed with the involvement of the runtime support subsystem, which is in essence an interpreter of this internal language.

Separate database utilities usually implement procedures that are too expensive to perform through the database language, for example loading and unloading a database, gathering statistics, global database integrity checking, etc.







