

OLTP and OLAP systems

In the previous subsection it was noted that, for an adequate representation of the subject area and for ease of database development and maintenance, relations should be reduced to third normal form (normal forms of higher order exist, but in practice they are used quite rarely), that is, they should be highly normalized. However, weakly normalized relations also have their advantages, the main one being that if the database is accessed mostly by read queries, while modifications and insertions are rare, retrieval is much faster. This is because in weakly normalized relations the join has, in effect, already been performed, so no processor time is spent on it. Accordingly, there are two classes of systems for which highly and weakly normalized relations are each better suited.

Highly normalized data models are well suited for OLTP (On-Line Transaction Processing) applications, that is, applications for operational transaction processing. Typical examples of OLTP applications are warehouse accounting systems, ticket booking systems, operational banking systems, and the like. The main function of such systems is to execute a large number of short transactions. The transactions themselves are quite simple, but the difficulties are that there are very many of them, they run concurrently, and if an error occurs, a transaction must be rolled back, returning the system to the state it was in before the transaction began. Almost all database queries in OLTP applications consist of insert, update, and delete commands; select queries mainly provide users with lookups against various kinds of reference tables. Thus, most queries are known in advance, at system design time. What is critical for OLTP applications is the speed and reliability of short data-update operations. The higher the level of data normalization in an OLTP application, the faster and more reliable it usually is. Deviations from this rule are justified when, already at the development stage, certain frequent queries are known that require joining relations and whose execution speed significantly affects the application.
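
The pattern above can be sketched in a few lines. The following is a minimal illustration, not a production design: a hypothetical ticket-booking schema in SQLite, where each booking is one short transaction that either commits or rolls back.

```python
import sqlite3

# Minimal OLTP-style sketch (hypothetical ticket-booking example):
# a highly normalized schema and short insert-only transactions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE tickets   (id INTEGER PRIMARY KEY, event TEXT NOT NULL,
                            customer_id INTEGER REFERENCES customers(id));
""")

def book_ticket(conn, customer_id, event):
    """One short OLTP transaction: commit on success, roll back on error."""
    with conn:  # sqlite3 connection as context manager = one transaction
        conn.execute("INSERT INTO tickets (event, customer_id) VALUES (?, ?)",
                     (event, customer_id))

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ivanov')")
book_ticket(conn, 1, "Concert")
count = conn.execute("SELECT COUNT(*) FROM tickets").fetchone()[0]
print(count)  # 1
```

The schema, table and function names here are illustrative; real OLTP systems differ mainly in scale (thousands of such transactions running concurrently), not in the shape of each transaction.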

Another type of application is the OLAP (On-Line Analytical Processing) application, that is, an application for on-line analytical data processing. This is an umbrella term covering the principles behind decision support systems (Decision Support System, DSS), data warehouses (Data Warehouse), and data mining systems (Data Mining). Such systems are designed to find dependencies between data, to conduct dynamic "what if" analysis, and to solve similar tasks. OLAP applications operate on large volumes of data accumulated in the enterprise or taken from other sources. Such systems are characterized by the following features:

  • new data is added to the system relatively rarely, in large blocks (for example, once a month or quarter);
  • data added to the system is, as a rule, never deleted;
  • before loading, data undergoes various preparatory procedures that bring it to the required formats, and the like;
  • queries to the system are unregulated and quite complex;
  • the speed of query execution is important, but not critical.

OLAP application databases are usually represented as one or more hypercubes, whose dimensions are reference data, while the cells of the hypercube store the actual values. Physically, a hypercube can be built on a special multidimensional data model (Multidimensional OLAP, MOLAP) or represented by means of the relational data model (Relational OLAP, ROLAP).

In OLAP systems that use the relational data model, it is advisable to store data as weakly normalized relations containing pre-computed basic totals. Data redundancy and its attendant problems are not an issue here, since the data is updated quite rarely, and the totals are recalculated together with each data update.
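
A minimal sketch of this idea, under illustrative names: a detail table plus a redundant totals table that is rebuilt only when a new batch of data is loaded, so analytical reads need no joins or aggregation.

```python
import sqlite3

# ROLAP-style sketch (hypothetical sales example): a weakly normalized
# totals table with pre-computed sums, recalculated only on batch load.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
conn.execute("CREATE TABLE sales_totals (region TEXT PRIMARY KEY, total REAL)")

def load_batch(conn, rows):
    """Load a batch of detail rows and rebuild the pre-aggregated totals."""
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        conn.execute("DELETE FROM sales_totals")
        conn.execute("""INSERT INTO sales_totals
                        SELECT region, SUM(amount) FROM sales GROUP BY region""")

load_batch(conn, [("North", "Q1", 100.0), ("North", "Q2", 150.0),
                  ("South", "Q1", 80.0)])
# Analytical queries read the totals directly: no joins, no SUM at read time.
totals = dict(conn.execute("SELECT region, total FROM sales_totals"))
print(totals)  # {'North': 250.0, 'South': 80.0}
```

The redundancy is harmless precisely because updates happen only inside `load_batch`, which recomputes the totals in the same transaction.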

The characteristics and the range of tasks effectively solved by each technology are summarized in the following comparison:

Characteristic | OLTP | OLAP
Purpose of the system | Registration, operational search and transaction processing, regulated analysis | Work with historical data, analytical processing, forecasting, modeling
Stored data | Operational, detailed | Covering a long period of time, aggregated
Data type | Structured | Various types
"Age" of data | Current (a few months) | Historical (spanning years) and projected
Data update rate | High, in small portions | Low, in large portions
Level of data aggregation | Detailed data | Mostly aggregated data
Predominant operations | Data entry, search, update | Data analysis
Data usage pattern | Predictable | Unpredictable
User interaction | At the level of individual transactions | Across the whole database
Kind of activity | Operational, tactical | Analytical, strategic
Priorities | High performance, high availability | Flexibility, user autonomy
Users | A large number of operational staff | A relatively small number of managers and analysts

Comparison of OLTP and OLAP

Characteristic | OLTP | OLAP
Nature of queries | Many simple transactions | Complex queries
Stored data | Operational, detailed | Covering a long period of time, aggregated
Kind of activity | Operational, tactical | Analytical, strategic
Data type | Structured | Various types
User interaction | At the level of individual transactions | Across the whole database
Data touched per user request | Individual records | Groups of records
Response time | Seconds | From a few seconds to a few minutes
Hardware resource usage | Stable | Dynamic
Nature of the data | Mostly primary (lowest level of detail) | Mostly derived (aggregated values)
Database access pattern | Predefined, static access paths and data relationships | Undefined, dynamic access paths and data relationships
Data variability | High (data is updated with every transaction) | Low (data is practically never updated during queries)
Priorities | High performance, high availability | Flexibility, user autonomy

Today, among the tools the information-technology market offers for processing and visualizing data for management decision making, OLTP and OLAP technologies are the best suited. OLTP technology is oriented toward operational data processing, while the more modern OLAP technology targets interactive data analysis. Systems built on them make it possible to understand the processes occurring at the managed object through prompt access to various data slices (views of database contents organized to reflect different aspects of the enterprise's activity). In particular, by providing graphical representations of data, OLAP can make processing results easy to perceive.

OLTP (Online Transaction Processing) is real-time transaction processing: a way of organizing a database in which the system handles a large flow of small transactions while clients require the fastest possible response time.

In modern DBMSs, transaction serialization is organized through a locking mechanism: while a transaction executes, the DBMS locks the database, or the part of it the transaction accesses, and the lock is held until the transaction is committed. If, during parallel processing, another transaction attempts to access the locked data, its processing is suspended and resumes only after the locking transaction completes and the lock is released. The smaller the object being locked, the higher the database's efficiency. A transaction that updates data on several network nodes is called DISTRIBUTED; a transaction that works with a database located on a single node is called LOCAL. From the user's point of view, local and distributed transactions should be processed identically, i.e. the DBMS must organize the execution of a distributed transaction so that all the local transactions it comprises are committed synchronously on all the nodes of the distributed system they affect. A distributed transaction should be committed only if all of its constituent local transactions are committed; if at least one local transaction is aborted, the entire distributed transaction must be aborted. To meet these requirements in practice, DBMSs use a two-phase commit mechanism.

1. The database server committing a distributed transaction sends a "prepare to commit" command to all network nodes registered as participants in the transaction. If at least one server does not report readiness, the coordinating server rolls back the local transactions on all nodes.

2. Once all local DBMSs report that they are ready to commit, the server coordinating the distributed transaction completes the commit by sending a "commit transaction" command to all local servers.
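
The two phases above can be sketched as a toy coordinator. This is a simplified in-memory model, not a real protocol implementation: `Node` is a hypothetical stand-in for a local DBMS, and failure detection is reduced to a flag.

```python
# Toy sketch of the two-phase commit protocol described above.
class Node:
    """Hypothetical local DBMS participating in a distributed transaction."""
    def __init__(self, name, healthy=True):
        self.name, self.healthy, self.state = name, healthy, "idle"

    def prepare(self):  # phase 1: respond to "prepare to commit"
        self.state = "prepared" if self.healthy else "failed"
        return self.healthy

    def commit(self):   # phase 2: commit the local transaction
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"

def two_phase_commit(nodes):
    # Phase 1: every node must vote "ready"; a single failure aborts all.
    if all(node.prepare() for node in nodes):
        for node in nodes:  # Phase 2: commit everywhere
            node.commit()
        return True
    for node in nodes:      # otherwise roll back everywhere
        node.rollback()
    return False

ok = two_phase_commit([Node("A"), Node("B")])
print(ok)       # True: both nodes prepared, so both committed
failed = two_phase_commit([Node("A"), Node("C", healthy=False)])
print(failed)   # False: one node failed to prepare, so all rolled back
```

Real coordinators must also handle timeouts, crash recovery, and logging of the prepare/commit decisions, which this sketch deliberately omits.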

OLAP (On-Line Analytical Processing, analytical processing in real time) is an information processing technology that includes composing and dynamically publishing reports and documents. It is used by analysts for fast processing of complex queries against a database, and serves for preparing business reports on sales and marketing and for management purposes, including so-called data mining (a method of analyzing information in a database in order to find anomalies and trends without elucidating the semantic meaning of the records).

OLAP takes a snapshot of a relational database and structures it into a multidimensional model for querying. The claimed processing time for queries in OLAP is about 0.1% of that for similar queries against a relational database.

An OLAP structure created from operational data is called an OLAP cube. A cube is created by joining tables using a star schema or a snowflake schema. At the center of the star schema is a fact table, which contains the key facts on which queries are made. Multiple dimension tables are joined to a fact table. These tables show how aggregated relational data can be analyzed. The number of possible aggregations is determined by the number of ways in which the original data can be hierarchically displayed.
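
The star schema described above can be shown concretely. The following sketch, with illustrative table and column names, builds one fact table and two dimension tables in SQLite and runs a typical aggregation along both dimensions.

```python
import sqlite3

# Star-schema sketch: one central fact table joined to dimension tables
# (hypothetical sales example; all names are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, date_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO dim_date    VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO fact_sales  VALUES (1, 1, 50.0), (2, 1, 30.0), (1, 2, 70.0);
""")

# A typical OLAP query: aggregate the facts along two dimensions.
rows = conn.execute("""
    SELECT p.category, d.quarter, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date    d ON d.date_id    = f.date_id
    GROUP BY p.category, d.quarter
    ORDER BY p.category, d.quarter
""").fetchall()
print(rows)  # [('Books', 'Q1', 50.0), ('Books', 'Q2', 70.0), ('Toys', 'Q1', 30.0)]
```

A snowflake schema differs only in that the dimension tables themselves are further normalized into sub-tables (for example, category split out of `dim_product`).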

For example, all clients can be grouped by city or by region of the country (West, East, North, etc.); 50 cities, 8 regions and 2 countries thus make up a three-level hierarchy with 60 members. Customers can also be combined with respect to products; with 250 products in 2 categories, 3 product groups and 3 production divisions, the number of aggregates reaches 16,560. As dimensions are added to the schema, the number of possible combinations quickly reaches tens of millions or more.

An OLAP cube contains the base data and information about the dimensions (aggregates). The cube potentially contains all the information that might be needed to answer any query. Because of the huge number of aggregates, the full calculation is often performed only for some dimensions, while for the rest it is done "on demand".
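
"On demand" aggregation can be sketched as lazy, cached computation of cube cells: rather than pre-materializing every aggregate, a cell's value is computed from the base facts the first time it is requested and then reused. The data and function names below are illustrative.

```python
# Sketch of on-demand aggregation: cube cells are computed lazily from the
# base facts and cached, instead of pre-materializing every aggregate.
FACTS = [  # (city, product, amount): hypothetical base data
    ("Moscow", "Books", 50.0),
    ("Moscow", "Toys", 30.0),
    ("Kazan", "Books", 20.0),
]

_cache = {}

def cell(city=None, product=None):
    """Aggregate over the facts; None means 'all members' of that dimension."""
    key = (city, product)
    if key not in _cache:
        _cache[key] = sum(a for c, p, a in FACTS
                          if (city is None or c == city)
                          and (product is None or p == product))
    return _cache[key]

print(cell())                  # 100.0, the grand total
print(cell(city="Moscow"))     # 80.0
print(cell(product="Books"))   # 70.0
```

Real OLAP servers combine both strategies: a chosen subset of aggregates is precomputed at load time, and the rest fall back to this kind of lazy evaluation.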

The difficulty in using OLAP lies in creating the queries, selecting the base data and developing a schema, which is why most modern OLAP products come with a huge number of pre-configured queries. Another problem lies in the base data: it must be complete and consistent.

The first product to perform OLAP queries was Express (IRI). However, the term OLAP itself was coined by Edgar Codd, "the father of relational databases". Codd's work was funded by Arbor, a company that had released its own OLAP product, Essbase (later acquired by Hyperion, which in turn was acquired by Oracle in 2007), the year before.

Other well-known OLAP products include Microsoft Analysis Services (formerly OLAP Services, part of SQL Server), Oracle OLAP Option, IBM's DB2 OLAP Server (essentially Essbase with IBM additions), SAP BW, SAS OLAP Server, and products from Brio, BusinessObjects, Cognos, MicroStrategy and other vendors.

OLAP has found its greatest application in business planning and data warehouse products.

OLAP uses a multidimensional representation of aggregated data to provide quick access to strategically important information for in-depth analysis. OLAP applications must have the following basic properties:

  • multidimensional data representation;
  • support for complex calculations;
  • correct handling of the time factor.

Advantages of OLAP:

  • increased productivity of production personnel and application developers, and timely access to strategic information;
  • giving users sufficient means to make their own changes to the schema;
  • OLAP applications rely on data warehouses and OLTP systems for up-to-date data, thereby preserving control over the integrity of corporate data;
  • reduced load on OLTP systems and data warehouses.
The requirements of analytical (OLAP) and operational (OLTP) databases can be contrasted point by point:

  • OLAP: the data warehouse should include both internal corporate data and external data. OLTP: the main source of information entering the operational database is the corporation's own activities, whereas data analysis requires involving external information sources (for example, statistical reports).
  • OLAP: the volume of analytical databases is at least an order of magnitude larger than that of operational ones; reliable analysis and forecasting in a data warehouse requires information about the corporation's activities and market conditions over several years. OLTP: operational processing requires data only for the last few months.
  • OLAP: the data warehouse must contain uniformly represented, consistent information that matches the content of the operational databases as closely as possible, so a component is needed to extract and "clean" information from different sources. OLTP: many large corporations simultaneously run several operational information systems with their own databases (for historical reasons); operational databases may contain semantically equivalent information represented in different formats, with different timestamps, and sometimes even contradictory.
  • OLAP: the set of queries to an analytical database cannot be predicted; data warehouses exist to answer analysts' ad hoc queries, and one can only count on queries not arriving too often while involving large volumes of information, which encourages queries with aggregates (sum, minimum, maximum, average, etc.). OLTP: data processing systems are created to solve specific problems; information is selected from the database frequently and in small portions, and the set of queries is typically known at design time.
  • OLAP: given the low variability of analytical databases (data changes only on load), it makes sense to use ordered arrays, faster indexing methods for mass retrieval, and storage of pre-aggregated data. OLTP: data processing systems are by nature highly variable, which the DBMSs used take into account (normalized database structure, rows stored unordered, B-trees for indexing, transaction support).
  • OLAP: information in analytical databases is so critical for the corporation that finer-grained protection is required (individual access rights to particular rows and/or columns of a table). OLTP: for data processing systems, table-level protection is usually sufficient.

The objectives of an OLTP system are the rapid collection and optimal placement of information in the database, together with ensuring its completeness, relevance and consistency. Such systems, however, are not designed for maximally efficient, fast, multidimensional analysis.

Of course, it is possible to build reports on the collected data, but this requires the business analyst either to interact constantly with an IT specialist or to have special training in programming and computing.

What does the traditional decision-making process look like in a typical Russian company using an information system built on OLTP technology?

The manager gives a task to a specialist of the information department in accordance with his understanding of the issue. The specialist, having understood the task in his own way, builds a query to the operational system, receives an electronic report, and brings it to the manager's attention. This scheme for making critically important decisions has the following significant shortcomings:

  • a negligible amount of data is used;
  • the process takes a long time, since composing queries and interpreting an electronic report are rather tedious operations, while the manager may need to make a decision immediately;
  • the cycle must be repeated if the data needs clarification or must be viewed from a different angle, or if additional questions arise; moreover, this slow cycle is, as a rule, repeated several times, with still more time spent on data analysis;
  • the difference in professional training and fields of activity between the IT specialist and the manager: they often think in different categories and, as a result, do not understand each other;
  • a further unfavorable factor is how hard electronic reports are to digest. The manager has no time to pick the figures of interest out of a report, especially as there may be far too many of them. The work of preparing data thus usually falls on the information department's specialists; as a result, a competent specialist is distracted by the routine and ineffective work of compiling tables, charts and the like, which naturally does not improve his skills.

There is only one way out of this situation, and Bill Gates formulated it in the expression "Information at your fingertips": the initial information must be directly available to its immediate consumer, the analyst. The task of the information department staff is then to create a system for collecting, accumulating, storing and protecting information and for making it available to analysts.

The global industry has long been familiar with this problem: OLAP technologies have existed for almost 30 years, designed specifically to let business analysts operate on accumulated data and participate directly in its analysis. Such analytical systems are the opposite of OLTP systems in the sense that they eliminate information redundancy ("collapse" the information), while it is precisely the redundancy of the primary information that determines the effectiveness of analysis. A DSS that combines these technologies makes it possible to solve a number of problems:

  • analytical tasks: calculating specified indicators and statistical characteristics of business processes from retrospective information held in data warehouses;
  • data visualization: presenting all available information in user-friendly graphical and tabular form;
  • obtaining new knowledge: determining the interrelationships and interdependencies of business processes from existing information (testing statistical hypotheses, clustering, discovering associations and temporal patterns);
  • simulation tasks: mathematical modeling of the behavior of complex systems over an arbitrary period of time; in other words, tasks that answer the question "What will happen if...?";
  • control synthesis: determining admissible control actions that ensure a given goal is reached;
  • optimization tasks: integrating simulation, control, optimization and statistical methods of modeling and forecasting.

Using OLAP tools, enterprise managers can, even without special training, independently and promptly obtain all the information needed to study the patterns of their business, in the most varied combinations and cross-sections of business analysis. A business analyst sees before him a list of the business system's dimensions and measures. With such a simple interface he can build any report and rearrange dimensions (say, build cross-tabs by superimposing one dimension on another). In addition, he can create his own functions from existing measures and conduct "what if" analysis, obtaining a result by specifying the dependencies of business-function indicators on one another. In this case, the response time for any report does not exceed 5 seconds.
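
The cross-tab operation mentioned above (superimposing one dimension on another) is easy to sketch in plain code. The record fields and function name below are illustrative, not part of any particular OLAP product.

```python
# Sketch of a cross-tab: totals of one measure, with one dimension as
# rows and another as columns (all names are hypothetical).
records = [
    {"region": "West", "product": "A", "sales": 10},
    {"region": "West", "product": "B", "sales": 5},
    {"region": "East", "product": "A", "sales": 7},
]

def crosstab(records, row_dim, col_dim, measure):
    """Return {row_member: {col_member: total}} for the chosen dimensions."""
    table = {}
    for r in records:
        row = table.setdefault(r[row_dim], {})
        row[r[col_dim]] = row.get(r[col_dim], 0) + r[measure]
    return table

result = crosstab(records, "region", "product", "sales")
print(result)  # {'West': {'A': 10, 'B': 5}, 'East': {'A': 7}}
```

Because the dimensions are just parameters, "rearranging dimensions" is simply calling the same function with `row_dim` and `col_dim` swapped.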


It is possible to identify certain classes of systems for which strongly or weakly normalized data models are more suitable.

Highly normalized data models are well suited for so-called OLTP applications (On-Line Transaction Processing, prompt transaction processing). Typical examples of OLTP applications are warehouse accounting systems, ticket ordering systems, banking systems performing money transfer operations, etc.

The main function of such systems is to perform a large number of short transactions. The transactions themselves look relatively simple, for example, “withdraw an amount of money from account A, add this amount to account B.”

The problem is that, first, there are very many transactions; second, they execute concurrently (several thousand users may be connected to the system at once); and third, if an error occurs, the transaction must be rolled back completely, returning the system to the state it was in before the transaction began (there must be no situation in which money has been withdrawn from account A but has not arrived in account B). Almost all database queries in OLTP applications consist of insert, update and delete commands, so the speed and reliability of short data-update operations is critical, and the higher the level of data normalization in an OLTP application, the faster and more reliable it tends to be.

Another type of application is the so-called OLAP application (On-Line Analytical Processing, operational analytical data processing). This is an umbrella term covering the principles of building decision support systems (Decision Support System, DSS), data warehouses (Data Warehouse) and data mining systems (Data Mining). Such systems are designed to find dependencies between data (for example, to try to determine how sales volume relates to the characteristics of potential buyers) and to perform "what if..." analysis.

OLAP applications operate on large volumes of data already accumulated in OLTP applications, taken from spreadsheets, or drawn from other data sources. Such systems are characterized by the following features:

  • new data is added to the system relatively rarely, in large blocks (for example, once a quarter the data from quarterly sales results is loaded from an OLTP application);
  • data added to the system is usually never deleted;
  • before loading, the data passes through various "cleansing" procedures, necessitated by the fact that one system may receive data from many sources that use different representation formats for the same concepts, and the data may be incorrect or erroneous;
  • queries to the system are unregulated and, as a rule, quite complex;
  • the speed of query execution is important, but not critical.

Data in OLAP applications is typically represented as one or more hypercubes, whose dimensions are reference data, while the cells of the hypercube store the actual data. For example, one can build a hypercube whose dimensions are time (quarters, years), product type and company branch, and whose cells store sales volumes. Such a hypercube contains sales data for various types of goods by quarter and by division. Based on this data, one can answer questions like "which division had the best sales volume this year?" or "how do this year's sales trends for the Southwest region divisions compare with last year's?"

Returning to the problem of data normalization: in OLAP systems that use the relational data model (ROLAP), it is advisable to store data as weakly normalized relations containing pre-computed basic totals. The high redundancy and its attendant problems are not frightening here, because updates occur only when a new portion of data is loaded; at that point the new data is added and the totals are recalculated.


Characteristics of an OLTP system:
  • large volume of information;
  • often separate databases for different departments;
  • normalized schema, no duplication of information;
  • intensive data changes;
  • transactional mode of operation;
  • transactions affect a small amount of data;
  • processing of current data, a snapshot;
  • many clients;
  • short response time: a few seconds.

Characteristics of an OLAP system:
  • large volume of information;
  • information from different databases synchronized using common classifiers;
  • unnormalized database schema with duplicates;
  • data changes rarely, through batch loading;
  • complex ad hoc queries over large volumes of data, with extensive use of groupings and aggregate functions;
  • analysis of time dependencies;
  • small number of users: analysts and managers;
  • longer, but still acceptable, response time: a few minutes.






Codd's rules for relational databases:
1. The information rule.
2. The guaranteed access rule.
3. Systematic treatment of null values.
4. Dynamic online catalog based on the relational model.
5. The comprehensive data sublanguage rule.
6. The view updating rule.
7. High-level insert, update and delete.
8. Physical data independence.
9. Logical data independence.
10. Integrity independence.
11. Distribution independence.
12. The nonsubversion rule.


Codd's rules for OLAP:
1. Multidimensional conceptual view.
2. Transparency.
3. Accessibility.
4. Consistent reporting performance.
5. Client-server architecture.
6. Generic dimensionality.
7. Dynamic sparse matrix handling.
8. Multi-user support.
9. Unrestricted cross-dimensional operations.
10. Intuitive data manipulation.
11. Flexible reporting.
12. Unlimited dimensions and aggregation levels.


Implementation of OLAP. Types of OLAP servers:
  • MOLAP (Multidimensional OLAP): both detailed data and aggregates are stored in a multidimensional database.
  • ROLAP (Relational OLAP): detailed data is stored in a relational database; aggregates are stored in the same database, in specially created service tables.
  • HOLAP (Hybrid OLAP): detailed data is stored in a relational database, while aggregates are stored in a multidimensional database.








Features of ROLAP, the star schema:
1. One fact table, which is highly denormalized.
2. Several dimension tables, which are also denormalized.
3. The primary key of the fact table is composite, with one column per dimension.
4. Aggregated data is stored together with the original data.
Disadvantage: if aggregates are stored together with the source data, the dimensions must carry an additional attribute, the hierarchy level.











Storage structure in the Oracle DBMS (diagram): SQL and Java clients access the server through interfaces such as JDBC, OCI, ODBC and OLE DB; multidimensional metadata is registered via CWM or CWM2; MOLAP data resides in an OLAP store (a BLOB in a relational table) alongside a relational star schema; the multidimensional engine runs as a process inside the Oracle kernel and is accessed through OLAP DML and the SQL interface to OLAP (DBMS_AW, OLAP_TABLE, ...).






