What is olap technology. Analytical systems OLAP


The main difference between facts and information is that we receive and take into account data, and we can use information to benefit. Roughly speaking, information is analyzed and systematized data. Thanks to timely information received, many companies manage to survive both the financial crisis and the fiercest competition. It is not enough to collect facts and have all the necessary data. You also need to be able to analyze them. To make it easier for people who have to make important business decisions, various support systems have been developed. It is for this purpose that various complex systems have been developed that allow analyzing large amounts of heterogeneous data and turning them into information useful for the business user. The new field of business analytics aims to improve process control of business systems through the use of data warehousing and technology.

The market for business information systems today offers a diverse selection of solutions that help an enterprise organize management accounting, ensure operational management of production and sales, and carry out effective interaction with customers and suppliers.

A separate niche in the business systems market is occupied by analytical software products designed to support decision-making at the strategic level of enterprise management. The main difference between such tools and operational management systems is that the latter provide enterprise management in the “functioning mode,” that is, the implementation of a well-defined production program, while strategic-level analytical systems help the enterprise management develop decisions in the “development mode.”

The scale of the changes carried out may vary from deep restructuring to partial updating of technologies at individual production sites, but, in any case, decision makers consider development alternatives on which the fate of the enterprise in the long term depends.

No matter how powerful and developed the enterprise information system is, it cannot help in resolving these issues, firstly, because it is configured for stationary, established business processes, and secondly, it does not and cannot contain information for making decisions regarding new areas of business, new technologies, new organizational decisions.

Thanks to OLAP (On-Line Analytical Processing) data processing and analysis technology, any organization can almost instantly (within five seconds) obtain the data necessary for work. OLAP can be briefly defined by five keywords.

FAST (Fast) - this means that the time to search and provide the necessary information takes no more than five seconds. The simplest requests are processed in a second, and few complex requests have a processing time of more than twenty seconds. To achieve this result, various methods are used, from special forms of data storage to extensive pre-computations. This way, you can get a report in a minute that previously took days to prepare.

ANALYSIS (Analytical) says that the system can perform any analysis, both statistical and logical, and then stores it in an accessible form.

SHARED means the system provides the required privacy down to the cell level

MULTIDIMENSIONAL (Multidimensional) is the main characteristic of OLAP. The system must fully support hierarchies and multiple hierarchies, since this is the most logical way to analyze both business and organizational activities.

INFORMATION. The right information must be delivered to where it is needed.

When an organization operates, data related to its field of activity always accumulates, which is sometimes stored in completely different places, and bringing them together is both difficult and time-consuming. It is in order to speed up the acquisition of data to test emerging business hypotheses that the technology of interactive analytical data processing, or OLAP, was developed. The main purpose of such OLAP systems is to quickly respond to arbitrary user requests. This need often arises when developing some important business project, when the developer needs a working hypothesis that has arisen. Most often, the information the user needs should be presented in the form of some kind of dependence - for example, how sales volume depends on the product category, on the sales region, on the time of year, and so on. Thanks to OLAP, he is able to immediately obtain the necessary data in the desired layout for the selected period.

Interactive OLAP technology allows you to transform huge piles of reports and mountains of data into useful and accurate information that will help an employee make an informed business or financial decision at the right time.

In addition, thanks to OLAP, processing efficiency increases, and the user can receive large volumes of sorted (aggregated) information almost instantly. Thanks to OLAP, the user can clearly see how efficiently his organization operates, has the ability to quickly and flexibly respond to external changes, and has the ability to minimize the financial losses of his organization. OLAP provides accurate information that improves the quality of decisions made.

The only drawback of business analysis systems is their high cost. Creating a personal information storage requires both time and a lot of money.

The use of OLAP technology in business allows you to quickly obtain the necessary information, which, at the user’s request, can be presented in the usual form - reports, graphs or tables.

System integration procedures for business structures are based on the use of joint ERP, CRM and SCM solutions. In many cases, systems are supplied by different manufacturers, and the imported data must undergo data harmonization and be presented as heterogeneous data. In a business environment, there is an unambiguous requirement - a complete analysis of data, which involves viewing consolidated reports from different points of view.

Different manufacturers have different data reporting mechanisms. The heterogeneous representation procedure involves extract, transform, and load (ETL). For example, in Microsoft SQL Server 2005 Analysis Services, the problem of data consolidation is implemented using Data Source Views - types of data sources that describe analytical presentation models.

Business applications based on OLAP technologies, examples of products. The most common applications of OLAP technologies are:

Data analysis.

The task for which the most popular OLAP tools were originally used and still remain. A multidimensional data model, the ability to analyze significant amounts of data and quick response to requests make such systems indispensable for analyzing sales, marketing activities, distribution and other tasks with a large volume of source data.

Examples of products: Microsoft Excel Pivot Tables, Microsoft Analysis Services, SAP BW, Oracle Essbase, Oracle OLAP, Cognos PowerPlay, MicroStrategy, Business Objects.

Financial planning-budgeting.

A multidimensional model allows you to simultaneously enter data and easily analyze it (for example, plan fact analysis). Therefore, a number of modern CPM (Corporate Performance Management) products use OLAP% models. An important task is multidimensional reverse calculation (backsolve, breakback, writeback), which allows you to calculate the required changes in detailed cells when the aggregated value changes. It is a tool for what-if analysis, i.e. for playing various options for events during planning.

Product examples: Microsoft PerformancePint, Oracle EPB, Oracle OFA, Oracle Hyperion Planning, SAP SEM, Cognos Enterprise Planning, Geac.

Financial consolidation.

Consolidation of data in accordance with international accounting standards, taking into account ownership shares, different currencies and internal turnover, is an urgent task in connection with the increasingly stringent requirements of inspection bodies (SOX, Basel II) and companies going public. OLAP technologies allow you to speed up the calculation of consolidated reports and increase the transparency of the entire process.

Product examples: Oracle FCH, Oracle Hyperion FM, Cognos Controller.

Data Warehousing and On-Line Analytical Processing (OLAP) technologies
are important elements of business decision support, which are increasingly becoming an integral part of any industry. The use of OLAP technologies as a tool for business analytics gives more control and timely access to strategic
information that facilitates effective decision making.
This provides the opportunity to simulate real-life forecasts and use resources more efficiently. OLAP allows an organization to respond more quickly to market demands.

Bibliography:

1. Erik Thomsen. OLAP Solutions: Building Multidimensional Information Systems Second Edition. Wiley Computer Publishing John Wiley & Sons, Inc., 2002.

2. OLAP council white paper, http://www.olapcouncil.org/research/whtpaply.htm

3. Gerd Stumme and Bernhard Ganter. Formal Concept Analysis _ Mathematical Foundations.

The concept of OLAP technology was formulated by Edgar Codd in 1993.

This technology is based on the construction of multidimensional data sets - so-called OLAP cubes (not necessarily three-dimensional, as one might conclude from the definition). The purpose of using OLAP technologies is to analyze data and present this analysis in a form convenient for management personnel to understand and make decisions based on them.

Basic requirements for applications for multivariate analysis:

  • - providing the user with analysis results in an acceptable time (no more than 5 s);
  • - multi-user access to data;
  • - multidimensional data presentation;
  • - the ability to access any information regardless of its storage location and volume.

OLAP system tools provide the ability to sort and select data according to specified conditions. Various qualitative and quantitative conditions can be specified.

The main data model used in numerous tools for creating and maintaining databases - DBMS - is the relational model. The data in it is presented in the form of a set of two-dimensional relation tables connected by key fields. To eliminate duplication, inconsistency, and reduce labor costs for maintaining databases, a formal apparatus for normalizing entity tables is used. However, its use is associated with additional time spent on generating responses to database queries, although memory resources are saved.

A multidimensional data model represents the object under study in the form of a multidimensional cube; a three-dimensional model is more often used. Dimensions or attribute details are plotted along the axes or faces of the cube. The base attributes are the filling of the cube cells. A multidimensional cube can be represented by a combination of three-dimensional cubes in order to facilitate perception and presentation when generating reporting and analytical documents and multimedia presentations based on analytical work materials in a decision support system.

Within the framework of OLAP technologies, based on the fact that a multidimensional representation of data can be organized both by means of relational DBMSs and multidimensional specialized tools, three types of multidimensional OLAP systems are distinguished:

  • - multidimensional OLAP-MOLAP;
  • - relational OLAP-ROLAP;
  • - mixed or hybrid (Hibrid) OLAP-HOLAP.

In multidimensional DBMSs, data is organized not in the form of relational tables, but in the form of ordered multidimensional arrays in the form of hypercubes, when all stored data must have the same dimension, which means the need to form the most complete basis of dimensions. Data can be organized in the form of polycubes; in this option, the values ​​of each indicator are stored with its own set of dimensions, and data processing is carried out by the system’s own tool. The storage structure in this case is simplified, because there is no need for a data storage area in a multidimensional or object-oriented form. Huge labor costs for creating models and systems for converting data from a relational model to an object model are reduced.

The advantages of MOLAP are:

  • - faster receipt of responses to requests than with ROLAP - the time spent is one or two orders of magnitude less;
  • - SQL limitations make it difficult to implement many built-in functions.

Limitations of MOLAP include:

  • - relatively small database sizes;
  • - due to denormalization and preliminary aggregation, multidimensional arrays use 2.5-100 times more memory than the original data (memory consumption increases exponentially as the number of dimensions increases);
  • - there are no standards for the interface and means of data manipulation;
  • - there are restrictions when loading data.

Labor costs for creating multidimensional data increase sharply, because... In this situation, there are practically no specialized means of objectifying the relational model of data contained in the information repository. Query response times often cannot meet the requirements for OLAP systems.

The advantages of ROLAP systems are:

  • - the ability to quickly analyze the data directly contained in the warehouse, because most source databases are relational;
  • - with a variable dimension of the problem, RO-LAP wins, because no physical reorganization of the database is required;
  • - ROLAP systems can use less powerful client stations and servers, and the servers bear the main burden of processing complex SQL queries;
  • - the level of information protection and differentiation of access rights in relational DBMSs is incomparably higher than in multidimensional ones.

The disadvantage of ROLAP systems is lower productivity, the need for careful development of database schemas, special tuning of indexes, analysis of query statistics and taking into account analysis findings when modifying database schemas, which leads to significant additional labor costs.

Fulfillment of these conditions allows, when using ROLAP systems, to achieve indicators similar to MOLAP systems in terms of access time, as well as surpass them in memory savings.

Hybrid OLAP systems are a combination of tools that implement a relational and multidimensional data model. This allows you to dramatically reduce the resource costs of creating and maintaining such a model and the response time to requests.

This approach uses the advantages of the first two approaches and compensates for their disadvantages. This principle is implemented in the most developed software products for this purpose.

The use of hybrid architecture in OLAP systems is the most appropriate way to solve problems associated with the use of software tools in multidimensional analysis.

The pattern detection mode is based on intelligent data processing. The main task here is to identify patterns in the processes under study, relationships and mutual influence of various factors, search for large “unusual” deviations, and predict the course of various significant processes. This area relates to data mining.

Introduction

Nowadays, almost no organization can do without database management systems, especially among those that are traditionally focused on interacting with customers. Banks, insurance companies, airlines and other transport companies, supermarket chains, telecommunications and marketing firms, organizations engaged in the service sector and others - they all collect and store gigabytes of data about customers, products and services in their databases. The value of such information is undeniable. Such databases are called operational or transactional because they are characterized by a huge number of small transactions, or write-read operations. Computer systems that record transactions and actually access transaction databases are usually called online transaction processing systems (OLTP - On-Line Transactional Processing) or accounting systems.

Accounting systems are configured and optimized to perform the maximum number of transactions in short periods of time. Typically, individual transactions are very small and unrelated to each other. However, each data record characterizing interaction with a client (a call to the support service, a cash transaction, an order from a catalog, a visit to the company’s website, etc.) can be used to obtain qualitatively new information, namely to create reports and analyze the company’s activities .

The range of analytical functions in accounting systems is usually very limited. The schemas used in OLTP applications make it difficult to create even simple reports because the data is often distributed across multiple tables and requires complex join operations to aggregate them. In general, attempting to create complex reports requires a lot of processing power and results in lost performance.

In addition, accounting systems store constantly changing data. As transactions are collected, the total values ​​change very quickly, so two analyzes performed within a few minutes of each other may yield different results. Most often, the analysis is performed at the end of the reporting period, otherwise the picture may be distorted. In addition, the data required for analysis can be stored in several systems.

Some analyzes require structural changes that are not feasible in the current operating environment. For example, you need to find out what will happen if the company introduces new products. Such research cannot be carried out on a live base. Consequently, effective analysis can rarely be performed directly in the accounting system.

Decision support systems usually have the means to provide the user with aggregate data for various samples from the original set in a form convenient for perception and analysis. Typically, such aggregate functions form a multidimensional (and therefore non-relational) data set (often called a hypercube or metacube), the axes of which contain parameters, and the cells contain aggregate data that depends on them - and such data can also be stored in relational tables. Along each axis, data can be organized into a hierarchy, representing different levels of detail. Thanks to this data model, users can formulate complex queries, generate reports, and obtain subsets of data.

This is precisely what led to interest in decision support systems, which have become the main area of ​​application of OLAP (On-Line Analytical Processing, operational analytical processing, operational data analysis), turning the “ore” of OLTP systems into a finished “product” that managers and analysts can directly use. This method allows analysts, managers and executives to gain insight into accumulated data through rapid and consistent access to a wide range of information views.

The purpose of the course work is to examine OLAP technology.

multidimensional analytical data processing

Main part

1 Basic information about OLAP

The OLAP concept is based on the principle of multidimensional data representation. Edgar Codd coined the term OLAP in 1993. Having examined the shortcomings of the relational model, he first of all pointed out the impossibility of “combining, viewing and analyzing data from the point of view of multiple dimensions, that is, in the most understandable way for corporate analysts,” and defined the general requirements for OLAP systems that expand the functionality of relational DBMSs and include multidimensional analysis as one of its characteristics.

In a large number of publications, the acronym OLAP denotes not only a multidimensional view of data, but also the storage of the data itself in a multidimensional database. Generally speaking, this is not true, as Codd himself notes that “Relational databases were, are and will be the most suitable technology for storing enterprise data. The need is not for new database technology, but rather for analysis tools that complement the functions of existing DBMSs and sufficient flexible to enable and automate the various types of mining inherent in OLAP." Such confusion leads to oppositions like "OLAP or ROLAP", which is not entirely correct, since ROLAP (relational OLAP) at the conceptual level supports all the functionality defined by the term OLAP. It seems more preferable to use the special term MOLAP for OLAP based on multidimensional DBMSs. According to Codd, a multi-dimensional conceptual view is a multiple perspective consisting of several independent dimensions along which specific sets of data can be analyzed. Simultaneous analysis across multiple dimensions is defined as multivariate analysis. Each dimension includes areas of data consolidation, consisting of a series of successive levels of generalization, where each higher level corresponds to a greater degree of data aggregation for the corresponding dimension. Yes, measurement.

The contractor can be determined by the direction of consolidation, consisting of the levels of generalization “enterprise - division - department - employee”. The Time dimension can even include two consolidation directions - “year - quarter - month - day” and “week - day”, since counting time by month and by week is incompatible. In this case, it becomes possible to arbitrarily select the desired level of detail of information for each of the dimensions. The drilling down operation corresponds to the movement from higher to lower stages of consolidation; on the contrary, the operation of rolling up means movement from lower levels to higher ones.

Codd defined 12 rules that an OLAP class software product must satisfy.

1.2 Requirements for online analytical processing tools

Multi-Dimensional Conceptual View. The conceptual representation of the data model in an OLAP product must be multidimensional in nature, that is, allow analysts to perform intuitive operations of “slice and dice,” rotate, and pivot directions of consolidation. Transparency. The user does not need to know what specific tools are used to store and process data, how the data is organized or where it comes from.

Accessibility. The analyst must be able to perform analysis within the framework of a common conceptual framework, but the data can remain under the control of legacy DBMSs while being tied to a common analytical model. That is, the OLAP tool must overlay its logical schema on the physical data sets, performing all the transformations required to provide a single, consistent, and holistic view of the user's information.

Consistent Reporting Performance. As the number of dimensions and database sizes increase, analysts should not experience any performance degradation. Sustained performance is necessary to maintain the ease of use and freedom from complexity that is required to bring OLAP to the end user.

Client - server architecture (Client-Server Architecture). Most of the data that requires rapid analytical processing is stored in mainframe systems and retrieved from personal computers. Therefore, one of the requirements is the ability of OLAP products to work in a client-server environment. The main idea here is that the server component of an OLAP tool must be intelligent enough and have the ability to build a common conceptual schema by summarizing and consolidating the various logical and physical schemas of enterprise databases to provide a transparent effect.

Generic Dimensionality. All data dimensions must be equal. Additional functionality can be provided to individual dimensions, but since they are all symmetrical, this additional functionality can be provided to any dimension. The underlying data structure, formulas, and reporting formats should not rely on any one dimension.

Dynamic Sparse Matrix Handling. An OLAP tool must provide optimal processing of sparse matrices. The access speed must be maintained regardless of the location of the data cells and be constant for models with different numbers of dimensions and different data sparsity.

Multi-User Support. Often, several analysts need to work simultaneously with one analytical model or create different models based on the same corporate data. The OLAP tool must provide them with concurrent access and ensure data integrity and security.

Unrestricted Cross-dimensional Operations support. Computations and manipulation of data along any number of dimensions must not prohibit or restrict any relationships between data cells. Transformations that require arbitrary definition must be specified in a functionally complete formula language.

Intuitive Data Manipulation. Reorientation of consolidation directions, data detailing in columns and rows, aggregation and other manipulations inherent in the structure of the hierarchy of consolidation directions should be performed in the most convenient, natural and comfortable user interface.

Flexible reporting mechanism. Various ways of visualizing data must be supported, that is, reports must be presented in every possible orientation.

Unlimited Dimensions and Aggregation Levels. It is strongly recommended that every serious OLAP tool assume at least fifteen, and preferably twenty, dimensions in the analytical model.

2 Components of OLAP systems

2.1 Server. Client. Internet

OLAP allows you to perform fast and efficient analysis on large volumes of data. Data is stored in a multi-dimensional form that most closely reflects the natural state of real-world business data. OLAP also provides users with the ability to aggregate data faster and easier. With it, they can drill down into the content of this data if necessary to obtain more detailed information.

An OLAP system consists of many components. At the highest level of presentation, the system includes a data source, an OLAP server and a client. A data source represents the source from which the data for analysis is taken. Data from the source is transferred or copied to the OLAP server, where it is systematized and prepared for faster generation of responses to queries. The client is the user interface to the OLAP server. This section of the article describes the functions of each component and the significance of the entire system as a whole. Sources. The source in OLAP systems is the server that supplies data for analysis. Depending on the area of ​​use of the OLAP product, the source may be a Data Warehouse, a legacy database containing common data, a set of tables combining financial data, or any combination of the above. The ability of an OLAP product to work with data from various sources is very important. Requiring a single format or a single database in which all source data is stored is not suitable for database administrators. In addition, this approach reduces the flexibility and power of the OLAP product. Administrators and users believe that OLAP products that can retrieve data from not only multiple sources but multiple sources are more flexible and useful than those with more stringent requirements.

Server. The application part of the OLAP system is the OLAP server. This component does all the work (depending on the system model), and stores all the information to which active access is provided. Server architecture is governed by various concepts. In particular, the main functional characteristic of an OLAP product is the use of a multidimensional (MMDB) or relational (RDB) database for data storage. Aggregated/Pre-aggregated data

Fast query implementation is an imperative for OLAP. This is one of the basic principles of OLAP - the ability to intuitively manipulate data requires rapid retrieval of information. In general, the more calculations that must be made to obtain a piece of information, the slower the response. Therefore, in order to keep query implementation time short, pieces of information that are usually accessed most often, but which also require calculation, are subject to preliminary aggregation. That is, they are counted and then stored in the database as new data. An example of the type of data that can be calculated in advance is summary data - for example, sales figures for months, quarters or years, for which the actual data entered is daily figures.

Different vendors have different methods for selecting parameters, requiring pre-aggregation and the number of pre-calculated values. The aggregation approach affects both the database and query execution time. If more values ​​are being calculated, the likelihood that the user will request a value that has already been calculated increases, and therefore response time will be reduced by not having to request the original value to be calculated. However, if you calculate all possible values ​​- this is not the best solution - in this case the size of the database increases significantly, which will make it unmanageable, and the aggregation time will be too long. In addition, when numerical values ​​are added to the database, or if they change, this information must be reflected in pre-calculated values ​​that depend on the new data. Thus, updating the database can also take a long time in the case of a large number of pre-calculated values. Since the database typically runs offline during aggregation, it is desirable that the aggregation time is not too long.

Client. The client is exactly what is used to present and manipulate data in the database. The client can be quite simple - in the form of a table that includes such OLAP capabilities as, for example, data rotation (pivoting) and deepening into data (drilling), and represent a specialized, but equally simple report viewer or be such as powerful a tool as a custom application designed for complex data manipulation. The Internet is a new form of client. In addition, it bears the stamp of new technologies; Many Internet solutions differ significantly in their capabilities in general and as an OLAP solution in particular. This section discusses the various functional properties of each client type.

Despite the fact that the server is the “backbone” of an OLAP solution, the client is no less important. The server can provide a strong foundation to facilitate data manipulation, but if the client is complex or limited in functionality, the user will not be able to take full advantage of the powerful server. The customer is so important that many vendors focus their efforts solely on customer development. Everything that is included in these applications is a standard look at the interface, predefined functions and structure, and quick solutions for more or less standard situations. For example, financial packages are popular. Pre-built financial applications allow professionals to use familiar financial tools without having to design a database structure or conventional forms and reports. Query Tool/Report Generator. A query tool or report generator offers easy access to OLAP data. They have an easy-to-use graphical interface and allow users to create reports by drag-and-dropping objects into the report. While a traditional report generator provides the user with the ability to quickly produce formatted reports, OLAP-enabled report generators produce up-to-date reports. The final product is a report that has the ability to drill down into data to the level of detail, rotate (pivot) reports, support hierarchies, etc. Add-Ins (additions) of spreadsheets.

Today, many lines of business use spreadsheets to perform various forms of corporate data analysis. In some ways, it is an ideal tool for creating reports and viewing data. The analyst can create macros that manipulate data in a specific direction, and the template can be designed so that when data is entered, the formulas calculate the correct values, eliminating the need to repeatedly enter simple calculations.

However, all this results in a "flat" report, which means that once it is created, it is difficult to examine it from different aspects. For example, a chart displays information for a certain time period, say, a month. And if one wants to see daily data (as opposed to monthly data), an entirely new chart will need to be created. There are new data sets to define, new labels to add to the chart, and many other simple but time-consuming changes to make. In addition, there are a number of areas where errors can be made, which overall reduces reliability. When OLAP is added to a table, it is possible to create a single chart and then manipulate it in various ways to provide the user with the information they need without the burden of creating every possible view. Internet as a client. The newest member of the OLAP client family is the Internet. There are many advantages to generating OLAP reports over the Internet. The most significant is the lack of need for specialized software to access information. This saves the company a lot of time and money.

Each Internet product is specific. Some make it easier to create Web pages, but have less flexibility. Others let you create views of the data and then save them as static HTML files. All this makes it possible to view data via the Internet, but nothing more. It is impossible to actively manipulate data with their help.

There is another type of product - interactive and dynamic, which turns such products into fully functional tools. Users can drill into data, pivot, limit dimensions, and more. Before choosing a Web enabler, it is important to understand what functionality is required from a Web solution and then determine which product will best implement that functionality.

Applications. Applications are a type of client that uses OLAP databases. They are identical to the query tools and report generators described above, but in addition they bring greater functionality to the product. The application is generally more powerful than the query tool.

Development. Typically, OLAP providers provide a development environment for users to create their own customized applications. The development environment as a whole is a graphical interface that supports object-oriented application development. In addition, most vendors provide an API that can be used to integrate OLAP databases with other applications.

2.2 OLAP clients

OLAP clients with a built-in OLAP engine are installed on users' PCs. They require no server for computing and have zero administration involved. Such clients allow the user to tune into his existing databases; As a rule, a dictionary is created that hides the physical structure of the data behind its subject description, understandable to a specialist. After this, the OLAP client executes arbitrary queries and displays the results in an OLAP table. In this table, in turn, the user can manipulate the data and receive hundreds of different reports on the screen or on paper. OLAP clients designed to work with RDBMSs allow you to analyze data already available in the corporation, for example, stored in an OLTP database. However, their second purpose may be to quickly and cheaply create warehouses or data marts - in this case, the organization's programmers only need to create sets of star tables in relational databases and data loading procedures. The most time-consuming part of the work - writing interfaces with numerous options for custom queries and reports - is implemented in the OLAP client in just a few hours. The end user needs about 30 minutes to master such a program. OLAP clients are supplied by database developers themselves, both multidimensional and relational. These are SAS Corporate Reporter, which is almost a standard product in terms of convenience and beauty, Oracle Discoverer, a set of programs MS Pivot Services and Pivot Table, etc. Many programs designed to work with MS OLAP Services are delivered as part of the “OLAP to the Masses” campaign, which conducted by Microsoft Corporation. As a rule, they are improved versions of Pivot Table and are designed for use in MS Office or a Web browser. These are products from Matryx, Knosys, etc., which, due to their simplicity, cheapness and efficiency, have gained immense popularity in the West.

3 Classification of OLAP products

3.1 Multidimensional OLAP

Currently, there are a large number of products on the market that provide OLAP functionality to one degree or another. Providing a multidimensional conceptual view from the user interface to the source database, all OLAP products are divided into three classes based on the type of source database.

1. The very first online analytical processing systems (for example, Essbase from Arbor Software, Oracle Express Server from Oracle) belonged to the MOLAP class, that is, they could only work with their own multidimensional databases. They are based on proprietary technologies for multidimensional DBMSs and are the most expensive. These systems provide a full cycle of OLAP processing. They either include, in addition to the server component, their own integrated client interface, or use external spreadsheet programs to communicate with the user. To maintain such systems, a special staff of employees is required to install, maintain the system, and create data views for end users.

2. Relational online analytical processing (ROLAP) systems allow data stored in a relational database to be represented in multidimensional form, providing transformation of information into a multidimensional model through an intermediate metadata layer. This class includes MicroStrategy's DSS Suite, Informix's MetaCube, Information Advantage's DecisionSuite, and others. The InfoVisor software package, developed in Russia, at the Ivanovo State Energy University, is also a system of this class. ROLAP systems are well suited for working with large storage facilities. Like MOLAP systems, they require significant maintenance by information technology specialists and involve multi-user operation.

3. Finally, hybrid systems (Hybrid OLAP, HOLAP) are designed to combine the advantages and minimize the disadvantages inherent in previous classes. Speedware's Media/MR falls into this class. According to the developers, it combines the analytical flexibility and speed of response of MOLAP with the constant access to real data inherent in ROLAP.

In addition to the listed tools, there is another class - tools for generating queries and reports for desktop PCs, supplemented with OLAP functions or integrated with external tools that perform such functions. These well-developed systems retrieve data from original sources, transform it and place it into a dynamic multi-dimensional database running on the end user's client station. The main representatives of this class are BusinessObjects from the company of the same name, BrioQuery from Brio Technology, and PowerPlay from Cognos. An overview of some OLAP products is provided in the appendix.

In specialized DBMSs based on a multidimensional representation of data, data is organized not in the form of relational tables, but in the form of ordered multidimensional arrays:

1) hypercubes (all cells stored in the database must have the same dimension, that is, be in the most complete measurement basis) or

2) polycubes (each variable is stored with its own set of measurements, and all associated processing difficulties are transferred to the internal mechanisms of the system).

The use of multidimensional databases in online analytical processing systems has the following advantages.

1. In the case of using multidimensional DBMS, searching and retrieving data is much faster than with a multidimensional conceptual view of a relational database, since the multidimensional database is denormalized, contains pre-aggregated indicators and provides optimized access to the requested cells.

2. Multidimensional DBMSs easily cope with the tasks of including various built-in functions in the information model, while objectively existing limitations of the SQL language make performing these tasks on the basis of relational DBMSs quite difficult and sometimes impossible.

On the other hand, there are significant limitations.

1. Multidimensional DBMSs do not allow working with large databases. In addition, due to denormalization and pre-executed aggregation, the volume of data in a multidimensional database, as a rule, corresponds (according to Codd’s estimate) to 2.5-100 times less than the volume of the original detailed data.

2. Multidimensional DBMSs, compared to relational ones, use external memory very inefficiently. In the vast majority of cases, the information hypercube is highly sparse, and since the data is stored in an ordered form, undefined values ​​can only be removed by choosing the optimal sort order, which allows you to organize the data into the largest possible contiguous groups. But even in this case, the problem is only partially solved. In addition, the optimal sort order for storing sparse data will most likely not be the order that is most often used in queries. Therefore, in real systems it is necessary to seek a compromise between performance and redundancy of disk space occupied by the database.

Consequently, the use of multidimensional DBMS is justified only under the following conditions.

1. The volume of initial data for analysis is not too large (no more than a few gigabytes), that is, the level of data aggregation is quite high.

2. The set of information dimensions is stable (since any change in their structure almost always requires a complete restructuring of the hypercube).

3. The system's response time to unregulated requests is the most critical parameter.

4. Requires extensive use of complex built-in functions to perform cross-dimensional calculations on hypercube cells, including the ability to write custom functions.

The direct use of relational databases in online analytical processing systems has the following advantages.

1. In most cases, corporate data warehouses are implemented using relational DBMS, and ROLAP tools allow analysis directly on them. At the same time, the storage size is not such a critical parameter as in the case of MOLAP.

2. In the case of a variable dimension of the problem, when changes to the measurement structure have to be made quite often, ROLAP systems with dynamic representation of dimensions are the optimal solution, since such modifications do not require physical reorganization of the database.

3. Relational DBMSs provide a significantly higher level of data protection and good options for delineating access rights.

The main disadvantage of ROLAP compared to multidimensional DBMSs is lower performance. To provide performance comparable to MOLAP, relational systems require careful design of the database schema and configuration of indexes, that is, a lot of effort on the part of database administrators. Only by using star schemas can the performance of well-tuned relational systems approach that of systems based on multidimensional databases.

The works are entirely devoted to the description of the star schema and recommendations for its use. The idea is that there are tables for each dimension, and all the facts are placed in one table, indexed by a multiple key made up of the keys of the individual dimensions (Appendix A). Each ray of the star diagram specifies, in Codd's terminology, the direction of data consolidation along the corresponding dimension.

In complex problems with multi-level dimensions, it makes sense to turn to extensions of the star schema - the constellation schema (fact constellation schema) and the snowflake schema (snowflake schema). In these cases, separate fact tables are created for possible combinations of summary levels of different dimensions (Appendix B). This allows for better performance, but often leads to data redundancy and significant complexity in the database structure, which contains a huge number of fact tables.

An increase in the number of fact tables in a database can stem not only from the multiplicity of levels of different dimensions, but also from the fact that, in general, facts have different sets of dimensions. When abstracting from individual measurements, the user must receive a projection of the most complete hypercube, and the values ​​of the indicators in it are not always the result of elementary summation. Thus, with a large number of independent dimensions, it is necessary to maintain many fact tables corresponding to each possible combination of dimensions selected in the query, which also leads to wasteful use of external memory, increased time for loading data into the star schema database from external sources, and administration difficulties.

Extensions to the SQL language partially solve this problem (the GROUP BY CUBE, GROUP BY ROLLUP, and GROUP BY GROUPING SETS statements); in addition, a mechanism is proposed to find a compromise between redundancy and performance, recommending creating fact tables not for all possible combinations of dimensions , but only for those whose cell values ​​cannot be obtained using subsequent aggregation of more complete fact tables (Appendix B).

In any case, if the multidimensional model is implemented as a relational database, you should create long and "narrow" fact tables and relatively small and "wide" dimension tables. Fact tables contain the numerical values ​​of the hypercube cells, and the remaining tables define the multidimensional measurement basis containing them. Some information can be obtained using dynamic aggregation of data distributed over non-star normalized structures, although it should be remembered that queries involving aggregation in a highly normalized database structure can be quite slow.

Focusing on the representation of multidimensional information using star-shaped relational models allows us to get rid of the problem of optimizing the storage of sparse matrices, which is acute for multidimensional DBMSs (where the sparsity problem is solved by a special choice of schema). Although an entire record is used to store each cell, which, in addition to the values ​​themselves, includes secondary keys - links to dimension tables, non-existent values ​​are simply not included in the fact table.

Conclusion

Having considered the issues of operation and application of OLAP technology, companies face questions, the answers to which will allow them to choose a product that best meets the user’s needs.

These are the following questions:

Where does the data come from? – The data to be analyzed may be located in different places. It is possible that the OLAP database will receive them from an enterprise data warehouse or from an OLTP system. If the OLAP product already has the ability to access some data source, the processes of categorizing and cleaning data are reduced.

What manipulations does the user perform on the data? -
Once a user has accessed the database and started performing analysis, it is important that they are able to manipulate the data appropriately. Depending on the user's needs, it may be that a powerful report generator or the ability to create and host dynamic web pages is needed. However, it may be preferable for the user to have at his disposal a means of easily and quickly creating his own applications.

What is the total amount of data? - This is the most important factor when defining an OLAP database. Relational OLAP products can handle large amounts of data better than multidimensional ones. If the volume of data does not require the use of a relational database, a multidimensional product can be used with equal success.

Who is the user? - When defining an OLAP system client, the user's skill level is important. Some users are more comfortable integrating OLAP with a table, while others will prefer a dedicated application. Depending on the user’s qualifications, the issue of conducting training is also decided. A large company may be willing to pay for user training, a smaller company may refuse it. The client should be such that users feel confident and can use it effectively.

Today, most of the world's companies have moved to using OLAP as the core technology to provide information to decision makers. Therefore, the fundamental question to ask is whether spreadsheets should continue to be used as the primary platform for reporting, budgeting and forecasting. Companies must ask themselves whether they are willing to lose competitive advantage by using inaccurate, irrelevant and incomplete information before they mature enough to consider alternative technologies.

Also, in conclusion, it should be noted that the analytical capabilities of OLAP technologies increase the usefulness of data stored in a corporate information warehouse, allowing a company to interact more effectively with its customers.

Glossary

Concept Definition
1 BI tools Tools and technologies used to access information. Includes OLAP technologies, data mining and complex analysis; end-user tools and ad hoc query tools, business monitoring dashboards and corporate reporting generators.
2 On-line Analytical Processing, OLAP (Operational Analytical Processing) Technology for analytical processing of information in real time, including the compilation and dynamic publication of reports and documents.
3 Slice and Dice (Longitudinal and transverse sections, literally - “cutting into slices and cubes”) A term used to describe the complex data analysis functionality provided by OLAP tools. Retrieving data from a multidimensional cube with specified values ​​and a specified relative arrangement of dimensions.
4 Data Pivot The process of rotating a table of data, that is, converting columns to rows and vice versa.
5 Calculated member A measurement element whose magnitude is determined by the magnitudes of other elements (for example, in mathematical or logical applications). The calculated element may be part of an OLAP server or described by the user during an interactive session. A calculated element is any element that is not entered but is calculated.
6 Global Business Models A type of Data Warehouse that provides access to information that is distributed across various enterprise systems and is under the control of different divisions or departments with different databases and data models. This type of Data Warehouse is difficult to build due to the need to combine the efforts of users from different departments to develop a common data model for the Warehouse.
7 Data Mining Technical techniques using software tools designed for such a user who, as a rule, cannot say in advance what exactly he is looking for, but can only indicate certain patterns and directions of search.
8 Client/Server A technological approach that consists in dividing the process into separate functions. The server performs several functions - managing communications, providing database maintenance, etc. The client performs individual user functions - providing appropriate interfaces, performing cross-screen navigation, providing help functions, etc.
9 Multi-dimensional Database, MDBS and MDBMS A powerful database that allows users to analyze large volumes of data. A database with a special storage organization - cubes, providing high speed work with data stored as a set of facts, dimensions and pre-calculated aggregates.
10 Drill Down A method of examining detailed data used to analyze the summary level of data. The levels of "deepening" depend on the level of detail of the data in [ran.
11 Central Warehouse

1. A database containing data collected from an organization's operating systems. It has a structure convenient for data analysis. Designed to support decision making and create a unified information space for the corporation.

2. An automation method that covers all information systems managed from one place.

1 Golitsina O.L., Maksimov N.V., Popov I.I. Databases: Textbook. – M.: FORUM: INFRA-M, 2003. – 352 p.

2 Date K. Introduction to database systems. – M.: Nauka, 2005 – 246 p.

3 Elmanova N.V., Fedorov A.A. Introduction to Microsoft OLAP technologies. – M.: Dialogue-MEPhI, 2004. – 312 p.

4 Karpova T.S. Databases: models, development, implementation. – St. Petersburg: Peter, 2006. – 304 p.

5 Korovkin S. D., Levenets I. A., Ratmanova I. D., Starykh V. A., Shchavelev L. V. Solution to the problem of complex operational analysis of information from data warehouses // DBMS. - 2005. - No. 5-6. - 47-51 s.

6 Krechetov N., Ivanov P. Products for data mining ComputerWeek-Moscow. - 2003. - No. 14-15. - 32-39 s.

7 Przhiyalkovsky V.V. Complex analysis of large-volume data: new prospects for computerization // DBMS. - 2006. - No. 4. - 71-83 p.

8 Sakharov A. A. Concept of construction and implementation of information systems focused on data analysis // DBMS. - 2004. - No. 4. - 55-70 p.

9 Ullman J. Fundamentals of database systems. – M.: Finance and Statistics, 2003. – 312 p.

10 Hubbard J. Automated database design. – M.: Mir, 2007. – 294 p.


Korovkin S. D., Levenets I. A., Ratmanova I. D., Starykh V. A., Shchavelev L. V. Solution of the problem of complex operational analysis of information from data warehouses // DBMS. - 2005. - No. 5-6. - 47-51 s.

Ullman J. Fundamentals of database systems. – M.: Finance and Statistics, 2003. – 312 p.

Barseghyan A.A., Kupriyanov M.S. Data analysis technologies: DataMining, VisualMining, TextMining, Olap. – St. Petersburg: BHV-Petersburg, 2007. – 532 p.

Elmanova N.V., Fedorov A.A. Introduction to Microsoft OLAP technologies. – M.: Dialogue-MEPhI, 2004. – 312 p.

Date K. Introduction to database systems. – M.: Nauka, 2005 – 246 p.

Golitsina O.L., Maksimov N.V., Popov I.I. Databases: Textbook. – M.: FORUM: INFRA-M, 2003. – 352 p.

Sakharov A. A. Concept of construction and implementation of information systems focused on data analysis // DBMS. - 2004. - No. 4. - 55-70 p.

Przhiyalkovsky V.V. Complex analysis of large-volume data: new prospects for computerization // DBMS. - 2006. - No. 4. - 71-83 p.

Purpose of the report

This report will focus on one of the categories of intelligent technologies that are a convenient analytical tool - OLAP technologies.

The purpose of the report: to reveal and highlight 2 issues: 1) the concept of OLAP and their applied importance in financial management; 2) implementation of OLAP functionality in software solutions: differences, capabilities, advantages, disadvantages.

I would like to note right away that OLAP is a universal tool that can be used in any application area, and not just in finance (as can be understood from the title of the report), which requires data analysis using various methods.

Financial management

Financial management is an area in which analysis is more important than any other. Any financial and management decision arises as a result of certain analytical procedures. Today, financial management is becoming important for the successful functioning of an enterprise. Despite the fact that financial management is an auxiliary process in an enterprise, it requires special attention, since erroneous financial and managerial decisions can lead to large losses.

Financial management is aimed at providing the enterprise with financial resources in the required volumes, at the right time and in the right place in order to obtain the maximum effect from their use through optimal distribution.

It is perhaps difficult to define the level of “maximum resource efficiency”, but in any case,

The CFO should always know:

  • How many financial resources are available?
  • Where will the funds come from and in what quantities?
  • where to invest more effectively and why?
  • and at what points in time does all this need to be done?
  • how much is needed to ensure normal operation of the enterprise?

To obtain reasonable answers to these questions, it is necessary to have, analyze and know how to analyze a sufficiently large number of performance indicators. In addition, FU covers a huge number of areas: cash flow analysis (cash flow), analysis of assets and liabilities, profitability analysis, margin analysis, profitability analysis, assortment analysis.

Knowledge

Therefore, a key factor in the effectiveness of the financial management process is the availability of knowledge:

  • Personal knowledge in the subject area (one might say theoretical and methodological), including experience, intuition of a financier/finance director
  • General (corporate) knowledge or systematic information about the facts of financial transactions in an enterprise (i.e. information about the past, present and future state of the enterprise, presented in various indicators and measurements)

If the first lies in the scope of actions of this financier (or the HR director who hired this employee), then the second should be purposefully created at the enterprise by the joint efforts of employees of financial and information services.

What is there now

However, now a paradoxical situation is typical in enterprises: there is information, there is a lot of it, too much. But it is in a chaotic state: unstructured, inconsistent, fragmented, not always reliable and often erroneous, it is almost impossible to find and obtain. A lengthy and often useless generation of mountains of financial statements is carried out, which is inconvenient for financial analysis and difficult to understand, since it is created not for internal management, but for submission to external regulatory authorities.

According to the results of a study conducted by the company Reuters Among 1,300 international managers, 38% of respondents say they spend a lot of time trying to find the information they need. It turns out that a highly qualified specialist spends highly paid time not on data analysis, but on collecting, searching and systematizing the information necessary for this analysis. At the same time, managers are overloaded with data that is often irrelevant, which again reduces the effectiveness of their work. The reason for this situation: excess information and lack of knowledge.

What to do

Information must be turned into knowledge. For modern business, valuable information, its systematic acquisition, synthesis, exchange, use is a kind of currency, but in order to receive it, it is necessary to manage information, like any business process.

The key to information management is delivering the right information in the right form to stakeholders within the organization at the right time. The goal of such management is to help people work better together using increasing amounts of information.

Information technology in this case acts as a means by which it would be possible to systematize information in an enterprise, provide certain users with access to it and give them the tools to transform this information into knowledge.

Basic concepts of OLAP technologies

OLAP technologies (from the English On-Line Analytical Processing) is the name not of a specific product, but of an entire technology for the operational analysis of multidimensional data accumulated in a warehouse. In order to understand the essence of OLAP, it is necessary to consider the traditional process of obtaining information for decision making.

Traditional decision support system

Here, of course, there can also be many options: complete information chaos or the most typical situation when the enterprise has operational systems with the help of which the facts of certain operations are recorded and stored in databases. To extract data from databases for analytical purposes, a system of queries for specific data samples has been built.

But this method of decision support lacks flexibility and has many disadvantages:

  • negligible amount of data is used that can be useful for decision making
  • sometimes complex multi-page reports are created, of which 1-2 lines are actually used (the rest is just in case) - information overload
  • slow response of the process to changes: if a new data representation is needed, the request must be formally described and coded by the programmer, only then executed. Waiting time: hours, days. Or perhaps a solution is needed now, immediately. But after receiving new information, a new question will arise (clarifying)

If query reports are presented in a one-dimensional format, then business problems are usually multidimensional and multifaceted. If you want to get a clear picture of a company's business, then you need to analyze data from various perspectives.

Many companies create excellent relational databases, ideally organizing mountains of unused information, which in itself does not provide either a quick or sufficiently competent response to market events. YES - relational databases were, are and will be the most suitable technology for storing corporate data. We are not talking about new database technology, but rather about analysis tools that complement the functions of existing DBMSs and are flexible enough to provide and automate the various types of intellectual analysis inherent in OLAP.

Understanding OLAP

What does OLAP provide?

  • Advanced storage data access tools
  • Dynamic interactive data manipulation (rotation, consolidation or drill-down)
  • Clear visual display of data
  • Fast – analysis is carried out in real time
  • Multidimensional data presentation - simultaneous analysis of a number of indicators along several dimensions

To get the effect of using OLAP technologies, you must: 1) understand the essence of the technologies themselves and their capabilities; 2) clearly define what processes need to be analyzed, what indicators they will be characterized by and in what dimensions it is advisable to see them, i.e. create an analysis model.

The basic concepts that OLAP technologies operate on are as follows:

Multidimensionality

To understand the multidimensionality of the data, you should first present a table showing, for example, the performance of Enterprise Costs by economic elements and business units.

This data is presented in two dimensions:

  • article
  • business unit

This table is not informative, as it shows sales for one specific period of time. For different time periods, analysts will have to compare several tables (for each time period):

The figure shows a 3rd dimension, Time, in addition to the first two. (Article, business unit)

Another way to show multidimensional data is to represent it in the form of a cube:

OLAP cubes allow analysts to obtain data at various slices to obtain answers to questions posed by the business:

  • Which costs are critical in which business units?
  • How do business unit costs change over time?
  • How do cost items change over time?

Answers to such questions are necessary for making management decisions: on the reduction of certain cost items, the impact on their structure, identifying the reasons for changes in costs over time, deviations from the plan and their elimination - optimizing their structure.

In this example, only 3 dimensions are considered. It's difficult to depict more than 3 dimensions, but it works in the same way as with 3 dimensions.

Typically, OLAP applications allow you to obtain data on 3 or more dimensions, for example, you can add one more dimension - Plan-Actual, Cost Category: direct, indirect, by Orders, by Months. Additional dimensions allow you to obtain more analytical slices and provide answers to questions with multiple conditions.

Hierarchy

OLAP also allows analysts to organize each dimension into a hierarchy of groups, subgroups, and totals that reflect the measure across the entire organization—the most logical way to analyze a business.

For example, it is advisable to group costs hierarchically:

OLAP allows analysts to look at the overall summary metric (at the top level) and then drill down to the bottom and subsequent levels to discover the exact reason why the metric changed.

By allowing analysts to use multiple dimensions in a data cube, with the ability to hierarchically construct dimensions, OLAP provides a picture of the business that is not compressed by the information warehouse structure.

Changing directions of analysis in a cube (rotating data)

As a rule, they operate in concepts: dimensions specified in columns, rows (there may be several of them), the rest form slices, the contents of the table form dimensions (sales, costs, cash)

Typically, OLAP allows you to change the orientation of cube dimensions, thereby presenting the data in different views.

The display of cube data depends on:

  • dimension orientations: which dimensions are specified in rows, columns, slices;
  • groups of indicators, highlighted in rows, columns, sections.
  • Changing dimensions is within the scope of the user's actions.

Thus, OLAP allows you to carry out various types of analysis and understand their relationships with their results.

  • Deviation analysis is an analysis of plan implementation, which is supplemented by factor analysis of the causes of deviations by detailing the indicators.
  • Dependency analysis: OLAP allows you to identify various dependencies between various changes, for example, when beer was removed from the assortment during the first two months, a drop in roach sales was discovered.
  • Comparison (comparative analysis). Comparison of the results of changes in an indicator over time, for a given group of goods, in different regions, etc.
  • Analysis of dynamics allows us to identify certain trends in changes in indicators over time.

Efficiency: we can say that OLAP is based on the laws of psychology: the ability to process information requests in “real time” - at the pace of the process of analytical comprehension of data by the user.

If a relational database can read about 200 records per second and write 20, then a good OLAP server, using calculated rows and columns, can consolidate 20,000-30,000 cells (equivalent to one record in a relational database) per second.

Visibility: It should be emphasized that OLAP provides advanced means of graphical presentation of data to the end user. The human brain is capable of perceiving and analyzing information that is presented in the form of geometric images, in a volume that is several orders of magnitude greater than information presented in alphanumeric form. Example: Let's say you need to find a familiar face in one of a hundred photographs. I believe this process will take you no more than a minute. Now imagine that instead of photographs you will be offered a hundred verbal descriptions of the same persons. I think that you will not be able to solve the proposed problem at all.

Simplicity: The main feature of these technologies is that they are intended for use not by a specialist in the field of information technology, not by an expert statistician, but by a professional in the applied field - a credit department manager, a budget department manager, and finally a director. They are designed for the analyst to communicate with the problem, not with the computer..

Despite the great capabilities of OLAP (in addition, the idea is relatively old - the 60s), its actual use is practically never found in our enterprises. Why?

  • there is no information or the possibilities are not clear
  • habit of thinking two-dimensionally
  • price barrier
  • excessive technological content of articles devoted to OLAP: unusual terms are scary - OLAP, “data mining and slicing”, “ad hoc queries”, “identifying significant correlations”

Our approach and Western ones to the use of OLAP

In addition, we also have a specific understanding of the application utility of OLAP even while understanding its technological capabilities.

Our and Russian authors of various materials devoted to OLAP express the following opinion regarding the usefulness of OLAP: most perceive OLAP as a tool that allows you to expand and collapse data simply and conveniently, carrying out manipulations that come to the analyst’s mind during the analysis process. The more “slices” and “sections” of data the analyst sees, the more ideas he has, which, in turn, require more and more “slices” for verification. It is not right.

The Western understanding of the usefulness of OLAP is based on a methodological analysis model that must be incorporated when designing OLAP solutions. The analyst should not play with the OLAP cube and aimlessly change its dimensions and levels of detail, data orientation, graphical display of data (and this really takes!), but clearly understand what views he needs, in what sequence and why (of course, the elements " there may be discoveries here, but it is not fundamental to the usefulness of OLAP).

Applications of OLAP

  • Budget
  • Flow of funds

One of the most fertile areas of application of OLAP technologies. It is not for nothing that no modern budgeting system is considered complete without the presence of OLAP tools for budget analysis. Most budget reports are easily built on the basis of OLAP systems. At the same time, the reports answer a very wide range of questions: analysis of the structure of expenses and income, comparison of expenses for certain items in different divisions, analysis of the dynamics and trends of expenses for certain items, analysis of costs and profits.

OLAP will allow you to analyze cash inflows and outflows in the context of business operations, counterparties, currencies and time in order to optimize their flows.

  • Financial and management reporting (with analytics that management needs)
  • Marketing
  • Balanced Scorecard
  • Profitability Analysis

If you have the appropriate data, you can find various applications of OLAP technology.

OLAP products

This section will discuss OLAP as a software solution.

General requirements for OLAP products

There are many ways to implement OLAP applications, so no particular technology should have been required, or even recommended. Under different conditions and circumstances, one approach may be preferable to another. The implementation techniques include many different proprietary ideas that vendors are so proud of: variations of client-server architecture, time series analysis, object orientation, data storage optimization, parallel processes, etc. But these technologies cannot be part of the definition of OLAP.

There are characteristics that must be observed in all OLAP products (if it is an OLAP product), which is the ideal of the technology. These are the 5 key definitions that characterize OLAP (the so-called FASMI test): Fast Analysis of Shared Multidimensional Information.

  • Fast(FAST) means that the system should be able to provide most responses to users within approximately five seconds. Even if the system warns that the process will take significantly longer, users may become distracted and lose their thoughts, and the quality of the analysis will suffer. This speed is not easy to achieve with large amounts of data, especially if special on-the-fly calculations are required. Vendors resort to a wide variety of methods to achieve this goal, including specialized forms of data storage, extensive pre-computing, or increasingly stringent hardware requirements. However, there are currently no fully optimized solutions. At first glance, it may seem surprising that when receiving a report in a minute that not so long ago took days, the user very quickly becomes bored while waiting, and the project turns out to be much less successful than in the case of an instant response, even at the cost of less detailed analysis.
  • Shared means that the system makes it possible to fulfill all data protection requirements and implement distributed and simultaneous access to data for different levels of users. The system must be able to handle multiple data changes in a timely, secure manner. This is a major weakness of many OLAP products, which tend to assume that all OLAP applications are read-only and provide simplified security controls.
  • Multidimensional is a key requirement. If you had to define OLAP in one word, you would choose it. The system must provide a multi-dimensional conceptual view of data, including full support for hierarchies and multiple hierarchies, as this determines the most logical way to analyze the business. There is no minimum number of dimensions that must be processed, as this also depends on the application, and most OLAP products have a sufficient number of dimensions for the markets they are aimed at. Again, we do not specify what underlying database technology should be used if the user is to obtain a truly multidimensional conceptual view of the information. This feature is the heart of OLAP
  • Information. The necessary information must be obtained where it is needed, regardless of its volume and storage location. However, a lot depends on the application. The power of various products is measured in terms of how much input data they can process, but not how many gigabytes they can store. The power of the products varies widely - the largest OLAP products can handle at least a thousand times more data than the smallest. There are many factors to consider in this regard, including data duplication, RAM requirements, disk space usage, performance metrics, integration with information warehouses, etc.
  • Analysis means that the system can handle any logical and statistical analysis specific to a given application and ensures that it is stored in a form accessible to the end user. The user should be able to define new custom calculations as part of the analysis without the need for programming. That is, all required analysis functionality must be provided in an intuitive way for end users. Analysis tools could include certain procedures, such as time series analysis, cost allocation, currency transfers, target searches, etc. Such capabilities vary widely among products, depending on the target orientation.

In other words, these 5 key definitions are the goals that OLAP products are designed to achieve.

Technological aspects of OLAP

An OLAP system includes certain components. There are various schemes for their operation that this or that product can implement.

Components of OLAP systems (what does an OLAP system consist of?)

Typically, an OLAP system includes the following components:

  • Data source
    The source from which data for analysis is taken (data warehouse, database of operational accounting systems, set of tables, combinations of the above).
  • OLAP server
    Data from the source is transferred or copied to the OLAP server, where it is systematized and prepared for faster generation of responses to queries.
  • OLAP client
    User interface to the OLAP server in which the user operates

It should be noted that not all components are required. There are desktop OLAP systems that allow you to analyze data stored directly on the user's computer and do not require an OLAP server.

However, what element is required is the data source: data availability is an important issue. If they exist, in any form, such as an Excel table, in the accounting system database, or in the form of structured reports from branches, the IT specialist will be able to integrate with the OLAP system directly or with intermediate conversion. OLAP systems have special tools for this. If this data is not available, or it is of insufficient completeness and quality, OLAP will not help. That is, OLAP is only a superstructure over the data, and if there is none, it becomes a useless thing.

Most data for OLAP applications originates in other systems. However, in some applications (for example, planning or budgeting), data can be created directly in OLAP applications. When data comes from other applications, it is usually necessary for the data to be stored in a separate, duplicate form for the OLAP application. Therefore, it is advisable to create data warehouses.

It should be noted that the term “OLAP” is inextricably linked with the term “data warehouse” (Data Warehouse). A data warehouse is a domain-specific, time-based, and immutable collection of data to support management decision-making. Data in the warehouse comes from operational systems (OLTP systems), which are designed to automate business processes; the warehouse can be replenished from external sources, for example, statistical reports.

Despite the fact that they contain obviously redundant information that is already in databases or operating system files, data warehouses are necessary because:

  • fragmentation of data, storing it in various DBMS formats;
  • data retrieval performance improves
  • if in an enterprise all data is stored on a central database server (which is extremely rare), the analyst will probably not understand their complex, sometimes confusing structures
  • complex analytical queries for operational information slow down the current work of the company, blocking tables for a long time and taking over server resources
  • ability to clean and harmonize data
  • it is impossible or very difficult to directly analyze data from operating systems;

The purpose of the repository is to provide the “raw material” for analysis in one place and in a simple, understandable structure. That is, the concept of Data Warehousing is not a concept of data analysis, rather it is a concept of preparing data for analysis. It involves the implementation of a single integrated data source.

OLAP products: architectures

When using OLAP products, two questions are important: how and where keep And process data. Depending on how these two processes are implemented, OLAP architectures are distinguished. There are 3 ways to store data for OLAP and 3 ways to process this data. Many manufacturers offer several options, some try to prove that their approach is the single most prudent one. This is, of course, absurd. However, very few products can operate in more than one mode efficiently.

OLAP data storage options

Storage in this context means keeping data in a constantly updated state.

  • Relational databases: This is a typical choice if an enterprise stores accounting data in a RDB. In most cases, data should be stored in a denormalized structure (the most suitable is a star schema). A normalized database is not acceptable due to the very low query performance when generating aggregates for OLAP (often the resulting data is stored in aggregate tables).
  • Database files on the client computer (kiosks or data marts): This data can be pre-distributed or created by queries on client computers.

Multidimensional Databases: This assumes that data is stored in a multidimensional database on a server. It can include data extracted and summarized from other systems and relational databases, end-user files, etc. In most cases, multidimensional databases are stored on disk, but some products allow you to use RAM, calculating the most frequently used data on the fly " Very few products based on multidimensional databases allow multiple editing of data; many products allow single editing but multiple readings of data, while others are limited to reading only.

These three storage locations have different storage capabilities, and they are arranged in descending order of capacity. They also have different query performance characteristics: relational databases are much slower than the latter two options.

Options for processing OLAP data

There are 3 of the same data processing options:

  • Using SQL: This option is, of course, used when storing data in a RDB. However, SQL does not allow multidimensional calculations in a single query, so it requires writing complex SQL queries to achieve more than basic multidimensional functionality. However, this doesn't stop developers from trying. In most cases, they perform a limited number of relevant calculations in SQL, with results that can be obtained from multidimensional data processing or from the client machine. It is also possible to use RAM that can store data using more than one request: this dramatically improves response.
  • Multidimensional processing on the client: The client OLAP product does the calculations itself, but such processing is only available if users have relatively powerful PCs.

Server-side multidimensional processing: This is a popular place to perform multidimensional calculations in client-server OLAP applications and is used in many products. Performance is usually high because most of the calculations have already been done. However, this requires a lot of disk space.

Matrix of OLAP architectures

Accordingly, by combining storage/processing options, it is possible to obtain a matrix of OLAP system architectures. Accordingly, theoretically there can be 9 combinations of these methods. However, since 3 of them lack common sense, in reality there are only 6 options for storing and processing OLAP data.

Multidimensional storage options
data

Options
multidimensional
data processing

Relational database

Server-side multidimensional database

Client computer

Cartesis Magnitude

Multidimensional server processing

Crystal Holos (ROLAP mode)

IBM DB2 OLAP Server

CA EUREKA:Strategy

Informix MetaCube

Speedware Media/MR

Microsoft Analysis Services

Oracle Express (ROLAP mode)

Pilot Analysis Server

Applix iTM1

Crystal Holos

Comshare Decision

Hyperion Essbase

Oracle Express

Speedware Media/M

Microsoft Analysis Services

PowerPlay Enterprise Server

Pilot Analysis Server

Applix iTM1

Multidimensional processing on the client computer

Oracle Discoverer

Informix MetaCube

Dimensional Insight

Hyperion Enterprise

Cognos PowerPlay

Personal Express

iTM1 Perspectives

Since it is storage that determines processing, it is customary to group by storage options, that is:

  • ROLAP products in sectors 1, 2, 3
  • Desktop OLAP - in sector 6

MOLAP products – in sectors 4 and 5

HOLAP products (allowing both multidimensional and relational data storage options) – in 2 and 4 (in italics)

Categories of OLAP products

There are more than 40 OLAP vendors, although they cannot all be considered competitors because their capabilities are very different and, in fact, they operate in different market segments. They can be grouped into 4 fundamental categories, the differences between which are based on the following concepts: complex functionality - simple functionality, performance - disk space. It is useful to depict categories in the shape of a square because it clearly shows the relationships between them. The distinctive feature of each category is represented on its side, and the similarities with others are represented on the adjacent sides, therefore, the categories on opposite sides are fundamentally different.

Peculiarities

Advantages

Flaws

Representatives

Applied OLAP

Complete applications with rich functionality. Almost all require a multidimensional database, although some work with a relational one. Many of this category of applications are specialized, such as sales, manufacturing, banking, budgeting, financial consolidation, sales analysis

Possibility of integration with various applications

High level of functionality

High level of flexibility and scalability

Application complexity (user training required)

High price

Hyperion Solutions

Crystal Decisions

Information Builders

The product is based on a non-relational data structure that provides multidimensional storage, processing and presentation of data. During the analysis process, data is selected exclusively from a multidimensional structure. Despite the high level of openness, suppliers persuade buyers to purchase their own tools

High performance (fast calculations of summary indicators and various multidimensional transformations for any of the dimensions). The average response time to an ad hoc analytical query when using a multidimensional database is usually 1-2 orders of magnitude less than in the case of an RDB

High level of openness: a large number of products with which integration is possible

They easily cope with the tasks of including various built-in functions in the information model, conducting specialized analysis by the user, etc.

The need for large disk space to store data (due to redundancy of data that is stored). This is an extremely inefficient use of memory - due to denormalization and pre-executed aggregation, the volume of data in a multidimensional database corresponds to 2.5-100 times less than the volume of the original detailed data. In any case, MOLAP does not allow working with large databases. The real limit is a database of 10-25 gigabytes

The potential for a database “explosion” is an unexpected, sharp, disproportionate increase in its volume

Lack of flexibility when it comes to modifying data structures. Any change in the structure of dimensions almost always requires a complete restructuring of the hypercube

For multidimensional databases, there are currently no uniform standards for the interface, languages ​​for describing and manipulating data

Hyperion (Essbase)

DOLAP (Desktop OLAP)

Client OLAP products that are fairly easy to implement and have a low cost per seat

We are talking about such analytical processing where hypercubes are small, their dimension is small, the needs are modest, and for such analytical processing a personal machine on a desktop is sufficient

The goal of the producers of this market is to automate hundreds and thousands of jobs, but users must perform a fairly simple analysis. Buyers are often encouraged to buy more jobs than necessary

Good integration with databases: multidimensional, relational

Possibility of making complex purchases, which reduces the cost of implementation projects

Ease of use of applications

Very limited functionality (not comparable in this regard with specialized products)

Very limited power (small data volumes, small number of measurements)

Cognos (PowerPlay)

Business Objects

Crystal Decisions

This is the smallest sector of the market.

Detailed data remains where it was originally - in the relational database; some aggregates are stored in the same database in specially created service tables

Capable of handling very large amounts of data (cost-effective storage)

Provide a multi-user mode of operation, including editing mode, and not just reading

Higher level of data protection and good options for differentiating access rights

Frequent changes to the measurement structure are possible (do not require physical reorganization of the database)

Low performance, significantly inferior in terms of response speed to multidimensional ones (response to complex queries is measured in minutes or even hours rather than in seconds). These are better report builders than interactive analytics tools

Complexity of products. Requires significant maintenance costs from information technology specialists. To provide performance comparable to MOLAP, relational systems require careful design of the database schema and configuration of indexes, that is, a lot of effort on the part of database administrators

Expensive to implement

The limitations of SQL remain a reality, which prevents the implementation in RDBMS of many built-in functions that are easily provided in systems based on a multidimensional representation of data

Information Advantage

Informix (MetaCube)

It should be noted that consumers of hybrid products that allow the choice of ROLAP and MOLAP mode, such as Microsoft Analysis Services, Oracle Express, Crystal Holos, IBM DB2 OLAPServer, almost always select MOLAP mode.

Each of the presented categories has its own strengths and weaknesses; there is no single optimal choice. The choice affects 3 important aspects: 1) performance; 2) disk space for data storage; 3) capabilities, functionality and especially the scalability of the OLAP solution. In this case, it is necessary to take into account the volume of data being processed, the power of the equipment, the needs of users and seek a compromise between speed and redundancy of disk space occupied by the database, simplicity and versatility.

Classification of Data Warehouses in accordance with the volume of the target database

Disadvantages of OLAP

Like any technology, OLAP also has its drawbacks: high requirements for hardware, training and knowledge of administrative personnel and end users, high costs for the implementation of the implementation project (both monetary and time, intellectual).

Selecting an OLAP product

Choosing the right OLAP product is difficult, but very important if you want the project to not fail.

As you can see, product differences lie in many areas: functional, architectural, technical. Some products are very limited in settings. Some are created for specialized subject areas: marketing, sales, finance. There are products for general purposes, which do not have an application specific use, which must be quite flexible. As a rule, such products are cheaper than specialized ones, but the implementation costs are higher. The range of OLAP products is very wide - from the simplest tools for building pivot tables and charts that are part of office products, to tools for analyzing data and searching for patterns, which cost tens of thousands of dollars.

As in any other field, in the field of OLAP there cannot be clear guidelines for choosing tools. You can only focus on a number of key points and compare the proposed software capabilities with the needs of the organization. One thing is important: without properly thinking about how you are going to use OLAP tools, you risk creating a huge headache for yourself.

During the selection process, there are 2 questions to consider:

  • assess the needs and capabilities of the enterprise
  • evaluate the existing offer on the market, development trends are also important

Then compare all this and, in fact, make a choice.

Needs assessment

You can't make a rational product choice without understanding what it will be used for. Many companies want the “best possible product” without a clear understanding of how it should be used.

In order for the project to be successfully implemented, the financial director must, at a minimum, competently formulate his wishes and requirements to the manager and automation service specialists. Many problems arise due to insufficient preparation and awareness for the choice of OLAP; IT specialists and end users experience communication difficulties simply because they manipulate different concepts and terms during conversation and put forward conflicting preferences. There needs to be consistency in goals within the company.

Some factors have already become obvious after reading the overview of OLAP product categories, namely:

Technical aspects

  • Data sources: corporate data warehouse, OLTP system, table files, relational databases. Possibility of linking OLAP tools with all DBMS used in the organization. As practice shows, the integration of heterogeneous products into a stable operating system is one of the most important issues, and its solution in some cases can be associated with big problems. It is necessary to understand how simply and reliably it is possible to integrate OLAP tools with the DBMS existing in the organization. It is also important to evaluate the possibilities of integration not only with data sources, but also with other applications to which you may need to export data: email, office applications
  • Variability of data taken into account
  • Server platform: NT, Unix, AS/400, Linux - but don't insist that OLAP specification products run on questionable or dying platforms you're still using
  • Client-side and browser standards
  • Deployable architecture: local network and PC modem connection, high-speed client/server, intranet, extranet, Internet
  • International Features: Multi-currency support, multi-lingual operations, data sharing, localization, licensing, Windows update

Amounts of input information that are available and that will appear in the future

Users

  • Area of ​​application: sales/marketing analysis, budgeting/planning, performance analysis, accounting report analysis, qualitative analysis, financial condition, generation of analytical materials (reports)
  • Number of users and their location, requirements for the division of access rights to data and functions, secrecy (confidentiality) of information
  • User type: senior management, finance, marketing, HR, sales, production, etc.
  • User experience. User skill level. Consider providing training. It is very important that the OLAP client application is designed so that users feel confident and can use it effectively.

Key Features: Data Writeback Needs, Distributed Computing, Complex Currency Conversions, Report Printing Needs, Spreadsheet Interface, Application Logic Complexity, Dimensions Required, Analysis Types: Statistical, Goal Search, What-If Analysis

Implementation

  • Who will be involved in implementation and operation: external consultants, internal IT function or end users
  • Budget: software, hardware, services, data transfer. Remember that paying for OLAP product licenses is only a small part of the total cost of the project. Implementation and hardware costs may be more than the license fee, and long-term support, operation, and administration costs are almost certainly significantly more. And if you make the wrong decision to buy the wrong product just because it's cheaper, you may end up with a higher overall project cost due to higher maintenance, administration and/or hardware costs for what you'll likely get lower level of business benefits. When estimating total costs, be sure to ask the following questions: How broad are the sources of implementation, training, and support available? Is the potential general fund (employees, contractors, consultants) likely to grow or shrink? How widely can you use your industrial professional experience?

Despite the fact that the cost of analytical systems remains quite high even today, and the methodologies and technologies for implementing such systems are still in their infancy, today the economic effect they provide significantly exceeds the effect of traditional operational systems.

The effect of proper organization, strategic and operational planning of business development is difficult to quantify in advance, but it is obvious that it can exceed the costs of implementing such systems by tens and even hundreds of times. However, one should not be mistaken. The effect is ensured not by the system itself, but by the people working with it. Therefore, declarations like: “a system of Data Warehousing and OLAP technologies will help the manager make the right decisions” are not entirely correct.” Modern analytical systems are not artificial intelligence systems and they can neither help nor hinder decision making. Their goal is to promptly provide the manager with all the information necessary to make a decision in a convenient form. And what information will be requested and what decision will be made based on it depends only on the specific person using it.

All that remains to be said is that these systems can help solve many business problems and can have far-reaching positive effects. It remains to be seen who will be the first to realize the benefits of this approach and be ahead of the others.

The concept of multidimensional data analysis is closely associated with operational analysis, which is performed using OLAP systems.

OLAP (On-Line Analytical Processing) is a technology for operational analytical data processing that uses methods and tools for collecting, storing and analyzing multidimensional data to support decision-making processes.

The main purpose of OLAP systems is to support analytical activities and arbitrary (the term ad-hoc is often used) requests from user analysts. The purpose of OLAP analysis is to test emerging hypotheses.

The origins of OLAP technology are the founder of the relational approach, E. Codd. In 1993, he published an article entitled "OLAP for User Analysts: What It Should Be." This paper outlines the basic concepts of online analytics and identifies the following 12 requirements that must be met by products that enable online analytics. Tokmakov G.P. Database. Database concept, relational data model, SQL languages. P. 51

Listed below are the 12 rules outlined by Codd that define OLAP.

1. Multidimensionality -- an OLAP system at the conceptual level should present data in the form of a multidimensional model, which simplifies the processes of analysis and perception of information.

2. Transparency -- the OLAP system must hide from the user the real implementation of the multidimensional model, the method of organization, sources, processing and storage means.

3. Availability -- An OLAP system must provide the user with a single, consistent and holistic data model, providing access to data regardless of how or where it is stored.

4. Consistent performance when developing reports - the performance of OLAP systems should not decrease significantly as the number of dimensions on which analysis is performed increases.

5. Client-server architecture -- the OLAP system must be able to work in a client-server environment, because Most of the data that today needs to be subjected to operational analytical processing is stored distributedly. The main idea here is that the server component of the OLAP tool should be sufficiently intelligent and allow the construction of a common conceptual scheme based on the generalization and consolidation of various logical and physical schemes of corporate databases to provide the effect of transparency.

6. Dimensional equality -- The OLAP system must support a multidimensional model in which all dimensions are equal. If necessary, additional characteristics can be provided to individual dimensions, but this capability must be provided to any dimension.

7. Dynamic management of sparse matrices -- the OLAP system must provide optimal processing of sparse matrices. The access speed must be maintained regardless of the location of the data cells and be constant for models with different numbers of dimensions and varying degrees of data sparsity.

8. Support for multi-user mode - the OLAP system must provide the ability for several users to work together with one analytical model or create different models for them from single data. In this case, both reading and writing data is possible, so the system must ensure its integrity and security.

9. Unlimited cross operations -- the OLAP system must ensure that the functional relationships described using a certain formal language between the cells of the hypercube are preserved when performing any slice, rotation, consolidation or drill-down operations. The system should independently (automatically) perform the transformation of established relationships, without requiring the user to redefine them.

10. Intuitive data manipulation -- An OLAP system must provide a way to perform slicing, rotating, consolidating, and drilling operations on a hypercube without the user having to do a lot of interface manipulation. The dimensions defined in the analytical model must contain all the necessary information to perform the above operations.

11. Flexible options for obtaining reports -- the OLAP system must support various methods of data visualization, i.e. reports should be presented in any possible orientation. Reporting tools must present synthesized data or information resulting from the data model in any possible orientation. This means that rows, columns or pages must show from 0 to N dimensions at a time, where N is the number of dimensions of the entire analytical model. Additionally, each content dimension shown in a single post, column, or page must allow any subset of the elements (values) contained in the dimension to be shown in any order.

12. Unlimited dimension and number of aggregation levels - research on the possible number of necessary dimensions required in the analytical model showed that up to 19 dimensions can be used simultaneously. Hence, it is strongly recommended that the analytical tool be able to provide at least 15, and preferably 20, measurements simultaneously. Moreover, each of the common dimensions should not be limited in the number of user-analyst-defined aggregation levels and consolidation paths.

Codd's Additional Rules.

The set of these requirements, which served as the de facto definition of OLAP, quite often causes various complaints, for example, rules 1, 2, 3, 6 are requirements, and rules 10, 11 are unformalized wishes. Tokmakov G.P. Database. Database concept, relational data model, SQL languages. P. 68 Thus, Codd's listed 12 requirements do not allow us to accurately define OLAP. In 1995, Codd added the following six rules to the above list:

13. Batch Retrieval vs. Interpretation -- An OLAP system must provide access to both its own and external data equally effectively.

14. Support for all OLAP analysis models -- An OLAP system must support all four data analysis models defined by Codd: categorical, interpretive, speculative and stereotypical.

15. Processing of non-normalized data -- the OLAP system must be integrated with non-normalized data sources. Data modifications made in the OLAP environment should not result in changes to data stored in the original external systems.

16. Saving OLAP results: storing them separately from the source data - an OLAP system operating in read-write mode must save the results separately after modifying the source data. In other words, the security of the original data is ensured.

17. Elimination of missing values ​​- An OLAP system, when presenting data to the user, must discard all missing values. In other words, missing values ​​must be different from null values.

18. Handling Missing Values ​​-- The OLAP system must ignore all missing values ​​without regard to their source. This feature is related to the 17th rule.

In addition, Codd divided all 18 rules into the following four groups, calling them features. These groups were named B, S, R and D.

The main features of (B) include the following rules:

Multidimensional conceptual representation of data (rule 1);

Intuitive data manipulation (rule 10);

Availability (rule 3);

Batch extraction vs. interpretation (rule 13);

Support for all OLAP analysis models (rule 14);

Client-server architecture (rule 5);

Transparency (rule 2);

Multi-user support (rule 8)

Special Features (S):

Processing of non-normalized data (rule 15);

Storing OLAP results: storing them separately from the source data (rule 16);

Elimination of missing values ​​(rule 17);

Handling missing values ​​(rule 18). Reporting Features (R):

Flexibility of reporting (rule 11);

Standard reporting performance (rule 4);

Automatic physical layer configuration (modified original rule 7).

Dimension Control (D):

Universality of measurements (rule 6);

Unlimited number of dimensions and aggregation levels (rule 12);

Unlimited operations between dimensions (rule 9).







2024 gtavrl.ru.