Distributed information systems and networks. Afanasyev Oleg Alexandrovich

Course structure. Lectures Distributed systems: tasks, terminology, operating principles. Client-server architecture. Typical tasks. Areas of use. Example of an information system (a typical client-server application). Multi-tier architecture. Areas of use. Brief overview of modern technologies: XML, CGI/JSP, Servlets, DCOM, CORBA, RMI (.NET). Selecting layers in a multi-tier architecture (typical architecture). "Thin" and "thick" clients. Application server. Database server. Migration of objects (distribution of computational load). System deployment. CORBA basics. CORBA and OOP. Interface Definition Language (IDL). IDL mapping to C++. IDL mapping to Java. ORB. Dynamic interaction between clients and servers. CORBA naming services. An example of an information system implemented in a multi-tier architecture.

Course structure. Practice Laboratory work 1: Discount card service system. Required tools: server - Oracle (MSSQL Server 2000 sp3), client - Java (jdk, VisualCafe, MS J++, ...). Laboratory work 2: WMS (Warehouse Management System). Thin client (Web, handheld, cellular telephone, …). Application server. Interaction between client and application server. Server business logic. Distribution of computing load. Ensuring fault tolerance. Required tools: server - Oracle (MSSQL Server 2000 sp3), application/business logic - Java (jdk, VisualCafe, MS J++, ...).

Distributed Systems: Definitions A distributed system is a collection of independent nodes (computers) that appears to the user as one computer. A distributed system is a collection of independent computers connected by a network, with software that ensures their joint functioning.

Consequences... No global time – Asynchronous message transmission – Limited accuracy of clock synchronization No global system state – No single process in a distributed system knows the current global state of the system – A consequence of parallelism and of the data transfer mechanism

Consequences... Failures – Processes execute autonomously, in isolation – Failures of individual processes may remain undetected – Individual processes may be unaware of a system-wide failure – Failures occur more often than in a centralized system – New causes of failures (which did not exist in monolithic systems) – Network failures isolate processes and fragment the system

Principles of separation Functional separation: nodes perform different tasks – Client / server – Host / terminal – Data collection / data processing Solution: creation of shared services Natural separation (defined by the task) – Service system for a supermarket chain – Network to support teamwork

Principles of separation Load sharing/balancing: assigning tasks to processors so as to optimize the total system load. Power amplification: different nodes work on the same task – Distributed systems containing a set of microprocessors can approach the power of a supercomputer – 10,000 CPUs at 50 MIPS each give 500,000 MIPS in total -> a single CPU of that power would have to execute an instruction in 0.002 nsec -> in that time light travels 0.6 mm -> any existing chip is larger!
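The arithmetic behind the power amplification estimate can be checked with a short calculation. The sketch below is only an illustration: it assumes a speed of light of roughly 3·10^8 m/s and treats 1 MIPS as 10^6 instructions per second, and the function name is hypothetical.

```python
# Back-of-the-envelope check of the "power amplification" estimate.
SPEED_OF_LIGHT_M_PER_S = 3.0e8  # assumed approximate value


def single_cpu_equivalent(cpu_count, mips_per_cpu):
    """Return (total MIPS, time per instruction in ns, light travel in mm)."""
    total_mips = cpu_count * mips_per_cpu
    instructions_per_second = total_mips * 1e6
    seconds_per_instruction = 1.0 / instructions_per_second
    ns_per_instruction = seconds_per_instruction * 1e9
    # Distance light covers during one instruction, converted to millimetres.
    light_mm = SPEED_OF_LIGHT_M_PER_S * seconds_per_instruction * 1e3
    return total_mips, ns_per_instruction, light_mm


total_mips, ns_per_instruction, light_mm = single_cpu_equivalent(10_000, 50)
print(total_mips, ns_per_instruction, light_mm)
```

A signal cannot cross a chip wider than the distance light covers during one instruction, which is why a single processor of this speed is not realistic while a set of ordinary processors is.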

Principles of separation Physical separation: the system is built on the assumption that the nodes are physically separated (reliability requirements, fault tolerance). Economic: A set of cheap chips can provide better price/performance than a mainframe - Mainframe: 10 times faster, 1000 times more expensive

Sharing resources Sharing resources is often one of the reasons for developing a distributed system – Reduces cost, (file and print servers) – Shares data between users (collaboration on a project) Services – Manages a set of resources – Provides services to users

Resource sharing A server is used to provide services – Receives service requests from clients (operation calls) – Receive message / reply to message; in its full form, a remote call – The roles of client and server change from call to call: the same process can be both a client and a server – Client/server terminology applies to processes, not nodes!

Distributing the application Fragmentation – dividing the application into modules for distribution Configuration – connecting modules with each other (dependencies) Placement – loading modules onto the target system – Distribution of computing modules between nodes (static or dynamic)

Heterogeneity Middleware: an intermediate software layer – Allows heterogeneous nodes to communicate – Defines a homogeneous computing model – Supports one or more programming languages – Provides support for distributed applications: remote object calls, remote SQL calls, distributed transaction processing Examples: CORBA, Java RMI, Microsoft DCOM

Heterogeneity Mobile code: code is designed to migrate between nodes – Need to overcome hardware differences (different instruction sets) Virtual machines – The compiler “produces” bytecode for the VM – The VM is implemented for all hardware platforms (Java) Brute force methods – The code is ported to each platform...

Security Scenario 1: Access to test results via NFS – How do we know that the user is a teacher who has access to the data? – Authorization Scenario 2: Sending a credit card number to an online store – No one except the recipient should be able to read the data – Cryptography

Scalability Cost of physical resources – Increases as the number of users increases – Should not increase faster than O(n), where n = number of users Performance overhead – Increases with data size (and number of users) – Search time should not increase faster than O(log n) where n = data size
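As a small illustration of the O(log n) search requirement, the sketch below counts the loop iterations of a binary search over sorted data. It is an assumed example rather than part of the lecture; the function name and the data are hypothetical.

```python
def binary_search_steps(sorted_items, target):
    """Return (index of target or -1, number of loop iterations)."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1, steps


data = list(range(1_000_000))
index, steps = binary_search_steps(data, 999_999)
print(index, steps)
```

For one million items the loop runs at most 20 times, because each iteration halves the remaining range; a linear scan could need up to one million comparisons.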

Parallelism Concurrency control – Multiple threads accessing a resource – Proper scheduling of access in parallel threads (mutual exclusion, transactions) – Synchronization (semaphores): safe, but reduces performance – Shared objects (resources) must work correctly in a multi-threaded environment
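A shared object that works correctly in a multi-threaded environment can be sketched as follows. This is a minimal example under stated assumptions: it uses a mutex (`threading.Lock`) rather than a counting semaphore, and the class and function names are hypothetical.

```python
import threading


class SharedCounter:
    """A counter whose updates are serialized by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write step.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(counter, thread_count, increments_per_thread):
    """Start several threads that all increment the same counter."""
    def work():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


total = run_workers(SharedCounter(), thread_count=4, increments_per_thread=10_000)
print(total)
```

The lock makes the result predictable at the cost of serializing the increments, which is the trade-off between safety and performance noted above.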

Transparency
Access transparency: access to local and remote resources through identical calls.
Location transparency: access to resources regardless of their physical location.
Concurrency transparency: the ability of multiple processes to work with resources in parallel without affecting each other.
Replication transparency: the ability to use multiple instances of the same resource without knowledge of the physical details of replication.
Error handling transparency: protects software components from failures that occur in other software components; recovery after failures.
Mobility transparency: the ability to move an application between platforms without redesigning it.
Performance transparency: the ability to configure the system to increase performance when the composition of the execution platform changes.
Scalability transparency: the ability to increase performance without changing the structure of the software system and the algorithms used.

Summary Distributed system: – Autonomous nodes (connected by a data transmission medium) – Interaction through message passing Many examples show that distributed systems are needed and that you need to be able to build them. Distributed systems exist, and you need to be able to develop and support them.

Architecture of distributed information systems and Web applications

A distributed system is a set of independent computers that appears to its users as a single unified system. Although all the computers are autonomous, users perceive them as a single system.

The main characteristics of distributed systems are:

1. The differences between computers and methods of communication between them are hidden from users. The same applies to the external organization of distributed systems.

2. Users and applications interact with a distributed system in a consistent, uniform way, no matter where or when the interaction takes place.

Distributed systems should also be relatively easy to expand, or scale. This characteristic is a direct consequence of having independent computers, but at the same time does not indicate how these computers are actually combined into a single system.

In order to maintain a unified view of the system, distributed systems often include an additional software layer located between the upper level, where users and applications reside, and the lower level, which consists of operating systems (Figure 1.11).

Accordingly, such a distributed system is usually called a middleware system. Note that the middleware layer is distributed among many computers.

Features of the functioning of distributed systems include:

· presence of a large number of objects;

· request execution delays (for example, if local calls require about a couple of hundred nanoseconds, then requests to an object in distributed systems require from 0.1 to 10 ms);

· some objects may not be used for a long time;

· distributed components are executed in parallel, which leads to the need to coordinate execution;

· requests in distributed systems have a high probability of failure;

· increased safety requirements.

Due to the presence of increased latency, interfaces in a distributed system must be designed to reduce query execution time. This can be achieved by reducing the frequency of access, as well as by enlarging the functions performed.
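The effect of enlarging the functions of an interface can be sketched by counting round trips. The example below is an assumption-laden illustration: `RemoteCatalog` is a hypothetical stand-in for a remote object, and every method call on it is treated as one network round trip.

```python
class RemoteCatalog:
    """Hypothetical remote object; each method call models one round trip."""

    def __init__(self, prices):
        self._prices = dict(prices)
        self.round_trips = 0

    def get_price(self, item):
        # Fine-grained operation: one round trip per item.
        self.round_trips += 1
        return self._prices[item]

    def get_prices(self, items):
        # Coarse-grained operation: one round trip for the whole batch.
        self.round_trips += 1
        return {item: self._prices[item] for item in items}


def total_fine_grained(catalog, items):
    return sum(catalog.get_price(item) for item in items)


def total_coarse_grained(catalog, items):
    return sum(catalog.get_prices(items).values())


prices = {"pen": 2, "book": 15, "lamp": 30}
items = ["pen", "book", "lamp"]  # distinct items assumed

chatty = RemoteCatalog(prices)
batched = RemoteCatalog(prices)
print(total_fine_grained(chatty, items), chatty.round_trips)
print(total_coarse_grained(batched, items), batched.round_trips)
```

Both variants return the same total, but the batched interface pays the 0.1 to 10 ms request latency once instead of once per item.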

To combat failures, clients are required to check whether requests are being executed by the server. Security in distributed applications can be increased by monitoring communication sessions (authentication, authorization, data encryption).
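One common way for a client to check that a request was actually executed is to wait for a reply and retry when none arrives. The sketch below assumes that a lost request surfaces as a `ConnectionError` and that the request is safe to repeat; `call_with_retries`, `RequestFailed` and `make_flaky_server` are hypothetical names used only for illustration.

```python
class RequestFailed(Exception):
    """Raised when no reply was obtained within the allowed attempts."""


def call_with_retries(send_request, max_attempts=3):
    """Call send_request() until it returns a reply or attempts run out.

    Returns (reply, number of attempts used).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request(), attempt
        except ConnectionError as error:
            last_error = error  # remember the failure and try again
    raise RequestFailed(f"no reply after {max_attempts} attempts") from last_error


def make_flaky_server(failures_before_success):
    """Build a fake request function that fails a fixed number of times."""
    state = {"remaining": failures_before_success}

    def send_request():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ConnectionError("simulated lost request")
        return "ok"

    return send_request


reply, attempts = call_with_retries(make_flaky_server(2))
print(reply, attempts)
```

Blind retries are only appropriate for requests that can be repeated without side effects; other requests need additional measures such as request identifiers on the server side.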

The architecture of Web applications (Web services) is widely used nowadays. A Web service is an application accessible via the Internet. It provides services in a form independent of the service provider, since it uses a universal operating platform and a universal data format (XML). Web services are based on standards that define the formats and language of requests, as well as protocols for discovering these services on the Internet. The scheme for accessing a database via the Internet is shown in Fig. 1.12.

Figure 1.12 – Scheme of access to the DBMS server via the Internet

Currently, there are three different technologies that support the concept of distributed object systems: EJB, DCOM, and CORBA.

The main idea behind EJB (Enterprise Java Beans) technology is to create an infrastructure for components so that they can be easily added to and removed from servers, thereby increasing or decreasing the functionality of the server. EJB components are Java classes and can run on any EJB-compatible server, even without recompilation. The main goals of EJB technology are:

1. Make it easier for developers to create applications by relieving them of the need to implement services such as transactions, threads, loading, etc. from scratch. Developers can concentrate on describing the logic of their applications and leave the tasks of storing, transferring, and securing data to the EJB system.

2. Describe the main structures of the EJB system and the interfaces for interaction between its components.

3. Free the developer from implementing EJB objects due to the presence of a special code generator.

Thanks to the Java model used, EJB is a relatively simple and fast way to create distributed systems.

DCOM (Distributed Component Object Model) technology is a software architecture developed by Microsoft for distributing applications across multiple computers on a network. A software component on one computer can use DCOM to pass messages to a component on another computer. DCOM automatically establishes a connection, transmits a message, and returns a response from the remote component. DCOM's ability to link components allowed Microsoft to extend Windows with additional features, in particular Microsoft Transaction Server, which is responsible for executing database transactions over the Internet.

Source: Journal “Prospects for Science and Education” Issue No. 6(12)/2014 http://cyberleninka.ru/article/n/problemy-raspredelennyh-sistem

Abstract

The article describes the features of distributed systems. The concept of a distributed system and a distributed information system is revealed.

A classification of distributed systems is given. In particular, by the type of resources provided: distributed computing systems, distributed information systems, Semantic Grid. By the number of elements in the system: cluster, enterprise-level distributed system, global system.

The requirements for distributed systems are described: transparency of a distributed system, transparency of location, transparency of access, transparency of access parallelism, transparency of scalability of a distributed system, transparency of replication, openness of the system, security, reliability of the DS.

The article describes problems in creating and operating distributed systems, such as: problems of system administration, problems of load balancing, problems of data recovery in case of errors, problems of limited scalability (the problem of increasing the number of system nodes, the problem of limited server capabilities, the problem of limited data networks, the problem of limited data processing algorithms), the problem of software portability.

Keywords: computing, distributed systems, distributed computing systems, distributed information systems

Introduction

In modern society there is a need to improve the quality and speed of processing, primarily of “big data” and secondarily of data in distributed systems. In this regard, distributed data storage and processing systems are becoming increasingly important as a means of solving this problem. One of the main tasks of any distributed system is to analyze the properties of the received data, which, for a number of reasons, cannot be assessed on a single node. To achieve this goal and reduce processing time, it is necessary first to send data to the distributed nodes of the system, and then to collect data from those nodes and aggregate it into a common global view. This is a difficult task because of the dynamics often encountered in problems of this type: local values change very frequently, and these changes affect the global properties of the entire task. Creating efficient and adaptive distributed systems can significantly speed up data processing. To examine this issue, we analyze the problems that arise during the design and operation of distributed systems.

The concept of a distributed system

At the moment, the literature contains a large number of definitions of the concept of a “distributed system”. The most complete definition was proposed by A. S. Tanenbaum: “A distributed system (DS) is a set of independent computers that is perceived by its users as a single coherent system.” Another definition has also been proposed: distributed systems are software and hardware systems in which the execution of operations (actions, calculations) necessary to ensure the target functionality of the system is distributed (physically or logically) between different executors. In this study, within the computing field, a DS is understood as a hardware and software system created for a specific practical application, the functionality of which is distributed across various nodes.

Distributed systems can be classified according to various criteria: by the number of elements in the system, by the level of organization of distributed systems, by the type of resources provided, as well as a number of other characteristics. Based on the type of resources provided, there are:

distributed computing systems (Computational Grid)
distributed information systems (Data Grid)
Semantic Grid

The main characteristic of computing systems (Computational Grid) is that the computing power of the entire system is provided as the main resource. The main direction of development of systems of this type is to increase the computing power of the system by increasing the number of computing nodes. Clusters are an example of distributed computing systems.

Distributed information systems (Data Grid) provide computing resources for processing large volumes of data in tasks that do not require large computing resources. The Semantic Grid provides not only individual computing resources (databases, services), but also a set of computing systems and information systems for each specific subject area.

Based on the number of elements in the system, distributed systems are divided into clusters, enterprise-level distributed systems, and global systems. A distributed system is a cluster if the total number of elements does not exceed several dozen. An enterprise-level distributed system contains hundreds and, in some cases, thousands of elements. A global system is a distributed system with more than 1000 elements. Moreover, the elements of such systems are often globally distributed. An example of a global distributed network is the Internet, where the information field is the resource provided.

Basic requirements for distributed systems

The main requirements for distributed systems are: transparency, system openness, security, DS scalability, reliability. Let's look at each requirement in more detail.

Transparency of a distributed system. Transparency, in general, means that a distributed system should be perceived by its users as a homogeneous entity, and not as a set of autonomous entities that interact with each other. Designing a distributed system is a complex task, and maintaining the required transparency is a necessary condition for the functioning of the system. There are different types of transparency.

Location transparency. In distributed systems, location transparency means that the user does not have to know where the resources they need are located. Files can be moved to different nodes of a distributed system, for example, if a failure occurred on one DS node and the data was restored on another DS node, but the user should not notice these movements. For example, in distributed file systems, the user should see only a single file space, despite the fact that the data may be physically located on different servers.
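Location transparency can be illustrated with a toy single file space in which clients use only logical paths. The sketch below is a simplified, in-memory assumption; `FileNamespace` and its methods are hypothetical names and do not describe any real file system.

```python
class FileNamespace:
    """Toy single file space: clients see paths, never node names."""

    def __init__(self):
        self._location = {}  # logical path -> node currently holding the file
        self._storage = {}   # node name -> {path: data}

    def write(self, path, data, node):
        self._storage.setdefault(node, {})[path] = data
        self._location[path] = node

    def read(self, path):
        # The lookup of the physical node is hidden from the caller.
        node = self._location[path]
        return self._storage[node][path]

    def migrate(self, path, new_node):
        """Move a file to another node, e.g. after a node failure."""
        old_node = self._location[path]
        data = self._storage[old_node].pop(path)
        self._storage.setdefault(new_node, {})[path] = data
        self._location[path] = new_node


namespace = FileNamespace()
namespace.write("/reports/q1.txt", "quarterly data", node="server-1")
before = namespace.read("/reports/q1.txt")
namespace.migrate("/reports/q1.txt", new_node="server-2")
after = namespace.read("/reports/q1.txt")
print(before == after)
```

The client code calls `read` with the same path before and after the move, so the migration is not visible to it.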

Transparency of access. In distributed systems, the principle of access transparency plays the most important role. Transparency in this case means that differences in access are hidden and that data is made available.

Access concurrency transparency. Different users of distributed systems must be able to access shared data in parallel. In this case, it is necessary to ensure parallel sharing of system resources, and accordingly, to ensure the concealment of the fact of resource sharing.

Replication transparency. To ensure data safety, especially in distributed file systems, data must be replicated. The user should not be aware that data replication exists. To hide replication, the replicated data or resources must have the same names.

Openness of the system. Unlike early distributed systems, which were inherently limited and closed because they were created primarily within individual organizations and to solve specific problems, modern distributed systems are increasingly designed to be open. Applying the principle of openness to distributed systems has become possible thanks to the development of data transmission lines, increased processor performance, and the general development of information technology. The openness of distributed systems means the ability to interact with other open systems. Open systems must have the following characteristics:

DSs must comply with clearly defined interfaces;
The systems included in the DS must easily interact with each other;
Systems must provide application portability.

System openness can be achieved using: programming languages, hardware platforms, software.

Security. Security occupies a special place in modern distributed systems. DS security is, in general, a combination of 3 factors:

Ensuring the confidentiality of data and resources;
Ensuring confidentiality of access to resources for multiple users;
Ensuring the integrity of resources and data.

The need to create distributed systems that provide the necessary security of data and of the entire DS structure arises everywhere. Many security issues can be resolved at the level of individual DS nodes, for example, by installing firewalls and anti-virus software on individual system nodes, introducing a user authentication policy, and other methods. But due to the peculiarities of the architecture of most DSs, this approach is not always effective. Software cannot always provide the necessary data confidentiality in a distributed system. For example, software cannot always provide full protection against MITM and DDoS attacks on a distributed network. Methods of protection against such attacks are often not acceptable for the nodes of a computer network. An important indicator when organizing DS protection is the level of system availability. The level of availability of a distributed system is determined not only by the availability of the resource at time t, but also by the principles of organizing DS protection, since most protective software is designed to guard against attacks aimed at denial of service. Many antivirus software manufacturers pay considerable attention to this issue.

Reliability of the DS. With the emergence of new methods and algorithms that are demanding on computing resources and, most importantly, on time, the need for distributed systems to be available at time t becomes extremely urgent. The main indicator that determines the reliability of the entire DS is fault tolerance. Fault tolerance is the most important property of a computing system: the ability to continue the actions specified by the program after a malfunction occurs.

Problems of operating distributed systems

Despite all the advantages of distributed systems compared to traditional centralized systems (DSs provide significantly lower deployment costs and ease of implementation), DSs also have a number of significant shortcomings. The main problems of distributed systems compared to traditional systems are:

system administration problems;
problems of limited scalability of the DS;
software portability problems.

System administration problems include load balancing on system nodes and data recovery in case of errors. Fragmentation of resources in distributed systems requires the creation of flexible, customizable administration tools. Since administration in globally distributed systems must occur automatically, the following main problems arise in the administration of distributed systems:

load balancing on system nodes;
data recovery in case of an error;
collecting statistics from system nodes;
updating software on system nodes automatically.

Note that this list is general, i.e. each specific DS may face problems that are not described in this work. But these problems are the most common in practice and deserve special attention when designing a DS. The last two problems are well studied, and the distributed systems software market offers many tools that provide both statistics collection and software updating. Of scientific interest are methods of balancing the load on system nodes and methods of data recovery in case of errors. Due to the specifics of DSs, as well as the heterogeneity of DS equipment and architecture, there is no single design method that would solve these problems.

Load balancing problems. An important problem when designing a DS is to ensure effective load balancing on system nodes. A properly chosen load balancing strategy has a decisive impact on the overall efficiency and speed of a distributed system. At the moment, there are many approaches to solving this problem.

In general, methods for balancing the load of computing nodes can be classified by the nature of load distribution into dynamic balancing (redistribution) and static balancing.

Static balancing is usually performed as a result of an a priori analysis. When distributing resources across computing nodes, the distributed system model is analyzed in order to identify the best balancing strategy. In this case, it is necessary to take into account the structure of the DS, as well as the configuration of the computing nodes. The main disadvantage of this load balancing method is the need to match nodes with different hardware configurations to the computational complexity of the task, which is not always possible.

Dynamic balancing of a distributed system consists of adapting the load on the nodes of a distributed system during operation, which in turn allows for more efficient use of network resources. The need for dynamic balancing arises when the overall network load cannot be estimated a priori. Such situations most often arise, for example, in mathematical modeling problems, where the complexity of the calculation grows with each iteration and, accordingly, the total computation time also increases. Dynamic balancing also allows the use of software that is invariant to the architecture of the distributed system.
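The difference between the two approaches can be sketched on a small example. The code below is a simplified illustration, not a method from the cited works: static balancing is modelled as a fixed round-robin assignment made in advance, dynamic balancing as sending each arriving task to the node with the smallest current load, and the task costs are invented numbers.

```python
import heapq


def static_round_robin(task_costs, node_count):
    """Assign task i to node i % node_count, ignoring actual costs."""
    loads = [0] * node_count
    for index, cost in enumerate(task_costs):
        loads[index % node_count] += cost
    return loads


def dynamic_least_loaded(task_costs, node_count):
    """Assign each arriving task to the currently least loaded node."""
    heap = [(0, node) for node in range(node_count)]
    heapq.heapify(heap)
    for cost in task_costs:
        load, node = heapq.heappop(heap)       # node with the smallest load
        heapq.heappush(heap, (load + cost, node))
    # Report loads ordered by node number.
    return [load for load, _ in sorted(heap, key=lambda entry: entry[1])]


task_costs = [8, 1, 8, 1, 8, 1]  # assumed costs, heavy and light tasks alternate
static_loads = static_round_robin(task_costs, node_count=2)
dynamic_loads = dynamic_least_loaded(task_costs, node_count=2)
print(static_loads, max(static_loads))
print(dynamic_loads, max(dynamic_loads))
```

With these costs the fixed assignment puts all heavy tasks on one node, while the load-aware assignment finishes earlier because it reacts to the load observed at run time.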

Problems with data recovery in case of errors. During the operation of distributed systems, the most common problem is tracking failures and subsequently recovering data. This situation may arise, for example, during a power failure of one of the DS nodes. Automatic data recovery is a complex task that involves many problems. During recovery, it is necessary to determine the nature of the error that occurred, classify it, and automatically restore all data. In this case, not only must the integrity of the associated data be preserved, but the remaining data must also stay available, since recovery must occur without blocking the main resources for reading and writing, i.e. the distributed system must function without interruption. At the moment, there are many approaches to solving this problem. For example, one of the recovery methods in distributed information systems (distributed DBMS) is the use of the so-called transaction log, which stores information about all changes that have occurred in the database. The difficulty in this case lies in correctly classifying errors and correctly applying data recovery methods in automatic mode.
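The idea of recovery from a transaction log can be sketched as a replay of the logged changes. The example below is a deliberately simplified assumption: the log is an in-memory list of tuples, recovery starts from an empty state, and only changes of committed transactions are reapplied; the record layout and the `recover` function are hypothetical.

```python
def recover(log):
    """Rebuild the database state by replaying committed transactions only."""
    # First pass: find transactions that reached their commit record.
    committed = {tx for kind, tx, *_ in log if kind == "commit"}
    # Second pass: reapply writes of committed transactions in log order.
    state = {}
    for record in log:
        kind, tx = record[0], record[1]
        if kind == "write" and tx in committed:
            _, _, key, value = record
            state[key] = value
    return state


log = [
    ("write", 1, "a", 10),
    ("write", 2, "b", 20),
    ("commit", 1),
    ("write", 2, "a", 99),
    # The node fails here, before transaction 2 is committed.
]
print(recover(log))
```

Changes made by transaction 2 are discarded because its commit record never reached the log, so the recovered state contains only the effects of transaction 1.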

Problems of limited scalability. Ensuring the scalability of a distributed system is one of the primary tasks when designing a DS. Distributed systems made it possible to avoid the main disadvantage of centralized systems: the limited ability to expand the system's computing power. There are three main indicators of system scalability:

Scalability of the DS in relation to its size. A system is considered scalable relative to its size if new nodes can easily be connected to it.
Geographic scalability. A system is considered geographically scalable if new nodes can be connected to its network without being tied to a specific geographic area (country, city, data center, etc.), that is, if its nodes can be globally distributed.
Scalability of management. A system is considered scalable in terms of resource management if system administration does not become more complicated as the total number of system nodes grows.

When solving the problem of system scalability, many problems must be solved. Let us highlight the main problems of scalability of distributed systems.

The problem of increasing the number of system nodes. This is not always possible because of the limitations of services and algorithms: many services are configured for a specific amount of equipment, for example, for only one specific server or a specific architecture. That is, we are faced with the problem of centralization of both resources and services.

The problem of limited server capabilities. The server that aggregates data collected from system nodes into a common global view has limited capacity.

The problem of limited data networks. Since, with geographic scalability, the nodes of a distributed system can be located in geographically distant parts of the world, when designing and operating a DS we are faced with problems of reliability of data transmission networks. At low data transfer rates, the overall reliability and performance of the DS may decrease.

The problem of limited data processing algorithms. It is necessary to use methods and algorithms for collecting data from system nodes that minimally overload the communication network.
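One way to keep the communication load low is to let each node send a small summary instead of its raw data. The sketch below is an assumed illustration for a global mean: every node reports only a (sum, count) pair, and the aggregating server combines the pairs; the node names and values are invented.

```python
def local_summary(values):
    """Computed on a node: reduce local data to a (sum, count) pair."""
    return (sum(values), len(values))


def global_mean(summaries):
    """Computed on the aggregating server from the per-node summaries."""
    total = sum(partial_sum for partial_sum, _ in summaries)
    count = sum(partial_count for _, partial_count in summaries)
    if count == 0:
        raise ValueError("no data received from any node")
    return total / count


node_data = {
    "node-a": [1.0, 2.0, 3.0],
    "node-b": [10.0],
    "node-c": [4.0, 4.0],
}
summaries = [local_summary(values) for values in node_data.values()]
mean = global_mean(summaries)
print(mean)
```

Each node transmits two numbers regardless of how much data it holds, which also reduces the work left to the aggregating server.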

Software portability problem. The problem of software portability is one of the key limiting factors in the development and further scaling of distributed systems. The problem of portability is the inability to run the created application on different architectures. The rapid development of software architectures and programming languages, as well as the general development of the IT industry as a whole, has led to the need to create methodologies for software code portability.

The issue of software portability is especially acute in globally distributed systems, where heterogeneous equipment with different operating systems is often used as nodes. For example, to combine computers into one global computing GRID network, it is necessary to write a client application for each computing node, taking into account the specifics of its architecture and the installed OS, which is a difficult task. The ever-increasing demand for more portable software products leads to the need for research in this direction. Many published scientific papers are devoted to the problems of cross-platform software and present the main approaches and methods for creating portable applications.

Conclusion

On the one hand, distributed systems are tools that allow solving a large number of complex problems, most of which cannot be solved by other methods. Distributed systems eliminate the main disadvantage of centralized systems - the limited expansion of computing power. At the same time, the analysis revealed a number of problems that require solutions.

The use of distributed systems is complicated by the use of equipment from different manufacturers with different types of architectures. Due to the wide variety of aspects of building computing systems, as well as the variety of existing operating systems, there is a need to create methods for adaptive scheduling of thread distribution in a distributed system, which will significantly speed up the processing of incoming service requests and increase overall system performance. In general, however, there is a positive trend towards solving these problems, and a qualitative technological leap in this direction can be expected in the coming years.

References

Tsvetkov V. Ya., Lobanov A. A. Big Data as Information Barrier // European Researcher. 2014. Vol. 78. No. 7-1. pp. 1237-1242.
Martin D. Computer networks and distributed data processing: Software, methods and architecture: [In 2 issues]: Transl. from English. Vol. 1. Finance and Statistics, 1985.
Tsvetkov V. Ya. Database. Operation of information systems with distributed databases. M.: MIIGAiK, 2009. 88 p.
Shokin Yu. I. and others. Distributed information and analytical system for searching, processing and analyzing spatial data // Computing technologies. 2007. Vol. 12. No. 3. pp. 108-115.
Tanenbaum A., Van Steen M. Distributed systems. Pearson Prentice Hall, 2007.
Burdonov I. B., Kosachev A. S., Ponomarenko V. N., Shnitman V. Z. Review of approaches to verification of distributed systems. M.: Russian Academy of Sciences, Institute of System Programming (ISP RAS), 2003. 51 p.
Vovchenko A. E., Kalinichenko L. A., Stupnikov S. A. Semantic grid based on the concept of subject intermediaries. Institute of Informatics Problems of the Russian Academy of Sciences. URL: http://83.149.245.107/synthesis/publications/10semgrid/10semgrid.pdf (accessed September 20, 2014).
Rodin A. V., Burtsev V. L. Parallel or distributed computing systems? // Proceedings of the Scientific Session MEPhI-2006. Vol. 12. Informatics and management processes. Computer systems and technologies. pp. 149-151.
Coulouris G., Dollimore J., Kindberg T. Distributed Systems: Concepts and Design. 3rd edition. Addison-Wesley.
Blaze M. et al. The role of trust management in distributed security systems // Secure Internet Programming. Springer Berlin Heidelberg, 1999. pp. 185-210.
Babich A. V., Bersenev G. B. Algorithms for dynamic load balancing in a distributed active monitoring system // Izvestia of Tula State University. Technical science. 2011. No. 3. pp. 251-261.
Daryapurkar A., Deshmukh M. V. M. Efficient Load Balancing Algorithm in Cloud Environment // International Journal Of Computer Science And Applications. 2013. Vol. 6. No. 2. pp. 308-312.
Tanenbaum A., Van Steen M. Distributed systems. Principles and paradigms. St. Petersburg: Piter, 2003.
Tanenbaum A. S., Klint P., Bohm W. Guidelines for software portability // Software: Practice and Experience. 1978. Vol. 8. No. 6. pp. 681-698.
Mooney J. D. Bringing Portability to the Software Process. Technical Report TR 97-1, Dept. of Statistics and Computer Science, West Virginia University, Morgantown WV, 1997.

Architecture of distributed systems and basic concepts of distributed data processing

Open systems concept

Advantages of the open systems ideology

Open systems and object-oriented approach

Computer (information) networks

Global networks

Local networks

Multiprocessor computers

Interacting processes

Architecture of distributed systems and basic concepts of distributed data processing

Information systems are called distributed when they are not located within a single controlled territory or a single facility.

A distributed information system (RIS) is any information system that makes it possible to organize the interaction of independent but interconnected computers. Such systems are designed to automate objects whose points of origin and consumption of information are geographically dispersed.

In general, a distributed information system (RIS) is a set of concentrated information systems (IS) connected into a single system by a communication subsystem.

A concentrated IS can be:

individual computers, including personal computers,

computing systems and complexes,

local area networks (LAN).

Non-intelligent subscriber stations that do not include a computer are now practically out of use. It is therefore reasonable to regard the computer as the smallest structural unit of a RIS (Fig. 1).

Distributed information systems are built using network technologies and are, in effect, computer networks (CN).

The term "distributed system" refers to an interconnected collection of autonomous computers, processes or processors. Computers, processes or processors are referred to as nodes of the distributed system. To qualify as "autonomous", nodes must at least be equipped with their own control unit. Thus, a single-instruction, multiple-data (SIMD) parallel computer does not fall within the definition of a distributed system. To qualify as "interconnected", nodes must be able to exchange information.

Because processes can act as nodes in a system, the definition includes software systems built as a set of interacting processes, even if they run on the same hardware platform. In most cases, however, a distributed system will at least contain several processors connected by switching hardware.

The communication subsystem includes:

communication modules (CM);

communication channels;

concentrators;

internetwork gateways (bridges).

The main function of a communication module is to transfer a received packet to another CM or to a subscriber station in accordance with the transmission route. A communication module is also called a packet switching center.
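The forwarding role of a communication module can be illustrated with a minimal sketch. It assumes a static routing table per module (destination mapped to next hop); the names `Packet`, `CommunicationModule`, `deliver` and the node labels are hypothetical and are not taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    source: str
    destination: str
    payload: str
    hops: list = field(default_factory=list)  # names of the CMs visited so far


class CommunicationModule:
    """A packet switching node that forwards packets along a static route."""

    def __init__(self, name, routes):
        self.name = name
        # Assumed routing table: destination -> next CM or subscriber station
        self.routes = routes

    def forward(self, packet):
        """Record the visit and return the name of the next hop."""
        packet.hops.append(self.name)
        return self.routes[packet.destination]


def deliver(packet, first_cm, modules):
    """Pass the packet from CM to CM until a subscriber station is reached."""
    current = first_cm
    while current in modules:  # anything that is not a CM is a subscriber station
        current = modules[current].forward(packet)
    return current


modules = {
    "CM1": CommunicationModule("CM1", {"B": "CM2"}),
    "CM2": CommunicationModule("CM2", {"B": "CM3"}),
    "CM3": CommunicationModule("CM3", {"B": "B"}),
}
packet = Packet(source="A", destination="B", payload="hello")
arrived_at = deliver(packet, "CM1", modules)
print(arrived_at, packet.hops)
```

The sketch does not guard against routing loops or missing routes; a real packet switching center would need both.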

Fig. 1. Fragment of a distributed information system

Communication channels combine the network elements into a single network; channels can have different data transfer rates.

Concentrators are used to multiplex information before it is transmitted over high-speed channels.

Internetwork gateways and bridges are used to connect a network to a LAN or to connect segments of global networks. Bridges connect network segments that use the same network protocols.

In any RIS, in accordance with the functional purpose, three subsystems can be distinguished:

user subsystem;

control subsystem;

communication subsystem.

The user, or subscriber, subsystem includes the information systems of users (subscribers) and is intended to meet their needs for storing, processing and receiving information.

The presence of a control subsystem makes it possible to combine all elements of the RIS into a single system in which the elements interact according to uniform rules. The subsystem ensures the interaction of system elements by collecting and analyzing service information and by influencing the elements so as to create optimal conditions for the functioning of the entire network.

The communication subsystem transfers information through the network on behalf of users and of RIS management.

The functioning of RIS can be considered as the interaction of remote processes through a communication subsystem.

Computer network processes are generated by users (subscribers) and other processes.

The interaction of remote processes takes the following forms:

file sharing,

forwarding messages by email,

sending requests to run programs and receive the results,

accessing databases, etc.

Conceptually, distributed data processing implies some form of communication network organization and the decentralization of three categories of resources:

computing hardware and computing power itself;

databases;

system management.

In distributed information systems, the following basic functions are implemented to varying degrees:

Access to resources (computing power, programs, data, etc.) from terminals and from user programs in “file server” mode;

performing tasks and interactive communication between users and programs launched at their request in the “client-server” mode;

collecting statistics on the functioning of the system;

ensuring the reliability and survivability of the system as a whole.

Currently, various approaches are used to classify distributed information systems according to different criteria.

By degree of homogeneity, the following are distinguished:

completely heterogeneous RIS;

partially heterogeneous RIS;

homogeneous RIS.

Completely heterogeneous RIS combine computers that are built on different architectures and run different operating systems (OS).

Typically, RIS of this type use global networks as their communication service, based on the X.25, Frame Relay or ATM protocols or on Internet technology.

Partially heterogeneous RIS are built either on computers of the same type running different OSs, or on computers of different types running the same OS.

For example, IBM PC computers may run different operating systems: MS-DOS, OS/2, Windows 95, Windows NT.

Homogeneous distributed systems are built on computers of the same type, equipped with the same operating systems.
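The three classes can be expressed as a small decision rule. The sketch below is only an illustration: it assumes each node is described by a (hardware type, operating system) pair and treats any mix of both hardware and OS as completely heterogeneous. The function name and the sample node descriptions are hypothetical.

```python
def classify_homogeneity(nodes):
    """Classify a RIS from (hardware_type, operating_system) pairs."""
    nodes = list(nodes)
    if not nodes:
        raise ValueError("a RIS needs at least one node")
    hardware = {hw for hw, _ in nodes}
    systems = {os_name for _, os_name in nodes}
    if len(hardware) == 1 and len(systems) == 1:
        return "homogeneous"
    if len(hardware) == 1 or len(systems) == 1:
        # Same hardware with different OSs, or different hardware with one OS
        return "partially heterogeneous"
    return "completely heterogeneous"


print(classify_homogeneity([("IBM PC", "MS-DOS"), ("IBM PC", "OS/2")]))
```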

By architectural features, the following are distinguished:

RIS based on teleprocessing systems;

RIS based on network technology.

Network technology is understood as a form of computer interaction in which any process on one machine can, on its own initiative, establish a logical connection with any process on any other computer.

In contrast, RIS based on teleprocessing systems do not provide complete, symmetrical and independent interaction of processes.

By degree of distribution, from the user's perspective, RIS are divided into two groups: regional and local.

Regional RIS are distributed configurations characterized by the following main parameters:

Unlimited geographical distribution;

The presence of certain routing mechanisms;

Each pair of nodes is connected by its own channel, so the problem of sharing the channel does not arise;

Wide range of transmission speeds: 10^3 to 10^8 bit/s;

Arbitrary topology.

There are several ways to organize interaction between computers:

circuit switching;

message switching;

packet switching;

frame switching (Frame Relay);

cell switching (ATM technology).

Local RIS are based on local networks with the following characteristics:

small geographical distribution;

the use of a single shared communication medium and, consequently, full physical connectivity of all network nodes, which allows routing to be replaced by addressing;

high and very high exchange rates: 10^7 to 10^9 bit/s;

the use of special methods and algorithms for accessing the shared medium, so that high-speed transmission is maintained when all nodes of the communication service use the medium simultaneously;

a limited set of possible topologies.
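To give a feel for the speed ranges quoted for regional and local RIS, the following sketch computes the idealized time needed to transfer a file. It assumes a hypothetical 1 MB (10^6 byte) file and ignores latency, protocol overhead and contention for the medium, so the numbers are lower bounds only.

```python
def transfer_time_seconds(size_bytes, rate_bit_per_s):
    """Idealized transfer time: payload bits divided by the channel rate."""
    return size_bytes * 8 / rate_bit_per_s


FILE_SIZE = 1_000_000  # assumed example size: 10^6 bytes

for label, rate in [
    ("regional RIS, lower bound (10^3 bit/s)", 10**3),
    ("regional RIS, upper bound (10^8 bit/s)", 10**8),
    ("local RIS, lower bound (10^7 bit/s)", 10**7),
    ("local RIS, upper bound (10^9 bit/s)", 10**9),
]:
    print(f"{label}: {transfer_time_seconds(FILE_SIZE, rate)} s")
```

At the slowest regional rate the transfer takes 8000 seconds, while at the fastest local rate it takes 0.008 seconds, a difference of six orders of magnitude.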

The architecture of a RIS is understood as the interrelation of its logical, physical and software structures.

The logical structure of a RIS reflects the composition of the network services and the connections between them (Fig. 2).

In this structure, the information and computing service is designed to solve the problems of network users.

The terminal service ensures the interaction of terminals with the network.

This service includes:

conversion of formats and codes,

management of different types of terminals,

processing procedures for exchanging information between terminals and the network, etc.

The transport service is designed to solve all problems related to the transmission of messages over the network.

It manages:

routes,

data flows,

the decomposition of messages into packets, and a number of other functions.
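The decomposition of a message into packets, and its reassembly at the receiver, can be sketched as follows. The packet layout (sequence number, total count, chunk) and the function names are assumptions made for illustration; the text does not specify a packet format.

```python
def split_into_packets(message: bytes, payload_size: int):
    """Split a message into (sequence_number, total, chunk) packets."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    chunks = [message[i:i + payload_size]
              for i in range(0, len(message), payload_size)]
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets):
    """Rebuild the message from packets that may arrive in any order."""
    if not packets:
        return b""
    ordered = sorted(packets, key=lambda p: p[0])  # restore sending order
    total = ordered[0][1]
    if [p[0] for p in ordered] != list(range(total)):
        raise ValueError("missing or duplicated packets")
    return b"".join(chunk for _, _, chunk in ordered)


message = b"distributed data processing"
packets = split_into_packets(message, 8)
restored = reassemble(list(reversed(packets)))  # simulate out-of-order arrival
print(len(packets), restored)
```

Carrying the sequence number in every packet is what lets the receiver tolerate packets that travel by different routes and arrive out of order.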

The interface service ensures the interaction of computers of different types that run different operating systems and differ in architecture, word length, data representation formats, etc.

In addition, the interface service handles the interaction of computers that belong to different networks.

The administrative service:

manages the network,

implements reconfiguration and recovery procedures,

collects statistics on the functioning of the network,

carries out network testing.

The given complete composition of logical structure elements is not mandatory for all real systems.

For example, homogeneous networks have no need for an interface service, and the simplest networks may lack an administrative service.

The information and computing service (IVS) and the terminal service together form the subscriber service.

The interface service and the transport service together form the communication service.

It follows that the administrative service does not directly perform any functions related to serving network users and can be regarded as a mechanism by which the network services itself.

The distribution of the elements of the logical structure across computers defines the physical structure of the RIS (Fig. 3).

The elements of such a structure are computers connected to each other and to terminals.

Depending on which network service a computer implements, the following can be distinguished in the physical structure:

1 - main computers;

2 - communication computers;

3 - interface computers;

4 - terminal computers;

5 - administrative computers.

Several services can be implemented on one computer.

The software structure of a RIS reflects the composition of the network software components and the connections between them.

The composition of the network software is determined by the logical structure, that is, by the functions its services perform.

At the same time, the connections between software components depend largely on the physical structure.

The complexity of the tasks performed by the network software of a distributed information system requires that it be developed in a highly structured manner. Today, network software is organized as a collection of modules, each of which performs well-defined functions and builds on the services offered by other modules. These modules form a strict hierarchy, because each module uses only the services offered by the module immediately below it. In the context of network implementation, the modules are called levels.

Network software has a multi-level hierarchical organization, which is driven by two requirements:

the cost of modifying network software when the equipment changes must be minimized;

changes made to the network should not affect user programs that use network capabilities.

A hierarchical organization requires a clear description of interfaces and protocols, i.e. the rules of interaction between:

programs that run on the same computer at different levels (interfaces),

and programs that run at the same level on different computers (protocols).
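The distinction between interfaces (adjacent levels on one computer) and protocols (the same level on different computers) can be sketched with a toy stack. The level names and the bracketed header format are hypothetical; the sketch only shows that each level calls the level directly below it and that each header is understood solely by the peer level on the other machine.

```python
class Level:
    """One level of a toy protocol stack."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower  # interface: the only level this one may call

    def send(self, data: str) -> str:
        # Add this level's header, then hand the result to the level below
        framed = f"[{self.name}]{data}"
        return self.lower.send(framed) if self.lower else framed

    def receive(self, frame: str) -> str:
        # Lower levels strip their headers first; then this level strips its own
        data = self.lower.receive(frame) if self.lower else frame
        header = f"[{self.name}]"
        if not data.startswith(header):
            raise ValueError(f"{self.name}: unexpected header in {data!r}")
        return data[len(header):]


def build_stack():
    """Build a three-level stack and return its top level."""
    link = Level("link")
    transport = Level("transport", lower=link)
    return Level("application", lower=transport)


sender, receiver = build_stack(), build_stack()
wire = sender.send("report")
print(wire)
print(receiver.receive(wire))
```

Because the application level never touches the link header, the link level could be replaced without changing user programs, which is the second requirement listed above.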

The desire to create unified, universal network software that is open to changes in the logical and physical structures led to the standardization of the software hierarchy levels of computer networks.

Distributed information systems

A distributed information system is a set of databases that are located remotely from one another and share a number of common parameters. They operate according to common rules that are defined centrally and simultaneously for all databases in the information system. Information is also exchanged according to centrally defined rules.

A distributed information system is needed by enterprises engaged in various types of activity when, for example, information must be obtained quickly from the databases of remotely located units. It may also be needed when information held in the databases of the legal entities that make up the enterprise has to be consolidated in a common database. Consolidation allows further data analysis and report generation from one database, both for the enterprise as a whole and separately for each legal entity.

Such an information system is also implemented when changes to the structure and configuration of the databases, and to the operating rules of all remote departments and legal entities, have to be introduced centrally. In that case, changing certain rules directly from remote units may be prohibited.

Also, the implementation of a distributed information system is carried out if it is necessary to ensure control over changes in data in remotely located departments of the organization.

The procedure for organizing a distributed information system consists of two stages. At the first stage, preparatory work is carried out: the structures of the information system, the rules for migrating information between databases that are part of a distributed information system, as well as the rules for limiting changes in such databases are determined.

The second stage covers the preparation of the distributed information system itself. At this stage, the most suitable software is selected for organizing a distributed information base that works according to the rules defined during the preparatory work. The selected software is also configured at this stage so that the distributed information system can be organized and managed effectively.

As an example, consider a corporate information system: the Regional Distributed Education Information System (RRISO).

Tasks of the RRISO (Fig. 5.1):

1. Maintaining a centralized database to ensure system management.
2. Integration of heterogeneous databases of pedagogical and management information.
3. Providing a unified user interface and generating standard documents.
4. Creation of a centralized electronic library and support for the work of students and teachers with peripheral electronic libraries.
5. Support for distance learning and independent testing.
6. Sharing computing resources and equipment.
7. Automatic exchange of electronic information between educational institutions, automation of the processes of creating, processing and storing information.
8. Protection of information posted in the RRISO and of the copyright of developers of databases, electronic educational materials and applications.
9. Support for group work in the preparation of electronic educational materials, training, and scientific research.
10. Integration with similar information systems of foreign and domestic computer networks.

Fig. 5.1.

The automation object (Fig. 5.2) has a geographically distributed structure. It consists of the regional education department, municipal education authorities, district education authorities, and educational institutions. All of them are dispersed over a large area of the region. They interact with the administrations of the region, cities and districts, with students and their parents, and with the public.

Fig. 5.2.

The purpose of the information system is monitoring in the field of education (Fig. 5.3).

Fig. 5.3.

The regional distributed information system has a hierarchical organization (Fig. 5.4).

The hierarchical structure of the system reflects the several levels of education management: the regional level (the Department of Education and structural divisions of the regional administration), the level of large municipalities (education authorities and divisions of the administration of the city that is the regional center), the level of regional and urban districts, and the level of individual educational institutions of various kinds and types, as well as other departments, institutions and organizations that provide social services and protect the rights of children and adolescents.

Automating information exchange ensures the consistency of the data used at the various levels of the information system and improves its reliability.

Fig. 5.4.

The interaction between education authorities and educational institutions, and the information flows between them, are determined by the regulations of the regional education department. The IS must have an architecture that matches the structure of the automation object. The system being developed must include subsystems belonging to several hierarchy levels:

· Level of educational institutions. The components of this level differ in the set of functions they implement, depending on the type of educational institution. Their main purpose in this system is to collect primary information about the activities of an educational institution and to generate reports (information about the activities of specific educational institutions of various types) for education management bodies and state statistics bodies, as well as to support the management of the institution and the organization of its educational process. Combining these functions in one application is dictated by the requirement to minimize manual processing, re-entry and duplication of information, which are a source of errors in the operation of information systems.
· Level of municipal educational authorities of the region's districts. The main purpose of these subsystems is to obtain primary information from educational institutions, its integration and transfer to a higher level, the generation of reports (information on the activities of educational institutions of the district, city) for higher education authorities and state statistics bodies, as well as maintaining the functions of management of educational institutions of the relevant territories.
· Level of the Department of Education of the region. The main purpose of the components of this level is to analyze information received from the subsystems of lower levels, maintain education management functions, generate state statistical reporting and maintain a comprehensive monitoring system in the field of education.
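The upward flow of information through the three levels can be sketched as a two-step aggregation. The report fields, institution names and student counts below are invented purely for illustration; the text does not describe the actual report format.

```python
from collections import defaultdict

# Hypothetical primary reports: (district, institution, number_of_students)
primary_reports = [
    ("District A", "School 1", 420),
    ("District A", "School 2", 380),
    ("District B", "School 3", 510),
]


def district_summaries(reports):
    """Municipal level: integrate primary reports per district."""
    totals = defaultdict(int)
    for district, _institution, students in reports:
        totals[district] += students
    return dict(totals)


def regional_summary(district_totals):
    """Regional level: combine the summaries received from the districts."""
    return sum(district_totals.values())


by_district = district_summaries(primary_reports)
print(by_district)
print(regional_summary(by_district))
```

Each level works only with the output of the level below it, so primary data is entered once, at the institution, and is not re-keyed higher up.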

Subsystems of each level ensure the maintenance of primary information and documentation support for the activities of educational institutions and education management bodies, the generation of primary and summary reports, information exchange with other subsystems, and information protection.

Fig. 5.5.

The IS architecture corresponds to the multi-level structure of the region's education system. The system includes subsystems of several levels (Fig. 5.5):

· Information systems of educational institutions of various kinds and types.
· Information systems of municipal (territorial, district) education authorities.
· Information system of educational authorities at the regional level.

A regional-scale system must support the possibility of distributed storage and distributed data processing.

Each subsystem works with its own local database, but all databases share a single data model. The data is fragmented. A data replication component is used to transfer data between the databases of the subsystems.
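How a replication component might move data between local databases that share one model can be sketched as follows. This is not the system's actual mechanism, which the text does not describe: the sketch assumes one-way replication, a per-row version counter and no conflict resolution, and the class and key names are hypothetical.

```python
class LocalDatabase:
    """A subsystem's local store; every store uses the same row layout."""

    def __init__(self, name):
        self.name = name
        self.rows = {}  # key -> (version, value)

    def write(self, key, value):
        """Store a value and bump the row's version counter."""
        version = self.rows.get(key, (0, None))[0] + 1
        self.rows[key] = (version, value)


def replicate(source, target):
    """Copy rows that are missing or older in the target; return the count."""
    copied = 0
    for key, (version, value) in source.rows.items():
        if key not in target.rows or target.rows[key][0] < version:
            target.rows[key] = (version, value)
            copied += 1
    return copied


school = LocalDatabase("school")
district = LocalDatabase("district")
school.write("student:1", "grade 5")
school.write("student:1", "grade 6")
school.write("student:2", "grade 3")
first_pass = replicate(school, district)
second_pass = replicate(school, district)  # nothing has changed since
print(first_pass, second_pass, district.rows)
```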

All changes made to the data model when it is extended or adjusted to new information needs are propagated to the subsystems whose operation is affected by the updates.

Subsystem integration is implemented using BizTalk Server technology.

Software technology platform: Microsoft .NET.

Fig. 5.6.

The IS software (Fig. 5.6) is flexibly configured during installation: it is set up to perform the functions of a subsystem of the appropriate level and to work in educational institutions of various kinds and types under various operating conditions.

Users can search for, select and view documents (through document management components).

The system automates standard operations, office work and document flow (through business process management components). Changes to the database are made only by executing the appropriate operations, during which the primary data is changed and documents are created.

Operations are performed and documents are handled in accordance with user rights, which are determined by the user's category and job responsibilities.
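The two rules above (data changes only through operations, and operations only within the user's rights) can be combined in a short sketch. The user categories, operation names and document layout are hypothetical examples and do not come from the described system.

```python
# Hypothetical user categories and the operations each one may perform
RIGHTS = {
    "teacher": {"record_grade"},
    "principal": {"record_grade", "approve_report"},
}


class InformationSystem:
    def __init__(self):
        self.primary_data = {}  # database contents
        self.documents = []     # documents created by executed operations

    def execute(self, category, operation, key, value):
        """Change primary data only through a permitted operation."""
        if operation not in RIGHTS.get(category, set()):
            raise PermissionError(f"{category} may not perform {operation}")
        self.primary_data[key] = value
        document = {"operation": operation, "key": key, "value": value}
        self.documents.append(document)
        return document


system = InformationSystem()
system.execute("teacher", "record_grade", "student:1/math", 5)
try:
    system.execute("teacher", "approve_report", "report:2024", "approved")
except PermissionError as error:
    print(error)
print(system.primary_data, len(system.documents))
```

A rejected operation leaves both the primary data and the document list untouched, so every stored change can be traced back to a document.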

