Data storage equipment. Building a budget data storage system

Andrey Olovyannikov, a.olovjannikov@site

Let's agree….

The purpose of this article is not to study various data storage systems (DSS) in detail. We will not analyze all the interfaces, software and hardware, that are used to build different kinds of storage. We will not consider the “bottlenecks” of particular ways of organizing storage systems. Nor will you find here a detailed discussion of the SCSI protocol and its implementations such as FC (Fibre Channel), iSCSI, etc.

Our task is much more modest: simply to “agree on terminology” with our potential buyer. In the same way, before discussing any problem, two physicists first agree on which process or phenomenon each term will denote. This saves both sides time and nerves and makes the conversation more productive and more pleasant.

Storage system or... storage system?

Let's start, as they say, from the beginning.

By a storage system we will mean a Data Storage System: a set of software and hardware that provides a reliable, fast and simple way to store and access data for organizations of different sizes and budgets. Different companies have different needs for storing information and different financial means for meeting them. But however much money or however many specialists the buyer has, we insist that all of these needs fit our definition of a storage system, be it an ordinary set of large-capacity disks or a complex multi-level structure such as PCS (Parallels Cloud Storage). In our opinion, this definition also covers another widely used reading of the same abbreviation: a Storage Area Network (SAN). We will illustrate SAN a little below, when we talk about typical ways of implementing storage systems.

The most typical and understandable way to implement a storage system is DAS (Direct Attached Storage): drives that connect directly to the computer that controls them.

The simplest example of DAS is an ordinary computer with a hard drive or a DVD (CD) drive installed in it. A more complex example (see figure) is an external storage device (external hard drive, disk shelf, tape drive, etc.) that communicates directly with the computer through one protocol and interface or another (SCSI, eSATA, FC, etc.). As DAS storage devices we offer disk shelves or Data Storage Servers (another expansion of the storage abbreviation).

A data storage server in this case means a computer with its own processor, OS and sufficient memory to process large amounts of data stored on numerous disks inside the server.

It should be noted that with this implementation of the storage system, only the computer with the DAS directly sees the data; all other users have access to the data only “with the permission” of this computer.

You can see basic DAS storage configurations in

NAS storage systems

Another fairly simple implementation of storage systems is NAS (Network Attached Storage).

As the name suggests, data is accessed through network protocols, usually over the familiar local area network (although more complex ways of accessing data stored on network resources have now become widespread). The simplest and most understandable example of a NAS storage system is a home store for music and movies that several users of the home network can access at once.

NAS stores data in the form of a file system and, accordingly, provides access to resources through network file protocols (NFS, SMB, AFP...).

For a simple example of implementing a NAS storage system, see Fig. 2.

We would like to note right away that, in principle, any intelligent device with its own processor, memory and sufficiently fast network interfaces for delivering data over the network to different users can be considered a NAS. Special attention should also be paid to the speed of the disk subsystem. You can see the most typical NAS device configurations in

A Storage Area Network (SAN) is one of the ways to implement a data storage system (see above).

It is a combined software, hardware and architectural solution for connecting various data storage devices in such a way that the operating system “sees” them as local. This is achieved by connecting the devices to the corresponding servers. The devices themselves can differ: disk arrays, tape libraries, optical storage arrays.

As storage technologies have developed, the boundary between SAN and NAS systems has become blurred. As a rule of thumb, they can be told apart by how the data is presented: a SAN exposes block devices, a NAS exposes a file system.

SAN systems can be built on different protocols: Fibre Channel, iSCSI, AoE.

One of the architectural ways to implement a SAN is shown in Fig. 3.

Typical examples of SAN storage systems can be found in

In conclusion, we hope that we were able to “agree on terminology” with you and all that remains is to discuss options for creating a storage system for your business and select solutions that suit you in terms of reliability, simplicity and budget.

What is the purpose of data storage systems (DSS)?

Data storage systems are designed for secure and fault-tolerant storage of processed data with the ability to quickly restore access to data in the event of a system failure.

What are the main types of storage systems?

Based on the type of implementation, storage systems are divided into hardware and software. According to the area of application, storage systems are divided into individual, for small workgroups, for workgroups, for enterprises, and corporate. Based on the type of connection, storage systems are divided into:

1. DAS (Direct Attached Storage - directly connected systems)

A feature of this type of system is that control over access to data for devices connected to the network is carried out by the server or workstation to which the storage is connected.

2. NAS (Network Attached Storage - systems connected to a LAN)

In this type of system, access to information stored in the repository is controlled by software that runs in the repository itself.

3. SAN (Storage Area Network - a network between the servers that process data and the storage systems themselves)

With this method of building a data storage system, access to information is controlled by software running on the storage servers. Storage is connected to servers through SAN switches using high-performance access protocols (Fibre Channel, iSCSI, ATA over Ethernet, etc.).

What are the features of software and hardware implementation of storage systems?

The hardware implementation of a storage system is a single hardware complex consisting of a storage device (which is a disk or array of disks on which data is physically stored) and a management device (a controller that distributes data between storage elements).

The software implementation of a storage system is a distributed system in which data is stored without reference to any specific storage or server, and access to the data is carried out through specialized software that is responsible for the safety and security of the stored data.

We are starting a new section called “Educational Education”. Things that seem to be well known to everyone will be described here, but, as it often turns out, not to everyone, and not so well. We hope that the section will be useful.

So, issue No. 1 – “Data storage systems”.

Data storage systems.

In English they are called by a single word, storage, which is very convenient. But this word translates into Russian rather clumsily, as “khranilishche” (repository). IT slang often uses “storage” in Russian transcription, or the word “khranilka”, but that is decidedly bad form. Therefore, we will use the term “data storage systems,” abbreviated as DSS, or simply “storage systems.”

Data storage devices include any device for recording data: so-called “flash drives”, optical discs (CD, DVD), ZIP disks, tape drives, hard disks (in the old-fashioned way also called “Winchesters”, since their first models are said to have resembled the cartridge clip of the 19th-century rifle of the same name), etc. Hard drives are used not only inside computers but also as external USB storage devices; even one of the first iPod models is, in essence, a small 1.8-inch hard drive with a headphone output and a built-in screen.

Recently, so-called “solid-state” drives, or SSDs (Solid State Disk, or Solid State Drive), have appeared. In principle they are similar to a “flash drive” for a camera or smartphone, except that they have a controller and a larger capacity. Unlike a hard drive, an SSD has no mechanically moving parts. Prices for such drives are still quite high, but they are falling rapidly.

All of these are consumer devices. Among industrial systems, one should first of all single out hardware storage systems: hard drive arrays, the RAID controllers for them, and tape storage systems for long-term data storage. A separate class is controllers for storage systems, which manage data backup, create “snapshots” in the storage system for subsequent restoration, replicate data, etc. Storage systems also include network devices (HBAs, Fibre Channel switches, FC/SAS cables, etc.). Finally, there are large-scale solutions for data storage, archiving, data recovery and disaster recovery.

Where does the data that needs to be stored come from? From us, dear users, from application programs and e-mail, as well as from various equipment such as file servers and database servers. Another source of large amounts of data is so-called M2M (Machine-to-Machine communication) devices: various kinds of sensors, cameras, etc.

Based on the frequency of use of stored data, storage systems can be divided into short-term storage systems (online storage), medium-term storage systems (near-line storage) and long-term storage systems (offline storage).

The first includes the hard drive (or SSD) of any personal computer. The second and third are external storage systems DAS (Direct Attached Storage), which can be an array of disks external to the computer (Disk Array). They, in turn, can also be divided into “just a bunch of disks” JBOD (Just a Bunch Of Disks) and an array with an iDAS (intelligent disk array storage) management controller.

External storage systems come in three types: DAS (Direct Attached Storage), SAN (Storage Area Network) and NAS (Network attached Storage). Unfortunately, even many experienced IT specialists cannot explain the difference between SAN and NAS, saying that once there was this difference, but now it supposedly no longer exists. In fact, there is a difference, and a significant one (see Fig. 1).

Figure 1. Difference between SAN and NAS.

In a SAN, the servers themselves are actually connected to the storage system through the SAN storage area network. In the case of NAS, network servers are connected via a local LAN to a shared file system in RAID.

Basic storage connection protocols

SCSI protocol (Small Computer System Interface), pronounced “scuzzy”, is a protocol developed in the mid-1980s for connecting external devices to minicomputers. Its version SCSI-3 is the basis for all storage system communication protocols and uses the common SCSI command set. Its main advantages: independence from the server used, the possibility of parallel operation of several devices, high data transfer speed. Disadvantages: a limited number of connected devices and a very limited connection range.

FC protocol (Fibre Channel) is an internal protocol between the server and the shared storage system, controller and disks. It is a widely used serial communication protocol operating at speeds of 4 or 8 gigabits per second (Gbps). Despite its name, it works not only over optical fiber but also over copper. Fibre Channel is the primary protocol for FC SAN storage systems.

iSCSI protocol (Internet Small Computer System Interface) is a standard protocol for transferring data blocks over the well-known TCP/IP protocol, i.e. “SCSI over IP”. iSCSI encapsulates SCSI commands in TCP/IP packets for transmission over an IP network, so it can be considered a fast, low-cost solution for storage systems connected remotely over the Internet.

SAS protocol (Serial Attached SCSI). SAS uses serial data transfer and is compatible with SATA hard drives. Currently, SAS can transfer data at 3 Gbit/s or 6 Gbit/s and supports full duplex mode, i.e. it can transmit data in both directions at the same speed.

Types of storage systems.

Three main types of storage systems can be distinguished:

DAS (Direct Attached Storage)
NAS (Network attached Storage)
SAN (Storage Area Network)

Storage systems with directly attached drives (DAS) were developed back in the late 70s, in response to the explosive growth of user data, which simply no longer fit physically into the internal long-term memory of computers. (For younger readers: we are not talking about personal computers, which did not exist yet, but about large computers, so-called mainframes.) The data transfer speed in DAS was not very high, from 20 to 80 Mbit/s, but for the needs of that time it was quite enough.

Figure 2. DAS

Storage systems with network connection NAS appeared in the early 90s. The reason was the rapid development of networks and the critical requirements for sharing large amounts of data within an enterprise or operator network. The NAS used a special network file system, CIFS (Windows) or NFS (Linux), so different servers of different users could read the same file from the NAS at the same time. The data transfer speed was already higher: 1 – 10 Gbit/s.

Figure 3. NAS

In the mid-90s, networks for connecting FC SAN storage devices appeared. Their development was driven by the need to organize data scattered across the network. One storage device in a SAN can be divided into several small nodes called LUN (Logical Unit Number), each of which belongs to one server. Data transfer speed has increased to 2-8 Gbit/s. Such storage systems could provide data protection technologies against loss (snapshot, backup).

Figure 4. FC SAN

Another type of SAN is IP SAN (IP Storage Area Network), developed in the early 2000s. FC SANs were expensive, difficult to manage, and IP networks were at their peak, so this standard came into being. The storage systems were connected to the servers using an iSCSI controller via IP switches and provided data transfer rates of 1–10 Gbit/s.

Figure 5. IP SAN.

The table below shows some comparative characteristics of all storage systems reviewed:

Parameter	DAS	NAS	FC SAN	IP SAN
Transmission type	SCSI, FC, SAS	IP	FC	IP
Data type	Data block	File	Data block	Data block
Typical application	Any	File server	Database	CCTV
Advantages	Excellent compatibility	Easy to install, low cost	Good scalability	Good scalability
Disadvantages	Difficult to manage. Inefficient use of resources. Poor scalability	Poor performance. Limited applicability	High price. Complex scaling configuration	Low performance

Briefly, SANs are designed to transfer massive blocks of data to storage systems, while NAS systems provide file-level data access. Combining SAN and NAS makes it possible to achieve a high degree of data integration, high performance and file sharing. Such systems are called unified storage.

Unified storage systems: network storage architecture that supports both a file-oriented NAS system and a block-oriented SAN system. Such systems were developed in the early 2000s to solve the problems of administration and high total cost of ownership of separate systems in one enterprise. This storage system supports almost all protocols: FC, iSCSI, FCoE, NFS, CIFS.

Hard disks

All hard drives can be divided into two main types: HDD (Hard Disk Drive, which is literally what “hard drive” means) and SSD (Solid State Drive). In other words, both kinds of disk are “hard”. What, then, is a “soft disk”, and do such things even exist? Yes, they used to: they were called “floppy disks” (so named because the magnetic disk inside was thin and flexible). Drives for them can still be seen in the system units of old computers that have survived in some government agencies. However, with the best will in the world, such magnetic disks can hardly be classified as storage SYSTEMS. They were a kind of analogue of today's “flash drives”, albeit with a very small capacity.

The difference between an HDD and an SSD is that an HDD has several coaxial magnetic disks inside and complex mechanics that move the magnetic read-write heads, while an SSD has no mechanically moving parts at all, and is, in fact, a microcircuit pressed into plastic. Therefore, strictly speaking, calling only HDDs “hard drives” is incorrect.

Hard drives can be classified according to the following parameters:

Design: HDD, SSD;
HDD diameter in inches: 3.5, 2.5, 1.8 inches;
Interface: ATA/IDE, SATA/NL-SAS, SCSI, SAS, FC;
Class of use: individual (desktop class), corporate (enterprise class).


Parameter	SATA	SAS	NL-SAS	SSD
Rotation speed (RPM)	7200	15000/10000	7200	N/A
Typical capacity (TB)	1/2/3	0.3/0.6/0.9	2/3/4	0.1/0.2/0.4
MTBF (hours)	1 200 000	1 600 000	1 200 000	2 000 000

Notes:
SATA: a development of ATA hard drives with serial data transfer. SATA 2.0 supports transfer speeds of 300 MB/s, SATA 3.0 up to 600 MB/s. The annualized failure rate (AFR) of SATA drives is about 2%.
NL-SAS: SATA hard drives with a SAS interface, suitable for tiering. The annualized failure rate (AFR) of NL-SAS drives is about 2%.
SSD: solid-state drives built from electronic memory chips, comprising a control device and memory chips (FLASH/DRAM). The interface specification, functions and method of use are the same as for HDDs, as are the size and shape.

Characteristics of hard drives.

Capacity

Modern hard drives measure capacity in gigabytes or terabytes. For an HDD, this value is the capacity of one magnetic disk (platter) inside the case multiplied by the number of platters, of which there are usually several.

Rotation speed (HDD only)

The rotation speed of the magnetic disks inside the drive is measured in revolutions per minute, RPM (Revolutions Per Minute), and is usually 5400 RPM or 7200 RPM. HDDs with SCSI/SAS interfaces have a rotation speed of 10000-15000 RPM.

Average access time

Average access time = mean seek time + mean wait time, i.e. the time needed to retrieve information from the disk.

Data transfer rate

These are the speeds at which a hard drive can read and write data, measured in megabytes per second (MB/s).

IOPS (Input/Output Operations Per Second)

The number of input/output (read/write) operations per second is one of the main indicators of disk performance. For applications with frequent read and write operations, such as OLTP (Online Transaction Processing), IOPS is the most important indicator, because the performance of the business application depends on it. Another important indicator is data throughput, which shows how much data can be transferred per unit of time.
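
As a rough illustration of how average access time relates to IOPS, the sketch below estimates the mean wait time as the time of half a revolution and takes random IOPS as the reciprocal of the access time. This is a simplified model that ignores caching, queuing and transfer time, and the seek times in the demo are assumed example values, not figures from this article.

```python
def rotational_latency_ms(rpm):
    """Mean wait time: on average the platter must turn half a revolution."""
    return 0.5 * 60_000.0 / rpm


def average_access_time_ms(seek_ms, rpm):
    """Average access time = mean seek time + mean wait time."""
    return seek_ms + rotational_latency_ms(rpm)


def estimated_random_iops(seek_ms, rpm):
    """Rough upper bound on small random I/Os per second for a single HDD."""
    return 1000.0 / average_access_time_ms(seek_ms, rpm)


if __name__ == "__main__":
    # Seek times below are assumed example values.
    for rpm, seek_ms in [(7200, 8.5), (10000, 4.5), (15000, 3.5)]:
        access = average_access_time_ms(seek_ms, rpm)
        print(rpm, "RPM:", round(access, 2), "ms,",
              round(estimated_random_iops(seek_ms, rpm)), "IOPS")
```

The model makes it clear why faster-spinning SAS drives deliver more IOPS than 7200 RPM drives: both terms of the access time shrink.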

RAID

No matter how reliable hard drives are, data on them is sometimes lost for various reasons. Therefore, RAID technology (Redundant Array of Independent Disks) was proposed: an array of independent disks with redundant data storage. In the simplest case, redundancy means that all bytes of data written to one disk are duplicated on another disk and can be used if the first disk fails. In addition, this technology helps increase IOPS.

The basic concepts of RAID are striping and mirroring of data. Their combinations determine the different types of RAID arrays.

The following levels of RAID arrays are distinguished:

Combinations of these types give rise to several more new types of RAID:

The figure explains the principle of RAID 0 (striping):

Figure 6. RAID 0.
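
The striping idea can also be sketched in a few lines. The function below is a hypothetical illustration, assuming a stripe unit of exactly one block: consecutive logical blocks are placed on consecutive disks in round-robin order.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes a stripe unit of one block; real arrays use larger stripe units.
    """
    if num_disks < 1 or logical_block < 0:
        raise ValueError("need at least one disk and a non-negative block number")
    return logical_block % num_disks, logical_block // num_disks


if __name__ == "__main__":
    # With 3 disks, blocks 0..5 land on disks 0, 1, 2, 0, 1, 2.
    for block in range(6):
        print(block, locate_block(block, 3))
```

Because neighboring blocks sit on different disks, a large sequential transfer keeps all disks busy at once, but nothing is stored twice, so RAID 0 by itself adds no redundancy.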

And this is how RAID 1 (mirroring) is performed:

Figure 7. RAID 1.

And this is how RAID 3 works. XOR is a logical function “exclusive OR” (eXclusive OR). Using it, the parity value is calculated for data blocks A, B, C, D..., which is written to a separate disk.

Figure 8. RAID 3.
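
The XOR parity described above can be tried out directly. The sketch below is a minimal illustration with hypothetical helper names: it computes the parity block for a set of equally sized data blocks and then rebuilds one lost block from the survivors and the parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    a, b, c = b"\x01\x02", b"\x10\x20", b"\xff\x00"
    parity = xor_blocks([a, b, c])      # written to the dedicated parity disk
    print(rebuild_missing([a, c], parity) == b)   # the disk holding b has failed
```

This works because XOR is its own inverse: XOR-ing the parity with all surviving blocks cancels them out and leaves exactly the missing one. It also shows why a single parity disk can compensate for only one failed disk at a time.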

The above diagrams well illustrate the principle of RAID operation and do not need any comments. We will not provide operating diagrams for the remaining RAID levels; those interested can find them on the Internet.

The main characteristics of RAID types are shown in the table.

Storage Software

Storage software can be divided into the following categories:

Management and administration: management and setting of infrastructure parameters: ventilation, cooling, disk operating modes, etc., control by time of day, etc.
Data protection: Snapshot (“snapshot” of the disk state), copying LUN contents, split mirror, remote data duplication (Remote Replication), CDP (Continuous Data Protection), etc.
Increased reliability: various software for multiple copying and backup of data transmission routes within and between data centers.
Increased efficiency: Thin Provisioning technology, automatic tiered storage, deduplication, quality of service management, cache prefetch, partitioning, automatic data migration, disk spin-down (reducing the disk rotation speed).

Thin provisioning is a very interesting technology. As is often the case in IT, the term is hard to translate adequately into Russian: it is difficult to render the word “provisioning” precisely (“supply”, “support”, “allocation” - none of these conveys the meaning completely). And when it is “thin” as well...

To illustrate the principle of “thin provisioning”, we can cite a bank loan. When a bank issues ten thousand credit cards with a limit of 500 thousand, it does not need to have 5 billion in its account to service this volume of loans. Credit card users usually do not spend all of their credit at once, and use only a small part of it. However, each individual user can use the entire or almost the entire loan amount if the total amount of bank funds is not exhausted.

Figure 9. Thin provisioning.

Thus, thin provisioning makes it possible to solve the problem of inefficient allocation of space in the SAN, save space, simplify the administrative procedures for allocating storage space to applications, and use so-called oversubscription: allocating more space to applications than physically exists, in the hope that the applications will not need all of it at the same time. Physical storage capacity can then be increased later, as the need arises.
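
The same idea can be expressed as a toy model. The class below is a hypothetical sketch, not the interface of any real storage product: volumes are promised more space than the pool physically has, and physical space is consumed only when data is actually written.

```python
class ThinPool:
    """Toy thin-provisioned pool; all sizes are in GB."""

    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.promised = {}   # volume name -> size promised to the application
        self.used = {}       # volume name -> space actually written

    def create_volume(self, name, size_gb):
        # Promising space costs nothing physically, so oversubscription is allowed.
        self.promised[name] = size_gb
        self.used[name] = 0

    def write(self, name, gb):
        if self.used[name] + gb > self.promised[name]:
            raise ValueError("volume is full")
        if sum(self.used.values()) + gb > self.physical_gb:
            raise RuntimeError("pool exhausted: add physical capacity")
        self.used[name] += gb

    def oversubscription_ratio(self):
        return sum(self.promised.values()) / self.physical_gb


if __name__ == "__main__":
    pool = ThinPool(physical_gb=100)
    for name in ("a", "b", "c"):
        pool.create_volume(name, 100)     # 300 GB promised on 100 GB of disks
    pool.write("a", 60)
    pool.write("b", 30)
    print(pool.oversubscription_ratio())
```

As with the bank in the example above, each volume may use its full limit only as long as the pool as a whole is not exhausted, so the free physical space has to be monitored and extended in time.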

The division of a storage system into tiers (tiered storage) assumes that various data are stored in storage devices whose speed corresponds to the frequency of access to this data. For example, frequently used data can be placed in “online storage” on SSD drives with high access speeds and high performance. However, the price of such disks is still high, so it is advisable to use them only for online storage (for now).

The speed of FC/SAS drives is also quite high, and the price is reasonable. Therefore, such disks are well suited for “near-line storage”, where data is stored that is not accessed so often, but at the same time not so infrequently.

Finally, SATA/NL-SAS drives have relatively low access speeds, but they have high capacity and are relatively cheap. Therefore, they are usually used for offline storage, for rarely used data.

As soon as the management system notices that access to data in offline storage has become more frequent, it transfers it to near-line storage, and with further intensification of its use, to online storage on SSD drives.
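
A minimal sketch of such a placement policy is shown below. The thresholds are assumed values chosen only for illustration, and the function names are hypothetical. Unlike the step-by-step promotion described above, this simplified version recomputes the target tier directly from the access rate, so it also moves rarely used data back down.

```python
def choose_tier(accesses_per_day, nearline_threshold=1, online_threshold=100):
    """Pick a storage tier from the access frequency (thresholds are assumptions)."""
    if accesses_per_day >= online_threshold:
        return "online"       # SSD
    if accesses_per_day >= nearline_threshold:
        return "near-line"    # FC/SAS
    return "offline"          # SATA/NL-SAS


def rebalance(placement, access_counts):
    """Return {item: (current tier, target tier)} for every item that should move."""
    moves = {}
    for item, tier in placement.items():
        target = choose_tier(access_counts.get(item, 0))
        if target != tier:
            moves[item] = (tier, target)
    return moves


if __name__ == "__main__":
    placement = {"archive.tar": "offline", "orders.db": "near-line"}
    print(rebalance(placement, {"archive.tar": 10, "orders.db": 500}))
```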

Data deduplication (DEDUP). As the name suggests, deduplication eliminates duplicate data from disk space, typically space used for data backup. Although the system cannot determine which information is unnecessary, it can detect the presence of duplicate data. This makes it possible to significantly reduce the capacity requirements of the backup system.
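
One common way to detect duplicate data is to split it into chunks and compare their hashes. The sketch below assumes fixed-size chunks and SHA-256 digests; the class and its methods are hypothetical and only illustrate the principle.

```python
import hashlib


class DedupStore:
    """Toy backup store that keeps each distinct chunk only once."""

    def __init__(self):
        self.chunks = {}   # SHA-256 digest -> chunk bytes, stored once
        self.files = {}    # file name -> ordered list of chunk digests

    def put(self, name, data, chunk_size=4096):
        digests = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # skip chunks already stored
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        return b"".join(self.chunks[digest] for digest in self.files[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore()
    data = b"A" * 8192 + b"B" * 4096
    store.put("monday.bak", data)
    store.put("tuesday.bak", data)    # identical backup taken the next day
    print(store.stored_bytes(), "bytes stored for", 2 * len(data), "bytes backed up")
```

The store never judges whether the data is useful; it only notices that a chunk with the same digest is already present, which matches the description above.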

Disk spin-down is what is usually called “hibernation” (falling asleep) of the disk. If the data on a particular disk is not used for a long time, spin-down puts the disk into hibernation mode so that no energy is wasted spinning it at normal speed. This also increases the service life of the disk and the reliability of the system as a whole. When a new request arrives for data on this disk, it “wakes up” and its rotation speed increases to normal. The trade-off for saving energy and increasing reliability is some delay when first accessing data on the disk, but this cost is well justified.

"Snapshot" of the disk state (Snapshot). A Snapshot is a fully usable copy of a specific set of data on disk at the time the copy was taken (hence why it is called a “snapshot”). Such a copy is used to partially restore the system state at the time of copying. In this case, the continuity of the system is not affected at all, and performance does not deteriorate.

Remote data replication: Works using Mirroring technology. Can maintain multiple copies of data across two or more sites to prevent data loss in the event of natural disasters. There are two types of replication: synchronous and asynchronous, the difference between them is explained in the figure.

Figure 10. Remote data replication (Remote Replication).
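
In general terms, the two modes differ in when a write is acknowledged. The sketch below is a hypothetical toy model: in synchronous mode the write is acknowledged only after the remote copy has been updated, while in asynchronous mode it is acknowledged at once and shipped to the remote site later.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary volume with one remote replica."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.remote = {}
        self.pending = deque()   # writes not yet shipped (asynchronous mode only)

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            self.remote[key] = value            # acknowledge only after the remote copy
        else:
            self.pending.append((key, value))   # acknowledge immediately, ship later
        return "ack"

    def drain(self):
        """Ship all queued writes to the remote site."""
        while self.pending:
            key, value = self.pending.popleft()
            self.remote[key] = value

    def unreplicated_writes(self):
        return len(self.pending)


if __name__ == "__main__":
    volume = ReplicatedVolume(synchronous=False)
    volume.write("block-7", b"data")
    print(volume.unreplicated_writes())   # writes that would be lost if the site failed now
    volume.drain()
```

Synchronous replication loses no acknowledged writes but makes every write wait for the remote site; asynchronous replication keeps writes fast at the price of possibly losing whatever is still in the queue.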

Continuous data protection (CDP), also known as continuous backup or real-time backup, is the automatic creation of a backup copy whenever data changes. This makes it possible to restore data after an accident to any point in time, and the copy available is current rather than minutes or hours old.

Management and administration programs (Management Software): this includes a variety of software for managing and administering various devices: simple configuration programs (configuration wizards) and centralized monitoring programs (topology display, real-time monitoring, mechanisms for generating failure reports). This also includes Business Guarantee programs: multi-dimensional performance statistics, performance reports and queries, etc.

Disaster Recovery (DR). This is a fairly important, although quite expensive, component of serious industrial storage systems. But these costs must be borne so as not to lose overnight “what was acquired through back-breaking labor.” The data protection systems discussed above (Snapshot, Remote Replication, CDP) are good until a disaster strikes the locality where the storage system is located: a tsunami, flood, earthquake or (knock on wood) nuclear war. Any war, for that matter, can greatly spoil the lives of people who are doing something useful, such as storing data, rather than running around with a machine gun to seize other people's territory or punish some “infidels.” Remote replication implies that the replica storage system is located in the same city, or at least nearby, which does not help in the case of, say, a tsunami.

Disaster Recovery technology assumes that the backup center used to restore data after a disaster is located at a considerable distance from the main data center and communicates with it over a data network layered on a transport network, most often an optical one. With the main and backup data centers this far apart, using CDP technology, for example, is simply not technically possible.

DR technology uses three fundamental concepts:

BW (Backup Window) - the time required for the backup system to copy the received amount of data from the working system.
RPO (Recovery Point Objective) - the “acceptable recovery point”: the maximum period of time, and the corresponding amount of data, that the storage system user can afford to lose.
RTO (Recovery Time Objective) - the “tolerable unavailability time”: the maximum time during which the storage system can be unavailable without a critical impact on the main business.

Figure 11. Three fundamental concepts of DR technology.
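
To show how the three concepts fit together, here is a small worked sketch. It rests on a simplifying assumption that is not from the article: backups start at a fixed interval, and a failure just before a running backup completes loses roughly one interval plus the backup window. All function names and numbers are hypothetical.

```python
def backup_window_hours(changed_gb, throughput_gb_per_hour):
    """BW: time needed to copy the changed data to the backup system."""
    return changed_gb / throughput_gb_per_hour


def worst_case_data_loss_hours(backup_interval_hours, window_hours):
    """Oldest possible age of the last completed backup at the moment of failure."""
    return backup_interval_hours + window_hours


def meets_objectives(backup_interval_hours, changed_gb, throughput_gb_per_hour,
                     restore_hours, rpo_hours, rto_hours):
    """Check a backup schedule against the required RPO and RTO."""
    window = backup_window_hours(changed_gb, throughput_gb_per_hour)
    loss = worst_case_data_loss_hours(backup_interval_hours, window)
    return loss <= rpo_hours and restore_hours <= rto_hours


if __name__ == "__main__":
    # Daily backups of 500 GB at 250 GB/h against a 24-hour RPO and 8-hour RTO.
    print(meets_objectives(24, 500, 250, restore_hours=4, rpo_hours=24, rto_hours=8))
    # Backing up every 6 hours instead.
    print(meets_objectives(6, 500, 250, restore_hours=4, rpo_hours=24, rto_hours=8))
```

Under this model, a tighter RPO is met by backing up more often or shortening the backup window, while the RTO depends on how quickly data can be restored.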

* * *

This essay does not pretend to be complete and only explains the basic principles of the storage system, although not in full. Various sources on the Internet contain many documents that describe in more detail all the points stated (and not stated) here.

As you know, the volume of accumulated information and data has been growing intensively in recent years. IDC's Digital Universe study showed that the global volume of digital information could increase from 4.4 zettabytes to 44 zettabytes by 2020. According to experts, the volume of digital information doubles every two years. Therefore, today the problem of not only processing information but also storing it is extremely relevant.

To address this, storage systems (storage networks and data storage systems) are being developed very actively. Let's try to figure out what exactly the modern IT industry means by the concept of “data storage system”.

Storage system is a software and hardware integrated solution aimed at organizing reliable and high-quality storage of various information resources, as well as providing uninterrupted access to these resources.

The creation of such a complex should help in solving a variety of problems that modern business faces in the course of building an integral information system.

Main storage components:

Storage devices (tape library, internal or external disk array);

Monitoring and control system;

Data backup/archiving subsystem;

Storage management software;

Infrastructure for access to all storage devices.

Main goals

Let's consider the most typical tasks:

Decentralization of information. Some organizations have a developed branch structure. Each individual division of such an organization must have free access to all the information it needs for its work. Modern storage systems interact with users who are located at a great distance from the center where data processing is performed, and therefore are able to solve this problem.

Inability to predict the resources that will ultimately be required. During project planning, it can be extremely difficult to determine exactly how much information the system will have to handle in operation. In addition, the amount of accumulated data is constantly increasing. Most modern storage systems support scalability (the ability to increase performance by adding resources), so the capacity of the system can be increased in proportion to the growth in load (upgrade).

Security of all stored information. It can be quite difficult to control and limit access to an enterprise’s information resources. Unskilled actions of service personnel and users, deliberate attempts at sabotage - all this can cause significant harm to stored data. Modern storage systems use various fault tolerance schemes that make it possible to withstand both deliberate sabotage and inept actions of unqualified employees, thereby maintaining the functionality of the system.

The complexity of managing distributed information flows. Any action that changes distributed data in one of the branches inevitably creates a number of problems, from the difficulty of synchronizing different databases and versions of developer files to unnecessary duplication of information. The management software supplied with storage systems helps simplify and optimize work with stored information.

High costs. As shown by the results of a study conducted by IDC Perspectives, data storage costs account for about twenty-three percent of all IT costs. These costs include the cost of software and hardware parts of the complex, payments to maintenance personnel, etc. Using storage systems allows you to save on system administration and also reduces personnel costs.

Main types of storage systems

All data storage systems are divided into two types: tape and disk storage systems. Each of these types is in turn divided into several subtypes.

Disk storage systems

Such data storage systems are used to create backup intermediate copies, as well as to quickly work with various data.

Disk storage systems are divided into the following subtypes:

Devices for backups (various disk libraries);

Devices for work data (equipment characterized by high performance);

Devices used for long-term storage of archives.

Tape storage systems

Used to create archives and backups.

Tape storage systems are divided into the following subtypes:

Tape libraries (two or more drives, a large number of slots for tapes);

Autoloaders (1 drive, several slots intended for tapes);

Separate drives.

Main connection interfaces

Above we looked at the main types of systems; now let's take a closer look at the structure of the storage systems themselves. Modern storage systems are classified by the type of host connection interface they use. Below we consider the two most common external connection interfaces, SCSI and Fibre Channel.

The SCSI interface resembles the widely used IDE: it is a parallel interface that allows up to sixteen devices on one bus (IDE, as is known, allows two devices per channel). The maximum speed of the SCSI protocol today is 320 megabytes per second (a version providing 640 megabytes per second is in development). The disadvantages of SCSI are its inconvenient, overly thick cables with poor noise immunity, whose maximum length does not exceed twenty-five meters. The SCSI protocol itself also imposes certain limitations: as a rule, there is one initiator on the bus plus target devices (tape drives, disks, etc.).

The Fibre Channel interface is used less frequently than SCSI because its hardware is more expensive. In addition, Fibre Channel is used to deploy large SAN storage networks, so it is used only in large companies. Distances can be practically anything, from the standard three hundred meters on standard equipment to two thousand kilometers with powerful switches (“directors”). The main advantage of Fibre Channel is the ability to combine many storage devices and hosts (servers) into a common SAN storage network. Less important advantages are longer distances than with SCSI, channel aggregation and redundant access paths, hot plugging of equipment, and higher noise immunity. Two-fiber single-mode and multimode optical cables are used (with SC or LC connectors), together with SFP modules, which are optical transceivers based on laser or LED emitters (the maximum distance between devices and the transmission speed depend on these components).

Storage topology options

Traditionally, servers are connected to storage directly, in a DAS (direct-attached storage) arrangement. In addition to DAS, there are NAS devices, which are storage devices that connect to the network, and SANs, storage area networks. SAN and NAS systems were created as an alternative to the DAS architecture. Each of these solutions was developed in response to the ever-increasing demands on storage systems and was based on the technologies available at the time.

The first network storage system architectures were developed in the 1990s to address the most significant shortcomings of DAS systems. Network storage solutions were designed to achieve the above objectives: reducing the cost and complexity of data management, reducing local network traffic, increasing overall performance and data availability. At the same time, SAN and NAS architectures solve different aspects of one common problem. As a result, two network architectures began to exist simultaneously. Each of them has its own functionality and advantages.

DAS

(Direct Attached Storage) is an architectural solution in which a device for storing digital data is connected directly to a server or workstation through an interface using the SAS protocol.

The main advantages of DAS systems: low cost compared to other storage solutions, ease of deployment and administration, high-speed data exchange between the server and storage system.

These advantages have made DAS systems extremely popular in the segment of small corporate networks, hosting providers and small offices. At the same time, DAS systems have their drawbacks, for example suboptimal resource utilization, because each DAS system requires a dedicated server; in addition, in certain configurations no more than two servers can be connected to one disk shelf.

Advantages:

Affordable price. A storage system is essentially a disk cage installed outside the server, equipped with hard drives.

Ensuring high-speed exchange between the server and the disk array.

Disadvantages:

Insufficient reliability - in the event of an accident or any problems arising in the network, the servers cease to be available to a number of users.

High latency resulting from the fact that all requests are processed by one server.

Low manageability – making the entire capacity available to one server reduces the flexibility of data distribution.

Low resource utilization: required data volumes are difficult to predict, so some DAS devices in an organization may have excess capacity while others run short, since reallocating capacity is usually too labor-intensive or impossible.

NAS

(Network Attached Storage) is an integrated stand-alone disk system that includes a NAS server with its own specialized operating system and a set of user-friendly functions that ensure quick system startup and access to any files. The system connects to an ordinary computer network and lets its users solve the problem of insufficient free disk space.

NAS is a storage device that connects to the network like a regular network device, providing file access to digital data. Any NAS device is a combination of a storage system and the server to which the system is connected. The simplest version of a NAS device is a network server that provides file resources.

NAS devices consist of a head unit that processes data and connects a chain of disks into a single network. NAS devices make data storage systems usable over Ethernet networks. Shared access to files is organized over TCP/IP. Such devices allow file sharing even among clients running different operating systems. Unlike the DAS architecture, a NAS system does not require the server to be taken offline to increase overall capacity; you can add disks to the NAS structure simply by connecting the device to the network.

NAS technology is developing today as an alternative to universal servers that carry a large number of different functions (email, fax server, applications, printing, etc.). NAS devices, unlike universal servers, perform only one function - a file server, trying to do this as quickly, simply and efficiently as possible.

Connecting the NAS to a LAN provides access to digital information to an unlimited number of heterogeneous clients (that is, clients with different operating systems) or other servers. Today, almost all NAS devices are used on Ethernet networks based on TCP/IP protocols. Access to NAS devices is carried out using special access protocols. The most common file access protocols are DAFS, NFS, CIFS. Specialized operating systems are installed inside such servers.

A NAS device can look like an ordinary “box” equipped with one Ethernet port and a pair of hard drives, or it can be a huge system equipped with several specialized servers, a huge number of disks, and external Ethernet ports. Sometimes NAS devices are part of a SAN. In that case they do not have their own storage devices and only provide access to data located on block devices: the NAS acts as a powerful specialized server and the SAN acts as the data storage device, so that a single topology is formed from SAN and NAS components.

Advantages

Low cost, availability of resources for individual servers, as well as for any computer in the organization.

Versatility (one server can serve Unix, Novell, MS, Mac clients).

Ease of deployment as well as administration.

Ease of resource sharing.

Disadvantages

Accessing information through network file system protocols is often slower than accessing a local disk.

Most affordable NAS servers are not able to provide the flexible, high-speed access method that is provided by modern SAN systems (at the block level, not at the file level).

SAN

(Storage Area Network) is an architectural solution that allows external data storage devices (tape libraries, disk arrays, optical drives, etc.) to be connected to servers. With this connection, the operating system recognizes the external devices as local. Using a SAN reduces the total cost of maintaining a data storage system and lets organizations set up reliable storage for their information.

The simplest version of a SAN is storage systems, servers and switches connected by optical communication channels. In addition to disk data storage systems, disk libraries, tape drives (tape libraries), devices used for storing information on optical disks, etc. can be connected to a SAN.

Advantages

Reliability of access to data located on external systems.

Independence of the SAN topology from the servers and data storage systems used.

Security and reliability of centralized data storage.

Convenient centralized data and switching management.

The ability to transfer I/O traffic to a separate network, providing LAN offload.

Low latency and high performance.

Flexibility and scalability of the SAN logical structure.

Virtually unlimited geographical SAN sizes.

The ability to quickly distribute resources between servers.

The simplicity of the backup scheme is ensured by the fact that all data is located in one place.

The ability to create fault-tolerant cluster solutions based on an existing SAN without additional costs.

Availability of additional services and features, such as remote replication, snapshots, etc.

High level of SAN security.

The only drawback of such solutions is their high cost. In general, the domestic market for data storage systems lags behind the market of developed Western countries, which is characterized by widespread use of storage systems. High cost and shortage of high-speed communication channels are the main reasons hindering the development of the Russian storage market.

RAID

Speaking about data storage systems, you should definitely consider one of the main technologies that underlie the operation of such systems and are widely used in the modern IT industry. We mean RAID arrays.

A RAID array consists of several disks that are controlled by a controller and interconnected via high-speed data transfer channels. The external system perceives such disks (storage devices) as a single whole. The type of array used directly affects the degree of performance and fault tolerance. RAID arrays are used to increase the reliability of data storage, as well as to increase the write/read speed.

There are several RAID levels used to create storage area networks. The most commonly used levels are:

1. RAID 0. This is a striped disk array with increased performance and no fault tolerance.
Information is divided into separate data blocks, which are written simultaneously to two or more disks.

Pros:

The capacities of the disks add up.

Significant increase in performance (the number of disks directly affects the multiplicity of performance increase).

Cons:

The reliability of RAID 0 is lower than that of even the most unreliable disk, because if any of the disks fails, the entire array becomes inoperable.
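
The striping idea and the reliability penalty described above can be sketched in a few lines. This is a minimal illustration, not a model of any particular controller: the function names, the block size and the per-disk survival probability are assumptions, and disk failures are assumed to be independent.

```python
def stripe(data: bytes, num_disks: int, block_size: int) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them round-robin across disks."""
    if num_disks < 2 or block_size < 1:
        raise ValueError("need at least 2 disks and a positive block size")
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        disks[block_index % num_disks].append(data[offset:offset + block_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading blocks back in the same round-robin order."""
    out = []
    for row in range(max(len(d) for d in disks)):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


def array_survival(p_disk_survives: float, num_disks: int) -> float:
    """A striped array survives only if every disk survives (independent failures)."""
    return p_disk_survives ** num_disks


layout = stripe(b"ABCDEFGH", num_disks=2, block_size=2)
print(layout)            # blocks alternate between the two disks
print(unstripe(layout))  # the original bytes come back
```

With an assumed 95 percent chance that each disk survives some period, `array_survival(0.95, 4)` is about 0.81, which is lower than the figure for any single disk.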

2. RAID 1. A mirrored disk array. This array consists of a pair of disks that are exact copies of each other.

Pros:

Acceptable write speed and a gain in read speed when requests are parallelized.

Ensuring high reliability - a disk array of this type functions as long as at least 1 disk is working in it. The probability of failure of 2 disks at the same time, equal to the product of the probabilities of failure of each of them, is much lower than the probability of failure of one disk. If one disk fails, in practice it is necessary to immediately take action to restore redundancy again. To do this, it is recommended to use hot spare disks with RAID of any level (except zero).

Cons:

The only disadvantage of RAID 1 is that the user gets one hard drive for the price of two drives.
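
The two points above (the failure probability as a product, and paying for two drives to get the capacity of one) can be illustrated with a short calculation. The probability value and the function names are assumptions for illustration, and the failures of the two disks are assumed to be independent and to fall within the same time window.

```python
def all_copies_fail(p_disk_fails: float, copies: int = 2) -> float:
    """Probability that every mirrored copy fails, assuming independent failures."""
    if not 0.0 <= p_disk_fails <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_disk_fails ** copies


def mirror_capacity(disk_tb: float, copies: int = 2) -> tuple[float, float]:
    """Return (usable capacity, purchased capacity) in TB for a mirror set."""
    return disk_tb, disk_tb * copies


# Hypothetical 3% chance that a single disk fails in the period of interest.
print(all_copies_fail(0.03))   # far smaller than 0.03
print(mirror_capacity(4.0))    # usable vs. purchased capacity
```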

3. RAID 10. This is a RAID 0 array built from RAID 1 arrays.

4. RAID 2. Used for arrays using Hamming code.

Arrays of this type are based on the use of the Hamming code. Disks are divided into 2 groups: for data, and also for codes used for error correction. Data on the disks used to store information is distributed similar to the distribution in RAID 0, that is, it is divided into small blocks in accordance with the number of disks. The remaining disks store all error correction codes, which help restore information if one of the hard disks fails. The Hamming method used in ECC memory makes it possible to correct single errors on the fly, as well as detect double ones.
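
To make the error-correction idea concrete, here is a minimal sketch of the classic Hamming(7,4) code, which protects four data bits with three parity bits and corrects any single flipped bit. It is only an illustration of the code itself, not of how a RAID 2 controller lays bits out across disks; the function names are assumptions. Detecting double errors, as mentioned above, needs one extra overall parity bit, which this sketch omits.

```python
def hamming74_encode(data_bits: list[int]) -> list[int]:
    """Encode 4 data bits into 7 bits; parity bits sit at positions 1, 2 and 4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword: list[int]) -> tuple[list[int], int]:
    """Fix at most one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # the syndrome spells out the bad position
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], error_position


word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1  # flip the bit at position 5
print(hamming74_correct(damaged))  # the data bits are recovered, error located at 5
```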

RAID 3, RAID 4. These are striped disk arrays with a dedicated parity disk. In a RAID 3 array of n disks, data is divided into pieces smaller than a sector (bytes or small blocks) and distributed across n-1 disks. Parity blocks are stored on one disk. In RAID 2, n-1 disks were used for this purpose, but most of the information on the control disks served on-the-fly error correction, whereas for most users simple recovery of information after a disk failure is sufficient (for this, the amount of information that fits on one hard disk is enough).

RAID 4 is similar to RAID 3, but the data in it is divided into blocks rather than individual bytes. This partly solved the problem of low transfer speeds for small amounts of data. Writes, however, are slow, because parity is generated for every written block and always goes to the same single disk.
RAID 3 differs from RAID 2 in that it cannot correct errors on the fly and has less redundancy.

Pros:

IDC also expects that emerging markets will soon significantly outpace developed markets in terms of storage consumption, as they are characterized by higher rates of economic growth. For example, the region of Eastern and Central Europe, Africa and the Middle East will surpass Japan in 2014 in terms of spending on data storage systems. By 2015, the Asia-Pacific region, excluding Japan, will surpass Western Europe in terms of data storage consumption.

RAID 5. The main drawback of RAID levels 2 through 4 is that write operations cannot run in parallel, because a separate control disk is used to store the parity information. RAID 5 does not have this drawback. Checksums and data blocks are distributed automatically across all disks; there is no asymmetric disk configuration. By checksums we mean the result of an XOR operation. XOR makes it possible to replace any operand with the result and, by applying XOR again, obtain the missing operand. Storing the XOR result takes the capacity of only one disk (equal in size to any other disk in the array).
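
The XOR property described above can be sketched as follows: the parity block is the XOR of all data blocks, and any one lost block is the XOR of everything that remains. The helper name and the sample bytes are illustrative assumptions; a real array also rotates the parity position from stripe to stripe, which this sketch does not model.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    if any(len(b) != len(blocks[0]) for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe (hypothetical contents) and their parity.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails: XOR the survivors with the parity.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])  # True
```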

Pros:

The popularity of RAID 5 is explained primarily by its cost-effectiveness. Writing to a RAID 5 volume requires additional computations and write operations, which reduces performance. When reading, however, there is a gain over a single hard drive, because data streams coming from several disks can be processed in parallel.

Cons:

RAID 5 has much lower performance, especially on random writes, where it is 10 to 25 percent lower than that of RAID 10 or RAID 0. This is because each write requires more disk operations: in the classic read-modify-write scheme, the RAID controller typically turns every write from the server into four operations, two reads (old data and old parity) and two writes (new data and new parity).

The disadvantages of RAID 5 become apparent when one disk fails: the entire volume goes into critical (degraded) mode, and all read and write operations involve additional manipulations, which leads to a sharp drop in performance. Reliability in this state drops to that of a RAID 0 array with the corresponding number of disks, that is, n times lower than the reliability of a single disk. If at least one more disk fails, or an unrecoverable error occurs on it, before the array is rebuilt, the array is destroyed and the data on it cannot be recovered by conventional methods.

Note also that restoring redundancy after a disk failure (RAID reconstruction, or rebuild) puts an intense, continuous read load on all remaining drives for many hours. As a result, one of the remaining disks may fail. Previously undetected read errors in cold data (inactive or archived data that is not accessed during normal operation of the array) may also surface, which increases the risk of failure during data recovery.
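
The effect of extra disk operations per write can be estimated with simple arithmetic. The sketch below is a rough model, not a benchmark: the per-disk IOPS figure, the disk count and the operations-per-write values (2 for mirroring, 4 for a read-modify-write parity update) are assumptions, and controller caches can change the picture considerably.

```python
def effective_write_iops(iops_per_disk: float, num_disks: int, ops_per_write: int) -> float:
    """Random-write IOPS seen by the host when each write costs several disk operations."""
    if num_disks < 1 or ops_per_write < 1:
        raise ValueError("disk count and operations per write must be positive")
    return iops_per_disk * num_disks / ops_per_write


# Hypothetical array: 8 disks, 150 random IOPS each.
raid0 = effective_write_iops(150, 8, ops_per_write=1)
raid10 = effective_write_iops(150, 8, ops_per_write=2)
raid5 = effective_write_iops(150, 8, ops_per_write=4)
print(raid0, raid10, raid5)  # 1200.0 600.0 300.0
```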

6. RAID 50. This is a RAID 0 array built from RAID 5 arrays.

7. RAID 6. A striped disk array that uses two checksums calculated in two independent ways.

RAID 6 is in many ways similar to RAID 5 but is more reliable: it allocates the capacity of two disks to checksums, and the two sums are calculated using different algorithms. It requires a more powerful RAID controller. RAID 6 protects against multiple failures, remaining operational after two drives fail at the same time. The array requires a minimum of four disks. Using RAID 6 typically reduces disk group performance by approximately 10 to 15 percent, because the controller has to process more information (it must calculate a second checksum and read and rewrite more disk blocks when writing each block).

8. RAID 60. This is a RAID 0 array built from RAID 6 arrays.

9. Hybrid RAID. This is another kind of RAID array that has become quite popular recently. It consists of regular RAID levels used together with additional software and SSD drives that serve as a read cache. This improves system performance, because SSDs are much faster than HDDs. Several implementations exist today, for example Crucial Adrenaline and several budget Adaptec controllers. At present, using Hybrid RAID is not recommended because of the limited service life of SSD drives.

Hybrid RAID reads are performed from the faster SSD, and writes go to both SSDs and HDDs (for redundancy purposes).
Hybrid RAID is great for applications that use lower-level data (virtual machine, file server, or Internet gateway).
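
The capacity cost of the levels discussed in this section can be compared with a small helper. This is a simplified sketch under stated assumptions: all disks are the same size, RAID 10 uses two-way mirrors, and RAID 50/60 are split into equal groups; the function name, the 12-disk example and the 4 TB disk size are illustrative only.

```python
def usable_disk_count(level: str, total_disks: int, groups: int = 2) -> int:
    """Disks' worth of usable capacity; `groups` matters only for RAID 50 and 60."""
    overhead = {
        "0": 0,                  # striping only, no redundancy
        "10": total_disks // 2,  # every disk has a mirror copy
        "5": 1,                  # one disk's worth of parity
        "6": 2,                  # two disks' worth of parity
        "50": groups,            # one parity disk per RAID 5 group
        "60": 2 * groups,        # two parity disks per RAID 6 group
    }
    if level not in overhead:
        raise ValueError(f"unsupported RAID level: {level}")
    usable = total_disks - overhead[level]
    if usable < 1:
        raise ValueError("not enough disks for this level")
    return usable


DISK_TB = 4  # assumed disk size
for level in ("0", "10", "5", "6", "50", "60"):
    print(f"RAID {level}: {usable_disk_count(level, 12) * DISK_TB} TB usable of {12 * DISK_TB} TB")
```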

Features of the modern storage market

In the summer of 2013, the analytical company IDC published its latest forecast for the storage system market, covering the period through 2017. The analysts’ calculations show that over the next four years enterprises worldwide will purchase storage systems with a total capacity of one hundred and thirty-eight exabytes. The total capacity of storage systems sold will grow by approximately thirty percent annually.

However, compared to previous years, when there was rapid growth in data storage consumption, the pace of this growth will slow down somewhat, since today most companies use cloud solutions, giving preference to technologies that optimize data storage. Saving storage space is achieved through tools such as virtualization, data compression, data deduplication, etc. All of the above tools provide space savings, allowing companies to avoid spontaneous purchases and resort to purchasing new storage systems only when they are really needed.

Of the 138 exabytes expected to be sold in 2017, 102 exabytes will be for external storage systems and 36 for internal ones. In 2012, twenty exabytes of external storage and eight exabytes of internal storage were sold. Spending on industrial storage systems will increase annually by approximately 4.1 percent and by 2017 will amount to about forty-two and a half billion dollars.
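
To get a feel for what the annual growth rates quoted above mean over the forecast period, compound growth can be computed directly. This is plain arithmetic on the stated percentages, with an assumed four-year horizon; it does not attempt to reproduce IDC's figures.

```python
def growth_multiplier(annual_rate: float, years: int) -> float:
    """Total growth factor after compounding an annual rate for a number of years."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1.0 + annual_rate) ** years


# About 30% per year for capacity and 4.1% per year for spending, over four years.
print(round(growth_multiplier(0.30, 4), 4))   # 2.8561
print(round(growth_multiplier(0.041, 4), 4))  # 1.1744
```

In other words, capacity shipped would nearly triple over four years while spending would grow by less than a fifth, which is consistent with the falling cost per unit of capacity.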

We have already noted that growth in the global storage market, which recently experienced a real boom, has gradually slowed. In 2005, storage consumption at the industrial level grew by sixty-five percent, and in 2006 and 2007 by fifty-nine percent each year. In subsequent years, growth in storage consumption slowed further due to the negative impact of the global economic crisis.

Analysts predict that the growth in the use of cloud storage systems will lead to a decrease in the consumption of data storage solutions at the corporate level. Cloud providers are also actively purchasing data storage systems for their needs, for example, Facebook and Google build their own servers from ready-made components, but these servers are not taken into account in the IDC report.

Prompt sale of data storage systems

The extensive technical capabilities, competence and experience of the company’s personnel guarantee fast and comprehensive completion of the task. We are not limited to selling data storage systems: we also carry out their setup, startup and subsequent service and maintenance.

Trinity company is one of the leaders in the IT market among suppliers of data storage systems (DSS) in Russia. Over our more than 25-year history, being an official supplier and partner of well-known storage systems brands, we have supplied our customers with several hundred data storage systems for various purposes from equipment vendors (manufacturers) such as: IBM, Dell EMC, NetApp, Lenovo, Fujitsu, HP, Hitachi, Oracle (Sun Microsystems), Huawei, RADIX, Infortrend. Some storage systems contained more than 1,000 hard drives and had a capacity of more than a petabyte.

Today we are a multi-vendor system integrator and are engaged in the design and construction of IT infrastructure for enterprises, supplying and implementing to our customers not only data storage systems of well-known brands, but also server and network equipment, engineering infrastructure, information security tools, as well as management and monitoring. Trinity's comprehensive approach is ensured by the deep expertise of our engineers and long-term partnerships with hardware and software manufacturers. Today we can offer comprehensive IT solutions for businesses of any scale and tasks of any complexity.

We provide a wide range of FREE services to potential customers of IT equipment and solutions. We are ready to work for FREE to prepare a solution to an IT problem: analyzing all possible options, selecting the optimal one, calculating the solution architecture, drawing up all hardware and software specifications, and deploying the solution in the customer’s infrastructure.

A systematic approach to comprehensively solving a customer’s IT problems or supplying individual IT components of a solution involves in-depth consulting with Trinity experts to select the only correct and optimal solution.

Trinity is an official partner of leading manufacturers of storage equipment and software, confirmed by the highest level statuses Premier, GOLD, PLATINUM and receiving special awards with which vendors recognize their partners for achievements in the level of expertise and implementation of complex information technologies in the industries of production, trade and public administration.

We offer not only data storage equipment from leading international brands (manufacturers) such as Dell EMC, Lenovo, NetApp, Fujitsu, HP (HPe), Hitachi, Cisco, IBM and Huawei; we are also ready to perform the entire range of IT tasks for you: selecting equipment, consulting, drawing up specifications, pilot testing in our laboratory or at your site, and setting up, installing and optimizing the infrastructure for your tasks and specific applications. We are also ready to provide special prices for supplied data storage systems and related equipment and software, as well as qualified technical support and service.

We are always ready to help develop technical requirements and specifications for data storage systems (DSS) and server equipment for specific tasks, services and applications, select financial terms (installments, leasing), deliver and install equipment at the customer’s site, and then launch it, with consulting and training for the client’s IT staff.

Selection of the optimal equipment configuration for data storage and processing

We are ready to offer you optimally configured data storage systems. Our portfolio includes a variety of data storage systems: All-Flash systems and hybrid storage systems built on SSD, NVMe, SAS and SATA drives, with various options for connecting to hosts, both file access (the NFS and SMB network file systems) and block access (Fibre Channel and iSCSI). We are also ready to size hyperconverged systems (HCI).

You can describe your tasks or wishes regarding the composition of the storage system, performance requirements (IOPS, input/output operations per second), access time requirements (latency, in milliseconds or microseconds), storage capacity (gigabytes, terabytes, petabytes), physical size and energy consumption, as well as servers and software (operating systems, hypervisors and applications). We are ready to advise you by phone or by mail and to offer a full or partial audit of the storage resources and services of your company’s IT infrastructure. The audit gives a deep understanding of your tasks, requirements and capabilities, so that we can select the optimal IT solution (storage system) or implement a complex project whose results will work for your business for many years, with the ability to increase performance and storage capacity as your requirements grow.

You will be able to select a system (receive specifications and prices), conduct pilot testing of data storage systems in your infrastructure, receive all the necessary consultations, and then buy data storage systems and other related equipment and software as a single-vendor or multi-vendor solution. Our specialists will handle the entire range of deliveries and work, from your first contact with us to the signing of completion certificates and the provision of service.

In addition to ready-made and configured data storage systems, Trinity offers a wide range of server equipment and network infrastructure that are integrated into the customer’s IT infrastructure for a comprehensive solution to data storage and processing problems. Almost any review of storage systems that can be found on thematic sites and forums will certainly include information from our long-term partners IBM, Dell EMC, NetApp, Lenovo, Fujitsu, HP, Hitachi, Cisco and Huawei. You can buy and set up all this data storage equipment in our company quickly and profitably.

Sizing and selection of data storage system specifications for your company’s tasks

We have in stock both ready-made, most popular data storage systems and all the capabilities for quickly and accurately developing technical specifications for developing storage system configurations for the needs of a specific company. Our systems are capable of operating around the clock: 24 hours a day, 7 days a week, 365 days a year without failures or errors. We achieve such statistics through the high quality of supplied solutions and rigorous testing of all units and components of storage systems before shipment to our customers. The use of RAID technologies, fault tolerance tools, clustering and disaster protection solutions (Disaster Recovery), both at the hardware level and at the level of operating systems, controllers, hypervisors and deployed services, guarantees the integrity and availability of processed and stored information on data storage systems, and on backups. You can simply buy data storage systems from our company or invite us to participate in a complex IT project in which data storage equipment is one of the components of the enterprise’s IT infrastructure.

In-house development of data storage system

The Trinity company has developed and supplies a data storage system (DSS) to the Russian market under its own brand "FlexApp". This storage system is based on RAIDIX software. Trinity's line of domestically produced storage equipment includes both high-performance data storage systems based on flash drives (All-Flash) and high-capacity storage systems using many of the most capacious hard drives of 16TB (terabytes) in each shelf with the ability to combine these shelves into pools reaching a total capacity of hundreds of petabytes. The FlexApp data storage system we developed can be the basis of data storage equipment for telecom operators to comply with the requirements of the Yarovaya Law.

How can you buy a data storage system from our company?

In order to calculate and purchase a data storage system from our company, you must send a request by mail for the model you are interested in or describe your requirements for the composition of such a model. You can also call us during business hours. We will be happy to discuss with you the tasks and requirements for data storage systems, their performance, and level of fault tolerance. We are ready to provide complete and free expert advice on the configuration and technical features of any data storage systems produced by our partners: Dell EMC, Lenovo, NetApp, Fujitsu, HP (HPe), Hitachi, Cisco, IBM, Huawei for the optimal selection of the required solution.

Our offices with engineers and experts are located in three regions of the country:

Central Federal District, Moscow;
Northwestern Federal District, St. Petersburg;
Ural Federal District, Ekaterinburg.

We are always ready to see you and invite you to visit Trinity offices to discuss solutions to your IT problems with our managers, experts, engineers and company management. If necessary, we are ready to organize meetings between customers and representatives of vendors (manufacturers) and suppliers. Our employees are also ready to come to your site for acquaintance and detailed study of the IT infrastructure and functioning of IT services.

