Random access storage devices


There is much more dynamic memory in a computer than static memory, since DRAM is used as the main memory of the VM. Like SRAM, dynamic memory consists of a core (an array of storage elements) and interface logic (buffer registers, sense amplifiers for reading data, regeneration circuits, etc.). Although the number of DRAM types already exceeds two dozen, their cores are organized almost identically. The main differences lie in the interface logic, and these differences also stem from the field of application of the chips: besides serving as the main memory of the VM, dynamic memory ICs are used, for example, in video adapters. The classification of dynamic memory chips is shown in Fig. 72.

To evaluate the differences between types of DRAM, let’s first look at the algorithm for working with dynamic memory. For this we will use Fig. 68.

Unlike SRAM, the address of a DRAM cell is transferred to the chip in two steps - first the row address and then the column address - which roughly halves the number of address bus pins, reduces the package size and allows more chips to be placed on the motherboard. This, of course, reduces performance, since transferring the address takes twice as long. To indicate which part of the address is being transmitted at a given moment, two auxiliary signals, RAS and CAS, are used. When a memory cell is accessed, the row address is placed on the address bus. After the bus has stabilized, the RAS signal is asserted and the address is written to an internal register of the memory chip. The address bus is then set to the column address and the CAS signal is issued. Depending on the state of the WE line, data is either read from the cell or written to it (for a write, the data must be placed on the data bus beforehand). The interval between setting up the address and issuing the RAS (or CAS) signal is specified in the technical characteristics of the chip, but usually the address is set up in one system bus clock cycle and the control signal is issued in the next. Thus, reading or writing one cell of dynamic RAM requires five clock cycles, in which the following occurs: issue of the row address, issue of the RAS signal, issue of the column address, issue of the CAS signal, and the read/write operation itself (in static memory the same procedure takes only two to three clock cycles).
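
As a simple illustration of the multiplexed addressing just described, the C sketch below splits a linear cell number into the row and column parts that would be latched by RAS and CAS, and lists the five steps of a conventional access. The chip geometry (4096 rows by 1024 columns) is an assumption chosen only for the example.

    #include <stdio.h>

    /* Assumed geometry, purely illustrative: 4096 rows x 1024 columns. */
    #define ROWS 4096u
    #define COLS 1024u

    int main(void)
    {
        unsigned cell = 1234567u % (ROWS * COLS); /* linear cell number */
        unsigned row  = cell / COLS;              /* latched on RAS     */
        unsigned col  = cell % COLS;              /* latched on CAS     */

        /* The five-step conventional access discussed in the text.     */
        const char *steps[5] = { "set row address", "assert RAS",
                                 "set column address", "assert CAS",
                                 "read/write data" };

        printf("cell %u -> row %u, column %u\n", cell, row, col);
        for (int i = 0; i < 5; i++)
            printf("clock %d: %s\n", i + 1, steps[i]);
        return 0;
    }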

Fig. 72. Classification of dynamic RAM: a) chips for main memory; b) chips for video adapters.

One should also remember the need to regenerate the data. In addition to the natural discharge of the storage capacitor over time, reading data from a DRAM cell also destroys its charge, so after each read operation the data must be restored. This is achieved by rewriting the same data immediately after it has been read. When information is read from one cell, the data of the entire selected row is actually output at once, but only the bits in the column of interest are used and all the rest are ignored. Thus, a read from a single cell destroys the data of the entire row, which must then be restored. This regeneration after reading is performed automatically by the interface logic of the chip immediately after the row has been read.
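
A toy model of the destructive read and automatic restore described above: reading any cell senses the whole row (here copied into a buffer), destroys it in the core, and the interface logic immediately writes it back. All sizes are assumptions for illustration only.

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    #define ROWS 8
    #define COLS 16

    static uint8_t core[ROWS][COLS];      /* charge state of the cells       */

    /* Reading destroys the selected row in the core; the interface logic    */
    /* restores it from the sense buffer immediately after the read.         */
    static uint8_t dram_read(int row, int col)
    {
        uint8_t sense[COLS];
        memcpy(sense, core[row], COLS);   /* the whole row is sensed at once */
        memset(core[row], 0, COLS);       /* ...and destroyed in the process */
        memcpy(core[row], sense, COLS);   /* automatic regeneration          */
        return sense[col];                /* only the wanted column is used  */
    }

    int main(void)
    {
        core[3][7] = 1;
        printf("bit read     = %u\n", dram_read(3, 7));
        printf("still stored = %u\n", core[3][7]);
        return 0;
    }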

Now let's look at the different types of dynamic memory chips, starting with system DRAM, that is, chips designed to be used as main memory. At the initial stage, these were asynchronous memory chips, the operation of which is not strictly tied to the clock pulses of the system bus.

Asynchronous dynamic RAM. Asynchronous dynamic RAM chips are controlled by the RAS and CAS signals, and their operation, in principle, is not directly tied to the bus clock pulses. Asynchronous memory is characterized by additional time spent on the interaction between the memory chips and the controller. Thus, in an asynchronous scheme the RAS signal is generated only after a clock pulse arrives at the controller and is perceived by the memory chip after some additional delay. After this, the memory produces the data, but the controller can read it only on the arrival of the next clock pulse, since it must work synchronously with the rest of the VM devices. Thus, slight delays appear during the read/write cycle because the memory chip and the memory controller have to wait for each other.

DRAM chips. The first dynamic memory chips used the simplest method of data exchange, often called conventional. It allowed a memory cell to be read or written only once every five clock cycles; the steps of this procedure were described above. Conventional DRAM therefore corresponds to the formula 5-5-5-5 (the figures give the number of bus clock cycles for the first and for each of the three subsequent accesses of a four-word burst). Chips of this type could operate at frequencies of up to 40 MHz and, owing to their slowness (an access time of about 120 ns), did not remain in use for long.

FPM DRAM chips. Dynamic RAM chips that implement FPM (Fast Page Mode) are also an early type of DRAM. The essence of this mode was described earlier. The read pattern of FPM DRAM is described by the formula 5-3-3-3 (14 clock cycles in total). The fast page access scheme reduced the access time to 60 ns, which, taking into account the ability to operate at higher bus frequencies, increased memory performance by approximately 70% compared to conventional DRAM. Chips of this type were used in personal computers until about 1994.

EDO DRAM chips. The next stage in the development of dynamic RAM was ICs with the hyper page access mode (HPM, Hyper Page Mode), better known as EDO (Extended Data Output - extended data availability time at the output). The main feature of the technology is the increased time for which data remains available at the output of the chip compared to FPM DRAM. In FPM DRAM chips the output data remains valid only while the CAS signal is active, which is why the second and subsequent accesses to a row require three clock cycles: switching CAS to the active state, a data read cycle, and switching CAS to the inactive state. In EDO DRAM, on the active (falling) edge of the CAS signal the data is latched into an internal register, where it is held for some time even after the next active edge of the signal arrives. This allows the stored data to be used while CAS is already in the inactive state. In other words, the timing parameters are improved by eliminating the cycles spent waiting for the data to stabilize at the output of the chip.

The read pattern of EDO DRAM is already 5-2-2-2, which is 20% faster than FPM. The access time is about 30-40 ns. It should be noted that the system bus frequency for EDO DRAM chips must not exceed 66 MHz.

BEDO DRAM chips. EDO technology was further improved by VIA Technologies. The new modification of EDO is known as BEDO (Burst EDO). The novelty of the method is that on the first access the entire row of the chip is read, including the consecutive words of the burst. The sequential transfer of the words (the switching of columns) is tracked automatically by an internal counter of the chip. This eliminates the need to issue an address for every cell of the burst, but requires support from external logic. The method reduces the read time of the second and subsequent words by one more clock cycle, so the formula takes the form 5-1-1-1.
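
To see what these burst formulas mean in absolute terms, the sketch below totals the clock cycles of a four-word burst for each scheme and converts them to time at a single assumed 66 MHz bus clock, used here purely for comparison (the earlier chip types actually ran at lower frequencies).

    #include <stdio.h>

    int main(void)
    {
        /* Cycles for the first and the three subsequent accesses of a burst. */
        struct { const char *name; int cycles[4]; } dram[] = {
            { "conventional DRAM", {5, 5, 5, 5} },
            { "FPM DRAM",          {5, 3, 3, 3} },
            { "EDO DRAM",          {5, 2, 2, 2} },
            { "BEDO DRAM",         {5, 1, 1, 1} },
        };
        const double bus_mhz = 66.0;              /* assumed common bus clock */

        for (int i = 0; i < 4; i++) {
            int total = 0;
            for (int j = 0; j < 4; j++)
                total += dram[i].cycles[j];
            printf("%-18s %2d cycles (%6.1f ns at %.0f MHz)\n",
                   dram[i].name, total, total * 1000.0 / bus_mhz, bus_mhz);
        }
        return 0;
    }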

EDRAM chips. A faster version of DRAM was developed by Ramtron's subsidiary, Enhanced Memory Systems. The technology is implemented in FPM, EDO and BEDO variants. The chip has a faster core and internal cache memory, and the presence of the latter is the main feature of the technology. The cache memory is static memory (SRAM) with a capacity of 2048 bits. The EDRAM core has 2048 columns, each of which is connected to the internal cache. When any cell is accessed, the entire row (2048 bits) is read at once. The row that has been read is entered into the SRAM, and this transfer of information to the cache memory has virtually no effect on performance, since it occurs in one clock cycle. On subsequent accesses to cells belonging to the same row, the data is taken from the faster cache memory. The next access to the core occurs only when a cell is accessed that does not lie in the row stored in the chip's cache memory.

The technology is most effective for sequential reading, in which case the average access time of the chip approaches values characteristic of static memory (about 10 ns). The main difficulty is incompatibility with the controllers used with other types of DRAM.
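
A minimal sketch of the row-cache idea behind EDRAM: the last row read from the core is kept in a fast SRAM buffer, and repeated accesses to the same row are served from it without touching the slow core. The row size of 2048 bits matches the text; the number of rows is an assumption.

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    #define ROW_BYTES (2048 / 8)   /* one EDRAM row = 2048 bits            */
    #define ROWS      2048         /* assumed number of rows, illustrative */

    static uint8_t core[ROWS][ROW_BYTES];   /* slow DRAM core              */
    static uint8_t row_cache[ROW_BYTES];    /* fast on-chip SRAM buffer    */
    static int     cached_row = -1;         /* no row cached yet           */

    /* Read one byte; a miss first loads the whole row into the SRAM cache. */
    static uint8_t edram_read(int row, int col)
    {
        if (row != cached_row) {            /* miss: access the core        */
            memcpy(row_cache, core[row], ROW_BYTES);
            cached_row = row;
        }
        return row_cache[col];              /* hit: served from fast SRAM   */
    }

    int main(void)
    {
        core[5][10] = 0xAB;
        printf("%02X\n", edram_read(5, 10));  /* first access: core + cache */
        printf("%02X\n", edram_read(5, 11));  /* same row: cache only       */
        return 0;
    }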

Synchronous dynamic RAM. In synchronous DRAM, information exchange is synchronized by external clock signals and occurs at strictly defined points in time, which makes it possible to use the full bandwidth of the processor-memory bus and to avoid wait cycles. The address and control information are latched by the memory IC, after which the chip responds after a precisely defined number of clock pulses, and the processor can use this time for other actions not related to memory access. In the case of synchronous dynamic memory, instead of the access cycle duration one speaks of the minimum permissible clock period, and here the figure is already of the order of 8-10 ns.

SDRAM chips. The abbreviation SDRAM (Synchronous DRAM) is used to refer to “regular” synchronous dynamic RAM chips. The fundamental differences between SDRAM and the asynchronous dynamic RAM discussed above can be reduced to four points:

· synchronous method of data transfer to the bus;

· pipelined mechanism for burst (packet) transfers;

· use of several (two or four) internal memory banks;

· transfer of part of the functions of the memory controller to the logic of the microcircuit itself.

Memory synchronicity allows the memory controller to “know” when the data is ready, thereby reducing the overhead of wait cycles spent searching for the data. Since the data appears at the output of the IC simultaneously with the clock pulses, the interaction of the memory with the other VM devices is simplified.

Unlike BEDO, the pipeline allows burst data to be transferred clock by clock, which lets the RAM operate smoothly at higher frequencies than asynchronous RAM. The advantages of the pipeline are especially noticeable when transferring long bursts, provided they do not exceed the length of a chip row.

A significant effect is achieved by dividing the entire set of cells into independent internal arrays (banks). This allows an access to a cell in one bank to be combined with preparation for the next operation in the remaining banks (precharging the control circuits and restoring the information). The ability to keep several memory rows open simultaneously (from different banks) also helps to improve memory performance. When the banks are accessed alternately, the access frequency of each individual bank decreases in proportion to the number of banks, and the SDRAM can operate at higher frequencies. Thanks to a built-in address counter, SDRAM, like BEDO DRAM, allows burst reads and writes; in SDRAM, however, the burst length is variable, and in burst mode an entire memory row can be read. The IC can be characterized by the formula 5-1-1-1. Although this formula is the same as that of BEDO, the ability to operate at higher frequencies means that SDRAM with two banks at a bus clock of 100 MHz can almost double the performance of BEDO memory.
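
One simple way a controller can exploit the internal banks is to spread consecutive addresses across them, so that a linear walk through memory alternates banks and one bank can be precharged while another is being accessed. The bank count and interleaving granularity below are assumptions.

    #include <stdio.h>

    #define BANKS      4u        /* assumed: four internal banks           */
    #define LINE_BYTES 32u       /* assumed interleaving granularity       */

    /* Consecutive 32-byte lines are assigned to the banks round-robin.    */
    static unsigned bank_of(unsigned addr)
    {
        return (addr / LINE_BYTES) % BANKS;
    }

    int main(void)
    {
        for (unsigned addr = 0; addr < 8 * LINE_BYTES; addr += LINE_BYTES)
            printf("address 0x%04X -> bank %u\n", addr, bank_of(addr));
        return 0;
    }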

DDR SDRAM chips. An important step in the further development of SDRAM technology was DDR SDRAM (Double Data Rate SDRAM - SDRAM with double the data transfer rate). Unlike SDRAM, the new modification produces data in burst mode on both edges of the synchronization pulse, due to which the throughput doubles. There are several DDR SDRAM specifications, depending on the system bus clock speed: DDR266, DDR333, DDR400, DDR533. Thus, the peak bandwidth of a DDR333 memory chip is 2.7 GB/s, and for DDR400 it is 3.2 GB/s. DDR SDRAM is currently the most common type of dynamic memory in personal VMs.
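
The peak bandwidth figures quoted above can be reproduced from the module parameters: the DDRxxx name encodes the effective (double-pumped) transfer rate in MHz, and a standard module is 64 bits (8 bytes) wide. A quick check:

    #include <stdio.h>

    int main(void)
    {
        const int    rate_mhz[] = { 266, 333, 400, 533 };  /* DDRxxx ratings */
        const double bus_bytes  = 8.0;                     /* 64-bit module  */

        for (int i = 0; i < 4; i++)
            printf("DDR%d: %.1f GB/s peak\n", rate_mhz[i],
                   rate_mhz[i] * 1e6 * bus_bytes / 1e9);
        return 0;
    }

For DDR333 this gives about 2.7 GB/s and for DDR400 about 3.2 GB/s, matching the figures above.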

RDRAM, DRDRAM microcircuits. The most obvious ways to increase the efficiency of processor-memory interaction are to raise the bus clock frequency or to widen the sample (the number of bits transferred simultaneously). Unfortunately, attempts to combine both options run into significant technical difficulties: as the frequency increases, the problems of electromagnetic compatibility become more acute, and it becomes harder to guarantee that all the bits sent in parallel reach the consumer at the same time. Most synchronous DRAMs (SDRAM, DDR) therefore use a wide sample (64 bits) at a limited bus frequency.

A fundamentally different approach to building DRAM was proposed by Rambus in 1997. It focuses on raising the clock frequency to 400 MHz while reducing the sample width to 16 bits. The new memory is known as RDRAM (Rambus DRAM). There are several varieties of this technology: Base, Concurrent and Direct. In all of them, data is clocked on both edges of the clock signal (as in DDR), so the resulting frequency is 500-600, 600-700 and 800 MHz, respectively. The first two variants are almost identical, while the changes in Direct Rambus (DRDRAM) technology are quite significant.
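
The two approaches come down to the same product of bus width and effective frequency. An illustrative comparison of a wide 64-bit SDRAM/DDR bus with the narrow 16-bit Direct Rambus channel (figures taken from the text):

    #include <stdio.h>

    int main(void)
    {
        struct { const char *name; double width_bits, eff_mhz; } bus[] = {
            { "SDRAM, 64 bit at 100 MHz",          64, 100 },
            { "DDR SDRAM, 64 bit at 2 x 200 MHz",  64, 400 },
            { "Direct Rambus, 16 bit at 800 MHz",  16, 800 },
        };

        for (int i = 0; i < 3; i++)
            printf("%-36s %.1f GB/s\n", bus[i].name,
                   bus[i].width_bits / 8.0 * bus[i].eff_mhz * 1e6 / 1e9);
        return 0;
    }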

First, let's look at the fundamental points of RDRAM technology, focusing mainly on the more modern version - DRDRAM. The main difference from other types of DRAM is the original data exchange system between the core and the memory controller, which is based on the so-called “Rambus channel” using an asynchronous block-oriented protocol. At the logical level, information between the controller and memory is transferred in packets.

There are three types of packets: data packets, row packets and column packets. Row and column packets carry commands from the memory controller that control the rows and columns of the array of storage elements, respectively. These commands replace the conventional chip control signals RAS, CAS, WE and CS.

The array of storage elements is divided into banks. In a 64 Mbit die their number is 8 independent banks or 16 dual banks. In dual banks, each pair of banks shares common read/write amplifiers. The internal core of the chip has a 128-bit data bus, which allows 16 bytes to be transferred per column address. When writing, a mask can be used in which each bit corresponds to one byte of the packet; the mask specifies how many bytes of the packet, and which ones, are to be written to memory.
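
The write mask mentioned above can be pictured as a 16-bit field in which bit i enables byte i of the 16-byte data packet. A hedged sketch of how such a mask might be applied:

    #include <stdio.h>
    #include <stdint.h>

    /* Write a 16-byte packet into a row buffer, but only those bytes whose */
    /* bits are set in the 16-bit mask (bit i enables packet byte i).       */
    static void masked_write(uint8_t *dest, const uint8_t packet[16],
                             uint16_t mask)
    {
        for (int i = 0; i < 16; i++)
            if (mask & (1u << i))
                dest[i] = packet[i];
    }

    int main(void)
    {
        uint8_t row[16]    = { 0 };
        uint8_t packet[16] = { 1, 2, 3, 4, 5, 6, 7, 8,
                               9, 10, 11, 12, 13, 14, 15, 16 };

        masked_write(row, packet, 0x000Fu);     /* write only bytes 0..3    */
        for (int i = 0; i < 16; i++)
            printf("%d ", row[i]);
        printf("\n");
        return 0;
    }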

The data, row and column lines of the channel are completely independent, so row commands, column commands and data can be transmitted simultaneously, and for different banks of the chip. Column packets contain two fields and are transmitted over five lines. The first field specifies the basic write or read operation. The second field contains either an indication that a write mask is used (the mask itself is transmitted over the data lines) or an extended operation code defining a variant of the basic operation. Row packets are divided into activation, cancellation, regeneration and power-mode-switching commands. Three lines are allocated for transmitting row packets.

The write operation can immediately follow the read - only a delay is needed for the time the signal travels through the channel (from 2.5 to 30 ns depending on the length of the channel). To equalize delays in the transmission of individual bits of the transmitted code, the conductors on the board must be positioned strictly in parallel, have the same length (the length of the lines should not exceed 12 cm) and meet strict requirements defined by the developer.

Each write in the channel can be pipelined, with the first data packet having a latency of 50 ns, and the remaining read/write operations occurring continuously (latency is only introduced when changing from a write to a read operation, and vice versa).

Available publications mention work by Intel and Rambus on a new version of RDRAM, called nDRAM, which is to support data transfer rates of up to 1600 MHz.

SLDRAM chips. A potential competitor to RDRAM as the memory architecture standard for future personal VMs is a new type of dynamic RAM developed by the SyncLink Consortium, a consortium of VM manufacturers, and known by the abbreviation SLDRAM. Unlike RDRAM, whose technology is the property of Rambus and Intel, this standard is open. At the system level the technologies are very similar. Data and commands between the controller and the SLDRAM memory are transmitted in packets of 4 or 8 transfers. Commands, address and control signals are sent over a unidirectional 10-bit command bus. Read and write data are transmitted over a bidirectional 18-bit data bus. Both buses operate at the same frequency. For now this frequency is 200 MHz, which, thanks to DDR technology, is equivalent to 400 MHz. The next generations of SLDRAM should operate at frequencies of 400 MHz and higher, that is, provide an effective frequency of more than 800 MHz.

Up to 8 memory chips can be connected to one controller. To avoid delays in signals from chips further away from the controller, the timing characteristics for each chip are determined and entered into its control register when the power is turned on.

ESDRAM chips. This is a synchronous version of EDRAM that uses the same techniques for reducing access time. A write operation, unlike a read, bypasses the cache, which improves ESDRAM performance when a read from a row already held in the cache is resumed. Thanks to the presence of two banks in the chip, idle time caused by preparation for read/write operations is minimized. The drawbacks of this chip are the same as those of EDRAM: the controller is more complex, since it must take into account the possibility of preparing a new core row for reading into the cache memory. In addition, with an arbitrary sequence of addresses the cache memory is used inefficiently.

CDRAM chips. This type of RAM was developed by Mitsubishi, and it can be considered a revised version of ESDRAM, free from some of its shortcomings. The capacity of the cache memory and the principle of placing data in it have been changed. The capacity of a single cache block has been reduced to 128 bits, so the 16-kilobit cache can simultaneously hold copies of 128 memory areas, which allows the cache memory to be used more efficiently. Replacement of the first memory area placed in the cache begins only after the last (128th) block has been filled. The means of access have also changed: the chip uses separate address buses for the static cache and for the dynamic core. Transfer of data from the dynamic core to the cache memory is combined with issuing the data to the bus, so frequent but short transfers do not reduce the performance of the IC. When large amounts of information are read from memory this puts CDRAM on a par with ESDRAM, and when reading at scattered addresses CDRAM clearly wins. It should be noted, however, that these changes made the memory controller even more complex.

Random Access Memory (RAM) is used by the central processor for the joint storage of data and executable program code. According to the principle of information storage, RAM can be divided into static and dynamic.

RAM can be considered as a set of cells, each of which can store one information bit.

In static RAM, the cells are built on various kinds of flip-flops (triggers). Once a bit is written to such a cell, it can be stored for as long as desired - all that is needed is power. Hence the name of the memory - static, i.e. remaining in an unchanged state. The advantage of static memory is its speed, while its disadvantages are high power consumption and low data density, since one trigger cell consists of several transistors and therefore takes up a lot of space on the chip. For example, a 4 Mbit chip would consist of more than 24 million transistors, consuming a corresponding amount of power.
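
The figure of "more than 24 million transistors" follows directly from the cell structure: assuming the common six-transistor flip-flop cell, a 4 Mbit chip needs about 25 million transistors for the core alone.

    #include <stdio.h>

    int main(void)
    {
        const long long bits       = 4LL * 1024 * 1024;  /* 4 Mbit chip        */
        const int       t_per_cell = 6;                  /* assumed 6T cell    */

        printf("%lld transistors\n", bits * t_per_cell); /* about 25.2 million */
        return 0;
    }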

In dynamic RAM, the elementary cell is a capacitor made using CMOS technology. Such a capacitor is capable of holding an electric charge for several milliseconds, and the presence of this charge can be associated with an information bit. When a logical one is written to a memory cell the capacitor is charged, and when a zero is written it is discharged. When data is read, the capacitor is discharged, and if its charge was non-zero the output of the read circuit is set to one. The process of reading (accessing the cell) is combined with restoration (regeneration) of the charge. If the cell is not accessed for a long time, the capacitor is discharged by leakage currents and the information is lost. To compensate for charge leakage, the memory cells are accessed cyclically at regular intervals, because each access restores the previous charge of the capacitor. The advantages of dynamic memory are high data density and low power consumption, while the disadvantage is low performance compared to static memory.

Currently, dynamic memory (Dynamic RAM - DRAM) is used as computer RAM, and static memory (Static RAM - SRAM) is used to create high-speed cache memory for the processor.

Dynamic memory chips are organized in the form of a square matrix, and the intersection of a row and a column of the matrix defines one elementary cell. When accessing a particular cell, the address of the desired row and column must be specified. The row address is set by applying a special strobe pulse RAS (Row Address Strobe) to the inputs of the chip, and the column address by applying the CAS pulse (Column Address Strobe). The RAS and CAS pulses are supplied one after the other over the multiplexed address bus.

Regeneration in the chip occurs simultaneously along an entire row of the matrix when any of its cells is accessed, i.e. it is enough to cycle through all the rows.
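
Because regeneration works a whole row at a time, the controller only has to visit every row within the retention period. With assumed figures of 4096 rows and a 64 ms retention time (typical orders of magnitude, not taken from the text), a distributed refresh must hit one row roughly every 15.6 microseconds:

    #include <stdio.h>

    int main(void)
    {
        const int    rows         = 4096;   /* assumed number of matrix rows */
        const double retention_ms = 64.0;   /* assumed retention period      */

        printf("one row refreshed every %.2f us\n",
               retention_ms * 1000.0 / rows);
        printf("%.0f row refreshes per second\n",
               rows * 1000.0 / retention_ms);
        return 0;
    }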

Over time, developers have created various types of memory with different characteristics and different technical solutions. The main driving force behind the development of memory was the development of computers and central processing units: there was a constant need to increase both the speed and the amount of RAM.

Page memory

Page mode DRAM (PM DRAM) was one of the first types of computer RAM produced. Memory of this type was produced in the early 1990s, but with the increase in processor performance and resource intensity of applications, it was necessary to increase not only the amount of memory, but also the speed of its operation.

Fast page memory

Fast page memory (fast page mode DRAM, FPM DRAM) appeared in 1995. The memory did not undergo fundamentally new changes; the increase in operating speed was achieved by placing tighter demands on the memory hardware. This type of memory was mainly used in computers with Intel 80486 processors or similar processors from other companies. The memory could operate at frequencies of 25 and 33 MHz with full access times of 70 and 60 ns and cycle times of 40 and 35 ns, respectively.

EDO DRAM -- memory with enhanced output

With the advent of Intel Pentium processors, FPM DRAM became clearly ineffective. Therefore, the next step was memory with an improved output (extended data out DRAM, EDO DRAM). This memory appeared on the market in 1996 and began to be actively used in computers with Intel Pentium processors and higher. Its performance was 10-15% higher than that of FPM DRAM. Its operating frequencies were 40 and 50 MHz, the full access times 60 and 50 ns, and the cycle times 25 and 20 ns, respectively. This memory contains a latch for the output data, which provides a degree of pipelining and improves read performance.

SDRAM -- synchronous DRAM

Due to the release of new processors and the gradual increase in the system bus frequency, the stability of EDO DRAM memory began to decrease noticeably. It was replaced by synchronous memory (synchronous DRAM, SDRAM). The new features of this type of memory were the use of a clock generator to synchronize all signals and the use of pipelined information processing. The memory also worked reliably at higher system bus frequencies (100 MHz and above).

Whereas for FPM and EDO memory the quoted figure is the read time of the first cell in a chain (the access time), for SDRAM it is the read time of the subsequent cells. A chain is several consecutive cells. Reading the first cell takes quite a long time (60-70 ns) regardless of the memory type, but the time to read the subsequent cells depends strongly on the type. The operating frequencies of this type of memory could be 66, 100 or 133 MHz, the full access time was 40 and 30 ns, and the cycle time was 10 and 7.5 ns.
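
The practical consequence of quoting the cycle time rather than the access time is seen on chain (burst) reads: the first cell costs the full access time, each subsequent cell only one cycle time. A hedged estimate with the figures from the text (60 ns first access, 7.5 ns cycle at 133 MHz) and an assumed chain length of 8:

    #include <stdio.h>

    int main(void)
    {
        const double first_ns = 60.0;   /* full access time of the first cell */
        const double cycle_ns = 7.5;    /* cycle time of SDRAM at 133 MHz     */
        const int    chain    = 8;      /* assumed chain (burst) length       */

        double total = first_ns + (chain - 1) * cycle_ns;
        printf("%d-cell chain: %.1f ns total, %.2f ns per cell on average\n",
               chain, total, total / chain);
        return 0;
    }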

Virtual Channel Memory (VCM) technology was used with this type of memory. VCM uses a virtual channel architecture that allows data to be transferred more flexibly and efficiently using on-chip register channels. This architecture is integrated into SDRAM. VCM, in addition to its high data transfer speed, was compatible with existing SDRAM, which made it possible to upgrade the system without significant costs or modifications. This solution has found support from some chipset manufacturers.

Enhanced SDRAM (ESDRAM)

To overcome some of the signal latency problems inherent in standard DRAM memory, it was decided to embed a small amount of SRAM on the chip, that is, create an on-chip cache.

ESDRAM is essentially SDRAM with a small amount of SRAM. With low latency and burst operation, frequencies up to 200 MHz are achieved. As with external cache memory, the SRAM cache is designed to store and retrieve the most frequently accessed data. Hence the reduction in data access time of slow DRAM.

One such solution was ESDRAM from Ramtron International Corporation.

Burst EDO RAM

Burst EDO RAM (burst extended data output DRAM, BEDO DRAM) became a cheap alternative to SDRAM. Based on EDO DRAM, its key feature was burst-block technology (a block of data was read in one clock cycle), which made it faster than SDRAM. However, the inability to operate at system bus frequencies above 66 MHz prevented this type of memory from becoming popular.

A special type of RAM, Video RAM (VRAM), was developed on the basis of DRAM for use in video cards. It allowed a continuous flow of data during the image refresh process, which is necessary for high image quality. On the basis of VRAM, the Windows RAM (WRAM) memory specification was developed, which is sometimes mistakenly associated with the Windows family of operating systems. Its performance is 25% higher than that of the original VRAM, thanks to a number of technical changes.

Compared to conventional SDRAM, memory with a double data rate (DDR SDRAM, also known as SDRAM II) doubles the throughput. Initially this type of memory was used in video cards, but later support for DDR SDRAM appeared on the chipset side.

All previous types of DRAM had separate address, data and control lines, which limited the speed of the devices. To overcome this limitation, some technical solutions implemented all the signals on a single bus. Two such solutions, the DRDRAM and SLDRAM technologies, gained the most popularity and deserve attention. The SLDRAM standard is open and, like the preceding technology, SLDRAM uses both edges of the clock signal. As for the interface, SLDRAM adopts a protocol called SynchLink Interface and aims to operate at 400 MHz.

DDR SDRAM memory operates at frequencies of 100, 133, 166 and 200 MHz, its full access time is 30 and 22.5 ns, and its duty cycle time is 5, 3.75, 3 and 2.5 ns.

Since the clock frequency ranges from 100 to 200 MHz and data is transmitted at 2 bits per clock pulse - on both the rising and the falling edge - the effective data transmission frequency lies in the range from 200 to 400 MHz. Such memory modules are designated DDR200, DDR266, DDR333 and DDR400.

Direct RDRAM or Direct Rambus DRAM

The RDRAM memory type was developed by Rambus. The high performance of this memory is achieved by a number of features not found in other types of memory. The initially very high cost of RDRAM memory meant that manufacturers of powerful computers preferred the less powerful but cheaper DDR SDRAM. The memory operating frequencies are 400, 600 and 800 MHz, the full access time is up to 30 ns, and the cycle time is down to 2.5 ns.

A structurally new type of RAM, DDR2 SDRAM, was released in 2004. Based on DDR SDRAM technology, this type of memory, thanks to technical changes, shows higher performance and is intended for use in modern computers. The memory can operate at bus clock frequencies of 200, 266, 333, 337, 400, 533, 575 and 600 MHz. The effective data transmission frequencies are then 400, 533, 667, 675, 800, 1066, 1150 and 1200 MHz, respectively. In addition to the standard frequencies, some memory module manufacturers also produce samples operating at non-standard (intermediate) frequencies, intended for use in overclocked systems where frequency headroom is required. Full access time: 25, 11.25, 9, 7.5 ns or less. Cycle time: from 5 to 1.67 ns.
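
The DDR2 ratings follow the same arithmetic as DDR: the effective transfer rate is twice the bus clock, and a 64-bit module moves 8 bytes per transfer (marketing names such as DDR2-533 round the doubled figure). A quick check for a few of the bus clocks listed above:

    #include <stdio.h>

    int main(void)
    {
        const int bus_mhz[] = { 200, 266, 333, 400 };

        for (int i = 0; i < 4; i++) {
            int    eff = 2 * bus_mhz[i];           /* data on both clock edges */
            double gbs = eff * 1e6 * 8.0 / 1e9;    /* 64-bit (8-byte) module   */
            printf("bus %3d MHz -> ~%d MT/s, %.2f GB/s peak\n",
                   bus_mhz[i], eff, gbs);
        }
        return 0;
    }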

The next type of memory, DDR3 SDRAM, is based on DDR2 SDRAM technology with twice the data transfer frequency on the memory bus. It is distinguished by reduced energy consumption compared to its predecessors. The effective frequency ranges from 800 to 2400 MHz (the frequency record is more than 3000 MHz), which provides greater throughput than all of its predecessors.

DRAM memory designs

Fig. 4. Various DRAM packages. From top to bottom: DIP, SIPP, SIMM (30-pin), SIMM (72-pin), DIMM (168-pin), DIMM (184-pin, DDR)


Fig. 6. DDR2 module in a 204-pin SO-DIMM package

DRAM memory is structurally implemented both in the form of separate microcircuits in packages such as DIP, SOIC, BGA, and in the form of memory modules of the type: SIPP, SIMM, DIMM, RIMM.

Initially, memory chips were produced in DIP packages (for example, the K565RUxx series); later they began to be produced in more technologically advanced packages for use in modules.

Many SIMM modules and the vast majority of DIMMs had an SPD (Serial Presence Detect) chip installed - a small EEPROM that stores the module parameters (capacity, type, operating voltage, number of banks, access time, etc.), which are available programmatically to the equipment in which the module is installed (used for automatic configuration of parameters) as well as to users and manufacturers.
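
The Python sketch below only illustrates the idea of auto-configuration from SPD data; the field names and the clock-count calculation are hypothetical and do not reproduce the real JEDEC SPD byte layout or an actual BIOS algorithm:

```python
# Illustrative only: the fields mirror the parameters the text says SPD stores
# (capacity, type, voltage, number of banks, access time).
import math
from dataclasses import dataclass

@dataclass
class SpdInfo:
    capacity_mb: int        # module capacity
    memory_type: str        # e.g. "SDRAM", "DDR SDRAM"
    voltage_v: float        # operating voltage
    banks: int              # number of internal banks
    access_time_ns: float   # access time reported by the module

def clocks_for_access(spd: SpdInfo, bus_clock_mhz: float) -> int:
    """How many bus clocks a chipset-like configurator would have to wait
    to cover the module's reported access time (simplified)."""
    clock_period_ns = 1000.0 / bus_clock_mhz
    return math.ceil(spd.access_time_ns / clock_period_ns)

module = SpdInfo(capacity_mb=256, memory_type="SDRAM", voltage_v=3.3,
                 banks=4, access_time_ns=7.5)
print(clocks_for_access(module, bus_clock_mhz=100))  # -> 1 clock at a 10 ns period
```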

SIPP modules

SIPP (Single In-line Pin Package) modules are rectangular boards with contacts in the form of a row of small pins. This type of design is practically no longer used, since it was later replaced by SIMM modules.

SIMM modules

SIMM (Single In-line Memory Module) modules are long rectangular boards with a row of contact pads along one of their sides. The modules are fixed in the connector (socket) by latches: the board is inserted at an angle and pressed until it reaches a vertical position. Modules of 4, 8, 16, 32, 64 and 128 MB were produced.

The most common are 30- and 72-pin SIMMs.

DIMMs

DIMM (Dual In-line Memory Module) modules are long rectangular boards with rows of contact pads along both sides; they are installed vertically into the connector and secured at both ends with latches. Memory chips can be placed on one or both sides of the board.

SDRAM memory is most common in the form of 168-pin DIMM modules, DDR SDRAM memory in the form of 184-pin modules, and DDR2, DDR3 and FB-DIMM SDRAM memory in the form of 240-pin modules.

SO-DIMMs

For portable and compact devices (Mini-ITX form factor motherboards, laptops, notebooks, tablets, etc.), as well as printers, network and telecommunications equipment, structurally reduced DRAM modules (both SDRAM and DDR SDRAM) are produced - SO-DIMM (Small Outline DIMM) - analogues of DIMM modules in a compact form factor that saves space.

RIMM modules

RIMM (Rambus In-line Memory Module) modules are less common; they carry RDRAM memory. They come in 168- and 184-pin varieties, and on the motherboard such modules must be installed only in pairs; special plug modules are installed in empty connectors (this is due to the design features of such modules). There are also 242-pin PC1066 RDRAM RIMM 4200 modules, which are not compatible with 184-pin connectors, and a smaller version of RIMM, SO-RIMM, used in portable devices.

In synchronous memory, all processes involved in performing write and read operations are coordinated in time with the clock frequency of the central processor (or system bus), i.e. the memory and the CPU work synchronously, without wait cycles. Information is transmitted in packets over a high-speed synchronized interface.

The SDRAM memory type. Let's look at the main features of synchronous dynamic SDRAM memory.

Composition and purpose of signals. Synchronous memory signals include RAS#, CAS#, WE# and the address lines MA#, which perform the same functions as in asynchronous dynamic memory. In addition to these, signals unique to SDRAM are used. These include:

  • CLK (Clock) – clock pulses acting on the positive edge (0 → 1);
  • CKE (Clock Enable) – enables/disables clocking when CKE = 1/0. The absence of clock pulses reduces memory power consumption. Switching to a reduced power consumption mode is carried out by special commands when CKE = 0. Three modes should be distinguished:

■ low power mode (Power Down Mode), entered by the NOP or INHBT commands. In these modes the memory chip does not accept control commands. The time spent in them is limited by the regeneration period;

■ clock suspend mode (Clock Suspend Mode), in which there is no data transmission and new commands are not accepted. The chip enters this mode when a read or write command is executed with the signal CKE = 0;

■ self-regeneration mode, which the chip enters upon the Self Refresh command. In this mode, regeneration cycles are performed periodically according to an internal timer, with external clocking disabled;

  • CS# (Chip Select) – chip select. At CS# = 0 command decoding is enabled; at CS# = 1 command decoding is disabled, but execution of commands already started continues;
  • BS0, BS1 (Bank Select) or BA0, BA1 (Bank Address) – select the bank to which the command is addressed;
  • the low-order address lines set the column address; the signal A10 = 1 enables the auto precharge mode. In precharge cycles, A10 = 1 turns on precharge for all banks, regardless of the values of the BS0, BS1 signals;
  • DQ (Data Input/Output) – bidirectional data input/output lines;
  • DQM (Data Mask) – data masking. In a read cycle with DQM = 1, the data bus is switched to a high-impedance state (turned off) after two clock cycles. In a write cycle, DQM = 1 prohibits writing of the current data, while DQM = 0 allows writing without delay.

SDRAM chips contain two or more banks as well as column address counters. The advantage of the synchronous SDRAM interface is that, in combination with the internal multi-bank organization, it can provide high memory performance under frequent accesses.

SDRAM memory allows rows to be activated in several banks. Each row is activated by its own ACT command during a transaction with another bank. After a row of the selected bank has been activated for writing or reading, the row can be closed not immediately, but after a series of accesses to its elements. To access the open row of the required bank, the RD (read) and WR (write) commands are used, which specify the column address and bank number. The write/read processes can be organized so that in each clock cycle the data bus carries the next portion of data for a series of accesses to different memory areas. Since such accesses do not require activation commands, they execute faster. Using the chip select signal CS#, rows can be kept open in banks of different chips connected to a common memory bus.
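
A highly simplified model of this multi-bank behaviour is sketched below in Python; the command names ACT, RD and WR follow the text, while the clock-cycle costs are arbitrary illustrative values:

```python
# Simplified model of multi-bank row activation described above.
class SdramBankModel:
    ACT_COST, RW_HIT_COST = 3, 1   # assumed costs, for illustration only

    def __init__(self, banks: int = 4):
        self.open_row = {b: None for b in range(banks)}

    def access(self, bank: int, row: int, col: int) -> int:
        """Return the number of clocks this access costs in the model."""
        cost = 0
        if self.open_row[bank] != row:      # row not open -> ACT needed
            self.open_row[bank] = row
            cost += self.ACT_COST
        return cost + self.RW_HIT_COST      # RD/WR to the already open row

mem = SdramBankModel()
pattern = [(0, 5, 0), (1, 7, 0), (0, 5, 1), (1, 7, 1)]   # alternating banks
print(sum(mem.access(b, r, c) for b, r, c in pattern))   # 4 + 4 + 1 + 1 = 10
```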

Using the counter, it is very easy to implement burst mode. At initialization, the burst length (1, 2, 4 or 8 elements), the order of addresses in the burst (interleaved or linear), and the operating mode (burst mode for all operations or for reads only) can be programmed. With DQM = 1, writing of any element of the burst is blocked in write mode, while in read mode the data buffer is switched to a high-impedance state.
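
The address ordering within a burst can be illustrated with the short Python sketch below; the XOR rule used for the interleaved order is a common SDRAM convention and is stated here as an assumption of the sketch (the burst length is assumed to be a power of two):

```python
# Burst address ordering, as programmed at initialization (see above).
def burst_addresses(start_col: int, burst_len: int, interleaved: bool):
    base = start_col & ~(burst_len - 1)          # aligned start of the block
    offset = start_col & (burst_len - 1)
    for i in range(burst_len):
        if interleaved:
            yield base | (offset ^ i)                        # XOR ordering
        else:
            yield base | ((offset + i) & (burst_len - 1))    # wrap within block

print(list(burst_addresses(start_col=5, burst_len=4, interleaved=False)))
# [5, 6, 7, 4]  (linear order, wrapping within the 4-word block)
print(list(burst_addresses(start_col=5, burst_len=4, interleaved=True)))
# [5, 4, 7, 6]  (interleaved order)
```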

Thanks to the elimination of wait cycles, address interleaving, burst mode and three-stage pipelined addressing, it was possible to reduce the chip's operating cycle time to 8-10 ns (1 / 10 ns = 100 MHz) and increase the data transfer rate to 800 MB/s at a system bus clock frequency of 100 MHz.

The DDR SDRAM memory type (Double Data Rate - doubled data transfer rate). The main feature of DDR memory relative to conventional SDRAM is that data is transferred on both the rising and falling edges of the system bus clock pulses. This makes it possible to perform two transfers per clock interval and double the performance. Transferring data on both edges of the clock pulses places increased demands on the timing of control signals and data. To satisfy them, the following measures were taken: a DQS strobe signal was introduced, two clock signals CLK1 and CLK2 are used, and additional hardware was added. Unlike conventional SDRAM chips, in which write data is transmitted simultaneously with the command, in DDR SDRAM write data is supplied with a delay of one clock cycle (Write Latency). The CAS Latency value can be fractional (CL = 2, 2.5, 3).

At 100 MHz, DDR SDRAM has a peak transfer rate of 200 Mbit/s per pin, which for 8-byte DIMM modules corresponds to a throughput of 1600 MB/s. At a frequency of 133 MHz, the throughput is 2100 MB/s.
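
A quick check of these throughput figures, and of the 800 MB/s figure for ordinary SDRAM mentioned above, assuming a 64-bit (8-byte) module data path:

```python
# Peak module throughput = bus clock x bytes per transfer x transfers per clock.
def peak_mb_per_s(bus_clock_mhz: float, bytes_per_transfer: int,
                  transfers_per_clock: int) -> float:
    return bus_clock_mhz * bytes_per_transfer * transfers_per_clock

print(peak_mb_per_s(100, 8, 1))   # SDRAM at 100 MHz:  800.0 MB/s
print(peak_mb_per_s(100, 8, 2))   # DDR at 100 MHz:   1600.0 MB/s
print(peak_mb_per_s(133, 8, 2))   # DDR at 133 MHz:   ~2128 MB/s (quoted as 2100)
```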

The RDRAM memory type. In 1992, the American company Rambus began developing a new type of memory called RDRAM (Rambus DRAM). The storage core of this memory is built on conventional CMOS dynamic memory cells, but the memory interface differs significantly from the traditional synchronous interface. The high-speed Rambus interface provides data transfer at speeds of up to 600 MB/s over a 1-byte data bus. The effective throughput reaches 480 MB/s, which is 10 times higher than that of EDO DRAM devices. The access time to a series of memory cells is less than 2 ns per byte, and the latency (access time to the first byte of a data array) is 23 ns. When exchanging large amounts of data, Rambus memory is the best option in terms of performance/cost ratio. A further development was the Direct Rambus interface (Direct RDRAM, or simply DRDRAM) with a 16-bit (18-bit for chips with control bits) data bus. RDRAM has been used in high-performance personal computers since 1999 and is supported by system logic chipsets.

The RDRAM memory subsystem consists of a memory controller, a channel and the memory chips themselves (Fig. 10.9).

Compared with other memory types (FPM/EDO and SDRAM), RDRAM has the following distinctive features:

  • it is a narrow-channel data transmission device: the amount of data transferred per clock cycle is only 16 bits, not counting two additional parity bits;
  • thanks to the small number (30) of channel lines and the special measures taken for their layout, the channel clock frequency is raised to 400 MHz, which provides a throughput of 16 × 400 × 2 / 8 = 1600 MB/s (taking into account data transfer on both the rising and falling edges of the clock pulses; a short check of this arithmetic follows this list). To improve performance, dual- and quad-channel RDRAM can be used, which raises the data transfer rate to 3.2 or 6.4 GB/s, respectively. The dual-channel PC800 RDRAM currently in use is the fastest memory type (not far ahead of PC2100 DDR SDRAM);

Fig. 10.9.

  • the cell address is transmitted over separate buses: one for the row address, the other for the column address. Addresses are transmitted in sequential packets. During operation, RDRAM performs pipelined fetching from memory, and an address can be transmitted simultaneously with data;
  • to improve performance, another design solution was adopted: the transmission of control information is separated from the transmission of data on the bus. For this purpose, independent control circuits are provided and two groups of buses are allocated: address buses for the row and column selection commands and an information bus, 2 bytes wide, for data transmission;
  • it consumes little energy. The supply voltage of RIMM memory modules, as well as of RDRAM devices, is only 2.5 V, and the low-voltage signal level varies from 1.0 to 1.8 V, i.e. the voltage swing is 0.8 V. In addition, RDRAM has four reduced power consumption modes and can automatically switch to standby mode at the end of a transaction, which allows further savings in power consumption.
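
A short check of the channel throughput arithmetic from the list above (16-bit channel, 400 MHz, transfers on both clock edges, then scaled to two and four channels):

```python
# Channel throughput in MB/s: width (bits) x clock (MHz) x edges / 8 bits per byte.
def rdram_throughput_mb_s(channels: int, bus_mhz: float = 400,
                          bits: int = 16, edges: int = 2) -> float:
    return channels * bits * bus_mhz * edges / 8

print(rdram_throughput_mb_s(1))   # 1600.0 MB/s per channel
print(rdram_throughput_mb_s(2))   # 3200.0 MB/s (3.2 GB/s, dual channel)
print(rdram_throughput_mb_s(4))   # 6400.0 MB/s (6.4 GB/s, quad channel)
```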

Memory with virtual channels – VC SDRAM. Purpose of the memory. In a modern computer, RAM is accessed by various devices. Some of them (programs running in parallel in a multitasking operating system) reserve certain memory areas for themselves. Devices such as the processor, IDE and SCSI controllers, sound cards, AGP video cards and others access RAM directly. When several devices access memory simultaneously, their service is delayed. To eliminate this drawback, a special memory module architecture was developed that includes 16 independent memory channels. Each device (program) is allocated a separate channel for accessing memory.

Memory architecture. A feature of the Virtual Channel Memory architecture is that 16 channel buffers are placed between the array of memory cells and the external interface of the memory chip (Fig. 10.10). Several buffers can be combined into virtual channels. In terms of signal composition and levels, VC SDRAM (Virtual Channel SDRAM) chips are similar to conventional SDRAM (they have an external organization of 4, 8 or 16 data bits), but they differ in structure, command set and a number of other characteristics. The chip contains two banks (A and B), each made in the form of a square matrix. Each row of the matrix is divided into 4 segments. For a 128 Mbit chip, the matrix size is 8K x 8K, a row is 8K bits, and a segment is 2K bits. The channel buffer capacity is also 2K bits. In one access to the matrix, 2K bits of data are transferred in parallel between one of the buffers and a segment of the selected row. The chips are installed in a 168-pin DIMM module.
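
A small consistency check of this geometry in Python, assuming that the 8K x 8K matrix describes each of the two banks:

```python
# VC SDRAM geometry from the text: two 8K x 8K banks, 4 segments per row,
# channel buffers equal to one segment (2 Kbit).
ROWS, ROW_BITS, BANKS = 8 * 1024, 8 * 1024, 2
SEGMENTS_PER_ROW = 4

segment_bits = ROW_BITS // SEGMENTS_PER_ROW       # size moved per array access
chip_bits = BANKS * ROWS * ROW_BITS               # total chip capacity
print(segment_bits)                               # 2048 bits = one channel buffer
print(chip_bits // (1024 * 1024))                 # 128 (Mbit)
```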

Organization of exchange. Data exchange operations are divided into two phases:

external exchange - data exchange between the information source and a channel buffer. This phase of the exchange is carried out through the memory controller (not shown in Fig. 10.10) and is performed by the read and write commands (READ and WRITE), which specify the channel number and the column address. The exchange takes place in burst mode. The burst length is programmable and can be 1, 2, 4, 8 or 16 transfers (elements). When reading a channel, the first data appears with a delay of 2 clock cycles relative to the read command; subsequent data appears in each clock cycle;

Fig. 10.10.

internal exchange - data exchange between the channels and the array of storage cells. The exchange proceeds in the following sequence:

■ by the PRFA (prefetch) and RSTA (restore) commands, issued immediately after accessing the memory array, rows are automatically deactivated (precharge). To deactivate the selected bank, or both banks at once, special commands can be used;

■ by the ACT command, which specifies the bank (A or B) and the row address, the required row of the matrix is activated;

■ the PRF (Prefetch) and RST (Restore) commands implement reading of the array into a buffer and storing of buffer data into the array. The commands specify the bank number, segment number and channel number.

Both phases of the exchange are carried out by commands from the external interface almost independently of each other. The list of commands used is given in Table 10.1.
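
The two exchange phases can be illustrated with the toy Python model below; the command names follow the text, while the buffer size and array dimensions are scaled down purely for readability:

```python
# Toy model of VC SDRAM: PRF copies a row segment from the cell array into a
# channel buffer, RST writes a buffer back, and READ/WRITE move data between
# a channel buffer and the external interface.
SEGMENT_WORDS = 8                                    # stands in for 2K bits

class VcSdramModel:
    def __init__(self, banks=2, rows=4, segments=4, channels=16):
        self.array = {(b, r, s): [0] * SEGMENT_WORDS
                      for b in range(banks)
                      for r in range(rows)
                      for s in range(segments)}
        self.channel = {c: [0] * SEGMENT_WORDS for c in range(channels)}

    # internal exchange: array <-> channel buffer
    def prf(self, bank, row, segment, channel):
        self.channel[channel] = list(self.array[(bank, row, segment)])

    def rst(self, bank, row, segment, channel):
        self.array[(bank, row, segment)] = list(self.channel[channel])

    # external exchange: channel buffer <-> memory controller
    def write(self, channel, column, data):
        self.channel[channel][column] = data

    def read(self, channel, column):
        return self.channel[channel][column]

vc = VcSdramModel()
vc.write(channel=3, column=0, data=0xAB)      # external phase
vc.rst(bank=0, row=1, segment=2, channel=3)   # internal phase: store the buffer
vc.prf(bank=0, row=1, segment=2, channel=5)   # fetch the same segment elsewhere
print(hex(vc.read(channel=5, column=0)))      # 0xab
```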

VC SDRAM regeneration is performed by periodically issuing REF commands (auto-regeneration according to the internal counter of regenerated row addresses) or in the energy-saving self-regeneration mode, which the chips enter upon the SELF command.

Many modern chipsets support DIMM modules with VCM SDRAM.

It should be noted that the possibility of using one or another type of memory is determined by the motherboard chipset.


Asynchronous dynamic RAM. Asynchronous dynamic RAM chips are controlled by the RAS and CAS signals, and their operation, in principle, is not directly tied to the bus clock pulses. Asynchronous memory is characterized by additional time spent on the interaction between the memory chips and the controller. Thus, in an asynchronous scheme, the RAS signal is generated only after a clock pulse arrives at the controller and is perceived by the memory chip after some time. After this, the memory produces the data, but the controller can read it only on the arrival of the next clock pulse, since it must operate synchronously with the rest of the VM devices. As a result, slight delays arise during the read/write cycle while the memory chips and the controller wait for each other.

DRAM chips. The first dynamic memory chips used the simplest method of data exchange, often called conventional. It allowed a memory row to be read or written only every fifth clock cycle (Fig. 5.11, a). The steps of such a procedure were described previously. Conventional DRAM corresponds to the formula 5-5-5-5. Chips of this type could operate at frequencies up to 40 MHz and, due to their slowness (access time was about 120 ns), did not stay in use for long.

FPM DRAM chips. Dynamic RAM chips implementing FPM mode are also an early type of DRAM. The essence of this mode was described earlier. The read scheme for FPM DRAM (Fig. 5.11, b) is described by the formula 5-3-3-3 (14 clock cycles in total). The use of the fast page access scheme reduced access time to 60 ns, which, taking into account the ability to work at higher bus frequencies, increased memory performance compared to conventional DRAM by approximately 70%. This type of chip was used in personal computers until about 1994.

EDO DRAM chips. The next stage in the development of dynamic RAM was ICs with hyperpage access mode (HPM, Hyper Page Mode), better known as EDO (Extended Data Output - extended data hold time at the output). The main feature of the technology is the increased time that data remains available at the output of the chip compared to FPM DRAM. In FPM DRAM chips, the output data remains valid only while the CAS signal is active, which is why the second and subsequent accesses to a row require three clock cycles: switching CAS to the active state, a data read cycle, and switching CAS to the inactive state. In EDO DRAM, on the active (falling) edge of the CAS signal, the data is latched into an internal register, where it is held for some time after the next active edge of the signal arrives. This allows the latched data to be used when CAS is already inactive (Fig. 5.11, c).

In other words, the timing parameters are improved by eliminating the cycles spent waiting for the data to stabilize at the output of the chip.

The read scheme of EDO DRAM is already 5-2-2-2, which is 20% faster than FPM. Access time is about 30-40 ns. It should be noted that the maximum system bus frequency for EDO DRAM chips must not exceed 66 MHz.

BEDO DRAM chips. EDO technology was improved by VIA Technologies. The new modification of EDO is known as BEDO (Burst EDO). The novelty of the method is that on the first access the entire row of the chip is read, including the consecutive words of the burst. The sequential transfer of words (switching of columns) is tracked automatically by the internal counter of the chip. This eliminates the need to issue an address for every cell in the burst, but requires support from external logic. The method reduces the time for reading the second and subsequent words by one more clock cycle (Fig. 5.11, d), due to which the formula takes the form 5-1-1-1.
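
The four read formulas quoted above can be compared directly for a four-word burst; this sketch simply totals the clock counts and does not account for the different bus frequencies at which the chips actually operated:

```python
# The first figure in each formula is the clocks for the initial access,
# the rest are clocks for each subsequent word of the burst.
formulas = {
    "conventional DRAM": (5, 5, 5, 5),
    "FPM DRAM":          (5, 3, 3, 3),
    "EDO DRAM":          (5, 2, 2, 2),
    "BEDO DRAM":         (5, 1, 1, 1),
}
for name, clocks in formulas.items():
    total = sum(clocks)
    print(f"{name:18s} {'-'.join(map(str, clocks))}  total {total} clocks")
# conventional DRAM 20 clocks, FPM 14, EDO 11, BEDO 8 per four-word burst.
```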

Fig. 5.11. Timing diagrams of various types of asynchronous dynamic memory with a burst length of four words: a - conventional DRAM; b - FPM DRAM; c - EDO DRAM; d - BEDO DRAM

EDRAM chips. A faster version of DRAM was developed by Ramtron's subsidiary Enhanced Memory Systems. The technology is implemented in FPM, EDO and BEDO variants. The chip has a faster core and internal cache memory, the presence of which is the main feature of the technology. The cache memory is static memory (SRAM) with a capacity of 2048 bits. The EDRAM core has 2048 columns, each of which is connected to the internal cache. When any cell is accessed, the entire row (2048 bits) is read at once. The row that has been read is entered into the SRAM, and this transfer of information to the cache memory has virtually no effect on performance, since it occurs in one clock cycle. On subsequent accesses to cells belonging to the same row, the data is taken from the faster cache memory. The next access to the core occurs when a cell is accessed that does not lie in the row stored in the chip's cache memory.
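
A toy Python model of this row-caching behaviour is sketched below; the 10 ns cache access time comes from the text, while the core access time is an assumed illustrative value:

```python
# Any access loads the whole 2048-bit row into the internal SRAM; subsequent
# accesses to the same row are served from that SRAM.
class EdramModel:
    CORE_ACCESS_NS, CACHE_ACCESS_NS = 35.0, 10.0   # core time assumed; ~10 ns SRAM

    def __init__(self):
        self.cached_row = None

    def access_ns(self, row: int, column: int) -> float:
        if row == self.cached_row:
            return self.CACHE_ACCESS_NS            # hit in the 2048-bit SRAM
        self.cached_row = row                      # whole row loaded into SRAM
        return self.CORE_ACCESS_NS

edram = EdramModel()
times = [edram.access_ns(row=7, column=c) for c in range(8)]   # sequential read
print(times)                      # first access hits the core, the rest the cache
print(sum(times) / len(times))    # average tends toward the SRAM access time
```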

The technology is most effective for sequential reads, when the average access time for the chip approaches values characteristic of static memory (about 10 ns). The main difficulty is incompatibility with the controllers used for other types of DRAM.

Synchronous dynamic RAM. In synchronous DRAM, information exchange is synchronized by external clock signals and occurs at strictly defined points in time, which makes it possible to use the full bandwidth of the processor-memory bus and avoid wait cycles. Address and control information is latched in the memory IC, after which the chip responds after a clearly defined number of clock pulses, and the processor can use this time for other actions not related to memory access. In the case of synchronous dynamic memory, instead of the duration of the access cycle one speaks of the minimum permissible period of the clock frequency, and this is already a time on the order of 8-10 ns.

SDRAM chips. The abbreviation SDRAM (Synchronous DRAM) is used for "ordinary" synchronous dynamic RAM chips. The fundamental differences between SDRAM and the asynchronous dynamic RAM discussed above can be reduced to four points:

  • a synchronous method of transferring data to the bus;
  • a pipelined mechanism for burst transfers;
  • the use of several (two or four) internal memory banks;
  • the transfer of part of the memory controller's functions to the logic of the chip itself.

The synchronous nature of the memory allows the memory controller to "know" when the data is ready, which reduces the overhead of wait and data search cycles. Since the data appears at the output of the IC simultaneously with the clock pulses, the interaction of the memory with the other VM devices is simplified.

Unlike BEDO, the pipeline allows burst data to be transferred clock by clock, thanks to which the RAM can operate uninterruptedly at higher frequencies than asynchronous RAM. The advantages of the pipeline are especially important when transferring long bursts that do not exceed the length of the chip's row.

A significant effect is achieved by dividing the entire set of cells into independent internal arrays (banks). This makes it possible to combine access to a cell in one bank with preparation for the next operation in the remaining banks (recharging the control circuits and restoring the information). The ability to keep several memory rows (from different banks) open simultaneously also helps to improve memory performance. When banks are accessed alternately, the access frequency of each of them individually decreases in proportion to the number of banks, and SDRAM can operate at higher frequencies. Thanks to the built-in address counter, SDRAM, like BEDO DRAM, allows reading and writing in burst mode; moreover, in SDRAM the burst length is variable, and in burst mode an entire memory row can be read. The IC can be characterized by the formula 5-1-1-1. Although this formula is the same as for BEDO, the ability to operate at higher frequencies means that SDRAM with two banks at a bus clock frequency of 100 MHz can be almost twice as fast as BEDO memory.

DDR SDRAM chips. An important step in the further development of SDRAM technology was DDR SDRAM (Double Data Rate SDRAM - SDRAM with a doubled data transfer rate). Unlike SDRAM, the new modification outputs data in burst mode on both edges of the clock pulse, due to which the throughput is doubled. There are several DDR SDRAM specifications, depending on the system bus clock frequency: DDR266, DDR333, DDR400, DDR533. Thus, the peak throughput of a DDR333 memory chip is 2.7 GB/s, and that of DDR400 is 3.2 GB/s. DDR SDRAM is currently the most common type of dynamic memory for personal VMs.

RDRAM and DRDRAM chips. The most obvious ways to increase the efficiency of processor-memory interaction are to increase the bus clock frequency or the width of the sample (the number of bits transferred simultaneously). Unfortunately, attempts to combine both options run into significant technical difficulties: as the frequency increases, electromagnetic compatibility problems become more acute, and it becomes harder to ensure that all bits of information sent in parallel arrive at the consumer at the same time. Most synchronous DRAMs (SDRAM, DDR) use a wide sample (64 bits) at a limited bus frequency.

A fundamentally different approach to building DRAM was proposed by Rambus in 1997. It focuses on raising the clock frequency to 400 MHz while reducing the width of the sample to 16 bits. The new memory is known as RDRAM (Rambus DRAM). There are several varieties of this technology: Base, Concurrent and Direct. In all of them, clocking is performed on both edges of the clock signals (as in DDR), due to which the resulting frequency is 500-600, 600-700 and 800 MHz, respectively. The first two varieties are almost identical, while the changes in Direct Rambus technology are quite significant.

First, let's consider the fundamental aspects of RDRAM technology, focusing mainly on the more modern variety - DRDRAM. The main difference from other types of DRAM is the original system of data exchange between the core and the memory controller, based on the so-called "Rambus channel" using an asynchronous block-oriented protocol. At the logical level, information between the controller and the memory is transferred in packets.

There are three types of packets: data packets, row packets and column packets. Row and column packets are used to transmit commands from the memory controller for controlling the rows and columns of the array of storage elements, respectively. These commands replace the conventional chip control system based on the RAS, CAS, WE and CS signals.

The array of storage elements is divided into banks. Their number in a chip with a capacity of 64 Mbit is 8 independent banks or 16 dual banks. In dual banks, a pair of banks shares common read/write amplifiers. The internal core of the chip has a 128-bit data bus, which allows 16 bytes to be transferred for each column address. When writing, a mask can be used in which each bit corresponds to one byte of the packet. Using the mask, it is possible to specify how many bytes of the packet, and which ones, are to be written to memory.
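
A sketch of such a masked write in Python; the packet is 16 bytes and the mask has one bit per byte, as described above, while the polarity of the mask bits is an assumption of the sketch:

```python
# Masked write: each mask bit decides whether the corresponding packet byte
# is written (here, bit set = write; the actual polarity is assumed).
def masked_write(memory: bytearray, offset: int, packet: bytes, mask: int) -> None:
    assert len(packet) == 16
    for i in range(16):
        if mask & (1 << i):
            memory[offset + i] = packet[i]

mem = bytearray(32)
masked_write(mem, 0, bytes(range(16)), mask=0b0000_0000_0000_1111)
print(mem[:8].hex())   # only the first four bytes of the packet are written
```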

The data, row and column lines in the channel are completely independent, so row commands, column commands and data can be transmitted simultaneously, including for different banks of the chip. Column packets contain two fields and are transmitted over five lines. The first field specifies the main write or read operation. The second field contains either an indication of the use of a write mask (the mask itself is transmitted over the data lines) or an extended operation code that defines a variant of the main operation. Row packets are divided into activation, cancellation, regeneration and power mode switching commands. Three lines are allocated for transmitting row packets.

A write operation can immediately follow a read; only a delay equal to the signal propagation time through the channel is needed (from 2.5 to 30 ns, depending on the length of the channel). To equalize the delays in transmitting the individual bits of the transferred code, the conductors on the board must be laid strictly in parallel, have the same length (the length of the lines must not exceed 12 cm) and meet strict requirements defined by the developer.

Write operations in the channel can be pipelined: the first data packet has a latency of 50 ns, and the remaining read/write operations are performed continuously (latency is introduced only when a write operation is followed by a read, or vice versa).

Available publications mention the work of Intel and Rambus on a new version of RDRAM, called nDRAM, which will support data transfer at frequencies up to 1600 MHz.

SLDRAM chips. A potential competitor to RDRAM as a memory architecture standard for future personal VMs is a new type of dynamic RAM developed by the SyncLink Consortium, an association of VM manufacturers, and known by the abbreviation SLDRAM. Unlike RDRAM, whose technology is the property of Rambus and Intel, this standard is open. At the system level the technologies are very similar. Data and commands are transferred between the controller and the SLDRAM memory in packets of 4 or 8 parcels. Commands, address and control signals are sent over a unidirectional 10-bit command bus. Read and write data are carried over a bidirectional 18-bit data bus. Both buses operate at the same frequency. For now this frequency is 200 MHz, which, thanks to the DDR technique, is equivalent to 400 MHz. The next generations of SLDRAM should operate at frequencies of 400 MHz and higher, that is, provide an effective frequency of more than 800 MHz.

Up to 8 memory chips can be connected to one controller. To avoid delays in signals from chips further away from the controller, the timing characteristics for each chip are determined and entered into its control register when the power is turned on.

ESDRAM chips. This is a synchronous version of EDRAM, using the same techniques for reducing access time. A write operation, unlike a read operation, bypasses the cache, which increases ESDRAM performance when reading from a row already held in the cache is resumed. Thanks to the presence of two banks in the chip, idle time caused by preparation for read/write operations is minimized. The drawbacks of this chip are the same as those of EDRAM: a more complex controller, since it must account for the possibility of preparing to read a new core row into the cache memory. In addition, with an arbitrary sequence of addresses the cache memory is used inefficiently.

CDRAM chips. This type of RAM was developed by Mitsubishi, and it can be considered a revised version of ESDRAM, free of some of its imperfections. The capacity of the cache memory and the principle of placing data in it have been changed. The capacity of a single cache block has been reduced to 128 bits, so the 16-Kbit cache can simultaneously hold copies of 128 memory sections, which allows the cache memory to be used more efficiently. Replacement of the first memory section placed in the cache begins only after the last (128th) block has been filled. The means of access have also changed: the chip uses separate address buses for the static cache and the dynamic core. Transfer of data from the dynamic core to the cache memory is combined with the issuing of data to the bus, so frequent but short transfers do not reduce the performance of the IC when reading large volumes of information from memory and put CDRAM on a par with ESDRAM, while for reads at scattered addresses CDRAM clearly wins. It should be noted, however, that these changes have made the memory controller even more complex.
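
The cache organization arithmetic from this paragraph, as a one-line check:

```python
# A 16-Kbit cache split into 128-bit blocks holds copies of 128 memory sections.
CACHE_BITS = 16 * 1024
BLOCK_BITS = 128
print(CACHE_BITS // BLOCK_BITS)   # 128 blocks
```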






