Current pending errors count is lit yellow. What is S.M.A.R.T


Today, I would like to talk in a little more detail about the SMART technology mentioned in passing in the previous article about the criteria for choosing a hard drive, and also to clarify the issue of the appearance of bad sectors when checking the surface with special programs and the exhaustion of the reserve surface for their reassignment - an issue raised in the previous article.

To begin with, as always, a brief historical excursion. The reliability of a hard drive (and any storage device in the most general case) is always given great importance. And the point is not at all in its cost, but in the value of the information that it takes with it to another world, passing away itself, and in the loss of profit associated with downtime when hard drives fail, if we are talking about business users, even if the information remains. And it’s quite natural that you want to know about such unpleasant moments in advance. Even ordinary reasoning at the everyday level suggests that monitoring the state of the device in operation can suggest such moments. All that remains is to somehow implement this observation in the hard drive.

Engineers at the blue giant (IBM, that is) were the first to think about this problem. In 1995 they proposed a technology that monitors several critical parameters of the drive and attempts, based on the collected data, to predict its failure - Predictive Failure Analysis (PFA). The idea was picked up by Compaq, which a little later created its own technology - IntelliSafe. Seagate, Quantum and Conner also participated in Compaq's development. Their technology likewise monitored a number of disk performance characteristics, compared them with acceptable values, and reported to the host system if there was a danger. This was a huge step forward, if not in increasing the reliability of hard drives, then at least in reducing the risk of losing information when using them. The first attempts were successful and showed the need for further development of the technology. An association of all the major hard drive manufacturers then produced S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology), based on IntelliSafe and PFA (by the way, PFA still exists today as a set of technologies for monitoring and analyzing various subsystems of IBM servers, including the disk subsystem, and monitoring of the latter is based precisely on SMART).

So, SMART is a technology for internally assessing the condition of a disk, and a mechanism for predicting possible failure of a hard disk. It is important to note that the technology, in principle, does not solve emerging problems (the main ones are shown in the figure below); it can only warn about a problem that has already arisen or is expected in the near future.

At the same time, it must also be said that technology is not able to predict absolutely all possible problems and this is logical: failure of electronics as a result of a power surge, damage to heads and surfaces as a result of an impact, etc. no technology can predict. The only predictable problems are those that are associated with the gradual deterioration of any characteristics, the uniform degradation of any components.

Stages of technology development

SMART technology went through three stages in its development. In the first generation, only a small number of parameters were observed. The drive took no independent actions; monitoring was started only by commands sent over the interface. No specification described the standard completely, so there was (and is) no clear statement of which parameters should be monitored. Moreover, defining them and setting the permissible level of their degradation was left entirely to the hard drive manufacturers (which is natural, since the manufacturer knows best what should be monitored on its particular drive, and hard drives differ too much from one another). For this reason the software, written as a rule by third-party companies, was not universal and could erroneously report an upcoming failure (confusion arose because different manufacturers stored the values of different parameters under the same identifier). There were many complaints that a pre-failure state was detected in extremely few cases (such is human nature: you want everything at once, yet it never occurred to anyone to complain about sudden disk failures before SMART was introduced). The situation was further aggravated by the fact that in most cases the minimum requirements for SMART to function were not met (we'll talk about this later). Statistics show that fewer than 20% of failures were predicted. The technology at this stage was far from perfect, but it was a revolutionary step forward.

Not much is known about the second stage of SMART development - SMART II. Basically the same problems were observed as with the first one. Innovations included the ability to perform a background surface check, performed automatically by the disk during downtime, and error logging; the list of controlled parameters was expanded (again, depending on the model and manufacturer). Statistics show that the number of predicted failures has reached 50%.

The current stage is represented by SMART III technology. Let’s look at it in more detail, try to understand in general terms how it works, what is needed in it and why.

We already know that SMART monitors the main characteristics of the drive. These parameters are called attributes. Which parameters are monitored is determined by the manufacturer. Each attribute has a normalized value - Value. It typically ranges from 0 to 100 (although the upper limit can be 200 or 255) and expresses the reliability of that attribute relative to a reference value defined by the manufacturer. A high value indicates that the parameter has not changed or is deteriorating only slowly. A low value indicates rapid degradation or a possible imminent failure, i.e. the higher the Value of the attribute, the better. Some monitoring programs also display Raw or Raw Value - the value of the attribute in the internal format in which it is stored in the drive (a format that also differs between models and manufacturers). For an ordinary user it is not very informative; the Value calculated from it is of greater interest. For each attribute, the manufacturer defines the minimum value at which trouble-free operation of the drive is guaranteed - Threshold. If the attribute's Value falls below the Threshold, a malfunction or complete failure is very likely. It remains only to add that attributes can be critical or non-critical. If a critical parameter goes beyond its Threshold, this effectively means failure; a non-critical parameter going beyond its permissible value indicates a problem, but the disk can remain functional (albeit perhaps with some characteristics degraded: performance, for example).
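The Value/Threshold rule described above can be sketched in a few lines of Python. This is only an illustration: the `Attribute` class, the `assess` function and the sample numbers are hypothetical, and treating a Value equal to the Threshold as a failure is an assumption of the sketch rather than something the text states.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    value: int       # normalized Value, higher is better
    threshold: int   # manufacturer-defined Threshold
    critical: bool   # True if crossing the Threshold effectively means failure


def assess(attr: Attribute) -> str:
    """Return a coarse status for one attribute."""
    # Assumption: Value <= Threshold counts as "threshold crossed".
    if attr.value > attr.threshold:
        return "ok"
    return "failing" if attr.critical else "degraded"


# Hypothetical sample readings.
spin_retry = Attribute("Spin Up Retry Count", value=100, threshold=97, critical=True)
start_stop = Attribute("Start/Stop Count", value=20, threshold=20, critical=False)
print(assess(spin_retry))  # ok
print(assess(start_stop))  # degraded
```

A real monitoring program would take the critical/non-critical flag from the drive's own attribute data instead of hard-coding it.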

The most commonly observed critical characteristics include: Raw Read Error Rate - the frequency of errors when reading data from a disk, the origin of which is determined by the disk hardware.

Spin Up Time - the time needed to spin the disk stack up from rest to operating speed. When the normalized value (Value) is calculated, the measured time is compared with a reference value set at the factory. A Value that is below the maximum but not deteriorating, combined with a maximum Spin Up Retry Count Value (Raw equal to 0), does not indicate anything bad. A difference from the reference time can have a number of causes, for example a problem with the power supply.

Spin Up Retry Count - the number of repeated attempts to spin the disks up to operating speed when the first attempt was unsuccessful. A non-zero Raw value (and therefore a non-maximum Value) indicates problems in the mechanical part of the drive.

Seek Error Rate - the frequency of errors when positioning the head block. A high Raw value indicates problems, which may include damaged servo marks, excessive thermal expansion of the disks, mechanical problems in the positioning unit, etc. A consistently high Value indicates that everything is fine.

Reallocated Sector Count - the number of sector reassignment operations. SMART in modern drives is capable of analyzing a sector for stability “on the fly” and, if it is recognized as faulty, reassigning it. Below we will talk about this in more detail.

Of the non-critical, so to speak, information attributes, the following are usually monitored:

  • Start/Stop Count- total number of spindle starts/stops. The disk motor is guaranteed to withstand only a certain number of on/off cycles, and this number is chosen as the Threshold. The first models of 7200 rpm disks had an unreliable motor that could withstand only a small number of cycles and quickly failed.
  • Power On Hours- the number of hours spent in the switched-on state. The rated mean time between failures (MTBF) is chosen as its threshold value. Given the usually quite incredible MTBF figures, the parameter is unlikely ever to reach the critical threshold. And even if it does, the disk will not necessarily fail.
  • Drive Power Cycle Count- the number of complete on-off cycles of the disk. Using this and the previous attribute, you can estimate, for example, how much the disk was used before purchase.
  • Temperature- simple and clear. The readings of the built-in temperature sensor are stored here. Temperature has a huge impact on disk life (even if it is within acceptable limits).
  • Current Pending Sector Count- the number of sectors that are candidates for replacement is stored here. They have not yet been defined as bad, but reading them differs from reading a stable sector, the so-called suspicious or unstable sectors.
  • Uncorrectable Sector Count- the number of errors when accessing the sector that were not corrected. Possible causes may be mechanical failures or surface damage.
  • UDMA CRC Error Rate- the number of errors that occur when transmitting data via the external interface. May be caused by poor-quality cables or abnormal operating conditions.
  • Write Error Rate- shows the frequency of errors occurring when writing to disk. Can serve as an indicator of the surface quality and mechanics of the drive.
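As a rough illustration of how Power On Hours and Drive Power Cycle Count can be combined to judge prior use, the sketch below computes the average running time per power cycle. It assumes the Raw values are a plain hour count and a plain cycle count, which is not guaranteed for every manufacturer; the function name is hypothetical.

```python
def average_session_hours(power_on_hours: int, power_cycles: int) -> float:
    """Average number of hours the drive stayed on per power cycle."""
    if power_cycles <= 0:
        raise ValueError("power_cycles must be positive")
    return power_on_hours / power_cycles


# Hypothetical readings: 12 hours of operation over 4 power cycles.
print(average_session_hours(12, 4))  # 3.0
```

A drive sold as new would be expected to show only a handful of hours and cycles; large numbers suggest it has already been in service.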

    All errors and parameter changes that occur are recorded in SMART logs. This feature has already appeared in SMART II. All parameters of the logs - purpose, size, their number are determined by the hard drive manufacturer. At the moment, you and I are only interested in the fact of their existence. Without details. The information stored in the logs is used to analyze the condition and make forecasts.

    In outline, the work of SMART is simple: while the drive operates, all errors and suspicious events are monitored and reflected in the corresponding attributes. In addition, starting with SMART II, many drives have self-diagnosis functions. SMART tests can run in two modes: off-line, in which the test effectively runs in the background because the drive remains ready to accept and execute a command at any time, and exclusive mode, in which the test is terminated when a command arrives.

    There are three documented types of self-diagnosis tests: background data collection (Off-line collection), shortened test (Short Self-test), extended test (Extended Self-test). The last two can be executed in both background and exclusive modes. The set of tests included in them is not standardized.

    They can take from seconds to minutes or hours to run. If the disk is not being accessed yet sounds just as it does under load, it is most likely busy with this self-examination. All data collected by such tests is also stored in logs and attributes.

    Oh those bad sectors...

    Now let's return to the issue of bad sectors, where it all began. SMART III has a feature that reassigns bad sectors transparently to the user. The mechanism is quite simple: if a sector reads unstably or with an error, SMART adds it to the list of unstable sectors and increments their counter (Current Pending Sector Count). If the sector is read without problems when accessed again, it is removed from this list. If not, then at the first opportunity, when the disk is not being accessed, the drive will begin checking the surface on its own, starting with the suspicious sectors. If a sector is found to be faulty, it is reassigned to a sector from the reserve surface (and Reallocated Sector Count increases accordingly). Because of this background reassignment, bad sectors are almost never visible on modern hard drives when the surface is checked with service programs. At the same time, if there are many bad sectors, their reassignment cannot continue indefinitely. The first limit is obvious: the size of the reserve surface. This is exactly the case I had in mind. The second is less obvious: modern hard drives have two defect lists, the P-list (Primary, written at the factory) and the G-list (Grown, built up during operation). With a large number of reassignments, there may be no room left in the G-list to record a new one. This situation shows up as a high count of reassigned sectors in SMART. In this case all is not lost, but that is beyond the scope of this article.
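The bookkeeping described above can be modeled with a toy simulation. This is a sketch of the logic only, not real firmware behavior: the `DriveModel` class, its method names and the `is_faulty` callback are all hypothetical, and the G-list limit is folded into a single spare-sector counter for simplicity.

```python
class DriveModel:
    """Toy model of pending-sector handling; not real firmware logic."""

    def __init__(self, spare_sectors: int):
        self.spare_left = spare_sectors  # remaining reserve surface
        self.pending = set()             # Current Pending Sector Count = len(pending)
        self.reallocated = set()         # Reallocated Sector Count = len(reallocated)

    def read(self, lba: int, ok: bool) -> None:
        """Record the outcome of reading sector `lba`."""
        if lba in self.reallocated:
            return                       # already served from the reserve area
        if ok:
            self.pending.discard(lba)    # a clean re-read clears the suspicion
        else:
            self.pending.add(lba)

    def background_check(self, is_faulty) -> None:
        """Idle-time check of the suspicious sectors."""
        for lba in sorted(self.pending):  # sorted() copies, so the set may change
            if not is_faulty(lba):
                self.pending.discard(lba)
            elif self.spare_left > 0:
                self.pending.discard(lba)
                self.reallocated.add(lba)
                self.spare_left -= 1
            # otherwise the reserve is exhausted and the sector stays pending


drive = DriveModel(spare_sectors=1)
for lba in (10, 11, 12):
    drive.read(lba, ok=False)            # three unstable reads
drive.read(12, ok=True)                  # sector 12 recovers on re-read
drive.background_check(lambda lba: True)  # the remaining suspects are truly bad
print(len(drive.pending), len(drive.reallocated), drive.spare_left)  # 1 1 0
```

With only one spare sector, the second bad sector cannot be remapped and stays in the pending list, which is the exhausted-reserve case discussed above.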

    So, using SMART data, even without taking the disk to the workshop, you can tell quite accurately what is happening to it. There are various add-on technologies to SMART that allow you to determine the condition of the disk even more accurately and almost reliably the cause of its failure. We will talk about these technologies in a separate article.

    You need to know that purchasing a drive with SMART is not enough to be aware of all the problems occurring with the drive. The disk, of course, can monitor its condition without outside help, but it will not be able to warn itself in case of approaching danger. We need something that will allow us to issue a warning based on SMART data. (the usual chain is shown in the figure below).

    As an option, a BIOS is possible, which checks the status of SMART drives when booting with the corresponding option enabled. But if you want to constantly monitor the condition of the disk, you need to use some kind of monitoring program. Then you will be able to see the information in a detailed and convenient form.



    SmartMonitor from HDD Speed running under DOS


    SIGuardian running under Windows

    We will also talk about these programs in a separate article. This is exactly what I meant when I said that at first the necessary requirements for operating hard drives with SMART were not met.

    All modern hard drives support S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology), a technology for self-testing, status analysis, and accumulating statistics on the deterioration of the drive's own characteristics. The basics of S.M.A.R.T. were developed in 1995 through the joint efforts of the leading hard drive manufacturers. As drive hardware improved, the technology's capabilities were refined as well: the original SMART standard was followed by SMART II and then SMART III, which will evidently not be the last either.

    During operation, the hard drive constantly monitors certain parameters of its state and reflects them in special characteristics called attributes. These are stored, as a rule, in the service area - a specially allocated part of the disk surface accessible only to the drive's internal firmware. Attribute data can be read by special software.

    Attributes are identified by a numeric ID, and most IDs are interpreted identically by drives of different models. Some attributes are defined by a specific manufacturer and are supported only by certain drive models.

    Attributes consist of several fields, each of which has a specific meaning. Typically, S.M.A.R.T. reader programs give a decoding of attributes in the form:

    1. Attribute- attribute name
    2. ID- attribute identifier
    3. Value- current attribute value
    4. Threshold- minimum threshold value of the attribute
    5. Worst- the lowest attribute value for the entire operating time of the drive
    6. Raw- absolute value of the attribute
    7. Type (optional) - attribute type: performance-related (PR), error rate (ER), events count (EC), or manufacturer-defined / unused (SP - Self-preserve).
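To make the field list above more concrete, here is a minimal sketch that decodes one attribute entry from raw bytes. The 12-byte layout (ID, 2-byte flags, Value, Worst, 6-byte Raw, 1 reserved byte) and the flag bit assignments are assumptions based on how common SMART reader programs interpret the data; they are not given in this article, and manufacturers may deviate from them. In this assumed layout the Threshold is not part of the entry and is read separately.

```python
import struct

# Assumed flag bits for the Type column (PR, ER, EC, SP).
FLAG_NAMES = {
    0x04: "PR",  # performance-related
    0x08: "ER",  # error rate
    0x10: "EC",  # events count
    0x20: "SP",  # self-preserve
}


def parse_entry(entry: bytes) -> dict:
    """Decode one 12-byte attribute entry (assumed layout)."""
    if len(entry) != 12:
        raise ValueError("expected exactly 12 bytes")
    # Little-endian: 1-byte ID, 2-byte flags, 1-byte Value, 1-byte Worst.
    attr_id, flags, value, worst = struct.unpack_from("<BHBB", entry, 0)
    raw = int.from_bytes(entry[5:11], "little")  # 6-byte Raw field
    types = [name for bit, name in FLAG_NAMES.items() if flags & bit]
    return {"id": attr_id, "value": value, "worst": worst, "raw": raw, "type": types}


# Hypothetical entry: ID 5, flags 0x0033, Value 100, Worst 100, Raw 7.
sample = bytes([5, 0x33, 0x00, 100, 100, 7, 0, 0, 0, 0, 0, 0])
print(parse_entry(sample))
```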

    For analyzing the state of the drive, perhaps the most important field is Value - a conditional number (usually from 0 to 100 or up to 253) assigned by the manufacturer. Value is set to its maximum when the drive is manufactured and decreases as the drive's parameters deteriorate.

    For each attribute there is a threshold, held in the Threshold field; the manufacturer guarantees the drive's performance as long as Value stays above it. If Value approaches or falls below Threshold, it is time to replace the drive. The list of attributes and their meanings is not strictly standardized and is determined by the drive manufacturer, but the most important ones are interpreted in the same way.

    For example, the attribute with ID 5 (Reallocated Sector Count) gives the number of disk sectors rejected and reassigned from the reserve area on devices from Seagate as well as from Western Digital, Samsung and Maxtor.

    The hard drive does not have the ability, on its own initiative, to transfer SMART data to the consumer. They are read by special software.

    The BIOS settings of most modern motherboards include an item that enables or disables reading and analysis of SMART attributes during hardware tests before the system boots. With the option enabled, the BIOS hardware test routine reads the values of critical attributes and warns the user if a threshold has been crossed. As a rule, it does so without much detail:

    Primary Master Hard Disk: S.M.A.R.T status BAD!, Backup and Replace.

    The BIOS routine then pauses to attract the user's attention.

    Thus, without installing or launching additional software, it is possible to timely determine the critical state of the drive (when this option is enabled) using the Basic Input-Output System (BIOS).

    Analyzing hard drive S.M.A.R.T. data

    To obtain SMART data in the operating system environment, special programs can be used, in particular, almost all utilities for testing hard drive hardware.

    One of the most popular programs for testing hard drives is Victoria by Sergei Kazansky. On the author's website you will find the latest version of the program, as well as a lot of useful information, including a detailed description of working with Victoria.

    The Victoria program has two versions - for working in a DOS environment and for working in a Windows environment. The DOS version can work directly with the hard disk controller and has significantly greater capabilities compared to the Windows version. The purpose, main features and procedure for using the program could previously be found on the author’s website, but for some time now the site has been abandoned and there is no information there.

    The program is easy to use and lets you evaluate the technical condition of the drive, test it, and adjust some settings - noise level, performance, physical volume. The surface testing modes let you forcibly get rid of bad sectors using several types of remap. The testing menu is opened by pressing F4 (SCAN). The user can set the testing area:

    • Start LBA:0- start of the area (default - 0)
    • End LBA:14680064- end of area (by default - the number of the last disk block)

    Test mode:

    • Linear reading- sequential reading from the initial block to the final one;
    • Random reading- the number of the read block is generated randomly;
    • BUTTERFLY reading- blocks are read, starting from the boundary numbers (beginning and end), to the center of the testing area. Changing the mode is done by pressing the spacebar.
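The three read orders listed above can be sketched as block-number generators. This only illustrates the orderings as the text describes them; the function names are hypothetical, and the block size and exact sequence Victoria uses internally are not specified here.

```python
import random
from typing import Iterator, Optional


def linear(start: int, end: int) -> Iterator[int]:
    """Sequential order from the first block to the last (inclusive)."""
    yield from range(start, end + 1)


def random_order(start: int, end: int, count: int,
                 seed: Optional[int] = None) -> Iterator[int]:
    """`count` randomly chosen block numbers; repeats are possible."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.randint(start, end)


def butterfly(start: int, end: int) -> Iterator[int]:
    """Alternate between the two boundaries, converging on the center."""
    lo, hi = start, end
    while lo < hi:
        yield lo
        yield hi
        lo += 1
        hi -= 1
    if lo == hi:
        yield lo  # middle block of an odd-sized area


print(list(linear(0, 3)))     # [0, 1, 2, 3]
print(list(butterfly(0, 6)))  # [0, 6, 1, 5, 2, 4, 3]
```

The butterfly order forces the heads to travel nearly the full stroke on almost every access, which is why it stresses the positioning mechanics more than linear reading does.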

    Error handling mode. This item lets you hide defective blocks by reassigning (remapping) them to the reserve area. The mode is selected with the spacebar. The selected method for handling defects is displayed in the upper right corner of the screen, under the clock, and also in the bottom line when the test is launched. You can change the mode before or during scanning.

    • Ignore Bad Blocks- the program will not perform any actions when an error is detected.
    • BB = RESTORE DATA- the program will try to recover data from damaged sectors.
    • BB = Classic REMAP- writing to the damaged sector is performed to call the reassignment procedure.
    • BB = Advanced REMAP- improved algorithm for hiding bad blocks, used when the classic remap does not help. The program performs a special sequence of operations so that the drive flags the faulty block as a remap candidate (attribute 197). It then writes to the block 10 times, which the drive firmware handles as normal processing of a remap candidate: if an error occurs, the block is reassigned; if not, the block is considered normal and is removed from the remap candidates. This mode lets you hide bad blocks without losing user data - of course, only when the drive is technically sound and the reserve area has free space for reassignment.
    • BB = Fujitsu Remap- execution of specific algorithms based on the undocumented capabilities of some Fujitsu drive models
    • BB = Erase 256 sect- when a bad sector is detected, a block of 256 sectors is rewritten. User data is not saved.

    While working with the program, you can call up contextual help using the F1 key.

    The Windows version of Victoria has more modest capabilities for configuring the drive and selecting testing modes, and currently has no Russian language support, but it is easier to use, and what it offers is quite sufficient for reading the SMART table and assessing the technical condition of the drive.

    The program does not require installation; just download the latest version (Victoria v4.47) from our website.

    The program must be run under an account with administrator privileges. In a Windows 7/8 environment, you must use the “Run as administrator” context menu.

    To analyze the state of the SMART attributes, select operation through the Windows software interface by turning on the API button in the upper right part of the main window. Then select the drive to check: click the Standard button in the program's main menu and highlight the desired disk in the list window with the mouse.

    The information window will display the drive's passport - model, firmware version, serial number, size, etc. To obtain SMART data, select the menu item SMART and click the "Get SMART" button. The result will be displayed in the program information window.

    Brief description of the attributes (the hexadecimal value of the number is given in parentheses):

    • 001 (1) Raw Read Error Rate- absolute value of reading errors. There are some differences in the formation of the value of this attribute by different manufacturers. From experience, I can say that Seagate drives can have a gigantic RAW value for this attribute while actually being in good condition, while Western Digital drives can have zero values, having critical indicators for other characteristics. Some models may not support this attribute at all.
    • 003 (3) Spin Up Time- Average time to spin up the disk spindle from 0 RPM to operating speed.
    • 004 (4) Start/Stop Count- Number of spindle start/stop cycles.
    • 005 (5) Reallocated Sector Count- Number of reassigned sectors. Modern drives have a fairly large reserve area (thousands of sectors) on the disk surface for use when sectors in the main zone deteriorate. If the drive detects problems writing or reading a sector, it automatically moves the data to the reserve area and marks the sector as “reassigned”. This process is often called “remapping” or “automatic defect reassignment”; it is performed by the drive’s firmware and is invisible to the user (operating system). The raw value field contains the total number of remapped sectors. Even a non-critical but large value can reduce data transfer speed, since the drive must additionally position the heads on the tracks of the spare area, usually located at the end of the disk.
    • 007 (7) Seek Error Rate- Frequency of positioning errors of the magnetic head unit. The drive monitors whether the heads are positioned correctly on the required track. If positioning fails, an error is recorded and the operation is repeated. For this drive, the cause of a large number of errors was overheating.
    • 008 (8) Seek Time Performance- average speed of magnetic head positioning. If the attribute value decreases (positioning slows down), then there is a high probability of problems with the mechanical part of the head drive.
    • 009 (9) Power-On Hours- Number of hours switched on. Reaching the limit value of this attribute means that the drive has reached the time between failures specified by the manufacturer (MTBF - Mean Time Between Failures).
    • 010 (0A) Spin Retry Count- Number of retries to start the spindle. After turning on the power, the drive spins up the disks and controls the achievement of the operating rotation speed for a given device (for example, 5400, 7200, 10000 rpm) within a certain time. In case of failure, the retry counter is increased and the start attempt is repeated.
    • 011 (0B) Recalibration Retries- the number of recalibration attempts, if the first attempt was unsuccessful. If the attribute value increases, then there is a high probability of problems with the mechanical part of the drive. In addition, an increase in the absolute value of this attribute may be caused by the fact that the recalibration procedure is used by the internal firmware of the drive to correct other types of errors.
    • 012 (0C) Device Power Cycle Count- Number of disk on/off cycles.
    • 184 (B8) End-to-End error- This attribute - part of HP SMART IV technology - means that after data is transferred through buffer memory, the parity of the data between the computer controller and the hard drive does not match.
    • 187 (BB) Reported Uncorrectable Error- Characterizes the number of errors that were not corrected by the drive firmware.
    • 188 (BC) Command Timeout- Number of operations interrupted because of an HDD timeout. Normally this value should be zero; a value far above zero most likely points to serious problems with the power supply or to oxidized interface cable contacts.
    • 189 (BD) High Fly Writes- If the head’s flight height above the magnetic surface exceeds the optimal height even for a short time, then the data recorded by it may not be readable in the future. Modern drives use specially developed head height control technology, which makes it possible not to record data at non-optimal heights. One is added to the counter of this attribute, and the recording is performed after setting the normal flight altitude. An elevated value of this attribute may be caused by external shock or vibration, abnormal temperature, or deterioration of the magnetic surface or head.
    • 190 (BE) Airflow Temperature- ambient temperature of the magnetic head unit. For most models this attribute is absent and attribute 194 is used.
    • 191 (BF) G-sense error rate- the number of errors resulting from shock loads. The attribute stores readings from the built-in accelerometer, which records all impacts, jolts, falls, and even careless installation of the disk into the computer case. It usually quite accurately characterizes the operating conditions of laptops - a high value of the attribute indicates sudden shocks and falls during operation of the device.
    • 192 (C0) Power-off retract count- number of shutdown cycles or emergency failures (switching the drive power on/off).
    • 193 (C1) Load/Unload Cycle- the number of cycles of moving the block of magnetic heads into the parking area.
    • 194 (C2) HDA Temperature- temperature of the drive itself (HDA - Hard Disk Assembly). This attribute stores the readings of the built-in temperature sensor, which is usually one of the magnetic heads (usually the bottom one). The attribute fields show the current, minimum and maximum temperature. The Worst field shows the worst temperature reached during the drive's operation (from it you can establish whether the drive overheated and by how much); the raw value is the current temperature. Some drive models may support attribute 205 (CD) Thermal asperity rate (TAR), which records the number of dangerous temperature changes.
    • 195 (C3) Hardware ECC recovered- characterizes the number of read errors corrected by the drive hardware using an error correction code. Such errors do not require re-reading the sector, and do not lead to a loss of data exchange speed, but a large number of them indicate a deterioration in the parameters of the reading path.
    • 196 (C4) Reallocation Event Count- Number of bad sector reassignment events. The raw value field of this attribute stores the total number of attempts to transfer data from unstable sectors to the reserve area. Both successful and unsuccessful attempts are counted.
    • 197 (C5) Current Pending Sector Count- Current number of unstable sectors. The raw value field of this attribute shows the total number of sectors that the drive currently considers candidates for reassignment to the reserve area (remap). If one of these sectors is later read successfully, it is removed from the list of candidates. If reading the sector produces errors, the drive will try to recover the data, transfer it to the reserve area, and mark the sector as remapped.
    • 198 (C6) Uncorrectable Sector Count- Counter of uncorrectable errors, i.e. errors that the drive's internal hardware correction could not fix. They can be caused by the malfunction of individual components or by a lack of free sectors in the spare area when reassignment was needed.
    • 199 (C7) UltraDMA CRC Error Count- Counter of errors that occurred when transferring data in UltraDMA mode. Hardware controls for data transfer from the drive to RAM detected a checksum error. Often this type of error is associated not so much with the drive hardware, but with a faulty interface cable, unstable power supply, overclocking of the PCI bus frequency, overheating of the motherboard chipset chips, etc.
    • 200 (C8) Write Error Rate (Multi-Zone Error Rate)- Characterizes the presence of errors when recording data. May be caused by deterioration of the surface, heads, or write path characteristics. The lower the Value, the more dangerous it is to use such a drive.
    • 220 (DC) Disk Shift- displacement of the disk block relative to the vertical axis of the spindle. It mainly occurs due to a strong impact or fall of the drive and, as a rule, is a signal to replace it.
    • 228 (E4) Power-Off Retract Cycle- Number of automatic parking of magnetic heads when turning off the power.
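
    The relationship between attributes 196, 197 and 198 described above can be illustrated with a small toy model. This is only a sketch of the behavior as the list describes it, not real drive firmware: the class name DriveModel, its fields and the idea of passing the read outcome as a flag are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DriveModel:
    """Toy model of pending-sector handling (hypothetical, not real firmware)."""
    spare_sectors: int                           # free sectors left in the reserve area
    pending: set = field(default_factory=set)    # candidates counted by attribute 197
    remapped: set = field(default_factory=set)   # sectors already moved to the reserve area
    realloc_events: int = 0                      # attribute 196: every remap attempt
    uncorrectable: int = 0                       # attribute 198

    def mark_unstable(self, lba: int) -> None:
        """Put a sector on the list of remap candidates."""
        self.pending.add(lba)

    def read(self, lba: int, ok: bool) -> None:
        """Process a read of a sector that is on the candidate list."""
        if lba not in self.pending:
            return
        if ok:
            # A successful read removes the sector from the candidates.
            self.pending.discard(lba)
            return
        # A failed read triggers a remap attempt; successful and
        # unsuccessful attempts are both counted.
        self.realloc_events += 1
        if self.spare_sectors > 0:
            self.spare_sectors -= 1
            self.pending.discard(lba)
            self.remapped.add(lba)
        else:
            # No spare sectors left: the error cannot be corrected.
            self.uncorrectable += 1


drive = DriveModel(spare_sectors=1)
for sector in (100, 200, 300):
    drive.mark_unstable(sector)
drive.read(100, ok=True)    # recovers on its own
drive.read(200, ok=False)   # remapped, uses the last spare sector
drive.read(300, ok=False)   # remap attempt fails, reserve area is empty
```

    After these three reads, one sector is still pending, two remap attempts have been counted, and one uncorrectable error has been recorded because the reserve area ran out.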

    Modern drives not only generate S.M.A.R.T. attributes but also maintain additional statistics logs and support the SCT (SMART Command Transport) protocol, which provides access to log data. The device statistics log is a read-only SMART log that the drive returns when it receives a READ LOG EXT, READ LOG DMA EXT, or SMART READ LOG command. The logs contain information about the execution of built-in S.M.A.R.T. self-tests, error statistics, the numbers of bad LBA blocks, etc.

    What is S.M.A.R.T.? Why do SMART errors occur and what do they mean? Below we describe in detail the causes of such problems and the ways to eliminate them.

    When S.M.A.R.T. reports errors on a hard drive (HDD or SSD), it is a signal that the drive has developed problems that affect the stability and operation of the computer.

    In addition, such an error is a serious reason to think about the safety of your important data, since a problematic drive can cause you to lose all of your information, which is then almost impossible to restore.

    What is SMART and what does it show?

    "S.M.A.R.T." stands for "Self-Monitoring, Analysis and Reporting Technology".

    Each hard drive connected via a SATA or ATA interface has a built-in S.M.A.R.T. system that performs the following functions:

    • Analyzes the drive.
    • Corrects software problems with the HDD.
    • Scans the surface of the hard drive.
    • Performs software correction, cleaning or replacement of damaged blocks.
    • Rates the vital characteristics of the disk.
    • Keeps reports on all hard drive parameters.

    The S.M.A.R.T. system gives the user detailed information about the physical condition of the hard drive using a scoring method, which can be used to estimate the approximate time of HDD failure. You can explore this system yourself using the Victoria program or similar tools.
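
    As a rough illustration of how a score could be turned into a failure estimate, the sketch below extrapolates an attribute's normalized value linearly toward its threshold. The linear model, the function name hours_until_threshold and the sample numbers are assumptions for illustration only; real drives do not necessarily degrade at a constant rate.

```python
from typing import Optional


def hours_until_threshold(value_then: int, value_now: int,
                          hours_between: float, threshold: int) -> Optional[float]:
    """Linearly extrapolate when a normalized value would reach its threshold.

    Returns 0.0 if the threshold is already reached and None if the value
    has not declined, in which case no estimate is possible.
    """
    if value_now <= threshold:
        return 0.0
    drop = value_then - value_now
    if drop <= 0:
        return None
    # Remaining margin divided by the observed rate of decline.
    return (value_now - threshold) * hours_between / drop


# Hypothetical numbers: the value fell from 100 to 90 over 1000 hours,
# and the manufacturer threshold is 36.
estimate = hours_until_threshold(100, 90, 1000, 36)
```

    With these made-up numbers the estimate is 5400 hours; it should be treated as an order-of-magnitude hint at best.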


    S.M.A.R.T. errors

    As a rule, on a normally operating drive S.M.A.R.T. does not report any errors, even when the scores are low. This is because an error is a signal that the disk may fail soon.

    S.M.A.R.T. errors always indicate some kind of malfunction, or that some components of the disk have almost reached the end of their service life. If you begin to see such messages, you should think about the safety of your data, since it can now disappear at any moment!

    Examples of SMART errors

    Error "SMART failure predicted"


    In this case, S.M.A.R.T. notifies the user about imminent disk failure. Important: if you see such a message on your computer, urgently copy all important information and files to another medium, since this hard drive may become unusable at any time!

    Error "S.M.A.R.T. status BAD"

    This error indicates that some parameters of the hard drive are in poor condition (they have almost reached the end of their service life). As in the first case, you should immediately make a backup of important data.

    Error “the smart hard disk check has detected”

    As with the previous two errors, S.M.A.R.T. is warning about imminent HDD failure.

    Error codes and names may vary between different hard drives, motherboards, or BIOS versions. However, each of them is a signal to back up your files.

    How to fix a SMART error?

    S.M.A.R.T. errors indicate imminent hard drive failure, so attempts to correct them usually do not bring the desired result, and the error remains. Besides critical errors, there are other problems that can cause these types of messages. One such problem is an elevated drive temperature.

    It can be viewed in the Victoria program on the SMART tab under item 190 "Airflow temperature" for an HDD, or under item 194 "Controller temperature" for an SSD.

    If this reading is too high, take measures to cool the system unit:

    • Check that the coolers are working.
    • Clean out the dust.
    • Install an additional cooler for better ventilation.
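
    If you read the temperature attribute programmatically instead of in Victoria, the raw value usually has to be decoded first. The sketch below assumes a layout in which the lowest byte of the raw value holds the current temperature in degrees Celsius; this layout is common but vendor-specific, and both it and the 50 degree limit are assumptions, not values from any specification.

```python
def current_temperature_c(raw_value: int) -> int:
    """Extract the current temperature, assuming it sits in the lowest byte."""
    return raw_value & 0xFF


def is_too_hot(raw_value: int, limit_c: int = 50) -> bool:
    """Compare the decoded temperature with a hypothetical limit."""
    return current_temperature_c(raw_value) > limit_c


# Hypothetical raw value with extra data packed into the higher bytes;
# only the lowest byte (0x23 = 35) is treated as the temperature.
packed_raw = 0x002D00160023
temperature = current_temperature_c(packed_raw)
```

    Check the drive's datasheet for its actual operating temperature range before relying on any fixed limit.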

    Another way to fix SMART errors is to check the drive for errors.

    This can be done by opening the "My Computer" folder, right-clicking the disk or partition, opening its properties, and running the check from the "Tools" tab.

    If the error was not corrected during the check, you should resort to disk defragmentation.

    To do this, while in the disk properties, press the "Optimize" button, select the required drive and press "Optimize".

    If the error does not go away after this, the disk has most likely simply reached the end of its service life. It will soon become unreadable, and the only option left will be to purchase a new HDD or SSD.

    How to disable the SMART check?

    A disk with a S.M.A.R.T. error may fail at any time, but this does not mean that you cannot continue to use it.

    Keep in mind that such a disk should not be used to store any valuable information. With that understood, you can turn off the SMART check in the settings, which will hide the annoying errors.

    To do this:

    Step 1. Enter the BIOS or UEFI setup (press F2 or Delete during boot), go to the "Advanced" section, select the "IDE Configuration" line and press Enter. Use the arrow keys on your keyboard to navigate.


    Step 2. On the screen that opens, find your drive and press Enter (hard drives are labeled “Hard Disc”).

    Step 3. Scroll down the list, select the SMART option, press Enter and choose "Disabled".

    Step 4. Exit the BIOS, saving and applying the settings.

    It is worth noting that on some systems this procedure may be performed slightly differently, but the principle of shutdown itself remains the same.

    After SMART is disabled, the errors will stop appearing and the system will boot normally until the HDD fails completely. In some situations, errors may appear in the OS itself. In that case it is enough to dismiss them several times, after which the "Don't show again" button becomes available.

    What to do if the data was lost?

    In case of accidental formatting, deletion by viruses or loss of any important data, you should quickly return the lost information using the most effective method.

    One such method is the data recovery program RS Partition Recovery. This utility can quickly recover deleted photos, video files, audio tracks, pictures, documents and any other files that disappeared from the drive for various reasons. It has an advanced system for scanning and searching for deleted information, which allows you to find and restore even files that were deleted quite a long time ago. More details about the capabilities and main features of RS Partition Recovery can be found on the official website of the manufacturer.

    There are many free hard drive testing tools that can help you determine what's going on with your hard drive when you suspect there's a problem with it.

    An operating system like Windows already includes tools such as checking the disk for errors and the command chkdsk, but there are other tools below that are available for free from hard drive manufacturers and other developers.

    Important: Depending on the problem found, you may need to replace the hard drive if it fails any of the given hard drive tests. To do this, you need to follow the tips given in the program.

    Seagate SeaTools is a free hard drive testing program available to users in one of two options:

    • SeaTools for DOS supports Seagate and Maxtor drives and works regardless of your operating system, since it runs directly from a CD or USB drive. This makes the program very reliable.
    • SeaTools for Windows is a program that needs to be installed on the Windows operating system. With its help, you can perform basic and advanced testing of any drives from any manufacturers - both internal and external.

    Users of SeaTools Desktop, SeaTools Online, or PowerMax from Maxtor should note that SeaTools replaces all three of these programs. Seagate now owns the Maxtor trademark.

    SeaTools from Seagate are the best in their segment. They are used to check hard drives in professional computer services, but any user can easily use them.

    The Windows version of SeaTools runs on operating systems from Windows 10 to Windows XP.

    HDDScan is a free program for checking all types of drives, regardless of their manufacturer.

    HDDScan includes several tools, including SMART testing and surface inspection.

    The program is very easy to use, does not require installation, supports almost all drive interfaces, and seems to be updated regularly.

    HDDScan can be used on Windows 10, 8, 7, Vista and XP, as well as Windows Server 2003.

    DiskCheckup is a free hard drive checker that works with most drives.

    The program displays SMART information such as the number of read errors, the time it takes the platters to spin up from rest to operating speed, the rate of errors when positioning the magnetic heads, and the temperature. In addition, it can perform a quick and an extended disk scan.

    You can configure the program so that SMART information is sent by email or displayed when the disk parameters cross the threshold values recommended by the manufacturer.

    Hard drives that have a SCSI connection or implement hardware RAID are not supported by DiskCheckup.

    DiskCheckup runs on Windows 10/8/7/Vista/XP and Windows Server 2008/2003 operating systems.

    GSmartControl can perform a variety of hard drive tests, providing detailed results and an overall assessment of the drive's health.

    GSmartControl can perform three self-tests to troubleshoot a drive:

    • Quick check: takes about 2 minutes and is used to identify a seriously damaged hard drive.
    • Extended check: takes about 70 minutes and scans the entire surface of the hard drive to detect failures.
    • Transportation check: takes about 5 minutes and is designed to look for damage that may have occurred while the drive was in transit.
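
    GSmartControl is a graphical front end for the smartctl utility from smartmontools, so the same three self-tests can also be started from a script. The sketch below only builds the command line and does not run it. The mapping of the names used in this article to smartctl's short, long and conveyance test types is an assumption, as are the helper name self_test_command and the device path /dev/sda.

```python
from typing import List

# Assumed mapping from the check names above to smartctl "-t" test types.
SELF_TESTS = {
    "quick": "short",
    "extended": "long",
    "transportation": "conveyance",
}


def self_test_command(device: str, check: str) -> List[str]:
    """Build (but do not run) a smartctl command that starts a self-test."""
    if check not in SELF_TESTS:
        raise ValueError(f"unknown check: {check!r}")
    return ["smartctl", "-t", SELF_TESTS[check], device]


# /dev/sda is a placeholder; substitute the device you want to test.
command = self_test_command("/dev/sda", "extended")
```

    The resulting list can be passed to subprocess.run; starting a self-test normally requires administrator rights.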

    GSmartControl can be downloaded for Windows either as a portable version or as an installable program. It works on system versions from Windows 10 to Windows XP. You can also get a version of the program for Linux and Mac operating systems and programs in LiveCD/LiveUSB format.

    Windows Drive Fitness Test is a free hard drive diagnostic software that can work on most drives available today.

    Unfortunately, Windows Drive Fitness Test cannot test the drive that Windows is installed on. It can only test USB drives and other internal drives.

    WinDFT can be installed on Windows 10, 8, 7, Vista, and XP operating systems.

    Samsung HUTIL is a free utility for diagnosing Samsung hard drives. Sometimes HUTIL is called ES-Tool.

    The Samsung HUTIL program is available as an ISO image for subsequent recording on a CD or USB flash drive. This approach makes HUTIL independent of the operating system and, in general, a more convenient tool for testing than programs developed for the Windows operating system. You can also run HUTIL from a boot diskette.

    Note: The HUTIL program will only check Samsung hard drives. It will boot and detect non-Samsung drives, but no diagnostics can be performed on them.

    Since the Samsung HUTIL program runs from a boot disk, you will need a computer with a working hard drive and operating system to burn it to a CD or USB flash drive.

    The free Western Digital Data Lifeguard Diagnostic (DLGDIAG) program is designed to test only Western Digital branded hard drives.

    Western Digital Data Lifeguard Diagnostic can be downloaded as a portable version for Windows or as an ISO file with an image to burn to a bootable disk, and performs a number of tests on the hard drive. Detailed installation instructions from Western Digital can be found at the link below.

    Note: The DOS version of DLGDIAG diagnoses only Western Digital drives, while the Windows version of this program also works with drives from other manufacturers.

    The Windows version of the program works on operating systems from Windows 10 to Windows XP.

    Bart's Stuff Test

    Bart's Stuff Test is a free program for Windows that performs stress tests on hard drives.

    The program does not provide as many features and does not conduct as thorough testing of the hard drive as other programs on this list.

    All things considered, Bart's Stuff Test is a good addition to your disk testing arsenal, especially if you have difficulty testing with ISO image-based tools and want to use something other than the default tools provided by Windows.

    Bart's Stuff Test, as stated, only works on operating systems from Windows XP to Windows 95. However, we checked its performance on the latest versions of the system (Windows 10 and Windows 8) and did not find any problems.

    Fujitsu Diagnostic Tool is a free hard drive diagnostic tool designed specifically for Fujitsu hard drives.

    The Fujitsu Diagnostic Tool (FJDT) is available in a Windows version and a DOS version using a boot diskette. Unfortunately, the DOS version is focused on using floppy disks - images that will run from CD or USB are not available.

    Fujitsu Diagnostic Tool provides two tests: a “quick test” (lasting about 3 minutes) and a “comprehensive test”, whose execution time depends on the size of the hard drive.

    Note: Fujitsu Diagnostic Tool tests only hard drives manufactured by Fujitsu. If you have a disk from another manufacturer, try the manufacturer-independent programs listed at the beginning of the list.

    The Windows version of the Fujitsu Diagnostic Tool should work on all operating systems, from Windows 10 to Windows 2000.

    HD Tune performs hard drive checks while running Windows. It can work with any internal or external drives, SSD drives or memory cards.

    With HD Tune you can run a performance test and check the health of the drive using Self-Monitoring, Analysis and Reporting Technology (SMART). In addition, the program can scan the disk for errors.

    It supports Windows 7, Vista, XP, and 2000, although HD Tune has been tested to work correctly on Windows 10 and Windows 8.

    The free EASIS Drive Check program, designed to check hard drives, has two built-in utilities: a sector check and a reader for SMART attribute values.

    The SMART attribute check allows you to create a list of more than 40 parameters that describe the operation of the hard drive, and the sector check will check the surface of the media for read errors.

    A report on the execution of any of these tests can be seen directly in the program after its completion. In addition, you can configure the program so that the report is sent by email or printed.

    According to the description, EASIS Drive Check works on operating systems from Windows 2000 to Windows 7, but its performance has also been tested on Windows 8 and 10.

    The error checking program is sometimes called the scandisk program. This is a hard drive scan tool included with the Windows operating system that allows you to search for a variety of errors on your hard drive.

    This tool may also try to fix a number of hard drive-related problems.

    Macrorit Disk Scanner is a very simple program that checks for bad sectors on your hard drive. It is easy to use, installs quickly, and is also available in a portable version.

    The main part of its window is used to visually represent the scanning process and clearly indicate the location of damage.

    One feature that Macrorit Disk Scanner implements especially well is the visual display of how much time is left until the end of the scan, which some hard drive checking programs do not show. In addition, you can select the option to automatically turn off the computer when the scan is completed.

    Operating systems that Macrorit Disk Scanner can run on are: Windows 10, 8, 7, Vista, XP, Windows Home Server, and Windows Server 2012/2008/2003.

    Ariolic Disk Scanner is very similar to Macrorit Disk Scanner in that it performs a read-only scan to find bad disk sectors. The program has a minimal interface with a single button, and it makes it easy to see which parts of the disk contain “bad” sectors.

    The program has only a portable version, and its size is slightly more than 1 MB.

    The only thing that distinguishes this program from Macrorit Disk Scanner is that Ariolic Disk Scanner shows files that have read errors.

    We only tested Ariolic Disk Scanner on Windows 10 and XP, but it should also work on other versions of Windows.

    The safety of our files and data directly depends on the condition of the hard drive on which they are stored. It is important to have a complete understanding of the operation of this device and predict possible failures in time. This will make it possible to transfer important information to backup media. A complete picture of the condition of the mechanical part of the hard drive and the surface of the physical disks is provided by S.M.A.R.T technology.

    The abbreviation S.M.A.R.T. stands, loosely translated, for self-monitoring, analysis and reporting technology. As its name suggests, it monitors the disk, analyzes parameters for signs of an impending failure, and reports on a set of attributes.

    One group of attributes reflects the current state of the disk; the other records the mechanical wear of the device's parts. Each attribute has its own number and a value (Value). The disk stores the attribute's raw data in hexadecimal format (Raw value), and the program converts it into decimal numbers that are easier to read.
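
    The conversion from the hexadecimal raw value to a decimal number is a one-liner. A minimal sketch, assuming the raw value arrives as a text string of hex digits (the helper name raw_to_decimal and the sample strings are made up for illustration):

```python
def raw_to_decimal(raw_hex: str) -> int:
    """Convert a hexadecimal raw value string to a decimal integer."""
    # Remove spaces that some programs insert between bytes.
    return int(raw_hex.replace(" ", ""), 16)


reallocated = raw_to_decimal("00000000001A")   # 0x1A
grouped = raw_to_decimal("00 00 00 00 00 C4")  # same idea, bytes separated by spaces
```

    Here the first value decodes to 26 and the second to 196.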

    To assess the state, each attribute has a threshold value (Threshold) set by the disk manufacturer. If the value falls below the threshold, the hard drive is no longer functioning normally or is malfunctioning outright. The worst value of the attribute (Worst) is very useful for predicting failures: it shows the worst number the parameter has taken over the entire period of disk operation. Additionally, many programs show the attribute value in color (green, yellow, red) or on a scale. Value usually ranges from 0 to 100, but there are attributes with values above 200.
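
    A color indicator of this kind can be derived from Value, Worst and Threshold. The sketch below is one possible rule, not the rule used by any particular program: the function name attribute_status and the 10-point warning margin are assumptions, and an attribute that has reached its threshold is treated as failed.

```python
def attribute_status(value: int, worst: int, threshold: int, margin: int = 10) -> str:
    """Classify an attribute as "green", "yellow" or "red" (hypothetical rule)."""
    if value <= threshold:
        return "red"       # the threshold has been reached
    if worst <= threshold or value - threshold <= margin:
        return "yellow"    # failed in the past, or close to the threshold
    return "green"


healthy = attribute_status(value=100, worst=100, threshold=36)
warning = attribute_status(value=40, worst=40, threshold=36)
failed = attribute_status(value=30, worst=30, threshold=36)
```

    With these sample numbers the three calls give green, yellow and red respectively.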

    There are quite a lot of S.M.A.R.T. attributes, so let's look at the main and most important ones. We will take the set of parameters from the article about the program for checking the hard drive. The picture below shows what the S.M.A.R.T. table looks like.

    It shows the attribute number, its description, the Value, the Worst value, the Raw value in hex format and the Threshold. Next to each attribute is a circle whose color lets you assess the value of the attribute.
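
    The same columns can be read from text output in a script. Since the article does not show the output format of its program, the sketch below uses the row layout printed by "smartctl -A" from smartmontools as a stand-in; the sample rows are illustrative, and the class and function names are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int
    worst: int
    threshold: int
    raw: str       # kept as text, since it may contain extra notes


def parse_attribute_row(line: str) -> SmartAttribute:
    """Parse one attribute row in the "smartctl -A" column layout.

    Columns: ID, name, flag, value, worst, thresh, type, updated,
    when_failed, raw_value (the raw column may contain spaces).
    """
    parts = line.split(None, 9)
    if len(parts) < 10:
        raise ValueError("not an attribute row")
    return SmartAttribute(
        attr_id=int(parts[0]),
        name=parts[1],
        value=int(parts[3]),
        worst=int(parts[4]),
        threshold=int(parts[5]),
        raw=parts[9].strip(),
    )


# Illustrative rows, not taken from a real drive.
ROW_5 = "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0"
ROW_194 = "194 Temperature_Celsius     0x0022   065   055   000    Old_age   Always       -       35 (Min/Max 22/45)"

reallocated = parse_attribute_row(ROW_5)
temperature = parse_attribute_row(ROW_194)
```

    Limiting the split to nine separators keeps a raw column such as "35 (Min/Max 22/45)" in one piece.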

    001 Raw Read Error Rate

    – How often read errors occur because of the drive hardware. The fewer errors, the better.

    003 Spin Up Time

    – How quickly the disk picks up operating speed. Increases with wear.

    004 Start/Stop Count

    – Number of starts and stops of the disk. Not critical.

    005 Reallocation Sector Count

    – An important attribute. The number of unreadable (bad) sectors that have been reassigned to the spare area of the disk.

    The bad sector is replaced by a spare one from the spare area.

    When a bad sector is accessed, the head moves to the reassigned sector, reads the information and returns. The reassignment operation is called a remap. A large number of reassigned sectors indicates a defect in the disk surface and possible imminent data loss.

    007 Seek Error Rate

    – Errors in the positioning of the magnetic heads of the disk. Caused by mechanical or surface wear.

    008 Seek time performance

    – How quickly the heads are positioned. Increases with wear.

    009 Power-On Hours Count

    – Total disk operating time. The Threshold is the operating time to failure from the manufacturer's tests.

    010 Spin Retry Count

    – Counter for the number of retry attempts to spin up the disk to operating speed. If there are many such attempts, a quick failure is inevitable.

    011 Recalibration Retries

    – Counter of repeated recalibration attempts made when the first attempt was unsuccessful. Shows mechanical wear.

    012 Device Power Cycle Count

    – The number of times the disk has been powered on and off. Pure usage statistics.

    013 Soft read error rate

    – Number of software errors during reading. Has nothing to do with mechanics and is not critical.

    183 SATA Downshift Error Count

    – Present on drives manufactured by Samsung and Western Digital. This informational parameter is not critical, but indicates the aging of the disk.

    184 End To End Error Count

    – The disk checks and compares the data that is transmitted and received by the motherboard. The attribute displays the number of comparison errors. Not critical.

    187 Reported Uncorrectable Error

    – Unrecoverable errors. The fewer errors, the better. The value deteriorates with wear.

    188 Reported Command Timeouts

    – The number of commands that timed out. Not critical.

    190 Airflow Temperature

    – Temperature inside the hard drive case. The minimum and maximum values are indicated.

    194 HDA Temperature

    – The readings from the thermal sensor inside the disk enclosure are used to calculate attribute 190.

    195 Hardware ECC Recovered

    – How many error corrections were performed by the disk hardware. An increase in the number warns of a possible failure.

    196 Reallocation Event Count

    – Another important attribute. Counts successful and unsuccessful remap attempts. The reading keeps growing even after the disk's reserve area has been fully used. Critical.

    197 Current Pending Errors Count

    – The number of disk sectors on which operations produce errors. The drive prepares them for possible reassignment (remap). An increase in the number of such sectors signals a possible failure and loss of information.

    198 Uncorrectable Errors Count

    – Number of sector access errors that cannot be corrected. This is critical.

    199 UltraDMA CRC Errors

    – Checksum errors during data transmission. This indicates a faulty cable or oxidized connector contacts rather than a faulty drive.

    200 Write Error Rate

    — Number of disk write errors. Increases with service life.

    201 Soft Read Error Rate

    – How often software errors occur when reading information. Not critical.

    From the described parameters, you can get a complete picture of the condition of the disk surface and the mechanical life.

    If any of the critical parameters reaches the Threshold value, you need to back up your information immediately. If critical attributes fail, recovery of lost data is extremely difficult or often completely impossible.
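
    Because growth in the reassignment and error counters is itself a warning sign, it can be useful to compare readings taken at different times. A minimal sketch, assuming each snapshot is a dictionary that maps an attribute number to its decimal raw value. The choice of attributes 5, 196, 197 and 198 follows the descriptions above; the function name and the sample data are made up.

```python
from typing import Dict, List

# Attributes described above as important or critical.
CRITICAL_IDS = (5, 196, 197, 198)


def growing_critical(previous: Dict[int, int], current: Dict[int, int]) -> List[int]:
    """Return the critical attribute numbers whose raw value has increased."""
    return [
        attr_id
        for attr_id in CRITICAL_IDS
        if current.get(attr_id, 0) > previous.get(attr_id, 0)
    ]


# Hypothetical snapshots taken a week apart.
last_week = {5: 0, 196: 0, 197: 0, 198: 0}
today = {5: 8, 196: 8, 197: 2, 198: 0}

alerts = growing_critical(last_week, today)
if alerts:
    print("Back up your data now; growing attributes:", alerts)
```

    With the sample snapshots, attributes 5, 196 and 197 are reported as growing.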





    
