SAS Drives: New Challenges in Recovery Processes


1 August 2013

As SAS drives become more popular, data recovery professionals face new challenges introduced by these devices. In this post, we cover the most critical aspects of recovering data from SAS drives and how they compare to SATA devices.


SAS Interface

The first thing to note is that even though SAS and SATA are similar at the physical level of the interface, their protocols have nothing in common. Recovery processes built for the ATA protocol cannot accommodate SAS drives. No adapter or converter can connect a SAS drive to your SATA/IDE recovery tools, because the two protocols use entirely different commands, subsystems, and architectural concepts. As a result, different recovery methods apply to SAS and SATA drives.

One of the critical challenges of the SAS protocol is that it maintains a connection state, while the ATA protocol is stateless. This makes the connection to an unstable SAS drive harder to control than the connection to a SATA drive. For example, when a SATA drive powers up, the host can check its status right away by reading its ATA registers through the controller port the drive is connected to. Requesting the status of a SAS drive takes multiple connection steps: going through many PHY negotiation states, discovering the SAS device and its attributes (such as its SAS address), and configuring the SAS link and port to communicate with the discovered device. Only after that can the host send the first SAS command to check the drive’s status. If something goes wrong during this lengthy process, the SAS controller may reject the connection and fail to even discover the device. That is one of the reasons why the choice of SAS controller is usually a crucial step in your recovery processes.
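To make the difference concrete, the bring-up sequence can be modeled as an ordered list of stages, any of which can fail before a single command reaches the drive. The Python sketch below is purely illustrative: the stage names paraphrase the steps listed above, and `bring_up` and the per-stage callables are hypothetical placeholders, not the API of any real SAS controller.

```python
from enum import Enum
from typing import Callable, Dict, Optional


class Stage(Enum):
    """Bring-up steps paraphrased from the text (illustrative names)."""
    PHY_NEGOTIATION = 1
    DEVICE_DISCOVERY = 2
    LINK_AND_PORT_CONFIG = 3
    FIRST_COMMAND = 4


def bring_up(steps: Dict[Stage, Callable[[], bool]]) -> Optional[Stage]:
    """Run each stage in order; return the first stage that fails, or None."""
    for stage in Stage:  # Enum iteration follows definition order
        if not steps[stage]():
            return stage  # later stages are never attempted
    return None


# Hypothetical unstable drive: the PHY negotiates, but discovery fails.
unstable_drive = {
    Stage.PHY_NEGOTIATION: lambda: True,
    Stage.DEVICE_DISCOVERY: lambda: False,
    Stage.LINK_AND_PORT_CONFIG: lambda: True,
    Stage.FIRST_COMMAND: lambda: True,
}

failed = bring_up(unstable_drive)
print("failed at:", failed.name if failed else "nothing")
```

With a stateless interface there would be only one step to fail. Here a fault at any earlier stage means the status command is never sent at all.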


Advantages of SAS Protocol

The good news is that the SAS protocol has some advantages over ATA. The first is that the drive’s microcontroller unit (MCU) processes many commands asynchronously with the drive’s operation. While a SATA drive is processing an initialization or read/write operation, its interface is locked with a busy status, so the drive is not even “listening” to commands. A SAS drive, by contrast, keeps communicating with the host and reports the current status of its operation, such as 'POWER ON OCCURRED', 'LOGICAL UNIT TRANSITIONING TO ANOTHER POWER CONDITION', 'LOGICAL UNIT IS IN PROCESS OF BECOMING READY', 'SELF-TEST IN PROGRESS', etc. This matters because it lets us validate the state of the drive and distinguish “non-responding” situations from “currently processing” ones.
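This distinction can be sketched as a small classifier. The sketch assumes that some transport layer (not shown) has already sent a status request and returned either nothing, on timeout, or a (sense key, ASC, ASCQ) triple. The function name and the returned labels are hypothetical. The numeric codes are the standard SCSI values for two of the statuses named above.

```python
from typing import Optional, Tuple

Sense = Tuple[int, int, int]  # (sense key, ASC, ASCQ)

# Sense key 0x02 is NOT READY; these two codes mean "busy, ask again later".
IN_PROGRESS = {
    (0x02, 0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY",
    (0x02, 0x04, 0x09): "SELF-TEST IN PROGRESS",
}


def classify(reply: Optional[Sense]) -> str:
    """Label a drive's reply to a status request (labels are illustrative)."""
    if reply is None:
        return "not responding"  # the request timed out with no answer
    if reply in IN_PROGRESS:
        return "currently processing: " + IN_PROGRESS[reply]
    if reply[0] == 0x00:  # sense key NO SENSE
        return "ready"
    return "responding with another status"


print(classify(None))
print(classify((0x02, 0x04, 0x01)))
```

A drive that answers with an in-progress status is worth waiting for, while a drive that never answers calls for a different course of action.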

A significant advantage of SAS drives from the data recovery perspective is that SAS/SCSI protocols have an extended error-reporting system providing more information on issues encountered by the drive. This reporting system helps with the diagnostics of the drive by providing much better information on what exactly is happening behind the scenes and how the drive itself handles specific issues while accessing the data. Here are some examples of diagnostic messages: 'FAILURE PREDICTION THRESHOLD EXCEEDED', 'LOGICAL UNIT FAILED SELF-CONFIGURATION', 'MECHANICAL POSITIONING ERROR', 'DEFECT LIST ERROR'; and the following are examples of data access error handling messages: 'READ RETRIES EXHAUSTED', 'RECOVERED DATA WITHOUT ECC - DATA AUTO-REALLOCATED', 'AUTO REALLOCATE FAILED', 'ADDRESS MARK NOT FOUND FOR ID FIELD'.
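As an illustration of how such messages reach the host, here is a minimal Python sketch that decodes a fixed-format SCSI sense buffer into a sense key and an additional sense code pair (ASC/ASCQ). The lookup tables cover only a handful of the messages quoted above, and the function name and sample buffer are made up for the example. A real tool would carry the full code table from the SCSI specification.

```python
SENSE_KEYS = {
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x6: "UNIT ATTENTION",
}

# Partial (ASC, ASCQ) table; only a few of the messages quoted in the text.
ASC_ASCQ = {
    (0x11, 0x01): "READ RETRIES EXHAUSTED",
    (0x15, 0x01): "MECHANICAL POSITIONING ERROR",
    (0x19, 0x00): "DEFECT LIST ERROR",
    (0x5D, 0x00): "FAILURE PREDICTION THRESHOLD EXCEEDED",
}


def decode_fixed_sense(buf: bytes) -> str:
    """Decode fixed-format sense data (response code 70h or 71h)."""
    if len(buf) < 14 or (buf[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = buf[2] & 0x0F            # sense key: low nibble of byte 2
    asc, ascq = buf[12], buf[13]   # additional sense code and qualifier
    key_name = SENSE_KEYS.get(key, f"sense key {key:#x}")
    text = ASC_ASCQ.get((asc, ascq), "unlisted code")
    return f"{key_name}: {text} (ASC/ASCQ {asc:02X}h/{ascq:02X}h)"


# Made-up sample buffer describing a failed read.
sample = bytearray(18)
sample[0] = 0x70                   # current error, fixed format
sample[2] = 0x03                   # MEDIUM ERROR
sample[7] = 0x0A                   # additional sense length
sample[12], sample[13] = 0x11, 0x01
print(decode_fixed_sense(bytes(sample)))
# prints: MEDIUM ERROR: READ RETRIES EXHAUSTED (ASC/ASCQ 11h/01h)
```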

The fact that some of these error-reporting messages can be provided asynchronously with the drive’s operation makes this system even more useful. In contrast, SATA drives can report only a few generic error flags and cannot tell the host how exactly the drive handled an issue.

Another useful characteristic of the SAS protocol for data recovery is that reading sectors while ignoring ECC is supported for the entire capacity of the drive. The ATA protocol limits this functionality to 28-bit addressing (2^28 sectors of 512 bytes, roughly 137 GB). Also, based on our tests, SAS drives achieve much better results when reading without ECC and return more data from corrupted sectors, while most modern SATA drives produce too much noise for this functionality to be effective. The one caveat is that, since some newer SAS drives use ECC data for encryption, reading while ignoring ECC may return encrypted data. Before applying this method, verify that reading while ignoring ECC doesn’t retrieve encrypted data.
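One way to perform this verification, sketched below in Python, is to take a few healthy sectors, read each one both normally and with ECC ignored, and compare the results. This is only a sketch under a loudly stated assumption: that the ECC-ignoring read returns the user data first, followed by extra bytes. The actual layout is drive-specific. The function name and the sample data are hypothetical, and the two reads themselves are assumed to come from your imaging tool.

```python
from typing import Iterable, Tuple


def long_read_matches(samples: Iterable[Tuple[bytes, bytes]]) -> bool:
    """True if every ECC-ignoring read starts with the normally read bytes.

    Each sample is (normal_read, long_read) for the same healthy sector.
    ASSUMPTION: user data comes first in the long read (drive-specific).
    """
    pairs = list(samples)
    if not pairs:
        raise ValueError("need at least one healthy sector to compare")
    return all(long[: len(normal)] == normal for normal, long in pairs)


# Made-up data: one 512-byte sector plus 40 trailing bytes in the long read.
normal = bytes(range(256)) * 2
plain_long = normal + b"\xAA" * 40
scrambled_long = bytes(b ^ 0x5A for b in normal) + b"\xAA" * 40

print(long_read_matches([(normal, plain_long)]))      # prints: True
print(long_read_matches([(normal, scrambled_long)]))  # prints: False
```

If the comparison fails on sectors known to be healthy, data read with ECC ignored from damaged sectors should not be trusted as user data.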


New features – new challenges

Many SAS drives have the ability to format their media with a larger size of the logical sector, such as 520 or 528 bytes, to store some extra “protection information” with each sector. This protection information is used by some RAIDs to store their metadata, which doesn’t usually affect the user data stored in the first 512 bytes of the sector, so from the data recovery perspective these extra bytes can be discarded. However, the problem is that nearly all data recovery imaging tools available on the market do not support mass storage devices with a non-standard logical sector size, therefore leaving data recovery professionals without the ability to handle such drives.
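Discarding the extra bytes is simple once the raw sectors have been read. The Python sketch below assumes a raw image in which every logical sector is stored back to back at its full size (520 bytes by default) with the user data in the first 512 bytes, as described above. The function name and parameters are illustrative.

```python
def strip_protection_bytes(raw: bytes, sector_size: int = 520,
                           data_size: int = 512) -> bytes:
    """Keep the first data_size bytes of every sector_size-byte sector."""
    if sector_size < data_size:
        raise ValueError("sector_size must be at least data_size")
    if len(raw) % sector_size:
        raise ValueError("image length is not a whole number of sectors")
    return b"".join(
        raw[off: off + data_size]            # user data only
        for off in range(0, len(raw), sector_size)
    )


# Two made-up 520-byte sectors: 512 data bytes plus 8 extra bytes each.
raw_image = b"A" * 512 + b"x" * 8 + b"B" * 512 + b"y" * 8
clean = strip_protection_bytes(raw_image)
print(len(clean))  # prints: 1024
```

For 528-byte sectors, pass `sector_size=528`. A production tool would process the image in chunks rather than hold it all in memory.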

Another challenge is that SAS drives tend to self-destruct faster than SATA drives after read-write head damage. This can be attributed to the fact that, on average, SAS drives have a higher rotation speed. In many cases, if the drive has substantial head or media degradation, intensive read operations may cause a physical head crash and complete destruction of the magnetic layer on the disk platters, leading to permanent loss of all data on the drive. Imaging SAS drives therefore requires extra caution to decrease the stress on the drive as much as possible and minimize the risk of self-destruction. As we mentioned in our previous articles, this can be achieved by imaging selectively by files and heads, and by minimizing access to problematic areas of the drive during the initial imaging pass(es) by decreasing the read sector timeout value as much as the drive permits.
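The idea of a short timeout on the first pass can be sketched as a multi-pass loop. In the Python sketch below, `read_block` stands for whatever read primitive your imaging tool provides. It is a hypothetical callable that returns the data, or `None` when the read fails or exceeds the timeout. The fake reader and the timeout values are made up for demonstration.

```python
from typing import Callable, Dict, List, Optional

Reader = Callable[[int, float], Optional[bytes]]  # (lba, timeout_s) -> data


def image_in_passes(read_block: Reader, total_blocks: int,
                    timeouts_s: List[float]) -> Dict[int, bytes]:
    """Read all blocks with the first timeout, then retry only the failures
    with each following (typically longer) timeout."""
    image: Dict[int, bytes] = {}
    pending = list(range(total_blocks))
    for timeout in timeouts_s:
        still_bad = []
        for lba in pending:
            data = read_block(lba, timeout)
            if data is None:
                still_bad.append(lba)   # skip quickly, revisit in a later pass
            else:
                image[lba] = data
        pending = still_bad
        if not pending:
            break
    return image


def fake_reader(lba: int, timeout_s: float) -> Optional[bytes]:
    """Simulated drive: block 2 is slow, block 5 is unreadable."""
    needed = {2: 1.0, 5: float("inf")}.get(lba, 0.05)
    return bytes([lba]) * 512 if timeout_s >= needed else None


image = image_in_passes(fake_reader, 8, [0.1, 2.0])
print(sorted(image))  # prints: [0, 1, 2, 3, 4, 6, 7]
```

The healthy areas are captured first with minimal stress on the drive, and the slow or damaged areas are touched only after the easily readable data is safe.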

For the same reason, it is usually necessary to use some kind of cooling stand while the drive is being imaged. At a minimum, place a fan directly over the drive so that it doesn’t overheat. As a rule, SAS drives generate a great deal of heat and require a dedicated server cooling environment even when operating under normal conditions. Degradation processes often exacerbate this issue, increasing cooling requirements further. As a side effect, higher operating temperatures may also affect the read-write channel of the drive, increasing noise, the rate of degradation, the number of read instabilities, and the number of bad sectors. In some cases, should the degradation processes escalate, the drive needs to be powered off and cooled down before imaging continues in order to reduce the risk of complete drive failure.


Good news!

As the number one provider of professional data recovery imaging solutions for the data recovery industry, DeepSpar once again proved its market-leading position by releasing the first dedicated SAS hardware imager on the market. The DeepSpar SAS Imager is a PCIe hardware and software kit that addresses all of the challenges of recovery processes for SAS drives mentioned in this post, achieves imaging speeds limited only by the speed of the connected drives, implements support for imaging by selective heads, and has no limitations in accessing drives of any capacity and size of the logical sector.