Get It Right the First Time! Diagnosing Failed Hard Drives


20 November 2012

Diagnostics is a crucial step of the data recovery process and is usually done by the most experienced person in the data recovery operation. We believe that up to 50% of the success of each data recovery case depends on a correct diagnosis, for two reasons:

  • If you miss an actual issue, you will never be able to get access to data.
  • Each time you try to solve a wrong issue, you incur a high risk of ending up with an unrecoverable case.

In “PDR Survey: An Inside Look at the Causes of Hard Drive Failure”, we reported that, according to a study we conducted with some of our largest clients (55 data recovery companies in 15 different countries), over 13% of unrecoverable cases are caused by previous recovery attempts.

Let’s look at the most important factors to consider when diagnosing a hard drive. Some of these factors may seem obvious; however, as far as diagnostics is concerned, we have found that it is quite common to miss obvious things and end up applying much more complex and time-consuming techniques while trying to identify the cause of a hard drive failure.


Evaluating the physical condition of the drive

The first thing you should look at is the physical condition of the drive.

Check whether the drive’s board has any of the following characteristics:

  • Burned or grayed-out components
  • Burnt smell
  • Signs of the drive being underwater, in smoke, or in dust
  • Connectors between the board and the head disk assembly with oxidized contacts

Shake the drive slightly to hear whether the head disk assembly has any significant internal damage, such as broken or fallen-off parts of the head assembly or air filters.

Clearly, if you notice any burned-out or damaged components or hear any damage inside the head disk assembly, proceed with replacing those parts prior to powering up the drive.

If you see oxidized contacts, try cleaning them first. In fact, we recommend cleaning the contacts in any case, because oxidized contacts are one of the common causes of drive failure. They can even cause various drive instability symptoms, such as bad sectors.

If the physical condition of the drive seems fine, connect it to a power supply unit to check its booting symptoms before connecting the drive to any device. Some failed drives can cause a host computer or any other device connected to the drive to fail because of a short circuit between the power and data signals on the drive’s board. This is why we usually recommend having a good-quality external power supply unit that is dedicated to drive diagnostics.
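The pre-power-up checks above can be summarized as a short checklist routine. The following Python sketch is only an illustration of the order of operations described in this section; the `Inspection` fields and the function name `pre_power_actions` are hypothetical names chosen for this example and do not come from any data recovery tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Inspection:
    """Findings from the visual check and the shake test (hypothetical fields)."""
    damaged_components: bool = False     # burned or grayed-out parts on the board
    internal_damage_heard: bool = False  # loose parts heard inside the head disk assembly
    oxidized_contacts: bool = False      # oxidation on the board-to-HDA connectors


def pre_power_actions(found: Inspection) -> List[str]:
    """Return the steps to take, in order, before any host device is connected."""
    actions = []
    if found.damaged_components or found.internal_damage_heard:
        # Damaged parts are replaced before the drive is ever powered.
        actions.append("replace damaged parts before powering up")
    if found.oxidized_contacts:
        actions.append("clean oxidized contacts")
    else:
        # Cleaning is recommended even when no oxidation is visible.
        actions.append("clean contacts anyway")
    # The first power-up uses a dedicated supply, with no host attached.
    actions.append("power up on a dedicated external PSU only")
    return actions
```

For example, `pre_power_actions(Inspection(damaged_components=True))` returns three steps, starting with the part replacement and ending with the power-up on the dedicated supply.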

The following diagnostic steps depend on what symptoms you observe when you power up the drive.


The drive makes an unusual sound

If the drive has failed or degraded heads, it usually makes a knocking or clicking sound. The heads assembly makes this sound as it goes back and forth from the parking zone to the disk platters. The drive cannot locate any track at all after it pushes the head assembly onto the disk platters. The only thing left is for the drive to reset itself, which moves the heads assembly back to the parking zone and restarts the booting process over and over again.

Heads on some drives may not read properly because of debris sticking to a read-write element of the heads. If this is the case, the drive behaves exactly the same way as one with failed heads. This issue can be diagnosed by inspecting the read-write elements of the heads under a microscope. In such cases, the heads usually just need to be cleaned to regain access to the disk platters.

Less often, the drive may also make a clicking sound due to a failure of its board. In general, the clicking indicates a lost or corrupted signal anywhere within the read-write channel of the drive, that is, anywhere from the read-write heads up to the Micro Controller Unit (MCU).

An interesting fact is that many modern drives make exactly the same clicking sound if you try to use a board from another drive (even a board from exactly the same drive model). This happens because the configuration/adaptive parameters (called the adaptives in the data recovery industry) burned into the ROM chip are unique to every single drive.

This phenomenon is why we recommend, whenever possible, that you try to verify that the drive has an original board. You can track any previous recovery attempts with the client or, for some drives, use DeepSpar Disk Imager or PC-3000 to verify whether the board is original or not. You can also check whether the board looks like an original one, that is, that it looks like the board from that particular drive model.

This verification process is worth doing because it’s not unusual for a client to bring a drive that has a non-original board on it, since someone else has already tried swapping a board on this drive (for example, using boards from other dead drives) and left the last board they tried on the drive.

Some drives click because of a failure of one single head. It is important to determine whether the entire heads assembly has failed or whether it’s a failure of only one or a few read-write heads, because the recovery process in these cases could be very different. In some cases, when the drive has just one failed or degraded head, it is still possible to recover data without replacing the head assembly.

You can use tools that can disable specific read-write heads in the MCU’s RAM to diagnose which particular head is causing the drive to click and then use head-by-head imaging to recover data from the good heads. Since modern drives have up to 10 heads, it is still possible to recover many files in cases where the data from only one head is not accessible.
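As a rough illustration of the arithmetic behind head-by-head imaging, the sketch below takes a map of heads to their test result and returns the heads worth imaging, plus the approximate share of sectors they cover. It assumes, purely for illustration, that every head serves an equal share of the data, which real drives only approximate. The function name `plan_head_by_head` is hypothetical.

```python
from typing import Dict, List, Tuple


def plan_head_by_head(head_ok: Dict[int, bool]) -> Tuple[List[int], float]:
    """Return (heads to image, approximate share of sectors they cover).

    Assumption: each head serves an equal share of the drive's sectors.
    """
    if not head_ok:
        raise ValueError("head map is empty")
    good = sorted(head for head, ok in head_ok.items() if ok)
    share = len(good) / len(head_ok)
    return good, share


# A 10-head drive where head 3 has failed.
heads, share = plan_head_by_head({h: h != 3 for h in range(10)})
print(heads)  # [0, 1, 2, 4, 5, 6, 7, 8, 9]
print(share)  # 0.9
```

Note that the share applies to sectors, not files: a file is fully intact only if none of its sectors sits under the failed head, so the share of fully recoverable files is usually lower.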

Lastly, in rarer cases, the clicking noise can be caused by a firmware failure in the System Area (on the disk platters). Even though firmware failures are properly diagnosed using drive-level firmware tools, such as PC-3000, you can still determine whether the read-write channel (board/heads) is causing the clicking using the following method:

Check whether the clicking noise is a raw or plain knocking sound or whether it is a more complex noise that involves multiple repositioning of the heads assembly. If you hear or see the drive trying to reposition the heads assembly and stopping for a moment on any particular track while it’s clicking, then it’s not a failure of the read-write channel. That leaves either a firmware failure or an issue with a single head only. If the drive has a read-write channel issue, it cannot even locate and position/reposition the heads to any particular track, since locating a track requires the drive to read servo-data from the disk platters and so its read-channel has to work to accomplish this.
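The reasoning in this section can be condensed into a small decision routine. The Python sketch below is a simplification for illustration only; the function name, its parameters and the returned labels are hypothetical, and a real diagnosis would weigh more evidence than two observations.

```python
from typing import List


def diagnose_clicking(seeks_and_settles: bool, board_is_original: bool = True) -> List[str]:
    """Return the candidate causes to check first for a clicking drive.

    seeks_and_settles: the heads reposition and pause on a track between clicks.
    board_is_original: the board is known to belong to this exact drive.
    """
    if not board_is_original:
        # A foreign board lacks this drive's adaptives, which alone causes clicking.
        return ["non-original board (adaptives mismatch)"]
    if seeks_and_settles:
        # Settling on a track means servo data was read, so the read channel works.
        return ["single failed or degraded head", "firmware failure in the System Area"]
    # Plain knocking: the drive cannot read servo data anywhere.
    return ["failed heads assembly", "debris on read-write elements", "board failure"]
```

A call such as `diagnose_clicking(seeks_and_settles=True)` narrows the search to a single-head issue or a firmware failure, matching the rule that a working read channel is required to position the heads on a track.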


The drive doesn’t make any sound

If the drive doesn’t produce any sound at all, it can have either an electronic or a mechanical issue.

One of the most common electronic issues for drives with these symptoms is a failure of one of the components in the power circuitry, such as a burnt fuse or shorted TVS-diode. Modern drives usually have a power protection component (a fuse or TVS-diode) to protect the drive from overvoltage. When the host computer gets a significant surge on its 5V/12V power lines, that protection circuitry triggers and effectively disconnects the incoming power from the rest of the components on the board. Quite often, it is enough to repair the fuse (or to remove a TVS-diode) on the board to restore the functionality of the drive.

Less often, the board may also have other failures, such as a failed motor controller chipset or a failed MCU.

Another common failure of drives that don’t make any sound at all (or make a little buzzing sound) is a mechanical issue, such as a seized motor or the read-write heads sticking to the disk platters. To diagnose these issues, open the head disk assembly and check whether the heads are located at the parking zone. Then try to manually rotate the disk assembly to verify that the motor is not seized.

An easier method to identify such mechanical issues without opening the head disk assembly is to use the Current Monitor Add-on for DeepSpar Disk Imager. In fact, you can also use this Add-on to diagnose the electronic failures mentioned above, such as a burnt fuse or power converter chip, a failed motor controller, shorted motor windings, and so on. You can read more information on such diagnostics methods here.

Another method to diagnose some of the electronic failures is to use the SATA Native Functions that your data recovery equipment may implement. These include the ability to determine whether the drive has issues with its power circuitry (again, such as a burned fuse), a corrupted ROM, a failed MCU, or even a bad or noisy SATA connector or cable. We covered this subject in detail in our post about SATA Native Functions.

In some (rather rare) cases, the drive may identify failed heads during its initial power-on diagnostics executed by the MCU. If this happens, the drive will not spin up. To determine whether this is the case, place a piece of dielectric material, such as a sheet of paper, between the board and the head disk assembly so that it disconnects the read-write heads from the board. Then power up the drive. Most drives will not be able to identify a failure of the heads in this case and therefore the drive will spin up and you will know that it has failed heads.
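The checks for a silent drive can be ordered roughly from least to most invasive, as the sketch below shows. It is an illustrative simplification under stated assumptions: the function name, the parameter names and the order of the checks are hypothetical choices for this example, and each flag stands for an observation made as described in this section.

```python
def diagnose_silent_drive(
    protection_blown: bool = False,
    heads_stuck_on_platters: bool = False,
    motor_seized: bool = False,
    spins_with_heads_isolated: bool = False,
) -> str:
    """Return the most likely cause for a drive that makes no sound at power-up."""
    if protection_blown:
        # Burnt fuse or shorted TVS-diode cuts power to the rest of the board.
        return "power circuitry: repair the fuse or remove the shorted TVS-diode"
    if heads_stuck_on_platters:
        # Seen after opening the HDA: heads are not in the parking zone.
        return "mechanical: heads stuck to the platters"
    if motor_seized:
        # The disk assembly does not rotate when turned by hand.
        return "mechanical: seized motor"
    if spins_with_heads_isolated:
        # With paper between board and HDA the self-test cannot see the heads.
        return "failed heads detected by the power-on diagnostics"
    # Nothing above applies: suspect the less common board failures.
    return "other board failure, such as the motor controller or MCU"
```

For instance, `diagnose_silent_drive(spins_with_heads_isolated=True)` corresponds to the paper test: the drive spins up only once the heads are disconnected, which points to failed heads.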


The drive spins up and doesn’t make any unusual sound

If the drive spins up and sounds fine, connect it to a data recovery tool for further diagnostics.

Note: All of the diagnostics described above assume that the drive is connected to a power supply unit only.

In most cases, drives that spin up and don’t make a clicking sound have either read-instability issues or a firmware failure. We are, of course, talking about problematic drives here: drives that do not give access to user data for whatever reason. Only in rare cases do drives with these symptoms have a failure of the read-write heads.

To diagnose these drives, you need a data recovery tool such as DeepSpar Disk Imager. System software (BIOS and OS) may not even recognize the drive if it has non-critical read-instability issues, such as responding with errors to some inessential configuration commands, staying busy for too long, responding to requests too slowly, or showing other inconsistencies in its communication with the host.

You cannot accurately diagnose such a drive through the system software alone. You can only guess whether it’s a critical issue requiring drive-level firmware or mechanical repair procedures, or whether it’s a non-critical read-instability issue where you can proceed to image the drive with a data recovery imager.

Data recovery equipment usually has a number of functions designed specifically for diagnostics. For example, the most useful functions of DeepSpar Disk Imager are Express Diagnostics, used to identify any issues in communication with the drive, and the Media Test function, which can give you an overview of the media issues (if any) and also identify any bad or degraded heads.

If the drive has a firmware issue, you must use drive-level firmware tools to diagnose it. However, in many cases, you can just listen carefully for the drive’s recalibration sound. See whether the drive completes its recalibration or aborts it due to some kind of firmware exception or corruption. The easiest way to verify this result is to take a good drive of the same model/family and compare its recalibration sound with the sound made by your bad drive. If the bad drive has an incomplete recalibration, it most likely has a firmware issue.


P.S.

In this blog post, we have covered the most common symptoms and the corresponding methods used by data recovery labs for hard drive diagnostics. This subject is rather broad and, if you look at it more deeply, requires very different areas of expertise: mechanical design, hardware engineering, firmware architecture, knowledge of file system structures, and more.

Because of this fact, as a rule, data recovery (and hard drive diagnostics in particular) is quite a challenging area and definitely requires professional tools.

However, we hope you also noted here that, as with any technology, you can always find some straightforward techniques that don’t require expensive tools and rely on a general knowledge of the subject and common sense.