The Risks Behind Software Recovery Attempts - Part 2

Software tools must work through hardware which is simply not designed to deal with hard drive instabilities. To make matters worse, standard computer hardware is often specifically designed not to deal with damaged storage devices because its main goal is to ensure the system keeps running as a whole. Working with unstable storage devices that are constantly throwing errors is typically not a good way of accomplishing that goal, so past a certain point of instability, system firmware/software (BIOS/OS) will drop the device and refuse to work with it. If the hardware has decided not to work with the drive, then it does not really matter what the software is capable of.

Hardware dropping unstable drives is even more problematic than it seems at first, because it is not just a matter of failing to do the recovery. Software will not always be aware that the hardware has stopped working with the drive, so a drive dropped in this manner could still be powered on and idling while software fruitlessly continues to try to access it. Almost all modern drives have an automatic mechanism that comes into play whenever the drive has been idle for a certain period of time. This mechanism makes the drive scan its own surface in order to find and reallocate bad or weak sectors. It essentially makes the read/write heads scan every accessible sector, and any unhealthy sectors found are then reallocated (i.e. added to the grown defect list). This is the absolute last thing we want an unstable drive to do! The drive will degrade as quickly as if it were being read, but there won't be any data to show for it. To make matters even worse, the defect list can become full and cause a firmware failure on top of the initial problem. Hardware designed for data recovery purposes will never stop working with the drive in this manner, and it will also be able to sense and stop the drive whenever it initiates an unwanted self-scan.

Very often, technicians in IT companies will connect patient drives directly to a computer running an operating system (OS) like Windows or OS X in order to run data recovery software on the drive. If the file system on the drive is supported by the OS it is connected to, the OS will immediately try to mount it to give the user access to the files. The mounting process used by Windows and OS X can by itself be extremely heavy when run on an unstable drive. The exact details of the mounting process vary depending on the OS version, but we can outline how Windows 7 and OS X Yosemite work in this respect.

Windows 7 begins its mounting process by reading the Master Boot Record of the drive 9 times in a row. If successful, it will begin reading the Master File Table (MFT) section of the file system in blocks of 128 sectors while simultaneously sending occasional write commands to update various minor logs. If a drive fails to read a block within the MFT, Windows will automatically try to read the same block again, and again, and again... up to 9 times in a row. If none of those attempts work, Windows will break down the problematic 128-sector block into smaller blocks equivalent to the cluster size being used by the file system (typically 8 sectors or 4 KB). This same 128-sector block, which already failed to read 9 times, will then be attempted 8 sectors at a time. If any of those smaller 8-sector blocks fail to read, Windows will also try them 9 times each. If all of those attempts fail as well, Windows will simply give up, reset the drive, and automatically restart the whole mounting process from scratch. If allowed, Windows will restart the mounting process as many times as it takes until the drive crashes and stops responding altogether. Because the damage in this case is caused by heavy repetition of failed standard ATA read commands, a write blocker does nothing to prevent it.
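To get a feel for how quickly these retries add up, the following Python sketch counts the read commands issued for a single unreadable 128-sector block under the retry scheme described above. It is a simplified model, not a trace of actual Windows behavior: the function name and its parameters are hypothetical, and it assumes the worst case in which every 8-sector sub-block is attempted and fails all of its retries before the mount restarts.

```python
def worst_case_reads(block_sectors=128, cluster_sectors=8, retries=9):
    """Upper bound on read commands and sector reads for one bad block.

    Simplified model of the retry scheme described in the text; assumes
    every cluster-sized sub-block is attempted and fails every retry.
    """
    # Stage 1: the full block is attempted `retries` times.
    full_block_cmds = retries
    full_block_sectors = retries * block_sectors

    # Stage 2: the block is split into cluster-sized pieces,
    # and each piece is attempted `retries` times.
    sub_blocks = block_sectors // cluster_sectors
    sub_block_cmds = sub_blocks * retries
    sub_block_sectors = sub_block_cmds * cluster_sectors

    return (full_block_cmds + sub_block_cmds,
            full_block_sectors + sub_block_sectors)


commands, sectors = worst_case_reads()
print(commands, sectors)  # 153 2304
```

Under these assumptions, one bad block can cost up to 153 failed read commands covering 2,304 sector read attempts in a single mount pass, and the whole sequence repeats every time the mounting process restarts.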

If Windows does successfully mount the drive, it will continue writing to it in order to finish updating its system logs. Every time a file is opened, Windows will also update its attributes to keep track of the last time the file was accessed. Naturally, all of these write commands overwrite old data, causing permanent data loss and additional unnecessary drive degradation.

OS X Yosemite takes a different approach to mounting drives and does not try quite as hard as Windows 7 to achieve a successful mount. It performs many similar read and write operations during the process. However, if it finds a block that it cannot read within a critical section of the file system (for example, within the root directory structure), it will try to read it only 5 times, at which point it will give up and stop trying to mount the drive altogether. Such a drive will of course remain invisible to the operating system, but it will still be powered on and idling, which can be a cause for concern as previously described. If the critical structures all read successfully and the less critical ones (such as catalog entries for specific user files) do not, then OS X will try the less critical blocks up to 10 times before giving up on them and mounting the drive without the data in those blocks. In this case, OS X will see and be able to work with the drive, but some files will be missing. The user will not be notified of this issue in any way.
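The two outcomes described above can be illustrated with a small Python model. This is only a sketch of the policy as stated in the text, not OS X code: the function name, the tuple layout, and the block names are hypothetical, and the model assumes that a readable block succeeds on the first attempt while an unreadable block fails every retry.

```python
CRITICAL_RETRIES = 5       # per the text: critical blocks get 5 attempts
NON_CRITICAL_RETRIES = 10  # per the text: less critical blocks get up to 10


def simulate_mount(blocks):
    """Model the mount outcome for (name, is_critical, readable) tuples.

    Returns (mounted, skipped_names, read_attempts).
    """
    skipped = []
    attempts = 0
    for name, is_critical, readable in blocks:
        if readable:
            attempts += 1  # assumed to succeed on the first try
            continue
        if is_critical:
            # An unreadable critical block aborts the mount entirely.
            attempts += CRITICAL_RETRIES
            return False, skipped, attempts
        # An unreadable non-critical block is silently dropped.
        attempts += NON_CRITICAL_RETRIES
        skipped.append(name)
    return True, skipped, attempts


blocks = [
    ("root directory", True, True),
    ("catalog entry A", False, False),
    ("catalog entry B", False, True),
]
print(simulate_mount(blocks))  # (True, ['catalog entry A'], 12)
```

In this example the drive mounts, but the data behind "catalog entry A" is dropped without any signal to the user, which corresponds to the silent missing-files case described above. Marking the root directory block as unreadable instead returns `(False, [], 5)`: the mount is abandoned after five attempts.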

A little-known feature of Windows is that it automatically 'cleans up' any file system entries that it does not understand on mounted devices. In other words, any partially corrupted MFT entries that Windows happens to come across on the mounted patient drive will be deleted without a single user prompt or notification, once again causing permanent data loss and unnecessary drive degradation. Even partially corrupted MFT entries can still be very useful when parsed using methods designed for the task. Properly designed hardware tools will never give an operating system like Windows or OS X direct access to the patient drive, ensuring that problems like these never happen.
