Why the “Image-First” Process Became Obsolete

We pioneered the concept of hardware data recovery (DR) imaging tools back in 2005 with the release of the original DeepSpar Disk Imager (DDI). We were alone on the market until competitors started to appear around 2009, and thus a whole new class of equipment was born. The accepted DR process back then was to fully image unstable drives as the first step and analyze the image afterwards to reconstruct its logical structure. Various hardware tools were sold by different vendors strictly for DR imaging, while various software tools from other vendors were used to analyze images in order to recover files, reassemble RAID arrays, and perform all kinds of other tasks. In this article we will go over the main reasons why this process became obsolete, which is what made us discontinue our DDI product line after selling it for 19 years straight.

Storage devices were quite different back when the “image-first” process made sense: SSDs did not exist yet; HDDs had much smaller capacities; there was a lot less empty/unused storage space; and HDDs were much more robust in general. Trying to go after files directly would often do more harm than good because it forced the read/write heads to do a lot of small fragmented reads. The negative impact of this was usually more significant than the positive impact of not being forced to read the whole drive, considering that the necessary files likely made up a large portion of the drive’s capacity anyway. Minimizing drive stress during DR was also less of a concern overall, because older drives could take a lot of abuse before they failed completely, making it more attractive to follow the same simple “image-first” process for every case, regardless of the specific circumstances.

The various changes in storage technology over the years forced the DR process to fundamentally shift away from full imaging. HDDs dramatically grew their capacity by increasing the density of data storage on their platters. Everything became many times more precise, and the overall complexity of HDD technology had to increase substantially to support higher levels of precision. These changes made HDDs much more fragile: it used to take weeks for an unstable drive to progress to complete failure, and now it only takes days, if not hours. To make matters worse, after modern HDDs do progress to complete failure, they often become entirely unrecoverable by anyone, primarily due to platter damage and/or unsolvable firmware issues.

The situation with SSDs/flash is even worse, once again primarily due to increased data densities. Modern NAND is incredibly dense and fragile, which makes it prone to cascading failure of NAND cells, one after another like dominoes. The SSD keeps up at first, hiding the problem from the user by quietly reallocating bad cells, until it can no longer do so. At that point the SSD starts hanging randomly as it fails to keep up with reallocations and the firmware subsystem starts suffering various internal exceptions. The primary way DR tools handle this is essentially by repowering the SSD every time it hangs and continuing, as sketched below. This works for some time, until the SSD degrades further and eventually becomes entirely unresponsive, often just hours after the initial symptoms. Usually nothing can be done after that point, which is why the SSD recovery rate at even the most advanced DR companies in the world is only around 30%: the main hope is to get to the drive early and handle it quickly, before NAND failure has progressed to the degree where it is unmanageable.
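To make the repower-and-resume idea concrete, here is a minimal sketch in Python under stated assumptions: the ssd object, its read_sector() and power_cycle() methods, the timeout, and the retry limit are all hypothetical stand-ins, since real DR hardware performs these steps at the electrical and protocol level rather than in host software.

```python
import time

READ_TIMEOUT_S = 5       # assumed per-read timeout before the read is treated as a hang
SETTLE_TIME_S = 2        # assumed time for the SSD to re-initialize after power-up
MAX_POWER_CYCLES = 50    # assumed give-up threshold before declaring the drive unresponsive

def read_with_repower(ssd, lba):
    """Try to read one sector, repowering the SSD whenever it hangs."""
    for attempt in range(MAX_POWER_CYCLES):
        data = ssd.read_sector(lba, timeout=READ_TIMEOUT_S)  # hypothetical API
        if data is not None:
            return data
        # The drive hung on this read: cut power, let it re-initialize, and resume.
        ssd.power_cycle()                                    # hypothetical API
        time.sleep(SETTLE_TIME_S)
    return None  # the SSD has likely degraded to the point of being unrecoverable
```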

The dramatically shrinking window of opportunity to get the data off a degraded storage device has made it unwise to go after a full image: in far too many cases the drive suffers complete failure before the image is complete, leaving the user with some random chunk of the drive, such as the operating system, instead of the files that actually matter. This is what forced the market to abandon the old process of taking a full image as the first step and adopt a new process in which only important files are retrieved first, and the rest of the drive is imaged afterward, if that is still possible. This is done essentially by imaging only filesystem metadata at first, parsing it to show the user a file tree, and then imaging only the sectors that belong to the selected files. A sector map is maintained which lets the tool know what it has already imaged, so that every new step seamlessly adds to the same image and nothing ever needs to be read twice.
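To illustrate the sector-map approach, here is a minimal sketch in Python. It is only a simplified model, not any vendor’s actual implementation: the source.read_sector() call, the byte-per-sector map (a real tool would use a compact persistent bitmap), and the way file sectors are supplied are all assumptions made for the example.

```python
SECTOR_SIZE = 512  # assumed sector size for the example

class TargetedImager:
    """Keeps one growing image plus a sector map so nothing is ever read twice."""

    def __init__(self, source, image_file, total_sectors):
        self.source = source                    # unstable drive (hypothetical reader object)
        self.image = image_file                 # destination image opened for random-access writes
        self.imaged = bytearray(total_sectors)  # sector map: 0 = not yet imaged, 1 = imaged

    def image_sectors(self, sector_list):
        """Read only the requested sectors, skipping anything already in the image."""
        for lba in sector_list:
            if self.imaged[lba]:
                continue                                   # already imaged in an earlier pass
            data = self.source.read_sector(lba)            # may fail on a degraded drive
            if data is not None:
                self.image.seek(lba * SECTOR_SIZE)
                self.image.write(data)
                self.imaged[lba] = 1

# Typical flow, each step adding to the same image:
#   1. imager.image_sectors(metadata_sectors)        -> parse and show the file tree
#   2. imager.image_sectors(selected_file_sectors)   -> recover the important files first
#   3. imager.image_sectors(all_remaining_sectors)   -> only if the drive survives long enough
```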

Practically all manufacturers of hardware DR tools implemented this process by developing proprietary software made to work with their specific hardware. The software side would do the logical reconstruction, i.e. parse the filesystem structures retrieved by the hardware, and then guide the hardware to go after the relevant sectors belonging to specific files. This worked well for years, but over time the number of different filesystems, their variants, and types of encryption on the market grew to the point where it became difficult to keep up. This type of software development has absolutely nothing in common with the development of DR hardware, so a separate development team was required whose sole purpose was learning how to parse different logical structures. A factor which greatly complicates this task is the requirement to parse not only healthy but also corrupt filesystem structures, because unstable drives usually suffer corruption as a result of bad sectors.

The reality is that right now in 2024, the best dedicated logical recovery software tools on the market are firmly better at logical reconstruction than the software side of every single hardware DR tool that exists in the world today. This is why every professional DR company uses a range of dedicated software tools for logical reconstruction in addition to their various hardware tools. It’s just not realistic for hardware vendors, for whom software development is a secondary goal, to keep up with software vendors who have been working towards the singular goal of logical reconstruction for multiple decades straight. The dedicated logical reconstruction tools are significantly better at assembling corrupt filesystems, support more filesystem types, reconstruct RAIDs, handle decryption, and so on.

This is why we built the architecture of our USB Stabilizer product line, which replaced the DDI, to be software-agnostic. The ability to transparently use our hardware with absolutely any Windows-based software lets you pick the best logical reconstruction software for each job. You may use one software tool for typical everyday recoveries of common filesystems, a different tool for RAIDs, a third tool for DVR systems, a fourth tool for on-the-fly decryption, and so on. Being able to choose the best software for each case maximizes the chance of successfully focusing the increasingly time-limited recovery effort on specific files, which ultimately improves the recovery rate. Additionally, being able to focus solely on the hardware allowed us to put more development resources towards maximizing key hardware performance metrics, like error handling speed, making our USB Stabilizer 10Gb the fastest read instability handling tool that exists in the world today.
