Recovering Problematic PCIe SSDs

It is now clear that PCI Express (PCIe) is the interface of the future for solid state drives (SSDs). Samsung, the leading SSD manufacturer, announced at the Samsung SSD Global Summit 2016 that it projects 111 million PCIe SSDs to ship in 2018, versus only 25 million SATA SSDs. In effect, SATA SSDs will start to disappear from the market as early as next year.

Why PCIe?

SATA was designed for spinning hard drives and relies on an external host bus adapter (HBA). In a typical computer, the OS sends the HBA, over PCIe, a Frame Information Structure (FIS) packet containing the ATA command it wants to execute. The HBA then packs that command into a serial SATA packet and sends it over the physical SATA channel. Finally, the SATA device receives that serial packet, unpacks it and executes the ATA command. This extra layer of communication between the HBA and the SATA device adds latency and bandwidth overhead that is unacceptably high for modern flash memory speeds.
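To make the "FIS containing an ATA command" step concrete, here is a minimal Python sketch that packs the 20-byte Register Host-to-Device FIS used to carry an ATA command, following the field layout of the SATA/AHCI specifications. The function name is hypothetical, and the sketch only builds the bytes; in an AHCI system the host actually places this FIS in a command table in memory, from which the HBA fetches it.

```python
import struct

FIS_TYPE_REG_H2D = 0x27  # Register FIS, host to device
ATA_READ_DMA_EXT = 0x25  # 48-bit LBA READ DMA EXT command


def build_reg_h2d_fis(command: int, lba: int, count: int, features: int = 0) -> bytes:
    """Pack a 20-byte Register Host-to-Device FIS carrying one ATA command."""
    if not 0 <= lba < (1 << 48):
        raise ValueError("LBA must fit in 48 bits")
    if not 0 <= count <= 0xFFFF:
        raise ValueError("sector count must fit in 16 bits")
    lba_bytes = lba.to_bytes(6, "little")
    return struct.pack(
        "<BBBB3sB3sBHBB4x",
        FIS_TYPE_REG_H2D,
        0x80,                    # bit 7 (C) = 1: command register update; PM port 0
        command,                 # ATA command opcode
        features & 0xFF,         # features 7:0
        lba_bytes[0:3],          # LBA 23:0
        0x40,                    # device register: bit 6 selects LBA addressing
        lba_bytes[3:6],          # LBA 47:24
        (features >> 8) & 0xFF,  # features 15:8
        count,                   # sector count 15:0
        0,                       # ICC
        0,                       # device control
    )                            # 4x: four reserved bytes pad the FIS to 20 bytes


fis = build_reg_h2d_fis(ATA_READ_DMA_EXT, lba=0x123456789A, count=8)
print(len(fis), fis.hex())
```

Everything past this structure (serialization, link-layer framing, the physical SATA channel) is handled by the HBA and the drive, which is exactly the extra layer described above.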

To avoid this issue in PCIe SSDs, the HBA was integrated into the SSD itself. Integrating a SATA HBA with an SSD results in a PCIe SSD based on the AHCI protocol. In this case the OS still sends the same FIS packets containing ATA commands to the SSD over PCIe, but since the HBA is now built into the SSD, manufacturers are no longer forced to serialize SATA packets between the HBA and the rest of the SSD, allowing higher bandwidth than the 6Gbps maximum offered by SATA. Such an architecture is supported by most host platforms, since from the host's perspective there is no difference between an AHCI PCIe SSD and a typical combination of a SATA HBA and a SATA SSD.
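The 6Gbps SATA figure is a raw line rate, so comparing it with PCIe takes a little arithmetic. The Python sketch below assumes only line-encoding overhead (8b/10b for SATA and PCIe 1.0/2.0, 128b/130b for PCIe 3.0) and ignores packet and protocol overhead; the function name is illustrative.

```python
def effective_mb_per_s(gt_per_s: float, payload_bits: int, line_bits: int, lanes: int = 1) -> float:
    """Usable bandwidth in MB/s (10**6 bytes/s) after line encoding, ignoring protocol overhead."""
    return gt_per_s * 1e9 * payload_bits / line_bits * lanes / 8 / 1e6


links = {
    "SATA 6Gbps": effective_mb_per_s(6.0, 8, 10),
    "PCIe 1.0 x1": effective_mb_per_s(2.5, 8, 10),
    "PCIe 2.0 x4": effective_mb_per_s(5.0, 8, 10, lanes=4),
    "PCIe 3.0 x4": effective_mb_per_s(8.0, 128, 130, lanes=4),
}
for name, mb_s in links.items():
    print(f"{name:12} {mb_s:8.1f} MB/s")
```

Under these assumptions SATA tops out at 600 MB/s, while even a 4-lane PCIe 2.0 link offers more than three times that, and PCIe 3.0 x4 roughly 3.9 GB/s.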

While AHCI PCIe SSDs do solve the bandwidth limitation, the AHCI protocol still has significant latency bottlenecks, because it involves far too many extra steps and offers only a single command queue, at most 32 commands deep. For example, before sending FIS read commands, the host has to check that the SATA drive is physically connected and ready to receive FIS packets. It also has to confirm that a command has finished processing before requesting data from the device, and much more. These extra protocol steps were necessary only because the SATA device was a separate entity from the HBA.

To solve these inefficiencies, a new protocol called Non-Volatile Memory Express (NVMe) was developed. It reduces the number of register accesses per read/write block operation by up to a factor of three and allows many data requests to be processed at once through multiple parallel command queues. This gives a significant latency improvement, especially at higher data transfer rates, making PCIe with the NVMe protocol the obvious choice for future SSDs.
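As a rough illustration of how NVMe lets the host queue several requests at once, the Python sketch below packs 64-byte NVMe Read submission queue entries (layout per the NVMe base specification: opcode and command ID in CDW0, namespace ID in CDW1, PRP pointers in CDW6-9, starting LBA in CDW10-11, zero-based block count in CDW12) and computes the BAR0 offset of a submission queue's tail doorbell. The helper names are hypothetical and nothing here touches hardware; it only shows that several commands can be written into memory and handed over with one doorbell write.

```python
import struct

NVME_CMD_READ = 0x02  # NVM command set: Read


def build_nvme_read(cid: int, nsid: int, slba: int, nblocks: int, prp1: int, prp2: int = 0) -> bytes:
    """Pack a 64-byte NVMe Read submission queue entry (SQE)."""
    if not 1 <= nblocks <= 0x10000:
        raise ValueError("nblocks must be 1..65536")
    cdw0 = NVME_CMD_READ | ((cid & 0xFFFF) << 16)  # opcode 7:0, command identifier 31:16
    cdw12 = nblocks - 1                              # NLB is zero-based
    return struct.pack(
        "<IIQQQQQIIII",
        cdw0,
        nsid,     # CDW1: namespace ID
        0,        # CDW2-3: reserved
        0,        # CDW4-5: metadata pointer (unused here)
        prp1,     # CDW6-7: PRP entry 1 (data buffer)
        prp2,     # CDW8-9: PRP entry 2
        slba,     # CDW10-11: starting LBA
        cdw12,    # CDW12: number of logical blocks (0-based)
        0, 0, 0,  # CDW13-15
    )


def sq_tail_doorbell_offset(qid: int, dstrd: int) -> int:
    """BAR0 offset of the SQ tail doorbell: 0x1000 + (2 * qid) * (4 << CAP.DSTRD)."""
    return 0x1000 + (2 * qid) * (4 << dstrd)


# Queue four 8-block reads in consecutive slots of I/O submission queue 1 ...
ring = [build_nvme_read(cid=i, nsid=1, slba=i * 8, nblocks=8, prp1=0x10000 + i * 0x1000)
        for i in range(4)]
new_tail = len(ring)
# ... then a single write of new_tail to BAR0 + sq_tail_doorbell_offset(1, dstrd)
# hands all four commands to the controller at once.
print(hex(sq_tail_doorbell_offset(1, dstrd=0)), new_tail)
```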

What does this mean for the data recovery (DR) industry?

From a DR perspective, tools supporting a new interface/protocol have to provide low-level control over communication with the problematic device on all layers, i.e. Physical, Data Link, Transport/Transaction, and Protocol. All existing tools used by the DR industry today are built to support SATA/IDE/SAS/SCSI/USB devices. Moving to PCIe NVMe devices comes as a shock to both DR service providers and DR technology vendors, because as far as communication with the mass storage device is concerned, everything changes on every communication layer, so practically not a single line of code or hardware design used in existing tools can be reused for PCIe NVMe tools. In other words, almost none of the accumulated expertise in communicating over currently used interfaces and protocols applies to PCIe NVMe SSDs.

Let's take a look at the type of control PCIe SSD DR tools must have to properly address read instability issues.

The first control feature is the ability to manually select a particular PCIe link speed. Some problematic PCIe SSDs have electrical instabilities, and because of a low signal-to-noise ratio, communication with the device can become intermittent at higher speeds. This may result either in total inability to access the device or in so-called "phantom bad blocks", i.e. bad blocks caused by intermittent communication with the device rather than by actual problems with the flash memory or firmware. Although the PCIe specification does define a link training state machine that lets the host lower the link speed automatically when needed, in practice this only happens when the device does not respond to the host at all, i.e. when the device does not support the higher link speed. When the device does support a higher speed but has instability issues, the host will still stick to the highest available link speed and fail to communicate with the device reliably.
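In PCIe terms, capping the speed means writing the Target Link Speed field (bits 3:0 of Link Control 2) on the port above the SSD, setting that port's Retrain Link bit (bit 5 of Link Control), and then checking Link Status for the speed and width actually negotiated. The Python sketch below shows only the register arithmetic, using the offsets within the PCI Express Capability structure; how a tool reads and writes configuration space is assumed and left out, and the helper names are hypothetical.

```python
# Register offsets inside the PCI Express Capability structure.
LINK_CONTROL = 0x10
LINK_STATUS = 0x12
LINK_CONTROL_2 = 0x30

RETRAIN_LINK = 1 << 5  # Link Control bit 5, set on the port above the device
SPEED_NAMES = {1: "2.5 GT/s", 2: "5.0 GT/s", 3: "8.0 GT/s", 4: "16.0 GT/s"}


def with_target_link_speed(lnkctl2: int, gen: int) -> int:
    """Return Link Control 2 with Target Link Speed (bits 3:0) set to gen."""
    if gen not in SPEED_NAMES:
        raise ValueError(f"unsupported speed code {gen}")
    return ((lnkctl2 & ~0xF) | gen) & 0xFFFF


def with_retrain(lnkctl: int) -> int:
    """Return Link Control with the Retrain Link bit set."""
    return (lnkctl | RETRAIN_LINK) & 0xFFFF


def decode_link_status(lnksta: int) -> tuple[int, int]:
    """Return (current link speed code, negotiated link width) from Link Status."""
    return lnksta & 0xF, (lnksta >> 4) & 0x3F


speed, width = decode_link_status(0x1043)
print(SPEED_NAMES[speed], f"x{width}")  # prints: 8.0 GT/s x4
```

Reading Link Status after retraining matters: it is the only way to confirm the link actually came up at the requested speed rather than failing or falling back further.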

It is also necessary to manually select the number of PCIe lanes in use, because individual lanes can have electrical instabilities. For example, if a 4x PCIe 3.0 SSD has issues on PCIe lane 3, the DR tool must be able to switch to 2x PCIe 3.0 or even 1x PCIe 1.0 mode to avoid using the third lane and thereby achieve stable communication with the device. Obviously, the maximum data transfer speed drops in this case, but operating at the highest interface speed is usually not important from a DR perspective: even at 1x PCIe 1.0 (~250MB/s), imaging can still be completed rather quickly.
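Speed and lane count together form a small search space. One way a tool could walk it is sketched below in Python: enumerate every (generation, width) pair up to the device's maximum, order them by approximate usable bandwidth, and keep the first one a stability probe accepts. The probe is a hypothetical callback (for instance, retrain the link at that configuration and run a verified test read), and the sketch assumes a narrower link uses the lowest-numbered lanes, so x2 avoids the third and fourth lanes.

```python
from typing import Callable, List, Optional, Tuple

# Approximate payload bandwidth per lane in MB/s after line encoding
# (Gen1/Gen2: 8b/10b, Gen3/Gen4: 128b/130b); protocol overhead ignored.
PER_LANE_MB_S = {1: 250.0, 2: 500.0, 3: 8000 * 128 / 130 / 8, 4: 16000 * 128 / 130 / 8}

Config = Tuple[int, int]  # (PCIe generation, lane count)


def candidate_configs(max_gen: int, max_width: int) -> List[Config]:
    """All (gen, width) pairs up to the device's maximum, fastest first."""
    widths = [w for w in (16, 8, 4, 2, 1) if w <= max_width]
    configs = [(g, w) for g in range(max_gen, 0, -1) for w in widths]
    return sorted(configs, key=lambda c: PER_LANE_MB_S[c[0]] * c[1], reverse=True)


def pick_stable_config(max_gen: int, max_width: int,
                       is_stable: Callable[[int, int], bool]) -> Optional[Config]:
    """Return the fastest configuration the probe reports as stable, or None."""
    for gen, width in candidate_configs(max_gen, max_width):
        if is_stable(gen, width):
            return gen, width
    return None


# Simulated 4x PCIe 3.0 SSD with a faulty third lane: anything wider than x2 is unstable.
print(pick_stable_config(3, 4, lambda gen, width: width <= 2))  # prints: (3, 2)
```

Ordering by bandwidth rather than by generation first means the search keeps as much speed as the faulty hardware allows before dropping all the way to 1x PCIe 1.0.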

The next control feature, important for any interface/protocol combination, is the ability to apply various types of resets when the device fails to respond within a certain time frame. While SATA devices have just a few types of resets due to the relative simplicity of the SATA architecture, NVMe/AHCI PCIe SSDs have many different resets at all communication layers, from physical to protocol. At the PCIe level there are the PERST# reset signal, Secondary Bus Reset, link retraining, Function Level Reset and others; above them are the NVMe/AHCI protocol-level resets. Having multiple reset options is important because the effectiveness of each reset depends on the operation currently being executed as well as on the SSD family and firmware revision.
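Because no single reset works in every case, a tool can treat the available resets as an escalation ladder. The Python sketch below tries them in order until the device responds again. The step names, their order (least to most disruptive) and the callbacks are assumptions for illustration; the comments name the actual PCIe/NVMe mechanisms each step would use, and a real tool would let the operator reorder the ladder per SSD family and firmware.

```python
import time
from typing import Callable, Optional, Sequence, Tuple

ResetStep = Tuple[str, Callable[[], None]]

# Assumed escalation order, least to most disruptive (hypothetical step names).
DEFAULT_ORDER = (
    "retrain_link",           # Retrain Link bit in the Link Control of the port above the SSD
    "nvme_controller_reset",  # clear CC.EN, wait for CSTS.RDY = 0, re-enable
    "function_level_reset",   # Initiate FLR bit in the SSD's Device Control register
    "secondary_bus_reset",    # Secondary Bus Reset bit in the bridge above the SSD
    "perst",                  # assert PERST# (fundamental reset)
)


def recover_with_resets(steps: Sequence[ResetStep],
                        device_responds: Callable[[], bool],
                        settle_s: float = 0.5,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Apply resets in order until the device responds; return the reset that worked."""
    for name, do_reset in steps:
        do_reset()
        sleep(settle_s)          # give the link/controller time to come back
        if device_responds():
            return name
    return None                  # nothing worked: escalate to a full repower
```

Returning the name of the reset that worked lets the tool remember which one is effective for a given drive and try it first the next time the SSD hangs.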

Another critical feature is the ability to repower the SSD when it is completely unresponsive, e.g. after a serious firmware exception. Repowering a PCIe SSD requires going through the entire handshake and initialization phases of communication again. This is the most demanding feature, as it can only be implemented if the tool has complete hardware and software control over the SSD's interface and protocol. The good news is that most PCIe SSDs can be repowered and initialized within 1-2 seconds, which is quite fast compared to HDDs, which can take up to 20-30 seconds.
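After power is restored, the PCIe link must be trained and the device enumerated again before the NVMe controller itself can be re-enabled. The Python sketch below covers only that last NVMe step, using the register offsets from the NVMe base specification: program the admin queue sizes and addresses (AQA, ASQ, ACQ), set CC.EN, and poll CSTS.RDY for no longer than the worst-case time reported in CAP.TO (in 500 ms units). The `Bar0` MMIO accessor is a hypothetical interface standing in for the imaging hardware, and a complete driver would also set the remaining CC fields and create I/O queues.

```python
import time
from typing import Callable, Protocol

# NVMe controller register offsets in BAR0.
CAP, CC, CSTS, AQA, ASQ, ACQ = 0x00, 0x14, 0x1C, 0x24, 0x28, 0x30
CC_EN, CSTS_RDY, CSTS_CFS = 1 << 0, 1 << 0, 1 << 1


class Bar0(Protocol):
    """Hypothetical MMIO accessor provided by the imaging hardware."""
    def read32(self, off: int) -> int: ...
    def read64(self, off: int) -> int: ...
    def write32(self, off: int, val: int) -> None: ...
    def write64(self, off: int, val: int) -> None: ...


def ready_timeout_s(cap: int) -> float:
    """CAP.TO (bits 31:24) is the worst-case ready time in 500 ms units."""
    return ((cap >> 24) & 0xFF) * 0.5


def enable_admin_queues(regs: Bar0, asq_addr: int, acq_addr: int, depth: int = 32,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Program the admin queues, set CC.EN and poll CSTS.RDY up to CAP.TO.

    Assumes the controller is freshly powered, so CC.EN and CSTS.RDY start at 0,
    and that asq_addr/acq_addr are page-aligned DMA addresses.
    """
    timeout = ready_timeout_s(regs.read64(CAP))
    regs.write32(AQA, ((depth - 1) << 16) | (depth - 1))  # ACQS 27:16, ASQS 11:0, 0-based
    regs.write64(ASQ, asq_addr)
    regs.write64(ACQ, acq_addr)
    regs.write32(CC, regs.read32(CC) | CC_EN)  # other CC fields left as-is in this sketch
    deadline = clock() + timeout
    while clock() < deadline:
        csts = regs.read32(CSTS)
        if csts & CSTS_CFS:
            raise RuntimeError("controller fatal status (CSTS.CFS) during init")
        if csts & CSTS_RDY:
            return True
        sleep(0.01)
    return False
```

A `False` result is the point at which a tool would fall back to another repower cycle instead of letting the host wait indefinitely.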

Today, the following types of physical PCIe SSD connectors are present on the market:

Standard PCIe connector used on PCIe SSDs built primarily for desktop PCs. These SSDs may use as many PCIe lanes as supported by the motherboard.
M.2 M Key PCIe connector, used on M.2 form factor PCIe SSDs, which are usually found in laptop computers. The M.2 M Key specification provides up to 4 PCIe lanes.

There are also M.2 SATA SSDs, which can be recovered with regular SATA tools using an M.2-to-SATA adapter. Such M.2 SATA SSDs have either an M.2 B Key or an M.2 B&M Key connector, which is easy to identify because its keying notches differ from those of an M.2 M Key connector.

Apple's proprietary PCIe SSD connector, used in MacBooks from 2013 onward. Older MacBooks had up to 4 PCIe lanes, while newer ones have up to 16.
U.2 PCIe connector used in enterprise-level SSDs. The U.2 specification supports up to 4 PCIe lanes.

The good news is that all of these connector types only define the mechanical layout of the PCIe signals, so passive adapters converting each of them to a standard PCIe connector can be used to support all of them.

Today there are no commercially available DR solutions for PCIe SSDs on the market. The only recovery option is to connect a PCIe SSD to a standard PCIe slot and use software tools running under one of the standard OSes. Needless to say, this method has a huge number of limitations and will only work on devices that are mostly healthy, since commonly used OSes provide little or no means to reset or repower a PCIe SSD on demand. In other words, an OS, and any software running under it, generally assumes that a PCIe SSD is initialized only once, when the system boots, and is not reinitialized while the OS is running.

At the same time, our research shows that a significant number of problematic PCIe SSDs on the market have read instability issues that leave the SSD completely unresponsive after it hits a problematic area, e.g. a non-remapped NAND bad block. Many SSDs with such symptoms cannot even be recognized by the OS, because they lock up before the OS finishes the initialization process. In other cases, such SSDs can be identified by the OS but quickly cause a total system lockup when some of those problematic areas are accessed, because the SSD becomes unresponsive.
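The difference between a tool with hardware-level control and an OS-based one shows up in how an imaging pass reacts to such an area. The Python sketch below shows one common skip-ahead strategy, not a description of any particular product's algorithm: read in chunks, and when a read times out, record the range as skipped, reset the SSD and jump ahead instead of hanging. `read_blocks` and `reset_device` are hypothetical callbacks; the sketch assumes the tool can enforce the read timeout and perform the reset, which is exactly what a standard OS does not allow.

```python
from typing import Callable, Dict, List, Tuple


def image_with_skips(total_blocks: int,
                     read_blocks: Callable[[int, int], bytes],
                     reset_device: Callable[[], None],
                     chunk: int = 256,
                     skip: int = 4096) -> Tuple[Dict[int, bytes], List[Tuple[int, int]]]:
    """First imaging pass: read in chunks, jump over areas that hang the SSD.

    read_blocks(lba, count) is assumed to raise TimeoutError when the device
    stops responding within the tool's deadline.
    """
    image: Dict[int, bytes] = {}          # start LBA -> data read
    skipped: List[Tuple[int, int]] = []   # [start, end) ranges left for a later pass
    lba = 0
    while lba < total_blocks:
        count = min(chunk, total_blocks - lba)
        try:
            image[lba] = read_blocks(lba, count)
            lba += count
        except TimeoutError:
            end = min(lba + skip, total_blocks)
            skipped.append((lba, end))
            reset_device()                # bring the SSD back before continuing
            lba = end
    return image, skipped
```

Skipping ahead first gets the healthy majority of the data off the drive while it still responds; the skipped ranges can then be revisited in later passes with smaller chunks.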

Good news!

Finally, after years of research, we have released the DeepSpar Disk Imager PCIe SSD Add-on, which offers all the functionality described in this blog post, once again cementing our position as the worldwide leader in data recovery imaging solutions.