New computer forensic tools will make it possible to recover more data from corrupted hard drives so long as the missing files haven't been overwritten.
Tools designed to harvest images from disks even after they have been deleted from the file system can be adapted to seek other file formats including Word documents, says Nasir Memon, a professor at the Polytechnic Institute of New York University.
Memon says research that he hasn't yet published will show that techniques used to cull images can be adapted to find text files, a capability that would be attractive to businesses trying to salvage data that may be fragmented and dispersed across a corrupted drive.
The text tool will examine fragmented chunks of files that may be distributed across a disk and analyze their content to see which ones likely go together. "It looks at global differences, for example, Twain vs. Shakespeare. Syntax helps eliminate false positives," Memon says.
The tool is based on a recovery method known as SmartCarving that was developed at NYU and is sold commercially by DigitalAssembly, a vendor founded by former students of Memon.
SmartCarving can reclaim 10% to 15% of digital images that conventional forensic tools miss when trying to find files that have been deleted from the file registry, he says.
Traditional file recovery seeks a known header and footer for a file and gathers all the related data blocks in between. If the data blocks making up the file are fragmented, traditional tools crash when they hit a fragment of a different format that might be sandwiched between pieces of the file being sought, Memon says.
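The header-and-footer approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular forensic tool: the function name carve_between_markers is hypothetical, and the JPEG start and end markers are used only as a familiar example of a known header and footer.

```python
from typing import List

# JPEG files start with the SOI marker (FF D8) followed by another marker
# byte (FF), and end with the EOI marker (FF D9).
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve_between_markers(data: bytes,
                          header: bytes = JPEG_HEADER,
                          footer: bytes = JPEG_FOOTER) -> List[bytes]:
    """Return every byte run that starts at a header and ends at the next footer.

    Hypothetical helper: it assumes the file's blocks are stored
    contiguously, which is exactly the assumption that fragmentation breaks.
    """
    carved = []
    pos = 0
    while True:
        start = data.find(header, pos)
        if start == -1:
            break
        end = data.find(footer, start + len(header))
        if end == -1:
            break
        end += len(footer)
        carved.append(data[start:end])
        pos = end
    return carved


# A fragmented file: foreign data sits between the two halves of the image,
# so the naive carve swallows it along with the image bytes.
fragmented = JPEG_HEADER + b"AA" + b"DOCDATA" + b"AA" + JPEG_FOOTER
recovered = carve_between_markers(fragmented)
```

In the fragmented case the single recovered run still contains the foreign bytes, so a decoder fed that output would be handed data that is not part of the image. The sketch does not crash, but it shows why gathering everything between header and footer goes wrong once a file is not stored in one piece.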
To carve an image, SmartCarving draws together data blocks from a single image that sit consecutively on a drive, then links them to other groups of data blocks based on whether they seem to blend, using criteria such as pixel density and the dimensions of the image.
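One way to picture the "blend" test is to compare the pixels on either side of a candidate join. The sketch below is an assumption made for illustration, not SmartCarving's actual criteria: it treats each group of blocks as already decoded into rows of grayscale values, rejects joins whose row widths disagree, and scores the rest by the average difference across the seam. The names boundary_cost and best_continuation are hypothetical.

```python
def boundary_cost(upper_rows, lower_rows):
    """Score how well lower_rows continues upper_rows (lower is better).

    Each argument is a list of pixel rows; each row is a list of grayscale
    values. The score is the mean absolute difference between the last row
    of the upper group and the first row of the lower group.
    """
    last = upper_rows[-1]
    first = lower_rows[0]
    if len(last) != len(first):
        # Different widths: the groups cannot belong to the same image.
        return float("inf")
    return sum(abs(a - b) for a, b in zip(last, first)) / len(last)


def best_continuation(current, candidates):
    """Return the index of the candidate group that blends best with current."""
    costs = [boundary_cost(current, candidate) for candidate in candidates]
    return min(range(len(candidates)), key=costs.__getitem__)


current = [[10, 10, 10], [12, 12, 12]]
candidates = [
    [[200, 200, 200]],  # same width, but a sharp jump in brightness
    [[13, 13, 13]],     # same width and a smooth transition
    [[12, 12]],         # wrong width, so it is ruled out
]
choice = best_continuation(current, candidates)  # picks index 1
```

A real carver would work on partially decoded image data rather than neat lists of numbers, but the idea is the same: try each candidate, measure how smoothly it continues the picture, and keep the best.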
In this way, it becomes possible to recover partial images when pieces are missing and to recover images when headers are missing, Memon says. "You look at what can be decoded and pick the best," he says.
The actual sorting is done by algorithms that graph data segments to see which are most like one another based on preset criteria. The closer two fragments fall on the graph, the better the fit they are considered to have.
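The article does not say which graph algorithm is used, so the following is only a sketch under stated assumptions: each fragment is reduced to a made-up feature vector, the distance between two vectors stands in for the "preset criteria," and a greedy walk always moves to the nearest unused fragment. The names distance and greedy_order are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_order(features, start):
    """Order fragments by repeatedly stepping to the closest unused one.

    features maps a fragment name to its feature vector; start is the name
    of the fragment to begin from (for example, the one holding a header).
    Ties are broken alphabetically so the result is deterministic.
    """
    order = [start]
    remaining = set(features) - {start}
    while remaining:
        here = features[order[-1]]
        nearest = min(sorted(remaining),
                      key=lambda name: distance(here, features[name]))
        order.append(nearest)
        remaining.remove(nearest)
    return order


# Illustrative feature vectors only; real features would come from the data.
fragments = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 0.0), "d": (2.5, 0.0)}
path = greedy_order(fragments, "a")  # ["a", "b", "d", "c"]
```

A greedy walk is the simplest choice and can be led astray by one bad early match, which is why it is offered here as an illustration of "closer on the graph means better fit" rather than as a description of the commercial product.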
Applied to photos, the technique can reassemble an image even if some of the image data is missing, resulting in a picture with a band missing from it. The method can also create approximate headers for image files, which makes it possible to gather and reassemble enough pieces to recreate a version of the image that might not be as sharp as the original, Memon says.
New research adapts this technique to other file formats such as .doc files. The adapted method looks at the content in file fragments and classifies it based on syntactical similarities. This is akin to sorting a box of jigsaw puzzle pieces that contains the pieces for many puzzles so that the pieces from each puzzle end up in a box of their own, Memon says. Each puzzle can then be assembled based on the shapes of the pieces and the image fragment printed on them.
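The sorting-into-boxes step can be sketched as a simple grouping pass over text fragments. Because the research is unpublished, everything here is an assumption: word-frequency profiles compared by cosine similarity stand in for the "syntactical similarities" Memon describes, the 0.3 threshold is arbitrary, and the names word_profile, cosine and group_fragments are hypothetical.

```python
import math
import re
from collections import Counter


def word_profile(text):
    """Count lowercase words in a fragment."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(p, q):
    """Cosine similarity between two word-count profiles (0.0 to 1.0)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def group_fragments(fragments, threshold=0.3):
    """Put each fragment in the most similar existing group, or start a new one.

    Returns a list of groups, each a list of fragment indices.
    """
    groups = []
    for index, text in enumerate(fragments):
        profile = word_profile(text)
        best, best_score = None, threshold
        for group in groups:
            score = cosine(profile, group["profile"])
            if score >= best_score:
                best, best_score = group, score
        if best is None:
            groups.append({"profile": profile, "members": [index]})
        else:
            best["profile"] += profile  # fold the fragment into the group
            best["members"].append(index)
    return [group["members"] for group in groups]


fragments = [
    "thou art more lovely and thou art more temperate",
    "the river boat drifted down the river past the town",
    "thou art the fairest and thou art kind",
    "the boat reached the town on the river bank",
]
boxes = group_fragments(fragments)  # [[0, 2], [1, 3]]
```

This only covers the first half of the jigsaw analogy, deciding which box each piece belongs in. Putting the pieces of one document back in the right order would need a further step, such as checking which fragment's first words best continue another fragment's last sentence.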