Unintentional exposure of sensitive data through Word files has caused problems for companies in the past, especially when people forget that Track Changes can let document recipients view information that was deleted or sanitised before release.
Recovery of information from PDF files has also had unintended consequences: in some cases the attempt to redact information amounted to nothing more than placing a black rectangle over the text, which made recovering the original text a simple process.
Didier Stevens, who gained attention for his recent discoveries relating to hiding content in PDF files, has again discovered a side effect of creating PDF files that might lead to unexpected information disclosure for the unaware.
The concept of an Incremental Update in PDF files is relatively well known: when an existing PDF document is changed, the file is not completely rewritten on saving. How an incremental update is actually represented in the raw PDF file is less well known, but essentially the amended data is appended to the original document, and the process repeats for each subsequent update. Stevens discovered that stripping away an update and recovering the original content is extremely simple.
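To make the append-only layout concrete, here is a minimal Python sketch that locates where each saved revision ends by searching for the `%%EOF` marker that closes every revision. The function name `revision_ends` and the sample bytes are hypothetical and for illustration only; the sample is a stand-in, not a valid PDF. The approach is a heuristic: linearized files, or binary streams that happen to contain the same bytes, can produce extra matches.

```python
import re
from typing import List

# Each saved revision of a PDF ends with an end-of-file marker.
# The optional group also consumes the line ending that follows it.
EOF_MARKER = re.compile(rb"%%EOF[ \t]*(?:\r\n|\r|\n)?")


def revision_ends(pdf_bytes: bytes) -> List[int]:
    """Return the byte offset just past each %%EOF marker (heuristic)."""
    return [m.end() for m in EOF_MARKER.finditer(pdf_bytes)]


# Hypothetical stand-in for a file saved once and then incrementally updated.
# Real PDFs contain objects, a cross-reference section and a trailer here.
original = b"%PDF-1.4\n(original objects)\nstartxref\n0\n%%EOF\n"
update = b"(changed objects)\nstartxref\n0\n%%EOF\n"
sample = original + update

ends = revision_ends(sample)
print(len(ends), "revision(s) found, ending at byte offsets", ends)
```

More than one offset suggests the file carries at least one incremental update, with the first offset marking the end of the originally saved document.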
What this means is that for documents that have been redacted or otherwise modified by replacing text instead of drawing a black rectangle over it, the deleted or replaced text can be recovered, along with the original unmodified document, in a simple one-step procedure. Even simpler, it can often be done with a text editor, and it doesn't matter whether the PDF content has been encrypted.
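The one-step procedure can be sketched in Python as a simple truncation: cut the file just after the second-to-last `%%EOF` marker, which discards the most recent update. This is an illustrative sketch, not Stevens' own tool; the function name `strip_last_update` and the sample bytes are hypothetical, and the same caveat applies that `%%EOF` matching is only a heuristic.

```python
def strip_last_update(pdf_bytes: bytes) -> bytes:
    """Drop the most recent incremental update, if there is one (heuristic)."""
    marker = b"%%EOF"
    last = pdf_bytes.rfind(marker)
    if last == -1:
        raise ValueError("no %%EOF marker found; not a complete PDF?")

    # Look for an earlier marker: that is where the previous revision ended.
    prev = pdf_bytes.rfind(marker, 0, last)
    if prev == -1:
        return pdf_bytes  # only one revision, nothing to strip

    end = prev + len(marker)
    # Keep the line ending that follows the marker, if present.
    if pdf_bytes[end:end + 2] == b"\r\n":
        end += 2
    elif pdf_bytes[end:end + 1] in (b"\r", b"\n"):
        end += 1
    return pdf_bytes[:end]


# Hypothetical stand-in bytes, not a valid PDF.
original = b"%PDF-1.4\n(original text)\nstartxref\n0\n%%EOF\n"
redacted = original + b"(replacement text)\nstartxref\n0\n%%EOF\n"

recovered = strip_last_update(redacted)
print(recovered == original)
```

The function works on raw bytes and never decodes the content, which mirrors doing the same cut by hand in a text editor. Calling it repeatedly walks back through earlier revisions until only the first saved version remains.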
There are some efforts to increase awareness of the risks posed by document metadata, but this recent rediscovery adds another item to check before releasing documents for wider consumption. It is also another simple tool for forensic researchers trying to recover original data from a document. A saving grace appears to be that many applications that export to PDF as part of their Save process do not support incremental updates, so if you want to redact data, do it in the original application and then export the redacted version.
It is nothing that can't be gained from reading the PDF specification, but who takes the time to read in depth the technical specification for the data format that they are using?