A recently announced SHA-1 collision attack has the potential to break code repositories that use the Subversion (SVN) revision control system. The first victim was the repository of the WebKit browser engine, which was corrupted after someone committed two different PDF files with the same SHA-1 hash to it.
The incident happened hours after researchers from Google and Centrum Wiskunde & Informatica (CWI) in the Netherlands announced, on Thursday, the first practical collision attack against the SHA-1 hash function. Their demonstration consisted of two PDF files with different contents that had the same SHA-1 digest.
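SHA-1 is a hash function that computes a fixed-size, 160-bit digest (usually written as a 40-character hexadecimal string) that serves as a cryptographic fingerprint of a file or a piece of data. The demonstration proved without a doubt that SHA-1 is cryptographically broken. Because there are far more possible inputs than digests, collisions must exist for any hash function, but for a secure one it should be computationally infeasible to actually find two different inputs that produce the same digest.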
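To illustrate why collisions are guaranteed to exist yet hard to find, the Python sketch below (a generic birthday search, not the Google and CWI technique) truncates SHA-1 to 24 bits and finds a collision in a few thousand tries. The helper names `truncated_sha1` and `find_truncated_collision` are hypothetical. The same generic search against the full 160-bit output would need on the order of 2^80 attempts, which is why finding a real full-length collision counts as a break.

```python
import hashlib
import itertools

# A full SHA-1 digest is 160 bits, usually printed as 40 hex characters.
print(hashlib.sha1(b"abc").hexdigest())  # a9993e364706816aba3e25717850c26c9cd0d89d


def truncated_sha1(data: bytes, n_bytes: int) -> bytes:
    """Return only the first n_bytes of the SHA-1 digest (a deliberately weakened hash)."""
    return hashlib.sha1(data).digest()[:n_bytes]


def find_truncated_collision(n_bytes: int = 3) -> tuple[bytes, bytes]:
    """Generic birthday search: hash distinct inputs until two truncated digests match.

    For an n-bit output a match is expected after roughly 2**(n/2) attempts,
    so 24 bits takes only a few thousand tries, while 160 bits would need ~2**80.
    """
    seen: dict[bytes, bytes] = {}
    for i in itertools.count():
        msg = f"message-{i}".encode()
        tag = truncated_sha1(msg, n_bytes)
        if tag in seen:
            return seen[tag], msg
        seen[tag] = msg
    raise AssertionError("unreachable: itertools.count() never ends")


a, b = find_truncated_collision(3)
print(a, b, truncated_sha1(a, 3).hex())
```

The two messages agree on their first 24 bits of SHA-1 output but still have different full digests.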
A WebKit developer wanted to build a test proving that the demonstrated collision could not be used for cache poisoning in WebKit's disk cache deduplication feature, which relies on SHA-1. To do this, he committed the two PDF files generated by CWI and Google to WebKit's SVN repository, which then started returning errors.
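Why a collision matters for deduplication can be sketched with a toy content-addressed store. This is a hypothetical illustration, not WebKit's disk cache code: `DedupStore` and `length_key` are invented names, and `length_key` stands in for a broken hash so that a collision is easy to produce. Without verification, whichever colliding file is stored first is silently served for both.

```python
import hashlib
from typing import Callable


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class DedupStore:
    """Hypothetical content-addressed store that keeps one copy per digest.

    A generic illustration only; this is not WebKit's disk cache code.
    """

    def __init__(self, key_fn: Callable[[bytes], str] = sha1_hex, verify: bool = False) -> None:
        self.key_fn = key_fn
        self.verify = verify
        self.blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = self.key_fn(data)
        existing = self.blobs.get(key)
        if existing is None:
            self.blobs[key] = data
        elif existing != data:
            if self.verify:
                # Same digest, different bytes: refuse rather than deduplicate.
                raise ValueError(f"hash collision detected for key {key}")
            # Without verification the new bytes are silently dropped as a "duplicate".
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]


def length_key(data: bytes) -> str:
    """Stand-in for a broken hash: any two inputs of equal length collide."""
    return str(len(data))


store = DedupStore(key_fn=length_key)
k1 = store.put(b"benign")
k2 = store.put(b"evil!!")  # same "digest" as b"benign"
print(store.get(k2))  # b'benign': whoever stores first wins
```

Passing `verify=True` makes the store compare bytes whenever digests match, which turns a silent poisoning into an explicit error at the cost of reading the stored copy.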
Even after the files were removed, some problems apparently remained and required further manual intervention to fix.
The issue is not specific to WebKit's repository; it affects SVN-based repositories in general. The Subversion developers have released a script that SVN administrators can use to prevent files with colliding SHA-1 hashes from being committed to their repositories, while work on a more permanent fix is in progress.
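The Subversion script itself is not reproduced here. As a hedged sketch of one general way such a pre-commit check could work, the hypothetical `check_commit` function below keeps a second, stronger digest (SHA-256) alongside each SHA-1 and rejects any incoming file whose SHA-1 matches stored content while its SHA-256 does not. How file contents and the `index` mapping are obtained from a real repository is left out as an assumption.

```python
import hashlib


def check_commit(new_files: dict[str, bytes], index: dict[str, str]) -> list[str]:
    """Return paths whose SHA-1 matches stored content but whose SHA-256 does not.

    `index` is hypothetical bookkeeping: it maps the SHA-1 hex digest of content
    already in the repository to the SHA-256 hex digest of that same content.
    """
    rejected = []
    for path, data in new_files.items():
        sha1 = hashlib.sha1(data).hexdigest()
        sha256 = hashlib.sha256(data).hexdigest()
        known = index.get(sha1)
        if known is not None and known != sha256:
            rejected.append(path)  # same SHA-1, different content: a collision
        else:
            index.setdefault(sha1, sha256)  # also catches collisions within one commit
    return rejected


# Identical content committed twice is fine.
print(check_commit({"a.txt": b"hello\n", "b.txt": b"hello\n"}, {}))  # []

# Simulate stored content whose SHA-1 equals that of b"payload" but whose bytes differ.
fake_index = {hashlib.sha1(b"payload").hexdigest(): "0" * 64}
print(check_commit({"evil.pdf": b"payload"}, fake_index))  # ['evil.pdf']
```

A hook built this way would exit with a non-zero status when the returned list is non-empty, which is how Subversion pre-commit hooks refuse a commit.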
Git, a competing and more popular version control system, also uses SHA-1 internally and, according to the CWI and Google researchers, is theoretically vulnerable to the same kind of attack.
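Git uses SHA-1 to name every object it stores. The sketch below reproduces how a blob (file) ID is computed, the same value `git hash-object` prints: SHA-1 over a short header (`blob`, the size in bytes, a NUL byte) followed by the content. Trees and commits are hashed the same way with their own type names, and a commit ID covers its tree ID, which in turn covers the blob IDs, which is why a collision could let two different repositories share a head commit hash. The function name `git_blob_id` is illustrative.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute a Git-style blob ID: SHA-1 of a 'blob <size>' header, a NUL byte, then the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```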
"It is essentially possible to create two GIT repositories with the same head commit hash and different contents, say a benign source code and a backdoored one," the researchers said on their shattered.io website. "An attacker could potentially selectively serve either repository to targeted users."
This kind of attack would require attackers to compute their own collision, which currently requires significant resources. It took the researchers over nine quintillion SHA-1 computations: the equivalent of 6,500 years of single-CPU computation for the first phase of the attack and 110 years of single-GPU computation for the second.
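As a quick sanity check on the scale involved: shattered.io gives the exact count as 9,223,372,036,854,775,808 SHA-1 computations, which is 2^63. A generic birthday search for a collision on a 160-bit digest needs roughly 2^80 evaluations, so the attack was about 2^17 times cheaper than that generic bound, yet still far beyond a casual attacker's budget.

```python
# Total work reported for the attack on shattered.io.
attack_cost = 9_223_372_036_854_775_808
# Rough cost of a generic birthday search on a 160-bit digest.
generic_birthday_cost = 2 ** 80

assert attack_cost == 2 ** 63
speedup = generic_birthday_cost // attack_cost
print(f"attack cost = 2**{attack_cost.bit_length() - 1}")  # attack cost = 2**63
print(f"generic search is {speedup:,}x more work")         # generic search is 131,072x more work
```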
Linus Torvalds, the creator of both Linux and git, doesn't seem too concerned about the attack's implications, partly because he believes it can easily be detected with a few simple checks, which would make mounting it not worth the effort.
"Unlike some 'signing a pdf' attack, git doesn't fundamentally depend on the SHA1 as some kind of absolute security," Torvalds said in a discussion on the git mailing list. "If we have the minimal machinery in git to just notice the attack, the attack essentially goes away. Attackers can waste infinite amounts of CPU time, and if it's cheap for us to notice, it completely disarms all that attack work."
Later in the same discussion, the git developers decided to use the collision detection code released by the CWI researchers to add protection against such attacks. Meanwhile, moving git to a different hash function is being discussed as a longer-term goal, and SHA-3 appears to be the favored candidate, reportedly because it performs better than SHA-2.
"I doubt the sky is falling for git as a source control management tool," Torvalds concluded in one of his emails. "Do we want to migrate to another hash? Yes. Is it 'game over' for SHA1 like people want to say? Probably not."
Although SHA-1 has long been known to be theoretically vulnerable to collision attacks, the real-world implications of a practical attack, particularly for file synchronization, deduplication and backup systems, remain to be seen. In 90 days, the Google and CWI researchers plan to release the code they used to generate the colliding PDF files, which will allow others to create similar collisions if they have the necessary computing resources.