Mingis on Tech: The language of malware

Linguists are increasingly being used to try to figure out who's behind recent cyberattacks

IDG

Sometimes, how you say something can be as important as what you say -- especially when there's been a cyberattack and law enforcement officials are trying to figure out who you are.

That's what CSO senior writer Fahmida Rashid found when she looked into how cybersecurity firms go about tracking down the bad actors behind malware campaigns. While linguistics may not be the first thing companies worry about when trying to protect -- or retrieve access to -- their data, it can help pinpoint an attack's origin, Rashid told Computerworld Executive Editor Ken Mingis.

Linguistic analysis has been used to investigate various attacks, including the 2014 Sony breach, ShadowBrokers and Guccifer 2.0 -- and it seems to be gaining traction because it can help identify the shadowy figures behind ransomware attacks, Rashid said. For example, Flashpoint analysts examined every language version of the ransom notes that accompanied WannaCry and determined that the notes written in Bulgarian, French, German, Italian, Japanese, Korean, Russian, Spanish and Vietnamese had been translated from a note originally written in English. (In the CoinVault ransomware attack, investigators found several phrases in “perfect Dutch,” indicating a Dutch connection.)

Ransomware lends itself well to linguistic analysis because attackers' speech patterns show up in the ransom notes they write. There is more text to analyze, and unlike spam and phishing messages, where attackers have to mimic legitimate entities, ransom notes can carry clues about how comfortable the writer is in that language.

The fascinating part, according to Rashid, is that linguists can learn about attackers by the way they phrase certain words, or even by the words themselves. That's particularly true of ransomware like WannaCry, where victims get a message from the attackers -- and that message can contain hidden clues. Linguists like Shlomo Argamon, professor of computer science at the Illinois Institute of Technology, say it’s important to have as much text as possible to analyze. The more there is, the more likely the “true” attributes can be surfaced.

It's not foolproof, Rashid noted. People can speak multiple languages with differing degrees of proficiency, which can obscure an attack's origin. Attackers regularly employ red herrings and false flags to throw investigators off: they manipulate when they launch attacks, change timestamps, and even intentionally insert cultural references and phrases to misdirect investigators. Even so, it is hard to consistently plant fake clues in speech.

For an audio podcast only, click play (or catch up on all episodes) below. Or you can now find us on iTunes, where you can download each episode and listen at your leisure.

Happy listening, and please, send feedback or suggestions for future topics to us. We'd love to hear from you.
