Spam. It fills our in-boxes, wastes our time and spreads malware -- and it's only getting worse. According to Ferris Research, which studies messaging and content control, 40 trillion spam messages are expected to be sent in 2008, costing businesses more than US$140 billion worldwide -- a significant increase from the 18 trillion spam messages sent in 2006 and the 30 trillion in 2007.
In theory, e-mail filtering software and appliances allow "good" or "true" e-mail messages to pass through while prohibiting spam. But the filters can err in either of two ways: They can mistakenly allow spam to pass through, believing it to be true e-mail (known as a "false negative" situation), or they can mistakenly block true e-mail, believing it to be spam (a "false positive").
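To make the two error types concrete, here is a minimal Python sketch that labels each filtering decision and computes the resulting error rates. The function names and the sample data are hypothetical, chosen only for illustration; they do not come from any particular filtering product.

```python
def classify_outcome(is_spam: bool, flagged_as_spam: bool) -> str:
    """Name the outcome of a single filtering decision."""
    if is_spam and flagged_as_spam:
        return "true positive"    # spam correctly caught
    if is_spam and not flagged_as_spam:
        return "false negative"   # spam mistakenly allowed through
    if not is_spam and flagged_as_spam:
        return "false positive"   # true e-mail mistakenly blocked
    return "true negative"        # true e-mail correctly delivered


def error_rates(decisions: list) -> tuple:
    """Return (false_positive_rate, false_negative_rate).

    `decisions` is a list of (is_spam, flagged_as_spam) pairs.
    """
    outcomes = [classify_outcome(s, f) for s, f in decisions]
    legit_total = sum(1 for s, _ in decisions if not s)
    spam_total = sum(1 for s, _ in decisions if s)
    # Each rate is measured against the messages of that true class.
    fp_rate = outcomes.count("false positive") / legit_total if legit_total else 0.0
    fn_rate = outcomes.count("false negative") / spam_total if spam_total else 0.0
    return fp_rate, fn_rate


# Hypothetical sample: two spam messages and three true e-mails.
sample = [(True, True), (True, False), (False, True), (False, False), (False, False)]
fp_rate, fn_rate = error_rates(sample)
```

In this made-up sample, one of the three true e-mails is blocked and one of the two spam messages gets through.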
Typically, after identifying a message as spam, the filtering software either blocks it outright or places it in a quarantine folder, allowing the recipient to review it later. Although the latter method provides a chance to retrieve false positives, it requires time and effort from the user -- and some users never bother to check their quarantine folders at all.
Users and organizations that receive spam incur a cost in deleting it -- about US$0.04 per message, according to Ferris Research. But Ferris analyst Richi Jennings points out that locating a missing true e-mail costs far more than deleting a spam message: about US$3.50 per message.
(Ferris developed these figures using published data on such factors as labor force size and hourly labor costs, then applied its own estimates, such as the percentage of workforces having e-mail access and volumes of spam messages. A downloadable spreadsheet [registration required] illustrates Ferris' model.)
Even worse, Jennings says, organizations incur potentially greater costs through missed opportunities because of false positives that they never see -- for example, a consulting firm that fails to receive a request for proposal.
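The per-message figures quoted above lend themselves to a quick back-of-the-envelope comparison. The sketch below is not Ferris' model; it simply multiplies the two quoted costs by message counts. The function names and the example volumes are assumptions made for illustration, and the calculation leaves out the missed-opportunity costs Jennings describes, which are not quantified here.

```python
COST_PER_SPAM_DELETED = 0.04     # US$ per message, Ferris figure quoted above
COST_PER_FALSE_POSITIVE = 3.50   # US$ per message, Ferris figure quoted above


def handling_cost(spam_delivered: int, false_positives: int) -> float:
    """Direct handling cost in US$ for one mailbox over some period."""
    deletion_cost = spam_delivered * COST_PER_SPAM_DELETED
    recovery_cost = false_positives * COST_PER_FALSE_POSITIVE
    return deletion_cost + recovery_cost


def spam_equivalent_of_one_false_positive() -> float:
    """How many spam deletions cost as much as locating one missing true e-mail."""
    return COST_PER_FALSE_POSITIVE / COST_PER_SPAM_DELETED


# Hypothetical volumes: 1,000 spam messages reach the in-box, 10 true e-mails are blocked.
example_cost = handling_cost(spam_delivered=1000, false_positives=10)
```

On these figures, recovering a single false positive costs about as much as deleting 87 or so spam messages, which is why a filter tuned too aggressively can cost more than the spam it stops.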
To minimize the false positives caused by spam filters, it helps to know a bit about how they work. To keep up with ever more sophisticated spam, filters have employed a variety of techniques over the years, often in combination with one another. Here is a bird's-eye view of some popular techniques, in rough chronological order:
Keyword-based and Bayesian filters
The earliest filters searched a subject line and message body for particular words, such as "Viagra" or "online pharmacy." More sophisticated versions employ Bayesian analyses, which combine keyword searches with techniques such as determining ratios of "good" to "bad" words and assigning probability scores based on these ratios.
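Both approaches can be sketched in a few lines of Python. The first function is a plain keyword match using the two phrases mentioned above. The class is a toy naive Bayes scorer with add-one smoothing, which is one common way to turn "good" and "bad" word counts into a probability; it is an assumption about how such a filter might work, not the method of any specific product. All names and training messages here are hypothetical.

```python
import math
import re
from collections import Counter

BLOCKED_PHRASES = ("viagra", "online pharmacy")  # the examples from the text


def keyword_filter(subject: str, body: str) -> bool:
    """Return True if any blocked phrase appears in the subject or body."""
    text = f"{subject} {body}".lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


def tokenize(text: str) -> list:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesScorer:
    """Toy word-frequency scorer; the caller supplies labeled training messages."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record the words of one message labeled 'spam' or 'ham' (true e-mail)."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate the probability that `text` is spam."""
        if min(self.message_counts.values()) == 0:
            raise ValueError("train on at least one spam and one ham message first")
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Start from the prior: the share of training messages with this label.
            score = math.log(self.message_counts[label] / total_messages)
            for word in tokenize(text):
                # Add-one smoothing; the extra +1 is a slot for never-seen words.
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            log_score[label] = score
        # Convert the two log scores into a probability; the cap avoids overflow.
        diff = min(log_score["ham"] - log_score["spam"], 700.0)
        return 1 / (1 + math.exp(diff))


scorer = NaiveBayesScorer()
scorer.train("cheap viagra online pharmacy", "spam")
scorer.train("buy cheap pills now", "spam")
scorer.train("meeting agenda for monday", "ham")
scorer.train("request for proposal attached", "ham")
```

A real filter would train on thousands of messages and pick a score threshold; the lower the threshold, the more spam is caught and the more true e-mail risks being flagged as a false positive.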
Challenge-response filters
Unrecognized senders receive a reply asking them to validate themselves by supplying the letters and other characters that appear in images onscreen, a technique also known as CAPTCHA (completely automated public Turing test to tell computers and humans apart). This test is based on the idea that humans can detect and input certain patterns, while computers are unable to do so. Once a sender has been validated, that sender's e-mail messages are sent straight through without the challenge step.
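The validation flow described above can be sketched as a small state machine: hold mail from unknown senders, issue a challenge, and deliver directly once the sender has answered correctly. The class and method names below are hypothetical. A real system would render the code as a distorted image for the CAPTCHA step; this sketch returns it as plain text to stay self-contained.

```python
import secrets
import string


class ChallengeResponseGate:
    """Hold mail from unknown senders until they answer a challenge."""

    def __init__(self) -> None:
        self.validated = set()  # senders that have already passed a challenge
        self.pending = {}       # sender -> (expected code, held messages)

    def receive(self, sender: str, message: str) -> tuple:
        """Return ("deliver", message) or ("challenge", code)."""
        if sender in self.validated:
            return ("deliver", message)  # validated senders skip the challenge
        if sender not in self.pending:
            alphabet = string.ascii_uppercase + string.digits
            code = "".join(secrets.choice(alphabet) for _ in range(6))
            self.pending[sender] = (code, [])
        code, held = self.pending[sender]
        held.append(message)  # hold the message until the sender is validated
        return ("challenge", code)

    def respond(self, sender: str, answer: str) -> list:
        """Release held messages if the answer matches; otherwise return []."""
        if sender not in self.pending:
            return []
        code, held = self.pending[sender]
        if answer.strip().upper() != code:
            return []  # wrong answer: messages stay held
        del self.pending[sender]
        self.validated.add(sender)
        return held


gate = ChallengeResponseGate()
action, code = gate.receive("new.sender@example.com", "Request for proposal")
released = gate.respond("new.sender@example.com", code)
```

The sketch also shows where this technique can produce false positives: if a legitimate sender never answers the challenge, the held message is never delivered.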