Athenæum

01/15/2004 :: Technologica

Daphnia blue-crested fish cattle
or, Spammers attempt to defeat Bayesian antispam filters
from Wired

Random Acts of Spamness
By Michelle Delio

Story location: http://www.wired.com/news/infostructure/0,1377,61886,00.html

02:00 AM Jan. 13, 2004 PT

"Daphnia blue-crested fish cattle, darkorange fountain moss, beaverwood educating, eyeblinking advancing, dulltuned amazons...."

This is not a failed attempt at free-form prose. It's a snippet of a spam message intended to promote a sexual stimulant, a deliberate crack at sneaking past and spoiling some of the most popular antispam filters.

Antispam experts agreed that this isn't a brand-new technique, but said the addition of potentially filter-foiling gibberish is rapidly becoming a common component of spam.

"I'd say at least half of the spam that I bother to look at now contains a paragraph or two of random blather. Until recently we'd see it in only one or two spams a week at the most," said Anthony Baxter, one of the developers of SpamBayes, a free, open-source Bayesian antispam filter.

"This is yet another escalation of the arms race between spammers and those people who like to have a useful e-mail inbox," Baxter added.

The addition of seemingly nonsensical words is aimed at confusing the antispam filters that incorporate Bayesian analysis techniques, such as SpamBayes and SpamAssassin. These filters examine incoming e-mail messages and calculate the probability that each one is spam based on its contents.

But unlike simple content filters that merely troll text looking for specific words like Nigeria, money and opt, Bayesian spam filters evolve according to each user's needs, analyzing all mail to determine which words and phrases are apt to appear in a user's legitimate e-mail and which are not. This process is called training, and it results in a highly personalized and efficient filtering system.
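
For readers who want to see the idea in miniature, here is a rough sketch of training and scoring in Python. It is a simplified naive Bayes classifier that looks only at the words present in a message. It is not the algorithm SpamBayes or SpamAssassin actually uses, and the class name, tokenizer and smoothing constants are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayesFilter:
    """Toy per-user filter (hypothetical; assumes at least one spam and one ham message)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record which tokens appeared in a message labeled 'spam' or 'ham'."""
        self.messages[label] += 1
        # Count each token once per message, however often it repeats.
        self.counts[label].update(set(tokenize(text)))

    def spam_probability(self, text):
        """Estimate the probability that a message is spam."""
        # Start from the prior odds: how much of the training mail was spam.
        log_odds = math.log(self.messages["spam"] / self.messages["ham"])
        for token in set(tokenize(text)):
            # Add-one smoothing keeps unseen tokens from producing zero probabilities.
            p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))
```

Trained on a handful of one user's messages, a filter like this scores "cheap pills" high only because that user marked such mail as spam. A different user's training would produce different scores, which is the personalization described above.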

By throwing a hundred or so random words rarely used in sales spiels into each e-mail missive, spammers hope to thwart Bayesian filters by making the spam appear to be personal correspondence. Incorporating words that might be used in legitimate e-mails is also intended to poison the checklist the filter uses, forcing it to mark, for example, e-mails with somewhat common words like Amazon and fish as spam indicators.

The strange strings of words, which usually appear at the bottom of spam and sometimes in the subject line, are automatically added by spammers' mass-mailer software, according to Steve Linford of Spamhaus, an antispam advocacy organization.

"This random noise is technically known as a 'hash buster,'" Linford explained. "Hashing" is a technique used by some spam filters to quickly compare incoming mail to known spam.
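
A minimal sketch of why the name fits, assuming a filter that fingerprints each message body with an exact cryptographic hash. The function names and the sample messages are invented for illustration, and real hash-based systems may use fuzzier checksums that tolerate small changes.

```python
import hashlib


def fingerprint(body):
    """Return a SHA-256 digest of the lowercased, whitespace-normalized body."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical database of fingerprints from spam that has already been reported.
known_spam = {fingerprint("Buy our miracle pills today!")}


def is_known_spam(body):
    """Check whether a message matches a previously reported spam exactly."""
    return fingerprint(body) in known_spam


plain = "Buy our miracle   pills TODAY!"
padded = plain + " daphnia blue-crested fish cattle"

print(is_known_spam(plain))   # True: same text after normalization
print(is_known_spam(padded))  # False: the random words change the digest
```

Because every copy of the spam carries different random words, every copy produces a different fingerprint, and an exact-match lookup never fires.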

"Most of the illegal-exploit spammers use hash busters and any other trick they can to get past filters, refusing to accept that people use spam filters because they really don't want spam," Linford added.

Baxter and Linford said that spammers' use of hash busting is definitely on the rise, but such tricks can rarely circumvent a well-trained Bayesian filter.

"To slip past the filters, spam messages need a lot of 'good' words in the hash buster," Baxter explained. "Good words vary a lot by person -- for instance, I would have a lot of computer terms in my e-mail, while a friend of mine uses e-mail to discuss his love of 1960s Corvettes. Words that my filter says are good wouldn't work that well for my friend's e-mail."
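
Baxter's point can be illustrated with a little arithmetic. The sketch below uses the combining rule popularized by Paul Graham's essay "A Plan for Spam"; SpamBayes may combine word scores differently, and every per-word probability here is a made-up number chosen for illustration.

```python
from math import prod


def combine(token_probs):
    """Merge per-word spam probabilities into one score for the whole message."""
    spam_side = prod(token_probs)
    ham_side = prod(1 - p for p in token_probs)
    return spam_side / (spam_side + ham_side)


# Hypothetical scores for the words in the sales pitch itself.
pitch = [0.99, 0.95, 0.90]

# Random words the filter has never seen, assumed to score a neutral 0.5.
neutral_padding = [0.5] * 100

# Words that happen to be common in THIS user's legitimate mail.
good_for_this_user = [0.1] * 5

print(combine(pitch))
print(combine(pitch + neutral_padding))
print(combine(pitch + good_for_this_user))
```

Under these assumptions, a hundred neutral words leave the score essentially where it started, while just five words that this particular user's filter considers "good" drag it well below 0.5. Since the spammer cannot know which words are good for which recipient, most of the padding ends up in the neutral pile.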

Content filters, which just look for specific words, can get hung up on analyzing a torrent of jumbled jargon, but the use of a hash buster in an e-mail is also a prime way of identifying e-mail marketers who are knowingly and deliberately spewing spam, said Linford.

"What spammers probably don't realize is that the mere presence of hash busters screams 'Spam!' and it's impossible for spammers to claim they're not spamming when the spam contains hash busters," Linford said. "Spamhaus sees hash busters as proof a spammer knows he's spamming and is deliberately trying to get past filters, so we actually come down on them harder when they're using hash busters."

And as much as spammers would like to believe that they can cleverly disguise their unsolicited missives, there's just no way to cloak sappy sales pitches.

"Spam is trying to sell you something," Baxter said. "So they still need to include their sales spiel, and they can't put too much garbage in the message or else the people they're trying to reach will not read the message."

Some spammers have started hiding hash busters from consumers by formatting the filter-fouling gibberish in white text on a white background. Users probably won't see it, but the filters will still be able to "read" it.

But it's not hard to filter for that trick, either.
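
As a rough illustration of how such a check might work, the sketch below counts words whose text color is declared white. It assumes a white page background and reasonably well-formed HTML, and it recognizes only a few spellings of white; the class and function names are hypothetical, and a production filter would need to handle far more cases.

```python
import re
from html.parser import HTMLParser

WHITE = {"white", "#fff", "#ffffff"}
VOID_TAGS = {"br", "img", "hr", "meta", "input", "link"}


class HiddenTextFinder(HTMLParser):
    """Count words inside elements whose foreground color is white."""

    def __init__(self):
        super().__init__()
        self.stack = []        # one flag per open tag: is its text white?
        self.hidden_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # these tags never enclose text
        attrs = dict(attrs)
        color = (attrs.get("color") or "").strip().lower()
        style = (attrs.get("style") or "").lower()
        # Match "color: ..." but not "background-color: ...".
        match = re.search(r"(?<![-\w])color\s*:\s*([^;]+)", style)
        if match:
            color = match.group(1).strip()
        inherited = self.stack[-1] if self.stack else False
        # An explicit color overrides the parent; otherwise inherit it.
        self.stack.append(color in WHITE if color else inherited)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1]:
            self.hidden_words += len(data.split())


def count_hidden_words(html):
    """Return how many words in the HTML are rendered in white text."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_words
```

A filter could treat any nonzero count as one more piece of evidence against the message, since legitimate senders rarely have a reason to hide text from the reader.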

"In the end spammers who use hash busters are just making it easier for filters to spot spam," said Suresh Ramasubramanian, manager of security and antispam operations for Outblaze, a Hong Kong-based provider of e-mail and messaging solutions. "You just train your Bayesian filters to look for the presence of white noise, and treat that as a sure sign that the message is spam.

"Happily, spammers are sometimes a bit too clever for their own good."