
 «  2004/07/05  »   1716

spam 

Spam's Plan For Defeating Bayesian Filtering

The first thing that sticks out in my mind about Ed Felten's article is that he begins by saying that Bayesian filters are "trained by the bad guys". That is true, but it is not the whole truth: Bayesian filters learn from both good email (ham) and spam.

The key to Bayesian filtering's success is that everyone's e-mail is different. While tokens signifying spam don't vary much between users, those signifying useful e-mail do. For example, they may include the names of a user's friends and family members, or technical terms related to a particular profession. To get around a customized Bayesian filter, a spammer must customize a message for every user, and by definition, spam isn't customized.
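To make the per-user point concrete, here is a minimal sketch in Python. It is not any particular filter's implementation: the function names, the add-one smoothing, the equal-priors assumption, and the toy messages are all made up for illustration. It scores a single token for two hypothetical users who receive the same spam but very different ham.

```python
from collections import Counter

def train(messages):
    """Count, per class, how many messages contain each token."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = {"spam": 0, "ham": 0}
    for label, text in messages:
        totals[label] += 1
        counts[label].update(set(text.lower().split()))
    return counts, totals

def token_spamminess(token, counts, totals):
    """Estimate P(spam | token) with add-one smoothing and equal priors (assumptions)."""
    p_spam = (counts["spam"][token] + 1) / (totals["spam"] + 2)
    p_ham = (counts["ham"][token] + 1) / (totals["ham"] + 2)
    return p_spam / (p_spam + p_ham)

# Both hypothetical users receive the same spam...
shared_spam = [
    ("spam", "cheap mortgage rates click here"),
    ("spam", "refinance mortgage click now"),
]
# ...but their legitimate mail looks nothing alike.
programmer = shared_spam + [
    ("ham", "compiler patch from alice"),
    ("ham", "alice lunch friday"),
]
broker = shared_spam + [
    ("ham", "mortgage paperwork for client"),
    ("ham", "client mortgage closing friday"),
]

for name, mail in (("programmer", programmer), ("broker", broker)):
    counts, totals = train(mail)
    print(name, round(token_spamminess("mortgage", counts, totals), 2))
```

In this toy data "mortgage" is a strong spam signal for the programmer but only a neutral token for the broker, whose real mail is full of it. A spammer writing one message for both has no way to know which tokens each filter treats as safe.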


The idea of poisoning a Bayesian filter doesn't work, simply because the filter adjusts its token ratings: it decreases the weight of tokens that occur in both spam and ham and increases the weight of the tokens that differ. Poisoning will, in essence, make the filter "tighter" (a smaller set of distinguishing tokens), but no less effective (as those tokens will be weighted all the more heavily).
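The effect can be sketched with a toy naive Bayes scorer. Everything here is an assumption made for illustration: the helper names, the add-one smoothing, the equal priors, and the tiny hand-written corpus. Real filters tokenize and combine scores in more elaborate ways. The sketch trains once on ordinary mail and once more after the user has also marked some "poisoned" spam (spam padded with innocent-looking words) as spam.

```python
import math
from collections import Counter

def train(messages):
    """Count, per class, how many messages contain each token."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = {"spam": 0, "ham": 0}
    for label, text in messages:
        totals[label] += 1
        counts[label].update(set(text.lower().split()))
    return counts, totals

def spamminess(token, counts, totals):
    """Estimate P(spam | token) with add-one smoothing and equal priors (assumptions)."""
    p_spam = (counts["spam"][token] + 1) / (totals["spam"] + 2)
    p_ham = (counts["ham"][token] + 1) / (totals["ham"] + 2)
    return p_spam / (p_spam + p_ham)

def spam_score(text, counts, totals):
    """Combine per-token scores in log-odds space; above 0.5 leans spam."""
    log_odds = 0.0
    for token in set(text.lower().split()):
        p = spamminess(token, counts, totals)
        log_odds += math.log(p) - math.log(1 - p)
    return 1 / (1 + math.exp(-log_odds))

ham = [
    ("ham", "lunch friday with alice"),
    ("ham", "alice sent the compiler patch"),
    ("ham", "meeting friday about the patch"),
]
spam = [
    ("spam", "cheap pills click here"),
    ("spam", "click here for cheap watches"),
]
# Spam padded with ordinary words, which the user still marks as spam.
poison = [
    ("spam", "cheap pills click here lunch friday meeting"),
    ("spam", "click here cheap watches friday lunch"),
]

before = train(ham + spam)
after = train(ham + spam + poison)

for token in ("friday", "alice", "click"):
    print(token, round(spamminess(token, *before), 3), round(spamminess(token, *after), 3))

print("poisoned spam:", round(spam_score("cheap pills click here friday lunch", *after), 3))
print("real ham:", round(spam_score("lunch friday with alice", *after), 3))
```

In this toy run the padded word "friday" drifts toward neutral, while "alice" (which the spammer never guessed) becomes a stronger ham signal and "click" a stronger spam signal. The poisoned message still scores as spam and the real message still scores as ham, which is the "tighter but no less effective" behaviour described above.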

If the spammers were really smart, they would monitor users' emails and build custom spam for each individual. That is not an easy task, given that spam is sent out by the millions. It's made even more hazardous by the wrath it would provoke among indignant users, companies, and nations. Even if they could do it, spammers would still have to find a way to insert their HTML and/or URLs without tripping the Bayesian filter.

I guess I'm saying that while it IS possible to fool Bayesian filtering, you can't do so in a practical manner, and especially not in the manner that spammers are currently trying.

Spammers are our enemy, and our enemy will show us where we're weak. But they have yet to convincingly beat any Bayesian filter I've used (keeping in mind that I get several hundred spams a day).


tenshi
There's ham now? No one wants spam now there's ham
