World’s fastest spam filter (awesome replies)

April 13, 2008

Years ago, insulting someone on the other side of the globe was difficult, and best left to diplomats and comedians.  But no more.  Such is the power of the internet.

I feel bad that I was busy this week, and didn’t approve many of the comments from my last blog entry until this morning.  What’s important is that they’re all approved now.  These comments are especially awesome – when I write that I don’t understand a group of languages or read certain alphabets, I love it that I’m insulted in those same languages/alphabets.  That’s quality – keep up the good work.

Almost as awesome were the commenters who (apparently) thought my post was about only allowing English (or specifically the Latin alphabet) for email/communication.  That’s nowhere near what I was saying; please reread.

Recap: If a user in $BIG_COMPANY_WEBMAIL sets a language preference, all of that user’s outgoing emails are written in a certain alphabet, and 100% of the incoming emails in different alphabets are marked as spam, at some point an algorithm should take note.  This isn’t only important for the quality of the individual user experience, but also for group filtering. Example:  If a user in Cairo doesn’t understand Hebrew, and gets an email with Hebrew characters, not only might a filter pick that up, but once it’s flagged it only makes sense to check how many other users got the same email content, and reflag it for them too.  Hence my post title: For Wade, only checking for certain Unicode characters would give a Bayesian probability of 100%.  That’s fast.  If anyone knows how to do this in Gmail, let me know.  I’m certainly no email administrator, but it seems like that would be a simple and effective means to identify spam.
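
For the curious, here is a rough Python sketch of the idea in the recap.  It is not how Gmail or any real webmail filter works, as far as I know; it’s just an illustration.  The function names (script_counts, looks_foreign, recipients_of_same_message), the 0.5 threshold, and the mailboxes dictionary are all made up for this example.  It also cheats by using the first word of each character’s Unicode name ("HEBREW", "LATIN", "ARABIC") as a stand-in for the script, which is a heuristic and not a proper Unicode script lookup.

```python
import hashlib
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters per script, using the first word of the Unicode
    character name (e.g. "HEBREW", "LATIN") as a rough script label."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts


def looks_foreign(text, known_scripts, threshold=0.5):
    """Return True when at least `threshold` of the letters belong to
    scripts the user has never written in (hypothetical rule)."""
    counts = script_counts(text)
    total = sum(counts.values())
    if total == 0:
        return False  # no letters at all, nothing to judge
    foreign = sum(n for script, n in counts.items() if script not in known_scripts)
    return foreign / total >= threshold


def fingerprint(body):
    """Hash the body after collapsing whitespace, so trivially
    reformatted copies of the same message still match."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def recipients_of_same_message(mailboxes, flagged_body):
    """Given {user: [bodies]}, list the users who received the flagged
    message, as candidates for the group reflag step."""
    digest = fingerprint(flagged_body)
    return sorted(
        user
        for user, bodies in mailboxes.items()
        if any(fingerprint(b) == digest for b in bodies)
    )


if __name__ == "__main__":
    hebrew = "\u05e9\u05dc\u05d5\u05dd"  # four Hebrew letters
    print(looks_foreign(hebrew, {"ARABIC", "LATIN"}))
    print(recipients_of_same_message({"a": [hebrew], "b": ["hi"]}, hebrew))
```

In practice the set of known scripts would come from the user’s language preference and outgoing mail, and a real filter would treat the result as one signal among many instead of an automatic verdict.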

My only concern with the comments I received?  Either there was too much slang for babelfish.altavista, or some people weren’t as creative with insults as I had hoped.  Could my comment forum be used to improve the quality of translation tools?  Let’s hope.  If you’re especially proud of an insult in my blog but are afraid your creativity will be lost, make sure to let Google know.  In the meantime, I’ll look for more imaginative insults in languages I don’t understand, and I’ll continue to look for Unicode/UTF filtering options for Gmail.


