Inside Out Outside In

Stopping spam is like trying to stop a thief.

Spammers, like thieves, are generally opportunistic by nature. Email is free, so they spam. Thieves steal if they think they can get away with it. Drive by a big electronics store and look at the front. It used to be that all you would see were iron bars protecting the windows, until the thieves learned it was easy to steal a car and just drive it through the window for a smash and grab. Now you see big cement columns.

Beyond the obviously escalating security measures, the main idea is to make stealing from you so unattractive that the thieves choose to steal from somebody else. Actually nabbing the offenders requires a bit more effort. Cameras help, but to really stop them, you have to catch them in the act. This requires on-site enforcement, which can be costly.

The same goes for spam. As coders, we have an obligation to prevent the opportunistic spammer from abusing the systems we create, such as blog engines. There are many ways to accomplish this, mostly by attacking the core nature of the spammer, which means you first have to understand your adversary.

The following steps, in escalating order, help direct the spammer's attention elsewhere.

  • Eliminate automated system attacks
    Spammers use bot networks because it's easy and requires little effort. A wide enough spam campaign using bots can reach enough of an audience to make money. This is where the Captcha comes in. A Captcha is a distorted image containing a pass phrase which must be entered. The strongest Captcha is the one that can't be deciphered by machine, which also means it's sometimes hard for a human to read as well. A variant of the Captcha is a fill-in-the-blank approach using a phrase that your target audience can identify, or a multiple choice question such as "Who's buried in Grant's Tomb?". Both Captchas and the variants must also be randomized to prevent automated systems from adapting to them.
  • Eliminate Volume
    This is really part of the automated system attack, but it deserves its own step: if you have a blog, it would be pretty unusual for anyone to post 100 comments in an hour. Set a threshold so that a specific user can only send n comments in a given period. This can be accomplished by checking the IP address of the submitter (except for wide area bot attacks; see the next step).
  • Eliminate Anonymity
    Spammers thrive on not being found out. Requiring a validation step, where an email is sent which must be replied to, attacks this aspect of the spammer. There are many levels of validation, including a simple reply, a formal approval request, or even submission of documents of some type. Also in this category is the "Pay to Play" idea, where you have to be a paying member or donate to your favorite charity. PayPal is great for this.
  • The Gate Keeper
    The final step is the most resource intensive: it's YOU! Moderate the comments and require approval of all submissions before they are posted or re-broadcast. This can be a little easier if you have trusted friends who can be set up as moderators.
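
To make the volume and anonymity steps above a little more concrete, here is a rough Python sketch. It is only an illustration, not code from any particular blog engine: the names CommentThrottle and PendingComments, the default limit of 5 comments per hour, and the use of a confirmation token in place of a literal email reply are all assumptions made for the example.

```python
import secrets
import time
from collections import defaultdict, deque


class CommentThrottle:
    """Hypothetical volume limiter: at most max_comments per IP per window."""

    def __init__(self, max_comments=5, window_seconds=3600):
        self.max_comments = max_comments
        self.window_seconds = window_seconds
        # Maps an IP address to the timestamps of its accepted comments.
        self._history = defaultdict(deque)

    def allow(self, ip, now=None):
        """Return True and record the comment if this IP is under its limit."""
        now = time.time() if now is None else now
        stamps = self._history[ip]
        # Forget comments that have aged out of the sliding window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_comments:
            return False
        stamps.append(now)
        return True


class PendingComments:
    """Hypothetical validation step: hold a comment until a token is confirmed."""

    def __init__(self):
        self._pending = {}   # token -> comment text awaiting confirmation
        self.published = []  # comments whose submitter has been validated

    def submit(self, comment):
        """Store the comment and return the token that would be emailed out."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = comment
        return token

    def confirm(self, token):
        """Publish the comment if the token is known; each token works once."""
        comment = self._pending.pop(token, None)
        if comment is None:
            return False
        self.published.append(comment)
        return True
```

A submission handler would call allow() with the submitter's IP first, and only then hand the comment to submit() and mail out the token. As noted in the volume step, a per-IP limit does nothing against a wide area bot attack, which is why the validation step sits behind it.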

Comments (Comment Moderation is enabled. Your comment will not appear until approved.)
Your solutions are really all social ones. Captcha depends on sight, so it fails if your site has to be Section 508 compliant. It's also annoying.

Volume limits are also good except when you have an active site or an active thread, such as Damon's recent one.

Requiring people to sign in is OK unless they just want to post and run. For those people, the act of registering is a pain.

Finally, being a gatekeeper/moderator is OK if you have a small or slow site, but once you get into something big like the CF-Talk list or some of the really active blogs, moderation takes more time than it's worth.

I'm all for a more technical solution. I just talked about flash forms as a mechanism for blocking form spam on Blog of Fusion and will be covering the anonymity point in great detail as well. The bottom-line problem is that any social solution will result in a reduction in your traffic as people get annoyed at one thing or another. I prefer to avoid that if I can.
# Posted By Michael Dinowitz | 8/4/06 5:20 PM
The "your comment will be reviewed" message is a nice gateway, but it's more work for you and annoying to people who expect their posts not only to go through immediately, but also not to be censored, even in theory. Reviews always make one worry about that.
# Posted By Michael Dinowitz | 8/4/06 5:23 PM
Sound advice. It's just a shame that such measures are required, especially as they equally deter passing commenters from getting involved (particularly with blog spam).

Solutions such as http://akismet.com/ are also an option.
# Posted By DannyT | 8/4/06 5:37 PM
Michael,

The variant approach of fill in the blank addresses the Section 508 compliance factor.

Volume limits, when targeted at an individual, should be OK. I'm guessing that 100 posts for an individual in one day is within reason as a threshold limit. These are base ideas. Combinations of these ideas can build a robust, yet friendly system. After a threshold is reached, they could still submit, but you send a message to the user that they have reached their daily maximum threshold and future submissions for this time period will be reviewed.

The post and run still works with validation. It's just posted but will not appear until they check their email. Again, combined with a review process, I can choose to approve a message regardless of validation.

My take on it is if it's worth writing, it's worth being approved. Yes, there is a review worry, but this is not a "Public" forum even though I allow comments. Because of this, there are liabilities involved, actually more so since I have the approval process and run this blog in an editorial capacity.

I'm not going to allow any comments that I would consider libelous or obscene, or that I consider vague attempts at blatant site promotion with nothing to add to my blog, such as "Good Post, visit my site at Viagra.com".

The end goal for me is to make sure that any post to the system comes from a human and not a machine. If I don't stop the junk on the front, I'll have to delete the junk at the end, which can take more time. As machine intelligence for spamming grows, there will always have to be a corresponding growth of technological measures to counter it. Bayesian filtering looks promising. Blacklists work OK except for bot armies.

Until we have that absolute unique identifier for an individual (let's not talk about big brother issues), we'll have to find the middle ground. There are other ways to attack as well, such as better laws that are enforceable at a global level.
# Posted By admin | 8/5/06 12:54 PM