Confirming your address for spam

I just posted an idea to the iMS list and got back an email from a list member asking me to confirm that I'm a human making a post. I rarely respond to these requests, but on a whim I decided to check the domain that was making the request. The check failed: there was no website for the domain the confirmation was supposed to go to. This got me thinking.

A spammer values confirmed email addresses over all others. In the past, this confirmation came from people trying to unsubscribe from a spammer's list. This was quickly seen as a verification ploy and few do this anymore. So how would a spammer go about getting verifications for emails? Easy.

They subscribe to every mailing list they can and then set up a 'confirm' script to capture any list posts. Anyone who posts to the list will get a confirm message and most people just click the confirm without thought. The result is a list of confirmed email addresses that can then be used or sold.

The more I think about this and the more I write, the more I realize that this is a very powerful technique that can easily be automated. Just find a site that tracks mailing lists and set a bot to subscribe to all of the high traffic ones. A bot monitoring Yahoo and Google groups will do the job. I'd say the code to do this from start to finish should take someone less than a day using CF.

Scary

Anti-bot spam with simplicity

Spam-bots are getting smarter. I've seen bots that fill out forms with close to perfect information. I've seen bots that copy blog comments and then add in their spam. I've seen bots that adapt to their situations. What I have not seen are bots that can look at a form like a human does.

Let me give you a situation. House of Fusion has a sign-up form that not only allows for the standard contact information but also allows for multiple alternate email addresses. I've been seeing some spam coming through that has the same value in every email field. This reminded me of something very simple about forms: multiple form fields with the same name become a comma-delimited list on submit.

This means that two input boxes named email that are both filled with "spam@spam.com" will result in a variable of form.email with a value of "spam@spam.com,spam@spam.com". This is wonderful!

All we have to do to block some of these new 'thinking' bots is have a form with two input fields called email. The display for the first will be email while the display for the second will be alternate email.
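
A minimal sketch of such a form is below. The file names signup.cfm and signup_action.cfm and the id values are placeholders of mine, not the actual House of Fusion form.

```cfml
<!--- signup.cfm (hypothetical): both inputs share the name "email";
      only the visible labels differ --->
<form action="signup_action.cfm" method="post">
    <label for="email1">Email</label>
    <input type="text" name="email" id="email1">

    <label for="email2">Alternate email</label>
    <input type="text" name="email" id="email2">

    <input type="submit" value="Sign up">
</form>
```

A human who fills in only the first box, or gives two different addresses, passes; a bot that stuffs the same address into every field named email does not.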

On the action page we just have to check whether form.email has 2 items and, if they are the same, block the post as spam.

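<cfif listlen(form.email) EQ 2 and listfirst(form.email) IS listlast(form.email)>
    <!--- Both email fields hold the same value: treat the post as spam.
          Stopping the request is one option; logging or flagging works too. --->
    <cfabort>
</cfif>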

This works great, but there are always alternatives. When I gave this to Clark Valberg for use on the Developers Circuit contact form, he modified the idea, which led to another modification. The first is to take the second email input and label it as something other than email. Zip code is a perfect example, as it is something not always required. To a human it looks like a standard form with zip and email, but to a bot there's no zip, just 2 emails. The second modification is to exchange form fields: have the email form field labeled zip and the zip form field labeled email. A human will read the label and enter what is expected (zip in the email field, email in the zip field) while a bot will enter the wrong values.
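
Here is a rough sketch of the action-page check for the second modification (swapped labels). It assumes the input named zip is the one labeled "Email" and the input named email is the one labeled "Zip code". The variable names looksLikeBot, visitorEmail and visitorZip are mine, and aborting the request is just one possible response.

```cfml
<!--- Hypothetical action page for the swapped-label form --->
<cfparam name="form.email" default="">
<cfparam name="form.zip" default="">

<!--- A bot that trusts field names puts an address in form.email;
      a human reading the labels puts the address in form.zip. --->
<cfset looksLikeBot = isValid("email", trim(form.email))>

<cfif looksLikeBot>
    <!--- Treat the submission as spam and stop processing --->
    <cfabort>
</cfif>

<!--- For a human, the real values sit in the "wrong" variables --->
<cfset visitorEmail = trim(form.zip)>
<cfset visitorZip = trim(form.email)>
```

The rest of the page should use visitorEmail and visitorZip so the swap stays in one place.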

These are simple checks that can defeat many of the spam-bots out there today, and they all depend on the difference between how a human and a bot see a page. That difference should stay in our favor for a good while.

Modding BlogCFC: Moderated if url is present - Admin Update

There are two more minor tweaks to this anti-spam approach that need to be implemented. The first is to go back into blog.cfc and comment out lines 1489 and 1491. This CFIF limits the recent comments to ones that have moderated set to true (which should be the default). Unless the CFIF is removed, spam comments that have to be moderated will still show up in the recent comments section.

The second minor edit is to the adminlayout, commenting out lines 47 and 49. Now the admin can control the moderated comments.

The more I fix this technique, the more I see I have to do it right. One thing that I really have to look at is those CFIFs that 'comment out' the moderated db value in queries. Every comment has a moderated value whether moderation is turned on or not. The only reason I can find for these CFIFs is to remove one of the SQL criteria and possibly speed up the query by some fraction of a millisecond. Unless there's something I'm missing....

Modding BlogCFC: Moderated if url is present - Update

There is one other step needed to make this work if you're using the hack. There is a CFIF statement in blog.cfc that checks whether moderation is turned on before displaying a comment. It does not matter whether a specific comment is moderated or not: if moderation is not turned on, the code I mentioned in the last post will just not work. The comment will be saved in the DB as being moderated, but it will still show on the site.

If you're using the hack, all you have to do is remove the CFIF tags on lines 1074, 1357, and 1489. Leave the content of each CFIF alone; just remove the tags. Once this is done, moderation will be performed on a per-comment basis. If a comment is set to be moderated, it will not be shown. If a comment is posted with a URL, it will automatically be moderated. Otherwise, comments will be posted as unmoderated and shown. Basically a "mixed moderation mode".

When the code is done for real, these lines should be modified to check if either moderation is on OR if moderationforurlposts is on (or whatever you call the variable).
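
As a rough sketch of that combined check: the helper name filterModeratedComments and the setting names moderate and moderateUrlPosts are placeholders of mine, not actual BlogCFC properties.

```cfml
<!--- Hypothetical helper, not part of BlogCFC: decides whether display
      queries should filter out comments still waiting for approval. --->
<cffunction name="filterModeratedComments" returntype="boolean" output="false">
    <cfargument name="moderate" type="boolean" required="true">
    <cfargument name="moderateUrlPosts" type="boolean" required="true">
    <!--- Filter when either full moderation or URL-only moderation is on --->
    <cfreturn arguments.moderate or arguments.moderateUrlPosts>
</cffunction>

<!--- Usage: keep each CFIF, but test both settings instead of one --->
<cfif filterModeratedComments(moderate=false, moderateUrlPosts=true)>
    <!--- existing contents of the CFIF stay here unchanged --->
</cfif>
```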

Modding BlogCFC: Moderated if url is present

I've been getting some human posted spam to my blog and I want it to stop. The problem is, to stop a human you have to either moderate all posts, force a sign in to post or search the comment for one of a million odd keywords. None of those solutions appeal to me. Time to come up with something new.

Looking at the problem I see that almost all human posted spam involves a link to some outside resource. Ah, a pattern I can work with. If I can set moderation on for any comment that has a link, it would help block most human posted spam, at least in theory.

I can do this right or I can do it as a fast hack. As I don't have the time to do it right at the moment, I'm going to write a single regular expression and add it to the blog.cfc. I'll write myself a note to remember to go back and fix it correctly later.

The solution I came up with is to go to line 236 in the blog.cfc component and add in this line to the cfif statement:

or refindnocase('https?://', arguments.comments)

This will force moderation on for any comment with a link in it. Great for stopping most form spam, great for letting most comments through, terrible for letting real comments with links through.

To do this properly I'd have to set a new variable in the blog.ini file to specify whether I want to moderate based on URLs, then modify the initialization code for BlogCFC to grab that variable along with all of the others. Once that's done, line 236 would be modified to look at the new variable to see if moderation should be on for the post. Rather straightforward.
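
The proper version might look something like the sketch below. Everything named here is an assumption of mine: the moderateurlposts entry, the "Default" section name, the ini path, and the needsModeration helper are illustrations, not existing BlogCFC code. Only getProfileString, isBoolean and reFindNoCase are standard ColdFusion functions.

```cfml
<!--- blog.ini (assumed new entry in the blog's section):
      moderateurlposts=true --->

<!--- Hypothetical init code: read the flag next to the other settings.
      iniFile and blogName stand in for values the init code already has. --->
<cfset iniFile = expandPath("./blog.ini")>
<cfset blogName = "Default">
<cfset rawValue = getProfileString(iniFile, blogName, "moderateurlposts")>

<!--- A missing entry comes back as an empty string, so default to off --->
<cfset moderateUrlPosts = isBoolean(rawValue) and rawValue>

<!--- Hypothetical per-comment decision replacing the hard-coded regex test --->
<cffunction name="needsModeration" returntype="boolean" output="false">
    <cfargument name="moderateAll" type="boolean" required="true">
    <cfargument name="moderateUrlPosts" type="boolean" required="true">
    <cfargument name="commentText" type="string" required="true">
    <!--- Moderate everything, or only comments that contain a link --->
    <cfreturn arguments.moderateAll
        or (arguments.moderateUrlPosts
            and reFindNoCase("https?://", arguments.commentText) gt 0)>
</cffunction>
```

With the flag off and full moderation off, a comment with a link posts straight through, which is the current default behavior.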

One thing I noticed when writing this post is that I never posted my article on my technique to block all bot spam. I know I posted it somewhere, but I guess I never posted it here. I'll add a reminder to myself to do that as well. Oh, for a 48 hour day. :)

BlogCFC was created by Raymond Camden. This blog is running version 5.9.