How to lock down Apache and keep spammers out

A huge problem

If you are a webmaster, or at least own a website, you probably know that bandwidth is expensive. Even more expensive is the time wasted filtering spam out of our mailboxes.

One of the methods spammers prefer for collecting email addresses is to use custom software, called spambots, that downloads the entire content of a website and scans it for strings of characters that resemble email addresses.

In doing so, they also consume our bandwidth, taking network resources away from honest users.

A working approach

If you are lucky enough to run the first-class Apache web server, you can use its mod_rewrite module to try to distinguish spammers from regular users.

This method relies on the identification string (the User-Agent header) that every web client, such as a browser, sends to the server with each request. These strings usually contain information about the browser's vendor and version. Spambots often have distinctive identification strings, which can be used to recognize them and keep them out of our site.

Of course, we cannot avoid false negatives: a spambot that sends the same identification string as a regular browser will not be caught.

mod_rewrite: the URL Swiss Army knife

To use mod_rewrite, we put some configuration directives in the .htaccess file (provided the system administrator has enabled the module globally).
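For readers who administer their own server, the precondition mentioned above might look roughly like the following sketch of the main server configuration. The module path and the /var/www/html directory are assumptions made for illustration and will differ between installations.

```apache
# Main server configuration (httpd.conf or equivalent), not .htaccess.
# The module path below is an assumption; adjust it to your install.
LoadModule rewrite_module modules/mod_rewrite.so

# Hypothetical document root, used here only as an example.
<Directory "/var/www/html">
    # Rewriting in .htaccess requires that symbolic links may be followed
    Options FollowSymLinks
    # FileInfo is the override class that permits mod_rewrite directives in .htaccess
    AllowOverride FileInfo
</Directory>
```

On shared hosting these settings are normally managed by the provider, so only the .htaccess part is in your hands.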

An example configuration:

# enable the rewrite engine
RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} Extractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Telesoft [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.Mozilla/2.01 [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule .* nospam.html [L]

The [OR] flags chain the conditions together, so a match on any one of them is enough. The last line tells Apache to serve the special page nospam.html to every visitor whose user agent (browser) identification string matches one of the regular expressions listed above.
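As a possible variation, not part of the configuration above, the matching requests could be refused outright instead of being served a page. The sketch below reuses two of the user agent names from the list purely as examples. It relies on the standard mod_rewrite flags [NC] (case-insensitive match) and [F] (respond with 403 Forbidden).

```apache
RewriteEngine On

# [NC] makes the match case-insensitive; [OR] chains the conditions
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [NC]
# "-" means no substitution; [F] makes Apache answer 403 Forbidden
RewriteRule .* - [F]
```

The trade-off is that a refused request costs less bandwidth than serving nospam.html, but a human visitor caught by mistake gets no explanation.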
