Sunday, April 23, 2006

Get Rid of Referer Spammers

What is a referrer?

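When a user follows a link from WebSiteA to WebSiteB, the browser tells WebSiteB which page the visitor came from by sending that page's URL in the HTTP Referer header (the header name is a long-standing misspelling of "referrer"). WebSiteB's web server records that URL in its log files.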

Here is a sample Apache logfile entry showing a link from Search Engine Optimization Services to this web site:

195.173.57.160 - - [10/Jun/2004:03:48:56 -0600] "GET / HTTP/1.1" 200 12569 "http://www.search-engine-optimization-services.co.uk/seo-expert.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
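To make the structure of that entry concrete, here is a minimal Python sketch that pulls the referrer out of a line like the one above. It assumes the log uses Apache's "combined" format and that no field contains an escaped double quote; the names COMBINED_LOG and parse_referrer are made up for this illustration.

```python
import re
from urllib.parse import urlsplit

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_referrer(line):
    """Return (referrer URL, referring host) for one log line, or None."""
    match = COMBINED_LOG.match(line.strip())
    if match is None:
        return None  # not a combined-format line
    referer = match.group("referer")
    if referer in ("", "-"):
        return None  # the request carried no referrer
    return referer, urlsplit(referer).hostname


sample = (
    '195.173.57.160 - - [10/Jun/2004:03:48:56 -0600] "GET / HTTP/1.1" 200 12569 '
    '"http://www.search-engine-optimization-services.co.uk/seo-expert.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"'
)
print(parse_referrer(sample))
```

For the sample entry this yields the full referring URL together with the host www.search-engine-optimization-services.co.uk.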

How do webmasters track referrers?

Most webmasters use a web site statistics reporting tool such as AWStats or Webalizer. Some webmasters (either purposefully or unintentionally) make the reports generated by these tools publicly available.

One of the reports generated by most web site statistics reporting tools is a "top referrers" report. This report lists the remote web pages that have sent the most visitors to the web site.
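A "top referrers" report boils down to counting how often each referring URL shows up in the log. The following Python sketch shows the idea; it is not how AWStats or Webalizer are implemented, and it assumes combined-format lines in which the referrer is the second-to-last quoted field. The function name top_referrers is hypothetical.

```python
import re
from collections import Counter

# In a combined-format line the referrer is the second-to-last quoted field.
REFERER_FIELD = re.compile(r'"(?P<referer>[^"]*)" "[^"]*"$')


def top_referrers(lines, limit=10):
    """Return the `limit` most frequent referrer URLs as (url, hits) pairs."""
    counts = Counter()
    for line in lines:
        match = REFERER_FIELD.search(line.strip())
        if match and match.group("referer") not in ("", "-"):
            counts[match.group("referer")] += 1
    return counts.most_common(limit)
```

Calling top_referrers(open("access.log")) on a real log file (the path is only an example) would list the busiest referring pages first. A spammer only needs to inflate this count to appear in the report.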

How are referrers used to spam?

In most web site statistics reporting tools, each top referrer is listed with a link.

That link becomes a backlink to the top referring page.

Some unscrupulous webmasters send phony requests to other web sites, with the Referer header set to their own URL, just to get a spot in the top referrers list.

This technique is used to artificially boost PageRank.

Unfortunately, this technique has several negative consequences:

  • It uses resources (bandwidth, disk space, CPU cycles, etc.) of the "victim" web site.
  • It fills the logfiles of the victim web site with trash, making them useless for statistical analysis.
  • It "steals" backlinks from the actual top referrers.

Preventing referrer spam

The easiest method to avoid becoming a target of referrer spam is to not publish your web site statistics publicly.

If you wish to publish your web site statistics publicly, you can manually ban web sites which use you for referrer spam.

If you have already been a victim of referrer spam, there are tools available to clean the spam entries from your web server log files. One such tool is Referrer Spam Fucker 3000.


From the readme:

REF(errer)SPAM FUCKER 3000
v1.1, April 8th 2004
(c) Carlo Zottmann, [Link]

This script is supposed to parse an Apache log file, look for all domains
that exceed a certain amount of accesses (threshold is set as percentage
of all accesses), normalize/generalize them and then add it to a
.htaccess file in order to block those domains for the future.

Example ruleset generated by this script:

### REFSPAM FUCKER 3000 ### START
SetEnvIfNoCase Referer ".*(intralinx).*" BadReferrer
SetEnvIfNoCase Referer ".*(health-shop).*" BadReferrer
SetEnvIfNoCase Referer ".*(moneyblaster).*" BadReferrer
SetEnvIfNoCase Referer ".*(bartertraffic).*" BadReferrer
# NOBLOCK google
# NOBLOCK vivisimo
# NOBLOCK feedster
order deny,allow
deny from env=BadReferrer
### REFSPAM FUCKER 3000 ### END

Please note that the rules are added to the very top of the .htaccess
file. The file is opened, the lines are appended, then the file is
written again. It doesn't overwrite anything in your .htaccess file.
It still might be a good idea to BACKUP your .htaccess file prior to
running this script.

I run RefSpamFucker3000 every night on my spam-ridden domains, so the
blocking process is fully automated for me. Which is good.

It might not be perfect, but it works for me. YMMV. Also, the script is
not very tidy -- it was a half-an-hour job, and I decided to work like
evolution: it does the job, so it's good enough
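The approach the readme describes (count referring domains, flag those above a percentage threshold, write blocking rules) can be sketched roughly as follows in Python. This is not the script's actual code: the normalization step (drop a leading "www." and keep the first label of the host name), the 5% default and all function names are assumptions made for this illustration, and the output simply mimics the example ruleset above.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# In a combined-format line the referrer is the second-to-last quoted field.
REFERER_FIELD = re.compile(r'"(?P<referer>[^"]*)" "[^"]*"$')
SAFE_KEYWORD = re.compile(r"^[a-z0-9-]+$")  # keep only plain host labels


def referring_hosts(lines):
    """Yield the lower-cased host name of every referrer found in the log."""
    for line in lines:
        match = REFERER_FIELD.search(line.strip())
        if not match:
            continue
        try:
            host = urlsplit(match.group("referer")).hostname
        except ValueError:  # malformed URL in the Referer header
            continue
        if host:
            yield host.lower()


def keyword(host):
    """Assumed normalization: drop a leading 'www.', keep the first label."""
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0]


def suspicious_keywords(lines, threshold_percent=5.0, noblock=("google",)):
    """Return keywords whose share of all accesses exceeds the threshold."""
    lines = list(lines)
    counts = Counter(keyword(h) for h in referring_hosts(lines))
    limit = len(lines) * threshold_percent / 100.0
    return sorted(
        k for k, n in counts.items()
        if n > limit and k not in noblock and SAFE_KEYWORD.match(k)
    )


def render_rules(keywords):
    """Render .htaccess lines in the style of the example ruleset."""
    rules = ['SetEnvIfNoCase Referer ".*(%s).*" BadReferrer' % k for k in keywords]
    return rules + ["order deny,allow", "deny from env=BadReferrer"]
```

The noblock parameter plays the role of the NOBLOCK comments in the example ruleset: a legitimate referrer such as a search engine can easily exceed the threshold and should never end up in the deny list.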


You can also try astatspam.

It is a similar script, but it has some extra features, and it connects to a daily updated database of spam domains.

The URL is http://www.thetopsites.net/referer_spam/
