Understanding Your Web Analytics – Bad ROBOTs
Posted Thursday Apr 10 2008 | Anne Haynes | Analytics |
by: Queen of Search
I’ve been reading the WebTrends data for one of my clients, and I keep finding robots all over the hit results. I’m working with the engineer to create filters and custom reports, but it’s insane how much traffic is coming from robots. It seems as if none of the visits going to the site are from real people. To tackle this problem, I’ve decided to update my education and conduct some Google research.
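To give a feel for what such a filter does, here is a minimal Python sketch that splits hits into probable humans and probable robots by looking at the user-agent string. The marker words, the sample hits and the function names are all my own assumptions for illustration. This is not how WebTrends does it, and a real filter would need a much longer, maintained list.

```python
import re

# Hypothetical marker words that often show up in crawler user-agent strings.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp|harvest", re.IGNORECASE)


def is_probable_robot(user_agent: str) -> bool:
    """Treat an empty user agent, or one containing a marker word, as a robot."""
    return not user_agent.strip() or bool(BOT_PATTERN.search(user_agent))


def split_hits(hits):
    """Separate (path, user_agent) pairs into human hits and robot hits."""
    humans, robots = [], []
    for path, user_agent in hits:
        target = robots if is_probable_robot(user_agent) else humans
        target.append((path, user_agent))
    return humans, robots


# Made-up sample data, not taken from any real log.
sample_hits = [
    ("/index.html", "Mozilla/5.0 (Windows NT 6.0) Firefox/2.0"),
    ("/contact.html", "EmailHarvester/1.0"),
    ("/about.html", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("/index.html", ""),
]

humans, robots = split_hits(sample_hits)
print(len(humans), "human hits,", len(robots), "robot hits")
```

Keep in mind that a bad robot can fake its user agent, so a check like this only catches the robots that identify themselves.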
Some of these robots are designed to scrape email addresses and harvest them for spamming later. Most of the harvested addresses end up in rented email lists. If you’ve ever purchased an email list, you know there is a huge variance between prices. Now you know where the cheap list providers get their email addresses: from the bad bots.
Last month an article came out naming the six web robots responsible for 85% of the email spam.
Other robots are designed to copy entire sites. Simply put, these robots are not real visitors, so their traffic adds no value.
There are a few ways to prevent web bots from accessing your website: change your .htaccess file and/or architect a high-tech robots.txt file. Well-behaved web crawlers read the robots.txt file and use it to know “where to go”. Bad robots read it too, but for the opposite reason. These are the devious web crawlers: they use the robots.txt file to find the “don’t crawl” files and then go straight to them.
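Here is a small Python sketch of that difference, using the standard library’s urllib.robotparser. The robots.txt content, the bot name and the paths are hypothetical. The polite crawler asks the parser before fetching anything, while the devious one simply reads the Disallow lines as a list of places to visit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /admin/
"""

lines = ROBOTS_TXT.splitlines()

# A polite crawler parses the rules and checks each URL before fetching it.
parser = RobotFileParser()
parser.parse(lines)
polite_may_fetch_private = parser.can_fetch("PoliteBot", "/private/report.html")
polite_may_fetch_home = parser.can_fetch("PoliteBot", "/index.html")

# A devious crawler ignores the rules and treats the Disallow lines as a map.
bad_bot_targets = [
    line.split(":", 1)[1].strip()
    for line in lines
    if line.lower().startswith("disallow:")
]

print("Polite bot may fetch /private/report.html:", polite_may_fetch_private)
print("Polite bot may fetch /index.html:", polite_may_fetch_home)
print("Bad bot heads for:", bad_bot_targets)
```

The takeaway is that robots.txt is a request, not a lock. Anything that truly must stay private needs server-side protection as well.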
I’ve heard horror stories about people creating the wrong type of robots.txt file. One person accidentally reversed the meaning of the robots.txt file. In other words, he or she listed as disallowed all the directories that the search engine spiders were supposed to index.
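One way to catch this kind of mistake before it goes live is to test the file. Below is a rough Python sketch, again with urllib.robotparser, that compares a reversed file against the intended one. The directory names, the spider name and the allowed helper are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str) -> bool:
    """Return True if a rule-following spider may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("ExampleSpider", path)


# The mistake: directories meant for indexing are listed under Disallow.
REVERSED = "User-agent: *\nDisallow: /products/\nDisallow: /articles/\n"

# The intent: only the directory that should stay out of the index is listed.
INTENDED = "User-agent: *\nDisallow: /cgi-bin/\n"

for label, text in (("reversed", REVERSED), ("intended", INTENDED)):
    print(label, "file lets spiders index /products/widget.html:",
          allowed(text, "/products/widget.html"))
```

With the reversed file, the pages you most want in the search results are exactly the ones a well-behaved spider will skip.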
I found this old but great resource when searching for solutions: How to keep bad robots, spiders and web crawlers away
If you continue to have problems, just write a letter to the spambots. Think Artificially.org wrote a funny blog post that’s a good read.
When conducting my research, I had to go to Wikipedia.org and check out their robots.txt commands. Notice how they speak to the bad web crawlers, telling them to slow down or they’re out. I also like the comment, “Friendly, low-speed bots are welcome viewing article pages, but not dynamically-generated pages please.”
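To show how a “slow down or you’re out” policy looks to a crawler, here is a hedged Python sketch. The robots.txt text below is my own hypothetical example written in that spirit, not a copy of Wikipedia’s file, and the bot names are invented. It bans one fast bot outright, asks everyone else to wait between requests, and keeps them away from a directory of dynamically generated pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one banned bot, a delay and one off-limits directory for the rest.
ROBOTS_TXT = """\
User-agent: HypotheticalFastBot
Disallow: /

User-agent: *
Crawl-delay: 5
Disallow: /w/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The banned bot is shut out of everything.
fast_bot_allowed = parser.can_fetch("HypotheticalFastBot", "/wiki/Main_Page")

# A friendly bot may read article pages but not the dynamic ones.
friendly_article = parser.can_fetch("FriendlyBot", "/wiki/Main_Page")
friendly_dynamic = parser.can_fetch("FriendlyBot", "/w/index.php")

# Seconds the friendly bot is asked to wait between requests.
friendly_delay = parser.crawl_delay("FriendlyBot")

print(fast_bot_allowed, friendly_article, friendly_dynamic, friendly_delay)
```

Crawl-delay is not part of the original robots.txt convention and not every crawler honors it, so treat it as a polite request rather than a guarantee.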
When all is said and done, study, learn and test to find out what works for you and your website.
