ZboX
Hi Xena,

When a site changes its content, that usually generates a lot of 404 errors. The problem is that some search engines keep the now-defunct pages and images indexed for seemingly forever, and there's not much you can do about that. What you can do is check the IP numbers that are requesting the missing pages. If the same IPs show up over and over, you can of course deny them. The best tool I've found for investigating IP numbers is:
http://centralops.net
The CentralOps "Domain Dossier" will give you a wealth of info about an IP number, if any is available.
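To find the IPs that keep hitting missing pages, a small script can tally the 404s in your access log. This is only a rough sketch: it assumes the log is in the usual Apache Common/Combined Log Format, and the function names and the threshold of 20 are made up for illustration, so adjust them to your own setup.

```python
import re
from collections import Counter

# Assumed layout (Common/Combined Log Format):
# IP ident user [timestamp] "request" status size ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


def count_404s_by_ip(lines):
    """Return a Counter mapping each client IP to its number of 404 responses."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2) == "404":
            hits[match.group(1)] += 1
    return hits


def repeat_offenders(lines, threshold=20):
    """List (ip, count) pairs with at least `threshold` 404s, busiest first."""
    counts = count_404s_by_ip(lines)
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]
```

You would feed it the lines of your log file, for example `repeat_offenders(open("access.log"))`, where the file name is a placeholder. Anything it reports is a candidate to look up before you decide whether to deny it.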
Another thing that can help is a robots.txt file. It should be placed in your root public_html directory. Here's an example of a robots.txt file:

User-agent: *
Disallow: /image1/
Disallow: /image2/
Disallow: /image3/
Disallow: /board/Smileys/

User-Agent: OmniExplorer_Bot
Disallow: /

User-Agent: RufusBot
Disallow: /

User-Agent: Gigabot
Disallow: /

In this example I am not allowing spiders to crawl my image directories. I'm also refusing them access to my forum's Smileys directory. And I am slamming the door totally on OmniExplorer, RufusBot, and Gigabot, as they are known ill-mannered bots.

However, a robots.txt file will not keep out evil bots. They just blow right past it. For those, IP deny is your friend.
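If you want to check what a well-behaved crawler would make of rules like these before uploading them, Python's standard `urllib.robotparser` module can read the text directly. This is a minimal sketch: "Googlebot" and the sample paths are just illustrative picks, and it only shows how a polite bot would interpret the file, not what a rude one will actually do.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this post, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /image1/
Disallow: /image2/
Disallow: /image3/
Disallow: /board/Smileys/

User-Agent: OmniExplorer_Bot
Disallow: /

User-Agent: RufusBot
Disallow: /

User-Agent: Gigabot
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical agent/path pairs to try against the rules.
checks = [
    ("Googlebot", "/board/index.php"),
    ("Googlebot", "/image1/photo.gif"),
    ("Googlebot", "/board/Smileys/grin.gif"),
    ("Gigabot", "/index.html"),
]
for agent, path in checks:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(f"{agent:10} {path:26} {verdict}")
```

With these rules, only the first check should come back as allowed.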
~;-)
Bert