Our web scraper has to respect the rules found in a website's robots.txt file. Beyond simple courtesy, one of the main reasons for this is that scrapers which ignore robots.txt can find themselves blacklisted by a honeypot service.
These services use robots.txt to tell scrapers not to visit a particular URL that is linked to from the website but disallowed in the file. If a scraper visits that URL anyway, its IP address is blacklisted, preventing it from accessing the website in the future.
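As a minimal sketch of how a scraper can honour these rules, the snippet below uses Python's standard-library `urllib.robotparser` to check whether a URL may be fetched before requesting it. The site, paths, and user-agent string are placeholders for illustration only.

```python
# Sketch: check robots.txt before fetching a URL, using Python's
# standard-library urllib.robotparser. URLs and user agent are placeholders.
from urllib.robotparser import RobotFileParser

USER_AGENT = "MyScraper/1.0"  # hypothetical user-agent string

def is_allowed(url: str, robots_url: str) -> bool:
    """Return True if robots.txt permits USER_AGENT to fetch `url`."""
    parser = RobotFileParser()
    parser.set_url(robots_url)
    parser.read()  # download and parse the robots.txt file
    return parser.can_fetch(USER_AGENT, url)

if __name__ == "__main__":
    # A disallowed path (for example, a hidden honeypot link) would return
    # False here, and a polite scraper would skip it rather than risk
    # having its IP address blacklisted.
    target = "https://example.com/some/page"
    robots = "https://example.com/robots.txt"
    if is_allowed(target, robots):
        print("Allowed to fetch:", target)
    else:
        print("Disallowed by robots.txt, skipping:", target)
```

In practice the parsed robots.txt would be cached per host so the scraper does not re-download it for every request.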