Robots.txt
September 17th, 2007 Posted in Search Engine Optimization
Robots are programs that wander the web automatically, performing functions and gathering information. They are also known as crawlers, spiders, or web wanderers, and sometimes as web worms or web ants. “Robot” is the generic name; “spider” is the term the press tends to prefer because it sounds more impressive. Crawlers are robots that perform a specific function. Web worms are usually self-replicating programs, and web ants are distributed, cooperating robots. Robots do not physically visit sites; they simply request documents from a site and then follow the links in those documents to request more.
Robots perform many functions, but the most common are indexing, mirroring, HTML validation, “What’s New” monitoring, and link validation. Most people are familiar with the robots that perform indexing. Because computers can collect and categorize large amounts of data, human analysis can be put to more effective use.
Each robot is built to perform a particular function automatically. Which pages a robot accesses depends on the type of robot and on who operates it. To get a starting list of sites to visit, robots may use a historical list of URLs, server lists, or popular web sites. You can also submit a list of the pages that you want a robot to crawl.
By reviewing your web server log, you can determine whether your site has been visited by a robot. If robot visits are something you want, you need do nothing further: once a robot has visited your web site, the rest is automatic. The robot schedules its own revisits, looking for anything that has changed. The material retrieved from your website is stored on the search engine’s site, where it is used to perform the specific function for which the robot was created.
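As a rough illustration of reviewing a log for robot visits, here is a minimal Python sketch. It assumes the server writes the common "combined" log format, and the sample log lines, the ExampleBot name, and the list of user-agent hints are all made up for this example; real logs and real robots will vary.

```python
import re

# Assumed layout: the "combined" access log format, i.e.
# host ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical substrings that often appear in robot user-agent strings.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def find_robot_visits(lines):
    """Return (host, user_agent, path) for log lines that look like robot visits."""
    visits = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent")
        path = match.group("path")
        looks_like_robot = any(hint in agent.lower() for hint in ROBOT_HINTS)
        # A request for /robots.txt is itself a strong sign of a robot.
        if looks_like_robot or path == "/robots.txt":
            visits.append((match.group("host"), agent, path))
    return visits


# Made-up sample log lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [17/Sep/2007:10:00:00 +0000] "GET /robots.txt HTTP/1.1" '
    '200 68 "-" "ExampleBot/1.0 (+http://www.example.com/bot.html)"',
    '192.0.2.20 - - [17/Sep/2007:10:05:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
]

for host, agent, path in find_robot_visits(SAMPLE_LOG):
    print(host, agent, path)
```

Matching on the user-agent string is only a heuristic, since a robot can identify itself however it likes; a request for /robots.txt is usually the more reliable signal.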
If you have pages that you don’t want robots to access, you can exclude them with a robots.txt file placed at the root of your site. If you can’t create a robots.txt file, you can instead insert a robots META tag into the HTML source of each page to achieve a similar result. Excluding robots from any or all pages on your website is at best a ‘Do not enter’ sign rather than a locked door, because compliance is voluntary. There are forums that have been set up to discuss the various issues and problems that come up when dealing with robots.txt.
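To make the exclusion file concrete, here is a small Python sketch that holds a sample robots.txt in a string and checks it with the standard library's urllib.robotparser. The directory names, the www.example.com URLs, and the ExampleBot user agent are placeholders chosen for this example, not recommendations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. On a real site this text would be
# served as a plain file at http://www.example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url, user_agent="ExampleBot"):
    """Return True if the sample rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("http://www.example.com/index.html"))
print(is_allowed("http://www.example.com/private/report.html"))
```

The HTML alternative mentioned above is a tag such as <meta name="robots" content="noindex, nofollow"> placed in the head of an individual page. Either way, the rules only work for robots that choose to check them.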
Robots have several disadvantages. The first is the strain they place on bandwidth. Particularly where users have low bandwidth quotas, a ‘rapid fire’ robot, one that requests many pages in quick succession, can overwhelm the network. Robots also place extra demands on the server.
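One way some sites try to slow down ‘rapid fire’ robots is a Crawl-delay line in robots.txt. This directive is an extension that was not part of the original exclusion standard, and not every robot honors it, so treat the following Python sketch as an illustration only. The 10-second value, the 1-second fallback, and the ExampleBot name are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking robots to wait 10 seconds between requests.
# Crawl-delay is a non-standard extension that only some robots respect.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def seconds_between_requests(user_agent, default=1.0):
    """Return the pause a polite robot would take between requests."""
    delay = parser.crawl_delay(user_agent)  # None if no Crawl-delay applies
    return float(delay) if delay is not None else default


print(seconds_between_requests("ExampleBot"))
```

A well-behaved robot would sleep for this long between requests to the same site; a badly behaved one will simply ignore the line.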
Wider use of robots raises other issues, including the question of which pages should be included or excluded. In the past, almost everything a robot collected was retained, which can be very wasteful and very expensive. Progress toward an industry-wide standard has resulted in the web’s robots exclusion standard. This is robots.txt. It tells robots to ignore specific pages or directories.
One Response to “Robots.txt”
By Carmelo Lisciotto on Sep 17, 2007
Worthwhile to add a robots.txt file to your site.
Carmelo Lisciotto