No, it’s not an actual spider we’re talking about. It’s a program or script that crawls the World Wide Web in a methodical, automated manner. The process is often referred to as web crawling or spidering (hence the title), and you may know the program itself by one of the following names: web robot, ant, automatic indexer, bot or worm (not the same as a virus).
Many sites use spidering to keep their content up to date by gathering information from other sites so that it can be quickly shown or searched. The spider takes a virtual snapshot of a website, which is then placed into a database. A purpose-written algorithm can then be run on the database to find the indexed information a user has requested.
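To make the snapshot-then-search idea concrete, here is a minimal sketch of an inverted index, one common way to look up stored pages by the words they contain. It is not how any particular search engine works. The function names (tokenize, build_index, search) and the example.com URLs are made up for illustration, and the "database" is just an in-memory dictionary.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(snapshots):
    """Map each word to the set of URLs whose snapshot contains it."""
    index = defaultdict(set)
    for url, text in snapshots.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical snapshots a spider might have stored (URL -> page text).
snapshots = {
    "http://example.com/a": "Spiders crawl the web",
    "http://example.com/b": "Search engines index the web",
}

index = build_index(snapshots)
print(sorted(search(index, "spiders web")))
```

Because the index is built once, ahead of time, a query only has to look up a few dictionary keys rather than re-read every page.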
Indexing each website makes searching much faster, as all the information needed is held locally and can be processed quickly. In case you haven’t guessed already, search engines are the biggest users of web crawling and site indexing. They depend on their spiders to go out and retrieve information from as many sites on the internet as they can reach.
Every search engine has its own script, which is essentially a set of instructions for its spiders to follow. These scripts are closely guarded, so nobody outside the search engine knows exactly how they work, but generally they start with a list of URLs for the spiders to visit.
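The real scripts are not public, so the following is only a sketch of the general idea: start from a few seed URLs, visit each one, and queue up any new links found along the way. The crawl function, the get_links callback and the FAKE_WEB dictionary are all hypothetical, and the "web" here is a small in-memory map so the example runs without a network connection.

```python
from collections import deque


def crawl(seed_urls, get_links, max_pages=100):
    """Visit pages breadth-first from the seeds; return URLs in visit order."""
    frontier = deque()
    seen = set()
    for url in seed_urls:
        if url not in seen:
            seen.add(url)
            frontier.append(url)

    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        # Queue every link we have not come across before.
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# A made-up miniature web: each URL maps to the links found on that page.
FAKE_WEB = {
    "http://example.com/": ["http://example.com/about", "http://example.com/blog"],
    "http://example.com/about": ["http://example.com/"],
    "http://example.com/blog": ["http://example.com/post-1"],
    "http://example.com/post-1": [],
}

order = crawl(["http://example.com/"], lambda url: FAKE_WEB.get(url, []))
print(order)
```

In a real spider, get_links would download the page and extract its links. Keeping it as a parameter keeps the crawling logic separate from the fetching.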
Each spider is given a specific amount of time in which to crawl a URL before jumping on to the next. The amount of time a spider spends on a website is usually set by the search engine for each individual URL. As that time frame can be very short, it is very important that a website is search engine optimised (SEO), with good keywords, internal linking, site validation, fresh content, friendly URLs and so on.
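As a rough illustration of why a short time frame matters, the sketch below crawls a single site until a time budget runs out. It assumes a simple per-site deadline. The crawl_with_budget function, the SITE map and the simulated clock are hypothetical and not taken from any real search engine. The simulated clock advances one "second" each time it is read, so the example runs instantly and gives the same result every time.

```python
import itertools
import time
from collections import deque


def crawl_with_budget(start_url, get_links, budget_seconds, clock=time.monotonic):
    """Crawl one site breadth-first until the time budget is used up."""
    deadline = clock() + budget_seconds
    frontier = deque([start_url])
    seen = {start_url}
    crawled = []
    while frontier and clock() < deadline:
        url = frontier.popleft()
        crawled.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return crawled


# A made-up site: each page maps to the internal links it contains.
SITE = {"/": ["/a", "/b"], "/a": ["/c"], "/b": [], "/c": []}

# Simulated clock for the demonstration: returns 0, 1, 2, ... on each call.
ticks = itertools.count()
pages = crawl_with_budget(
    "/", lambda url: SITE.get(url, []), budget_seconds=2.5, clock=lambda: next(ticks)
)
print(pages)
```

With this budget the spider only gets to the first two pages before moving on. Pages buried several clicks deep are the ones most likely to be missed, which is one reason good internal linking helps.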
The better the SEO, the higher a page will usually rank in the search engine. If the site isn’t optimised, the spider will find it hard to get around and in some cases may even abandon the site altogether. If that happens, your website may never be shown in a search engine. So if you are one of the many millions out there waiting for your site to show up in a search engine, keep SEO in mind.