A spider trap is anything that prevents a search engine from crawling your website. Spider traps can include dynamic pages, pages whose text appears only in images, navigation without text links, password-protected parts of your site, and sites built with frames.
To prevent spider traps, format the URLs on your website properly. Dynamic or database-driven web pages that are generated by scripts or databases, or that have a ?, %, $, =, +, or & in the URL, can present spider traps.
Search engines tend to have problems fully indexing dynamic websites (in other words, sites that are hooked up to a database of content).
Spiders also have issues with overly complex URL structures. A common problem created by programmers is passing user IDs and session IDs in the URL string. When this happens, the spider may see the same page under dozens of different URLs. This not only creates a spider trap but can also cause duplicate content issues.
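As a rough illustration of the session ID problem, the sketch below strips session and user parameters from a URL so that every variant collapses to one address. The parameter names in SESSION_PARAMS and the example.com URLs are assumptions made for the example; substitute whatever names your own scripts use.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; replace with the ones your site really uses.
SESSION_PARAMS = {"sessionid", "sid", "userid", "phpsessid"}


def canonicalize(url: str) -> str:
    """Return the URL without session/user parameters, remaining ones sorted."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    kept.sort()  # a stable order means ?a=1&b=2 and ?b=2&a=1 compare equal
    # Drop the fragment as well; it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(canonicalize("http://example.com/product.php?id=7&SessionID=abc123"))
    print(canonicalize("http://example.com/product.php?SessionID=zzz999&id=7"))
```

Both calls print the same address, which is the single URL you would want a spider (and anyone linking to you) to see.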
These types of URLs are an issue not only for spiders but also for end users. They can cause problems when a customer copies the URL and pastes it into an email to a friend, or links from their own website to a particular page deep within your site. If someone posts a link from their site to yours using such a URL, it could cost you a backlink that would have helped your SEO efforts.
A spider trap exists when a search engine spider keeps following links to URLs that appear to be different from URLs it has already explored but that actually return the same content.
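One way to check your own site for this pattern is to fingerprint the content behind each URL and look for fingerprints shared by more than one address. The sketch below is a minimal illustration only; it assumes you already have the page bodies in a dictionary keyed by URL, and it does not describe how any particular search engine detects duplicates.

```python
import hashlib
from typing import Dict, List


def group_by_content(pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Map a content fingerprint to every URL that returned that content."""
    groups: Dict[str, List[str]] = {}
    for url, body in pages.items():
        # Collapse whitespace so trivial formatting differences are ignored.
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups.setdefault(digest, []).append(url)
    return groups


def duplicate_sets(pages: Dict[str, str]) -> List[List[str]]:
    """Return only the groups where several URLs share identical content."""
    return [urls for urls in group_by_content(pages).values() if len(urls) > 1]


if __name__ == "__main__":
    # Hypothetical crawl results for illustration.
    crawled = {
        "http://example.com/p?sid=1": "<h1>Widget</h1>",
        "http://example.com/p?sid=2": "<h1>Widget</h1>",
        "http://example.com/about": "<h1>About</h1>",
    }
    for urls in duplicate_sets(crawled):
        print("Same content under:", urls)
```

Any group this reports is a candidate for consolidation under a single clean URL.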
Each search engine has its own tolerance level for how many variables in a URL are acceptable. The idea, however, is to eliminate all signs of the dynamic nature of the pages from the URL. In other words, remove all stop characters (question marks, ampersands, equals signs), cgi-bin, user IDs, and session IDs from the URLs to make the pages far more palatable to the spiders.
A clean, simple URL not only eliminates these potential problems; as a bonus, the site is also more likely to garner "deep links" from other sites because the URL looks user-friendly, stable, and easy to copy and paste (into a web browser, email message, or web page editor).
The best approach is to replace all dynamic-looking links with search-engine-friendly ones. Don’t be tempted to take a shortcut by creating a site map that links to the friendly URLs while leaving all the remaining links across your site as they are. The links you haven’t fixed will not enhance the PageRank score of the pages with the friendly URLs. You want to maximize your PageRank score by having as few variations of each URL as possible. Variations lead to PageRank dilution because not all possible votes go to the same page: some links vote for one URL version of the page and others vote for another.
There are two options: fix the URLs on the server or, alternatively, use a third-party hosted proxy serving solution.
The first option is preferable, provided you have the IT resources to implement it and the server supports the technology required for URL rewriting (for example, mod_rewrite for Apache or ISAPI_Rewrite for Microsoft's IIS).
If such rewriting modules or plug-ins are not available, you can instead recode your scripts to look for variables embedded within the directory names or file names rather than in the query string. However, this tends to be quite a bit more complicated to implement.
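To give a feel for what reading variables from directory names involves, here is a minimal sketch. It assumes a hypothetical convention in which the first path segment names the script and the remaining segments alternate between variable name and value, so /products/category/shoes/id/42 stands in for products?category=shoes&id=42. Your own scheme may differ.

```python
from typing import Dict
from urllib.parse import unquote


def parse_path_variables(path: str) -> Dict[str, str]:
    """Read name/value pairs from a path like /products/category/shoes/id/42."""
    segments = [segment for segment in path.split("/") if segment]
    # Assumed convention: segment 0 is the script name, the rest are pairs.
    pairs = segments[1:]
    if len(pairs) % 2 != 0:
        raise ValueError(f"unpaired variable in path: {path!r}")
    names = pairs[0::2]
    values = [unquote(value) for value in pairs[1::2]]  # decode %20 and similar
    return dict(zip(names, values))


if __name__ == "__main__":
    print(parse_path_variables("/products/category/shoes/id/42"))
```

The script then uses the returned dictionary wherever it previously read the query string, which is where most of the extra implementation effort goes.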