Search engines prefer unique content in the pages they rank in the top ten. Cloaking is one way of ensuring that scrapers don't get our proprietary content. It is also a way to hide content that search engines may consider duplicate when more than one URL points to the same content.
There is also a technique we must incorporate into the actual writing of our body copy, called the Snippet Optimization Process (SOP). SOP gets its name from the mathematical formulas expressed in a Google patent (most other engines use similar methods) for determining whether the content on two pages is the same. A summary from the patent is as follows:
- Each of the candidate results (CRs) undergoes a process known as document linearization. The goal is to represent each CR as a linearized stream of terms (in other words, a text stream).
- Next, a sliding window of a given length is superimposed over this text stream and shifted. The patent makes provision for a sliding window of about 15 terms or about 100 characters.
- If a term window is used, the window is shifted one term at a time, counting either the term frequencies (occurrences) of the queried terms or the number of unique queried terms inside the current window.
- The window is shifted until it reaches the end of the text stream. All the windows are then sorted by either term frequency or number of unique queried terms. After sorting, the two windows with the highest counts define a query-relevant (QR) snippet. Since each window holds about 15 terms or 100 characters, each snippet consists of roughly 30 terms or 200 characters.
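The windowing steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Google's implementation: it covers only the term-window variant (not the ~100-character one), a simple regex tokenizer stands in for document linearization, and the rule that the second window must not overlap the first is our own assumption so the two windows don't repeat text. The names `linearize`, `window_score`, and `qr_snippet` are hypothetical.

```python
import re
from typing import List, Set, Tuple


def linearize(text: str) -> List[str]:
    """Document linearization (simplified): lowercase the text and split it
    into a flat stream of alphanumeric terms. Assumes the visible page text
    has already been extracted from the HTML."""
    return re.findall(r"[a-z0-9]+", text.lower())


def window_score(window: List[str], query_terms: Set[str], unique: bool) -> int:
    """Score one window: total occurrences of query terms, or (unique=True)
    the number of distinct query terms it contains."""
    hits = [term for term in window if term in query_terms]
    return len(set(hits)) if unique else len(hits)


def qr_snippet(text: str, query: str, window_size: int = 15, unique: bool = False) -> str:
    """Build a query-relevant snippet from the two best-scoring term windows."""
    terms = linearize(text)
    query_terms = set(linearize(query))
    if len(terms) <= window_size:
        return " ".join(terms)

    # Shift the window one term at a time until it reaches the end of the stream.
    scored: List[Tuple[int, int]] = []  # (score, start index)
    for start in range(len(terms) - window_size + 1):
        window = terms[start:start + window_size]
        scored.append((window_score(window, query_terms, unique), start))

    # Sort by score, highest first; ties go to the earlier window.
    scored.sort(key=lambda item: (-item[0], item[1]))

    # Take the best window, then the best window that does not overlap it
    # (the no-overlap rule is an assumption, not from the patent summary).
    best_start = scored[0][1]
    chosen = [best_start]
    for _, start in scored[1:]:
        if abs(start - best_start) >= window_size:
            chosen.append(start)
            break

    # Join the windows in document order: about 2 x 15 = 30 terms by default.
    chosen.sort()
    return " ... ".join(" ".join(terms[s:s + window_size]) for s in chosen)


sample = "alpha beta seo tips gamma delta epsilon seo guide zeta"
print(qr_snippet(sample, "seo guide", window_size=3))
# → alpha beta seo ... epsilon seo guide
```

Passing `unique=True` scores windows by the number of distinct query terms instead of raw occurrences, the second counting option the patent summary mentions.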
SEO best practices for avoiding duplicate content
When writing copy for any page, put the intended keywords at the beginning of the title tag and the meta description tag.
Then place `<h1>` and `<h2>` heading tags immediately after the opening `<body>` tag:
```html
<body>
  <h1>KeywordText</h1>
  <h2>KeywordText</h2>
  <!-- rest of the page content -->
</body>
```
Style the headings with CSS to match the look and feel of the website. Then fill the H1 and H2 tags with keywords similar to those used in the title and meta description tags.
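Before submitting a page, the on-page steps above can be checked mechanically. The sketch below uses Python's standard `html.parser` module; the class name `OnPageAudit`, the `audit` helper, and its messages are hypothetical. It makes two simplifying assumptions: it requires the exact keyword rather than "similar keywords", and it reads "right after the body tag" as "the first two start tags inside `<body>`", so headings wrapped in another element would be flagged.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class OnPageAudit(HTMLParser):
    """Collects the title, meta description, first <h1>/<h2> text, and the
    start tags that appear inside <body>, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self.headings: Dict[str, str] = {"h1": "", "h2": ""}
        self.body_tags: List[str] = []
        self._in_body = False
        self._current: Optional[str] = None  # tag whose text is being captured

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content") or ""
        elif tag == "body":
            self._in_body = True
        elif self._in_body:
            self.body_tags.append(tag)
        # Capture text for the title and only the first <h1> and <h2>.
        if tag == "title" or (tag in self.headings and not self.headings[tag]):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current in self.headings:
            self.headings[self._current] += data


def audit(html: str, keyword: str) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    page = OnPageAudit()
    page.feed(html)
    page.close()
    kw = keyword.lower()
    problems: List[str] = []
    if not page.title.strip().lower().startswith(kw):
        problems.append("title does not start with the keyword")
    if not page.description.strip().lower().startswith(kw):
        problems.append("meta description does not start with the keyword")
    if page.body_tags[:2] != ["h1", "h2"]:
        problems.append("<h1> and <h2> are not the first tags inside <body>")
    for tag in ("h1", "h2"):
        if kw not in page.headings[tag].lower():
            problems.append(f"<{tag}> does not contain the keyword")
    return problems


sample_html = """<html><head><title>Blue widgets for sale</title>
<meta name="description" content="Blue widgets shipped fast."></head>
<body><h1>Blue widgets</h1><h2>Cheap blue widgets</h2><p>...</p></body></html>"""
print(audit(sample_html, "blue widgets"))  # → []
```

A stricter or looser notion of "similar keywords" (stemming, synonyms) would replace the plain substring test in `audit`.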
We then continue optimizing the rest of the document and submit the page to Google. Once Google has indexed and ranked the page, search Google for the intended query. Google often displays the `<h1>` and `<h2>` text in its entirety as the snippet, which we take as confirmation that we have avoided the duplicate content filter.