
When Ghosts of Content Past Haunt Your SEO

Sometimes a site’s own past dampens its bright SEO future. The forgotten designs, the also-ran campaigns, the abandoned domains, and the general rubbish that accumulates like dust bunnies in the deep corners of a web server can all be a hidden drain on a site’s ability to rank and drive natural search-referred traffic and sales. Search engines retain records in their indexes long after site owners have forgotten that the content exists. It may be orphaned. It may be unloved. It may not be a business priority. But as a hidden source of duplicate content, it could be hurting your natural search performance.

Domain Protection
Abandoned and forgotten domains are the easiest example. Companies tend to register many domains to protect their brand names at all the different TLDs and misspellings. Some get tripped up by actually hosting duplicate versions of the primary site on all of the domains they’ve registered. Someone in legal registered the domains years ago, someone in IT hooked them up, and no one has thought about them in years. But the engines remember.

For example, Coldwater Creek has inadvertently allowed its entire site to be crawled and indexed at https://www.coldwatercreek.org, https://www.coldwatercrek.com, https://www.thecreek.com, and many other variations. Notice that these examples are all on the secure https protocol. Coldwater Creek has optimal 301 redirects on the nonsecure http protocol, preventing the duplicate content issue there. Placing the 301 redirects on the secure protocol as well would remedy the situation.
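A check like this can be scripted so that both protocols are tested for every registered domain. The following is a minimal sketch in Python using only the standard library. The canonical hostname, the function names, and the result labels are assumptions made for illustration, not anything Coldwater Creek uses.

```python
import http.client
from urllib.parse import urlsplit

# Assumption: the primary site lives on this hostname.
CANONICAL_HOST = "www.coldwatercreek.com"


def classify(status, location, canonical_host=CANONICAL_HOST):
    """Label a single response received from an alternate domain."""
    if status in (301, 308):  # permanent redirects
        target = urlsplit(location or "").hostname
        if target == canonical_host:
            return "ok: permanent redirect to canonical host"
        return "check: permanent redirect elsewhere"
    if status in (302, 303, 307):
        return "check: temporary redirect"
    if status == 200:
        return "problem: serves content (possible duplicate)"
    return f"check: status {status}"


def fetch_status(url, timeout=10):
    """Request url WITHOUT following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def audit(hosts):
    """Print a label for each host on both http and https."""
    for host in hosts:
        for scheme in ("http", "https"):
            url = f"{scheme}://{host}/"
            try:
                status, location = fetch_status(url)
                print(url, "->", classify(status, location))
            except (OSError, http.client.HTTPException) as exc:
                print(url, "-> unreachable:", exc)
```

Calling audit with a list of registered hostnames prints one line per protocol. Any line labeled as a problem marks a domain that answers 200 OK instead of redirecting, which is the situation described above on https.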

Finding alternate domains that may be live without your knowledge is as easy as searching for a string of text that’s unique to your site and excluding your domain from the results. For example, try Googling the following for Coldwater Creek: Careers at Coldwater Creek Investor Relations Social Responsibility Terms of Use © 1984 -site:coldwatercreek.com
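If you have many sites or phrases to check, the query can be assembled programmatically. This is a small sketch under a couple of assumptions: the function names are hypothetical, and wrapping the phrase in quotation marks for an exact match is a choice made here, not something the example query above does.

```python
from urllib.parse import urlencode


def exclusion_query(phrase, own_domain):
    """Build a query for an exact phrase, excluding results from own_domain."""
    return f'"{phrase}" -site:{own_domain}'


def search_url(query):
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = exclusion_query("Careers at Coldwater Creek", "coldwatercreek.com")
print(query)
print(search_url(query))
```

A footer string works well as the phrase because it appears on every page of the site and is unlikely to appear anywhere else.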

Legacy Content
Content becomes “legacy” when a site is redesigned, URLs are rewritten, products are discontinued, or campaigns expire. The optimal SEO action for legacy or discontinued content is to 301 redirect it to the most relevant alternative. More frequently, it’s simply removed from internal linking structures, or promotions pointing to it stop, and it’s abandoned in its live state. Because the URLs still return a 200 OK server header status when requested, the search engines retain them in their indexes.

For example, New Balance has a lovely set of URL rewrites in place at the initial category & product level. Unfortunately, they neglected to 301 redirect their legacy URLs to their new rewritten URLs. At the category level, we have http://www.newbalance.com/productList.php?cat=2 and http://www.newbalance.com/outdoor/footwear/ loading the same page of content. At the product level we have the legacy http://www.newbalance.com/get_product.php?style=WW811/&cat=6&subcat=5&ptype=1&g=w and the rewritten http://www.newbalance.com/fitness/walking/WW811/ loading the same page of content.
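The missing piece is a mapping from each legacy URL to its rewritten equivalent, which the server can then answer with a 301. Here is a minimal sketch of that lookup in Python. The two tables are hypothetical and hold only the two examples above; a real site would generate them from its product database, and the actual redirect would be issued by the web server or application.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical lookup tables, filled only with the examples from this post.
CATEGORY_PATHS = {"2": "/outdoor/footwear/"}
PRODUCT_PATHS = {"WW811": "/fitness/walking/WW811/"}


def redirect_target(legacy_url):
    """Return the rewritten URL a legacy URL should 301 to, or None."""
    parts = urlsplit(legacy_url)
    query = parse_qs(parts.query)
    if parts.path == "/productList.php":
        key = query.get("cat", [""])[0]
        new_path = CATEGORY_PATHS.get(key)
    elif parts.path == "/get_product.php":
        # The legacy style parameter carries a trailing slash ("WW811/").
        key = query.get("style", [""])[0].strip("/")
        new_path = PRODUCT_PATHS.get(key)
    else:
        new_path = None
    if new_path is None:
        return None
    return f"{parts.scheme}://{parts.netloc}{new_path}"
```

URLs that return None have no known replacement and need a manual decision about the most relevant alternative.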

In other instances, legacy content can be a missed opportunity rather than a duplicate content issue. Contests and promotions that generate buzz can accumulate a decent number of backlinks before they expire. Leaving them live to wither away is a wasted opportunity to harvest that link popularity to strengthen the site. For example, the band Linkin Park had a MySpace contest in June 2008. The contest promotion page is still live at http://linkinpark.com/myspace-contest and has acquired a modest visible PageRank 2, with 25 external links from 10 unique domains linking to it. A quick 301 redirect to another relevant page on the site would put that link popularity to better use to strengthen content that the band actually wants to rank.

Finding legacy content requires patience and determination. It can be uncovered with a site crawl and by analyzing Google’s indexation data. For example, I uncovered New Balance’s legacy URLs by Googling a product number in their URL combined with a site query, like this: site:www.newbalance.com inurl:WW811. Then I did an intitle query to dig up the category example: site:www.newbalance.com intitle:"See all Outdoor by New Balance".
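On the site crawl side, one simple way to surface pairs like these is to group crawled URLs by a hash of their page content. The sketch below assumes the crawl has already been done and its results are available as a dictionary of URL to page text. It only catches exact duplicates after whitespace is normalized, so pages that differ by even a timestamp will not be grouped.

```python
import hashlib
from collections import defaultdict


def group_duplicates(pages):
    """Given {url: body_text}, return lists of URLs whose bodies are identical."""
    by_digest = defaultdict(list)
    for url, body in pages.items():
        # Collapse runs of whitespace so formatting differences do not matter.
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_digest[digest].append(url)
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]
```

Each returned group is a set of URLs serving the same content, which makes it a candidate for a 301 redirect to a single preferred URL.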

Unfortunately, there’s not enough room in this post to detail all the ways to find duplicate content. To learn more, read this blog post on finding duplicate content.
This was originally published here
