
Content Scrapers, and Why You Should Worry About Them

In online marketing, high-quality content is a big deal. Hosting it on your site is a surefire way to move up in the rankings. It’s also why other sites will copy and paste your content onto their own domains. Some sites even use automated content scrapers for just this purpose.

Content scraping is illegal – your website’s content is your intellectual property, after all – but your content is still there after it’s been scraped, so why should you worry?

  1. Internal links in the scraped content become backlinks, often from disreputable sites.
  2. The content is now duplicative, which could trigger a penalty for your site.
  3. The scraped version of your content can pull readers away from your site.
  4. The scraped version can undercut your search engine optimization (SEO).

But First: What Is Content Scraping and How Does It Work?


Plenty of people build websites that aim to attract viewers, build up a volume of traffic, and then sell ad space. The appeal of this business model is that they never need to write any original content of their own: Instead, the website acts as an aggregator of content in a given field, using automated content scrapers to pull posts and articles from other sites in that field – typically by monitoring each site’s RSS feed.
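To make the mechanism concrete, here is a minimal sketch, in Python, of how such an RSS-based scraper might work. The feed URL is hypothetical, and a real aggregator would republish the copied HTML automatically rather than just print it:

    # Minimal sketch of an RSS-based content scraper (all URLs hypothetical).
    # It polls a blog's feed and copies out each post's title, link, and body.
    import feedparser  # third-party library: pip install feedparser

    FEED_URL = "https://example-law-blog.com/feed"  # hypothetical target feed

    def scrape_feed(url):
        feed = feedparser.parse(url)  # fetch and parse the RSS/Atom feed
        for entry in feed.entries:
            yield {
                "title": entry.get("title", ""),
                "link": entry.get("link", ""),     # points back to the original post
                "html": entry.get("summary", ""),  # post body or excerpt, links and all
            }

    if __name__ == "__main__":
        for post in scrape_feed(FEED_URL):
            print(post["title"], "->", post["link"])

Pointed at a feed that publishes full post bodies, a loop like this hands the aggregator a steady stream of ready-made articles with no writing involved.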

Google might hate these sites, but this might seem like a good thing for your law firm: Someone else is copying your legal blog’s content and posting it somewhere else? Sounds like publicity! Some of these aggregators even make it seem like an honor to have your article reposted on their site.

It isn’t. And it can hurt in four different ways.

Internal Links Become Potentially Suspicious Backlinks

When you wrote the post, you probably included internal links to other blog posts and landing pages on your law firm’s website.

Content scrapers copy those internal links along with the rest of the content. When they paste the content into the aggregating website, though, those links are no longer internal – they are external links that point from the aggregating site back to your law firm’s site.

This is an important difference, because not all backlinks are created equal. Backlinks that come from highly reputable sites, like .gov or .edu sites, are great for your site’s SEO. However, just as with outbound links from your own site, backlinks coming to you from sites that search engines view with suspicion can actually hurt your SEO. And because they host duplicative content, aggregating websites that scrape content are almost always going to be sites of ill repute.
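As a rough illustration, the short Python sketch below (the domain name and sample HTML are hypothetical, and it assumes the BeautifulSoup library) pulls every link out of a scraped copy of a post and flags the ones that now point back at your site – exactly the links that started life as internal links:

    # Rough sketch: find links in a scraped copy of a post that now point back
    # to the original site. The domain and sample HTML are hypothetical.
    from urllib.parse import urlparse
    from bs4 import BeautifulSoup  # third-party library: pip install beautifulsoup4

    YOUR_DOMAIN = "yourlawfirm.com"  # hypothetical

    def links_back_to_you(scraped_html):
        soup = BeautifulSoup(scraped_html, "html.parser")
        for a in soup.find_all("a", href=True):
            if urlparse(a["href"]).netloc.endswith(YOUR_DOMAIN):
                # On your own site this was an internal link; on the aggregator's
                # page it is an external backlink pointing at you.
                yield a["href"]

    if __name__ == "__main__":
        sample = '<p>See our <a href="https://yourlawfirm.com/blog/dui-basics">DUI guide</a>.</p>'
        print(list(links_back_to_you(sample)))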

Speaking of Duplicative Content…

Search engines are in the business of bringing users to the most relevant and important websites for any given search query. When two pages have the exact same content, search engine algorithms get confused, and users would see duplicative – and therefore useless – results if nothing were done. So Google and other search engines penalize sites that host duplicative content if there are signs that it is being done to manipulate the rankings.

This penalty should be levied on the aggregating website that scraped your article, rather than on your own site, because the search engine should be able to see that the aggregator has no original content. However, it is not unheard of for a site to be hit with a penalty after having its content scraped.

Diminished Traffic Flow

Even if you avoid a penalty, two things can still happen that hurt your website’s SEO.

First, if a search engine detects duplicative content but sees no sign that it is being done to manipulate the rankings – having multiple identical pages on your own site is a common mistake – it will simply filter the duplicate pages out of the search engine results page (SERP). If your page is the one that gets filtered, it won’t appear in the results at all, and anyone interested in your article will land on the aggregating site’s version of it, hurting your page view count and sapping the SEO benefit of having written such a popular article.

Second, if the search engine does not detect that your page and the aggregator’s copy are duplicates, both will appear side by side in the SERP, competing against one another for clicks and readers. Every reader who clicks through to the site that scraped your content is one less potential client for your law firm.

The SEO Snowball Effect

Even in the best-case scenario – a content scraper takes your legal blog article and posts it somewhere else, your site does not get penalized, and the two websites simply compete against each other online – your law firm still loses. Every click for the other website is one less for your own, and those clicks snowball: Remember that ranking well for a given search query leads to more clicks and more web traffic, and that more web traffic is one of the best ways to improve your ranking.

It is not unusual for this scenario to end with your original post ranking beneath the aggregator’s scraped copy of your legal blog article.

Professional Legal Blogging at Myers Freelance

That’s why you should be wary of content scrapers. In next week’s blog, we’ll go over how to detect them and what you can do to stop them from hurting your law firm’s online marketing efforts.