Google's war against scraper sites continues
- 28 August, 2011 04:03
Google appears to be getting ready to launch another offensive against website scrapers.
Scraper sites are usually operated by spammers, who copy almost all of their sites' content from other websites. By doing so, they hope to exploit the popularity of the original content makers' material to steer search engine traffic to their sites and make money through advertising.
"Scrapers getting you down? Tell us about blog scrapers you see... We need datapoints for testing," Google's web spam leader Matt Cutts said in a recent tweet.
Cutts' war cry suggests that Google feels more effort is needed to combat scrapers.
Along with his tweet, Cutts included a link to a form that allows web surfers to report scraper pages to Google. Some of that information may be used to test and improve Google's algorithm, the company said.
The form asks for the text of the search query that produced the scraping problem -- such as a scraper site outranking an original content site -- as well as the URLs of the original content site and the scraper site. There's also a form field for off-the-cuff comments.
Some scrapers are so successful that their sites achieve higher search engine rankings than the sites of the content makers from whom they pinch their material. Google attempted to correct that situation in January, when it changed its top-secret search algorithm to address, among other things, the scraping problem.
Scraping, along with search results poisoning, has long been a problem for search engines, although Google has steadfastly defended the quality of its results, saying they are better than they have ever been in terms of relevance, freshness and comprehensiveness.
Earlier this year, Google announced changes to its algorithm, including changes to its filters. The filter changes, referred to as "Panda," didn't quell the problem. Quite the contrary, they may have made it worse. "We've experienced a significant drop in our traffic (almost 35%) as a result of this change (with an equivalent drop in revenue)," wrote one webmaster after the change took effect. "We believe that our only crime is that we host user-generated content."
Google took another crack at the scraping problem in June, when it rolled out version 2.2 of the Panda filters. Reviews of that move appear to be mixed.
With this latest effort by Google to garner information on scraping sites, maybe the next version of Panda will finally put the issue to bed.
Follow freelance technology writer John P. Mello Jr. and Today@PCWorld on Twitter.