Strategies to avoid automated and manual content scraping

When competitors or bots gather your locator content, use these strategies to thwart their efforts

Written by Michael Fatica
Updated over a week ago

One ubiquitous risk of posting content online is that others might use it in a manner for which it was not intended. The same risk applies to online directories, locators and partner networks.

Why Locators are a Target

Locators are a common target for competitors who want detailed information about a brand's distribution, sales, referral or stockist networks, because locators often contain direct contact information for the members of those networks.

Competitors can use this contact information as a "hit list" of sales opportunities, approaching the members of a network with competing products and services. Where that risk exists, protecting the locator against this kind of use can be very important.

Types of Farming

Content farming, or scraping, can be broken down into two basic categories: automated and manual. Automated crawling can be further divided into desirable and undesirable crawling. Many customers do not want to block Google or other search engine crawlers from indexing their content for inclusion in search results. Bots that disobey robots.txt and farm content regardless, however, can become a nuisance.
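To illustrate the difference between desirable and undesirable crawlers, the following is a minimal sketch of the check a well-behaved crawler performs before fetching a page, using Python's standard urllib.robotparser module. The robots.txt rules, the /locator/ path, the example.com URL and the "SomeOtherBot" name are all hypothetical. Keep in mind that robots.txt is only advisory: an undesirable bot simply skips this check, which is why the mitigation strategies described in this article are needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only: let Googlebot
# crawl everything, and ask every other crawler to stay out of /locator/.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /locator/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/locator/search"  # hypothetical URL

# A compliant crawler asks this question before every request.
googlebot_allowed = parser.can_fetch("Googlebot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)
```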

Manual content scraping can be harder to detect and block, since the behavior can look like normal customer usage: a person performs repeated searches and simply copies the results.

Mitigation Strategies

The following protections are built into MetaLocator and are already deployed to block unwanted crawling.

  1. MetaLocator provides a built-in method that blocks manual and automated content farming by systematic searching. This prevents both users and bots from performing the same search over and over again while changing only one or two parameters. Real users do not behave this way, so the pattern is detected and blocked, even when it is hidden behind a proxy network.

  2. MetaLocator uses Google reCAPTCHA on our forms and our locator search to prevent unauthorized automated interaction with the locator.

  3. MetaLocator leverages AWS WAF (Web Application Firewall) with Bot Control. We use this to monitor bot activity and to block particularly malicious bots.
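The systematic-search detection in item 1 can be pictured with a short sketch. This is not MetaLocator's actual implementation, which is not described here. It is a simplified illustration that assumes each request carries some client key and a dictionary of search parameters. The class name, thresholds and parameter names are all hypothetical, and a single client key would not by itself catch traffic spread across a proxy network.

```python
from collections import defaultdict, deque


def params_changed(a: dict, b: dict) -> int:
    """Count the parameters whose values differ between two searches."""
    return sum(1 for key in set(a) | set(b) if a.get(key) != b.get(key))


class SystematicSearchDetector:
    """Flags a client that keeps repeating nearly identical searches.

    Hypothetical illustration only; the thresholds are arbitrary.
    """

    def __init__(self, window_seconds=60.0, max_similar=5, max_changed=2):
        self.window_seconds = window_seconds  # how far back to look
        self.max_similar = max_similar        # near-duplicates tolerated
        self.max_changed = max_changed        # "one or two parameters"
        self._history = defaultdict(deque)    # client key -> (time, params)

    def is_suspicious(self, client_key, params, now):
        history = self._history[client_key]
        # Forget searches that fall outside the sliding time window.
        while history and now - history[0][0] > self.window_seconds:
            history.popleft()
        # Count earlier searches that differ in at most max_changed parameters.
        similar = sum(
            1 for _, earlier in history
            if params_changed(earlier, params) <= self.max_changed
        )
        history.append((now, dict(params)))
        return similar >= self.max_similar


# Usage: the same category search swept across postal codes.
detector = SystematicSearchDetector(window_seconds=60, max_similar=3)
flags = [
    detector.is_suspicious(
        "client-a", {"category": "dealer", "postal_code": code}, now=float(i)
    )
    for i, code in enumerate(["10001", "10002", "10003", "10004"])
]
```

In this sketch the fourth search is the first to be flagged, because three near-identical searches from the same client already sit inside the 60-second window.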

Some mitigation strategies can be deployed by our customers in the form of settings that make content farming difficult:

  1. Maximum Search Results. This Interface setting is found under Results Settings and limits the total result count of any search. In practice, legitimate users rarely need to visit multiple pages of results, so capping the result count costs them little while frustrating content farmers, both automated and manual.

  2. IP-based Blocking. MetaLocator can block specific IP addresses if a particular source of visitors is identified as problematic.

  3. Hiding content behind a click. When critical content is moved to the detail page or otherwise placed behind another click, every record requires an additional request to read, which slows down competitor crawling.

  4. Password Protection. While not a good strategy for public locators, MetaLocator can put the entire locator behind a password if needed. This is a common strategy for locators that are for internal use only.
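The first two settings above are configured in MetaLocator rather than written as code, but their effect can be sketched in a few lines. The example below is a simplified illustration, not MetaLocator's implementation. The blocked addresses come from IP ranges reserved for documentation, and the limit of 10 results and both function names are hypothetical.

```python
import ipaddress

# Hypothetical blocklist: one whole network and one single address.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]

# Hypothetical value for a "Maximum Search Results" style cap.
MAX_SEARCH_RESULTS = 10


def is_blocked(client_ip: str) -> bool:
    """Return True if the visitor's address falls in a blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


def limit_results(results: list, max_results: int = MAX_SEARCH_RESULTS) -> list:
    """Return at most max_results items, however many the search matched."""
    return results[:max_results]


# Usage: a search that matched 50 locations only ever exposes the first 10.
all_matches = [f"location-{n}" for n in range(50)]
visible = limit_results(all_matches)
```

With a cap like this, harvesting the full directory takes many more distinct searches, which is the kind of systematic searching that the built-in protections are designed to detect.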

If you believe your locator may be subject to undesired crawling, contact our support team and they can provide a helpful set of recommendations.
