Uncovering the Scraper Bots Plaguing APIs
- Netacea, Agentless Bot Management
Most cyber threats — like credential stuffing and card cracking — are committed by fraudsters with the aim of stealing money, data, or both. The law is clear on these cyberattacks: online fraud is illegal.
But unlike these overtly malicious threats, web scraping isn’t always illegal, or even unethical. Aggregator sites like travel agencies and price comparison websites use scraper bots to help customers find the best deals. Governments and journalists use them to gather information for statistics and investigations.
But web scraping can also jeopardize online businesses. Just as with credential stuffing and carding attacks, malicious programmers build bots — in this case, bots that scrape websites to steal content, data, and product information.
In this article, we’ll uncover what scraper bots are, how they affect website APIs, and how you can protect your API from scraper bot abuse.
What are scraper bots?
Scraper bots are automated programs that crawl websites looking for specific information. They then capture this information and send it back to the programmer in a readable format, such as a spreadsheet.
Commonly scraped site data includes:
- User data — personally identifiable information can be stolen and sold on the dark web
- Site content — good site content is valuable, so content scraping can improve competitor sites’ SEO, while damaging yours
- Product information — scraping pricing and stock availability information enables competitors to undercut you
Most scraped site information could be collected manually, but doing so is usually too time-consuming or difficult. So scrapers use bots to perform these actions automatically.
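To make this concrete, here is a minimal sketch of what a scraper bot does under the hood, using only Python's standard library. The HTML snippet, class names, and field names are hypothetical examples for illustration, not taken from any real site; real scrapers typically fetch live pages and use heavier parsing libraries.

```python
from html.parser import HTMLParser

# Hypothetical product listing page, as a scraper bot might download it.
SAMPLE_PAGE = """
<ul>
  <li class="product"><span class="name">Widget A</span><span class="price">19.99</span></li>
  <li class="product"><span class="name">Widget B</span><span class="price">24.50</span></li>
</ul>
"""

class ProductScraper(HTMLParser):
    """Collects (name, price) rows from 'product' list items."""
    def __init__(self):
        super().__init__()
        self.products = []   # rows of scraped data
        self._field = None   # which span we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") in ("name", "price"):
            self._field = attrs["class"]

    def handle_data(self, data):
        if self._field == "name":
            self.products.append({"name": data})
        elif self._field == "price":
            self.products[-1]["price"] = float(data)

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

scraper = ProductScraper()
scraper.feed(SAMPLE_PAGE)
print(scraper.products)
# Each row could then be written out to a spreadsheet or database.
```

Run at scale across thousands of pages, this kind of extraction is what turns a website's public pages into a competitor's pricing feed.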
What is an API and why is it vulnerable to web scraping?
Many websites and apps use an API — or Application Programming Interface — so developers can create external programs that interact with their website. The API is an interface that allows specific groups to access their website or app data:
- Open API — an interface that anyone can access
- Partner API — an interface that selected external partners can access
- Private API — an interface that can only be accessed by specific company employees
Open APIs allow any developer to create integrated services for site visitors. Used ethically, this creates a great user experience for customers. It can also bring your website to new markets, as more businesses and developers link their services with yours.
But it also leaves your API exposed to malicious online entities. If your API isn’t secured or restricted, malicious bot programmers can easily exploit vulnerabilities in it to steal sensitive information (such as user data). They can also use it to gather publicly available information (such as content, pricing, and stock availability) at lightning speed, allowing competitors and other organizations to undercut your business.
What’s the difference between web scraping and using an API?
A lot of commonly scraped information is readily available through a website’s API. So you might wonder why programmers go to the trouble of using a web scraper bot, which may need to emulate a real web browser to bypass common defenses.
If the site owner allows you to collect the data you want using their API, there’s no need to jump through so many hoops to look ‘human’. But most websites will have terms or security barriers that restrict how you can use their API. In most cases, this includes limiting what kind of content can be scraped and how scraped content can be used.
Programmers who don’t want to or can’t get permission from the API owner to collect data via the API might turn to web scraper bots to get the information they need.
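The contrast is easy to see in code. Where a scraper must parse HTML meant for humans, a sanctioned API consumer simply reads structured JSON. The response body below is an illustrative example, not the output of any real endpoint:

```python
import json

# Hypothetical JSON response from a product-listing API. A real client
# would fetch this over HTTPS (e.g. with urllib.request), usually
# sending an API key issued by the site owner.
api_response = '{"products": [{"name": "Widget A", "price": 19.99}]}'

data = json.loads(api_response)
for product in data["products"]:
    print(product["name"], product["price"])
```

No browser emulation, no brittle HTML parsing — which is exactly why scrapers only fall back to bots when the API's terms or access controls get in their way.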
Who is affected by API scraping?
All websites and apps with an API are susceptible to API scraping. Big retailers are particular targets, as they often hold valuable content and large amounts of user data accessible through their APIs.
However, some types of scraping are sector-specific:
- Odds scraping — betting sites and online casinos are often subject to odds scraping bots due to the rise of arbitrage betting
- Crypto scraping — investors scrape crypto content to pre-empt the markets and find the best time to buy or sell
Real-life API scraping: how web scraping affects real businesses
At Netacea, our bot management software detects all kinds of scraper bots, no matter how sophisticated they are. So we’ve worked with some of the world’s biggest retailers to block scraper bots targeting their APIs.
One large US retailer knew scraper bots were attacking their product listing API, which contained data on product pricing, specifications, and stock availability. This enabled attackers to scalp in-demand products like PlayStation 5 consoles. Huge volumes of bot traffic also compromised site performance for their real users.
In another case, a large fashion eCommerce site fell victim to scrapers that were collecting content and pricing information from their API. They had also been affected by carding attacks — so they feared bad bots would generate even more cyberattacks, such as scalping and fraud.
Ultimately, while scraping itself isn’t always illegal or unethical, if it’s used by bad actors it can lead to:
- Loss of business — competitors may scrape your prices so they can undercut you and win customers at your expense
- Intellectual property theft — your brand can be quickly compromised if someone scrapes and republishes your content without your permission
- Scalping — stock availability scraping can lead to product scalping, which ultimately damages your business’s reputation with genuine customers
- Poor site performance — if your website or API is slow, your customers may eventually do business elsewhere
- Inflated infrastructure costs — serving billions of requests from unwanted scrapers has a tangible cost for no return, and removing this burden means you can downsize your infrastructure
How can you protect your API from scraper bots?
Scraping isn’t always a bad thing. If you want to be featured on aggregator sites, or ensure your data is available for statistical research, your API should be accessible to those who need it. But you must also ensure it’s protected against malicious scraper bots that endanger your online business.
Restricting access to your API is one way to protect it. If you currently use an open API, you might consider changing it to a partner or private API. But any API can expose data if it has security flaws, and many programmers can create sophisticated bots that infiltrate even private APIs.
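A minimal sketch of what restricting access can look like in practice: requiring an API key (controlling who may call the API at all) combined with a per-key rate limit (capping how fast each caller can pull data). The key store, limits, and function names below are hypothetical values chosen for illustration:

```python
import time

VALID_KEYS = {"partner-123"}   # hypothetical keys issued to trusted partners
MAX_REQUESTS = 5               # allowed requests...
WINDOW_SECONDS = 60            # ...per rolling window, per key

request_log = {}               # api_key -> timestamps of recent requests

def allow_request(api_key, now=None):
    """Return True if this request should be served, False if rejected."""
    if api_key not in VALID_KEYS:
        return False           # unknown caller: quite possibly a scraper
    now = time.time() if now is None else now
    # Keep only timestamps still inside the rolling window.
    recent = [t for t in request_log.get(api_key, []) if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_REQUESTS:
        return False           # over the rate limit: throttle this key
    recent.append(now)
    request_log[api_key] = recent
    return True
```

Rate limiting like this blunts the "lightning speed" advantage of scraper bots, though on its own it won't stop bots that rotate keys or IP addresses — which is where dedicated bot management comes in.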
That’s why bot management is the key to API protection. Netacea’s agentless approach to threat detection means we can analyze all web requests, including API traffic, with one solution.
In the two cases above, Netacea’s bot management software quickly detected and contained scraper bot attacks. The first involved dealing with extremely high search volumes — billions of API requests every day — and using our real-time threat detection platform to identify bad bots among this traffic.
We integrated our software with their content delivery network, so we could rapidly roll out our solution with zero impact on site performance. And our AI blocked malicious traffic as soon as it was detected.
Ultimately, we reduced bad bot traffic to their API by 84% — equating to more than 10 billion requests. That meant they could save money by reducing their infrastructure — and they’re automatically protected against future bot attacks.
Read more about how we helped this big box retailer survive sustained scraper bot attacks on their API. If you want to learn more about how Netacea’s anti-scraping bot management solution can help your business, watch our two-minute demo.