As the global amount of data produced hits a whopping 2.5 quintillion bytes per day, web scraping has become indispensable for any business that wants to collect publicly available data at scale.
We’ve seen a significant rise in data collection tools – from APIs to subscription-based services – each offering a different approach.
However, as web scraping becomes more complex, so does its pricing, which makes it harder for companies to assess how much they should spend. In this article, we’ll make it easier for you to understand how web scraping pricing works and help you choose a solution based on your budget and data extraction goals.
Why are Web Scraping Prices So Confusing?
Well, web scraping pricing isn’t confusing for the sake of being confusing. In theory, the concept is straightforward, but accessing the raw data involves many complexities. For example, every site uses different technologies and page structures, so extraction solutions need to offer different functionalities depending on the project you’re working on. This ties into our first point:
1. Web scraping has become more complex
There are two reasons why web scraping is becoming more complicated: anti-scraping techniques and advances in web technologies.
Since data has become more valuable, more people are using web scraping to collect data at a huge rate, with some creating poorly optimized applications that can really harm the target websites.
This has intensified the race between web scrapers trying to collect data and websites trying to block them completely. As a result, websites are implementing newer and more sophisticated anti-scraping techniques that go beyond header and IP fingerprinting.
In return, data collection services must adapt to deal with these roadblocks effectively, adding more processes to the mix and thus increasing the details to take into account for their pricing.
On the other hand, the web has also gotten more complex, with websites showing geo-specific data and using JavaScript to inject dynamic content, making it more difficult to scrape efficiently.
Regardless of what you choose, it must handle anti-scraping techniques and website complexities without hurting speed and reliability. There’s no point in a tool being fast if the majority of the requests fail, and if it takes too long, it will delay other operations.
2. There are many different approaches to web scraping
Users have varying levels of technical knowledge, so tools take different approaches to web scraping, which heavily influences their pricing and the features they provide. For example, teams without development experience or a development team would be more attracted to a plug ‘n play solution, which will charge differently from a web scraping API or a low-code tool.
This also means that comparing different solutions can become tricky if you don’t know what you need or don’t have the knowledge to understand the different layers of the pricing models. Although every solution provides more or less the same output (mostly formatted data), the approach to get there is different, hence the pricing model.
3. Every company uses a different pricing model
Because every company and solution is different, their pricing models can vary a lot.
There are some SaaS-based tools like ScraperAPI and ScrapeIN that use a credit system – where each plan comes with a set number of API credits. You consume credits to perform requests and use certain functionalities. Other companies like Bright Data charge based on the amount of data you need to scrape, measured in GB. So, as you can see, it varies significantly.
Without understanding your needs and how these different models work, it can get very hard to choose the right tool.
Here are six things to consider when comparing web scraping solutions.
6 Factors to Consider When Comparing Web Scraping Tools’ Pricing
Instead of trying to list unique factors you could possibly need for your next project (which would be impossible for us to do), we’re going to take a look at the six most important and common aspects of a web scraping tool, and explain how they work and influence pricing.
We’ll add specific examples from some of the most popular web scraping tools. By the end of this article, you will have all the information necessary to choose the right solution for your project.
Level of Abstraction: Off-the-Shelf vs. Code-Based
Web scraping tools can be placed on a spectrum of abstraction.
One end of the spectrum is done-for-you tools, which automate the process completely, while on the other end are tools that provide you with some important help, like proxies, rotation, and CAPTCHA handling, but the rest is up to you.
Off-the-Shelf Tools
As you can imagine, off-the-shelf solutions tend to be more expensive and less customizable, as these tools are highly automated and try to make it as easy as possible to scrape data without needing input from the user. A good example is Octoparse, which offers a point-and-click interface for building scrapers. You can expect to pay $249/month for 250 tasks on the professional plan, where a task is defined as a crawler working on a site without any URL limit.
In theory, this means you should be able to scrape 250 websites a month. But in most cases, you’ll want to scrape the same site several times a month or even in real time, so you should be able to scrape 250 websites as many times as you want for the same amount, right?
Well, that’s why it’s so important to understand how scraping works.
Octoparse uses a concept called “workflow,” which is the automation of a task. Every workflow is considered a task, so every time the workflow runs, you’re using one task against your limit.
If you need to run your workflow 10 times a month per website, you can scrape 25 websites a month with the same plan.
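The arithmetic behind that example can be sketched in a few lines. This is only an illustration: the function name `sites_per_month` is hypothetical, and it assumes (as described above) that every workflow run consumes exactly one task.

```python
def sites_per_month(task_limit: int, runs_per_site: int) -> int:
    """Sites a plan covers when every workflow run uses one task from the limit."""
    if runs_per_site <= 0:
        raise ValueError("runs_per_site must be a positive number")
    return task_limit // runs_per_site


# 250 tasks on the professional plan, as quoted above
print(sites_per_month(250, 1))   # one run per site per month
print(sites_per_month(250, 10))  # ten runs per site per month
```

Plugging in your own run frequency is a quick way to see how far a task-based plan really goes.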
When checking a ready-to-use tool, take a look at:
- Dollar to data ratio
- The type of websites that can be scraped
- How they define their limits (based on GBs, tasks, etc.)
- Do they have the extra functionalities you need?
- Can you ask for custom scrapers if needed?
- Can you access the data outside of the tool?
Note: For tools like Octoparse, you’ll need to use their API to pull your data out of their systems or use some of their export functionalities.
Web Scraping APIs
In the middle of the spectrum, you can find a wide variety of web scraping APIs, which manage a lot of complexities for you (like IP rotation and geo targeting), but you’ll have to write your own scripts. These tools usually use a credit system of some kind and are usually more affordable than off-the-shelf solutions.
For example, ScraperAPI offers a complete web scraping solution with just an API call. By adding a simple line of code to your script, you can automate functionalities like IP rotation, geo targeting, and CAPTCHA handling. ScraperAPI’s business plan comes with 3 million API credits – from which 1 successful request equals 1 API credit used – for $299/month.
To put it into perspective, let’s break it down into the number of pages and websites you’d be able to scrape per month with this plan:
- Considering individual pages, you’ll be able to scrape 3M pages per month.
- If every website has 1,000 URLs, you would be able to scrape 3k websites per month.
- If you want to monitor 1,000-URL websites once a week, you’d be able to monitor 750 websites per month.
- But if you need daily monitoring on these websites, with 3M ScraperAPI credits, you’d be able to monitor 100 websites per month.
It’s important to note that depending on your needs, the API can use more or fewer credits. For example, when using the Amazon scraping feature, every successful request will cost 5 API credits instead of 1, so with 3M API credits, you’ll be able to scrape up to 600k Amazon product pages.
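If you want to run the same breakdown with your own numbers, a small sketch like the one below can help. The function names are hypothetical, and it assumes four runs per month for weekly monitoring and 30 runs per month for daily monitoring, which is what the figures above imply.

```python
def successful_requests(credits: int, credits_per_request: int = 1) -> int:
    """Requests a credit balance covers at a given credit cost per request."""
    return credits // credits_per_request


def sites_monitored(credits: int, urls_per_site: int, runs_per_month: int,
                    credits_per_request: int = 1) -> int:
    """Websites you can monitor when every URL is scraped on every run."""
    credits_per_site = urls_per_site * runs_per_month * credits_per_request
    return credits // credits_per_site


CREDITS = 3_000_000  # business plan quoted above

print(sites_monitored(CREDITS, 1000, 1))    # scraped once a month
print(sites_monitored(CREDITS, 1000, 4))    # weekly monitoring (assumed 4 runs)
print(sites_monitored(CREDITS, 1000, 30))   # daily monitoring (assumed 30 runs)
print(successful_requests(CREDITS, 5))      # Amazon pages at 5 credits each
```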
When comparing web scraping APIs and proxy managers, take a closer look at:
- How many credits does every feature cost?
- Do they charge for unsuccessful requests?
- Do they offer the functionalities you need?
- Do they handle CAPTCHAs?
- What’s their success rate and proxies’ uptime?
A clear advantage of these web scraping API tools is that they handle a lot of common scraping complexities, and give you full control over your scrapers’ behavior. However, your team needs to have enough technical knowledge to build their own scraping scripts.
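To give a rough idea of what such a script involves, here is a minimal sketch that only builds the request URL. It follows the URL shape used later in this article (`api_key` and `url` as query parameters); the helper name `build_api_url` is hypothetical, and actually sending the request would require an HTTP client and a valid key.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL shown later in this article
API_ENDPOINT = "https://api.scraperapi.com/"


def build_api_url(api_key: str, target_url: str) -> str:
    """Return the URL that routes target_url through the scraping API."""
    # urlencode percent-encodes the target URL so it survives as one parameter
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


print(build_api_url("YOUR_KEY", "https://example.com"))
```

Provider-specific options (geo targeting, premium proxies, and so on) are usually extra query parameters, so check each provider’s documentation for their exact names and credit costs.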
Proxy Providers, CAPTCHA handlers, etc.
The extreme end of the spectrum is proxy providers, CAPTCHA handling services, and other service providers that only offer a solution for one specific challenge. To use these solutions, you’ll need a more experienced team of developers capable of creating and maintaining the infrastructure to connect these services and use them in their scripts.
For example, you’ll need to create and maintain the systems to:
- Pick the right proxies for the right sites
- Rotate your proxies after requests
- Avoid CAPTCHAs and honeypot traps
- Choose the right headers for every site
- Handle dynamic content
Developers have full control over every aspect of the project and can build highly customized solutions for businesses and applications, but there are also many complexities that need to be handled.
Oxylabs is a great example of this type of tool.
As a proxy provider, they offer a well-optimized, maintained, and scalable proxy pool you can use on your project. Depending on the type of project, you can choose a pay-as-you-go approach and pay $15/GB of data scraped, or subscribe to a monthly plan to reduce the price to (for example) $10/GB with a $600/month commitment.
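To see when a commitment starts to pay off, you can compare both options for your expected traffic. The sketch below uses the example rates above and assumes the $600 commitment works as a minimum monthly spend that is billed even if you use less; that billing detail is our assumption, so confirm it with the provider.

```python
def pay_as_you_go_cost(gb: float, rate: float = 15.0) -> float:
    """Monthly cost with no commitment, billed per GB scraped."""
    return gb * rate


def committed_cost(gb: float, rate: float = 10.0, commitment: float = 600.0) -> float:
    """Monthly cost on a committed plan (assumed: commitment is a minimum spend)."""
    return max(commitment, gb * rate)


def break_even_gb(payg_rate: float = 15.0, commitment: float = 600.0) -> float:
    """Traffic at which pay-as-you-go costs as much as the commitment."""
    return commitment / payg_rate


for gb in (20, 40, 100):
    print(gb, pay_as_you_go_cost(gb), committed_cost(gb))
print(break_even_gb())
```

Below the break-even volume, pay-as-you-go is cheaper under these assumptions; above it, the committed plan wins.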
There are fewer things to consider in these types of solutions, but you should always ensure they provide positive results 99% of the time.
Geo targeting
With geo targeting, you can change the location from where your requests are sent, allowing you to access accurate geo-specific and/or geo-locked information from anywhere in the world. eCommerce and search engine scrapers are clear use cases for this functionality, as the results displayed usually depend on where the user is located.
If you are working on a project that requires you to collect and/or compare data from different regions, you’ll want to pay closer attention to this. Here’s a table of four solutions that offer this functionality:
Feature | ScrapeIN | ScrapingBee | ScraperAPI | Oxylabs’ Web Scraper API |
Geo targeting | 20 API Credits | Available with Premium Proxies | Free in all Plans | Free in all Plans |
By looking at their pricing tables, the four tools above look as if they offer the same geo targeting advantage, but once you dig deeper you can find out some more context:
- ScrapeIN charges 20 credits when using geo targeting. So if you subscribe to the 3M API credits plan ($199/month) and then use geo targeting, you would reduce the total number of successful requests to 150k.
- In ScrapingBee’s case, something similar occurs. Premium proxies cost 10 credits, so activating this functionality alone would reduce their 2.5M credits ($249/month) to 250k successful requests.
- ScraperAPI does not charge any additional credits for geo targeting, so you can get the full 3M ($299/month) successful requests with geo targeting.
- Oxylabs also offers a web scraping API with geo targeting included in every plan. However, their business plan only provides 399k ($399/month) successful requests.
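One way to compare these plans on equal terms is to turn each one into a cost per 1,000 geo-targeted requests. The sketch below uses only the figures listed above; the dictionary layout and function names are hypothetical, and the Oxylabs plan is modeled as one credit per request because it is quoted in requests rather than credits.

```python
# provider: (monthly price in USD, credits in plan, credits per geo-targeted request)
GEO_PLANS = {
    "ScrapeIN": (199, 3_000_000, 20),
    "ScrapingBee": (249, 2_500_000, 10),
    "ScraperAPI": (299, 3_000_000, 1),
    "Oxylabs": (399, 399_000, 1),  # quoted in requests, modeled as 1 credit each
}


def geo_requests(provider: str) -> int:
    """Successful geo-targeted requests the plan covers."""
    _, credits, credits_per_request = GEO_PLANS[provider]
    return credits // credits_per_request


def cost_per_thousand(provider: str) -> float:
    """USD per 1,000 successful geo-targeted requests."""
    price = GEO_PLANS[provider][0]
    return price * 1000 / geo_requests(provider)


for name in GEO_PLANS:
    print(f"{name}: {geo_requests(name):,} requests, "
          f"${cost_per_thousand(name):.2f} per 1,000")
```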
When checking the availability of a feature, take a look at the documentation to understand better how each provider handles it.
Proxy Types and Management
Proxies are a big part of a web scraper’s success, but they are not all made equal. You’ll want to have high-quality, well-maintained, and optimized proxies to rely on, so this is an aspect to pay attention to.
These are some of the types of proxies you will find:
- Data center proxies – These proxies are not associated with an internet service provider (ISP) and are instead hosted on a data center or cloud hosting service.
- ISP proxies – These proxies are bought or leased from an internet service provider, and they are not associated with an end user or device. Still, because they are associated with an ISP, there’s a lower risk of bans and blocks.
- Residential proxies – These would be considered premium proxies as they are proxies provided by an ISP to a homeowner, so they are great for emulating users programmatically.
- Mobile proxies – Like residential proxies, these are real IP addresses associated with a mobile device, which makes them great for emulating users’ behavior and accessing data as if you were a mobile user.
Most proxy providers will give you access to a mix of these proxies based on your needs. For example, Bright Data and Oxylabs let you buy a monthly plan for any of these types of IPs, where data center proxies tend to be the cheapest and residential and mobile proxies the most expensive. However, you’ll have to commit to one type of proxy or buy a separate allowance for each type based on your needs.
On the other end, off-the-shelf solutions like Octoparse won’t give you control over the proxies you use for the workflows, as they will try different combinations to gather the data you request.
Web scraping APIs like ScraperAPI and ScrapingBee use parameters to define when to use premium proxies (residential and mobile) and give you complete control over the proxies while working on the project. This flexibility is possible because of a credit system. Both mentioned solutions charge 10 API credits for premium proxies.
Note: It’s important to mention that ScraperAPI uses machine learning and years of statistical analysis to handle all complexities automatically. Although there are some very specific circumstances where you might benefit from having more control, 99% of the time, there’s no need for additional input.
Another factor to consider is proxy management. There are many reasons you might not want to self-manage your proxies. To name a few:
- It’s a resource-heavy process in terms of time and money
- You’ll have to rotate IP addresses from several pools
- You must create systems to handle CAPTCHAs
- You’ll have to manually set retries
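To get a feel for the work involved, here is a minimal sketch of just two of those pieces: rotating through a proxy pool and retrying failed requests. Everything in it is illustrative. `fetch_with_rotation` and `fake_fetch` are hypothetical names, the proxies are placeholder strings, and the fetch function is a stand-in so the example runs without network access.

```python
import itertools
from typing import Callable, Sequence


def fetch_with_rotation(url: str, proxies: Sequence[str],
                        fetch: Callable[[str, str], str],
                        max_retries: int = 3) -> str:
    """Try the request through a different proxy on every attempt."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    pool = itertools.cycle(proxies)  # round-robin over the pool
    last_error = None
    for _ in range(max_retries):
        proxy = next(pool)
        try:
            return fetch(url, proxy)
        except Exception as exc:  # a real scraper would catch narrower errors
            last_error = exc
    raise RuntimeError(f"all {max_retries} attempts failed") from last_error


def fake_fetch(url: str, proxy: str) -> str:
    """Stand-in for an HTTP call: the first proxy is 'blocked'."""
    if proxy == "proxy-a":
        raise ConnectionError("blocked")
    return f"<html>{url} via {proxy}</html>"


result = fetch_with_rotation("https://example.com", ["proxy-a", "proxy-b"], fake_fetch)
print(result)
```

A production setup would also need per-site proxy selection, CAPTCHA handling, back-off between retries, and monitoring of each proxy’s health, which is where the time and money go.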
Most proxy providers have their own version of a proxy manager while scraping APIs are technically proxy managers themselves.
When choosing a proxy management system, you want to pass as much of the hard work to the provider as possible without completely losing control over what’s happening behind the scenes, and without getting overcharged for functionalities other providers include.
For example, Bright Data’s web unlocker will cost you $1000/month (yearly plan) for 476,190 successful requests. But here’s where additional research is important. Their pricing seems clear enough, but in their documentation, they state:
“Although you have not been charged for the failed request, BrightData charges for additional headers or browser automation Bandwidth which was used. In order to have a stable and transparent price for the tool, you can contact your account manager to change the price from BW to CPM.” In the same plan, the CPM cost is $2.10/1000 successful requests – $2.40/CPM with the monthly plan.
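CPM pricing is easy to convert into a request count for a given budget. The sketch below reproduces the figures above; the function name is hypothetical, and `Decimal` is used only to avoid floating-point rounding surprises when dividing prices.

```python
from decimal import Decimal


def requests_for_budget(budget_usd: str, cpm_usd: str) -> int:
    """Successful requests a budget buys at a given price per 1,000 requests."""
    return int(Decimal(budget_usd) / Decimal(cpm_usd) * 1000)


print(requests_for_budget("1000", "2.10"))  # yearly plan rate
print(requests_for_budget("1000", "2.40"))  # monthly plan rate
```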
At the same price range ($999), ScraperAPI and ScrapingBee offer the same functionality but a total of 14M and 12.5M API credits, respectively, which, without any extra features enabled, amounts to more than 10M additional successful requests.
The scraping APIs still come out ahead even if we factor geo targeting and premium proxies into the mix:
Plan details | ScrapeIN | ScrapingBee | ScraperAPI |
Geo targeting | 20 API Credits | Available with Premium Proxies | Free in all Plans |
Premium proxies | 10 API Credits | 10 API Credits | 10 API Credits |
Cost | $599/month | $999/month | $999/month |
API Credits | 15M | 12.5M | 14M |
Successful Requests | 500k | 1.25M | 1.4M |
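The “Successful Requests” row can be reproduced by adding up the credit cost of each enabled feature. The sketch below assumes that feature costs are additive and that a request with no paid features costs one credit; that is how the table’s numbers work out, but it is our reading rather than a documented rule, and the names used are hypothetical.

```python
def requests_with_features(credits: int, feature_costs: dict) -> int:
    """Successful requests when every request enables all listed features."""
    # Assumed: feature costs add up; a request with no paid feature costs 1 credit
    per_request = max(1, sum(feature_costs.values()))
    return credits // per_request


FEATURE_PLANS = {
    "ScrapeIN": (15_000_000, {"geo_targeting": 20, "premium_proxies": 10}),
    "ScrapingBee": (12_500_000, {"geo_targeting": 0, "premium_proxies": 10}),
    "ScraperAPI": (14_000_000, {"geo_targeting": 0, "premium_proxies": 10}),
}

for name, (credits, costs) in FEATURE_PLANS.items():
    print(f"{name}: {requests_with_features(credits, costs):,}")
```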
Note: It’s worth mentioning that all providers offer technical support, but Bright Data offers a dedicated account manager at every plan level. ScraperAPI also offers dedicated support but only for enterprise clients.
Specialization vs. Multipurpose
You must consider whether you need a general-purpose tool to help you scrape a wide range of websites or a specialized tool to help you scrape specific sites – usually tough sites like Amazon and Google.
Of course, some tools offer both and work very well, but you have to know the nature of the pages you want to scrape to make an informed decision. For example, if you want to build some kind of SEO app that requires you to monitor search results, you’ll want a tool that can make this process faster, especially if you require real-time data.
From the tools we’ve already mentioned, ScraperAPI, ScrapingBee, Bright Data, and Oxylabs offer a SERP API that can be used to retrieve data from Google SERPs in JSON format. Here’s a quick overview of their plans:
ScrapingBee’s Google Search API (Enterprise) offers:
- 500k searches
- 12.5M API credits
- Each successful request costs 25 API credits
- 500k total successful requests
- Cost: $999
- Only Google search
- Returns JSON data
Oxylabs’ SERP scraper API (Corporate) offers:
- 526k Pages – equivalent to successful requests
- Cost $999 or $1.99/1000 successful requests
- Works with Google, Baidu, Bing, and Yandex
- Returns JSON data
Bright Data’s SERP API (Advance) offers:
- 476,190 successful requests
- Cost $1000/month or $2.10/CPM on the yearly plan ($2.40/CPM on the monthly plan)
- 1 CPM is equal to 1000 successful requests
- Works with Google, Bing, DuckDuckGo, Yandex and Baidu
- Returns JSON and HTML Data
ScraperAPI doesn’t offer a distinct plan for scraping Google. Instead, it uses an auto parser to return Google search and Google shopping data in JSON format that can be used with its regular plans. It would look something like this:
ScraperAPI’s Google search auto parse (Professional) offers:
- No search limits
- 14M API credits
- Each successful request costs 25 API credits
- 560k total successful requests
- Cost: $999
- Works with Google search and Google shopping
- Returns JSON data
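With the four plans laid out, you can narrow them down by the search engine you need and then by cost per 1,000 successful requests. The sketch below uses only the figures listed above; the `SerpPlan` structure and function names are hypothetical, and price is just one criterion among several.

```python
from typing import NamedTuple, Optional


class SerpPlan(NamedTuple):
    provider: str
    monthly_cost: float
    requests: int
    engines: frozenset


SERP_PLANS = [
    SerpPlan("ScrapingBee", 999, 500_000, frozenset({"google"})),
    SerpPlan("Oxylabs", 999, 526_000,
             frozenset({"google", "baidu", "bing", "yandex"})),
    SerpPlan("Bright Data", 1000, 476_190,
             frozenset({"google", "bing", "duckduckgo", "yandex", "baidu"})),
    SerpPlan("ScraperAPI", 999, 560_000, frozenset({"google"})),
]


def cost_per_thousand(plan: SerpPlan) -> float:
    """USD per 1,000 successful requests."""
    return plan.monthly_cost * 1000 / plan.requests


def cheapest_for(engine: str) -> Optional[str]:
    """Provider with the lowest cost per 1,000 requests that supports the engine."""
    candidates = [plan for plan in SERP_PLANS if engine in plan.engines]
    if not candidates:
        return None
    return min(candidates, key=cost_per_thousand).provider


for engine in ("google", "bing", "duckduckgo"):
    print(engine, cheapest_for(engine))
```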
So, if you only need to scrape Google SERPs, you’re better off with ScraperAPI or ScrapingBee. To scrape other search engines, Bright Data and Oxylabs are better options, with Oxylabs providing the more affordable plan. However, if you want to scrape DuckDuckGo from a parser-like service (returning JSON data) or you want a plug ‘n play tool, the best option is Bright Data, as long as it fits your budget, since you’ll get more data for the same price from scraping APIs.
JavaScript Rendering
More and more data-heavy websites are being built with JavaScript frameworks like React, Angular, and Vue, which allow dynamic content to be injected into the page and improve user experience.
That being said, regular scripts can’t access this content because it requires the browser to render the page and execute the JavaScript code to make it work. Now, traditionally, you can use a headless browser with Puppeteer (Node.js), for example. But this will slow down your data collection and make it harder to scale. Not to mention the risk involved in performing this rendering “at home.”
Let’s use ScraperAPI as an example. When you use Puppeteer to control a headless browser, you’re basically opening a browser instance locally and invoking your API URLs programmatically (e.g., https://api.scraperapi.com/?api_key=YOUR_KEY&url=https://example.com), fetching the content through ScraperAPI but using the browser to render the page.
That’s where problems begin.
To render the page, your browser will need to download all embedded resources (JS files, CSS files, etc.), and because your local browser is the one sending the request, it’ll use your real IP, exposing you to your target site. (In theory, you could write an interception code to hijack the requests and fetch the resources through ScraperAPI, but it will add more complexities and doesn’t take care of the entire problem.)
If you take a look at the URL above, you’ll notice that it contains your API key, which means that any resource downloaded through this method will be requested with this URL as the referer (or the origin when it comes to CORS), exposing your API key. Having the provider render the page on its side avoids these problems, so when choosing a web scraping tool, this is a feature that can’t be missing, or you’ll be limiting yourself severely.
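To make the exposure concrete, the short sketch below shows what anyone who receives that URL as a referer could read from it. It only parses the example URL from this section; the function name `exposed_params` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def exposed_params(referer_url: str) -> dict:
    """Query parameters a third party can read from a referer URL."""
    query = urlsplit(referer_url).query
    return {name: values[0] for name, values in parse_qs(query).items()}


referer = "https://api.scraperapi.com/?api_key=YOUR_KEY&url=https://example.com"
leaked = exposed_params(referer)
print(leaked["api_key"])  # the key is readable in plain text
print(leaked["url"])      # so is the site you are scraping
```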
Most off-the-shelf tools (like Octoparse) and scraping tools in a per-page model (like Bright Data and Oxylabs) should use headless browsers on their end to handle JavaScript content, but we couldn’t find any specification in their documentation, so you would have to contact them to learn more. However, APIs (like ScraperAPI, ScrapeIN, and ScrapingBee) allow you to enable JS rendering and charge extra API credits for each successful request, taking the rendering out of your machine so you can solely focus on data.
Final Thoughts
Once you understand how the different web scraping tools work, it’s easier to evaluate their pricing and find the little details to help you plan your project’s budget. It’s crucial to read every tool’s documentation and learn its particular language to avoid surprises in your billing.
Also, consider what the requirements are for your project and list them in a checklist. Without a clear scope, you can make a decision solely based on money and end up making the wrong choice.
If you’re still in doubt, send us your pricing questions, and we’ll be glad to help you. Until next time, happy scraping!