How to Bypass and Scrape Fastly Bot-Protected Sites with Python

If you’ve tried scraping a website only to be blocked repeatedly, chances are you’ve encountered a bot protection system like Fastly. Fastly is a content delivery network (CDN) that many websites use not just for speed but also to keep out unwanted bots. This can be a major obstacle for web scrapers like you—especially when traditional scraping methods don’t work.

In this guide, I’ll show you:

  • How Fastly’s bot manager works
  • The major techniques to bypass Fastly’s challenges
  • How to bypass Fastly’s bot protection using Python and ScraperAPI

Whether you’re scraping articles, product listings, or any other data, this step-by-step tutorial will help you access Fastly-protected content without hitting roadblocks. 

Sound good? Let’s get started!

How Fastly Blocks Web Scrapers

Here’s how Fastly’s defense mechanisms work to block scrapers like yours:

1. Advanced Bot Detection

Fastly uses sophisticated bot classification to identify scrapers. It goes beyond basic checks like user agents, looking at:

  • Traffic Patterns: Fastly detects unusual traffic behaviors, such as making requests too quickly or repeatedly accessing certain endpoints. These patterns are typical of bots and stand out in Fastly’s system.
  • Device Fingerprints: Fastly collects detailed information about your device and browser, including plugins, screen resolution, and language settings. If your scraper fails to mimic a real user’s fingerprint or displays inconsistent data, it’s likely to get flagged.
  • IP Reputation: Fastly evaluates the reputation of the IP addresses your scraper uses. If your IP is known for bot activity or if it’s part of a proxy network, Fastly may block it right away. This makes rotating proxies essential when bypassing Fastly.

2. Multi-Layered Scraper Blocking

To prevent scrapers from accessing protected content, Fastly employs a combination of active and passive defenses:

  • Active Challenges (JavaScript and CAPTCHA): Fastly can force your scraper to solve JavaScript challenges or CAPTCHAs to prove it’s human. Scrapers that can’t execute JavaScript or handle CAPTCHAs get blocked.
  • Passive Behavior Analysis: Even without challenges, Fastly silently monitors visitor behavior, such as mouse movements and scrolling patterns. Scrapers tend to interact with websites in predictable or mechanical ways, making them easy to spot. If your bot’s behavior doesn’t match what’s expected from a real user, it’s flagged as suspicious.
  • Rate Limiting and IP Blocking: Fastly applies rate limits to prevent excessive requests from a single source. If your scraper exceeds these limits, it will be blocked. Fastly also maintains a list of known malicious IPs, and if your scraper’s IP is associated with suspicious behavior, it could be instantly blocked.
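To make the rate-limiting idea concrete, here is a minimal sketch of a per-IP sliding-window counter. It is purely illustrative and not Fastly's actual logic: the class name, the limit of 10 requests per second, and the IP address are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Toy per-IP rate limiter (illustrative only, not Fastly's real logic)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now):
        """Return True if a request from `ip` at time `now` is within the limit."""
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=10, window_seconds=1.0)
# A scraper firing 15 requests within 150 ms from one (hypothetical) IP
decisions = [limiter.allow("203.0.113.7", now=i * 0.01) for i in range(15)]
print(decisions.count(True), "allowed,", decisions.count(False), "blocked")
```

A burst like this exceeds the limit almost immediately, which is why spreading requests over time and across IPs matters.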

3. Real-Time Scraper Detection and Blocking

Fastly’s bot management system provides real-time insights into scraping activity. Site owners can monitor and analyze traffic through an intuitive dashboard that shows trends in scraper behavior. This allows them to adjust security settings on the fly to block scrapers more effectively by:

  • Creating Custom Rules: Site owners can create specific rules to block certain patterns of behavior, IPs, or even geographical regions often associated with bots. These customizable settings give Fastly users full control over how to block scrapers.
  • Blacklists and Whitelists: Fastly allows for the creation of detailed blacklists and whitelists. Site owners can decide exactly which traffic should be blocked or allowed, fine-tuning their defenses to keep scrapers out while letting legitimate users through.

With these advanced detection methods and blocking strategies, Fastly is designed to stop scrapers in their tracks. But while Fastly’s protections are powerful, they’re not unbeatable.

In the next section, I’ll show you how to use Python and ScraperAPI to bypass these defenses and access protected content.

Bypassing Fastly with ScraperAPI

ScraperAPI simplifies bypassing Fastly by managing the toughest aspects of web scraping for you, such as rotating proxies, handling headers and cookies, and rendering JavaScript. This lets you focus on scraping the content you need without worrying about getting blocked.

Now, let’s dive into how ScraperAPI works and walk through a Python script that scrapes top headlines from Le Monde.

Here’s the code you’ll use:

import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_SCRAPER_API_KEY"
URL = "https://www.lemonde.fr/"

params = {
    'api_key': API_KEY,
    'url': URL,
    'render': 'true'  # Enable JavaScript rendering to bypass Fastly's challenges
}

# Use HTTPS so the API key is not sent in clear text; rendered requests can be
# slow, so allow a generous timeout instead of waiting indefinitely
response = requests.get("https://api.scraperapi.com", params=params, timeout=70)

if response.status_code == 200:
    print("Successfully bypassed Fastly!")
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract top articles from the page
    top_articles = soup.find_all("li", class_="top-article")

    for article in top_articles:
        headline = article.find("p")
        link = article.find("a")
        # Skip list items that are missing a headline or a link
        if headline is None or link is None:
            continue
        print({"headline": headline.get_text(strip=True), "link": link.get("href")})
else:
    print(f"Failed to bypass Fastly. Status code: {response.status_code}")

Breaking Down the Code:

  1. Set Up ScraperAPI:
    • Replace "YOUR_SCRAPER_API_KEY" with your actual ScraperAPI key – you’ll need to create a free ScraperAPI account to test the snippet. This key gives you access to ScraperAPI’s features, such as rotating proxies and JavaScript rendering.
    • The target URL is Le Monde (https://www.lemonde.fr/), which is protected by Fastly’s bot management.
  2. Enable JavaScript Rendering:
    • Fastly uses JavaScript challenges to identify bots, so enabling JavaScript rendering is crucial. The render='true' parameter tells ScraperAPI to handle the JavaScript, which makes your requests look more like genuine human traffic.
  3. Send the Request:
    • The script uses requests.get() to send a request to ScraperAPI. ScraperAPI handles all the complex behind-the-scenes tasks like proxy rotation and JavaScript execution, allowing you to bypass Fastly’s defenses.
  4. Check the Response:
    • If the status code is 200 OK, it means the request was successful, and you’ve bypassed Fastly. If not, you might encounter a 403 Forbidden or 503 Service Unavailable, indicating the request was blocked. In such cases, try adjusting your request strategy (e.g., disabling JS rendering or slowing down the request rate) – if the issue continues, contact ScraperAPI’s support team.
  5. Extract and Parse the Content:
    • Once you’ve successfully bypassed Fastly, use BeautifulSoup to parse the HTML content. In this example, we’re extracting the top articles from the page. The script looks for <li> tags with the class "top-article" and extracts the headlines (<p>) and links (<a href>).
    • The final step prints out the headlines and their corresponding links.
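If the request comes back with a 403 or 503, one option is to retry a few times with increasing pauses before giving up. The following is a minimal sketch and not part of the original script: fetch_with_retries, the attempt count, and the delay values are hypothetical, and send stands for any zero-argument callable that performs the request.

```python
import random
import time

BLOCKED_STATUSES = {403, 503}  # the block responses mentioned above


def fetch_with_retries(send, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call `send()` until it returns a non-blocked response or attempts run out.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute (for example, a lambda wrapping requests.get).
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in BLOCKED_STATUSES:
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter: roughly 2s, 4s, 8s, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
    return response  # still blocked after the last attempt
```

To use it with the script, pass a lambda that wraps the requests.get() call and then check the returned status code as before.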

Why Use ScraperAPI for Fastly?

ScraperAPI simplifies the process of bypassing Fastly by automating several critical tasks that would otherwise require complex setups. Here’s why ScraperAPI is a powerful solution for scraping Fastly-protected websites:

Smart IP rotation

Fastly frequently blocks scrapers based on IP address reputation or due to rate-limiting. ScraperAPI solves this by automatically rotating IPs, providing fresh, high-quality proxies for each request. This helps your scraper mimic organic traffic, significantly reducing the risk of getting blocked by Fastly’s IP-based defenses.

JavaScript rendering with the new render instruction set

One of the biggest challenges with Fastly is its use of JavaScript-based bot detection. Many scrapers fail at this stage because they can’t execute JavaScript.

ScraperAPI now includes a new render instruction set that automates browser-like behavior on its servers, allowing your scraper to pass Fastly’s JavaScript checks effortlessly. This means you don’t need to manually run headless browsers like Puppeteer—ScraperAPI handles the rendering for you as if your requests were coming from a real user’s browser.

Headers and cookie management

Fastly closely monitors HTTP headers and cookies to detect bots. If your scraper doesn’t manage these correctly, it will likely be flagged.

ScraperAPI automatically sets and manages headers and cookies for each request, ensuring that your requests look like they are coming from a real browser session. This reduces the chance of Fastly detecting your scraper based on inconsistent headers or missing session cookies.

Ease of use and focus on data

By automating proxy rotation, JavaScript rendering, and session management, ScraperAPI allows you to focus on what matters most—scraping the data you need. You no longer need to worry about the complexities of bot detection, IP blocking, or JavaScript execution. ScraperAPI takes care of these challenges, letting you extract content more efficiently.

4 Techniques to Bypass Fastly Bot Protection

Fastly’s bot protection is designed to keep automated traffic out, but with the right techniques, you can still bypass its defenses. Here are the key strategies you’ll need:

1. Rotate Proxies to Avoid IP Blocking

Fastly monitors and blocks IP addresses that make too many requests or behave suspiciously. To get around this, rotating proxies is essential. By switching between different IP addresses, you make it harder for Fastly to detect your scraper.

Residential proxies are particularly effective, as they mimic real user traffic. Services like ScraperAPI provide rotating proxies, which automate this process and help distribute requests across different IPs.

Related: How to use and rotate proxies with Python.
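If you manage your own proxies, rotation can be as simple as cycling through a list. This is a rough sketch under assumptions: the proxy URLs are placeholders on example.com and rotating_proxies is a hypothetical helper, not part of any library.

```python
import itertools

# Hypothetical proxy endpoints: replace with proxies you actually control or rent
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]


def rotating_proxies(pool):
    """Yield requests-style `proxies` dicts, cycling through `pool` forever."""
    for proxy in itertools.cycle(pool):
        yield {"http": proxy, "https": proxy}


rotation = rotating_proxies(PROXY_POOL)

# Each request would take the next proxy in the cycle, e.g.:
#   requests.get(url, proxies=next(rotation), timeout=30)
first = next(rotation)
second = next(rotation)
print(first["https"])
print(second["https"])
```

Round-robin is the simplest policy. A real pool would likely also drop proxies that start returning block responses.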

2. Render JavaScript to Pass Challenges

Fastly frequently uses JavaScript challenges to verify that visitors are human. Traditional scrapers struggle with these challenges, but by using a headless browser like Puppeteer or Playwright, you can simulate real browser behavior and render JavaScript.

Alternatively, services like ScraperAPI have built-in JavaScript rendering, allowing you to bypass these challenges automatically without needing to run a full browser environment.

Related: How to scrape large dynamic websites using JS rendering.
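For the headless-browser route, a minimal Playwright sketch might look like the following. It assumes Playwright and its Chromium build are installed (pip install playwright, then playwright install chromium). render_page and its default timeout are hypothetical names and values for this example.

```python
def render_page(url, timeout_ms=30_000):
    """Load `url` in headless Chromium and return the HTML after JavaScript runs."""
    # Imported here so the module still loads where Playwright is not installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until network activity has quietened down
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Rendering on its own only covers the JavaScript step. The IP reputation and fingerprint checks described earlier still apply to the browser's traffic.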

3. Simulate Human Behavior

Fastly tracks user behavior on the site, looking for actions that indicate whether a visitor is a bot or a real user. Bots often exhibit repetitive patterns—sending requests too quickly or visiting pages in an unnatural sequence.

To avoid detection, simulate human behavior by introducing random delays between requests and varying your browsing patterns. Tools like Puppeteer and Selenium can help simulate realistic actions like scrolling, clicking, and mouse movements, making your scraper less predictable.

Related: Selenium web scraping 101.

Alternatively, you can use ScraperAPI’s render instruction sets to outsource these resource-intensive tasks to ScraperAPI’s servers instead of running them locally.
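The random delays and varied browsing order described above can be sketched in a few lines. The function name, the 2 to 6 second range, and the fetch callable are assumptions for illustration; tune the delays to the site you are working with.

```python
import random
import time


def crawl_politely(urls, fetch, min_delay=2.0, max_delay=6.0, sleep=time.sleep):
    """Fetch each URL in a shuffled order, pausing a random interval in between.

    `fetch` is any callable that takes a URL and returns a result.
    """
    order = list(urls)
    random.shuffle(order)  # avoid hitting pages in the same sequence every run
    results = {}
    for index, url in enumerate(order):
        if index > 0:
            sleep(random.uniform(min_delay, max_delay))
        results[url] = fetch(url)
    return results
```

Passing fetch and sleep as arguments keeps the pacing logic separate from the HTTP client, so the same loop works with requests, a headless browser, or an API call.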

4. Handle Cookies and Headers

Fastly uses cookies to track sessions and closely monitors HTTP headers to identify bots. To make your requests appear more legitimate, you need to properly handle cookies and manage your headers.

By storing cookies across sessions and setting headers like User-Agent and Referer to match real browser traffic, your scraper will blend in with regular user activity, reducing the likelihood of getting blocked.

Related: How to use custom headers and cookies for web scraping.
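With the requests library, a Session object covers both points: it keeps cookies between calls and sends the same headers every time. The sketch below is one way to set that up. The header values are example strings for a desktop Chrome browser, and browser_headers and make_session are hypothetical helper names.

```python
def browser_headers(referer=None):
    """Return a header set resembling a desktop Chrome request (example values)."""
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
        ),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }
    if referer:
        headers["Referer"] = referer
    return headers


def make_session(referer=None):
    """Create a requests.Session that reuses cookies and sends the headers above."""
    import requests  # imported lazily so browser_headers() works without requests

    session = requests.Session()
    session.headers.update(browser_headers(referer))
    return session
```

A session created this way stores any cookies the site sets and sends them back on later session.get() calls, which keeps the visit looking like one continuous browsing session.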

To effectively bypass Fastly, you’ll need to rotate proxies to avoid IP bans, render JavaScript to pass security challenges, simulate human behavior to avoid being flagged, and manage cookies and headers to maintain session continuity. By combining these techniques, you can significantly increase your chances of successfully scraping Fastly-protected websites without getting blocked.

Troubleshooting and Tips When Scraping Fastly

Even with the right strategies in place, scraping Fastly-protected sites can sometimes run into issues. Here are some common problems you might face and tips on how to resolve them:

1. Blocked Requests

If your requests are still getting blocked, even with proxies and JavaScript rendering, there are a few adjustments you can make:

  • Slow Down the Request Rate: Fastly blocks scrapers that send requests too quickly. Add random delays between requests to mimic human behavior.
  • Change User Agents: Although ScraperAPI handles headers for you, you can send custom headers to mimic different browsers or devices. Treat this as a last resort.
  • Use Residential Proxies: These proxies mimic real users and are less likely to get flagged than datacenter proxies. You can force ScraperAPI to use only residential proxies with the premium=true parameter, or mobile proxies with ultra_premium=true.
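As a sketch of how the premium and ultra_premium parameters fit into a request, the helper below builds the parameter dictionary for each case. build_params and its proxy_tier argument are names invented for this example. Only the api_key, url, render, premium, and ultra_premium parameters come from the text above.

```python
def build_params(api_key, url, render=True, proxy_tier=None):
    """Build ScraperAPI query parameters.

    `proxy_tier` is a name invented for this sketch: None for the default pool,
    "premium" for residential proxies, "ultra_premium" for mobile proxies.
    """
    if proxy_tier not in (None, "premium", "ultra_premium"):
        raise ValueError(f"unknown proxy_tier: {proxy_tier!r}")
    params = {"api_key": api_key, "url": url}
    if render:
        params["render"] = "true"
    if proxy_tier:
        params[proxy_tier] = "true"
    return params


params = build_params(
    "YOUR_SCRAPER_API_KEY", "https://www.lemonde.fr/", proxy_tier="premium"
)
# Pass `params` to requests.get() exactly as in the main script
print(params)
```

Starting with the default pool and moving to premium only when requests keep getting blocked is a reasonable order to try.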

If you’re still having trouble bypassing Fastly, contact ScraperAPI’s support team.

2. Incomplete Content

Sometimes, the page content you’re trying to scrape may not load fully. This could be due to JavaScript that hasn’t finished rendering. Here’s what you can try:

  • Increase Wait Times: If you’re using ScraperAPI, try increasing the wait time in the API parameters. This will allow more time for JavaScript to render before the response is returned.
  • Use a Headless Browser: If the content still isn’t loading, consider combining ScraperAPI with headless browsers like Puppeteer or Selenium. This can help simulate more complex user interactions and ensure that all the JavaScript executes properly.

Related: Integrating ScraperAPI with Selenium | Integrating ScraperAPI with Puppeteer.
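Before parsing, it can help to check whether the response actually contains the content you expect, so you know when to retry with a longer wait. The following is a minimal sketch: missing_markers is a hypothetical helper, and the marker strings are assumptions based on the Le Monde example.

```python
def missing_markers(html, required_markers):
    """Return the markers that do not appear anywhere in `html`."""
    return [marker for marker in required_markers if marker not in html]


# Markers are assumptions for the Le Monde example; pick strings that only
# appear once the content you need has rendered.
REQUIRED = ["top-article", "</html>"]

partial_html = "<html><body><div id='app'></div></body></html>"
missing = missing_markers(partial_html, REQUIRED)
if missing:
    print("Page looks incomplete, missing:", missing)
```

A plain substring check is crude but cheap. For stricter checks you could parse the page with BeautifulSoup and count the matching elements instead.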

By making these adjustments, you can troubleshoot common scraping issues and refine your approach to avoid detection while scraping Fastly-protected websites.

Ready to get started? Try ScraperAPI for free with 5,000 API credits and access to all premium tools.

Working on a large project? Contact our sales team to get a custom plan, including dedicated Slack support channels, a dedicated account manager, and 100+ concurrent threads.

About the author

Ize Majebi

Ize Majebi is a Python developer and data enthusiast who delights in unraveling code intricacies and exploring the depths of the data world. She transforms technical challenges into creative solutions, possessing a passion for problem-solving and a talent for making the complex feel like a friendly chat. Her ability brings a touch of simplicity to the realms of Python and data.
