Do you want to scrape G2 reviews without getting blocked? Then you’re in the right place!
As a trusted source of information, G2 is a go-to resource for businesses to research and evaluate genuine user reviews and ratings for software products and services.
Unfortunately, scraping G2 reviews and user feedback can be tricky. G2 has implemented robust security measures, including heavy use of CAPTCHAs, that make its anti-scraping bot system highly challenging to bypass.
In this web scraping tutorial, we’ll show you how to collect product reviews from G2 without getting detected or blacklisted. The extracted data includes each review’s date, URL, rating, summary, body text, and the reviewer’s username.
TL;DR: Full Python Code for the G2 Product Review Scraper
For those in a hurry, here’s the full Python script for the web scraper we’re making in this tutorial:
import requests
from bs4 import BeautifulSoup
import json
API_KEY = "Your_API_Key"
url = "https://www.g2.com/products/scraper-api/reviews"
payload = {"api_key": API_KEY, "url": url}
html = requests.get("https://api.scraperapi.com", params=payload)
soup = BeautifulSoup(html.text, "lxml")
# Initialize the results dictionary
results = {"product_name": "", "number_of_reviews": "", "reviews": []}
# Extracting the product name
product_name_element = soup.select_one(".product-head__title a")
results["product_name"] = (
product_name_element.get_text(strip=True)
if product_name_element
else "Product name not found"
)
# Extracting the number of reviews
reviews_element = soup.find("li", {"class": "list--piped__li"})
results["number_of_reviews"] = (
reviews_element.get_text(strip=True)
if reviews_element
else "Number of reviews not found"
)
# Loop through each review element
for review in soup.select(".paper.paper--white.paper--box"):
review_data = {}
# Extracting the username
username_element = review.find("a", {"class": "link--header-color"})
review_data["username"] = (
username_element.get_text(strip=True)
if username_element
else "No username found"
)
# Extracting the review summary
summary_element = review.find("h3", {"class": "m-0 l2"})
summary_text = (
summary_element.get_text(strip=True) if summary_element else "No summary found"
)
review_data["summary"] = summary_text.replace('"', "")
# Extracting the review
user_review_element = review.find("div", itemprop="reviewBody")
review_data["review"] = (
user_review_element.get_text(strip=True)
if user_review_element
else "No review found"
)
# Extracting the review date
review_date_element = review.find("time")
review_data["date"] = (
review_date_element.get_text(strip=True)
if review_date_element
else "No date found"
)
# Extracting the review URL
review_url_element = review.find("a", {"class": "pjax"})
review_data["url"] = (
review_url_element["href"] if review_url_element else "No URL found"
)
# Extracting the rating
rating_element = review.find("meta", itemprop="ratingValue")
review_data["rating"] = (
rating_element["content"] if rating_element else "No rating found"
)
# Add the review data to the reviews list
results["reviews"].append(review_data)
# Writing results to a JSON file
with open("G2_reviews.json", "w", encoding="utf8") as f:
json.dump(results, f, indent=2)
print("Data written to G2_reviews.json")
Note: Don’t have an API key? Create a free ScraperAPI account and receive 5,000 API credits to test all our tools.
If you want to follow along, open your preferred editor, and let’s get started!
Scraping G2 Product Reviews with Python
For this tutorial, we’ll focus on scraping ScraperAPI’s G2 reviews, collecting details like:
- Reviewer usernames
- Review summaries (titles)
- Reviews
- Dates
- Review’s URLs
- G2 product/service ratings
As the final step, we’ll export all of the scraped G2 product review data into a JSON file (or a CSV file if you prefer) to make it easier to navigate and repurpose the way you want.
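If you prefer CSV, here’s a minimal sketch using Python’s built-in csv module, assuming the results dictionary we build later in this tutorial:

import csv

# Minimal sketch: flatten the nested "reviews" list from the results
# dictionary (built later in this tutorial) into a flat CSV file
def reviews_to_csv(results, path="G2_reviews.csv"):
    fieldnames = ["username", "summary", "review", "date", "url", "rating"]
    with open(path, "w", newline="", encoding="utf8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results["reviews"])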
Requirements of G2 Data Extractor
The packages needed as prerequisites for this project are beautifulsoup4, requests, and lxml. You can install these using the following command:
pip install beautifulsoup4 requests lxml
You’ll also need to have Python installed, preferably version 3.8 or later, to avoid compatibility issues.
Understanding G2 Review Site Structure
Before you start scraping reviews, it’s critical to understand the HTML structure of the G2 website, as this will allow you to extract data more efficiently.
Use the search bar to search for ScraperAPI on the G2 website. You will get a page similar to this:
Each review on the G2 product reviews page is typically contained within distinct HTML elements with unique class names or attributes. These elements contain the data we will scrape.
We’ll use BeautifulSoup to target specific data points by identifying unique selectors such as class names, IDs, or attributes.
Start by opening the G2 review page and inspecting the HTML with your browser’s developer tools. This is usually accessed by right-clicking the page and selecting “Inspect”.
The HTML structure can be seen in the developer tools window:
From here, it’s a matter of taking note of each element and its particular CSS selector.
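If you want to sanity-check a selector before wiring it into the scraper, one low-effort approach is to save the page from your browser and test the selectors locally. A quick sketch (g2_page.html is a hypothetical filename for the saved page):

from bs4 import BeautifulSoup

# Sketch: test CSS selectors against a locally saved copy of the page
# ("g2_page.html" is a hypothetical file saved from your browser)
with open("g2_page.html", encoding="utf8") as f:
    soup = BeautifulSoup(f.read(), "lxml")

# Print the first match for each selector you plan to use
for selector in [".product-head__title a", ".paper.paper--white.paper--box"]:
    match = soup.select_one(selector)
    print(selector, "->", match.get_text(strip=True)[:60] if match else "no match")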
Step 1: Setting Up Your G2 Web Scraping Project
First, we need to import the necessary libraries for this project into our Python file:
import requests
from bs4 import BeautifulSoup
import json
Next, we define the API key and the URL of the product page we want to scrape as variables.
Note: You can find your API key from your ScraperAPI dashboard.
API_KEY = "your_api_key"
url = "https://www.g2.com/products/product_id/reviews"
Note that to scrape G2 effectively, a premium ScraperAPI subscription is recommended, as it provides additional capabilities that are particularly suited to G2’s level of protection.
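ScraperAPI also accepts extra parameters in the payload to tune how requests are handled, such as render and country_code; consult the documentation to confirm which ones your plan supports. A rough sketch:

# Sketch: optional payload parameters for heavily protected sites
# (check ScraperAPI's documentation for the current list)
payload = {
    "api_key": API_KEY,
    "url": url,
    "render": "true",       # render JavaScript before returning the HTML
    "country_code": "us",   # route the request through a specific country
}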
Step 2: Sending a Request and Parsing the Response
We create a payload object that includes our API key and the G2 URL we wish to scrape. This payload is then sent in a get() request to ScraperAPI, which handles the complexities of web scraping, such as smart IP and header rotation and CAPTCHA handling, using machine learning and statistical analysis.
payload = {"api_key": API_KEY, "url": url}
html = requests.get("https://api.scraperapi.com", params=payload)
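Proxied requests can take noticeably longer than a normal page load, so it’s worth adding a generous timeout and a status check. A small defensive sketch:

# Sketch: add a timeout and an explicit status check, since requests
# doesn't raise on 4xx/5xx responses by default
html = requests.get("https://api.scraperapi.com", params=payload, timeout=70)
html.raise_for_status()  # raises requests.HTTPError on a failed request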
Then, we can use BeautifulSoup to parse the HTML response and store it as a soup object, allowing us to navigate the parsed tree using CSS selectors.
soup = BeautifulSoup(html.text, "lxml")
Step 3: Extracting G2 Software Reviews and Their Details
This step involves initializing a dictionary to store the results and extracting the product name, number of reviews, and user review data from the parsed HTML response.
We start by creating a dictionary called results to store the data we’ll extract.
results = {"product_name": "", "number_of_reviews": "", "reviews": []}
Here, product_name and number_of_reviews are strings that will hold the name of the product and the total number of reviews, respectively, while reviews is a list of dictionaries, each representing an individual review.
Once we have our dictionary ready, we extract the product name from the parsed HTML.
We use BeautifulSoup’s select_one() method to select the first element matching the CSS selector .product-head__title a, get its text, and assign it to results['product_name'].
product_name_element = soup.select_one(".product-head__title a")
results["product_name"] = product_name_element.get_text(strip=True) if product_name_element else "Product name not found"
Similarly, we extract the number of reviews by finding the li element with the class list--piped__li and getting its text.
reviews_element = soup.find("li", {"class": "list--piped__li"})
results["number_of_reviews"] = reviews_element.get_text(strip=True) if reviews_element else "Number of reviews not found"
After that, we loop through all the reviews on the page. Each review card matches the selector .paper.paper--white.paper--box. For each one, we extract the username, summary, full review text, date, URL, and rating, and store this information in a new dictionary called review_data.
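The loop itself looks like this; every extraction snippet in the subsections below runs inside its body:

for review in soup.select(".paper.paper--white.paper--box"):
    review_data = {}
    # ...extraction snippets from the subsections below go here...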
Scraping the Username of G2 Reviews
The username is typically found within an anchor (a) tag. To extract it, we look for the anchor tag whose class indicates it contains the username. In this case, the class name is link--header-color, so we use BeautifulSoup to find this a tag and then get its text content.
username_element = review.find("a", {"class": "link--header-color"})
review_data["username"] = username_element.get_text(strip=True) if username_element else "No username found"
Scraping the G2 Review Summary
The review summary is found within an h3
heading with a class set to m-0 l2
.
summary_element = review.find("h3", {"class": "m-0 l2"})
review_data["summary"] = summary_element.get_text(strip=True).replace('"', "") if summary_element else "No summary found"
Scraping the G2 Review Body
The full text of the review is usually contained within a div
with an itemprop
attribute set to reviewBody
. To extract the review text, we find this div
and retrieve its text content.
user_review_element = review.find("div", itemprop="reviewBody")
review_data["review"] = user_review_element.get_text(strip=True) if user_review_element else "No review found"
Scraping the G2 Review Date
The date of the review is often located within a time
element.
review_date_element = review.find("time")
review_data["date"] = review_date_element.get_text(strip=True) if review_date_element else "No date found"
Scraping the G2 Review URL
To extract the URL of each review, we look for an anchor tag that contains the href
attribute. Specifically, we are interested in the a
tags with the class pjax
, as this is the class used by G2 for the links to the individual reviews.
Here’s how we can include this in our extraction process:
review_url_element = review.find("a", {"class": "pjax"})
review_data["url"] = review_url_element["href"] if review_url_element else "No URL found"
Scraping the G2 Rating
Ratings are found within a meta
tag with an attribute itemprop
set to ratingValue
. We search for this tag and extract its content
attribute to get the rating value.
rating_element = review.find("meta", itemprop="ratingValue")
review_data["rating"] = rating_element["content"] if rating_element else "No rating found"
Each piece of data is stored in the review_data dictionary. After extracting the data for each review, we append review_data to results['reviews'].
results["reviews"].append(review_data)
By the end of this G2 review extraction step, the results will be a dictionary containing the product name, number of reviews, and a list of dictionaries, each representing an individual review.
Step 4: Write the Results to a JSON File
We serialize the scraped data into JSON and write it to a file named G2_reviews.json. The json.dump() function handles this: it takes the data and a file object as arguments and writes the JSON data to the file.
with open("G2_reviews.json", "w", encoding="utf8") as f: json.dump(results, f, indent=2) print("Data written to G2_reviews.json")
The encoding="utf8" parameter ensures that our data is encoded in UTF-8, the most common encoding used for web content.
UTF-8 encoding supports all Unicode characters, which means it can handle non-ASCII characters without any issues. This is particularly important when dealing with special characters or emojis that may appear in a user’s review.
Step 5: Testing the G2 Web Scraper
After running your script, the results are stored in a JSON file named G2_reviews.json, which should look like this:
{ "product_name": "Scraper API", "number_of_reviews": "11 reviews", "reviews": [ { "username": "Deni H.", "summary": "A Must-Have Tool for Automating Web Scraping", "review": "What do you like best about Scraper API?Experience for the Scraper API is likely to be an experienced web scraper developer with knowledge of various scraping technologies, including Python (BeautifulSoup and Selenium), JavaScript (Puppeteer or Cheerio), and Node.js (Demand and Cheerio). Additionally, the ideal developer will have experience with cloud infrastructure services such as AWS and Azure, and be familiar with the Scraper API platform and its features.Review collected by and hosted on G2.com.What do you dislike about Scraper API?I don't like the Scraper API because it's expensive and unreliable. Also, APIs can be difficult to use and often require a lot of coding to get the desired results. In addition, the support provided by services is often lacking and can be slow or unhelpful. The fact that it's a black-box solution and provides little visibility into what's going on behind the scenes could be another drawback. Finally, the Scraper API also often requires its authentication system, which can be a pain for me.Review collected by and hosted on G2.com.What problems is Scraper API solving and how is that benefiting you?Despite the drawbacks, there are a few scenarios where using the Scraper API could be useful to me. For example, if I need to quickly and easily access large amounts of data from multiple sources without having to write complex code, the Scraper API might be one solution. Additionally, the scalability of the service makes it ideal for high-volume scraping tasks that are difficult to achieve with manual coding. Finally, the fact that it's managed by a third party means I don't have to worry about updating my code or making sure it works with new versions of websites.Review collected by and hosted on G2.com.", "date": "Dec 03, 2022", "url": "https://www.g2.com/survey_responses/scraper-api-review-7456830", "rating": "4.0" }, MORE REVIEWS ] }
What You Can Do with the G2 Review Data
After scraping G2 reviews and exporting them to a JSON file, you can put the data to work in your market research to gain insights about the product. You can calculate the average rating, make data-driven decisions, or track changes in ratings over time.
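For instance, here’s a minimal sketch that reads the exported file back and computes the average rating:

import json

# Sketch: read the exported file and compute the average rating
with open("G2_reviews.json", encoding="utf8") as f:
    data = json.load(f)

ratings = [
    float(r["rating"])
    for r in data["reviews"]
    if r["rating"] != "No rating found"
]
if ratings:
    print(f"Average rating: {sum(ratings) / len(ratings):.2f} ({len(ratings)} reviews)")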
This rating data is particularly valuable to businesses looking to enhance customer support, as understanding common issues and complaints from reviews can help you proactively address these problems.
You can use these reviews to improve your FAQs, create targeted support materials, or train your customer service team to handle recurring issues more effectively.
Also, you can use this process for competitive analysis, as you can find what your competitors’ customers are saying about them, letting you find weaknesses in their software to enhance yours or promote features they’re lacking.
Legal and Ethical Considerations When Scraping G2
G2 heavily relies on Cloudflare challenges to protect its site, and while most data on G2.com is publicly accessible, scraping must be done without harming the website.
Always consult and adhere to G2’s Terms of Service and community guidelines, which outline permissible data collection and use. Attempting to bypass any form of access control, such as login walls, is unethical and potentially illegal.
Furthermore, if your scraping activities involve collecting personal data, such as reviewer emails, you must be mindful of regulations like the GDPR in the EU, which impose strict rules on handling personal data.
Extract G2 Review Data with ScraperAPI
G2.com is a valuable resource for company reviews and comparisons, allowing you to collect direct feedback from customers to make marketing and business decisions.
In this tutorial, we have:
- Gone through the essential steps of sending HTTP requests, parsing HTML, and exporting all reviews found into a JSON file
- Addressed navigating G2’s robust security measures by utilizing ScraperAPI to facilitate data extraction without triggering a site ban
- Understood the legal implications of scraping G2 and how to keep your scrapers compliant
Are you working on a large project? With ScraperAPI, collecting data at an enterprise level can be affordable. For $299 per month, you’ll get 300,000 API credits, along with 100 concurrent threads and a global geo-targeting feature (country-level). Need more API credits? Contact our sales team to get custom pricing. Dive deeper into ScraperAPI pricing and features here.
Until next time, happy scraping!
Do you also collect data from other websites? These Python web scraping tutorials may interest you: