Realtor.com is the second-largest real estate platform in the United States, where individuals and businesses buy, rent, and sell properties, and where analysts gather property data to examine future trends in any locality.
However, collecting this information manually is not only inefficient but practically impossible: listings change fast, and new properties are added daily. The best way to approach this is web scraping.
In this article, we’ll walk you through a step-by-step guide on scraping Realtor.com data using Python (to write our script) and ScraperAPI’s standard API to avoid getting blocked.
TL;DR: Full Realtor.com Scraper
For those in a hurry, here’s the full script we’ll build in this tutorial:
import requests
from bs4 import BeautifulSoup
import json

output_data = []
base_url = "https://www.realtor.com/realestateandhomes-search/Atlanta_GA/show-newest-listings/sby-6"
API_KEY = "YOUR_API_KEY"


def scrape_listing(num_pages):
    for page in range(1, num_pages + 1):
        # Page 1 uses the base URL without the /pg- suffix
        if page == 1:
            url = f"{base_url}"
        else:
            url = f"{base_url}/pg-{page}"  # Adjust the URL structure based on the website
        print(f"Scraping data from page {page}... {url}")
        payload = {"api_key": API_KEY, "url": url}
        # Make a request to ScraperAPI
        r = requests.get("http://api.scraperapi.com", params=payload)
        html_response = r.text
        # Parse the HTML response using BeautifulSoup
        soup = BeautifulSoup(html_response, "lxml")
        # Select every property card on the page
        listings = soup.select("div[class^='BasePropertyCard_propertyCardWrap__']")
        print("Listings found!")
        for listing in listings:
            price = listing.find("div", class_="card-price")
            price = price.get_text(strip=True) if price else "nil"
            full_address = listing.find("div", class_="card-address")
            full_address = full_address.get_text(strip=True) if full_address else "nil"
            address_parts = full_address.split(", ")
            address = address_parts[0] if address_parts else "nil"
            township = address_parts[1] if len(address_parts) > 1 else "nil"
            property_url_elements = listing.select("a[class^='LinkComponent_anchor__']")
            property_url = "nil"  # Default value if property_url_elements is empty
            for element in property_url_elements:
                property_url = "https://www.realtor.com" + element["href"]
                break
            beds = listing.find(
                "li",
                class_="PropertyBedMetastyles__StyledPropertyBedMeta-rui__a4nnof-0",
            )
            beds = (
                beds.find("span", {"data-testid": "meta-value"}).text.strip()
                if beds
                else "nil"
            )
            baths = listing.find(
                "li",
                class_="PropertyBathMetastyles__StyledPropertyBathMeta-rui__sc-67m6bo-0",
            )
            baths = baths.find("span").text.strip() if baths else "nil"
            sqft = listing.find(
                "li",
                class_="PropertySqftMetastyles__StyledPropertySqftMeta-rui__sc-1gdau7i-0",
            )
            sqft = (
                sqft.find("span", {"data-testid": "screen-reader-value"}).text.strip()
                if sqft
                else "nil"
            )
            plot_size = listing.find(
                "li",
                class_="PropertyLotSizeMetastyles__StyledPropertyLotSizeMeta-rui__sc-1cz4zco-0",
            )
            plot_size = (
                plot_size.find(
                    "span", {"data-testid": "screen-reader-value"}
                ).text.strip()
                if plot_size
                else "nil"
            )
            property_data = {
                "price": price,
                "address": address,
                "township": township,
                "url": property_url,
                "beds": beds,
                "baths": baths,
                "square_footage": sqft,
                "plot_size": plot_size,
            }
            output_data.append(property_data)


num_pages = 5  # Set the desired number of pages

# Scrape data from multiple pages
scrape_listing(num_pages)

# Our property count
output_data.append({"num_hits": len(output_data)})

# Write the output to a JSON file
with open("Realtor_data.json", "w") as json_file:
    json.dump(output_data, json_file, indent=2)

print("Output written to Realtor_data.json")
Note: Replace YOUR_API_KEY in the code with your actual ScraperAPI key before running the script.
Want to learn how we built it? Keep reading for a step-by-step explanation.
Scraping Realtor.com’s Property Data
Before you begin scraping, it’s essential to define the specific information you aim to extract from the webpage. For this tutorial, we’ll focus on the following details:
- Property selling price
- Property address
- Property listing URL
- Number of beds and baths
- Property square footage
- Plot size
Prerequisites
The main prerequisites for this tutorial are Python and the Requests, BeautifulSoup, and lxml libraries. Run this command to install them:
pip install beautifulsoup4 requests lxml
Step 1: Setting Up Your Project
Note: Before starting, make sure to sign up for a free ScraperAPI account to obtain your API key.
First, we import the necessary Python libraries at the top of our .py file.
import requests
from bs4 import BeautifulSoup
import json
Then, we initialize the variables we’ll use throughout our script.
output_data = []
base_url = "https://www.realtor.com/realestateandhomes-search/Atlanta_GA/show-newest-listings/sby-6"
API_KEY = "YOUR_API_KEY"
- output_data stores the scraped property data
- base_url is the URL of the Realtor.com page we want to scrape – at least the initial URL – which you can get by navigating to the site and performing a search
- API_KEY will hold our ScraperAPI key as a string
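Hard-coding the key is fine for a quick test, but if you plan to share or commit this script, a common pattern (an optional variation, not part of the original tutorial) is to read it from an environment variable:

import os

# SCRAPERAPI_KEY is just an illustrative variable name; use whatever your
# environment defines. Falls back to the placeholder if it isn't set.
API_KEY = os.environ.get("SCRAPERAPI_KEY", "YOUR_API_KEY")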
Step 2: Define Your Scraping Function
We define a function, scrape_listing(), which takes the number of pages to scrape as an argument, allowing us to scrape multiple pages.
def scrape_listing(num_pages):
    for page in range(1, num_pages + 1):
        # Page 1 uses the base URL without the /pg- suffix
        if page == 1:
            url = f"{base_url}"
        else:
            url = f"{base_url}/pg-{page}"  # Adjust the URL structure based on the website
        print(f"Scraping data from page {page}... {url}")
        payload = {"api_key": API_KEY, "url": url}
        # Make a request to ScraperAPI
        r = requests.get("http://api.scraperapi.com", params=payload)
        html_response = r.text
        soup = BeautifulSoup(html_response, "lxml")
We loop over each page, construct that page’s URL, make a GET request to ScraperAPI, and create a BeautifulSoup object from the response.
Note: We need to send our requests through ScraperAPI to avoid getting our IP banned, allowing us to collect data on a large scale.
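Network calls can also fail mid-run, so it’s worth guarding the request before parsing. Here’s a minimal sketch of that guard (the timeout value and the skip-on-error behavior are our own assumptions, not part of the original script):

r = requests.get("http://api.scraperapi.com", params=payload, timeout=70)
if r.status_code != 200:
    # Skip this page rather than parsing an error response
    print(f"Page {page} returned status {r.status_code}, skipping")
    continue
html_response = r.text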
Step 3: Parse HTML Response
By parsing the HTML response using BeautifulSoup, we can turn the raw HTML into a parsed tree we can navigate using CSS selectors.
If we inspect the page, we can see that each listing is wrapped inside a card: a div whose class starts with BasePropertyCard_propertyCardWrap__.
Using this class prefix, we can store all property listing cards in a listings variable.
# Parse the HTML response using BeautifulSoup
soup = BeautifulSoup(html_response, "lxml")
# scraping individual page
listings = soup.select("div[class^='BasePropertyCard_propertyCardWrap__']")
print("Listings found!")
We’ll print a success message to the console to get some feedback as our code runs.
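A generic success message tells us little on its own, though. An optional tweak is to log how many cards were actually matched, which makes it obvious when Realtor.com changes its class names and the selector silently stops matching:

if listings:
    print(f"Found {len(listings)} listings on page {page}")
else:
    print(f"No listings matched on page {page}; the class names may have changed")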
Step 4: Extract Property Data
For each listing found, we’ll extract the property data such as price, address, URL, number of bedrooms, bathrooms, square footage, and lot size.
To extract the price of each listing, we look for the div element with the card-price class. This element contains the price of the property.
price = listing.find("div", class_="card-price")
price = price.get_text(strip=True) if price else "nil"
To extract the address from each listing card, we look for the div element with the card-address class, which contains the full address of the property.
full_address = listing.find("div", class_="card-address")
full_address = full_address.get_text(strip=True) if full_address else "nil"
address_parts = full_address.split(", ")
address = address_parts[0] if address_parts else "nil"
township = address_parts[1] if len(address_parts) > 1 else "nil"
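One caveat: the address card often nests the street and the city in separate child elements, so get_text(strip=True) can glue them together (you’ll see this later in the sample output, e.g. "2383 Baker Rd NWAtlanta"). If that happens, passing a separator to get_text() is a simple workaround:

# Join the card's child elements with ", " instead of concatenating them
full_address = listing.find("div", class_="card-address")
full_address = (
    full_address.get_text(separator=", ", strip=True) if full_address else "nil"
)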
To extract the remaining details, we walk down to the li elements with classes such as PropertyBedMetastyles__StyledPropertyBedMeta-rui__a4nnof-0 and PropertySqftMetastyles__StyledPropertySqftMeta-rui__sc-1gdau7i-0. These elements hold the number of beds and baths and the square footage of each listed property.
for listing in listings:
    price = listing.find("div", class_="card-price")
    price = price.get_text(strip=True) if price else "nil"
    full_address = listing.find("div", class_="card-address")
    full_address = full_address.get_text(strip=True) if full_address else "nil"
    address_parts = full_address.split(", ")
    address = address_parts[0] if address_parts else "nil"
    township = address_parts[1] if len(address_parts) > 1 else "nil"
    property_url_elements = listing.select("a[class^='LinkComponent_anchor__']")
    property_url = "nil"  # Default value if property_url_elements is empty
    for element in property_url_elements:
        property_url = "https://www.realtor.com" + element["href"]
        break
    beds = listing.find(
        "li",
        class_="PropertyBedMetastyles__StyledPropertyBedMeta-rui__a4nnof-0",
    )
    beds = (
        beds.find("span", {"data-testid": "meta-value"}).text.strip()
        if beds
        else "nil"
    )
    baths = listing.find(
        "li",
        class_="PropertyBathMetastyles__StyledPropertyBathMeta-rui__sc-67m6bo-0",
    )
    baths = baths.find("span").text.strip() if baths else "nil"
    sqft = listing.find(
        "li",
        class_="PropertySqftMetastyles__StyledPropertySqftMeta-rui__sc-1gdau7i-0",
    )
    sqft = (
        sqft.find("span", {"data-testid": "screen-reader-value"}).text.strip()
        if sqft
        else "nil"
    )
    plot_size = listing.find(
        "li",
        class_="PropertyLotSizeMetastyles__StyledPropertyLotSizeMeta-rui__sc-1cz4zco-0",
    )
    plot_size = (
        plot_size.find(
            "span", {"data-testid": "screen-reader-value"}
        ).text.strip()
        if plot_size
        else "nil"
    )
    property_data = {
        "price": price,
        "address": address,
        "township": township,
        "url": property_url,
        "beds": beds,
        "baths": baths,
        "square_footage": sqft,
        "plot_size": plot_size,
    }
    output_data.append(property_data)
Each piece of data is extracted with BeautifulSoup’s .find() method and the appropriate tag name and class, then stored in a dictionary and appended to the output_data list.
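A word of caution: class names like PropertyBedMetastyles__StyledPropertyBedMeta-rui__a4nnof-0 are auto-generated by styled-components and tend to change whenever Realtor.com redeploys. If the scraper starts returning "nil" everywhere, re-inspect the page; where they exist, data-testid attributes usually make more stable anchors. A hedged sketch (the property-meta-beds value is an assumption, confirm it in your browser’s inspector):

# Hypothetical data-testid selector; verify the attribute value on the live page
beds_el = listing.select_one("li[data-testid='property-meta-beds']")
beds = beds_el.get_text(strip=True) if beds_el else "nil"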
Step 5: Scrape Data from Multiple Pages
We call the scrape_listing() function to scrape data from the desired number of pages. Feel free to modify the num_pages variable to scrape data from more pages if needed.
num_pages = 5 # Set the desired number of pages
# Scrape data from multiple pages
scrape_listing(num_pages)
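Because a search rarely ends on an exact page boundary, requesting a fixed num_pages can waste API credits on empty pages. One optional refinement (not in the original script) is to stop as soon as a page returns no listing cards, placed inside scrape_listing() right after listings is assigned:

listings = soup.select("div[class^='BasePropertyCard_propertyCardWrap__']")
if not listings:
    # No cards on this page, so we've run past the last page of results
    print(f"No listings on page {page}, stopping early")
    break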
Step 6: Write the Output to a JSON File
Before finishing our script, we add the total number of properties scraped to the output_data list.
# our property count
output_data.append({"num_hits": len(output_data)})
Finally, we write the output_data list to a JSON file.
# Write the output to a JSON file
with open("Realtor_data.json", "w") as json_file:
    json.dump(output_data, json_file, indent=2)

print("Output written to Realtor_data.json")
Congratulations, you just scraped vital data from Realtor.com!
Here’s how the end result in the Realtor_data.json file will look:
{
"price": "$315,000",
"address": "2383 Baker Rd NWAtlanta",
"township": "GA 30318",
"url": "https://www.realtor.com/realestateandhomes-detail/2383-Baker-Rd-NW_Atlanta_GA_30318_M56102-81051?from=srp-list-card",
"beds": "3",
"baths": "2",
"square_footage": "nil",
"plot_size": "8,712 square foot lot"
},
{
"price": "$365,000",
"address": "3150 Woodwalk Dr SE Unit 3408Atlanta",
"township": "GA 30339",
"url": "https://www.realtor.com/realestateandhomes-detail/3150-Woodwalk-Dr-SE-Unit-3408_Atlanta_GA_30339_M51624-69914?from=srp-list-card",
"beds": "2",
"baths": "2",
"square_footage": "nil",
"plot_size": "1,307 square foot lot"
},
...
MORE DATA
...
{
"num_hits": 84
}
Wrapping Up
In this tutorial, we’ve built a Realtor.com scraper that:
- Navigates to a specific page from Realtor.com to collect property data
- Sends all requests through ScraperAPI servers to avoid getting blocked by anti-scraping mechanisms
- Navigates through the pagination to scrape multiple pages
- Writes all scraped data into a JSON file
Now, change the URL in the base_url variable (or loop over a list of URLs, as sketched below) and start collecting Realtor.com property data at scale!
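Here’s one way that multi-market loop could look (the extra URLs are illustrative, confirm each one by running the search on Realtor.com first):

base_urls = [
    "https://www.realtor.com/realestateandhomes-search/Atlanta_GA",
    "https://www.realtor.com/realestateandhomes-search/Miami_FL",  # hypothetical
]
for base_url in base_urls:  # rebinds the global base_url used by scrape_listing()
    scrape_listing(num_pages)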
Need more than 3M API credits? Contact our sales team and let them help you build a custom solution that fits your needs.
Frequently Asked Questions
What data can you scrape from Realtor.com?
From Realtor.com, you can get data like:
- Property selling price
- Property address
- Property listing URL
- Number of beds and baths
- Property square footage
- Plot size
All of this information can then be transformed into JSON or a CSV file for analysis.
What can you use Realtor.com data for?
You can use Realtor.com’s data to learn the prices and qualities of properties in a location, spot pricing and demand trends in specific areas, improve your listings based on your competitors’, create property alerts for when prices drop, and much more.
Does Realtor.com block web scraping?
Yes, it does. That’s why it is vital to use a robust scraping tool such as ScraperAPI.