How To Use BeautifulSoup’s find() and find_all() Methods

BeautifulSoup’s find() method returns the first element on a page that matches your search criteria, such as a tag name, an attribute, or a combination of both.

In contrast, the find_all() method returns every element that matches those criteria. It’s useful when a page has multiple elements of the same kind and you want them all.

In this article, you will learn how to use BS4’s find() and find_all() methods, the different ways you can extract data using them, and the main differences between them.

How to Use BeautifulSoup’s find()

When is the soup.find() method your best choice? It’s perfect when your goal is to identify the first occurrence of an element matching your specific search needs on a webpage.

Imagine you’re browsing a cooking website and want to extract the headline of the top recipe. You just specify the type of element you’re searching for within the find() method.

Don’t worry, it’ll make more sense after some demonstrations:

Find by HTML Tag

For example, suppose you want to extract the first h1 tag on a page that contains several h1 tags.

Here’s how you’d approach it:

<pre class="wp-block-syntaxhighlighter-code">
from bs4 import BeautifulSoup

html_doc = """
<html>
    <body>
        <h1>Top Recipe: Classic Tomato Soup</h1>
        <h1>Second Best: Spicy Chicken Wings</h1>
        <h1>Editor's Choice: Vegan Lasagna</h1>
    </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Using .find() to fetch the first <h1> tag
first_h1 = soup.find('h1')

print(first_h1)
# Output: <h1>Top Recipe: Classic Tomato Soup</h1>

print(first_h1.get_text())
# Output: Top Recipe: Classic Tomato Soup
</pre>

Here, the find() method gets the first h1 tag, showing how you can directly get the specific data you need from a page filled with similar elements.

Find by Class

Finding elements by their CSS class is one of the most common tasks in web scraping, as classes often group similar items on a webpage.

BeautifulSoup’s find() method lets you quickly locate the first element with a specific class. This is particularly useful for websites that categorize content using class attributes.

For instance, continuing from our previous example of extracting the headline of the top recipe, imagine you want to find the recipe’s description, which is in a div tag with the class recipe-description.

Here’s how you would find it:

<pre class="wp-block-syntaxhighlighter-code">
	# Using .find() to fetch the first <div> tag with class 'recipe-description'
	soup.find('div', class_='recipe-description')
</pre>
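
The one-liner above assumes a soup object that already contains such a div. As a minimal, self-contained sketch, the markup below is made up for illustration; only the recipe-description class comes from the example:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two descriptions sharing the same class
html_doc = """
<div class="recipe-description">A smooth, comforting tomato soup.</div>
<div class="recipe-description featured">Crispy wings with a spicy glaze.</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find() returns only the first <div> whose class list includes 'recipe-description'
description = soup.find("div", class_="recipe-description")
description_text = description.get_text()
print(description_text)
```

Note that class_ matches an element when any one of its classes equals the value, so the second div (which also carries the featured class) would qualify too if it came first.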

Find by ID

In HTML, the ID attribute uniquely identifies an element within a webpage, making it a precise target for web scraping using soup.find(). Since an ID is unique to a single element, specifying the element type is optional when searching by ID.

If you’re looking to find a specific section with user reviews for the top recipe, identified by its unique ID, user-reviews, here’s how you’d proceed:

  # Using .find() to fetch the element with ID 'user-reviews'
  user_reviews_section = soup.find(id='user-reviews')
  
  print(user_reviews_section)
  # This would print out the element with the ID 'user-reviews'
  
  print(user_reviews_section.get_text())
  # This prints the text content of the element with the ID 'user-reviews'
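
To try this end to end, here is a small sketch with made-up markup (the section element and its review text are assumptions, not part of the original page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the section and its reviews are invented for illustration
html_doc = """
<section id="user-reviews">
  <p>Great soup!</p>
  <p>Will cook again.</p>
</section>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# No tag name is passed; the id alone identifies the element
user_reviews_section = soup.find(id="user-reviews")

tag_name = user_reviews_section.name
# Join the text of all child elements with a space, trimming surrounding whitespace
reviews_text = user_reviews_section.get_text(" ", strip=True)

print(tag_name)
print(reviews_text)
```

Passing a separator and strip=True to get_text() keeps the text of the two paragraphs from running together or carrying stray newlines.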

Find by Attribute

In addition to standard attributes like class and ID, BeautifulSoup’s find() method allows you to search for elements based on any attribute. This flexibility is useful when targeting elements identified by less common attributes, such as data-* attributes, aria-labels, or custom attributes specific to a webpage’s structure.

Suppose each recipe is located within a div tag that has a custom attribute data-recipe-type indicating the type of meal (e.g., “soup,” “main course,” “dessert”).

To find the first recipe tagged as a “main course” using its data-recipe-type attribute, you would use the find() method as follows:

<pre class="wp-block-syntaxhighlighter-code">
	# Using .find() to fetch the first <div> tag with a 'data-recipe-type' attribute of 'main course'
	soup.find('div', attrs={'data-recipe-type': 'main course'})
</pre>
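
A runnable version of this lookup might look like the following sketch. The recipe names and markup are assumptions chosen to match the scenario described above:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each recipe carries a custom data-recipe-type attribute
html_doc = """
<div data-recipe-type="soup">Classic Tomato Soup</div>
<div data-recipe-type="main course">Vegan Lasagna</div>
<div data-recipe-type="main course">Spicy Chicken Wings</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Only the first matching <div> is returned, even though two are main courses
main_course = soup.find("div", attrs={"data-recipe-type": "main course"})

recipe_name = main_course.get_text()
# Attribute values can be read from a tag with dictionary-style access
recipe_type = main_course["data-recipe-type"]

print(recipe_name, "|", recipe_type)
```

The attrs dictionary is needed here because data-recipe-type contains hyphens and so cannot be written as a Python keyword argument.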

Find by Text

Searching by text content is another powerful feature of BeautifulSoup’s find() method: it lets you locate content based on the text it contains rather than on tags or attributes.

This method is useful when you know the specific text content of an element you’re looking for but not necessarily its position or attributes on the webpage.

Considering our ongoing example with the cooking website, let’s say you want to find a section explicitly mentioning “award-winning” within its text, perhaps about a recipe or chef accolade.

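To find the exact string “award-winning,” pass it to the string parameter of find(). The match must equal the element’s entire text, and find() returns the matching string itself rather than its tag; use the result’s .parent attribute to reach the enclosing element.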

  # Find the first string that exactly matches 'award-winning'
  soup.find(string="award-winning")

If your search criteria involve finding text that includes “award-winning” as part of a larger string, or when you’re looking for variations of the phrase, incorporating regular expressions (regex) with the string parameter enhances your search flexibility.

  import re
  # Find the first string that contains 'award-winning', case-insensitive
  soup.find(string=re.compile("award-winning", re.IGNORECASE))
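
The difference between the two searches is easier to see side by side. This is a minimal sketch with invented markup; it also shows that a string search returns the text itself, with the enclosing tag available through .parent:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: one partial mention and one exact mention
html_doc = """
<p>Try our Award-Winning chili tonight.</p>
<span>award-winning</span>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Exact match: only a string that equals 'award-winning' in full qualifies
exact = soup.find(string="award-winning")
exact_parent = exact.parent.name

# Regex match: any string containing the phrase, ignoring case
partial = soup.find(string=re.compile("award-winning", re.IGNORECASE))
partial_parent = partial.parent.name

print(repr(str(exact)), "inside", exact_parent)
print(repr(str(partial)), "inside", partial_parent)
```

The exact search skips the paragraph because its text is longer than the search string, while the regex search stops at the paragraph because it comes first in the document.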

Find With Multiple Criteria

BeautifulSoup’s find() method can search by a single criterion, such as tag name, class, ID, or text, but it also supports combining several criteria for more precise element selection. This lets you narrow your search to elements that meet all of the conditions at once.

Imagine you want to find a div element that has both the class recipe-card and a data-award attribute signifying that the recipe has won an award. This multi-criteria search ensures you’re targeting a specific type of content on the webpage.

Here’s how you can perform this search using find() with multiple criteria (passing True as the attribute value matches any element that has the attribute, whatever its value):

<pre class="wp-block-syntaxhighlighter-code">
	# Using .find() to locate a <div> with a specific class and a custom attribute
	soup.find('div', class_='recipe-card', attrs={'data-award': True})
</pre>
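
To see how each condition filters the candidates, consider this sketch. The markup and the award values ("gold", "silver") are assumptions made up for the example:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the last element satisfies all three conditions
html_doc = """
<div class="recipe-card">Classic Tomato Soup</div>
<p class="recipe-card" data-award="silver">Spicy Chicken Wings</p>
<div class="recipe-card" data-award="gold">Vegan Lasagna</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Must be a <div>, have the class 'recipe-card', and carry a data-award attribute
awarded = soup.find("div", class_="recipe-card", attrs={"data-award": True})

awarded_name = awarded.get_text()
award = awarded["data-award"]

print(awarded_name, "|", award)
```

The first div is skipped because it has no data-award attribute, and the paragraph is skipped because it is not a div.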

Find Using Regex

Regular expressions (regex) are a powerful tool for pattern matching in strings, allowing you to search for complex patterns that might not be possible with simple substring searches.

In BeautifulSoup, you can use regular expressions with the .find() method to locate elements based on patterns within their text or attributes.

Let’s consider a situation where you want to find the first element whose class name starts with the word “recipe.”

Here’s how you can find the element using soup.find() with regex:

  import re

  # Using .find() to find the first element with a class name that matches the regex pattern
  matched_element = soup.find(class_ = re.compile("^recipe"))
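
As a rough, self-contained illustration (the markup and class names other than the recipe prefix are assumptions), the pattern can be tried like this:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: two classes start with 'recipe', one does not
html_doc = """
<header class="site-header">Cooking Daily</header>
<div class="recipe-card">Classic Tomato Soup</div>
<p class="recipe-description">A smooth, comforting tomato soup.</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# '^recipe' anchors the match to the start of the class name
matched_element = soup.find(class_=re.compile("^recipe"))

matched_tag = matched_element.name
matched_classes = matched_element["class"]

print(matched_tag, matched_classes)
```

Because no tag name is given, any element type can match; the div is returned simply because it appears before the paragraph.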

How to Use BeautifulSoup.find_all()

When should you use the find_all() method? It’s perfect when you want to see all the items on a webpage that match your search.

Imagine we’ve just found the top recipe on a cooking website using the find() method. Now, we’re curious about what other recipes are out there.

That’s where find_all() comes in. It’s like saying, “The first recipe was great, but I want to check out all the others too.”

Once we tell find_all() what we’re looking for, it collects every matching recipe on the page, so we don’t miss out on any other delicious options.

Find All HTML Tags

Let’s say you’re interested in all the <h1> tags on a page, not just the first one. You want to see every main heading.

Here’s what you would do:

<pre class="wp-block-syntaxhighlighter-code">
from bs4 import BeautifulSoup

html_doc = """
<html>
    <body>
        <h1>Top Recipe: Classic Tomato Soup</h1>
        <h1>Second Best: Spicy Chicken Wings</h1>
        <h1>Editor's Choice: Vegan Lasagna</h1>
    </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Using .find_all() to grab all <h1> tags
h1_tags = soup.find_all('h1')

for h1 in h1_tags:
    print(h1.get_text())

# Output:
# Top Recipe: Classic Tomato Soup
# Second Best: Spicy Chicken Wings
# Editor's Choice: Vegan Lasagna
</pre>

Using find_all(), we can gather every <h1> tag on the page. Remember that find_all() always returns a list-like ResultSet of the elements matching your criteria: you’ll get a list back even if there’s only one <h1> tag on the page, and an empty list if there are none.
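
The list behavior can be checked directly. The sketch below reuses the three headings from the example and also uses limit, a find_all() parameter not otherwise covered in this article, to cap the number of results:

```python
from bs4 import BeautifulSoup

html_doc = """
<h1>Top Recipe: Classic Tomato Soup</h1>
<h1>Second Best: Spicy Chicken Wings</h1>
<h1>Editor's Choice: Vegan Lasagna</h1>
"""

soup = BeautifulSoup(html_doc, "html.parser")

all_h1 = soup.find_all("h1")             # every <h1> on the page
no_h2 = soup.find_all("h2")              # nothing matches, so the list is empty
first_two = soup.find_all("h1", limit=2) # stop after two matches

print(len(all_h1), no_h2, len(first_two))
```

Since the result behaves like a regular list, you can index it, slice it, or test it for emptiness with a plain if statement.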

Find All Elements by Class

Finding elements by class is common in web scraping because classes group similar items on a webpage. find_all() helps you find all elements that share a specific class.

Let’s say we’re moving on from just headlines to wanting all recipe descriptions on our cooking website. These are in <div> tags with the class recipe-description.

Here’s a quick way to find them:

<pre class="wp-block-syntaxhighlighter-code">
	# Using .find_all() to collect all <div> tags with class 'recipe-description'
	soup.find_all('div', class_='recipe-description')
</pre>

Elements by ID with find_all()

The ID attribute in HTML is unique to each element on a webpage, making it a highly precise target for web scraping. With BeautifulSoup, even though an ID should be unique and find() is typically used for ID searches, you might use find_all() out of habit or for consistency in your code. Remember, even if you use find_all() to search by ID, you’ll likely get a list with just one item because of the ID’s uniqueness.

If you’re after a specific section, like user reviews for the top recipe marked by the unique ID user-reviews, you’d usually use find().

However, here’s how you would do it with find_all():

  # Using .find_all() to fetch the element with ID 'user-reviews'
  soup.find_all(id='user-reviews')

Find All Elements by Attribute

Beyond the usual class and ID, BeautifulSoup’s find_all() lets you search for elements by any attribute, which is great for targeting specific details like data-* attributes or custom webpage attributes.

Imagine each recipe on our site is wrapped in a <div> tag, and each one has a custom attribute data-recipe-type telling you the meal type (like “soup,” “main course,” or “dessert”).

To gather all recipes classified as a “main course” using the data-recipe-type attribute, here’s how you’d use find_all():

<pre class="wp-block-syntaxhighlighter-code">
	# Finding all <div> tags marked as 'main course'
	soup.find_all('div', attrs={'data-recipe-type': 'main course'})
</pre>
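
Once you have the list, a common next step is to pull a value out of every match. Here is a short sketch with assumed markup and recipe names:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: four recipes, two of them main courses
html_doc = """
<div data-recipe-type="soup">Classic Tomato Soup</div>
<div data-recipe-type="main course">Vegan Lasagna</div>
<div data-recipe-type="main course">Spicy Chicken Wings</div>
<div data-recipe-type="dessert">Lemon Tart</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Collect the text of every <div> marked as a main course
main_courses = soup.find_all("div", attrs={"data-recipe-type": "main course"})
main_course_names = [div.get_text() for div in main_courses]

# Passing True matches every <div> that has the attribute, whatever its value
typed_divs = soup.find_all("div", attrs={"data-recipe-type": True})
recipe_types = [div["data-recipe-type"] for div in typed_divs]

print(main_course_names)
print(recipe_types)
```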

Find Multiple Elements by Text

Using find_all() to search by text content is a strong tool in BeautifulSoup, perfect for when you’re after elements that contain specific text. This is useful if you know what the element says but not where it is or what it looks like on the page.

Imagine we’re still exploring the cooking website, and now we’re looking for any mention of “award-winning,” whether it’s about a recipe or a chef’s achievements.

To collect every string that exactly matches “award-winning,” you’d do this:

  # Find all strings exactly matching 'award-winning'
  soup.find_all(string="award-winning")

And if you’re looking for any variation of “award-winning” within the text, maybe in different cases or within a longer sentence, using regular expressions (regex) can help:

  import re
  # Find all strings containing 'award-winning', case-insensitive
  soup.find_all(string=re.compile("award-winning", re.IGNORECASE))

find_all() with Multiple Criteria

find_all() in BeautifulSoup doesn’t just let you search by one criterion; you can combine several, like tag name, class, ID, or text. This is perfect for when you want to zero in on elements that meet more than one condition simultaneously.

Let’s say we’re looking for every <div> element that has both the class recipe-card and a data-award attribute, indicating the recipe is award-winning. This approach helps us find very specific content on the website.

Here’s how you’d go about it with find_all() for multiple criteria:

<pre class="wp-block-syntaxhighlighter-code">
	# Finding all <div> tags with a specific class and custom attribute
	soup.find_all('div', class_='recipe-card', attrs={'data-award': True})
</pre>

find_all() Using Regex

Regular expressions (regex) allow you to search for complex patterns in strings, making them a valuable tool for finding specific patterns in text or attributes that simple searches can’t handle. With BeautifulSoup, using regex can enhance your searches, allowing you to use find_all() to locate multiple elements that fit intricate patterns.

Imagine we want to find all elements on a webpage whose class names begin with “recipe.” This situation is where regex shines, helping us identify elements by pattern rather than exact matches.

Here’s how you’d apply regex in find_all():

  import re

  # Using .find_all() to find elements where the class name matches our regex pattern
  soup.find_all(class_ = re.compile("^recipe"))
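
For a concrete run, the following sketch applies the same pattern to made-up markup and summarizes what was matched (the site-header and site-footer classes are assumptions added for contrast):

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: only the middle two elements have classes starting with 'recipe'
html_doc = """
<header class="site-header">Cooking Daily</header>
<div class="recipe-card">Classic Tomato Soup</div>
<p class="recipe-description">A smooth, comforting tomato soup.</p>
<footer class="site-footer">Contact us</footer>
"""

soup = BeautifulSoup(html_doc, "html.parser")

matches = soup.find_all(class_=re.compile("^recipe"))

# Pair each matched tag name with its list of classes
summary = [(tag.name, tag["class"]) for tag in matches]

print(summary)
```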

find() vs find_all()

We’ve already explored how the find_all() method is invaluable for gathering multiple elements that match your search criteria from a webpage. Now, let’s contrast it with find(), to clarify when and why you might choose one over the other.

find_all() is your tool for comprehensive searches, allowing you to retrieve every element that fits your specified criteria, such as tag name, class, ID, or other attributes. It’s the method to use when your goal is to compile a list of elements for further analysis or processing.

In contrast, the find() method specializes in pinpoint accuracy. It returns the first element that matches your search criteria (or None if nothing matches), stopping the search as soon as a match is found. This is particularly useful when you’re interested in a single, specific piece of data from the page, and further searching is unnecessary once it has been located.

Here’s a straightforward comparison:

  • find_all() returns a list containing all elements that match the search criteria, ideal for when you need a comprehensive dataset from the page.
  • find() returns the first matching element only, making it the better choice for quick lookups or when only the first occurrence of an element is relevant to your needs.

Choosing between find() and find_all() depends on the nature of your web scraping task. If you aim to extract all instances of a particular element, find_all() is the way to go.

However, if you’re only interested in the first instance or need to quickly locate a specific piece of information, find() offers a more efficient approach.
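
One practical difference shows up when nothing matches: find() returns None, whereas find_all() returns an empty list. The sketch below, using a one-line page and a fallback string chosen purely for illustration, shows one way to handle both cases safely:

```python
from bs4 import BeautifulSoup

html_doc = "<h1>Top Recipe: Classic Tomato Soup</h1>"
soup = BeautifulSoup(html_doc, "html.parser")

missing_one = soup.find("h2")      # no <h2> on the page, so this is None
missing_all = soup.find_all("h2")  # no <h2> on the page, so this is an empty list

# Guard a find() result before calling methods on it
subtitle = missing_one.get_text() if missing_one is not None else "no subtitle"

# Looping over an empty find_all() result simply produces nothing
subtitles = [tag.get_text() for tag in missing_all]

print(subtitle, subtitles)
```

Without the guard, calling get_text() on a None result would raise an AttributeError, which is a common source of crashes when a page’s layout changes.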

Both methods are fundamental to efficient web scraping with BeautifulSoup, and understanding their distinct functions allows you to more effectively navigate and extract data from HTML and XML pages.

Keep Learning

So that’s a wrap-up on using BeautifulSoup’s find() and find_all() methods. You now know how this essential tool can help you pinpoint and extract the data you need from HTML and XML pages.

Want to learn more about web scraping? Visit our blog to tackle some real projects following advanced tutorials.

Until next time, happy scraping!

About the author

Ize Majebi

Ize Majebi is a Python developer and data enthusiast who delights in unraveling code intricacies and exploring the depths of the data world. She transforms technical challenges into creative solutions, possessing a passion for problem-solving and a talent for making the complex feel like a friendly chat. Her ability brings a touch of simplicity to the realms of Python and data.
