How to select elements by class in XPath?

Selecting elements by their class attribute is one of the most common XPath tasks in web scraping. This guide covers exact matches, substring matches, and whole-class-token matches, and it lists the mistakes that most often cause missed or unwanted matches.

Best practices

  • Use `[@class="classname"]` to match elements whose entire class attribute is exactly "classname". It does not match an element that also carries other classes, such as class="classname active".

  • Use `contains(@class, "classname")` for broader matches. It is a substring test, so it also matches class names such as "classname-extra" or "myclassname".

  • Use `contains(concat(" ", normalize-space(@class), " "), " classname ")` to match classname as a whole class token, even when the element has several classes, without matching it inside a longer class name.

  • To match an exact combination of classes, `[@class="class1 class2"]` works only when the attribute string is identical, including the order and spacing of the classes. For an order-independent check, join one whole-token test per class with `and`.

from lxml import html
import requests

# Fetch the webpage and stop early on HTTP errors
response = requests.get('https://sandbox.oxylabs.io/products', timeout=30)
response.raise_for_status()
tree = html.fromstring(response.content)

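# The class names below are placeholders; inspect the target page for its real class names.
# Select elements whose class attribute is exactly "product" (and nothing else)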
products = tree.xpath('//div[@class="product"]')
print("Products by exact class match:", products)

# Select elements whose class attribute contains the substring "featured" (this also matches e.g. "unfeatured")
featured_products = tree.xpath('//div[contains(@class, "featured")]')
print("Products with 'featured' in class:", featured_products)

# Select elements that have "sale" as one whole class among possibly several others
sale_products = tree.xpath('//div[contains(concat(" ", normalize-space(@class), " "), " sale ")]')
print("Products with 'sale' in class:", sale_products)

# Match an exact class string: the classes must appear in this order with this spacing
multi_class_items = tree.xpath('//div[@class="product featured sale"]')
print("Products with multiple specific classes:", multi_class_items)
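The page-fetching example depends on the live page's markup. To see how the predicates differ, here is a minimal offline sketch that runs them against a small HTML string. The markup and the id values are invented for illustration; only lxml's `html.fromstring` and `xpath` are used.

```python
from lxml import html

# Hypothetical markup invented for this illustration (not taken from any real page).
SNIPPET = """
<div>
  <div id="a" class="product">A</div>
  <div id="b" class="product featured">B</div>
  <div id="c" class="product-info">C</div>
  <div id="d" class="product  sale">D</div>
  <div id="e" class="flashsale">E</div>
</div>
"""

# Each query is one of the class predicates discussed above.
QUERIES = {
    "exact": '//div[@class="product"]',
    "substring": '//div[contains(@class, "product")]',
    "token": '//div[contains(concat(" ", normalize-space(@class), " "), " product ")]',
    "sale_substring": '//div[contains(@class, "sale")]',
    "sale_token": '//div[contains(concat(" ", normalize-space(@class), " "), " sale ")]',
}

tree = html.fromstring(SNIPPET)

# Collect the id of every matched element, in document order.
results = {
    name: [el.get("id") for el in tree.xpath(query)]
    for name, query in QUERIES.items()
}

for name, ids in results.items():
    print(f"{name:15} {ids}")
```

The exact comparison finds only "a", because "b" and "d" carry extra classes. The substring test also picks up "product-info" ("c") and "flashsale" ("e"). The whole-token test finds "a", "b" and "d" for product and only "d" for sale.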

Common issues

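  • Exact Class Matching: contains(@class, "product") is a substring test and also matches class names such as "product-info". When the attribute holds a single known class, compare it exactly with @class="product".

  • Dynamic Classes: an exact @class comparison breaks as soon as another class is added at runtime. Use contains when only part of the class name is stable, or when extra classes come and go.

  • Multiple Classes: wrap the attribute with concat(" ", normalize-space(@class), " ") and search for " name " (with surrounding spaces), so that a class such as "sale" is not matched inside "flashsale".

  • Order-Independent Classes: combine one check per class with and instead of comparing the full class string, so the order of the classes does not matter.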

# Incorrect: contains() is a substring test, so this also selects class names like 'product-info'
incorrect_selection = tree.xpath('//div[contains(@class, "product")]')

# Correct: an exact comparison selects only elements whose class attribute is exactly 'product'
correct_selection = tree.xpath('//div[@class="product"]')

# Incorrect: Fails as soon as another class is added dynamically (e.g., 'menu active open')
static_class_selection = tree.xpath('//div[@class="menu active"]')

# Correct: contains() still matches when extra classes are added or part of the name changes
# (it is a substring test, so a class such as 'submenu' would match too)
dynamic_class_selection = tree.xpath('//div[contains(@class, "menu")]')

# Incorrect: the substring test also selects elements with classes like 'flashsale'
bad_multi_class_handling = tree.xpath('//div[contains(@class, "sale")]')

# Correct: Ensures 'sale' is a distinct class, not part of another class name
good_multi_class_handling = tree.xpath('//div[contains(concat(" ", normalize-space(@class), " "), " sale ")]')

# Incorrect: Fails if the order of classes in the attribute changes
order_dependent_selection = tree.xpath('//div[@class="firstClass secondClass"]')

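# Correct: one whole-token check per class, joined with "and", so class order does not matter
order_independent_selection = tree.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), " firstClass ")'
    ' and contains(concat(" ", normalize-space(@class), " "), " secondClass ")]'
)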
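Writing the whole-token expression by hand for every class is error-prone. One option is a small helper that generates it. This is a minimal sketch: `class_token_test` and `has_all_classes` are hypothetical helpers, not part of lxml, and they assume class names contain no whitespace or quote characters.

```python
from lxml import html


def class_token_test(name: str) -> str:
    """Return an XPath 1.0 test that is true when @class contains `name` as a whole token."""
    # Assumption: class names never contain whitespace or quotes, so no escaping is needed.
    if not name or any(ch in name for ch in " \t\n\"'"):
        raise ValueError(f"unsupported class name: {name!r}")
    return f'contains(concat(" ", normalize-space(@class), " "), " {name} ")'


def has_all_classes(tag: str, *names: str) -> str:
    """Build //tag[...] requiring every class in `names`, in any order."""
    if not names:
        raise ValueError("at least one class name is required")
    predicate = " and ".join(class_token_test(n) for n in names)
    return f"//{tag}[{predicate}]"


# Hypothetical markup for illustration.
SNIPPET = """
<ul>
  <li id="1" class="firstClass secondClass">one</li>
  <li id="2" class="secondClass firstClass extra">two</li>
  <li id="3" class="firstClass">three</li>
  <li id="4" class="firstClassic secondClass">four</li>
</ul>
"""

tree = html.fromstring(SNIPPET)

query = has_all_classes("li", "firstClass", "secondClass")
matched = [li.get("id") for li in tree.xpath(query)]

# For comparison: plain substring checks also accept 'firstClassic'.
naive = '//li[contains(@class, "firstClass") and contains(@class, "secondClass")]'
naive_matched = [li.get("id") for li in tree.xpath(naive)]

print(query)
print("token-based:", matched)
print("substring-based:", naive_matched)
```

The token-based query accepts items "1" and "2" regardless of class order. The two plain contains checks also accept item "4", whose "firstClassic" merely starts with "firstClass".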

