Best practices

  • Use the text() function in XPath to extract an element's text content, so your queries return readable strings rather than element objects.

  • Utilize predicates in XPath expressions to filter and refine selections, enhancing the specificity and accuracy of your data extraction.

  • Use the contains() function to match elements based on partial attribute values or text, making your XPath queries more flexible and robust.

  • Regularly update and test your XPath queries to adapt to changes in the webpage structure, ensuring your code remains functional over time.
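These practices can be tried offline against a small HTML snippet before pointing queries at a live page. The markup below is a hypothetical stand-in for a product listing, not the sandbox page's actual structure:

```python
from lxml import html

# Hypothetical markup standing in for a real product page
snippet = """
<div class="product-card">
  <a class="card-header" href="/products/1"><h4>Super Mario Galaxy</h4></a>
  <div class="price-wrapper">87.99</div>
</div>
<div class="product-card">
  <a class="card-header" href="/products/2"><h4>The Legend of Zelda</h4></a>
  <div class="price-wrapper">91.99</div>
</div>
"""
tree = html.fromstring(snippet)

# text() returns strings, not element objects
names = tree.xpath('//h4/text()')
print(names)  # ['Super Mario Galaxy', 'The Legend of Zelda']

# A positional predicate narrows the selection to a single card
second_price = tree.xpath('(//div[contains(@class, "price-wrapper")])[2]/text()')
print(second_price)  # ['91.99']

# contains() matches on partial text
mario = tree.xpath('//h4[contains(text(), "Mario")]/text()')
print(mario)  # ['Super Mario Galaxy']
```

Working against a fixed snippet like this also makes it easy to re-test queries whenever the target page's structure changes.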

# pip install requests lxml
from lxml import html
import requests


# Fetching the webpage
response = requests.get('https://sandbox.oxylabs.io/products')

# Parsing the content
tree = html.fromstring(response.text)


# Example 1: Extracting all product names using XPath
product_names = tree.xpath('//h4/text()')
print(product_names)
print()


# Example 2: Extracting first product's price
first_product_price = tree.xpath('(//div[contains(@class, "price-wrapper")])[1]/text()')
print(first_product_price)
print()


# Example 3: Extracting product URLs from links
product_links = tree.xpath('//div[contains(@class, "product-card")]/a[contains(@class, "card-header")]/@href')
print(product_links)
print()


# Example 4: Using predicates to filter data
specific_product = tree.xpath('//h4[contains(text(), "Mario")]/text()')
print(specific_product)
print()

Common issues

  • Ensure that your XPath expressions are correctly formed to avoid syntax errors, which are common when navigating complex HTML structures.

  • When using XPath with namespaces in XML documents, remember to register and use the namespace prefixes properly to avoid selection issues.

  • Avoid absolute XPath expressions in your scripts; instead, use relative paths to make your code more resilient to changes in the webpage layout.

  • Handle exceptions when web pages fail to load or return unexpected content to maintain the robustness of your web scraping scripts.

# pip install requests lxml
from lxml import html
import requests


response = requests.get('https://sandbox.oxylabs.io/products')
tree = html.fromstring(response.text)


# Incorrectly formed XPath, missing closing bracket
try:
    prices = tree.xpath('//div[contains(@class, "price-wrapper")/text()')
    print(prices)
except Exception as e:
    print(e)

# Correctly formed XPath
prices = tree.xpath('//div[contains(@class, "price-wrapper")]/text()')
print(prices)


# Incorrect namespace handling: the "ns" prefix is never registered,
# so lxml raises an XPathEvalError
try:
    products = tree.xpath('//ns:product')
except Exception as e:
    print(e)

# Correct namespace handling (relevant when parsing XML documents)
products = tree.xpath('//ns:product', namespaces={'ns': 'http://example.com/ns'})
print(products)
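Namespace registration matters most when parsing actual XML. A minimal sketch with lxml.etree and a made-up namespace URI shows why the prefix mapping is required:

```python
from lxml import etree

# Hypothetical XML document with a default namespace
xml = b"""<catalog xmlns="http://example.com/ns">
  <product><name>Gadget</name></product>
  <product><name>Widget</name></product>
</catalog>"""
root = etree.fromstring(xml)

# Without a prefix mapping, an unprefixed query silently matches nothing
unprefixed = root.xpath('//product')
print(unprefixed)  # []

# Registering the prefix lets XPath address the namespaced elements
ns = {'ns': 'http://example.com/ns'}
names = root.xpath('//ns:product/ns:name/text()', namespaces=ns)
print(names)  # ['Gadget', 'Widget']
```

Note that XPath 1.0 has no default-namespace mechanism: every namespaced element must be addressed through an explicitly registered prefix, even if the document itself uses a default namespace.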


# Incorrect: Using absolute XPath, brittle if HTML changes
first_product_title = tree.xpath('//*[@id="__next"]/main/div/div/div/div[2]/div/div[1]/a[1]/h4/text()')
print(first_product_title)

# Correct: Using relative XPath, more flexible
first_product_title = tree.xpath('(//h4)[1]/text()')
print(first_product_title)


# Incorrect: No exception handling, may crash if page fails to load
response = requests.get('https://sandbox.oxylabs.io/products')
tree = html.fromstring(response.text)

# Correct: With exception handling
try:
    response = requests.get('https://sandbox.oxylabs.io/products')
    tree = html.fromstring(response.text)
except Exception as e:
    print(f"Failed to load page or parse content: {e}")
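A slightly fuller pattern adds a request timeout and an HTTP status check before parsing. The fetch_tree helper below is a hypothetical sketch, not part of requests or lxml:

```python
from lxml import html
import requests

def fetch_tree(url):
    """Fetch a page and parse it, returning None on any failure."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # raise on 4xx/5xx status codes
        return html.fromstring(response.text)
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None
    except Exception as e:
        print(f"Parsing failed: {e}")
        return None

tree = fetch_tree('https://sandbox.oxylabs.io/products')
if tree is not None:
    print(tree.xpath('(//h4)[1]/text()'))
```

Returning None instead of raising keeps the calling code simple, but re-raising after logging is equally valid if the scraper should stop on failure.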


Useful resources

  • Python Web Scraping Tutorial: Step-By-Step (Adomas Sulcas, 2025-04-01)

  • How to Find Elements With Selenium in Python (Enrika Pavlovskytė, 2024-06-21)

  • lxml Tutorial: XML Processing and Web Scraping With lxml (Gabija Fatenaite, 2021-08-30)
