Use the text() node test in XPath to extract the text content of an element. It returns only the text nodes that are direct children of that element, so text inside nested tags needs string() or a descendant query.
Use predicates (conditions in square brackets, such as [1] or [contains(@class, "price")]) to narrow a selection to exactly the elements you need.
Use the contains() function to match elements based on partial attribute values or text, making your XPath queries more flexible and robust.
Regularly update and test your XPath queries to adapt to changes in the webpage structure, ensuring your code remains functional over time.
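To illustrate the first tip, here is a minimal sketch of how text() differs from string() and normalize-space() when an element contains nested tags. The markup and the price value are made up for this illustration and are not taken from the sandbox page.

```python
from lxml import html

# Hypothetical markup, invented for this illustration
snippet = '<div class="price-wrapper">\n  Price: <span>91,99</span> EUR\n</div>'
node = html.fromstring(snippet)

# text() selects only the text nodes that are direct children of the div,
# so the value inside the nested <span> is not included
direct_text = node.xpath('./text()')

# string(.) concatenates the text of the element and all of its descendants
full_text = node.xpath('string(.)')

# normalize-space(.) does the same, then trims and collapses whitespace
clean_text = node.xpath('normalize-space(.)')

print(direct_text)
print(full_text)
print(clean_text)
```

If the value you want sits in a child element, either target that child directly (for example `./span/text()`) or use `normalize-space(.)` on the parent.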
# pip install requests lxml
from lxml import html
import requests
# Fetching the webpage
response = requests.get('https://sandbox.oxylabs.io/products')
# Parsing the content
tree = html.fromstring(response.text)
# Example 1: Extracting all product names using XPath
product_names = tree.xpath('//h4/text()')
print(product_names)
print()
# Example 2: Extracting first product's price
first_product_price = tree.xpath('(//div[contains(@class, "price-wrapper")])[1]/text()')
print(first_product_price)
print()
# Example 3: Extracting product URLs from links
product_links = tree.xpath('//div[contains(@class, "product-card")]/a[contains(@class, "card-header")]/@href')
print(product_links)
print()
# Example 4: Using predicates to filter data
specific_product = tree.xpath('//h4[contains(text(), "Mario")]/text()')
print(specific_product)
print()
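One caveat about contains() on class attributes: it is a plain substring match, so it also matches longer class names that happen to include the same text. The sketch below compares it with a common XPath 1.0 idiom for matching a whole class token. The markup is hypothetical and only serves to show the difference.

```python
from lxml import html

# Hypothetical markup: two class values that share the substring "price"
doc = html.fromstring(
    '<div>'
    '<span class="price">10</span>'
    '<span class="old price-note">was 12</span>'
    '</div>'
)

# Substring match: also selects the element with class "price-note"
loose = doc.xpath('//span[contains(@class, "price")]/text()')

# Token match: pad the class list with spaces and search for " price "
strict = doc.xpath(
    '//span[contains(concat(" ", normalize-space(@class), " "), " price ")]/text()'
)

print(loose)
print(strict)
```

The loose query is usually fine for quick scripts; the stricter form is worth the extra length when a page uses several similar class names.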
Ensure that your XPath expressions are correctly formed to avoid syntax errors, which are common when navigating complex HTML structures.
When using XPath with namespaces in XML documents, remember to register and use the namespace prefixes properly to avoid selection issues.
Avoid absolute or long position-based XPath paths in your scripts; instead, use short relative paths anchored on stable tags or attributes to make your code more resilient to changes in the webpage layout.
Handle exceptions when web pages fail to load or return unexpected content to maintain the robustness of your web scraping scripts.
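The namespace tip is easiest to see on an actual XML document. The following is a minimal sketch, assuming a made-up catalog feed and the example namespace URI `http://example.com/ns`; neither comes from a real service.

```python
from lxml import etree

# Hypothetical XML feed; the namespace URI is an example value
xml = b'''<catalog xmlns="http://example.com/ns">
  <product>Product A</product>
  <product>Product B</product>
</catalog>'''
root = etree.fromstring(xml)

# Without a prefix, the query looks for elements in no namespace and finds nothing
no_ns = root.xpath('//product/text()')

# Map a prefix of your choosing to the document's namespace URI
ns = {'ns': 'http://example.com/ns'}
with_ns = root.xpath('//ns:product/text()', namespaces=ns)

print(no_ns)
print(with_ns)
```

Note that the prefix in the query does not have to match anything in the document. Only the URI matters, which is why a default namespace (declared without a prefix) still needs a prefix in the XPath expression.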
# pip install requests lxml
from lxml import html
import requests
response = requests.get('https://sandbox.oxylabs.io/products')
tree = html.fromstring(response.text)
# Incorrectly formed XPath: the predicate's closing bracket is missing
try:
    prices = tree.xpath('//div[contains(@class, "price-wrapper")/text()')
    print(prices)
except Exception as e:
    print(e)
# Correctly formed XPath
prices = tree.xpath('//div[contains(@class, "price-wrapper")]/text()')
print(prices)
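# Incorrect namespace handling: the "ns" prefix is never registered, so lxml raises an error
try:
    products = tree.xpath('//ns:product')
except Exception as e:
    print(e)
# Correct namespace handling: map the prefix to its namespace URI (example URI)
products = tree.xpath('//ns:product', namespaces={'ns': 'http://example.com/ns'})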
# Incorrect: Using absolute XPath, brittle if HTML changes
first_product_title = tree.xpath('//*[@id="__next"]/main/div/div/div/div[2]/div/div[1]/a[1]/h4/text()')
print(first_product_title)
# Correct: Using relative XPath, more flexible
first_product_title = tree.xpath('(//h4)[1]/text()')
print(first_product_title)
# Incorrect: No exception handling, may crash if page fails to load
response = requests.get('https://sandbox.oxylabs.io/products')
tree = html.fromstring(response.text)
# Correct: With exception handling
try:
    response = requests.get('https://sandbox.oxylabs.io/products', timeout=10)
    response.raise_for_status()  # Treat HTTP error codes (4xx/5xx) as failures
    tree = html.fromstring(response.text)
except Exception as e:
    print(f"Failed to load page or parse content: {e}")
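One way to catch malformed expressions early is to compile them before use. The sketch below uses lxml's `etree.XPath` class, which raises `XPathSyntaxError` when the expression cannot be parsed. The helper name `compile_xpath` and the sample markup are assumptions made for this illustration.

```python
from lxml import etree, html


def compile_xpath(expression):
    """Hypothetical helper: return a compiled XPath, or None if it is malformed."""
    try:
        return etree.XPath(expression)
    except etree.XPathSyntaxError as e:
        print(f"Invalid XPath {expression!r}: {e}")
        return None


# The first expression is missing the predicate's closing bracket
broken = compile_xpath('//div[contains(@class, "price-wrapper")/text()')
valid = compile_xpath('//div[contains(@class, "price-wrapper")]/text()')

# A compiled expression is callable on any parsed tree (sample markup is made up)
doc = html.fromstring('<div class="price-wrapper">91,99</div>')
print(broken)
print(valid(doc))
```

Compiling all expressions once at startup means a typo shows up immediately rather than partway through a scraping run, and the compiled objects can be reused across pages.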