Best practices

  • Use specific attributes (such as `@id` or `@class`) in your XPath predicates so the expression targets exactly the sibling elements you want.

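  • To select all following siblings of an element, use the `following-sibling::*` axis. It only returns nodes that share the element's parent, unlike the broader `following::` axis, which returns every element that comes after it in document order (other than its own descendants).

  • When you need only the immediately adjacent sibling, append `[1]` to the axis step, as in `following-sibling::*[1]`.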
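You can check the points above without a network request. The sketch below runs the same kinds of queries against a small, made-up HTML fragment. The `list-a`/`list-b` ids and the `product` class are illustrative assumptions. It compares `following-sibling::` with the broader `following::` axis and shows how `[1]` behaves on the reverse `preceding-sibling::` axis.

```python
from lxml import html

# Hypothetical markup: two lists, each with its own parent <div>
doc = html.fromstring("""
<html><body>
  <div id="list-a">
    <p>A0</p>
    <p>A1</p>
    <div class="product">A2</div>
    <p>A3</p>
    <p>A4</p>
  </div>
  <div id="list-b">
    <p>B1</p>
  </div>
</body></html>
""")


def label(nodes):
    """Identify each element by its id, falling back to its own text."""
    return [n.get("id") or (n.text or "").strip() for n in nodes]


# following-sibling:: stays inside the product's parent (#list-a)
siblings = label(doc.xpath('//div[@class="product"]/following-sibling::*'))
print(siblings)  # ['A3', 'A4']

# following:: keeps going past the parent and also reaches #list-b and its child
after = label(doc.xpath('//div[@class="product"]/following::*'))
print(after)  # ['A3', 'A4', 'list-b', 'B1']

# [1] on the forward axis: the nearest sibling after the product
next_one = label(doc.xpath('//div[@class="product"]/following-sibling::*[1]'))
print(next_one)  # ['A3']

# preceding-sibling:: is a reverse axis, so [1] is the nearest sibling before it
prev_one = label(doc.xpath('//div[@class="product"]/preceding-sibling::*[1]'))
print(prev_one)  # ['A1']

# Parentheses apply [1] to the whole result in document order instead
first_in_doc = label(doc.xpath('(//div[@class="product"]/preceding-sibling::*)[1]'))
print(first_in_doc)  # ['A0']
```

Note the last query. Wrapping the path in parentheses makes `[1]` apply to the combined result in document order, so it returns the earliest preceding sibling rather than the nearest one.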

from lxml import html
import requests

# Fetch the page and parse it into an element tree
response = requests.get('https://sandbox.oxylabs.io/products', timeout=30)
response.raise_for_status()
tree = html.fromstring(response.content)

# All siblings that come after each <div class="product"> (same parent only)
following_siblings = tree.xpath('//div[@class="product"]/following-sibling::*')
print("Following siblings:", following_siblings)

# All siblings that come before each <div class="product">
preceding_siblings = tree.xpath('//div[@class="product"]/preceding-sibling::*')
print("Preceding siblings:", preceding_siblings)

# The sibling directly after each <div class="product">
next_sibling = tree.xpath('//div[@class="product"]/following-sibling::*[1]')
print("Next sibling:", next_sibling)

# The sibling directly before each <div class="product">;
# preceding-sibling is a reverse axis, so [1] picks the nearest one
previous_sibling = tree.xpath('//div[@class="product"]/preceding-sibling::*[1]')
print("Previous sibling:", previous_sibling)
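The example above compares `@class` with `=`. That only matches when the attribute value is exactly `product`, so an element with `class="product featured"` would be skipped. Switching to `contains()` fixes that, but it also matches unrelated classes such as `production-note`. The sketch below runs against a hypothetical fragment and compares both approaches with a common XPath 1.0 idiom that matches a single whitespace-separated class token.

```python
from lxml import html

# Hypothetical fragment where class attributes hold several tokens
doc = html.fromstring("""
<html><body>
  <div class="product featured">P1</div>
  <div class="product">P2</div>
  <div class="production-note">N1</div>
</body></html>
""")

# Exact comparison: matches only class="product" verbatim
exact = [d.text for d in doc.xpath('//div[@class="product"]')]
print(exact)  # ['P2']

# Substring match: also picks up "production-note"
loose = [d.text for d in doc.xpath('//div[contains(@class, "product")]')]
print(loose)  # ['P1', 'P2', 'N1']

# Token match: pad the normalized class list with spaces, then look for " product "
TOKEN_XPATH = '//div[contains(concat(" ", normalize-space(@class), " "), " product ")]'
token = [d.text for d in doc.xpath(TOKEN_XPATH)]
print(token)  # ['P1', 'P2']

# The same predicate works as the anchor for a sibling query;
# node-sets never contain duplicates, so P2's and P1's shared sibling N1 appears once
siblings = [d.text for d in doc.xpath(TOKEN_XPATH + "/following-sibling::*")]
print(siblings)  # ['P2', 'N1']
```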

Common issues

  • Anchor sibling queries on a specific context node. A broad starting step such as `//div` applies `following-sibling::*` to every matching element, so the result mixes siblings from many different parents.

  • Re-verify your XPath selectors whenever the target page's HTML structure changes. After a class is renamed or an element is moved, lxml simply returns an empty list instead of raising an error.

  • Use comments or documentation to clarify the purpose of specific XPath selections, particularly when using complex sibling selection patterns.

  • Test XPath expressions in your browser's developer tools (for example, with `$x('...')` in the Chrome or Firefox console) to confirm they select the intended elements before using them in your code.

# Assumes `tree` was built as in the first example
# Incorrect: //div matches every <div>, so this returns siblings under many different parents
all_siblings = tree.xpath('//div/following-sibling::*')

# Correct: Ensure the context node is specific to avoid unintended selections
specific_siblings = tree.xpath('//div[@id="specificId"]/following-sibling::*')

# Bad practice: Using outdated or incorrect XPath after HTML structure change
old_xpath = tree.xpath('//div[@class="oldClass"]/following-sibling::*')

# Good practice: Regularly check and update XPath to match current HTML structure
updated_xpath = tree.xpath('//div[@class="newClass"]/following-sibling::*')

# No documentation: Complex XPath without explanation
complex_xpath = tree.xpath('//div[starts-with(@class, "prod")]/following-sibling::div[contains(@class, "info")]')

# With documentation: Explained complex XPath for better understanding
# Selects <div> siblings whose class attribute contains 'info' that follow any <div> whose class attribute starts with 'prod'
documented_xpath = tree.xpath('//div[starts-with(@class, "prod")]/following-sibling::div[contains(@class, "info")]')

# Testing in isolation: XPath tested only in code
untested_xpath = tree.xpath('//div[@class="test"]/following-sibling::*')

# Testing in development tools: XPath confirmed in browser dev tools before use
# Use browser developer tools to test and verify XPath works as intended
tested_xpath = tree.xpath('//div[@class="test"]/following-sibling::*')
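One way to keep sibling queries context-aware is to select the anchor elements first. You then evaluate a relative XPath (no leading `/` or `//`) from each one, so every result stays tied to the element it came from. The card markup and class names below are hypothetical and only illustrate the pattern.

```python
from lxml import html

# Hypothetical listing: each card holds a title followed by one or more <span> siblings
doc = html.fromstring("""
<html><body>
  <div class="card">
    <h3 class="title">Game A</h3>
    <span class="price">10.99</span>
  </div>
  <div class="card">
    <h3 class="title">Game B</h3>
    <span class="note">Out of stock</span>
    <span class="price">24.50</span>
  </div>
</body></html>
""")

# Broad query: prices from every card end up in one flat list,
# with no record of which title each one belongs to
flat = [s.text for s in doc.xpath('//h3[@class="title"]/following-sibling::span[@class="price"]')]
print(flat)  # ['10.99', '24.50']

# Context-aware query: evaluate a relative path from each title separately
prices = {}
for title in doc.xpath('//h3[@class="title"]'):
    # No leading "/" or "//": the search starts at this <h3>,
    # so only siblings inside the same card are considered
    match = title.xpath('following-sibling::span[@class="price"][1]')
    prices[title.text] = match[0].text if match else None
print(prices)  # {'Game A': '10.99', 'Game B': '24.50'}

# Predicate order matters: *[1] is the nearest sibling of any kind
nearest = [t.xpath('following-sibling::*[1]')[0].text for t in doc.xpath('//h3[@class="title"]')]
print(nearest)  # ['10.99', 'Out of stock']
```

Filtering before indexing (`span[@class="price"][1]`) returns the first sibling that matches. By contrast, `*[1]` returns whatever element happens to come next.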
