How to select sibling elements in XPath?

Learn to navigate XML and HTML structures efficiently by mastering the selection of sibling elements in XPath. This guide provides clear steps and essential tips for enhancing your data extraction techniques.

Best practices

  • Use specific attributes (such as an `id` or a distinctive `class`) in your XPath to pin down both the element you start from and the siblings you want.

  • To select all following siblings of an element, use the `following-sibling::*` axis. Unlike the broader `following::` axis, it only returns elements that share the context node's parent, so it won't pick up unrelated nodes elsewhere in the document.

  • When you need only the immediately adjacent sibling, append `[1]` to your XPath, as in `following-sibling::*[1]`.
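The points above are easiest to see on a tiny, self-contained snippet. This sketch assumes `lxml` is installed; the `DOC` markup and the `texts` helper are made up for illustration. It contrasts `following-sibling::*`, which stays under the same parent, with `following::*`, which continues through the rest of the document, and shows `[1]` keeping only the adjacent sibling.

```python
from lxml import html

# Hypothetical markup: two groups, each with a heading and some items.
DOC = """
<div id="catalog">
  <div class="group">
    <h2>Books</h2>
    <p>First</p>
    <p>Second</p>
  </div>
  <div class="group">
    <h2>Games</h2>
    <p>Third</p>
  </div>
</div>
"""

tree = html.fromstring(DOC)


def texts(nodes):
    """Return the stripped text content of each element."""
    return [node.text_content().strip() for node in nodes]


# following-sibling::* stays inside the parent of the "Books" heading.
siblings = tree.xpath('//h2[text()="Books"]/following-sibling::*')
print(texts(siblings))  # ['First', 'Second']

# following::* keeps going past that parent, into the next group.
following = tree.xpath('//h2[text()="Books"]/following::*')
print([node.tag for node in following])  # ['p', 'p', 'div', 'h2', 'p']

# [1] keeps only the sibling immediately after the heading.
nxt = tree.xpath('//h2[text()="Books"]/following-sibling::*[1]')
print(texts(nxt))  # ['First']
```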

```python
from lxml import html
import requests

# Fetch the page and parse it into an element tree
response = requests.get('https://sandbox.oxylabs.io/products')
response.raise_for_status()  # stop early on HTTP errors
tree = html.fromstring(response.content)

# All element siblings that come after each <div class="product">
following_siblings = tree.xpath('//div[@class="product"]/following-sibling::*')
print("Following siblings:", following_siblings)

# All element siblings that come before each <div class="product">
preceding_siblings = tree.xpath('//div[@class="product"]/preceding-sibling::*')
print("Preceding siblings:", preceding_siblings)

# The sibling directly after each product div
next_sibling = tree.xpath('//div[@class="product"]/following-sibling::*[1]')
print("Next sibling:", next_sibling)

# The sibling directly before each product div; on the reverse
# preceding-sibling axis, [1] means the nearest one, not the parent's first child
previous_sibling = tree.xpath('//div[@class="product"]/preceding-sibling::*[1]')
print("Previous sibling:", previous_sibling)
```
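Two details of the `[1]` predicate are easy to trip over. On `preceding-sibling`, which is a reverse axis, `[1]` picks the sibling closest to the context node rather than the first child of the parent. And `following-sibling::p[1]` ("the next `<p>` sibling, wherever it is") is not the same as `following-sibling::*[1][self::p]` ("the very next sibling, only if it is a `<p>`"). The sketch below uses hypothetical markup and assumes `lxml` is installed.

```python
from lxml import html

# Hypothetical markup: a heading, a note, then two paragraphs.
DOC = """
<div>
  <p>Intro</p>
  <h2>Details</h2>
  <div class="note">Note</div>
  <p>Body one</p>
  <p>Body two</p>
</div>
"""

tree = html.fromstring(DOC)

# preceding-sibling is a reverse axis, so [1] is the sibling closest
# to the note (the <h2>), not the first child of the parent (<p>Intro</p>).
nearest_before = tree.xpath('//div[@class="note"]/preceding-sibling::*[1]')
print([el.tag for el in nearest_before])  # ['h2']

# "The next <p> sibling, wherever it is": skips the note div.
next_p = tree.xpath('//h2/following-sibling::p[1]')
print([el.text for el in next_p])  # ['Body one']

# "The very next sibling, but only if it is a <p>": the element right
# after <h2> is the note div, so nothing matches.
adjacent_p = tree.xpath('//h2/following-sibling::*[1][self::p]')
print(adjacent_p)  # []
```

Pick the second form when the sibling must be directly adjacent, for example when a missing element should yield "no value" instead of silently borrowing a later one.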

Common issues

  • Keep your queries anchored to a specific context node. An expression such as `//div/following-sibling::*` starts from every `<div>` on the page, so the result mixes siblings from many different parents into one list.

  • Re-verify your XPath selectors whenever the page's HTML structure changes; a renamed class or an extra wrapper element can silently turn a sibling query into an empty result.

  • Use comments or documentation to clarify the purpose of specific XPath selections, particularly when using complex sibling selection patterns.

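  • Test XPath expressions in tools like browser developer consoles (for example, with `$x('...')` in the Chrome or Firefox console) to confirm they select the intended elements before using them in your code. Keep in mind that the browser evaluates against the rendered DOM, which can differ from the raw HTML that `requests` downloads.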
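One way to keep sibling queries context-aware is to locate each anchor element first and then evaluate a relative path (no leading `/`) from that element, so results stay grouped by parent. This sketch assumes `lxml` is installed; the markup and the `prices` dictionary are hypothetical.

```python
from lxml import html

# Hypothetical listing with two separate containers.
DOC = """
<div>
  <div id="first">
    <h3>Alpha</h3><span>a-price</span><span>a-stock</span>
  </div>
  <div id="second">
    <h3>Beta</h3><span>b-price</span>
  </div>
</div>
"""

tree = html.fromstring(DOC)

# Broad query: siblings of every <h3> are merged into one flat list,
# so you can no longer tell which container each <span> came from.
merged = tree.xpath('//h3/following-sibling::span')
print([s.text for s in merged])  # ['a-price', 'a-stock', 'b-price']

# Context-aware query: the relative path is evaluated from each heading.
prices = {}
for heading in tree.xpath('//h3'):
    nxt = heading.xpath('following-sibling::span[1]')
    prices[heading.text] = nxt[0].text if nxt else None
print(prices)  # {'Alpha': 'a-price', 'Beta': 'b-price'}
```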

```python
# Assumes `tree` was built as in the previous example;
# specificId, oldClass, newClass and test are placeholder names.

# Risky: every <div> on the page is a context node, so siblings from
# many different parents end up in one list
all_siblings = tree.xpath('//div/following-sibling::*')

# Better: anchor the query on one specific element
specific_siblings = tree.xpath('//div[@id="specificId"]/following-sibling::*')

# Bad practice: keeping an XPath written for an older version of the page
old_xpath = tree.xpath('//div[@class="oldClass"]/following-sibling::*')

# Good practice: update the XPath when the page's HTML structure changes
updated_xpath = tree.xpath('//div[@class="newClass"]/following-sibling::*')

# No documentation: complex XPath without explanation
complex_xpath = tree.xpath('//div[starts-with(@class, "prod")]/following-sibling::div[contains(@class, "info")]')

# With documentation: for every <div> whose class attribute starts with
# "prod", select the later sibling <div>s whose class contains "info"
documented_xpath = tree.xpath('//div[starts-with(@class, "prod")]/following-sibling::div[contains(@class, "info")]')

# Untested: XPath written straight into the code
untested_xpath = tree.xpath('//div[@class="test"]/following-sibling::*')

# Tested: the same XPath, first confirmed in the browser's developer tools
# (e.g. $x('//div[@class="test"]/following-sibling::*') in the console)
tested_xpath = tree.xpath('//div[@class="test"]/following-sibling::*')
```
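A frequent cause of broken selectors after a page update is an exact class match: `@class="product"` stops matching once a site adds a second class, while `contains(@class, "product")` also catches unrelated names like `products-header`. A common XPath 1.0 idiom matches the class as a whole word by padding the normalized attribute with spaces. The markup below is hypothetical and the sketch assumes `lxml` is installed.

```python
from lxml import html

# Hypothetical markup: the second card gained an extra class.
DOC = """
<div>
  <div class="product">One</div>
  <div class="product featured">Two</div>
  <div class="products-header">Header</div>
  <p>Details</p>
</div>
"""

tree = html.fromstring(DOC)

# Exact match misses "product featured".
exact = tree.xpath('//div[@class="product"]')
print([d.text for d in exact])  # ['One']

# Substring match also hits "products-header".
loose = tree.xpath('//div[contains(@class, "product")]')
print([d.text for d in loose])  # ['One', 'Two', 'Header']

# Whole-word match: pad the normalized class list with spaces.
TOKEN = ('//div[contains(concat(" ", normalize-space(@class), " "), '
         '" product ")]')
token = tree.xpath(TOKEN)
print([d.text for d in token])  # ['One', 'Two']

# The sturdier test slots into a sibling query unchanged.
next_siblings = tree.xpath(TOKEN + '/following-sibling::*[1]')
print([d.text for d in next_siblings])  # ['Two', 'Header']
```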

