How to select elements by text in XPath?

Learn the essentials of using XPath to select elements based on their text content. This guide provides a straightforward approach to pinpoint specific data within HTML documents, enhancing your scraping efficiency.

Best practices

  • Use the `text()` node test in a predicate, as in `[text()='...']`, to match elements whose text is exactly the given string.

  • Employ the `contains()` function to find elements that include a specific substring, which is useful for partial text matches.

  • For case-insensitive searches, use the `translate()` function to map the element's text to a single case, and write the search string in that same case.

  • When the element must also carry a specific attribute value, combine an attribute predicate with a text predicate to narrow the match.

# Importing the necessary libraries
from selenium import webdriver
from selenium.webdriver.common.by import By

# Setting up the WebDriver
driver = webdriver.Chrome()

# Navigate to the target website
driver.get("https://sandbox.oxylabs.io/products")

# Example 1: Exact text match
element = driver.find_element(By.XPATH, "//tagname[text()='Exact Text']")

# Example 2: Contains specific text
element = driver.find_element(By.XPATH, "//tagname[contains(text(), 'Part of Text')]")

# Example 3: Case-insensitive text search using translate (search string must be lowercase)
element = driver.find_element(By.XPATH, "//tagname[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'case insensitive text')]")

# Example 4: Text match combined with an attribute filter
element = driver.find_element(By.XPATH, "//tagname[@attribute='value'][contains(text(), 'Text in Attribute')]")

# Close the browser
driver.quit()
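The `translate()` expression used for case-insensitive matching is long and easy to mistype. As a minimal sketch, a small helper can assemble it from a tag name and a search string. The function name `contains_text_ci` and the sample tag and text are hypothetical, and the helper assumes the search string contains no single quote and that only ASCII letters need case folding.

```python
import string


def contains_text_ci(tag: str, needle: str) -> str:
    """Build an XPath matching <tag> elements whose text contains `needle`,
    ignoring ASCII case. Assumes `needle` has no single quote in it."""
    upper = string.ascii_uppercase
    lower = string.ascii_lowercase
    # translate() lowercases the element text, so the search string
    # has to be lowercased as well or it can never match.
    return (
        f"//{tag}[contains(translate(text(), '{upper}', '{lower}'), "
        f"'{needle.lower()}')]"
    )


# Hypothetical tag and text, for illustration only
xpath = contains_text_ci("a", "Add To Cart")
print(xpath)
```

The resulting string can be passed to Selenium as `driver.find_element(By.XPATH, xpath)`.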

Common issues

  • Ensure that the XPath expression correctly matches the structure and tags of the HTML document to avoid selection errors.

  • Account for leading or trailing whitespace in the page's text, which breaks exact `text()` matches; XPath's `normalize-space()` trims and collapses whitespace before the comparison.

  • Regularly update your XPath queries to accommodate changes in the website's HTML structure, which might otherwise lead to incorrect element selection.

  • Test XPath expressions in tools like browser developer consoles to ensure they accurately select the desired elements before implementing them in code.

# Assumes `driver` from the setup above
from selenium.webdriver.common.by import By

# Incorrect tag name in XPath
element = driver.find_element(By.XPATH, "//wrongtag[text()='Exact Text']")

# Correct tag name in XPath
element = driver.find_element(By.XPATH, "//correcttag[text()='Exact Text']")

# Incorrect handling of whitespace: hard-coding spaces in the literal is brittle
element = driver.find_element(By.XPATH, "//tagname[text()=' Text with spaces ']")

# Correct handling: normalize-space() trims and collapses whitespace before comparing
element = driver.find_element(By.XPATH, "//tagname[normalize-space()='Text with spaces']")

# Outdated XPath after HTML structure change
element = driver.find_element(By.XPATH, "//div[@class='old-class'][text()='Text']")

# Updated XPath matching new HTML structure
element = driver.find_element(By.XPATH, "//section[@class='new-class'][text()='Text']")

# Testing XPath in code without prior verification
element = driver.find_element(By.XPATH, "//tagname[text()='Unverified Text']")

# Pre-testing XPath in the browser console before using it in code,
# e.g. in Chrome or Firefox DevTools: $x("//tagname[text()='Unverified Text']")
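Whitespace differences between the page and the search string are easy to miss by eye. One possible approach, sketched below, is to build the expression around XPath's `normalize-space()` and apply the same normalization to the search string in Python. The helper names `normalize_space` and `exact_text_xpath` are hypothetical, and the sketch assumes the text contains no single quote.

```python
import re


def normalize_space(text: str) -> str:
    """Mimic XPath normalize-space(): collapse runs of space, tab, CR and LF
    into one space, then trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def exact_text_xpath(tag: str, text: str) -> str:
    """Build an XPath that compares whitespace-normalized element text
    against the whitespace-normalized search string."""
    return f"//{tag}[normalize-space()='{normalize_space(text)}']"


print(exact_text_xpath("tagname", "  Text   with\n spaces "))
# prints: //tagname[normalize-space()='Text with spaces']
```

Note that `normalize-space()` with no argument works on the element's full string value, which includes text from child elements.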
