How to Get Elements by XPath in JavaScript?

Best practices

Use absolute XPath paths sparingly, since they break as soon as the HTML structure changes; relative paths are generally more maintainable.
When using XPath in JavaScript, handle empty results from `document.evaluate` to avoid runtime errors: `singleNodeValue` and `iterateNext()` return `null` when nothing matches.
Use predicates in your XPath expressions to select the desired elements directly, reducing the need for post-filtering in JavaScript.
Use `XPathResult.ORDERED_NODE_ITERATOR_TYPE` when retrieving multiple nodes so that they are returned in document order, which matters when the sequence is important.

// Import JSDOM to parse HTML in Node.js (fetch is built in from Node.js 18)
const { JSDOM } = require('jsdom');

// Example URL
const url = 'https://sandbox.oxylabs.io/products';

// Fetch the HTML content
fetch(url)
  .then(response => response.text())
  .then(html => {
    // Parse the fetched HTML; XPathResult comes from the jsdom window, not Node's globals
    const dom = new JSDOM(html);
    const doc = dom.window.document;
    const { XPathResult } = dom.window;

    // Get element by XPath - single element (null if nothing matches)
    const getElementByXPath = (path) => {
      return doc.evaluate(path, doc, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
    };
    console.log(getElementByXPath('/html/body/div/p[1]'));

    // Get elements by XPath - multiple elements, in document order
    const getElementsByXPath = (path) => {
      const iterator = doc.evaluate(path, doc, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
      const result = [];
      let node = iterator.iterateNext();

      while (node) {
        result.push(node);
        node = iterator.iterateNext();
      }
      return result;
    };
    console.log(getElementsByXPath("//div[contains(@class, 'product')]"));
  })
  .catch(error => console.error('Failed to fetch page: ', error));
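
The practices listed above can also be sketched as a pair of reusable helpers. This is a minimal illustration, not part of the original example: the names `selectFirst` and `selectAll` are hypothetical, and the code assumes `doc` has an associated window (`doc.defaultView`), as a browser page or a jsdom-parsed document does. It also swaps the iterator for `ORDERED_NODE_SNAPSHOT_TYPE`, which still returns nodes in document order but gives a static list that stays usable if the DOM is modified afterwards.

```javascript
// Hypothetical helpers; `doc` is any Document with an associated window
// (a browser page, or `new JSDOM(html).window.document`).

// Return the first node matching `path`, or `fallback` when nothing matches.
// Passing a `contextNode` lets you evaluate relative paths such as './/p[1]'.
function selectFirst(doc, path, contextNode = doc, fallback = null) {
  const { XPathResult } = doc.defaultView;
  const result = doc.evaluate(path, contextNode, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
  // singleNodeValue is null when the expression matches no node
  return result.singleNodeValue ?? fallback;
}

// Return every matching node, in document order, as a plain array.
function selectAll(doc, path, contextNode = doc) {
  const { XPathResult } = doc.defaultView;
  const snapshot = doc.evaluate(path, contextNode, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}
```

For example, `selectAll(doc, "//div[contains(@class, 'product')]")` returns an array that can be looped over, and `selectFirst(doc, './/p[1]', someDiv)` evaluates a relative path against one of those nodes (`someDiv` is a placeholder for a node you already hold).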

Common issues

Ensure that the XPath expression is correctly formatted and matches the actual HTML structure to prevent selection errors.
Regularly update and test your XPath queries to align with changes in the web page's DOM structure, as outdated XPaths can lead to incorrect or no data being retrieved.
When integrating XPath with JavaScript, wrap `document.evaluate` in a try-catch block: a malformed expression throws an exception rather than returning an empty result.
Be aware of the performance implications when using complex XPath queries in large documents; simpler and more direct paths can significantly enhance execution speed.

// Incorrect XPath format: document.evaluate throws a syntax error
const invalidResult = document.evaluate('///incorrect/xpath', document, null, XPathResult.ANY_TYPE, null);

// Correct XPath format
const validResult = document.evaluate('/html/body/div', document, null, XPathResult.ANY_TYPE, null);

// Outdated XPath not reflecting current DOM structure
const oldResult = document.evaluate('/html/body/div[2]', document, null, XPathResult.ANY_TYPE, null);

// Updated XPath after DOM changes
const updatedResult = document.evaluate('/html/body/section/div', document, null, XPathResult.ANY_TYPE, null);

// No exception handling: the unclosed predicate makes document.evaluate throw and stops the script
const riskyResult = document.evaluate('//div[@class="example"', document, null, XPathResult.ANY_TYPE, null).iterateNext();

// Using try-catch to handle potential exceptions
let safeResult = null;
try {
  safeResult = document.evaluate('//div[@class="example"', document, null, XPathResult.ANY_TYPE, null).iterateNext();
} catch (error) {
  console.error('XPath evaluation failed:', error);
}
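
The try-catch pattern can also be wrapped once and reused. The sketch below is one possible shape, not an established API: `tryXPath` is a hypothetical name, and it assumes `doc.defaultView` exposes `XPathResult` (true in browsers and for jsdom documents). It separates the two failure modes, a malformed expression (`ok: false`) and a valid expression that matches nothing (`ok: true` with an empty `nodes` array).

```javascript
// Hypothetical wrapper: reports failures in the return value instead of throwing.
function tryXPath(doc, path, contextNode = doc) {
  try {
    const { XPathResult } = doc.defaultView;
    const snapshot = doc.evaluate(path, contextNode, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    // Copy the snapshot into a plain array, preserving document order
    const nodes = Array.from({ length: snapshot.snapshotLength }, (_, i) => snapshot.snapshotItem(i));
    return { ok: true, nodes, error: null };
  } catch (error) {
    // A malformed expression such as '///incorrect/xpath' ends up here
    return { ok: false, nodes: [], error };
  }
}
```

A caller can then check `ok` before using `nodes`, and log `error` only when the expression itself was rejected.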

// More complex XPath: the extra predicate is evaluated for every li
const slowResult = document.evaluate('//div[@class="example"]/ul/li[a/@href="#"]', document, null, XPathResult.ANY_TYPE, null);

// Simpler XPath: less work per node, but it matches every li, not only those containing a[@href="#"]
const fastResult = document.evaluate('//div[@class="example"]/ul/li', document, null, XPathResult.ANY_TYPE, null);
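
Another way to keep expressions short is to locate each container once and then evaluate small relative paths against it, instead of running several document-wide queries. The following is a rough sketch under assumptions: `extractItems` and the `fieldPaths` map are hypothetical, and `doc.defaultView` is assumed to expose `XPathResult`. It uses `XPathResult.STRING_TYPE`, which converts the first matching node to its text and yields an empty string when nothing matches.

```javascript
// Hypothetical helper: one query for the containers, then short relative
// paths (starting with '.') evaluated against each container.
function extractItems(doc, containerPath, fieldPaths) {
  const { XPathResult } = doc.defaultView;
  const containers = doc.evaluate(containerPath, doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  const items = [];
  for (let i = 0; i < containers.snapshotLength; i++) {
    const container = containers.snapshotItem(i);
    const item = {};
    for (const [name, relativePath] of Object.entries(fieldPaths)) {
      // stringValue is '' when the relative path matches nothing
      const result = doc.evaluate(relativePath, container, null, XPathResult.STRING_TYPE, null);
      item[name] = result.stringValue.trim();
    }
    items.push(item);
  }
  return items;
}
```

A call might look like `extractItems(doc, "//div[contains(@class, 'product')]", { title: './/h4' })`, where `.//h4` is a placeholder to adapt to the page's actual markup.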