Best practices

  • Use specific, unique CSS selectors so that you target exactly the element whose text you want to extract.

  • innerText returns the text as it is rendered on the page. It respects CSS, so text hidden by styles such as display: none is left out, which makes it the right choice when you want what a visitor would actually see.

  • textContent returns the raw text of a node and all its descendants, with no HTML tags. It ignores styling, so hidden text is included.

  • If you need to start from the markup, read innerHTML and then strip the tags, for example with a regular expression, so that only the text remains.
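
The three properties above can be compared side by side. The following is a minimal sketch, not a definitive implementation: the helper names `stripTags` and `readTextVariants` are hypothetical, `page` is assumed to be a Puppeteer `Page` that has already navigated to the target URL, and the regular expression is a deliberately simple one that does not decode HTML entities.

```javascript
// Remove anything that looks like a tag, then collapse runs of whitespace.
// Simplification: entities such as &amp; are left untouched.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, '').replace(/\s+/g, ' ').trim();
}

// `page` is a Puppeteer Page; `selector` is a CSS selector chosen by the caller.
// page.$eval throws if no element matches the selector.
async function readTextVariants(page, selector) {
  // The callback runs inside the browser, so it returns plain strings only.
  const raw = await page.$eval(selector, (el) => ({
    textContent: el.textContent, // raw text, hidden nodes included
    innerText: el.innerText,     // text as rendered, hidden nodes skipped
    innerHTML: el.innerHTML,     // markup of the element's children
  }));
  // Derive a tag-free string from the markup.
  return { ...raw, stripped: stripTags(raw.innerHTML) };
}
```

With a real browser, `page` would typically come from `puppeteer.launch()`, `browser.newPage()`, and `page.goto(...)`; the sketch itself relies only on `page.$eval`.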


Common issues

  • Make sure the page has finished loading before you extract text; otherwise Puppeteer may query elements that are not yet in the DOM.

  • querySelector returns null when nothing matches. Check that the element exists before reading its properties to avoid runtime errors.

  • Use page.waitForSelector to wait until the element is present, which avoids timing issues when retrieving text.

  • Know the differences between textContent, innerText, and innerHTML, and pick the one that fits: innerText when styling and visibility matter, textContent for raw text, and innerHTML when you need the markup.
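
The waiting and null-check advice above can be combined in one helper. This is a sketch under stated assumptions: `getTextOrNull` and its `timeoutMs` parameter are hypothetical names, `page` is assumed to be a Puppeteer `Page`, and any failure of `page.waitForSelector` is treated as "element not found", which is a simplification.

```javascript
// Returns the trimmed text of the first matching element, or null when the
// element does not appear within the timeout or cannot be found.
async function getTextOrNull(page, selector, timeoutMs = 5000) {
  try {
    // Resolves once the selector matches an element; rejects on timeout.
    await page.waitForSelector(selector, { timeout: timeoutMs });
  } catch (err) {
    // Simplification: every error here is treated as "not found".
    return null;
  }
  // This callback runs in the browser. querySelector may still return null
  // (for example if the element was removed again), so guard before reading.
  return page.evaluate((sel) => {
    const el = document.querySelector(sel);
    return el ? el.textContent.trim() : null;
  }, selector);
}
```

A caller can then branch on the result, for example logging a warning when `await getTextOrNull(page, 'h1')` is null instead of letting a property access on a missing element throw.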


