Best practices

  • Use specific and unique CSS selectors to accurately target the element from which you want to extract text.

  • innerText returns the text as it is rendered on the page. It respects CSS, so content hidden with display: none is left out. Use it when you want what a visitor would actually see.

  • textContent returns the raw text of the node and all of its descendants and ignores styling. It therefore includes hidden text and, when read from a container element, the contents of any script and style elements inside it.

  • If you have to start from markup, read innerHTML and strip the tags with a regular expression. Treat this as a fallback: a regular expression is not an HTML parser, and entities such as &amp; stay encoded in the result.

// Setup (run in a terminal first):
//   npm init -y
//   npm install puppeteer

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://sandbox.oxylabs.io/products');

    // Get text using textContent
    const text1 = await page.evaluate(
        () => document.querySelector('h4').textContent
    );
    console.log(text1);

    // Get text using innerText
    const text2 = await page.evaluate(
        () => document.querySelector('h4').innerText
    );
    console.log(text2);

    // Get text using innerHTML and strip tags
    const text3 = await page.evaluate(
        () => document.querySelector('h4').innerHTML.replace(/<[^>]*>?/gm, '')
    );
    console.log(text3);

    await browser.close();
})();
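
The tag-stripping step can also run in Node once innerHTML has been returned from the page, which makes it easy to check on its own. The following is a minimal sketch, not a full HTML parser: stripTags is a hypothetical helper name, the pattern is a simplified variant of the one used above, and it neither decodes entities nor copes with a ">" inside an attribute value.

```javascript
// Hypothetical helper: remove anything that looks like an HTML tag
// and collapse the whitespace that is left behind.
function stripTags(html) {
    return html
        .replace(/<[^>]*>/g, '')   // drop <tag ...> and </tag> sequences
        .replace(/\s+/g, ' ')      // collapse runs of whitespace into one space
        .trim();                   // remove leading and trailing whitespace
}

// Sample markup invented for illustration
console.log(stripTags('<h4 class="title">The <b>Legend</b> of Zelda</h4>'));
// prints "The Legend of Zelda"
```

Keeping the cleanup outside page.evaluate means the same function can be reused for markup from any selector.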

Common issues

  • Make sure the content has loaded before you extract text. Otherwise Puppeteer may look for elements that are not yet in the DOM, which is common on pages that render content with JavaScript.

  • querySelector returns null when nothing matches. Check that the element exists before reading its properties, or the script fails with a TypeError.

  • Consider using page.waitForSelector to ensure the element is present and avoid timing issues when trying to retrieve text.

  • Be aware of the differences between textContent, innerText, and innerHTML to choose the most appropriate method based on whether you need to consider the style or just extract raw data.

// Incorrect: Trying to get text without ensuring the page has fully loaded
const example1 = await page.evaluate(
    // '?' ensures no error is raised if an element isn't found
    () => document.querySelector('.in-stock')?.textContent || 'No element found'
);
console.log(example1);

// Correct: Wait until the element is present in the DOM before reading it
await page.waitForSelector('.in-stock');
const example1_fix = await page.evaluate(
    () => document.querySelector('.in-stock').textContent
);
console.log(example1_fix);
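
The null check can be wrapped in one small function so every lookup is handled the same way. This is a sketch rather than a prescribed pattern: readText is a hypothetical helper name, and page is assumed to be a Puppeteer Page like the one created in the first script. It relies on page.$ resolving to null when nothing matches the selector.

```javascript
// Hypothetical helper: return the element's textContent,
// or null if the selector matches nothing.
// `page` is assumed to be a Puppeteer Page.
async function readText(page, selector) {
    const handle = await page.$(selector);   // null when no element matches
    if (handle === null) {
        return null;
    }
    try {
        // Runs in the browser context with the matched element as `el`
        return await handle.evaluate((el) => el.textContent);
    } finally {
        await handle.dispose();              // release the element handle
    }
}
```

A caller can then write const stock = await readText(page, '.in-stock'); and decide what a null result should mean for that page.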



// Risky: Without options, waitForSelector resolves as soon as the element is in the DOM,
// even if it is still hidden, so innerText may come back empty
await page.waitForSelector('.in-stock');
const example2 = await page.evaluate(
    () => document.querySelector('.in-stock').innerText
);
console.log(example2);

// Better: Wait until the element is visible. The 3-second timeout is shorter than the
// 30-second default, so the call fails faster; raise it for slow pages
await page.waitForSelector('.in-stock', { visible: true, timeout: 3000 });
const example2_fix = await page.evaluate(
    () => document.querySelector('.in-stock').innerText
);
console.log(example2_fix);
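
If the element never becomes visible, waitForSelector rejects once the timeout elapses, so the call usually needs error handling. A minimal sketch under stated assumptions: getVisibleText is a hypothetical helper, page is assumed to be a Puppeteer Page, and the timeout is recognised by the error's name being 'TimeoutError', which is an assumption about the error object rather than something the examples here show.

```javascript
// Hypothetical helper: wait for a visible element and return its innerText,
// or null if it does not appear within `timeoutMs` milliseconds.
async function getVisibleText(page, selector, timeoutMs = 3000) {
    try {
        const handle = await page.waitForSelector(selector, {
            visible: true,
            timeout: timeoutMs,
        });
        // Guard against a null handle before evaluating
        return handle ? await handle.evaluate((el) => el.innerText) : null;
    } catch (err) {
        // Assumption: timeouts surface as an error named 'TimeoutError'
        if (err && err.name === 'TimeoutError') {
            return null;
        }
        throw err;   // anything else is a real failure, so let it propagate
    }
}
```

Returning null only for timeouts keeps other failures, such as a closed page, visible to the caller.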



// Incorrect: Using innerHTML when needing text, might include unwanted HTML tags
const example3 = await page.evaluate(
    () => document.querySelector('.category').innerHTML
);
console.log(example3);

// Correct: Use textContent or innerText based on need (style consideration or not)
const example3_fix = await page.evaluate(
    () => document.querySelector('.category').innerText
);
console.log(example3_fix);
