How to Get Text from an Element in Puppeteer?

Best practices

Use specific and unique CSS selectors to accurately target the element from which you want to extract text.
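When using innerText, remember that it returns the text as rendered on the page: content hidden with CSS (for example display: none) is left out and whitespace is laid out as displayed, which is useful for scraping what a visitor actually sees.
To extract raw text without any HTML tags, use textContent: it ignores styling and returns the text of the node and all its descendants, including hidden elements and the contents of nested <script> and <style> tags.
If you need to capture the HTML content and then remove tags, use innerHTML combined with a regular expression that strips the tags. Keep in mind that innerHTML leaves HTML entities such as &amp; encoded, and that a regular expression is adequate for simple markup but is not a full HTML parser.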

// npm init -y
// npm install puppeteer 

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://sandbox.oxylabs.io/products');

    // Get text using textContent
    const text1 = await page.evaluate(
        () => document.querySelector('h4').textContent
    );
    console.log(text1);

    // Get text using innerText
    const text2 = await page.evaluate(
        () => document.querySelector('h4').innerText
    );
    console.log(text2);

    // Get text using innerHTML and strip tags
    const text3 = await page.evaluate(
        () => document.querySelector('h4').innerHTML.replace(/<[^>]*>?/gm, '')
    );
    console.log(text3);

    await browser.close();
})();
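
Stripping tags from innerHTML, as in the last call above, leaves HTML entities such as &amp; and &lt; encoded in the result. The following is a minimal sketch of a Node-side helper that removes tags and decodes a handful of common entities. The name stripTags and the short entity table are assumptions made for illustration; it is not a complete HTML decoder, and it maps &nbsp; to a regular space for simplicity.

```javascript
// Hypothetical helper (not part of Puppeteer): turn an innerHTML string into plain text.
function stripTags(html) {
    // Only a few common entities are handled; extend the table as needed.
    const entities = {
        '&amp;': '&',
        '&lt;': '<',
        '&gt;': '>',
        '&quot;': '"',
        '&#39;': "'",
        '&nbsp;': ' ',
    };
    return html
        .replace(/<[^>]*>/g, '')                                    // drop the tags
        .replace(/&(?:amp|lt|gt|quot|#39|nbsp);/g, (m) => entities[m]) // decode in one pass
        .replace(/\s+/g, ' ')                                       // collapse whitespace
        .trim();
}

console.log(stripTags('<b>Price:</b> 5 &lt; 10 &amp; up'));
// Price: 5 < 10 & up
```

Inside the script above, the helper could be applied to the string returned by page.evaluate, for example stripTags(await page.evaluate(() => document.querySelector('h4').innerHTML)).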

Common issues

Make sure the element exists before you extract its text: by default page.goto resolves once the load event fires, but content rendered later by JavaScript may not be in the DOM yet.
Handle the null that querySelector returns when nothing matches: check that the element exists (or use optional chaining) before reading its properties, otherwise the call throws a TypeError.
Consider using page.waitForSelector to ensure the element is present and avoid timing issues when trying to retrieve text.
Be aware of the differences between textContent, innerText, and innerHTML to choose the most appropriate method based on whether you need to consider the style or just extract raw data.

// The snippets below assume they run inside the async function above, after page.goto()

// Risky: reading the text right away; an element rendered later by JavaScript is not in the DOM yet
const example1 = await page.evaluate(
    // '?.' avoids a TypeError when the element isn't found; the fallback string is returned instead
    () => document.querySelector('.in-stock')?.textContent || 'No element found'
);
console.log(example1);

// Correct: wait until the element is present in the DOM before reading it
await page.waitForSelector('.in-stock');
const example1_fix = await page.evaluate(
    () => document.querySelector('.in-stock').textContent
);
console.log(example1_fix);



// Risky: without options, waitForSelector resolves as soon as the element is in the DOM,
// even if it is not visible yet, and it waits up to the default 30 seconds
await page.waitForSelector('.in-stock');
const example2 = await page.evaluate(
    () => document.querySelector('.in-stock').innerText
);
console.log(example2);

// Better: require the element to be visible and set an explicit timeout
// (3000 ms fails faster than the 30-second default; raise it for slow pages)
await page.waitForSelector('.in-stock', { visible: true, timeout: 3000 });
const example2_fix = await page.evaluate(
    () => document.querySelector('.in-stock').innerText
);
console.log(example2_fix);



// Incorrect: Using innerHTML when needing text, might include unwanted HTML tags
const example3 = await page.evaluate(
    () => document.querySelector('.category').innerHTML
);
console.log(example3);

// Correct: Use textContent or innerText based on need (style consideration or not)
const example3_fix = await page.evaluate(
    () => document.querySelector('.category').innerText
);
console.log(example3_fix);
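
The waiting and null-handling fixes above can be folded into one reusable function. The following is a minimal sketch rather than a Puppeteer API: the name getTextOrNull and its default options are hypothetical, and it assumes that Puppeteer reports a timeout with an error whose name is 'TimeoutError'.

```javascript
// Hypothetical helper (not a Puppeteer API): wait for a selector, then return
// the element's trimmed text, or null if the element does not appear in time.
async function getTextOrNull(page, selector, { visible = true, timeout = 5000 } = {}) {
    let handle;
    try {
        // waitForSelector resolves with an ElementHandle once the element matches.
        handle = await page.waitForSelector(selector, { visible, timeout });
    } catch (err) {
        // Assumption: a timeout is signalled by an error named 'TimeoutError'.
        if (err.name === 'TimeoutError') return null;
        throw err; // anything else (invalid selector, closed page) is a real failure
    }
    if (!handle) return null; // defensive: no handle, nothing to read

    try {
        // Runs in the page context against the element that was waited for.
        const text = await handle.evaluate((el) => el.textContent);
        return (text ?? '').trim();
    } finally {
        await handle.dispose(); // release the handle once the text has been read
    }
}
```

Inside the async function from the first example, it could be called as const stock = await getTextOrNull(page, '.in-stock'); a null result then means the element never showed up, which the caller can log or skip instead of crashing on a TypeError.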