Best practices

  • Target class names that are specific to the element you need, so the selector does not also match unrelated elements.

  • When an element must have several classes, chain them with no spaces in the selector string (for example, .css-13df51w.e1knbtv71). A space turns the selector into a descendant selector, which matches something different.

  • Re-test your selectors regularly and update them whenever the source website changes its layout or class naming conventions, so your data extraction stays accurate.

  • Use Cheerio's built-in methods to extract content: .text() returns the combined text of the selected elements, and .html() returns the inner HTML of the first selected element.

// npm install axios cheerio
const axios = require('axios');
const cheerio = require('cheerio');

// Define the URL to scrape
const url = 'https://sandbox.oxylabs.io/products/1';

// Function to fetch HTML and find elements by class
const fetchAndParse = async () => {
  try {
    // Fetch the page
    const response = await axios.get(url);
    const html = response.data;

    // Load HTML into cheerio
    const $ = cheerio.load(html);

    // Find elements by class using .className
    const elements = $('.title');

    // Display text of elements
    elements.each(function() {
      console.log($(this).text());
    });

    // Match elements that carry both classes (no space between them)
    const multiClassElements = $('.css-13df51w.e1knbtv71');
    console.log(multiClassElements.html()); // Inner HTML of the first matched element

  } catch (error) {
    console.error('Error fetching data:', error);
  }
};

fetchAndParse();

Common issues

  • Class selectors are case-sensitive, so make sure the class names in your selectors match the HTML exactly, including letter casing, to avoid missing elements.

  • When a selector returns no elements, log the fetched HTML and verify that the response actually contains the expected classes before assuming the selector is wrong.

  • Avoid using overly broad class names that might return more elements than intended, which can lead to performance issues or incorrect data scraping.

  • When using Cheerio, remember that it operates on a static snapshot of the HTML, so dynamic changes made by JavaScript after page load won't be reflected.

// Incorrect class name casing
const wrongCase = $('.Product-card'); // Returns no elements if the class in the HTML is 'product-card'

// Correct class name casing
const rightCase = $('.product-card'); // Matches the class exactly as written in the HTML


// Debugging by logging HTML
const html = response.data;

// Check if the HTML contains the expected classes before parsing
// If not, you may need to use a headless browser to render all the data
console.log(html);


// Using overly broad class names
const broadMatches = $('.title'); // Might return unrelated elements

// Using specific class names
const specificMatches = $('.title.css-1k75zwy'); // Only targets the main product title
