Best practices

  • Use specific and unique class names to ensure accurate selection and avoid confusion with other elements.

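  • When targeting an element that carries several classes, chain the class names with no spaces between them (for example, `.product-item.active`) so that only elements with all of the specified classes match. Adding a space turns the selector into a descendant match instead.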

  • Retest and update your selectors whenever the source website changes its layout or class naming conventions, so that your data extraction stays accurate.

  • Use Cheerio's built-in methods such as `.text()` and `.html()` to extract the content of selected elements. Note that `.text()` returns the combined text of every matched element, while `.html()` returns the inner HTML of the first matched element only.

// Import the necessary libraries
const axios = require('axios');
const cheerio = require('cheerio');

// Define the URL to scrape
const url = 'https://sandbox.oxylabs.io/products';

// Function to fetch HTML and find elements by class
const fetchAndParse = async () => {
  try {
    // Fetch the page
    const response = await axios.get(url);
    const html = response.data;

    // Load HTML into cheerio
    const $ = cheerio.load(html);

    // Find elements by class using .className
    const elements = $('.product-item');
    console.log(elements.text()); // Combined text of all matched elements

    // Alternative: match elements that have both classes
    const multiClassElements = $('.product-item.active');
    console.log(multiClassElements.html()); // Inner HTML of the first match (null if nothing matches)
  } catch (error) {
    console.error('Error fetching data:', error);
  }
};

// Call the function
fetchAndParse();

Common issues

  • Class selectors are case-sensitive, so make sure the class names in your selectors match those in the HTML exactly, including casing, to avoid missing elements.

  • When a selector returns no elements, log the entire fetched HTML to verify that the response actually contains the expected classes.

  • Avoid using overly broad class names that might return more elements than intended, which can lead to performance issues or incorrect data scraping.

  • When using Cheerio, remember that it operates on a static snapshot of the HTML, so dynamic changes made by JavaScript after page load won't be reflected.

// These fragments assume `$` and `response` from the example above

// Incorrect class name casing
const wrongCase = $('.Product-item'); // Returns no elements if the class name in HTML is 'product-item'

// Correct class name casing
const rightCase = $('.product-item'); // Correctly matches the class in HTML

// Debugging by logging HTML
const html = response.data;
console.log(html); // Check if the HTML contains the expected classes before parsing

// Using overly broad class names
const broadMatch = $('.item'); // Might return unrelated elements, not just product items

// Using specific class names
const specificMatch = $('.product-item'); // Specifically targets only product item elements

// Static HTML assumption
const dynamicContent = $('#dynamic-content').text(); // May not reflect updates made by JavaScript

// Reminder: Cheerio parses static HTML
console.log('Remember: Cheerio does not execute JavaScript');
