Use specific and unique class names to ensure accurate selection and avoid confusion with other elements.
When targeting an element that has several classes, chain the class names without spaces (for example, .css-13df51w.e1knbtv71) so the selector matches only elements that have all of them; a space between them creates a descendant selector, which matches something different.
Test your selectors regularly and update them whenever the source website changes its layout or class naming conventions, so your data extraction stays accurate.
Use Cheerio's built-in methods, such as .text() and .html(), to extract content from the selected elements: .text() returns the combined text of all matched elements, while .html() returns the inner HTML of the first matched element only.
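Browser DevTools show an element's classes as a space-separated attribute value, such as class="css-13df51w e1knbtv71", while a selector for an element with all of those classes needs a dot before each name and no spaces. The hypothetical helper toCompoundClassSelector() below is a minimal sketch of that conversion; it assumes plain class names and does not escape characters that are special in CSS.

```javascript
// Hypothetical helper: convert a space-separated class attribute value
// (as copied from DevTools) into a compound selector requiring ALL classes.
// Assumes plain class names; names with CSS special characters are not escaped.
function toCompoundClassSelector(classAttribute) {
  return classAttribute
    .trim()
    .split(/\s+/)          // the attribute separates class names with whitespace
    .filter(Boolean)       // drop the empty string left by an empty input
    .map((name) => `.${name}`)
    .join('');             // no spaces: ".a .b" would mean ".b inside .a"
}

console.log(toCompoundClassSelector('css-13df51w e1knbtv71'));
// → .css-13df51w.e1knbtv71
```

The result can be passed straight to $() in Cheerio.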
// npm install axios cheerio
const axios = require('axios');
const cheerio = require('cheerio');
// Define the URL to scrape
const url = 'https://sandbox.oxylabs.io/products/1';
// Function to fetch HTML and find elements by class
const fetchAndParse = async () => {
try {
// Fetch the page
const response = await axios.get(url);
const html = response.data;
// Load HTML into cheerio
const $ = cheerio.load(html);
// Find elements by class using .className
const elements = $('.title');
// Display text of elements
elements.each(function() {
console.log($(this).text());
});
// Alternative: select elements that have both classes (no space between them)
const multiClassElements = $('.css-13df51w.e1knbtv71');
console.log(multiClassElements.html()); // Inner HTML of the first matched element
} catch (error) {
console.error('Error fetching data:', error);
}
};
fetchAndParse();
Class names in selectors are case-sensitive, so make sure they match the HTML exactly; otherwise, the elements will be silently missed.
If a selector returns no elements, log the entire fetched HTML to verify that the server's response actually contains the expected classes.
Avoid overly broad class names that may match more elements than intended, which wastes processing time and can mix unrelated data into your results.
When using Cheerio, remember that it operates on a static snapshot of the HTML, so dynamic changes made by JavaScript after page load won't be reflected.
// These fragments assume $ and response are already defined, as in fetchAndParse() above
// Incorrect class name casing
const wrongCase = $('.Product-card'); // Returns no elements if the class in the HTML is 'product-card'
// Correct class name casing
const rightCase = $('.product-card'); // Matches class="product-card" exactly
// Debugging by logging HTML
const html = response.data;
// Check if the HTML contains the expected classes before parsing
// If not, you may need a headless browser to render the JavaScript-generated content
console.log(html);
// Using overly broad class names
const broad = $('.title'); // Might return unrelated elements
// Using specific class names
const specific = $('.title.css-1k75zwy'); // Only targets the main product
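Logging a whole page can mean reading through a lot of markup. As a rough alternative, the hypothetical helper findMissingClasses() below scans raw HTML for quoted class attributes and reports which expected class names never appear. It is a regex-based sketch, not an HTML parser: it assumes class values are quoted, cannot see classes added later by JavaScript, and compares names case-sensitively, like CSS class selectors.

```javascript
// Hypothetical helper: a rough pre-parse check, NOT a real HTML parser.
// It only looks at quoted class="..." or class='...' attributes in the raw markup.
function findMissingClasses(html, expectedClasses) {
  const present = new Set();
  const classAttr = /\sclass\s*=\s*(["'])([\s\S]*?)\1/gi;
  for (const match of html.matchAll(classAttr)) {
    // One class attribute can hold several whitespace-separated class names.
    for (const name of match[2].split(/\s+/)) {
      if (name) present.add(name);
    }
  }
  // Case-sensitive comparison: 'Product-card' does not match 'product-card'.
  return expectedClasses.filter((name) => !present.has(name));
}

const sample = '<div class="product-card"><h2 class="title css-1k75zwy">Item</h2></div>';
console.log(findMissingClasses(sample, ['product-card', 'Product-card', 'css-1k75zwy']));
// → [ 'Product-card' ]
```

An empty result means every expected class appears somewhere in the response; a non-empty one points either to a typo or casing mismatch in the selector, or to content the page renders with JavaScript.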

