Use specific and unique class names to ensure accurate selection and avoid confusion with other elements.
To match elements that carry several classes at once, chain the class names without spaces in the selector string (for example, .first.second). A space between the names turns the selector into a descendant selector instead.
Regularly update and test your selectors if the source website changes its layout or class naming conventions to maintain the accuracy of your data extraction.
Utilize Cheerio's built-in functions like .text() or .html() to directly extract and manipulate the content of selected elements based on your requirements.
// npm install axios cheerio
const axios = require('axios');
const cheerio = require('cheerio');

// Define the URL to scrape
const url = 'https://sandbox.oxylabs.io/products/1';

// Function to fetch HTML and find elements by class
const fetchAndParse = async () => {
  try {
    // Fetch the page
    const response = await axios.get(url);
    const html = response.data;

    // Load HTML into cheerio
    const $ = cheerio.load(html);

    // Find elements by class using .className
    const elements = $('.title');

    // Display text of elements
    elements.each(function() {
      console.log($(this).text());
    });

    // Alternative: Get multiple classes
    const multiClassElements = $('.css-13df51w.e1knbtv71');
    console.log(multiClassElements.html()); // Display HTML of inner elements
  } catch (error) {
    console.error('Error fetching data:', error);
  }
};

fetchAndParse();
Class selectors are case-sensitive, so make sure the class names in your selectors match those in the HTML exactly; otherwise elements will be missed.
If a selector returns no elements, log the entire fetched HTML to verify that the response actually contains the expected classes.
Avoid using overly broad class names that might return more elements than intended, which can lead to performance issues or incorrect data scraping.
When using Cheerio, remember that it operates on a static snapshot of the HTML, so dynamic changes made by JavaScript after page load won't be reflected.
// Incorrect class name casing
const wrongCase = $('.Product-card'); // Returns no elements if the class name in the HTML is 'product-card'

// Correct class name casing
const rightCase = $('.product-card'); // Correctly matches the class in HTML

// Debugging by logging HTML
const html = response.data;
// Check if the HTML contains the expected classes before parsing
// If not, you may need to use a headless browser to render all the data
console.log(html);

// Using overly broad class names
const broad = $('.title'); // Might return unrelated elements

// Using specific class names
const specific = $('.title.css-1k75zwy'); // Only targets the main product