How to find an element by class?

Learn how to locate elements by class in HTML documents using Cheerio, a Node.js HTML parsing library. This guide covers the basic steps and practical advice for selecting elements by class and extracting their content.

Best practices

  • Use specific and unique class names to ensure accurate selection and avoid confusion with other elements.

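  • When targeting elements that carry multiple classes, chain the class names without spaces in the selector string (for example, `.product-item.active`) to match only elements that have all of the specified classes. A space between the class names turns the selector into a descendant match instead.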

  • Test your selectors regularly and update them when the source website changes its layout or class naming conventions, so that your data extraction stays accurate.

  • Utilize Cheerio's built-in functions like `.text()` or `.html()` to directly extract and manipulate the content of selected elements based on your requirements.

// Import the necessary libraries
const axios = require('axios');
const cheerio = require('cheerio');

// Define the URL to scrape
const url = 'https://sandbox.oxylabs.io/products';

// Function to fetch HTML and find elements by class
const fetchAndParse = async () => {
  try {
    // Fetch the page
    const response = await axios.get(url);
    const html = response.data;

    // Load HTML into Cheerio
    const $ = cheerio.load(html);

    // Find elements by class using the .className selector
    const elements = $('.product-item');
    console.log(elements.text()); // Combined text of all matched elements

    // Alternative: match elements that have both classes
    const multiClassElements = $('.product-item.active');
    console.log(multiClassElements.html()); // Inner HTML of the first matched element
  } catch (error) {
    console.error('Error fetching data:', error);
  }
};

// Call the function
fetchAndParse();

Common issues

  • Class selectors are case-sensitive, so make sure the class names in your selectors match those in the HTML exactly to avoid missing elements.

  • If no elements are returned, log the entire fetched HTML to verify that the page has fully loaded and that the response contains the expected classes.

  • Avoid using overly broad class names that might return more elements than intended, which can lead to performance issues or incorrect data scraping.

  • When using Cheerio, remember that it operates on a static snapshot of the HTML, so dynamic changes made by JavaScript after page load won't be reflected.

// These snippets assume `$` and `response` from the example above

// Incorrect class name casing
const wrongCase = $('.Product-item'); // Might return no elements if class name in HTML is 'product-item'

// Correct class name casing
const rightCase = $('.product-item'); // Correctly matches the class in HTML

// Debugging by logging HTML
const html = response.data;
console.log(html); // Check if the HTML contains the expected classes before parsing

// Using overly broad class names
const broadMatch = $('.item'); // Might return unrelated elements, not just product items

// Using specific class names
const specificMatch = $('.product-item'); // Specifically targets only product item elements

// Static HTML assumption
const dynamicContent = $('#dynamic-content').text(); // May not reflect updates made by JavaScript

// Reminder: Cheerio parses static HTML
console.log('Remember: Cheerio does not execute JavaScript');
