Cheerio vs. Puppeteer: Which Should You Use for Web Scraping?

Shinthiya Nowsain Promi

Last updated on 2026-06-23

10 min read

AI Summary:

Cheerio is a fast, lightweight server-side HTML parser (jQuery-like) for static pages but can't run JavaScript, while Puppeteer drives a real Chromium instance to render dynamic content at the cost of speed and memory. The rule: if data appears in "View Source," use Cheerio; if it loads after JavaScript runs, use Puppeteer – and the two pair well, with Puppeteer rendering and Cheerio parsing.

Choosing the right tool for web scraping often comes down to one question: does the page you're targeting render its content with JavaScript? If the answer is no, you probably don't need a full browser. If the answer is yes, you do – and picking the wrong tool means either missing data entirely or burning unnecessary resources to get it.

Cheerio and Puppeteer are two of the most popular tools for JavaScript-based web scraping, but they solve fundamentally different problems. Cheerio is a fast, lightweight HTML parser that works like jQuery on the server side. Puppeteer is a browser automation tool that controls a real Chromium instance and can handle anything a human user could do in a browser. Both are excellent – in their respective lanes.

This comparison covers:

What Cheerio and Puppeteer are and how they work under the hood
How they compare across performance, JavaScript support, and ease of use
When to reach for one over the other
How to use them together for a more efficient scraping pipeline

What is Cheerio?

Cheerio is a fast, server-side HTML parser for Node.js – a backend runtime environment – that implements a subset of jQuery's API. It takes raw HTML data as input, builds a consistent DOM model from it, and lets you query and extract data using familiar CSS selectors. As a DOM parser, it can also handle HTML or XML data, so the same approach works for XML files and feeds, not just web pages.

Crucially, Cheerio doesn't execute JavaScript, render pages, or load external resources like scripts and stylesheets. It only parses raw HTML you hand it. For fetching, use the built-in fromURL for simple cases, or pair it with an HTTP client (axios/fetch) when you need custom headers, cookies, or auth. That narrow focus is its biggest strength. Because there's no browser overhead, Cheerio is extremely fast and memory-efficient, making it well-suited for scraping static pages at scale.

Key Features of Cheerio

jQuery-like syntax – if you've written any frontend JavaScript, the $('selector').text() pattern will feel immediately familiar
No JavaScript execution – Cheerio only processes the HTML you provide; it won't run scripts, load external resources, or wait for dynamic content to load
An HTML and XML parser – it builds a consistent DOM model from HTML documents or XML data, so you can query both with the same API
Lightweight and fast – no browser binary, no rendering engine, and minimal memory usage compared to headless browser tools
HTTP client optional – Cheerio can fetch simple pages itself with fromURL; pair it with axios, node-fetch, or the native fetch API when you need control over headers, cookies, or auth before parsing

What is Puppeteer?

Puppeteer is a Node.js browser automation library developed by Google that provides a high-level API for controlling Chromium (or Chrome) programmatically. Unlike Cheerio, Puppeteer launches a real browser instance – headless by default – which means it parses HTML, executes JavaScript, loads external resources, handles cookies and sessions, and renders the page as a user's browser would.

This makes Puppeteer capable of scraping content that only appears after JavaScript runs, including single-page applications, infinite scroll feeds, and web applications that require user interaction before data is visible. Because it drives a full browser engine, it can emulate users' behavior step by step.

Key features of Puppeteer

Headless Chrome/Chromium control – runs a full browser engine without a visible window, or with one if you need to debug visually
Full JavaScript execution – waits for scripts to run and the DOM to settle before you query it, handling dynamic content naturally
DOM interaction – can click buttons, fill forms, scroll pages, hover over elements, and navigate between web pages just like a real user
Flexible element targeting – locate target elements with CSS and XPath selectors against the live, rendered DOM
Screenshot and PDF generation – captures full-page screenshots or renders pages to PDF, useful for monitoring and archiving workflows

Cheerio vs. Puppeteer: key differences

At their core, Cheerio and Puppeteer aren't really competing for the same job. Cheerio is an HTML parser – it reads a markup string and gives you tools to query it. Puppeteer is a browser automation framework – it opens Chromium, loads a URL, and gives you programmatic control over everything that happens inside. Any Cheerio vs. Puppeteer comparison starts with recognizing that gap.

Feature	Cheerio	Puppeteer
Type	HTML parser / DOM parser	Browser automation
JavaScript execution	No	Yes (full V8 engine)
Speed	Very fast	Slower (browser overhead)
Memory usage	Low	High (~100–200 MB per instance)
Dynamic content	No	Yes
Installation size	Small (a few MB)	Large (~300 MB with Chromium)
Fetching HTML	Built-in (fromURL) or any HTTP client	Built-in via browser
Loads external resources	No	Yes
Screenshots / PDF	No	Yes
Selectors	CSS	CSS and XPath
Learning curve	Easy (low)	Moderate
Best for	Static HTML scraping	JavaScript-rendered pages, automation

Performance and speed

This is where the Cheerio vs. Puppeteer comparison is most lopsided: Cheerio wins by a wide margin.

When Cheerio fetches a page, it makes a single HTTP request and parses the response as a string. There's no browser to launch, no rendering pipeline to run, and no JavaScript engine to initialize. A Cheerio web scraper can process hundreds of pages per minute on modest hardware.

Puppeteer, by contrast, launches a full Chromium instance. Even in headless mode, that means allocating memory for a browser process, establishing a DevTools Protocol connection, waiting for the page to load, executing scripts, and waiting for the DOM to stabilize before you can query anything. Each new browser instance typically consumes 100–200 MB of memory, and startup alone adds hundreds of milliseconds of overhead.
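
To make the memory figure concrete, here is a back-of-the-envelope sketch of how many headless browsers might fit on one machine. The 100–200 MB range comes from the paragraph above; the function name, the 4 GB budget, and the 20% headroom are illustrative assumptions, not measurements.

```javascript
// Rough capacity estimate: how many browser instances fit in a memory budget.
// perInstanceMb uses the article's 100-200 MB range; headroom is an assumed
// safety margin reserved for the OS and the Node.js process itself.
function maxBrowserInstances(budgetMb, perInstanceMb, headroom = 0.2) {
  const usableMb = budgetMb * (1 - headroom);
  return Math.floor(usableMb / perInstanceMb);
}

const budgetMb = 4096; // hypothetical 4 GB machine
console.log(maxBrowserInstances(budgetMb, 200)); // 16 (pessimistic: 200 MB each)
console.log(maxBrowserInstances(budgetMb, 100)); // 32 (optimistic: 100 MB each)
```

Real usage varies with the pages being loaded, so treat the result as a starting point for load testing rather than a guarantee.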

For large scraping tasks – think thousands of product pages, news articles, or documentation pages – that difference compounds quickly. If you're running a Puppeteer scraper at scale, you'll need to manage a pool of browser instances carefully to avoid memory exhaustion. With Cheerio, you can fire off concurrent requests to scrape pages with far less infrastructure, which is exactly why it shines when scraping static pages.
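
Whether you're firing off plain HTTP requests or sharing a few browser instances, the usual way to keep resource usage bounded is to cap how many tasks run at once. The following is a minimal sketch of that idea; mapWithConcurrency and fetchAll are hypothetical helper names, and the limit of five is an arbitrary assumption.

```javascript
// Run `worker` over `items` with at most `limit` tasks in flight at once.
// Results come back in the same order as the input.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until the list is exhausted.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, runner);
  await Promise.all(runners);
  return results;
}

// Hypothetical usage: fetch static pages five at a time.
async function fetchAll(urls) {
  return mapWithConcurrency(urls, 5, async (url) => {
    const response = await fetch(url); // built-in fetch, Node.js 18+
    return response.text();
  });
}
```

The same helper works for Puppeteer jobs: swap the worker for a function that opens a page, and the limit becomes the size of your browser pool.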

The trade-off is capability, not a flaw in Puppeteer's design. Puppeteer is slow relative to Cheerio because it's doing far more work. If your target page needs a real browser to render, that overhead is unavoidable regardless of which browser automation tool you use.

JavaScript rendering and dynamic content

Cheerio has no JavaScript engine. It parses the HTML string it receives – nothing more. If a site uses React, Vue, Angular, or any framework that builds the DOM client-side, the raw HTML response will contain little more than a shell: a <div id="root"></div> and a bundle of script tags. Cheerio will parse that shell faithfully and find nothing useful in it. This isn't a bug; it's simply outside the scope of what the Cheerio JavaScript library is designed to do. It never performs JS rendering or touches dynamic elements – it returns a parsed version of exactly the raw HTML data you fed it.

Puppeteer handles this natively. Because it runs a real Chromium instance, the full page lifecycle plays out: HTML is parsed, scripts and other external resources are downloaded and executed, API calls are made, and the DOM is populated with real content. You control exactly when to query – after a specific element appears, after a network request completes, or after a fixed delay – using built-in methods such as waitForSelector and waitForResponse, or a plain timer for the delay.

This distinction matters beyond single-page applications (SPAs), the web apps that load a single HTML shell and build all content in the browser with JavaScript frameworks like React, Vue, or Angular. Many e-commerce and other modern sites lazy-load prices, reviews, or stock status via JavaScript after the initial HTML loads. Even sites that look static in a browser may deliver empty containers to a plain HTTP client. A quick way to check: open DevTools, disable JavaScript, and reload the page. If the data you need disappears, you're dealing with a dynamic page and need Puppeteer – or another headless browser tool like Playwright. If it's still there, Cheerio will handle it without issue.
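
The same check can be scripted: request the page without a browser and look for a value you can see on screen. This is a rough sketch rather than a definitive test; appearsInRawHtml and suggestTool are hypothetical names, and a plain substring match can miss text that the server HTML-encodes or splits across tags.

```javascript
// Returns true when `needle` (a value you can see in the browser, such as a
// product title) is already present in the HTML the server sent.
function appearsInRawHtml(html, needle) {
  return html.toLowerCase().includes(needle.toLowerCase());
}

// Hypothetical helper: fetch a page without running any JavaScript and
// suggest which tool fits. Uses the built-in fetch (Node.js 18+).
async function suggestTool(url, needle) {
  const response = await fetch(url);
  const html = await response.text();
  return appearsInRawHtml(html, needle) ? 'cheerio' : 'puppeteer';
}

// Offline demonstration with two inline documents.
const staticPage = '<html><body><h1>A Light in the Attic</h1></body></html>';
const spaShell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>';

console.log(appearsInRawHtml(staticPage, 'A Light in the Attic')); // true
console.log(appearsInRawHtml(spaShell, 'A Light in the Attic'));   // false
```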

Ease of use and learning curve

Cheerio has a very low barrier to entry and an easy learning curve. If you've used jQuery before, the API will feel immediately familiar – $('h1').text(), $('a').attr('href'), $('.price').each(...). Even without jQuery experience, the CSS selectors model is straightforward and well-documented. A working Cheerio scraper typically takes fewer than 20 lines of code.

Puppeteer requires a bit more to get right. The core concepts – launching a browser, opening a page, waiting for elements, querying the DOM for target elements – are simple enough, but real-world usage introduces complexity quickly. You need to think about when to query (before or after JavaScript runs), how to handle navigation and redirects, when to close the browser to avoid memory leaks, and how to manage async timing with waitForSelector or waitForFunction. None of this is difficult, but it requires more deliberate thinking than a Cheerio scraper does, and for extra-complex projects Puppeteer can take real planning.
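
Two of those concerns, waiting for the right moment and always closing the browser, can be handled with one small pattern. The sketch below assumes the standard Puppeteer calls (launch, newPage, goto, waitForSelector, content, close); getRenderedHtml is a hypothetical helper, and the puppeteer object is passed in as a parameter only to keep the example free of hard dependencies.

```javascript
// Render a page and return its HTML, always closing the browser afterwards.
// `puppeteer` is passed in so the sketch has no hard dependency; in a real
// script you would pass the object from `import puppeteer from 'puppeteer'`.
async function getRenderedHtml(puppeteer, url, selector, timeoutMs = 15000) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    // Query only after the target element exists in the rendered DOM.
    await page.waitForSelector(selector, { timeout: timeoutMs });
    return await page.content();
  } finally {
    // Runs on success and on failure, so no Chromium process is left behind.
    await browser.close();
  }
}
```

A call might look like getRenderedHtml(puppeteer, 'http://books.toscrape.com', 'article.product_pod'). If the selector never appears, waitForSelector rejects after the timeout and the finally block still closes the browser.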

That said, Puppeteer's API is well-designed and its documentation is thorough. It's nowhere near the steep learning curve of lower-level browser automation. Most developers with basic async JavaScript experience can get a working Puppeteer scraper running within an hour – moderate, but far gentler than the alternatives.

When to use Cheerio vs. Puppeteer

Cheerio:

Rule of thumb: if the data you need is visible when you right-click → View Source, use Cheerio. If it's in the raw HTML, there's no reason to spin up a browser.

Cheerio is the right choice for:

Static websites – blogs, news articles, documentation pages, and any site that delivers its full content in the initial HTML response
E-commerce product listings – when prices, titles, and SKUs are present in the page source rather than loaded dynamically
Price monitoring pipelines – high-frequency, lightweight requests across many web pages where speed and low resource usage matter
Large-volume crawlers – when you need to process thousands of URLs efficiently, or gather alternative data for research, without managing browser instances
Simple data extraction – pulling hrefs, table data, headings, or structured HTML where no interaction is required

Puppeteer:

Rule of thumb: if the data you need isn't in the raw HTML source – if it only appears after the page finishes loading – use Puppeteer for web scraping.

Puppeteer is the right choice for:

SPAs built with React, Vue, or Angular – where the entire UI is rendered client-side and the raw HTML is just an empty shell
Infinite scroll feeds – social media timelines, job listings, and product feeds that load new content as you scroll, which only a real browser can trigger reliably
Authenticated pages – workflows that require login forms, session cookies, or multi-step authentication before data is accessible
Form-based interactions – search filters, dropdowns, or actions where you submit forms before results appear
Screenshot or PDF generation – capturing visual snapshots of web applications for monitoring, archiving, or reporting workflows
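
For the infinite scroll case, a common approach is to keep scrolling until the page stops growing. The following is one possible sketch, not the only way to do it: scrollUntilStable is a hypothetical helper, page is assumed to be a Puppeteer Page, and the code assumes the feed extends document.body.scrollHeight (some sites scroll an inner container instead).

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Scroll to the bottom repeatedly until the page height stops changing or
// `maxRounds` is reached. Returns the last height that was measured.
async function scrollUntilStable(page, { maxRounds = 20, pauseMs = 500 } = {}) {
  let previousHeight = 0;
  for (let round = 0; round < maxRounds; round++) {
    // Runs inside the browser: jump to the bottom, then report the height.
    const height = await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight);
      return document.body.scrollHeight;
    });
    if (height === previousHeight) break; // nothing new loaded since last round
    previousHeight = height;
    await sleep(pauseMs); // give lazy-loaded content time to arrive
  }
  return previousHeight;
}
```

The maxRounds cap matters on feeds that never end; without it the loop would run for as long as the site keeps serving content.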

Can you use Cheerio and Puppeteer together?

Yes – and it's a legitimate, widely-used pattern. The two libraries complement each other well: Puppeteer handles the parts of the page lifecycle that require a real browser, and Cheerio takes over for the parsing work once the HTML is ready.

The typical three-step workflow looks like this:

Puppeteer fetches and renders the page – it launches Chromium, navigates to the URL, waits for JavaScript to execute and the target content to appear in the DOM, then extracts the fully rendered HTML via page.content().
Cheerio parses the HTML – that HTML string is passed directly to Cheerio, which loads it and gives you a familiar jQuery-like interface to query and extract the data you want.
You process and output the results – Cheerio returns the values you selected in a clean result data structure; from there you can log the scraped data, write to a file, push to a database, or pipe into the next stage of your pipeline.

The advantage of this pattern is that you get the best of both tools: Puppeteer handles JavaScript rendering and any required interactions, while Cheerio provides a cleaner, more concise querying API than Puppeteer's native DOM methods. It's also easier to test the parsing logic in isolation – you can save the rendered HTML once and run your Cheerio selectors against that parsed version repeatedly without relaunching the browser.
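
The "save the rendered HTML once" idea can be sketched as a tiny file cache. loadOrRender and the snapshot path are hypothetical; render stands for whatever function produces the HTML, such as a Puppeteer call that returns page.content().

```javascript
// Return cached HTML if the snapshot file exists; otherwise call `render`
// (for example, a Puppeteer-based function), save its output, and return it.
async function loadOrRender(snapshotPath, render) {
  // Dynamic import keeps the sketch valid in both ESM and CommonJS files.
  const { readFile, writeFile } = await import('node:fs/promises');
  try {
    return await readFile(snapshotPath, 'utf8');
  } catch (error) {
    if (error.code !== 'ENOENT') throw error; // only "file not found" is expected
    const html = await render();
    await writeFile(snapshotPath, html, 'utf8');
    return html;
  }
}
```

While you iterate on selectors, every run after the first reads the snapshot from disk, so the browser only launches again once you delete the file.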

Cheerio and Puppeteer: web scraping code example

To see the Cheerio and Puppeteer combination in action, we'll build a web scraper that fetches the book listings from books.toscrape.com – one of the test websites built specifically for scraping practice. Puppeteer will handle the page load, and Cheerio will parse the HTML to extract data like book titles and prices.

1. Install the Dependencies

Start by initializing a Node.js project and installing both libraries with npm. Run this inside a new project folder:

npm init -y
npm install puppeteer cheerio

Then open package.json and add "type": "module" to enable ESM import syntax throughout:

{
  "type": "module"
}

One install note worth knowing: Puppeteer doesn't bundle a browser inside the library – it downloads a matching Chrome binary via an install script that runs automatically right after npm install. Many modern package managers (pnpm, Yarn, Bun, Deno, and newer npm, depending on version and configuration) now block or restrict install scripts by default for security reasons. If yours does, the install will appear to succeed, but no browser is downloaded – and your scraper will later crash at runtime with a "Could not find Chrome" error.

If you hit that, download the browser manually after installing (the official site documents this too):

npx puppeteer browsers install chrome

With dependencies in place, create a new file called scraper.js inside your project folder – that's where the rest of the code will go.

2. Fetch the Page with Puppeteer

Puppeteer launches a headless Chromium browser, navigates to the target URL, waits until the DOM is ready, and extracts the rendered HTML as a string. Because we enabled ESM, we can use top-level await directly; in a CommonJS setup you'd wrap this logic inside an async function.

import puppeteer from 'puppeteer';
import * as cheerio from 'cheerio';

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://books.toscrape.com', { waitUntil: 'domcontentloaded' });
const html = await page.content();
await browser.close();

waitUntil: 'domcontentloaded' tells Puppeteer to proceed once the HTML is parsed and the DOM is ready – appropriate here since books.toscrape.com is a static site. For JavaScript-heavy pages, use 'networkidle0' instead, which waits until there have been no network connections for at least 500 ms, or wait for a specific element with waitForSelector.

3. Parse the HTML with Cheerio

Pass the HTML string from Puppeteer directly into Cheerio's load() function. From there, use CSS selectors to extract the data you need – in this case, each book's title and price.

const $ = cheerio.load(html);
const books = [];

$('article.product_pod').each((i, el) => {
  const title = $(el).find('h3 a').attr('title');
  const price = $(el).find('.price_color').text().trim();
  books.push({ title, price });
});

article.product_pod matches each book card on the page. .find() drills into each card to pull the title from the <a> tag's title attribute and the price from the .price_color element.

4. Output the Results

With the data collected, log it to the console or write it to a file.

console.log(`Found ${books.length} books:\n`);
books.forEach(book => {
  console.log(`${book.title} — ${book.price}`);
});

Running node scraper.js should output all 20 books listed on the homepage, each with its title and price. Here's the full code:

import puppeteer from 'puppeteer';
import * as cheerio from 'cheerio';

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://books.toscrape.com', { waitUntil: 'domcontentloaded' });
const html = await page.content();
await browser.close();

const $ = cheerio.load(html);
const books = [];

$('article.product_pod').each((i, el) => {
  const title = $(el).find('h3 a').attr('title');
  const price = $(el).find('.price_color').text().trim();
  books.push({ title, price });
});

console.log(`Found ${books.length} books:\n`);
books.forEach(book => {
  console.log(`${book.title} — ${book.price}`);
});
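
Step 4 also mentions writing the results to a file. One way to do that is to serialize the books array as CSV first; the sketch below is an illustration under a few assumptions: parsePrice and toCsv are hypothetical helpers, prices use a dot as the decimal separator, and the sample rows stand in for real scraped data.

```javascript
// Convert "£51.77" into the number 51.77 by dropping everything except
// digits and the decimal point. Assumes a dot as the decimal separator.
function parsePrice(text) {
  return Number.parseFloat(text.replace(/[^0-9.]/g, ''));
}

// Serialize scraped rows as CSV, quoting titles so embedded commas are safe.
function toCsv(books) {
  const header = 'title,price';
  const rows = books.map(({ title, price }) => {
    const quotedTitle = `"${String(title).replace(/"/g, '""')}"`;
    return `${quotedTitle},${parsePrice(price)}`;
  });
  return [header, ...rows].join('\n');
}

// Sample rows shaped like the `books` array built above (values are illustrative).
const sample = [
  { title: 'A Light in the Attic', price: '£51.77' },
  { title: 'Tipping the Velvet', price: '£53.74' },
];
console.log(toCsv(sample));
```

To persist the output, pass the string to writeFile from node:fs/promises, for example writeFile('books.csv', toCsv(books)).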

Wrapping up

Cheerio and Puppeteer solve different problems, and knowing which to reach for comes down to one thing: whether the data you need exists in the raw HTML or only after JavaScript runs. For static pages, Cheerio is the faster, lighter, and simpler choice. For dynamic content, authenticated sessions, or anything requiring you to scrape dynamic pages with real browser interaction, Puppeteer is the right tool.

The good news is you don't always have to choose. As the code example above shows, combining the two into a single pipeline gives you the rendering power of a full browser with the clean parsing ergonomics of a jQuery-like API – a pattern that scales well for real-world projects.

If you're scraping at scale or targeting sites with aggressive bot protection, managing proxies and browser fingerprinting yourself can become a project in its own right. In those cases, it's worth looking at purpose-built solutions like the Oxylabs Web Scraper API, which handles JavaScript rendering, IP rotation, and CAPTCHA handling out of the box – so you can focus on the data rather than the infrastructure.

For more on the broader scraping ecosystem, check out our comparison of the best JavaScript web scraping libraries, Scrapy vs. Puppeteer, or dive deeper with our Puppeteer tutorial.

Frequently asked questions

How is Cheerio web scraping different from Puppeteer?

Cheerio is an HTML parser – it takes an HTML string and lets you query it with CSS selectors, similar to jQuery. It doesn't open a browser or execute JavaScript. Puppeteer is a browser automation library that controls a real Chromium instance, executes JavaScript, and can interact with web pages the way a human would. The key difference is that Cheerio works only with static HTML, while Puppeteer handles dynamic, JavaScript-rendered content.

Can Cheerio scrape JavaScript-rendered websites?

No, not on its own. Cheerio has no JavaScript engine, so it only sees the raw HTML it is given; on a client-rendered site that is often just an empty shell. To scrape such pages, render them first with Puppeteer (or another headless browser) and pass the resulting HTML to Cheerio for parsing.

Should you use Cheerio or Puppeteer for web scraping?

It depends on the target website. Use Cheerio when the data you need is present in the raw HTML source – it's faster, lighter, and simpler to set up. Use Puppeteer when the page requires JavaScript execution to render its content, or when you need to interact with the page (log in, click buttons, scroll) before the data appears. When in doubt, right-click the page and choose View Source – if the data is there, Cheerio will work.

When should you use Puppeteer instead of Cheerio?

Use Puppeteer when the content you need isn't in the raw HTML. Common cases include single-page applications built with React, Vue, or Angular; pages that load data via API calls after the initial HTML loads; sites that require login or session management; and workflows that involve clicking, scrolling, or submitting forms before results appear. Puppeteer is also the right choice when you need to generate screenshots or PDFs of rendered pages.

Is Cheerio faster than Puppeteer?

Yes, significantly. Cheerio makes a single HTTP request and parses the raw HTML response as a string – there's no browser to launch, no rendering engine, and no JavaScript to execute. Puppeteer launches a full Chromium instance, which adds hundreds of milliseconds of startup time and consumes 100–200 MB of memory per instance. For high-volume scraping of static pages, Cheerio can process hundreds of pages per minute with minimal resource usage, while Puppeteer requires careful instance management to avoid memory exhaustion at scale.

About the author

Shinthiya Nowsain Promi

Technical Content Researcher

With a background in Computer Science, Shinthiya likes to turn technical jargon into clear, perspective-driven writing that rewards a reader's time rather than wasting it.


All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.