How to Use a Proxy in Puppeteer

As web scraping and automation tools have matured in recent years, handling dynamic websites has become much easier. Headless browsers, which run without a graphical user interface, offer an efficient way to collect public data because you can control them programmatically. Combined with proxy servers, they become even more effective.

In this guide, we'll go through the Puppeteer integration process with Oxylabs Residential Proxies and provide an example.

How to integrate Oxylabs' proxies with Puppeteer? 

Step 1. Install the required tools. 

Before getting started with Puppeteer proxy server integration, you'll need to install some basic tools: Node.js and a code editor of your choice. After that, create a Node.js project and install the required packages. You may find a detailed guide on installing and running Puppeteer in our blog post and the official Puppeteer page. 

Once everything is set up, we can move on to the next part – Oxylabs' Residential Proxies integration with Puppeteer.

Step 2. Enter the proxy address.

Within Puppeteer, pass the proxy address (host:port) through the --proxy-server launch argument. The value for Residential Proxies is as follows:

pr.oxylabs.io:7777
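
As a minimal sketch of where this value ends up: Chromium reads the proxy address from its --proxy-server command-line flag, and Puppeteer forwards such flags through the args launch option. The helper name buildLaunchOptions is hypothetical and only used here for illustration; it is not part of Puppeteer.

```javascript
// Hypothetical helper: turns a proxy host and port into Puppeteer launch options.
// Only the `headless` and `args` option names come from Puppeteer itself.
function buildLaunchOptions(host, port, headless = true) {
  if (!host || !Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid proxy endpoint: ${host}:${port}`);
  }
  return {
    headless,
    // Chromium reads the proxy address from the --proxy-server flag.
    args: [`--proxy-server=${host}:${port}`],
  };
}

const launchOptions = buildLaunchOptions('pr.oxylabs.io', 7777);
console.log(launchOptions.args[0]); // --proxy-server=pr.oxylabs.io:7777
```

The resulting object can be passed directly to puppeteer.launch(launchOptions).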

Step 3. Authenticate the proxy

We will look at two different authentication methods.

Using the authenticate() method

Pass your Oxylabs proxy sub-user's username and password to page.authenticate(). An example looks like this:

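const puppeteer = require('puppeteer');

(async () => {
  // Route all browser traffic through the proxy entry point.
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--proxy-server=pr.oxylabs.io:7777'],
  });

  const page = await browser.newPage();

  // Provide the sub-user credentials for the proxy's authentication challenge.
  await page.authenticate({
    username: 'USERNAME',
    password: 'PASSWORD',
  });

  // Load the IP-check page and save a screenshot of the result.
  await page.goto('https://ip.oxylabs.io');
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();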
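
If you would rather not hard-code credentials in the script, one option is to read them from environment variables. The sketch below assumes the variable names PROXY_USERNAME and PROXY_PASSWORD and the helper name loadProxyCredentials; all three are hypothetical and not an Oxylabs or Puppeteer convention.

```javascript
// Hypothetical helper: reads proxy credentials from environment variables.
// PROXY_USERNAME and PROXY_PASSWORD are assumed names chosen for this sketch.
function loadProxyCredentials(env = process.env) {
  const username = env.PROXY_USERNAME;
  const password = env.PROXY_PASSWORD;
  if (!username || !password) {
    throw new Error('Set PROXY_USERNAME and PROXY_PASSWORD before running the script.');
  }
  // Same { username, password } shape that page.authenticate() accepts.
  return { username, password };
}

// Example with an explicit object instead of the real environment:
const credentials = loadProxyCredentials({
  PROXY_USERNAME: 'USERNAME',
  PROXY_PASSWORD: 'PASSWORD',
});
console.log(credentials.username); // USERNAME
```

In the script above, the call would then become await page.authenticate(loadProxyCredentials()).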

Using the proxy-chain package

Alternatively, you can use another integration method: proxy-chain. It is an open-source package developed by Apify that offers a feature to “anonymize” an authenticated proxy.

To understand how the proxy-chain method works, remember that Chrome and Chromium do not support proxy URLs that include usernames and passwords, such as http://USER:PASSWORD@pr.oxylabs.io:7777.  

These browsers only support proxy URLs without usernames and passwords, such as http://pr.oxylabs.io:7777. You would need to handle the authentication separately.

Proxy-chain, on the other hand, works well with proxy URLs that include username and password, eliminating the need for handling the authentication separately.

The key method of proxy-chain is anonymizeProxy(). It takes a proxy URL that contains the username and password, as follows:

await proxyChain.anonymizeProxy('http://USER:PASSWORD@pr.oxylabs.io:7777');
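
Because the credentials are embedded in a URL, characters such as "@" or ":" in a password could break parsing. A minimal sketch of one way to guard against that is to percent-encode both parts before building the URL. The helper name buildProxyUrl is hypothetical, and the assumption that your credentials may contain such characters is ours; if they do not, the plain string above works the same way.

```javascript
// Hypothetical helper: builds an authenticated proxy URL.
// Percent-encoding the credentials is an assumption made in this sketch so that
// characters such as "@" or ":" cannot be mistaken for URL delimiters.
function buildProxyUrl(username, password, hostAndPort) {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `http://${user}:${pass}@${hostAndPort}`;
}

const proxyUrl = buildProxyUrl('USER', 'p@ss:word', 'pr.oxylabs.io:7777');
console.log(proxyUrl); // http://USER:p%40ss%3Aword@pr.oxylabs.io:7777
```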

The anonymizeProxy() method starts a local proxy server and returns its URL. This local server sits between Chromium and the upstream proxy server, handling authentication on the browser's behalf.

At the end of your script, you must call the closeAnonymizedProxy() method, which shuts down the local proxy server.

To add proxy-chain into your project, open the terminal and enter the following command:

npm install proxy-chain

In the code, create an anonymized proxy:

const anonymizedProxy = await proxyChain.anonymizeProxy(proxy);

Send this anonymized proxy to Puppeteer as a launch argument in the following format:

await puppeteer.launch({
  args: [`--proxy-server=${anonymizedProxy}`],
});

That’s all you need to do to use a proxy with authentication. 

The following code block shows everything put together:

const puppeteer = require("puppeteer");
const proxyChain = require("proxy-chain");

const proxyServer = "pr.oxylabs.io:7777";
const username = "proxy-user-name";
const password = "proxy-password";

(async () => {
  const proxy = `http://${username}:${password}@${proxyServer}`;

  const anonymizedProxy = await proxyChain.anonymizeProxy(proxy);

  const browser = await puppeteer.launch({
    args: [`--proxy-server=${anonymizedProxy}`],
  });

  const page = await browser.newPage();

  const response = await page.goto("https://ip.oxylabs.io");
  console.log(await response.text());

  await browser.close();
  await proxyChain.closeAnonymizedProxy(anonymizedProxy, true);
})();
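
In the script above, the local proxy server is only closed if every earlier step succeeds. As a hedged sketch of one way to make the cleanup unconditional, the work can be wrapped in try/finally. The helper name withAnonymizedProxy is hypothetical; it relies only on the anonymizeProxy() and closeAnonymizedProxy() calls already used above and receives the proxy-chain module as a parameter.

```javascript
// Hypothetical helper: runs `task` with an anonymized proxy and always closes it.
// `proxyChain` is passed in, e.g. the result of require("proxy-chain").
async function withAnonymizedProxy(proxyChain, upstreamProxyUrl, task) {
  const anonymizedProxy = await proxyChain.anonymizeProxy(upstreamProxyUrl);
  try {
    // `task` receives the local proxy URL and does the actual browser work.
    return await task(anonymizedProxy);
  } finally {
    // Runs on success and on failure; `true` force-closes open connections,
    // matching the call in the script above.
    await proxyChain.closeAnonymizedProxy(anonymizedProxy, true);
  }
}
```

With this helper, the browser launch, navigation, and browser.close() calls from the script above would move into the task callback, which receives anonymizedProxy as its argument.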

Which method is better?

If all you need is authentication, it does not matter which method you use. Both the native page.authenticate() method and the proxy-chain package let you work with proxies that require authentication.

However, the proxy-chain package offers a few advanced features, such as custom error messages, custom responses, and traffic statistics. You can explore the official page to learn more.

Step 4. Use country-specific entries.

If needed, you can also use country-specific entries. For example, if you use us-pr.oxylabs.io as the host and 10000 as the port (that is, --proxy-server=us-pr.oxylabs.io:10000), you'll receive a US exit node. Please check our documentation for a complete list of country-specific entry nodes or for instructions on setting up a sticky session.
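
A small lookup can keep country-specific entries in one place. The sketch below includes only the US entry mentioned in this guide; entries for other countries are deliberately left out and would need to be added from the documentation. The names COUNTRY_ENTRIES and countryProxyArg are hypothetical.

```javascript
// Country-specific entry points. Only the US entry is taken from this guide;
// any other country must be added from the provider's documentation.
const COUNTRY_ENTRIES = new Map([
  ['us', { host: 'us-pr.oxylabs.io', port: 10000 }],
]);

// Hypothetical helper: returns the --proxy-server flag for a country code.
function countryProxyArg(countryCode) {
  const entry = COUNTRY_ENTRIES.get(String(countryCode).toLowerCase());
  if (!entry) {
    throw new Error(`No entry configured for country "${countryCode}"`);
  }
  return `--proxy-server=${entry.host}:${entry.port}`;
}

console.log(countryProxyArg('US')); // --proxy-server=us-pr.oxylabs.io:10000
```

The returned string goes into the args array of puppeteer.launch(), in place of the generic pr.oxylabs.io:7777 entry.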

And that's it! You've successfully integrated Oxylabs' Residential Proxies with Puppeteer.

Most common issues

Let's take a quick look at the most common problems that come up when integrating proxy servers with Puppeteer, along with potential solutions.

Puppeteer returning an error

One of the issues you might encounter is Puppeteer returning an error when you try to connect to a proxy server. There can be several reasons for this, the main one being that your proxy server requires authentication with the proxy credentials.

To solve this problem, supply the username and password for your proxy server, for example through page.authenticate() or the proxy-chain package as shown in Step 3. After that, try using the proxy again.

The page isn't loading

Even though Puppeteer might be able to connect to the proxy, it may still have trouble loading the page. The most common reason for this is a poor internet connection.

The most common fix is to increase the navigation timeout, for example through the timeout option of page.goto() or with page.setDefaultNavigationTimeout(). Once that is done, loading pages through the proxy should be more reliable.
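
As a rough sketch of this fix, the helper below raises the navigation timeout and retries a few times before giving up. The name gotoWithRetry and the default values of 60000 ms and 2 retries are assumptions chosen for illustration; only page.goto() and its timeout option (in milliseconds) come from Puppeteer.

```javascript
// Hypothetical helper: navigates with a longer timeout and a few retries.
// `page` is a Puppeteer Page; `timeout` is passed through to page.goto().
async function gotoWithRetry(page, url, { timeout = 60000, retries = 2 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await page.goto(url, { timeout });
    } catch (error) {
      // Remember the failure (for example, a navigation timeout) and try again.
      lastError = error;
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

In the earlier scripts, await page.goto('https://ip.oxylabs.io') could be replaced with await gotoWithRetry(page, 'https://ip.oxylabs.io'). If the page still fails after all attempts, the error is rethrown so the script does not continue silently.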

Other issues

If the issue remains unresolved, your proxy is probably failing to maintain the connection. In this case, you should check your proxy server or contact your provider for further assistance. We also highly recommend avoiding free public proxies, as they tend to perform poorly.

Conclusion

Puppeteer, in combination with Oxylabs' Residential Proxy servers, could be of great help when it comes to scraping public data from dynamic websites. If you found this guide relevant, you might also want to check the Puppeteer on AWS Lambda blog post. 

In case you have any queries about integrating Oxylabs proxies with Puppeteer, please get in touch with us at any time.

Please be aware that this is a third-party tool not owned or controlled by Oxylabs. Each third-party provider is responsible for its own software and services. Consequently, Oxylabs will have no liability or responsibility to you regarding those services. Please carefully review the third party's policies and practices and/or conduct due diligence before accessing or using third-party services.

Frequently asked questions

What is Puppeteer?

Puppeteer is an easy-to-use and powerful library for Node.js, mostly used for automating tests and various tasks utilizing the Chromium browser engine. It runs headless by default but can be configured to run full (non-headless) Chrome or Chromium. Puppeteer can do anything a standard browser can do, so it's hugely beneficial for building web scrapers as well. For example, it can help reach the web page's HTML and imitate standard user behavior, such as scrolling through the page.
