How to Use a Proxy in Puppeteer

Web scraping and automation tools have developed significantly in recent years, and handling dynamic websites has become much easier. Headless browsers, which run without a graphical user interface, offer an efficient way to collect public data because you can control them programmatically. Combined with proxy servers, they become even more effective, since requests can be routed through different IP addresses. In this article, we'll walk through integrating Puppeteer with Oxylabs' Residential Proxies and provide an example.

How to integrate Oxylabs' proxies with Puppeteer

Step 1. Install the required tools. 

Before getting started with Puppeteer proxy server integration, you'll need to install some basic tools: Node.js and a code editor of your choice. After that, create a Node.js project and install Puppeteer with npm install puppeteer. You can find a detailed guide on installing and running Puppeteer in our blog post and on the official Puppeteer website.

Once everything is set up, we can move on to the next part – Oxylabs' Residential Proxies integration with Puppeteer.

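Step 2. Set the proxy server address.

Puppeteer passes the proxy server's host and port to Chromium through the --proxy-server launch argument (used in the code in Step 3). For Oxylabs' Residential Proxies, the value is: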

pr.oxylabs.io:7777
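In code, this value ends up in Chromium's --proxy-server launch flag. The helper below is a minimal sketch, assuming you keep the host and port as separate settings; buildProxyServerArg is a hypothetical name, not part of Puppeteer.

```javascript
// Hypothetical helper (not part of Puppeteer): builds Chromium's --proxy-server
// launch flag from a host and port, optionally with a scheme prefix.
function buildProxyServerArg(host, port, scheme = "") {
  if (!host) throw new Error("Proxy host is required");
  const portNumber = Number(port);
  if (!Number.isInteger(portNumber) || portNumber < 1 || portNumber > 65535) {
    throw new Error(`Invalid proxy port: ${port}`);
  }
  // Without a scheme, Chromium treats the proxy as an HTTP proxy
  const prefix = scheme ? `${scheme}://` : "";
  return `--proxy-server=${prefix}${host}:${portNumber}`;
}

// Usage: puppeteer.launch({ args: [buildProxyServerArg("pr.oxylabs.io", 7777)] })
console.log(buildProxyServerArg("pr.oxylabs.io", 7777)); // --proxy-server=pr.oxylabs.io:7777
```

Validating the port up front gives a clear error message instead of a browser that silently fails to connect.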

Step 3. Authenticate the proxy.

We will look at two different authentication methods.

Using the authenticate() method

Pass your Oxylabs proxy sub-user username and password to page.authenticate() as the username and password values. The code looks like this:

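const puppeteer = require('puppeteer');

(async () => {
  // Route all browser traffic through the Oxylabs entry node
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--proxy-server=pr.oxylabs.io:7777'],
  });

  const page = await browser.newPage();

  // Answer the proxy's authentication challenge with your sub-user credentials
  await page.authenticate({
    username: 'USERNAME',
    password: 'PASSWORD',
  });

  // The page should show the proxy's exit IP rather than your own
  await page.goto('https://ip.oxylabs.io');
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();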
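Hard-coding credentials is fine for a quick test, but you may prefer to read them from environment variables. A minimal sketch, assuming variables named PROXY_USERNAME and PROXY_PASSWORD (hypothetical names; choose your own); readProxyCredentials is also a hypothetical helper.

```javascript
// Reads proxy credentials from environment variables (the variable names are
// assumptions) and returns them in the shape page.authenticate() expects.
function readProxyCredentials(env = process.env) {
  const username = env.PROXY_USERNAME;
  const password = env.PROXY_PASSWORD;
  if (!username || !password) {
    throw new Error("Set PROXY_USERNAME and PROXY_PASSWORD before running the script");
  }
  return { username, password };
}

// Usage inside the async function above:
//   await page.authenticate(readProxyCredentials());
```

Failing early with a clear message is easier to debug than a navigation error caused by empty credentials.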

Using the proxy-chain package

Alternatively, you can use another integration method: proxy-chain. It is an open-source package developed by Apify that can “anonymize” an authenticated proxy.

To understand how the proxy-chain method works, remember that Chrome and Chromium do not support proxy URLs that include usernames and passwords, such as http://USER:PASSWORD@pr.oxylabs.io:7777.  

These browsers only support proxy URLs without usernames and passwords, such as http://pr.oxylabs.io:7777, so you would need to handle authentication separately (for example, with page.authenticate()).
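If your credentials come as a single URL, you can split it with Node's built-in URL class: the credential-free part goes to --proxy-server and the username and password go to page.authenticate(). A minimal sketch; splitProxyUrl is a hypothetical helper name.

```javascript
// Splits a credentialed proxy URL into what Chromium and Puppeteer expect:
// a credential-free server for --proxy-server, and credentials for page.authenticate().
function splitProxyUrl(proxyUrl) {
  const url = new URL(proxyUrl);
  return {
    // Scheme plus host:port, with no username or password
    server: `${url.protocol}//${url.host}`,
    // URL keeps credentials percent-encoded, so decode them
    credentials: {
      username: decodeURIComponent(url.username),
      password: decodeURIComponent(url.password),
    },
  };
}

const { server, credentials } = splitProxyUrl("http://USER:PASSWORD@pr.oxylabs.io:7777");
console.log(server); // http://pr.oxylabs.io:7777
// Then: args: [`--proxy-server=${server}`] and await page.authenticate(credentials)
```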

Proxy-chain, on the other hand, accepts proxy URLs that include a username and password, so you don't need to handle authentication separately.

The most important method of proxy-chain is anonymizeProxy(). It takes a proxy URL that contains the username and password, as follows:

await proxyChain.anonymizeProxy('http://USER:PASSWORD@pr.oxylabs.io:7777');
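If your password contains characters such as @, : or /, insert it into the URL percent-encoded, or the URL may be parsed incorrectly. A minimal sketch using the standard encodeURIComponent(); buildProxyUrl is a hypothetical helper, and this assumes proxy-chain decodes percent-encoded credentials when it parses the URL (check the proxy-chain documentation for your version).

```javascript
// Builds a credentialed proxy URL, percent-encoding the username and password
// so reserved characters such as @, : and / don't break URL parsing.
function buildProxyUrl(username, password, server, scheme = "http") {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `${scheme}://${user}:${pass}@${server}`;
}

// Usage: await proxyChain.anonymizeProxy(buildProxyUrl(username, password, "pr.oxylabs.io:7777"));
console.log(buildProxyUrl("USER", "p@ss:word", "pr.oxylabs.io:7777"));
// http://USER:p%40ss%3Aword@pr.oxylabs.io:7777
```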

The anonymizeProxy() method starts a local proxy server and returns its URL. This local server sits between Chromium and the upstream proxy and handles authentication, so Chromium only ever sees a credential-free address.

At the end of your script, call proxyChain.closeAnonymizedProxy() with the anonymized proxy URL to shut the local proxy server down.
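To make sure the local proxy is closed even when the script throws, you can wrap the work in try/finally. The sketch below is a hypothetical helper, withAnonymizedProxy; it takes the anonymize and close functions as parameters (in a real script, wrappers around proxyChain.anonymizeProxy and proxyChain.closeAnonymizedProxy) so the cleanup logic can be exercised without network access.

```javascript
// Runs `task` with an anonymized proxy URL and always closes the local proxy,
// even if `task` throws. The anonymize/close functions are injected so the
// helper works with proxy-chain in production or with stubs in tests.
async function withAnonymizedProxy(proxyUrl, task, { anonymize, close }) {
  const localProxyUrl = await anonymize(proxyUrl);
  try {
    return await task(localProxyUrl);
  } finally {
    // true = also close connections still open through the local proxy
    await close(localProxyUrl, true);
  }
}

// Usage with proxy-chain and Puppeteer:
// await withAnonymizedProxy(proxy, async (localProxyUrl) => {
//   const browser = await puppeteer.launch({ args: [`--proxy-server=${localProxyUrl}`] });
//   try { /* ... */ } finally { await browser.close(); }
// }, {
//   anonymize: (url) => proxyChain.anonymizeProxy(url),
//   close: (url, closeConnections) => proxyChain.closeAnonymizedProxy(url, closeConnections),
// });
```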

To add proxy-chain into your project, open the terminal and enter the following command:

npm install proxy-chain

In the code, create an anonymized proxy.

const anonymizedProxy = await proxyChain.anonymizeProxy(proxy);

Send this anonymized proxy to Puppeteer as a launch argument in the following format:

await puppeteer.launch({
  args: [`--proxy-server=${anonymizedProxy}`],
});

That’s all you need to do to use a proxy with authentication. 

The following code block shows everything put together.

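const puppeteer = require("puppeteer");
const proxyChain = require("proxy-chain");

const proxyServer = "pr.oxylabs.io:7777";
const username = "proxy-user-name";
const password = "proxy-password";

(async () => {
  // Credentialed upstream proxy URL (Chromium cannot use this form directly)
  const proxy = `http://${username}:${password}@${proxyServer}`;

  // Start a local, credential-free proxy that forwards to the upstream proxy
  const anonymizedProxy = await proxyChain.anonymizeProxy(proxy);

  let browser;
  try {
    browser = await puppeteer.launch({
      args: [`--proxy-server=${anonymizedProxy}`],
    });

    const page = await browser.newPage();

    // Print the response body, which should show the proxy's exit IP
    const response = await page.goto("https://ip.oxylabs.io");
    console.log(await response.text());
  } finally {
    if (browser) await browser.close();
    // Shut down the local proxy and any connections still open through it
    await proxyChain.closeAnonymizedProxy(anonymizedProxy, true);
  }
})();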

Which method is better?

If all you need is authentication, either method works. Both the native page.authenticate() method and the proxy-chain package handle proxies that require authentication. One practical difference is scope: page.authenticate() applies to a single page, so you call it for each new page, while the anonymized proxy from proxy-chain is set once at launch and covers the whole browser.

However, the proxy-chain package offers a few advanced features, such as custom error messages, custom responses, and traffic statistics. You can explore its official documentation to learn more.

Step 4. Use country-specific entries.

If needed, you can also use country-specific entries. For example, if you set the proxy server value to us-pr.oxylabs.io:10001 (host us-pr.oxylabs.io, port 10001), you'll receive a US exit node with a sticky session. Please check out our documentation for a complete list of country-specific entry nodes.
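To keep entry nodes in one place, you can map country codes to host and port pairs. A minimal sketch: the table holds only the US example from this article; any other countries would have to come from the Oxylabs documentation, and entryNodeArg is a hypothetical helper name.

```javascript
// Country-specific entry nodes. Only the US entry comes from this article;
// add other countries from the Oxylabs documentation.
const ENTRY_NODES = {
  us: { host: "us-pr.oxylabs.io", port: 10001 }, // US exit node, sticky session
};

// Default entry node from Step 2 (no country targeting)
const DEFAULT_ENTRY = { host: "pr.oxylabs.io", port: 7777 };

// Returns the --proxy-server flag for a country code, or the default entry.
function entryNodeArg(countryCode) {
  if (!countryCode) {
    return `--proxy-server=${DEFAULT_ENTRY.host}:${DEFAULT_ENTRY.port}`;
  }
  const key = countryCode.toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(ENTRY_NODES, key)) {
    throw new Error(`No entry node configured for "${countryCode}"`);
  }
  const { host, port } = ENTRY_NODES[key];
  return `--proxy-server=${host}:${port}`;
}

// Usage: puppeteer.launch({ args: [entryNodeArg("US")] })
```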

And that's it! You've successfully integrated Oxylabs' Residential Proxies with Puppeteer.

Most common issues

Let's take a quick look at the most common problems you may run into when integrating proxy servers with Puppeteer, along with potential solutions.

Puppeteer returning an error

One of the issues you might encounter is Puppeteer returning an error when you try to connect through a proxy server. There can be several reasons, but the most common one is that the proxy server requires authentication and no valid credentials were provided.

To solve this problem, supply your proxy username and password with page.authenticate() or through proxy-chain, as described in Step 3, and then try connecting again.
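When a proxy is misconfigured, page.goto() usually either rejects with a Chromium network error or returns a response with HTTP status 407 (Proxy Authentication Required). The helper below is a hypothetical sketch that turns those signals into a hint; the codes it checks are common Chromium net error names, but the exact message you see may differ between versions.

```javascript
// Hypothetical diagnostic helper: maps common proxy-related failures from
// page.goto() to a readable hint. Error texts can vary between Chromium
// versions, so treat this mapping as a starting point.
const PROXY_ERROR_HINTS = [
  ["ERR_INVALID_AUTH_CREDENTIALS", "Check the proxy username and password."],
  ["ERR_TUNNEL_CONNECTION_FAILED", "The proxy refused the tunnel; credentials may be missing or wrong."],
  ["ERR_PROXY_CONNECTION_FAILED", "Could not reach the proxy; check the host and port."],
];

function proxyHint({ error, status } = {}) {
  // 407 = Proxy Authentication Required (RFC 9110)
  if (status === 407) {
    return "The proxy requires authentication; supply credentials with page.authenticate() or proxy-chain.";
  }
  const message = error ? String(error.message || error) : "";
  for (const [code, hint] of PROXY_ERROR_HINTS) {
    if (message.includes(code)) return hint;
  }
  return null; // no proxy-specific hint
}

// Usage:
// try {
//   const response = await page.goto(url);
//   const hint = proxyHint({ status: response.status() });
//   if (hint) console.warn(hint);
// } catch (error) {
//   console.warn(proxyHint({ error }) ?? error.message);
// }
```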

The page isn't loading

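Even though Puppeteer might be able to connect to the proxy, it may still have trouble loading the page. The most common reason is a slow or unstable connection, which can make navigation exceed Puppeteer's default 30-second timeout.

The usual fix is to increase the navigation timeout, either for a single call with page.goto(url, { timeout: 60000 }) or for every navigation on a page with page.setDefaultNavigationTimeout(60000). Once the timeout matches your connection speed, the page should have enough time to load.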
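If a slow proxy occasionally exceeds even a longer timeout, retrying the navigation a few times can help. A minimal sketch, assuming you pass in a function that performs the navigation (in practice (url, options) => page.goto(url, options)); gotoWithRetry and its parameters are hypothetical, and the default values are arbitrary examples.

```javascript
// Retries a navigation with a generous timeout and a linearly growing pause
// between attempts. `navigate` is injected, e.g. (url, opts) => page.goto(url, opts).
async function gotoWithRetry(navigate, url, { attempts = 3, timeout = 60000, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await navigate(url, { timeout });
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Wait baseDelayMs, then 2x, 3x, ... before the next attempt
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  throw lastError; // all attempts failed
}

// Usage: const response = await gotoWithRetry((u, o) => page.goto(u, o), "https://ip.oxylabs.io");
```

Retries only help with intermittent slowness; if every attempt fails, the problem is more likely the proxy itself.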

Other issues

If the issue remains unresolved, your proxy may be failing to maintain the connection. In this case, check your proxy server settings or contact your provider for further assistance. We also strongly recommend avoiding free public proxies, as they tend to be slow and unreliable.

Conclusion

Puppeteer, in combination with Oxylabs Residential Proxy servers, might be of great help when it comes to scraping public data from dynamic websites. If you found this article interesting, you should also check out the Puppeteer on AWS Lambda blog post. 

In case you have any queries about integrating Oxylabs proxies with Puppeteer, please get in touch with us at any time.

Please be aware that this is a third-party tool not owned or controlled by Oxylabs. Each third-party provider is responsible for its own software and services. Consequently, Oxylabs will have no liability or responsibility to you regarding those services. Please carefully review the third party's policies and practices and/or conduct due diligence before accessing or using third-party services.

Frequently asked questions

What is Puppeteer?

Puppeteer is a Node.js library that provides a high-level API for controlling headless Chrome or Chromium browsers through the DevTools Protocol. It is also known as a Headless Chrome Node.js API and is commonly used to automate Chrome for website testing and for collecting public data.