As web scraping and automation tools have matured in recent years, handling dynamic websites has become much easier. Headless browsers, which run without a graphical user interface, offer an efficient way to collect public data because you can control them programmatically. Combined with proxy servers, they become even more useful.
In this guide, we'll go through the Puppeteer integration process with Oxylabs Residential Proxies and provide an example.
Before getting started with Puppeteer proxy server integration, you'll need to install some basic tools: Node.js and a code editor of your choice. After that, create a Node.js project and install the required packages. You may find a detailed guide on installing and running Puppeteer in our blog post and the official Puppeteer page.
Once everything is set up, we can move on to the next part – Oxylabs' Residential Proxies integration with Puppeteer.
In Puppeteer, pass the proxy address (host:port) to the --proxy-server launch argument. The value for Residential Proxies is as follows:
pr.oxylabs.io:7777
We will look at two different authentication methods.
The first method is page.authenticate(): pass your Oxylabs proxy sub-user's username and password to it. A code example looks like this:
const puppeteer = require('puppeteer');

(async () => {
  // Route all browser traffic through the Residential Proxies entry point.
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--proxy-server=pr.oxylabs.io:7777']
  });
  const page = await browser.newPage();
  // Supply the proxy sub-user's credentials before navigating.
  await page.authenticate({
    username: 'USERNAME',
    password: 'PASSWORD'
  });
  await page.goto('https://ip.oxylabs.io');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
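Hardcoding credentials in a script is easy to get wrong, so here is a minimal sketch of reading them from environment variables instead. The variable names OXYLABS_USERNAME and OXYLABS_PASSWORD and both helper functions are assumptions made for this example, not part of Puppeteer or of Oxylabs tooling; only page.authenticate() is the real Puppeteer call.

```javascript
// Reads proxy credentials from environment variables instead of hardcoding
// them. The variable names are assumptions chosen for this example.
function readProxyCredentials(env = process.env) {
  const username = env.OXYLABS_USERNAME;
  const password = env.OXYLABS_PASSWORD;
  if (!username || !password) {
    throw new Error('Set OXYLABS_USERNAME and OXYLABS_PASSWORD first');
  }
  return { username, password };
}

// Hypothetical helper: authenticates a Puppeteer page with those credentials.
// `page` is the object returned by browser.newPage().
async function authenticatePage(page, env = process.env) {
  await page.authenticate(readProxyCredentials(env));
}
```

In the script above, the page.authenticate({...}) call could then be replaced with await authenticatePage(page).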
Alternatively, you can use another integration method: proxy-chain. It is an open-source package developed by Apify that can "anonymize" an authenticated proxy.
To understand how the proxy-chain method works, remember that Chrome and Chromium do not support proxy URLs that include usernames and passwords, such as http://USER:PASSWORD@pr.oxylabs.io:7777.
These browsers only support proxy URLs without usernames and passwords, such as http://pr.oxylabs.io:7777. You would need to handle the authentication separately.
Proxy-chain, on the other hand, works well with proxy URLs that include username and password, eliminating the need for handling the authentication separately.
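To make "handling the authentication separately" concrete, the sketch below splits a proxy URL that contains credentials into the two pieces Chromium needs: a credential-free server address for --proxy-server and a username/password pair for page.authenticate(). The function name splitProxyUrl is hypothetical; it relies only on the standard URL class built into Node.js.

```javascript
// Hypothetical helper: separates a credentialed proxy URL into the part
// Chromium accepts and the credentials that must be supplied separately.
function splitProxyUrl(proxyUrl) {
  const url = new URL(proxyUrl);
  return {
    // Scheme, host and port only, with no username or password.
    server: `${url.protocol}//${url.host}`,
    // URL stores credentials percent-encoded, so decode them.
    credentials: {
      username: decodeURIComponent(url.username),
      password: decodeURIComponent(url.password),
    },
  };
}

const parts = splitProxyUrl('http://USER:PASSWORD@pr.oxylabs.io:7777');
console.log(parts.server); // http://pr.oxylabs.io:7777
console.log(parts.credentials.username); // USER
```

The server value would go into the --proxy-server launch argument and the credentials object into page.authenticate().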
The most important method of proxy-chain is anonymizeProxy(). It takes a proxy URL that contains the username and password, as follows:
await proxyChain.anonymizeProxy('http://USER:PASSWORD@pr.oxylabs.io:7777');
The anonymizeProxy() method starts a local proxy server and returns its URL. This local proxy server sits between Chromium and the upstream proxy server, handling authentication transparently.
At the end of your script, call closeAnonymizedProxy() with the URL returned by anonymizeProxy() to shut down the local proxy server.
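If the script throws before it reaches the end, the local proxy server would keep running. One way to guard against that, shown here as a sketch rather than a pattern prescribed by proxy-chain, is to wrap the work in try/finally. The wrapper name withAnonymizedProxy and its parameters are hypothetical; anonymizeProxy() and closeAnonymizedProxy() are the proxy-chain functions used in this guide, and the module is passed in as an argument so the wrapper itself has no dependencies.

```javascript
// Hypothetical wrapper: starts the local proxy, runs `task` with its URL,
// and always shuts the local proxy down afterwards, even if `task` throws.
// `proxyChain` is the module returned by require('proxy-chain').
async function withAnonymizedProxy(proxyChain, upstreamUrl, task) {
  const localProxyUrl = await proxyChain.anonymizeProxy(upstreamUrl);
  try {
    return await task(localProxyUrl);
  } finally {
    // `true` also closes connections still open through the local proxy.
    await proxyChain.closeAnonymizedProxy(localProxyUrl, true);
  }
}
```

The task callback would typically launch Puppeteer with --proxy-server set to the local proxy URL it receives, do its scraping, and close the browser before returning.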
To add proxy-chain into your project, open the terminal and enter the following command:
npm install proxy-chain
In the code, create an anonymized proxy:
const anonymizedProxy = await proxyChain.anonymizeProxy(proxy);
Send this anonymized proxy to Puppeteer as a launch argument in the following format:
await puppeteer.launch({
args: [`--proxy-server=${anonymizedProxy}`],
});
That’s all you need to do to use a proxy with authentication.
The following code block shows everything put together:
const puppeteer = require("puppeteer");
const proxyChain = require("proxy-chain");

const proxyServer = "pr.oxylabs.io:7777";
const username = "proxy-user-name";
const password = "proxy-password";

(async () => {
  // Proxy URL with credentials, which Chromium cannot use directly.
  const proxy = `http://${username}:${password}@${proxyServer}`;
  // Start a local proxy that forwards to the authenticated proxy.
  const anonymizedProxy = await proxyChain.anonymizeProxy(proxy);
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${anonymizedProxy}`],
  });
  const page = await browser.newPage();
  const response = await page.goto("https://ip.oxylabs.io");
  console.log(await response.text());
  await browser.close();
  // Shut down the local proxy and force-close its open connections.
  await proxyChain.closeAnonymizedProxy(anonymizedProxy, true);
})();
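The script above interpolates the username and password straight into the proxy URL. If a password contains characters such as @ or :, the resulting URL becomes ambiguous. A possible precaution, sketched below, is to percent-encode both values first. The helper name buildProxyUrl is hypothetical, and the approach assumes that whatever parses the URL decodes the credentials again, which is standard URL behavior but worth verifying against the proxy-chain version you use.

```javascript
// Hypothetical helper: builds a proxy URL with percent-encoded credentials
// so that characters like "@" or ":" cannot break the URL structure.
function buildProxyUrl(username, password, server) {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `http://${user}:${pass}@${server}`;
}

console.log(buildProxyUrl('proxy-user-name', 'p@ss:word', 'pr.oxylabs.io:7777'));
// http://proxy-user-name:p%40ss%3Aword@pr.oxylabs.io:7777
```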
If all you need is authentication, it does not matter which method you use. Both the native page.authenticate() method and the proxy-chain package let you work with proxies that require authentication.
However, the proxy-chain package offers a few advanced features, such as custom error messages, custom responses, measuring traffic statistics, etc. You can explore the official page to learn more.
If needed, you can also use country-specific entries. For example, if you use us-pr.oxylabs.io as the host and 10000 as the port (that is, --proxy-server=us-pr.oxylabs.io:10000), you'll receive a US exit node. Please check our documentation for a complete list of country-specific entry nodes or if you need a sticky session.
And that's it! You've successfully integrated Oxylabs' Residential Proxies with Puppeteer.
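As an illustration, the entry point can be kept in a small lookup so that switching between the default and a country-specific node is a one-word change. The sketch below contains only the two addresses mentioned in this guide; the names ENTRY_NODES and proxyServerArg are hypothetical, and any further entries would need to come from the documentation.

```javascript
// Entry nodes mentioned in this guide. Other country-specific entries are
// deliberately left out and should be taken from the documentation.
const ENTRY_NODES = new Map([
  ['default', 'pr.oxylabs.io:7777'],
  ['us', 'us-pr.oxylabs.io:10000'],
]);

// Hypothetical helper: returns the Chromium launch argument for an entry.
function proxyServerArg(entry = 'default') {
  const node = ENTRY_NODES.get(entry);
  if (!node) {
    throw new Error(`Unknown entry node: ${entry}`);
  }
  return `--proxy-server=${node}`;
}

console.log(proxyServerArg('us')); // --proxy-server=us-pr.oxylabs.io:10000
```

The returned string would be placed in the args array passed to puppeteer.launch().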
Let's take a quick look at the most common problems while integrating proxy servers with Puppeteer and potential solutions.
One issue you might encounter is Puppeteer returning an error when it tries to connect to a proxy server. This can happen for several reasons, the most common being that the proxy server requires authentication.
To solve this problem, supply the username and password for your proxy server, either through page.authenticate() or through the proxy URL passed to proxy-chain, as described earlier in this guide. After that, try the connection again.
Even when Puppeteer connects to the proxy, it may still have trouble loading the page. The most common reason is a slow or unstable internet connection.
The usual fix is to increase the navigation timeout, for example with the timeout option of page.goto() or with page.setDefaultNavigationTimeout(). With a longer timeout, slow page loads through the proxy are less likely to fail.
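A minimal sketch of raising the timeout is shown below. It uses the timeout and waitUntil options of Puppeteer's page.goto(); the helper name gotoWithRetry, the 60-second default, and the retry loop are assumptions added for illustration and go slightly beyond the advice above.

```javascript
// Hypothetical helper: navigates with a longer timeout than Puppeteer's
// default and retries a few times before giving up.
// `page` is the object returned by browser.newPage().
async function gotoWithRetry(page, url, { timeoutMs = 60000, attempts = 3 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      // Resolve as soon as the DOM is parsed instead of waiting for
      // every resource, which helps on slow proxy connections.
      return await page.goto(url, { timeout: timeoutMs, waitUntil: 'domcontentloaded' });
    } catch (error) {
      lastError = error; // Remember the failure and try again.
    }
  }
  throw lastError;
}
```

A call such as await gotoWithRetry(page, 'https://ip.oxylabs.io', { timeoutMs: 90000 }) would replace a plain page.goto() call.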
If the issue remains unresolved, it means that, for some reason, your proxy fails to maintain the connection. In this case, you should check your proxy server or contact your provider for further assistance. Also, avoiding free public proxies is highly recommended as they tend to demonstrate low-quality performance.
Puppeteer, in combination with Oxylabs' Residential Proxy servers, could be of great help when it comes to scraping public data from dynamic websites. If you found this guide relevant, you might also want to check the Puppeteer on AWS Lambda blog post.
In case you have any queries about integrating Oxylabs proxies with Puppeteer, please get in touch with us at any time.
Please be aware that this is a third-party tool not owned or controlled by Oxylabs. Each third-party provider is responsible for its own software and services. Consequently, Oxylabs will have no liability or responsibility to you regarding those services. Please carefully review the third party's policies and practices and/or conduct due diligence before accessing or using third-party services.
What is Puppeteer?
Puppeteer is an easy-to-use and powerful library for Node.js, mostly used for automating tests and various tasks utilizing the Chromium browser engine. It runs headless by default but can be configured to run full (non-headless) Chrome or Chromium. Puppeteer can do anything a standard browser can do, so it's hugely beneficial for building web scrapers as well. For example, it can help reach the web page's HTML and imitate standard user behavior, such as scrolling through the page.