How to integrate Oxylabs' proxies with Puppeteer?
1. Install the required tools
Before getting started with Puppeteer proxy server integration, you'll need to install some basic tools: Node.js and a code editor of your choice. After that, create a Node.js project and install the required packages. You can find a detailed guide on installing and running Puppeteer in our blog post and the official Puppeteer page.
Once everything is set up, we can move on to the next part – Oxylabs proxies integration with Puppeteer.
2. Enter the proxy address
In Puppeteer, pass the proxy address to the browser in host:port format through the --proxy-server launch argument. The values for Oxylabs proxies are as follows:
Residential and Mobile Proxies: pr.oxylabs.io:7777
Enterprise Dedicated Datacenter Proxies: 1.2.3.4:60000 (a specific IP address)
Self-Service Dedicated Datacenter Proxies: ddc.oxylabs.io:8001
Datacenter Proxies: dc.oxylabs.io:8001
ISP Proxies: isp.oxylabs.io:8001
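The endpoint list above can be kept in one place in code. The following is a minimal sketch, not an official Oxylabs helper: the names PROXY_ENDPOINTS and proxyServerArg and the map keys are hypothetical, and the Enterprise Dedicated Datacenter entry is left out because it uses an IP address specific to your account.

```javascript
// Entry points copied from the list above. The keys are illustrative
// names chosen for this sketch.
const PROXY_ENDPOINTS = new Map([
  ['residential', 'pr.oxylabs.io:7777'],
  ['mobile', 'pr.oxylabs.io:7777'],
  ['selfServiceDedicatedDatacenter', 'ddc.oxylabs.io:8001'],
  ['datacenter', 'dc.oxylabs.io:8001'],
  ['isp', 'isp.oxylabs.io:8001'],
]);

// Builds the Chromium launch argument for a given proxy type.
function proxyServerArg(type) {
  const endpoint = PROXY_ENDPOINTS.get(type);
  if (!endpoint) {
    throw new Error(`Unknown proxy type: ${type}`);
  }
  return `--proxy-server=${endpoint}`;
}

console.log(proxyServerArg('isp')); // --proxy-server=isp.oxylabs.io:8001
```

The returned string can go straight into the args array of puppeteer.launch().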
3. Authenticate the proxy
There are two different authentication methods.
Using the authenticate() method
Pass your Oxylabs proxy user's username and password to page.authenticate(). Example code looks like this:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    args: [`--proxy-server=pr.oxylabs.io:7777`]
  });
  const page = await browser.newPage();
  await page.authenticate({
    username: 'USERNAME',
    password: 'PASSWORD'
  });
  await page.goto('https://ip.oxylabs.io');
  await page.screenshot({ path: 'example.png' });
  await browser.close();
})();
Using the proxy-chain package
Alternatively, you can use another integration method: the proxy-chain package. It is an open-source package developed by Apify that can “anonymize” an authenticated proxy.
To understand how the proxy-chain method works, remember that Chrome and Chromium do not support proxy URLs that include usernames and passwords, such as http://USER:PASSWORD@pr.oxylabs.io:7777.
These browsers only support proxy URLs without usernames and passwords, such as http://pr.oxylabs.io:7777. You would need to handle the authentication separately.
Proxy-chain, on the other hand, works well with proxy URLs that include username and password, eliminating the need for handling the authentication separately.
The most crucial method of proxy-chain is anonymizeProxy(). This method uses the proxy URL that contains the username and password as follows:
await proxyChain.anonymizeProxy('http://USER:PASSWORD@pr.oxylabs.io:7777');
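If a username or password contains characters such as @ or :, a proxy URL built by plain string concatenation can become ambiguous. One possible safeguard is to percent-encode the credentials first, as in the sketch below. buildProxyUrl is a hypothetical helper name, and the sketch assumes that whatever consumes the URL percent-decodes the credentials again, as standard URL parsers expect.

```javascript
// Hypothetical helper: builds an authenticated proxy URL with
// percent-encoded credentials so reserved characters stay unambiguous.
function buildProxyUrl({ username, password, host, port }) {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `http://${user}:${pass}@${host}:${port}`;
}

const proxyUrl = buildProxyUrl({
  username: 'USER',
  password: 'p@ss:word', // placeholder containing reserved characters
  host: 'pr.oxylabs.io',
  port: 7777,
});

// The built-in URL class parses the result back into its parts.
const parsed = new URL(proxyUrl);
console.log(parsed.host);                         // pr.oxylabs.io:7777
console.log(decodeURIComponent(parsed.password)); // p@ss:word
```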
The anonymizeProxy() method starts a local proxy server and returns its URL. This local server sits between Chromium and the upstream proxy server and handles authentication transparently.
At the end of your script, call the closeAnonymizedProxy() method with that URL to shut down the local proxy server.
To add proxy-chain to your project, open the terminal and enter the following command:
npm install proxy-chain
In the code, create an anonymized proxy:
(async () => {
  const proxy = `http://${username}:${password}@${proxyServer}`;
  const anonymizedProxy = await proxyChain.anonymizeProxy(proxy);
Pass this anonymized proxy URL to Puppeteer as a launch argument in the following format:
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${anonymizedProxy}`],
  });
That’s all you need to do to use a proxy with authentication. The following code block shows everything put together:
const puppeteer = require('puppeteer');
const proxyChain = require('proxy-chain');
const proxyServer = 'pr.oxylabs.io:7777';
const username = 'proxy-username';
const password = 'proxy-password';
(async () => {
  const proxy = `http://${username}:${password}@${proxyServer}`;
  const anonymizedProxy = await proxyChain.anonymizeProxy(proxy);
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${anonymizedProxy}`],
  });
  const page = await browser.newPage();
  await page.goto('https://ip.oxylabs.io');
  await page.screenshot({ path: 'example2.png' });
  await browser.close();
  await proxyChain.closeAnonymizedProxy(anonymizedProxy, true);
})();
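In the script above, the local proxy server is closed only if every earlier step succeeds. One possible way to guarantee cleanup is a try/finally wrapper, sketched below. withAnonymizedProxy is a hypothetical helper and not part of proxy-chain; it takes the proxy-chain module as its first argument and uses only anonymizeProxy() and closeAnonymizedProxy() from it.

```javascript
// Hypothetical helper (not part of proxy-chain): starts the local
// anonymizing proxy, runs `task` with its URL, and always closes it.
// `proxyChain` is expected to be the result of require('proxy-chain').
async function withAnonymizedProxy(proxyChain, upstreamProxyUrl, task) {
  const localProxyUrl = await proxyChain.anonymizeProxy(upstreamProxyUrl);
  try {
    return await task(localProxyUrl);
  } finally {
    // `true` also closes connections that are still open.
    await proxyChain.closeAnonymizedProxy(localProxyUrl, true);
  }
}

// Usage with the real modules (commented out so the sketch itself has
// no dependencies):
//
// const puppeteer = require('puppeteer');
// const proxyChain = require('proxy-chain');
// withAnonymizedProxy(
//   proxyChain,
//   'http://USER:PASSWORD@pr.oxylabs.io:7777',
//   async (localProxyUrl) => {
//     const browser = await puppeteer.launch({
//       args: [`--proxy-server=${localProxyUrl}`],
//     });
//     try {
//       const page = await browser.newPage();
//       await page.goto('https://ip.oxylabs.io');
//     } finally {
//       await browser.close();
//     }
//   }
// );
```

With this structure, an error thrown during navigation still reaches the caller, but the local proxy server is shut down first.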
Which method is better?
If all you need is authentication, either method works. Both the native page.authenticate() method and the proxy-chain package let you work with proxies that require authentication.
However, the proxy-chain package offers a few advanced features, such as custom error messages, custom responses, measuring traffic statistics, etc. You can explore the official page to learn more.
4. Use country-specific entries
If needed, you can also use country-specific entries. For example, if you put us-pr.oxylabs.io under Host and 10000 under Port, you'll receive a residential US exit node.
For other Oxylabs proxy customization options, such as additional country-specific entry points, refer to the Oxylabs documentation for the proxy type you use.
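One way to keep country-specific entries manageable is a small lookup table. The sketch below contains only the US residential entry mentioned above. COUNTRY_ENTRIES and countryProxyArg are hypothetical names, and entries for other countries would need to be taken from the Oxylabs documentation rather than guessed.

```javascript
// Only the US entry from this guide is listed; add further entries from
// the Oxylabs documentation as needed.
const COUNTRY_ENTRIES = new Map([
  ['US', { host: 'us-pr.oxylabs.io', port: 10000 }],
]);

// Builds the Chromium launch argument for a country-specific entry point.
function countryProxyArg(countryCode) {
  const entry = COUNTRY_ENTRIES.get(countryCode.toUpperCase());
  if (!entry) {
    throw new Error(`No entry point configured for country: ${countryCode}`);
  }
  return `--proxy-server=${entry.host}:${entry.port}`;
}

console.log(countryProxyArg('us')); // --proxy-server=us-pr.oxylabs.io:10000
```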
And that's it! You've successfully integrated Oxylabs proxies with Puppeteer.
Setting up a Puppeteer rotating proxy
Oxylabs proxies come with automatic proxy rotation, and Enterprise Dedicated Datacenter Proxies can utilize the Proxy Rotator feature, eliminating the need to rotate proxies manually. If you want to implement proxy rotation with Puppeteer yourself, you can do so in several ways.
Rotating proxies randomly
One of the most popular methods for rotating proxies is to pick a random proxy from a list. It’s simple yet effective, so let’s see how to set up a Puppeteer rotating proxy using the authenticate() method:
const puppeteer = require('puppeteer');
const proxies = [
  'PROXY_ADDRESS_1:PORT',
  'PROXY_ADDRESS_2:PORT',
  'PROXY_ADDRESS_3:PORT'
];
(async () => {
  const proxyRandom = proxies[Math.floor(Math.random() * proxies.length)];
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxyRandom}`]
  });
  const page = await browser.newPage();
  await page.authenticate({
    username: 'USERNAME',
    password: 'PASSWORD'
  });
  await page.goto('https://ip.oxylabs.io');
  const pageText = await page.evaluate(() => document.body.innerText);
  console.log(pageText);
  await browser.close();
})();
First, create a proxies array that contains a list of different proxy IP addresses with their respective ports. Then, use Math.floor(), Math.random(), and proxies.length to generate a random index within the array's bounds and assign the corresponding element to proxyRandom. Once that’s done, pass proxyRandom to the args field like so: `--proxy-server=${proxyRandom}`. Now, each time you run the script, it connects through a randomly chosen proxy from your list and prints the IP address reported by ip.oxylabs.io.
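The random selection step can also be isolated into a small function, which makes it easier to test. This is only a sketch: pickRandomProxy is a hypothetical name, and the optional random parameter exists purely so that a fixed value can be supplied in tests.

```javascript
// Picks one proxy at random. `random` defaults to Math.random and can be
// replaced with a deterministic function in tests.
function pickRandomProxy(proxies, random = Math.random) {
  if (!Array.isArray(proxies) || proxies.length === 0) {
    throw new Error('The proxy list must be a non-empty array');
  }
  const index = Math.floor(random() * proxies.length);
  return proxies[index];
}

const proxies = [
  'PROXY_ADDRESS_1:PORT',
  'PROXY_ADDRESS_2:PORT',
  'PROXY_ADDRESS_3:PORT',
];

// Random choice, different from run to run:
console.log(pickRandomProxy(proxies));

// With a fixed "random" value the choice is deterministic:
console.log(pickRandomProxy(proxies, () => 0.5)); // PROXY_ADDRESS_2:PORT
```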
The same approach also works with the proxy-chain package:
const puppeteer = require('puppeteer');
const proxyChain = require('proxy-chain');
const proxies = [
  'PROXY_ADDRESS_1:PORT',
  'PROXY_ADDRESS_2:PORT',
  'PROXY_ADDRESS_3:PORT'
];
const username = 'USERNAME';
const password = 'PASSWORD';
(async () => {
  const randomProxy = proxies[Math.floor(Math.random() * proxies.length)];
  const proxy = `http://${username}:${password}@${randomProxy}`;
  const anonymizedProxy = await proxyChain.anonymizeProxy(proxy);
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${anonymizedProxy}`],
  });
  const page = await browser.newPage();
  await page.goto('https://ip.oxylabs.io');
  const pageText = await page.evaluate(() => document.body.innerText);
  console.log(pageText);
  await browser.close();
  await proxyChain.closeAnonymizedProxy(anonymizedProxy, true);
})();
Rotating proxies sequentially
Although random proxy rotation is effective for most scenarios, there are situations where sequentially rotating proxies in their listed order is preferable. For example, the code could pick the first proxy address and run a request, then use the second proxy address for the next request.
Consider the following example, which uses the authenticate() method and loops through an array of proxies:
const puppeteer = require('puppeteer');
const proxies = [
  'PROXY_ADDRESS_1:PORT',
  'PROXY_ADDRESS_2:PORT',
  'PROXY_ADDRESS_3:PORT',
];
async function launchBrowser(proxy) {
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxy}`]
  });
  const page = await browser.newPage();
  await page.authenticate({
    username: 'USERNAME',
    password: 'PASSWORD'
  });
  await page.goto('https://ip.oxylabs.io');
  const pageText = await page.evaluate(() => document.body.innerText);
  console.log(pageText);
  await browser.close();
}
(async () => {
  for (const proxy of proxies) {
    await launchBrowser(proxy);
  }
})();
Here, the asynchronous launchBrowser function defines the scraping logic but isn’t run immediately. Then, the immediately invoked function expression (IIFE) iterates over each proxy address in the proxies array and awaits launchBrowser for it. As a result, the code runs three separate requests, using the three proxy addresses one after another. The same logic applies if you want to use the proxy-chain module.
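The loop above uses each proxy exactly once. For a longer job with more requests than proxies, one option is a round-robin rotator that wraps back to the first address after the last one. The sketch below shows one way to do it; createProxyRotator is a hypothetical helper name.

```javascript
// Returns a function that yields the proxies in listed order and starts
// over from the first one after reaching the end of the list.
function createProxyRotator(proxies) {
  if (!Array.isArray(proxies) || proxies.length === 0) {
    throw new Error('The proxy list must be a non-empty array');
  }
  let index = 0;
  return function nextProxy() {
    const proxy = proxies[index];
    index = (index + 1) % proxies.length;
    return proxy;
  };
}

const nextProxy = createProxyRotator([
  'PROXY_ADDRESS_1:PORT',
  'PROXY_ADDRESS_2:PORT',
  'PROXY_ADDRESS_3:PORT',
]);

// Four calls against three proxies: the fourth wraps around.
for (let i = 0; i < 4; i++) {
  console.log(nextProxy());
}
// PROXY_ADDRESS_1:PORT
// PROXY_ADDRESS_2:PORT
// PROXY_ADDRESS_3:PORT
// PROXY_ADDRESS_1:PORT
```

Each value returned by nextProxy() can be passed to launchBrowser() from the example above.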
Most common issues
Let's take a quick look at the most common problems you may run into when integrating proxy servers with Puppeteer, along with potential solutions.
Puppeteer is returning an error
One issue you might encounter is Puppeteer returning an error when you try to connect through a proxy server. There can be various reasons for this, the main one being that your proxy server requires authentication with proxy credentials.
To solve this problem, provide the username and password for your proxy server, either through page.authenticate() or through the proxy-chain package, as described earlier in this guide. After that, try using the proxy again.
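A missing or empty credential is easier to diagnose if the script checks for it before launching the browser. The following sketch reads the credentials from environment variables and fails early with a clear message. The variable names PROXY_USERNAME and PROXY_PASSWORD and the function readProxyCredentials are assumptions made for this example.

```javascript
// Hypothetical helper: reads proxy credentials from environment variables
// and throws a descriptive error if either one is missing or empty.
function readProxyCredentials(env = process.env) {
  const username = env.PROXY_USERNAME;
  const password = env.PROXY_PASSWORD;
  if (!username || !password) {
    throw new Error(
      'Set PROXY_USERNAME and PROXY_PASSWORD before running the script'
    );
  }
  return { username, password };
}

// Example with an explicit object instead of the real environment:
const credentials = readProxyCredentials({
  PROXY_USERNAME: 'USERNAME',
  PROXY_PASSWORD: 'PASSWORD',
});
console.log(credentials.username); // USERNAME
```

The returned object has the { username, password } shape that page.authenticate() accepts, so it can be passed to that method directly.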
The page isn't loading
Even though Puppeteer might be able to connect to the proxy, it may still have trouble loading the page. The most common reason for this is a poor internet connection.
The most common fix is to increase the navigation timeout, for example through the timeout option of page.goto(). With a longer timeout, the page has more time to load through the proxy.
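As a rough sketch of this fix, the helper below retries a navigation with a longer timeout on each attempt. It relies on the timeout option of Puppeteer's page.goto(), which is given in milliseconds and defaults to 30000. The names timeoutForAttempt and gotoWithRetries, and the doubling schedule, are choices made for this example.

```javascript
// Timeout for a given attempt (1-based): base, 2x base, 4x base, ...
function timeoutForAttempt(attempt, baseTimeoutMs = 30000) {
  return baseTimeoutMs * 2 ** (attempt - 1);
}

// `page` is a Puppeteer Page. Retries page.goto() up to `maxAttempts`
// times after any navigation error and rethrows the last error if every
// attempt fails.
async function gotoWithRetries(page, url, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await page.goto(url, { timeout: timeoutForAttempt(attempt) });
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

console.log(timeoutForAttempt(1)); // 30000
console.log(timeoutForAttempt(3)); // 120000
```

In a script, await gotoWithRetries(page, 'https://ip.oxylabs.io') would take the place of the plain page.goto() call.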
Other issues
If the issue remains unresolved, it means that, for some reason, your proxy fails to maintain the connection. In this case, you should check your proxy server or contact your provider for further assistance. Also, avoiding free public proxies is highly recommended as they tend to demonstrate low-quality performance.
Conclusion
Puppeteer, in combination with Oxylabs proxy servers, can be of great help when it comes to scraping public data from dynamic websites. If you found this guide relevant, you might also want to check out our guide on Puppeteer on AWS Lambda.
In case you have any queries about integrating Oxylabs proxies with Puppeteer, please get in touch with us at any time.
Please be aware that this is a third-party tool not owned or controlled by Oxylabs. Each third-party provider is responsible for its own software and services. Consequently, Oxylabs will have no liability or responsibility to you regarding those services. Please carefully review the third party's policies and practices and/or conduct due diligence before accessing or using third-party services.