This guide goes through the Selenium proxy server integration process with Oxylabs Residential, Mobile, and Datacenter Proxies using Python and Java for a smooth web scraping process.
Selenium is a tool that automates web browser interactions for website testing and more. It’s useful when you need to interact with a browser to perform tasks such as clicking buttons or scrolling. Although Selenium is primarily used for website testing, it can also be used for web scraping, as it helps locate the required public data on a website.
The following explains how to set up Oxylabs proxies with Selenium in Python. Note that the required version of Python is Python 3.5 (or newer).
Using the default Selenium module for implementing proxies that require authentication makes the whole process complicated. To make it less complex, install Selenium Wire to extend Selenium’s Python bindings. You can do it using the pip command:
pip install selenium-wire
Another recommended package for this integration is Selenium webdriver-manager. It simplifies the management of binary drivers for different browsers. In this case, there’s no need to manually download a new version of a web driver after each update.
You can install the Selenium webdriver-manager using the pip command as well:
pip install webdriver-manager
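Before moving on, it can help to confirm that both packages are visible to the Python interpreter you plan to use. The following is a minimal sketch using only the standard library; the function name `missing_packages` is illustrative and not part of either package.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(import_names: Iterable[str]) -> List[str]:
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # selenium-wire is imported as seleniumwire, webdriver-manager as webdriver_manager.
    missing = missing_packages(["seleniumwire", "webdriver_manager"])
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If a name is reported as missing, rerun the corresponding pip command in the same environment.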
Once everything is set up, you can move on to the next part: proxy authentication. For proxies to work, you need to specify your account credentials and an endpoint.
Here are the endpoints for Oxylabs proxies (host:port):
Residential and Mobile Proxies: pr.oxylabs.io:7777
Enterprise Dedicated Datacenter Proxies: 1.2.3.4:60000 (a specific IP address)
Self-Service Dedicated Datacenter Proxies: ddc.oxylabs.io:8001
Datacenter Proxies: dc.oxylabs.io:8001
ISP Proxies: isp.oxylabs.io:8001
USERNAME = "your_username"
PASSWORD = "your_password"
ENDPOINT = "pr.oxylabs.io:7777"
Replace your_username and your_password with the username and password of your Oxylabs proxy user.
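To switch between proxy types without editing the endpoint by hand, the endpoints listed above can be kept in a small lookup table. The following is a minimal sketch; the names `ENDPOINTS` and `wire_options_for` are illustrative and not part of Selenium Wire or any Oxylabs library. The Enterprise Dedicated Datacenter entry is left out because it is a specific IP address rather than a shared hostname.

```python
# Endpoints copied from the list above; the dictionary keys are illustrative names.
ENDPOINTS = {
    "residential": "pr.oxylabs.io:7777",
    "mobile": "pr.oxylabs.io:7777",
    "self_service_dedicated_datacenter": "ddc.oxylabs.io:8001",
    "datacenter": "dc.oxylabs.io:8001",
    "isp": "isp.oxylabs.io:8001",
}


def wire_options_for(proxy_type: str, user: str, password: str) -> dict:
    """Build a Selenium Wire options dictionary for the given proxy type."""
    try:
        endpoint = ENDPOINTS[proxy_type]
    except KeyError:
        raise ValueError(f"Unknown proxy type: {proxy_type!r}") from None
    return {
        "proxy": {
            "http": f"http://{user}:{password}@{endpoint}",
            "https": f"https://{user}:{password}@{endpoint}",
        }
    }
```

The returned dictionary can be passed as `seleniumwire_options` when creating the driver.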
You can also use country-specific entries. For example, setting ENDPOINT to us-pr.oxylabs.io:10000 gives you a residential US exit node.
For Oxylabs proxy customization options, such as country-specific entry points, refer to the documentation for Datacenter per traffic, Datacenter per IP, Enterprise Dedicated Datacenter, Self-Service Dedicated Datacenter, Mobile, ISP, and Residential Proxies.
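A country can also be selected through the username rather than the hostname, which is the approach the Java section of this guide takes. The sketch below assumes the `customer-USERNAME-cc-COUNTRY` username format; treat that format as an assumption and verify it against the Oxylabs documentation for your proxy product. The function name `country_username` is hypothetical.

```python
import re
from typing import Optional


def country_username(username: str, country_code: Optional[str] = None) -> str:
    """Return a proxy username, optionally pinned to a country.

    Assumes the 'customer-<username>-cc-<COUNTRY>' format; verify it against
    the Oxylabs documentation before relying on it.
    """
    if username.startswith("customer-"):
        raise ValueError("Pass the username without the 'customer-' prefix")
    full = f"customer-{username}"
    if country_code is None:
        return full
    code = country_code.strip().upper()
    # ISO 3166-1 alpha-2 codes are exactly two ASCII letters, e.g. DE or FR.
    if not re.fullmatch(r"[A-Z]{2}", code):
        raise ValueError(f"Not a two-letter country code: {country_code!r}")
    return f"{full}-cc-{code}"
```

The function only checks the shape of the code, not whether the country exists or is supported.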
To check if the proxy is working, visit ip.oxylabs.io. If everything is working correctly, it will return the IP address of the proxy that you’re using.
try:
    driver.get("https://ip.oxylabs.io/")
    return f'\nYour IP is: {re.search(r"[0-9].{2,}", driver.page_source).group()}'
finally:
    driver.quit()
import re

from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
# A package to have a chromedriver always up-to-date.
from webdriver_manager.chrome import ChromeDriverManager

USERNAME = "your_username"
PASSWORD = "your_password"
ENDPOINT = "pr.oxylabs.io:7777"


def chrome_proxy(user: str, password: str, endpoint: str) -> dict:
    # Selenium Wire reads the upstream proxy settings from the "proxy" key.
    wire_options = {
        "proxy": {
            "http": f"http://{user}:{password}@{endpoint}",
            "https": f"https://{user}:{password}@{endpoint}",
        }
    }
    return wire_options


def get_ip_via_chrome():
    manage_driver = Service(executable_path=ChromeDriverManager().install())
    options = webdriver.ChromeOptions()
    # Run Chrome without a visible window.
    options.add_argument("--headless")
    proxies = chrome_proxy(USERNAME, PASSWORD, ENDPOINT)
    driver = webdriver.Chrome(
        service=manage_driver, options=options, seleniumwire_options=proxies
    )
    try:
        driver.get("https://ip.oxylabs.io/")
        return f'\nYour IP is: {re.search(r"[0-9].{2,}", driver.page_source).group()}'
    finally:
        # Always close the browser, even if the request fails.
        driver.quit()


if __name__ == "__main__":
    print(get_ip_via_chrome())
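The regular expression in the script above matches any digit followed by two or more characters, so it can pick up text that isn't an IP address. A stricter variant, sketched below with the standard library's `ipaddress` module, validates each candidate before returning it. The helper name `extract_ip` is illustrative, and the sketch only handles IPv4 addresses.

```python
import ipaddress
import re
from typing import Optional

# Four dot-separated groups of one to three digits; range-checked below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ip(page_source: str) -> Optional[str]:
    """Return the first valid IPv4 address found in the page source, if any."""
    for candidate in IPV4_CANDIDATE.findall(page_source):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. 999.1.1.1 looks like an address but is not one
        return candidate
    return None
```

Inside `get_ip_via_chrome()`, a call such as `extract_ip(driver.page_source)` could stand in for the `re.search(...)` expression, with a `None` result signalling that no address was found.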
The following contains complete code demonstrating how Oxylabs proxies can be integrated with Selenium using Java.
First, you must meet the following requirements:
Java LTS 8+: Set up a Java LTS (Long-Term Support) release, version 8 or newer, preferably the most recent one. The most recent version of Java is 20.0.2 as of writing this. The latest version of the software is available here for download.
Maven: Maven is a tool for automating builds. You require this tool to handle the dependencies for your project. Maven is available at this site.
Java IDE: Any IDE can be used to develop your project, as long as it supports Maven dependencies. Throughout this tutorial, IntelliJ IDEA will be used.
Google Chrome: You need a browser to test your proxy integration code. You can get Google Chrome from here.
After installing, you can verify your Java and Maven installations using the following lines:
java -version
mvn -v
You’re now ready to integrate proxies in Java with Selenium.
The first step is to create your project in your IDE. Select “New Project”, fill in the required details, and select Maven as the build system.
Let’s use BrowserMob Proxy as a middle layer to make the process easier. It runs proxies locally in JVM and allows the chaining of authenticated proxies. If you’re using Maven, add this dependency to the pom.xml file:
<dependency>
<groupId>net.lightbody.bmp</groupId>
<artifactId>browsermob-core</artifactId>
<version>2.1.5</version>
</dependency>
Another library for the project, Selenium WebDriverManager, is optional. It makes downloading and setting up ChromeDriver easier. To use this library, include the following dependencies in the pom.xml file:
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>4.11.0</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.7.32</version>
</dependency>
<!-- https://mvnrepository.com/artifact/io.github.bonigarcia/webdrivermanager -->
<dependency>
<groupId>io.github.bonigarcia</groupId>
<artifactId>webdrivermanager</artifactId>
<version>5.5.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>32.1.2-jre</version>
</dependency>
The final dependencies list in the pom.xml file will look like this:
<dependencies>
<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>4.11.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/net.lightbody.bmp/browsermob-core -->
<dependency>
<groupId>net.lightbody.bmp</groupId>
<artifactId>browsermob-core</artifactId>
<version>2.1.5</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.7.32</version>
</dependency>
<!-- https://mvnrepository.com/artifact/io.github.bonigarcia/webdrivermanager -->
<dependency>
<groupId>io.github.bonigarcia</groupId>
<artifactId>webdrivermanager</artifactId>
<version>5.5.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>32.1.2-jre</version>
</dependency>
</dependencies>
It’s best to get the dependencies from the Maven Repository so that their versions are accurate. You can search for the required dependency in the search bar and copy its dependency code for the latest version.
This way, you can get the latest versions of all the packages required to run the program.
The Oxylabs source code for proxy integration in Java using Selenium is ready for you to access with this link. You can get the source files from there and add them to your project.
The source code contains three files:
ProxySetup.java: This file contains some constants where you can put your username, password, and proxy address.
ProxyDemo.java: This file contains two functions, one for random proxy generation and the other for generating country-specific proxies. The functions use the ProxyHelper class to get a random proxy IP and a country-specific one.
ProxyHelper.java: This is the main file that creates the BrowserMob instance and gets the proxy.
1. To compile this Maven project, build the project in the IDE or run the following command in the terminal:
mvn clean package
This will create the oxylabs.io-jar-with-dependencies.jar file in the target folder.
2. To run the code, execute the following command in the IDE or in the terminal:
java -cp target/oxylabs.io-jar-with-dependencies.jar ProxyDemo
3. Open the ProxySetup.java file and update your username and password with your Oxylabs proxy credentials, and adjust the endpoint for the proxy you've purchased:
static final String ENDPOINT="pr.oxylabs.io:7777";
static final String USERNAME="yourUsername";
static final String PASSWORD="yourPassword";
Don’t include the customer- prefix in USERNAME. The code adds it for country-specific proxies.
4. Open the project in an IDE, open the ProxyDemo.java file, and run the main() function. Doing so will return two IP addresses:
A random IP address;
A country-specific IP address.
5. Open the ProxyDemo.java file and pass a two-letter country code to the countrySpecificIPDemo function:
countrySpecificIPDemo("DE");
The value of this parameter is a case-insensitive, two-letter country code in ISO 3166-1 alpha-2 format. For example, DE for Germany, FR for France, etc. Check Oxylabs documentation for more details.
6. Execute the code. The output should list the two IP addresses described in step 4.
Done! You've successfully integrated Oxylabs proxies with your Selenium package in Java. However, you must install proper certificates to enable SSL support and avoid certificate-related warnings.
The code uses BrowserMob Proxy, which supports full MITM. However, you may still see invalid certificate warnings. To solve this, install the ca-certificate-rsa.cer file in your browser or HTTP client. Alternatively, you can generate your own private key rather than using the .cer files distributed with the repository.
On macOS:
1. Navigate to Keychain Access > System > Certificates (click the padlock icon next to System and enter your password when prompted).
2. Drag and drop the ca-certificate-rsa.cer file into the Certificates tab. A new certificate named LittleProxy MITM will appear.
3. Right-click the certificate and select Get Info.
4. Select Always Trust, close the dialog, and enter the password again when prompted.
On Windows:
1. Open the ca-certificate-rsa.cer file in Windows Explorer.
2. Right-click the file and select Install.
3. In the Certificate Import Wizard window, click Browse, select Trusted Publishers, and click OK to continue.
4. If you see a Security Warning, select Yes.
5. Follow the wizard to complete the installation.
The complexity of setting up BrowserMob Proxy and Chrome Options is hidden in the ProxyHelper class. In most cases, you should be able to use this file directly without any changes.
To create a ChromeDriver instance, go through a two-step process. First, create an instance of BrowserMobProxyServer. This is where you need to provide the proxy endpoint, username, and password.
The fourth parameter is a two-letter country code. If you don’t need a country-specific proxy, set it to null:
BrowserMobProxyServer proxy = ProxyHelper.getProxy(
        ProxySetup.ENDPOINT,
        ProxySetup.USERNAME,
        ProxySetup.PASSWORD,
        countryCode);
Next, call the ProxyHelper.getDriver() function. This function takes two parameters: a BrowserMobProxyServer instance and a boolean headless flag. To run the browser in headless mode, pass true:
WebDriver driver=ProxyHelper.getDriver(proxy,true);
driver is an instance of ChromeDriver. Now, you should write your code to use the ChromeDriver. Before exiting, remember to close the driver and stop the proxy:
driver.quit();
proxy.stop();
Selenium is a serviceable tool for web scraping, especially when learning the basics. With the help of Oxylabs proxies, web scraping is considerably more efficient.
If you have any questions about integrating Oxylabs proxies, you can contact us anytime. You should also visit our GitHub profile for the raw code and more integration tutorials.
Please be aware that this is a third-party tool not owned or controlled by Oxylabs. Each third-party provider is responsible for its own software and services. Consequently, Oxylabs will have no liability or responsibility to you regarding those services. Please carefully review the third party's policies and practices and/or conduct due diligence before accessing or using third-party services.