Selenium is a tool for automating web browser interactions, most commonly for website testing. It’s useful when a task requires driving a real browser: clicking buttons, scrolling, and so on. Although Selenium is primarily a testing tool, it also works for web scraping because it can locate the required public data on a website.
This guide will go through the Selenium integration process with Oxylabs Residential Proxies using Python and Java for a smooth web scraping process.
The following explains how to set up Oxylabs Residential Proxies with Selenium in Python. Note that the required version of Python is Python 3.5 (or newer).
Using the default Selenium module for implementing proxies that require authentication makes the whole process complicated. To make it less complex, install Selenium Wire to extend Selenium’s Python bindings. You can do it using the pip command:
pip install selenium-wire
Another recommended package for this integration is Selenium webdriver-manager. It simplifies the management of binary drivers for different browsers. In this case, there’s no need to manually download a new version of a web driver after each update.
You can install the Selenium webdriver-manager using the pip command as well:
pip install webdriver-manager
Once everything is set up, you can move on to the next part – proxy authentication. For the proxies to work, you need to specify your account credentials:
USERNAME = "your_username"
PASSWORD = "your_password"
ENDPOINT = "pr.oxylabs.io:7777"
Replace your_username and your_password with the username and password of your proxy user (your Oxylabs proxy sub-user’s credentials).
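Credentials that contain characters such as @ or : can break a proxy URL of the form http://user:password@endpoint. One common way to guard against that is to percent-encode the username and password before building the URL. The sketch below uses only the standard library. The helper names build_proxy_url and build_wire_options are hypothetical, and it is an assumption that your proxy client decodes percent-encoded credentials, so verify that against your setup before relying on it.

```python
from urllib.parse import quote


def build_proxy_url(user: str, password: str, endpoint: str) -> str:
    """Return an http:// proxy URL with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, including
    # "@", ":" and "/", which would otherwise confuse URL parsing.
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{endpoint}"


def build_wire_options(user: str, password: str, endpoint: str) -> dict:
    """Build an options dict with the same shape as the one used in this guide."""
    url = build_proxy_url(user, password, endpoint)
    return {"proxy": {"http": url, "https": url}}


if __name__ == "__main__":
    # Hypothetical password chosen to show the encoding of "@" and ":".
    print(build_proxy_url("your_username", "p@ss:word", "pr.oxylabs.io:7777"))
```

For credentials made of letters, digits, and underscores only, the result is identical to the plain f-string approach.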
To check if the proxy is working, you can visit ip.oxylabs.io. If everything is working correctly, it will return the IP address of the proxy that you’re using.
try:
    driver.get("https://ip.oxylabs.io/")
    return f'\nYour IP is: {re.search(r"[0-9].{2,}", driver.page_source).group()}'
finally:
    driver.quit()
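The pattern [0-9].{2,} matches any digit followed by at least two arbitrary characters, so it can capture more than the address itself. A stricter extraction might look like the sketch below. The function name extract_ipv4 is hypothetical, and the sketch assumes the page source contains an IPv4 address. It uses only the standard library and validates each candidate with the ipaddress module.

```python
import ipaddress
import re
from typing import Optional

# Four dot-separated groups of 1-3 digits; range checking happens below.
_IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ipv4(page_source: str) -> Optional[str]:
    """Return the first valid IPv4 address found in page_source, or None."""
    for candidate in _IPV4_CANDIDATE.findall(page_source):
        try:
            # Rejects candidates such as 999.1.1.1 that only look like an IP.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        return candidate
    return None


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-only address used as sample input.
    print(extract_ipv4("<html><body><pre>203.0.113.7\n</pre></body></html>"))
```

Returning None instead of raising lets the caller decide how to report a page that contains no address, for example when the proxy request failed.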
import re
from typing import Optional

from seleniumwire import webdriver
# A package to have a chromedriver always up-to-date.
from webdriver_manager.chrome import ChromeDriverManager

USERNAME = "your_username"
PASSWORD = "your_password"
ENDPOINT = "pr.oxylabs.io:7777"


def chrome_proxy(user: str, password: str, endpoint: str) -> dict:
    wire_options = {
        "proxy": {
            "http": f"http://{user}:{password}@{endpoint}",
            "https": f"http://{user}:{password}@{endpoint}",
        }
    }
    return wire_options


def get_ip_via_chrome():
    options = webdriver.ChromeOptions()
    options.headless = True
    proxies = chrome_proxy(USERNAME, PASSWORD, ENDPOINT)
    driver = webdriver.Chrome(
        ChromeDriverManager().install(), options=options, seleniumwire_options=proxies
    )
    try:
        driver.get("https://ip.oxylabs.io/")
        return driver.page_source
    finally:
        driver.quit()


if __name__ == "__main__":
    print(get_ip_via_chrome())
The following contains complete code demonstrating how Oxylabs Residential Proxies can be integrated with Selenium using Java.
Download and install Maven, Java SE Development Kit, and Google Chrome.
To make the process easier, let’s use BrowserMob Proxy as a middle layer. It runs proxies locally in JVM and allows the chaining of authenticated proxies. If you’re using Maven, add this dependency to the pom.xml file:
<dependency>
    <groupId>net.lightbody.bmp</groupId>
    <artifactId>browsermob-core</artifactId>
    <version>2.1.5</version>
</dependency>
Another library for the project, Selenium WebDriverManager, is optional. It makes downloading and setting up ChromeDriver easier. To use this library, include the following dependency in the pom.xml file:
<dependency>
    <groupId>io.github.bonigarcia</groupId>
    <artifactId>webdrivermanager</artifactId>
    <version>5.0.2</version>
</dependency>
Alternatively, to avoid using WebDriverManager, download ChromeDriver and set the system property as follows:
System.setProperty("webdriver.chrome.driver","/path/to/chromedriver");
This is a Maven project. To compile the project, run the following command from the terminal:
mvn clean package
This will create the oxylabs.io-jar-with-dependencies.jar file in the target folder.
To run the JAR, execute the following command from the terminal:
java -cp target/oxylabs.io-jar-with-dependencies.jar ProxyDemo
Open the ProxySetup.java file and update your username, password, and endpoint with your Oxylabs Residential Proxies credentials:
static final String ENDPOINT="pr.oxylabs.io:7777";
static final String USERNAME="yourUsername";
static final String PASSWORD="yourPassword";
Don’t include the customer- prefix in USERNAME. The code adds it automatically for country-specific proxies.
Open the project in an IDE, open the ProxyDemo.java file, and run the main() function. Doing so will return two IP addresses:
A random IP address.
A country-specific IP address from Germany.
Open the ProxyDemo.java file and pass a two-letter country code to the countrySpecificIPDemo function:
countrySpecificIPDemo("DE");
The value of this parameter is a case-insensitive two-letter country code in ISO 3166-1 alpha-2 format, for example, DE for Germany or FR for France. Check the Oxylabs documentation for more details.
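To illustrate how a country code might end up in the proxy username, here is a minimal Python sketch. The function name country_username is hypothetical, and the customer-USERNAME-cc-XX layout is an assumption about the username format, so confirm it against the Oxylabs documentation. The check only verifies that the code has the two-letter shape, not that it is an assigned ISO 3166-1 code.

```python
import re


def country_username(username: str, country_code: str) -> str:
    """Build a country-specific proxy username (assumed format, see lead-in)."""
    code = country_code.strip().upper()  # the code is treated as case-insensitive
    if not re.fullmatch(r"[A-Z]{2}", code):
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    if username.startswith("customer-"):
        # The prefix is added here, so it must not be supplied by the caller.
        raise ValueError("pass the username without the 'customer-' prefix")
    return f"customer-{username}-cc-{code}"


if __name__ == "__main__":
    print(country_username("yourUsername", "de"))
```

Rejecting a username that already carries the prefix avoids silently producing a doubled customer-customer- value.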
The code uses BrowserMob Proxy, which supports full MITM. However, you may still see invalid certificate warnings. To solve this, install the ca-certificate-rsa.cer file in your browser or HTTP client. Alternatively, you can generate your own private key rather than using the .cer files distributed with the repository.
1. Navigate to Keychain Access > System > Certificates (click the padlock icon next to System and enter your password when prompted).
2. Drag and drop the ca-certificate-rsa.cer file into the Certificates tab. A new certificate named LittleProxy MITM will appear.
3. Right-click the certificate and select Get Info.
4. Select Always Trust, close the dialog, and enter the password again when prompted.
1. Open the ca-certificate-rsa.cer file in Windows Explorer.
2. Right-click the file and select Install.
3. In the Certificate Import Wizard window, click Browse, select Trusted Publishers, and click OK to continue.
4. If you see a Security Warning, select Yes.
5. Follow the wizard to complete the installation.
The complexity of setting up BrowserMob Proxy and Chrome Options is hidden in the ProxyHelper class. In most cases, you should be able to use this file directly without any changes.
To create a ChromeDriver instance, go through a two-step process. First, create an instance of BrowserMobProxyServer. This is where you need to provide the proxy endpoint, username, and password.
The fourth parameter is a two-letter country code. If you don’t need a country-specific proxy, set it to null:
BrowserMobProxyServer proxy = ProxyHelper.getProxy(
        ProxySetup.ENDPOINT,
        ProxySetup.USERNAME,
        ProxySetup.PASSWORD,
        countryCode);
Next, call the ProxyHelper.getDriver() function. It takes two parameters: a BrowserMobProxyServer instance and a boolean headless flag. To run the browser in headless mode, pass true:
WebDriver driver=ProxyHelper.getDriver(proxy,true);
Here, driver is an instance of ChromeDriver, which you can now use in your own code. Before exiting, remember to close the driver and stop the proxy:
driver.quit();
proxy.stop();
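The quit-then-stop order above is easy to skip when the scraping code throws. As an illustration of the idea in Python, the sketch below registers both cleanup steps up front so they run even on errors. FakeProxy, FakeDriver, and run_session are hypothetical stand-ins that only record calls. They are not part of Selenium or BrowserMob Proxy.

```python
from contextlib import ExitStack


class FakeProxy:
    """Stand-in for a local proxy server; records when it is stopped."""

    def __init__(self, log):
        self.log = log

    def stop(self):
        self.log.append("proxy.stop")


class FakeDriver:
    """Stand-in for a browser driver; records when it is quit."""

    def __init__(self, log):
        self.log = log

    def quit(self):
        self.log.append("driver.quit")


def run_session(proxy, driver, work):
    """Run work(driver), then quit the driver and stop the proxy, in that order."""
    with ExitStack() as stack:
        # Callbacks run in reverse registration order (last in, first out),
        # so the driver quits before the proxy it depends on is stopped.
        stack.callback(proxy.stop)
        stack.callback(driver.quit)
        return work(driver)


if __name__ == "__main__":
    calls = []
    run_session(FakeProxy(calls), FakeDriver(calls), lambda d: None)
    print(calls)
```

The same guarantee in Java is usually written with try/finally around the code that uses the driver.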
Selenium is a serviceable tool for web scraping, especially when learning the basics. With the help of Oxylabs Residential Proxies, web scraping is considerably more efficient.
If you have any questions about integrating Oxylabs proxies, you can contact us anytime. You should also visit our GitHub profile for the raw code and more integration tutorials.
Please be aware that this is a third-party tool not owned or controlled by Oxylabs. Each third-party provider is responsible for its own software and services. Consequently, Oxylabs will have no liability or responsibility to you regarding those services. Please carefully review the third party's policies and practices and/or conduct due diligence before accessing or using third-party services.