Selenium is a tool that automates web browser interactions for website testing and more. It’s useful when you need a browser to perform tasks such as clicking buttons or scrolling. Although Selenium is primarily used for website testing, it can also be used for web scraping, as it helps locate the required public data on a website.
This guide shows how to integrate Selenium with Oxylabs Residential and Datacenter Proxies using Python and Java.
The following explains how to set up Oxylabs proxies with Selenium in Python. You’ll need Python 3.7 or newer, since the code uses Selenium 4, which doesn’t support older Python versions.
Selenium’s standard Python bindings don’t offer a straightforward way to use proxies that require authentication. To simplify this, install Selenium Wire, which extends Selenium’s Python bindings. You can do it with pip:
pip install selenium-wire
Another recommended package for this integration is webdriver-manager. It manages the driver binaries for different browsers, so you don’t need to manually download a new web driver after each browser update.
You can install the Selenium webdriver-manager using the pip command as well:
pip install webdriver-manager
Once everything is installed, you can move on to proxy authentication. For the proxies to work, you need to specify your account credentials and a proxy endpoint.
Here are the endpoints for Oxylabs proxies (host:port):
Residential Proxies: pr.oxylabs.io:7777
Dedicated Datacenter Proxies: 1.2.3.4:60000 (replace 1.2.3.4 with a specific proxy IP address assigned to you)
Shared Datacenter Proxies: dc.pr.oxylabs.io:10000
USERNAME = "your_username"
PASSWORD = "your_password"
ENDPOINT = "pr.oxylabs.io:7777"
Replace your_username and your_password with the username and password of your Oxylabs proxy user.
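If your password contains characters that have a special meaning in URLs, such as @, : or /, inserting it directly into a proxy URL (as the chrome_proxy function later in this guide does) can break URL parsing. A minimal sketch, assuming the library that consumes the URL percent-decodes the credentials as standard URL parsers do: build_proxy_url is a hypothetical helper, and it only uses Python’s standard urllib.parse module.

```python
from urllib.parse import quote, unquote, urlsplit


def build_proxy_url(user: str, password: str, endpoint: str) -> str:
    """Return an http:// proxy URL with percent-encoded credentials.

    build_proxy_url is a hypothetical helper, not part of Selenium Wire.
    quote(..., safe="") encodes every reserved character, including '@', ':' and '/'.
    """
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{endpoint}"


if __name__ == "__main__":
    url = build_proxy_url("your_username", "p@ss:word/1", "pr.oxylabs.io:7777")
    print(url)  # http://your_username:p%40ss%3Aword%2F1@pr.oxylabs.io:7777
    # urlsplit still finds the right host and port; unquote restores the password.
    parts = urlsplit(url)
    print(parts.hostname, parts.port, unquote(parts.password))
```

To use it, you could build the "http" and "https" values in chrome_proxy with build_proxy_url instead of plain f-strings.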
You can also use country-specific entry points. For example, setting ENDPOINT to us-pr.oxylabs.io:10000 (host us-pr.oxylabs.io, port 10000) gives you a residential exit node in the US.
For all Oxylabs proxy customization options, such as country-specific entry points, refer to the Residential, Shared Datacenter, and Dedicated Datacenter Proxies documentation.
To check whether the proxy works, the script below opens ip.oxylabs.io, which returns the IP address it sees for the request. If everything is set up correctly, that’s the IP address of the proxy you’re using.
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# A package that keeps ChromeDriver up to date automatically.
from webdriver_manager.chrome import ChromeDriverManager

USERNAME = "your_username"
PASSWORD = "your_password"
ENDPOINT = "pr.oxylabs.io:7777"


def chrome_proxy(user: str, password: str, endpoint: str) -> dict:
    # Route both HTTP and HTTPS traffic through the same authenticated
    # upstream proxy; the proxy itself is reached over http://.
    wire_options = {
        "proxy": {
            "http": f"http://{user}:{password}@{endpoint}",
            "https": f"http://{user}:{password}@{endpoint}",
        }
    }
    return wire_options


def get_ip_via_chrome() -> str:
    manage_driver = Service(executable_path=ChromeDriverManager().install())
    options = webdriver.ChromeOptions()
    # Run Chrome without a visible window. The older `options.headless = True`
    # setter is deprecated and has been removed in recent Selenium 4 releases.
    options.add_argument("--headless=new")
    proxies = chrome_proxy(USERNAME, PASSWORD, ENDPOINT)
    driver = webdriver.Chrome(
        service=manage_driver, options=options, seleniumwire_options=proxies
    )
    try:
        driver.get("https://ip.oxylabs.io/")
        # ip.oxylabs.io responds with the requesting IP address as plain text.
        ip = driver.find_element(By.TAG_NAME, "body").text.strip()
        return f"\nYour IP is: {ip}"
    finally:
        # Always close the browser, even if the request fails.
        driver.quit()


if __name__ == "__main__":
    print(get_ip_via_chrome())
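The script above prints the text it extracts from the page without checking it. To make the check stricter, you could confirm that the text really is an IP address and that it differs from your own. A minimal sketch using Python’s standard ipaddress module; parse_ip and is_proxied are hypothetical helper names, and your_real_ip is an assumption: a value you’d look up yourself without the proxy.

```python
import ipaddress
from typing import Optional


def parse_ip(text: str) -> Optional[str]:
    """Return the normalized IPv4/IPv6 address in `text`, or None if it isn't one."""
    try:
        return str(ipaddress.ip_address(text.strip()))
    except ValueError:
        return None


def is_proxied(page_text: str, your_real_ip: str) -> bool:
    """True if the page reports a valid IP address that differs from your own."""
    reported = parse_ip(page_text)
    return reported is not None and reported != parse_ip(your_real_ip)


if __name__ == "__main__":
    print(parse_ip("203.0.113.7\n"))                   # 203.0.113.7
    print(parse_ip("<html>error</html>"))              # None
    print(is_proxied("203.0.113.7", "198.51.100.1"))   # True
```

In get_ip_via_chrome, you could pass the extracted body text to parse_ip and raise an error when it returns None.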
The following explains how to integrate Oxylabs proxies with Selenium in Java, using a sample Maven project.
Download and install Maven, the Java SE Development Kit (JDK), and Google Chrome.
To make the process easier, this example uses BrowserMob Proxy as a middle layer. It runs a proxy locally inside the JVM and chains it to an upstream proxy that requires authentication. If you’re using Maven, add this dependency to the pom.xml file:
<dependency>
<groupId>net.lightbody.bmp</groupId>
<artifactId>browsermob-core</artifactId>
<version>2.1.5</version>
</dependency>
Another library, WebDriverManager, is optional. It makes downloading and setting up ChromeDriver easier. To use it, add the following dependency to the pom.xml file:
<dependency>
<groupId>io.github.bonigarcia</groupId>
<artifactId>webdrivermanager</artifactId>
<version>5.0.2</version>
</dependency>
Alternatively, to avoid using WebDriverManager, download ChromeDriver and set the system property as follows:
System.setProperty("webdriver.chrome.driver","/path/to/chromedriver");
This is a Maven project. To compile the project, run the following command from the terminal:
mvn clean package
This will create the oxylabs.io-jar-with-dependencies.jar file in the target folder.
To run the JAR, execute the following command from the terminal:
java -cp target/oxylabs.io-jar-with-dependencies.jar ProxyDemo
Open the ProxySetup.java file and set ENDPOINT, USERNAME, and PASSWORD to your Oxylabs proxy endpoint and credentials:
static final String ENDPOINT="pr.oxylabs.io:7777";
static final String USERNAME="yourUsername";
static final String PASSWORD="yourPassword";
Don’t include the customer- prefix in USERNAME. The code adds it automatically when it builds country-specific proxy usernames.
Open the project in an IDE, open the ProxySetup.java file, and run the main() function. Doing so will return two IP addresses:
A random IP address.
A country-specific IP address.
Open the ProxyDemo.java file and pass a two-letter country code to the countrySpecificIPDemo function:
countrySpecificIPDemo("DE");
The parameter is a case-insensitive two-letter country code in ISO 3166-1 alpha-2 format, for example, DE for Germany or FR for France. Check the Oxylabs documentation for more details.
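For readers following the Python route, here is a sketch of how a country-specific residential username can be assembled. It assumes the customer-USERNAME-cc-COUNTRY username format used for Oxylabs Residential Proxies with pr.oxylabs.io:7777; the Java ProxyHelper class may build it differently, and country_username is a hypothetical helper name.

```python
import re
from typing import Optional

# Only checks the shape (two ASCII letters); it does not verify that the
# code is an actual ISO 3166-1 alpha-2 country.
_COUNTRY_CODE = re.compile(r"[A-Za-z]{2}")


def country_username(username: str, country_code: Optional[str]) -> str:
    """Build a proxy username; pass None for a non-country-specific exit node."""
    if country_code is None:
        return username
    if not _COUNTRY_CODE.fullmatch(country_code):
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    # Assumed format: customer-<username>-cc-<CC>. Codes are case-insensitive,
    # so upper-casing is only for readability.
    return f"customer-{username}-cc-{country_code.upper()}"


if __name__ == "__main__":
    print(country_username("your_username", "de"))  # customer-your_username-cc-DE
    print(country_username("your_username", None))  # your_username
```

You could then call chrome_proxy(country_username(USERNAME, "DE"), PASSWORD, "pr.oxylabs.io:7777") in the Python script.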
The code uses BrowserMob Proxy, which supports full man-in-the-middle (MITM) interception of HTTPS traffic. However, you may still see invalid certificate warnings. To solve this, install the ca-certificate-rsa.cer file in your browser or HTTP client. Alternatively, you can generate your own private key instead of using the .cer files distributed with the repository.
On macOS:
1. Open Keychain Access and navigate to System > Certificates (click the padlock icon next to System and enter your password when prompted).
2. Drag and drop the ca-certificate-rsa.cer file into the Certificates tab. A new certificate named LittleProxy MITM will appear.
3. Right-click the certificate and select Get Info.
4. Select Always Trust, close the dialog, and enter your password again when prompted.
On Windows:
1. Locate the ca-certificate-rsa.cer file in Windows Explorer.
2. Right-click the file and select Install Certificate.
3. In the Certificate Import Wizard window, choose to place the certificate in a specific store, click Browse, select Trusted Root Certification Authorities, and click OK to continue.
4. If you see a Security Warning, select Yes.
5. Follow the wizard to complete the installation.
The complexity of setting up BrowserMob Proxy and Chrome Options is hidden in the ProxyHelper class. In most cases, you should be able to use this file directly without any changes.
Creating a ChromeDriver instance takes two steps. First, create an instance of BrowserMobProxyServer. This is where you provide the proxy endpoint, username, and password.
The fourth parameter is a two-letter country code. If you don’t need a country-specific proxy, set it to null:
BrowserMobProxyServer proxy = ProxyHelper.getProxy(
        ProxySetup.ENDPOINT,
        ProxySetup.USERNAME,
        ProxySetup.PASSWORD,
        countryCode);
Next, call the ProxyHelper.getDriver() function. It takes two parameters: the BrowserMobProxyServer instance and a boolean, headless. To run the browser in headless mode, pass true:
WebDriver driver=ProxyHelper.getDriver(proxy,true);
The returned driver is a ChromeDriver instance, which you can now use in your own code. Before exiting, remember to close the driver and stop the proxy:
driver.quit();
proxy.stop();
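The same cleanup rule applies to the Python version. One way to make it harder to forget is a small context manager built with the standard contextlib module. This is a sketch: managed_driver is a hypothetical helper that accepts any factory returning an object with a quit() method, such as a lambda that creates the Selenium Wire driver from the Python example.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

D = TypeVar("D")


@contextmanager
def managed_driver(factory: Callable[[], D]) -> Iterator[D]:
    """Create a driver with `factory` and always call its quit() method on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs even if the with-block raises, like the try/finally in get_ip_via_chrome.
        driver.quit()


if __name__ == "__main__":
    class FakeDriver:
        """Stand-in for a real WebDriver so the sketch runs without a browser."""

        def quit(self) -> None:
            print("quit called")

    with managed_driver(FakeDriver) as d:
        print(type(d).__name__)
```

With a real browser, the factory could be lambda: webdriver.Chrome(service=manage_driver, options=options, seleniumwire_options=proxies).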
Selenium is a serviceable tool for web scraping, especially when you’re learning the basics. Combined with Oxylabs proxies, it can route browser traffic through residential or datacenter IP addresses, including country-specific ones.
If you have any questions about integrating Oxylabs proxies, you can contact us anytime. You should also visit our GitHub profile for the raw code and more integration tutorials.
Please be aware that this is a third-party tool not owned or controlled by Oxylabs. Each third-party provider is responsible for its own software and services. Consequently, Oxylabs will have no liability or responsibility to you regarding those services. Please carefully review the third party's policies and practices and/or conduct due diligence before accessing or using third-party services.
oxylabs.io © 2023 All Rights Reserved