
Web Scraping With Selenium: DIY or Buy?

Gabija Fatenaite


To understand the fundamentals of data scraping with Python, and of web scraping in general, it's important to learn how to leverage the available frameworks and request libraries. Developing an understanding of the main HTTP methods (primarily GET and POST) makes web scraping a lot easier.

For instance, Selenium is one of the best-known and most widely used tools for automating web browser interactions. Using it together with other technologies (e.g., BeautifulSoup) will give you a better grasp of web scraping basics.

How does Selenium work? It executes your script's commands in a real browser, performing repetitive tasks such as clicking, scrolling, and typing on your behalf. As described on Selenium's official webpage, it is "primarily for automating web applications for testing purposes, but is certainly not limited to just that."

In this guide on how to web scrape with Selenium, we will be using Python 3 as our language, as it is not only the most common scraping language but also the one we work with most closely.

Setting up Selenium 

Firstly, to download the Selenium package, execute the pip command in your terminal:

pip install selenium 

You will also need to install a Selenium driver for your browser, as it enables Python to control the browser at the OS level. If you install the driver manually, make sure it is accessible via the PATH variable.

You can download the drivers for Firefox (geckodriver), Chrome (chromedriver), and Edge from each browser vendor's official site.

Quick starting Selenium

Let's begin the automation by starting up your browser:

  • Open up a new browser window (in this instance, Firefox) 

  • Load the page of your choice (our provided URL)

from selenium import webdriver
browser = webdriver.Firefox()

This will launch the browser in headful mode. To run it headless (e.g., on a server), the setup should look something like this:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.headless = True

driver = webdriver.Firefox(options=options, executable_path=DRIVER_PATH)
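Putting the pieces together, a minimal headless session might look like the sketch below. The URL and driver path are placeholders, and the `try`/`finally` simply makes sure the browser gets closed even if something fails mid-scrape:

```python
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

DRIVER_PATH = '/path/to/geckodriver'  # adjust to where you saved the driver

options = Options()
options.headless = True

driver = webdriver.Firefox(options=options, executable_path=DRIVER_PATH)
try:
    driver.get('https://example.com')  # placeholder URL
    print(driver.title)  # the page's <title> text
finally:
    driver.quit()  # always close the browser, even if scraping fails
```

Calling `driver.quit()` (rather than just letting the script end) also shuts down the driver process, which otherwise keeps running in the background.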

Data extraction with Selenium by locating elements


Selenium offers a variety of functions to help locate elements on a page: 

  • find_element_by_id

  • find_element_by_name

  • find_element_by_xpath

  • find_element_by_link_text (find an element by its exact link text)

  • find_element_by_partial_link_text (find an element by matching part of a hyperlink's text (anchor tag))

  • find_element_by_tag_name

  • find_element_by_class_name

  • find_element_by_css_selector (find an element using a CSS selector)

As an example, let's try to locate the h1 tag on a homepage with Selenium: 

        ... something
        <h1 class="someclass" id="greatID"> Partner Up With Proxy Experts</h1>

h1 = driver.find_element_by_tag_name('h1')
h1 = driver.find_element_by_class_name('someclass')
h1 = driver.find_element_by_xpath('//h1')
h1 = driver.find_element_by_id('greatID')

You can also use the find_elements (plural form) to return a list of elements. E.g.: 

all_links = driver.find_elements_by_tag_name('a')

This way, you'll get all the anchor elements on the page. 
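The href values you collect from those anchors are often relative. As a quick sketch (the `normalize_links` helper is our own, not part of Selenium), you can resolve them into absolute URLs with the standard library:

```python
from urllib.parse import urljoin

def normalize_links(hrefs, base_url):
    """Resolve relative hrefs against the page URL, dropping empty ones."""
    return [urljoin(base_url, href) for href in hrefs if href]

# With a live driver you would collect the hrefs like this:
# all_links = driver.find_elements_by_tag_name('a')
# hrefs = [link.get_attribute('href') for link in all_links]

print(normalize_links(['/blog', 'about', None], 'https://example.com/'))
# ['https://example.com/blog', 'https://example.com/about']
```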

However, some elements are not easily accessible with an ID or a simple class. This is why you will need XPath.


XPath is a query language that helps locate a specific node in the DOM. XPath finds nodes starting from the root element, either through an absolute path or a relative one. E.g.: 

  • / : selects a node from the root. /html/body/div[1] will find the first div in the body

  • // : selects nodes anywhere in the document, no matter where they are. //form[1] will find the first form element

  • [@attributename='value'] : a predicate. It looks for a node with a specific attribute value.


//input[@name='email'] will find the first input element whose name attribute is "email", such as in this form:

   <div class="content-login">
     <form id="loginForm">
       <input type="text" name="email" value="Email Address:">
       <input type="password" name="password" value="Password:">
       <button type="submit">Submit</button>
     </form>
   </div>
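Typing predicates by hand is error-prone, so here is a small hypothetical helper (our own, not part of Selenium) that builds the //tag[@attr='value'] form described above from keyword arguments:

```python
def xpath_for(tag, **attrs):
    """Build a relative XPath like //input[@name='email'] from keyword arguments."""
    predicates = ''.join(f"[@{name}='{value}']" for name, value in attrs.items())
    return f"//{tag}{predicates}"

email_xpath = xpath_for('input', name='email')
print(email_xpath)  # //input[@name='email']

# With a live driver, the result plugs straight into the locator:
# email_input = driver.find_element_by_xpath(email_xpath)
```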


WebElement in Selenium represents an HTML element. Here are the most commonly used actions: 

  • element.text (accessing the text content)

  • element.click() (clicking on the element) 

  • element.get_attribute('class') (accessing an attribute) 

  • element.send_keys('mypassword') (sending text to an input)
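Combined with the login form shown earlier, these actions chain into a typical interaction sketch. The URL, credentials, and landing-page heading are all illustrative:

```python
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://example.com/login')  # placeholder URL

# Fill in the email and password fields located by their name attributes
email_input = driver.find_element_by_xpath("//input[@name='email']")
email_input.send_keys('user@example.com')

password_input = driver.find_element_by_xpath("//input[@name='password']")
password_input.send_keys('mypassword')

# Submit the form and read the heading of the page we land on
driver.find_element_by_xpath("//button[@type='submit']").click()
print(driver.find_element_by_tag_name('h1').text)

driver.quit()
```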

Handling slow website rendering

Some websites use a lot of JavaScript to render content, and they can be tricky to deal with as they use a lot of AJAX calls. There are a few ways to solve this:

  • time.sleep(ARBITRARY_TIME) – a fixed, arbitrary pause

  • WebDriverWait() – an explicit wait for a condition


    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "mySuperId"))
    )
This waits up to 10 seconds for the element to appear, raising a TimeoutException if it doesn't. To dig deeper into this topic, go ahead and check out the official Selenium documentation.

Selenium vs Puppeteer

The biggest reason for Selenium’s popularity and complexity is that it supports writing tests in multiple programming languages. This includes C#, Groovy, Java, Perl, PHP, Python, Ruby, Scala, and even JavaScript. It supports multiple browsers, including Chrome, Firefox, Edge, Internet Explorer, Opera, and Safari. 

However, for web scraping tasks, Selenium is perhaps more complex than it needs to be. Remember that Selenium’s real purpose is functional testing. For effective functional testing, it mimics what a human would do in a browser. Selenium thus needs three different components:

  • A driver for each browser

  • Installation of each browser

  • The package/library depending on the programming language used

Puppeteer, on the other hand, bundles Chromium with its node package, so no separate browser or driver installation is needed, which makes setup simpler. It can also drive a full Chrome installation if that is what you need.

On the other hand, multiple-browser support is missing, and Firefox support is limited. Google announced Puppeteer for Firefox, but it was soon deprecated; as of writing this, Firefox support remains experimental. So, to sum up, if you need a lightweight and fast headless browser for web scraping, Puppeteer would be the best choice. You can check our Puppeteer tutorial for more information.

Selenium vs scraping tools

Selenium is great if you want to learn web scraping. We recommend using it together with BeautifulSoup, as well as focusing on learning how the server and browser exchange data over HTTP and how cookies and headers work.

However, if you’re seeking an easier method for web scraping, there are various tools to help you out with this process. Depending on the scale of your scraping project and targets, implementing a web scraping tool will save you a lot of time and resources.

At Oxylabs, we provide a group of tools called Scraper APIs.

  • E-Commerce Scraper API – focuses on e-commerce and allows you to receive structured data in JSON

  • SERP Scraper API – focuses on scraping SERP data from leading search engines 

  • Web Scraper API – allows you to carry out scraping projects for most websites in HTML

Our tools are also easy to integrate; here's an example in Python:

import requests
from pprint import pprint

# Structure payload.
payload = {
    'source': 'universal',
    'url': '',
    'user_agent_type': 'desktop',
}

# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),
    json=payload,
)

# This will return the JSON response with results.
pprint(response.json())

More integration examples are available for other languages (shell, PHP, cURL); you can also learn how to use cURL with a proxy in our blog post. 

The main benefits of Scraper APIs compared with Selenium are: 

  • All web scraping processes are automated

  • No need for extra coding

  • Easily scalable 

  • Guaranteed high success rates, charged per successful request

  • A built-in proxy rotation tool


Selenium is a great tool for web scraping, especially when learning the basics. But, depending on your goals, it is sometimes easier to choose an already-built tool that does web scraping for you. Building your own scraper is a long and resource-costly procedure that might not be worth the time and effort. 

To learn more about Scraper APIs and how to integrate them, you can check out our quick start guides for Web Scraper API, SERP Scraper API, and E-Commerce Scraper API, or if you have any product related questions, contact us at 

About the author

Gabija Fatenaite

Lead Product Marketing Manager

Gabija Fatenaite is a Lead Product Marketing Manager at Oxylabs. Having grown up on video games and the internet, she grew to find the tech side of things more and more interesting over the years. So if you ever find yourself wanting to learn more about proxies (or video games), feel free to contact her - she’ll be more than happy to answer you.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
