How to Rotate Proxies in Python

Roberta Aukstikalnyte

2022-09-30

Proxies for web scraping are used in multiple scenarios – be it market research, price monitoring, or brand protection. Regardless of your proxy use cases, rotating them when scraping is essential. But why?

Today’s beginner-friendly guide will answer exactly why it’s necessary to rotate proxies while scraping. Afterwards, the guide will lay down the exact steps of rotating proxies using Python. In the last portion of the article, you’ll get some extra professional tips and tricks on proxy rotation – let’s get started. 

What is proxy rotation and why is it important? 

Proxy rotation is the process of automatically assigning a different IP address to each new web scraping session. Rotation can be triggered by a specific time frame, a response status code, or a number of requests.
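To make the idea concrete, here is a minimal sketch of rotation triggered by a number of requests. The class name RequestCountRotator and its parameters are hypothetical and exist only for illustration; the sketch makes no network calls and simply shows which proxy would be handed out for each request.

```python
import itertools


class RequestCountRotator:
    """Hand out proxies round-robin, switching after a fixed number of requests."""

    def __init__(self, proxies, requests_per_proxy):
        if not proxies:
            raise ValueError("at least one proxy is required")
        if requests_per_proxy < 1:
            raise ValueError("requests_per_proxy must be at least 1")
        self._pool = itertools.cycle(proxies)  # endless round-robin iterator
        self._limit = requests_per_proxy
        self._used = 0
        self._current = next(self._pool)

    def next_proxy(self):
        """Return the proxy to use for the next request."""
        if self._used >= self._limit:
            # The current proxy has served its share, so move to the next one
            self._current = next(self._pool)
            self._used = 0
        self._used += 1
        return self._current


rotator = RequestCountRotator(
    ["http://2.56.215.247:3128", "http://50.206.25.108:80"],
    requests_per_proxy=2,
)
sequence = [rotator.next_proxy() for _ in range(5)]
print(sequence)
```

The same structure could be adapted to rotate on a timer or on a blocked status code by changing the condition inside next_proxy.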

A common challenge in the web scraping field is avoiding getting blocked by the target website – that’s where proxy rotation comes into play. Websites are not keen on bots and may find thousands of requests coming from the same IP address suspicious. However, with rotating proxy IP addresses, you can enhance your anonymity, imitate the behavior of several organic users, and circumvent most anti-scraping measures. 

Now, there are two main options for rotating IP addresses: you can either use a third-party rotator tool (e.g., Oxylabs’ Proxy Rotator) or build your own in Python. Let’s take a look at the latter option.

Rotating proxies in Python: installing prerequisites

Start by creating a virtual environment with this command:

$ virtualenv venv

This will install Python, pip, and common libraries in your venv folder.

Next, you need to invoke the source command to activate the environment:

$ source venv/bin/activate

The last step is to install the requests module in the current virtual environment:

$ pip install requests 

And that’s it – you have successfully installed the requests module. 

Next, create a file called no_proxy.py and add the following script:

import requests
response = requests.get('https://ip.oxylabs.io/ip')
print(response.text)

Now, you should run it from a terminal:

$ python no_proxy.py
128.90.50.100

The output will show your current IP address. Our goal is to show you how to hide your IP address and rotate different IP addresses to stay anonymous and avoid getting blocked, so let’s move forward. 

Sending GET requests through a proxy

Now, let’s start with the basics: how do we use a single proxy? In order to use a proxy server, you’ll need: 

  • Scheme (e.g., http);

  • IP address;

  • Port (e.g., 3128);

  • Username and password to connect to the proxy (optional). 

Once you have all the information, you need to set it up in this order:

SCHEME://USERNAME:PASSWORD@YOUR_PROXY_IP:YOUR_PROXY_PORT

Here are a few examples of the proxy formats you may encounter:

http://2.56.215.247:3128

https://2.56.215.247:8091

https://my-user:aegi1Ohz@2.56.215.247:8044
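If you store the proxy details as separate fields, a small helper can assemble them into the format above. This is only a sketch: the function name build_proxy_url is hypothetical, and percent-encoding the credentials is an assumption that matters when a username or password contains characters such as @ or :.

```python
from urllib.parse import quote


def build_proxy_url(scheme, host, port, username=None, password=None):
    """Assemble SCHEME://USERNAME:PASSWORD@HOST:PORT, omitting missing credentials."""
    credentials = ""
    if username is not None:
        # Percent-encode so reserved characters cannot break the URL structure
        credentials = quote(username, safe="")
        if password is not None:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return f"{scheme}://{credentials}{host}:{port}"


print(build_proxy_url("http", "2.56.215.247", 3128))
print(build_proxy_url("https", "2.56.215.247", 8044, "my-user", "aegi1Ohz"))
```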

Note that you can specify multiple protocols and even define specific domains for which a different proxy will be used:

scheme_proxy_map = {
    'http': PROXY1,
    'https': PROXY2,
    'https://example.org': PROXY3,
}
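The following is a simplified illustration of how such a map can be resolved: the most specific key (scheme plus host) is tried first, then the bare scheme. The helper pick_proxy is hypothetical and is not part of the requests module, and the PROXY1 to PROXY3 values are placeholders borrowed from the examples above.

```python
from urllib.parse import urlparse

# Placeholder values; substitute your own proxy servers
PROXY1 = "http://2.56.215.247:3128"
PROXY2 = "https://2.56.215.247:8091"
PROXY3 = "https://my-user:aegi1Ohz@2.56.215.247:8044"

scheme_proxy_map = {
    "http": PROXY1,
    "https": PROXY2,
    "https://example.org": PROXY3,
}


def pick_proxy(url, proxy_map):
    """Return the proxy for url, preferring a scheme://host key over a scheme key."""
    parts = urlparse(url)
    for key in (f"{parts.scheme}://{parts.hostname}", parts.scheme):
        if key in proxy_map:
            return proxy_map[key]
    return None  # no proxy configured for this URL


print(pick_proxy("https://example.org/pricing", scheme_proxy_map))
print(pick_proxy("https://ip.oxylabs.io/ip", scheme_proxy_map))
```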

Finally, you should try to make a request by calling requests.get and passing all the variables we defined earlier. With our script, we can also handle the exceptions and show the error when a network issue occurs.

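import requests
from requests.exceptions import ConnectTimeout, ProxyError, ReadTimeout

TIMEOUT_IN_SECONDS = 10  # how long to wait for the proxy to respond

try:
    response = requests.get('https://ip.oxylabs.io/ip', proxies=scheme_proxy_map, timeout=TIMEOUT_IN_SECONDS)
except (ProxyError, ReadTimeout, ConnectTimeout) as error:
    print('Unable to connect to the proxy: ', error)
else:
    print(response.text)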

The output of this script should show you the IP of your proxy:

$ python single_proxy.py
2.56.215.247

You are now hidden behind a proxy when making your requests through the Python script. Now, we can move on to learning how to rotate a list of proxies instead of using a single one. 

Rotating proxies using a proxy pool

In this part of the tutorial, we’re going to use a list of proxies stored in a CSV file called proxies.csv:

http://2.56.215.247:3128
https://88.198.24.108:8080
http://50.206.25.108:80
http://68.188.59.198:80
... any other proxy server, each on a separate line

First of all, create a Python file and define both the file name and how long you are willing to wait for a single proxy to respond:

TIMEOUT_IN_SECONDS = 10

CSV_FILENAME = 'proxies.csv'

Next, write the code that opens the CSV file, reads every proxy server line by line into a csv_row variable, and builds the scheme_proxy_map configuration needed by the requests module:

import csv

with open(CSV_FILENAME) as open_file:
    reader = csv.reader(open_file)
    for csv_row in reader:
        # The first (and only) column of each row holds the proxy URL
        scheme_proxy_map = {
            'https': csv_row[0],
        }
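One practical detail: csv.reader yields an empty list for a blank line, so csv_row[0] would raise an IndexError if the file contains empty lines. A hedged sketch of a more forgiving loader is shown below; the function name load_proxies is hypothetical, and an in-memory string stands in for the file so the example runs on its own.

```python
import csv
import io


def load_proxies(lines):
    """Read proxy URLs from the first CSV column, skipping blank rows."""
    proxies = []
    for csv_row in csv.reader(lines):
        if not csv_row:  # csv.reader returns [] for an empty line
            continue
        proxy = csv_row[0].strip()
        if proxy:  # ignore rows that contain only whitespace
            proxies.append(proxy)
    return proxies


# In-memory stand-in for proxies.csv, including a blank and a whitespace-only line
sample = io.StringIO("http://2.56.215.247:3128\n\nhttps://88.198.24.108:8080\n  \n")
proxies = load_proxies(sample)
print(proxies)
```

With a real file, the same function could be called as load_proxies(open_file) inside the with open(CSV_FILENAME) block.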

To check if everything is working, we’ll use the same scraping code as before to access the website via proxies:

import csv

import requests
from requests.exceptions import ConnectTimeout, ProxyError, ReadTimeout

with open(CSV_FILENAME) as open_file:
    reader = csv.reader(open_file)
    for csv_row in reader:
        scheme_proxy_map = {
            'https': csv_row[0],
        }

        # Access the website via proxy
        try:
            response = requests.get('https://ip.oxylabs.io/ip', proxies=scheme_proxy_map, timeout=TIMEOUT_IN_SECONDS)
        except (ProxyError, ReadTimeout, ConnectTimeout) as error:
            # Skip proxies that are unreachable or too slow
            pass
        else:
            print(response.text)

If you want to scrape publicly available content using any working proxy from the list, add a break after print to stop going through the proxies in the CSV file: 

        try:
            response = requests.get('https://ip.oxylabs.io/ip', proxies=scheme_proxy_map, timeout=TIMEOUT_IN_SECONDS)
        except (ProxyError, ReadTimeout, ConnectTimeout) as error:
            pass
        else:
            print(response.text)
            break  # notice the break here
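The "first working proxy wins" logic can also be wrapped in a reusable function. The sketch below is an illustration under assumptions: fetch_with_rotation and fake_fetch are hypothetical names, and the request itself is passed in as a callable so the example runs without network access. Catching OSError is deliberate, since the exception classes in requests derive from it.

```python
def fetch_with_rotation(url, proxies, fetch):
    """Return (proxy, result) for the first proxy where fetch(url, proxy) succeeds."""
    last_error = None
    for proxy in proxies:
        try:
            return proxy, fetch(url, proxy)
        except OSError as error:  # covers requests' ProxyError and timeout errors
            last_error = error  # remember the failure and try the next proxy
    raise RuntimeError("no working proxy found") from last_error


def fake_fetch(url, proxy):
    """Stand-in for a real request: the first proxy in the list 'times out'."""
    if proxy == "http://2.56.215.247:3128":
        raise TimeoutError("simulated timeout")
    return f"fetched {url} via {proxy}"


used_proxy, body = fetch_with_rotation(
    "https://ip.oxylabs.io/ip",
    ["http://2.56.215.247:3128", "http://50.206.25.108:80"],
    fake_fetch,
)
print(used_proxy)
```

In a real script, fetch could be a small function that calls requests.get(url, proxies={'https': proxy}, timeout=TIMEOUT_IN_SECONDS) and returns response.text.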

Now, the only thing that’s left preventing us from reaching our full potential is speed.  

How to rotate proxies using async

To rotate proxies using async, you should use the aiohttp module. You can install it using the following CLI command:

$ pip install aiohttp

Then, you need to create a Python file where you define:

  • The CSV filename that contains the proxy list;

  • A URL that you wish to use to check the proxies;

  • How long you’re willing to wait for each proxy – the timeout setting.

CSV_FILENAME = 'proxies.csv'
URL_TO_CHECK = 'https://ip.oxylabs.io/ip'
TIMEOUT_IN_SECONDS = 10

Next, you need to define an async function and run it using the asyncio module. It accepts two parameters:

  • the URL it needs to request;

  • the proxy to use to access it. 

Then, you need to print the response. If the script receives an error when attempting to access the URL via proxy, it will print it as well:

import asyncio
import csv

import aiohttp


async def check_proxy(url, proxy):
    try:
        session_timeout = aiohttp.ClientTimeout(total=None,
                                                sock_connect=TIMEOUT_IN_SECONDS,
                                                sock_read=TIMEOUT_IN_SECONDS)
        async with aiohttp.ClientSession(timeout=session_timeout) as session:
            async with session.get(url, proxy=proxy, timeout=TIMEOUT_IN_SECONDS) as resp:
                print(await resp.text())
    except Exception as error:
        # you can comment out this line to only see valid proxies printed out in the command line
        print('Proxy responded with an error: ', error)
        return

The next step is to define the main function that reads the CSV file and creates an asynchronous task to check the proxy for every single record in the CSV file:

async def main():
    tasks = []
    with open(CSV_FILENAME) as open_file:
        reader = csv.reader(open_file)
        for csv_row in reader:
            task = asyncio.create_task(check_proxy(URL_TO_CHECK, csv_row[0]))
            tasks.append(task)

    await asyncio.gather(*tasks)

You should run the main function and wait until all the async tasks are completed.

asyncio.run(main())

That’s all – now, your proxies will be running at top speed.
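With a long proxy list, starting every check at once can open a large number of connections simultaneously. One possible refinement, sketched here under assumptions, is to cap concurrency with asyncio.Semaphore and collect the proxies that passed. The names check_all and fake_check are hypothetical; fake_check stands in for a real checker that raises an exception on failure (unlike check_proxy above, which catches and prints its own errors).

```python
import asyncio


async def check_all(proxies, check, max_concurrent=20):
    """Run check(proxy) for every proxy, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(proxy):
        async with semaphore:  # wait here while too many checks are in flight
            try:
                await check(proxy)
            except Exception:
                return None  # treat any error as "proxy not usable"
            return proxy

    results = await asyncio.gather(*(guarded(proxy) for proxy in proxies))
    return [proxy for proxy in results if proxy is not None]


async def fake_check(proxy):
    """Stand-in for a real check: proxies on port 80 are simulated as broken."""
    await asyncio.sleep(0)
    if proxy.endswith(":80"):
        raise ConnectionError("simulated failure")


working = asyncio.run(check_all(
    [
        "http://2.56.215.247:3128",
        "https://88.198.24.108:8080",
        "http://50.206.25.108:80",
        "http://68.188.59.198:80",
    ],
    fake_check,
    max_concurrent=2,
))
print(working)
```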

More tips on proxy rotation

Lastly, let’s take a look at some general tips on proxy rotation to ensure a smooth web scraping process. 

Avoid free proxy services

Despite the appeal, using free proxy IP addresses has far more negatives than positives. With multiple people using free proxies simultaneously and a common lack of financial support, they tend to be considerably slower. Free proxy providers have no obligations to guarantee that their proxies will always be available: you may start working on your scraping project one day and find out the proxies you used are no longer available the following day. 

Additionally, there are multiple security and privacy issues associated with free proxies. For example, the majority of free proxy providers don’t support encrypted HTTPS connections.

To learn more about the risks of using free proxies, check out our Why You Shouldn't Use Free Proxies - Risks & Reasons blog post. 

Pair IP rotation with user-agent rotation

User-agents are strings in HTTP requests that help websites identify details like browser, operating system, software, and device type. With multiple requests coming from the same OS and browser in a short period of time, the target website can detect suspicious activity and ban you. Hence, besides rotating proxies, you should also rotate user agents to further reduce the chance of getting blocked.
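As a rough sketch of how the two rotations can be paired, the snippet below picks a random proxy and a random user agent for each request. The function name pick_identity is hypothetical, and the user-agent strings are illustrative examples only; in practice you would maintain an up-to-date list of your own.

```python
import random

# Illustrative values only; replace with your own proxies and current user agents
PROXIES = [
    "http://2.56.215.247:3128",
    "http://50.206.25.108:80",
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0",
]


def pick_identity(rng=random):
    """Return a (proxies, headers) pair for one request, both chosen at random."""
    scheme_proxy_map = {"https": rng.choice(PROXIES)}
    headers = {"User-Agent": rng.choice(USER_AGENTS)}
    return scheme_proxy_map, headers


scheme_proxy_map, headers = pick_identity()
print(scheme_proxy_map, headers)
```

Both values can then be passed to requests.get through its proxies and headers arguments, alongside the timeout used earlier.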

Choose a reliable premium proxy service 

Instead of using free proxies, risking your data privacy and security, and dealing with issues like slow speeds, it’s strongly recommended to go for a reputable premium proxy provider. Look for a provider that’s transparent about its proxy sourcing practices and offers proof that its proxies are obtained ethically.

Alternative solution: Oxylabs’ Scraper APIs with zero infrastructure management

Although building a proxy rotator in Python is relatively easy, you’ll still need to put additional time and effort into the process. If you’re looking for an all-in-one product that does all the work for you, Oxylabs Scraper APIs are the ideal solution. Our APIs incorporate a built-in proxy rotator, which automatically changes IP addresses regularly so you won’t have to deal with CAPTCHAs or risk getting banned. 

Conclusion

Proxy rotation is an integral part of any successful web scraping project; luckily, building a rotator in Python is relatively easy. However, if you have any further questions related to the topic, feel free to drop a message at support@oxylabs.io and one of our experts will be happy to help out. 

Also, if you prefer the visual format, you can check out our video on this topic:

Easy & Quick Tutorial - How to Rotate Proxies With Python

Finally, if you're interested in more Python solutions for web scraping, refer to the Related articles section below, where you'll find other automation tutorials on running tasks as a service and scheduling recurring jobs.

About the author

Roberta Aukstikalnyte

Senior Content Manager

Roberta Aukstikalnyte is a Senior Content Manager at Oxylabs. Having worked various jobs in the tech industry, she especially enjoys finding ways to express complex ideas in simple ways through content. In her free time, Roberta unwinds by reading Ottessa Moshfegh's novels, going to boxing classes, and playing around with makeup.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.

Related articles
