How to Set Up Proxies with AIOHTTP

Aiohttp is a Python library for building fast, scalable asynchronous web services and applications. For more information, see the official documentation and our aiohttp web scraping tutorial.

This tutorial will show you how to use Python’s aiohttp and asyncio libraries to integrate Oxylabs’ Residential Proxies, Datacenter Proxies, and Web Unblocker, along with code examples for rotating Datacenter Proxies.

Prerequisites

Before you begin, please ensure that you have the following prerequisites in place:

  • Python 3.6 or above

  • Oxylabs sub-user’s credentials

Installing AIOHTTP

Run the following pip command in the terminal to install the aiohttp library:

pip install aiohttp

Use the following command in a Windows command prompt:

python -m pip install aiohttp

Additionally, you’ll use the asyncio package, which provides many tools for writing non-blocking code. It doesn’t require a separate installation: asyncio has been part of Python’s standard library since version 3.4.

Now, the aiohttp library is all set to send web requests and receive web responses. The next step is to import the required packages in the Python file:

import aiohttp
import asyncio

Handling HTTP requests using AIOHTTP

The following example shows how to use aiohttp to handle HTTP requests. We’ll send an HTTP GET request to the https://ip.oxylabs.io/ web page, which responds with the IP address of the requester.

Let’s start by creating the get_response() async function:

async def get_response():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                'https://ip.oxylabs.io/'
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())

When executed, the get_response() function creates a session using the ClientSession() class and then sends an HTTP GET request to the target URL. Once the response to this GET request arrives, the function prints the status code and the response text.

Let’s call the get_response() function to see the HTTP request in action:

asyncio.run(get_response())
[Screenshot: HTTP request response]

The above output shows the success status, in our case 200, and the requester’s IP address.

Integrating AIOHTTP proxies

While web scraping, we often run into IP blocks from websites that have implemented anti-scraping measures. When we repeatedly access a website from the same IP address, that IP gets blocked, and the website restricts our access.

Oxylabs’ proxies integrate with your HTTP requests to avoid such problems. You simply pass the proxy server address and the credentials for proxy authentication to the GET method.

There are three integration steps. We’ll reuse the code from the previous section and add a few lines to integrate proxies:

Step 1: First, import the following packages before using their functionalities:

import asyncio
import aiohttp

Step 2: Create variables for the proxy address, username, and password. Later, we’ll use these variables in the GET request.

PROXY_ADDRESS = 'proxy_address'
USERNAME = 'username'
PASSWORD = 'password'

Here, you have to replace the username and password with your Oxylabs sub-user’s credentials and the proxy_address with the address of the proxy server you want to use.

Residential Proxies

Proxy type: HTTP, HTTPS, or SOCKS5

Proxy address: pr.oxylabs.io

Proxy port: 7777

For example, in the case of Residential Proxies, you can use the following proxy server address:

PROXY_ADDRESS = 'pr.oxylabs.io:7777'

You can use a country-specific proxy address as well. For example, if you replace the proxy address with us-pr.oxylabs.io and the port with 10000, you’ll acquire a US exit node. For more country-specific entries or if you need a sticky session, please review our documentation.
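In code, that country-specific entry would look like this:

PROXY_ADDRESS = 'us-pr.oxylabs.io:10000'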

Enterprise Dedicated Datacenter Proxies

Specify the following if you purchased Dedicated Datacenter Proxies via sales.

Proxy type: HTTP or SOCKS5

Proxy address: a specific IP address (e.g., 1.2.3.4)

Proxy port: 60000

For Enterprise Dedicated Datacenter Proxies, you’ll have to choose an IP address from the acquired list. Visit our documentation for more details.
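For example, using the sample IP and port from above, the endpoint would be set like this (substitute an IP address from your own list):

PROXY_ADDRESS = '1.2.3.4:60000'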

Self-Service Dedicated Datacenter Proxies

Specify the following if you purchased Dedicated Datacenter Proxies via the dashboard.

Proxy type: HTTP, HTTPS, or SOCKS5

Proxy address: ddc.oxylabs.io

Proxy port: 8001

For Self-Service Dedicated Datacenter Proxies, the port indicates the sequential number of an IP address from the acquired list. Check our documentation for more details.
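For example, based on the values above, port 8001 selects the first IP address from your acquired list:

PROXY_ADDRESS = 'ddc.oxylabs.io:8001'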

Datacenter Proxies

Proxy type: HTTP, HTTPS, or SOCKS5

Proxy address: dc.oxylabs.io

Proxy port: 8001

Under the pay-per-IP subscription method, each port is assigned to an IP address sequentially from your list. For example, port 8001 will use the first IP address on the list. For additional details, check our documentation.

With the pay-per-traffic subscription, port 8001 will randomly select an IP address but will stay consistent for the session's duration. To specify the proxy's location, for instance the United States, you can use the user authentication string formatted as user-USERNAME-country-US:PASSWORD. See our documentation for more details.
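For example, combining the values above, a pay-per-traffic setup targeting the United States could look like this, with the country parameter embedded in the authentication string exactly as described:

PROXY_ADDRESS = 'dc.oxylabs.io:8001'
USERNAME = 'user-USERNAME-country-US'  # replace USERNAME with your sub-user's username
PASSWORD = 'password'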

ISP Proxies

Proxy type: HTTP, HTTPS, or SOCKS5

Proxy address: isp.oxylabs.io

Proxy port: 8001
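Based on the values above, the ISP Proxies endpoint would be set like this:

PROXY_ADDRESS = 'isp.oxylabs.io:8001'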

Web Unblocker

Proxy type: HTTP or HTTPS

Proxy address: unblock.oxylabs.io

Proxy port: 60000
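Based on the values above, the Web Unblocker endpoint would be set like this:

PROXY_ADDRESS = 'unblock.oxylabs.io:60000'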

You can also utilize various features of Web Unblocker, such as geo-location settings, Headless Browser, and others, by passing the parameters as headers. For instance, to connect to an IP based in Germany, the code would look like this:

headers = {
    'x-oxylabs-geo-location': 'Germany'
}

async def get_response_using_proxy():
    async with aiohttp.ClientSession(headers=headers, connector=aiohttp.TCPConnector(ssl=False)) as session:
        # Remaining code...

You can also pass the headers within the session.get() function, as sketched below. Use the https://ip.oxylabs.io/location URL as the target to see the geographic location of your IP address, along with other information.
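Here’s a minimal sketch of that variant, assuming the PROXY_ADDRESS, USERNAME, and PASSWORD variables from step 2 and the headers dictionary defined above:

async def get_response_using_proxy():
    # Web Unblocker requires disabling SSL verification (see the note in step 3)
    async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=False)) as session:
        async with session.get(
                'https://ip.oxylabs.io/location',
                headers=headers,  # per-request headers instead of session-wide ones
                proxy=f'http://{USERNAME}:{PASSWORD}@{PROXY_ADDRESS}'
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())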

Step 3: The final step is to submit an HTTP request with the target URL and the proxy dictionary.

The following aiohttp proxy integration code employs the Residential Proxy server setup specified in step 2 to send a GET request to https://ip.oxylabs.io/. Additionally, it prints the status code and the response text in the output.

async def get_response_using_proxy():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                'https://ip.oxylabs.io/',
                proxy=f'http://{USERNAME}:{PASSWORD}@{PROXY_ADDRESS}'
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())

asyncio.run(get_response_using_proxy())

When it comes to Web Unblocker, you must disable SSL certificate verification by passing the connector argument to aiohttp.ClientSession():

async def get_response_using_proxy():
    async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=False)) as session:
        # Remaining code...

Testing proxy connection

In the previous section, we demonstrated all the steps to integrate proxies with Python and the aiohttp library. Let’s test the code to see the output:

import aiohttp
import asyncio

PROXY_ADDRESS = 'proxy_address'
USERNAME = 'username'
PASSWORD = 'password'

async def get_response_using_proxy():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                'https://ip.oxylabs.io/',
                proxy=f'http://{USERNAME}:{PASSWORD}@{PROXY_ADDRESS}'
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())

asyncio.run(get_response_using_proxy())

Running this code should give a similar output:

[Screenshot: HTTP request response]

Proxy integration using basic authentication 

In the previous example, we combined the username, password, and proxy address into a single string and passed it with the GET request. Aiohttp also provides another way to perform user authentication: the BasicAuth class. The following example uses BasicAuth to pass the username and password with the request.

import asyncio
import aiohttp

PROXY_ADDRESS = 'proxy_address'
USERNAME = 'username'
PASSWORD = 'password'

async def get_response_using_proxy():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                'https://ip.oxylabs.io/',
                proxy=f'http://{PROXY_ADDRESS}',
                proxy_auth=aiohttp.BasicAuth(USERNAME, PASSWORD)
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())

asyncio.run(get_response_using_proxy())

How to rotate proxies with Python and AIOHTTP

Some websites restrict web scraping and block any suspected IP addresses. There’s a possibility your proxy IP will be blocked if you use a single proxy server IP for repetitive scraping requests.

Oxylabs’ Residential Proxies can keep the same IP address for up to 30 minutes or randomly change the proxy server address with each request. These options are also available with Shared Datacenter Proxies, although there an IP can be retained indefinitely. As a result, proxy rotation is handled internally by Oxylabs’ Residential and Shared Datacenter Proxies, so external rotation isn’t required. For additional information, see our documentation for Residential and Shared Datacenter Proxies.

On the other hand, Dedicated Datacenter Proxies don’t offer built-in rotation functionality, but you can use our Proxy Rotator to set it up quickly.

As an alternative, you can rotate proxies with Python and aiohttp itself. Although the library has no built-in rotation feature, you can rotate proxy servers using the following two techniques.

Selecting a random proxy from the list

The most straightforward way to rotate proxies is to keep a list of proxies and randomly select one from it.

If you have a list of proxies, the following code will allow you to rotate the proxies for every web request:

import asyncio
import random
import aiohttp

proxy_list = [
    'http://username:password@PROXY_ADDRESS_1:60000',
    'http://username:password@PROXY_ADDRESS_2:60000',
    # ... more proxy endpoints ...
    'http://username:password@PROXY_ADDRESS_N:60000'
]

proxy = random.choice(proxy_list)

async def get_response_using_proxy():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                'https://ip.oxylabs.io/',
                proxy=proxy
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())

asyncio.run(get_response_using_proxy())

The above code creates a list of proxy endpoints, each with a username and password. It then uses the random.choice() method to randomly select one proxy endpoint from proxy_list and sends the request through that selected proxy server endpoint.

Note that random.choice() can pick the same proxy endpoint multiple times, so a particular proxy address may serve several requests in a row; a small workaround is sketched below.
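If consecutive requests must not share an endpoint, a small helper can re-draw until a different proxy comes up. This pick_proxy() function is a hypothetical addition, not part of the original example:

import random

def pick_proxy(proxy_list, last_proxy=None):
    # Re-draw whenever the choice matches the previously used proxy,
    # so two consecutive requests never share an endpoint
    proxy = random.choice(proxy_list)
    while len(proxy_list) > 1 and proxy == last_proxy:
        proxy = random.choice(proxy_list)
    return proxy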

Iterating over a proxy list

The previous proxy rotation approach is non-deterministic: there’s no telling which proxy will serve a given request. Let’s use a more predictable, round-robin-style rotation strategy instead.

With this method, you’ll build a list of proxy endpoints and cycle through the list indices in order. Whenever the index would run past the end of the list, modular arithmetic maps the next value of i back to index 0, and this continues until all iterations of the for loop have completed.

The following code demonstrates this concept:

import asyncio
import aiohttp

proxy_list = [
    'http://username:password@PROXY_ADDRESS_1:60000',
    'http://username:password@PROXY_ADDRESS_2:60000',
    # ... more proxy endpoints ...
    'http://username:password@PROXY_ADDRESS_N:60000'
]

async def get_response_using_proxy(target_url, proxy):
    async with aiohttp.ClientSession() as session:
        async with session.get(
                target_url,
                proxy=proxy,
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())

number_of_requests = 10
length = len(proxy_list)
for i in range(number_of_requests):
    index = i % length
    asyncio.run(get_response_using_proxy('https://ip.oxylabs.io/', proxy_list[index]))

The for loop sends as many HTTP requests as number_of_requests specifies. Note how the index is computed on each cycle: the expression i % length keeps the index within the bounds of the proxy list, mapping every value of i into the range 0 to length - 1 and producing a smooth proxy rotation.
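As a side note, the standard library’s itertools.cycle can express the same round-robin rotation without manual index arithmetic. Here’s a minimal sketch reusing the proxy_list and get_response_using_proxy() defined above:

import itertools

proxy_cycle = itertools.cycle(proxy_list)  # repeats the list endlessly, in order
for _ in range(number_of_requests):
    asyncio.run(get_response_using_proxy('https://ip.oxylabs.io/', next(proxy_cycle)))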

How to reuse proxies

With proxy rotation, we selected a different proxy for each request sent by aiohttp. Alternatively, we can reuse the same proxy until the website blocks it and only then switch to another one.

Let’s see an example:

import asyncio
import aiohttp

proxy_list = [
    'http://username:password@PROXY_ADDRESS_1:60000',
    'http://username:password@PROXY_ADDRESS_2:60000',
    # ... more proxy endpoints ...
    'http://username:password@PROXY_ADDRESS_N:60000'
]

async def get_response_using_proxy(target_url, proxy):
    async with aiohttp.ClientSession() as session:
        async with session.get(
                target_url,
                proxy=proxy
        ) as response:
            print('Status Code: ', response.status)
            print('Body: ', await response.text())
            return response.status

index = 0
number_of_requests = 10
length = len(proxy_list)
for _ in range(number_of_requests):
    status_code = asyncio.run(get_response_using_proxy('https://ip.oxylabs.io/location', proxy_list[index]))
    if status_code != 200:
        index = index + 1  # selecting new proxy index
        index = index % length  # taking index within the proxy list size
    else:
        continue  # to reuse the same proxy

In this code, the for loop keeps sending requests through a single proxy. When it receives a status code other than 200 (the success code), it advances the index and switches to the next proxy in the list.

Conclusion

Any scraping task performs considerably better when combined with proxies. One of the simpler approaches is to integrate proxies with the aiohttp library in Python, which lets you scrape at a greater scale without IP blocks or geo-restrictions.

Please feel free to contact us via email or live chat if you need assistance or have any questions.

Please be aware that this is a third-party tool not owned or controlled by Oxylabs. Each third-party provider is responsible for its own software and services. Consequently, Oxylabs will have no liability or responsibility to you regarding those services. Please carefully review the third party's policies and practices and/or conduct due diligence before accessing or using third-party services.

Frequently Asked Questions

What is aiohttp used for?

Python's aiohttp library can be used to create asynchronous web services and applications. Compared to conventional synchronous web development frameworks, it enables developers to build fast, scalable applications that can handle high levels of concurrency and process more requests per second.

Is aiohttp better than Requests?

Requests is a straightforward and user-friendly library primarily used for sending HTTP requests to servers. It has an easy-to-use API to handle HTTP requests and responses, and it’s simple yet soundly constructed.

On the other hand, aiohttp is a complete package for creating asynchronous web services and applications. It offers more capabilities, such as middlewares, HTTP client and server components, RESTful services, and WebSocket support for both clients and servers.

Is asyncio better than threads?

Depending on the project requirements, asyncio or threads may be preferable. Asyncio is a suitable option for I/O-bound applications since it offers efficient and scalable concurrency without multiple threads.

For CPU-bound applications, several CPU cores can be put to work with threads or, in CPython, more commonly with multiple processes because of the GIL. Hence, it’s crucial to select the strategy that best suits your objectives.

Why is asyncio not thread-safe?

Although asyncio isn’t intrinsically thread-safe, this isn’t always a drawback. The main reason is that asyncio was designed to offer effective and scalable concurrency through asynchronous programming on a single thread rather than through multiple threads.

Is aiohttp a framework?

Aiohttp is a library for creating asynchronous web applications and services rather than a full-fledged framework. It offers a straightforward and understandable API for building HTTP clients and server components, handling requests and responses, and managing WebSockets. Additionally, it provides middlewares, connection management, integration with other libraries, and other valuable capabilities.
