How to follow redirects with Python Requests?

Learn how to handle URL redirects seamlessly using Python Requests in this concise tutorial. Master the techniques to ensure your data collection remains efficient and uninterrupted, even when faced with unexpected URL changes.

Best practices

  • Leave `allow_redirects=True` (the default for every method except HEAD) so that Requests follows HTTP redirects automatically, unless your workflow requires handling them manually.

  • When disabling redirects with `allow_redirects=False`, always check the response's status code and headers to handle the next steps appropriately.

  • Utilize a `requests.Session()` object to maintain consistent session parameters and cookies when manually handling redirects.

  • When manually following redirects, validate the `Location` header to ensure the URL is a valid redirection target before making a subsequent request.
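
The last point can be made concrete with a small check. The sketch below is one possible approach and not part of Requests itself: the helper name `validated_redirect_target` and the `allowed_hosts` parameter are hypothetical, and the policy (HTTP or HTTPS only, plus an optional host allow-list) is an assumption to adapt to your own workflow.

```python
from urllib.parse import urljoin, urlparse


def validated_redirect_target(current_url, location, allowed_hosts=None):
    """Return the absolute redirect URL if it passes basic checks, else None.

    Hypothetical helper: the checks are an illustrative policy,
    not something Requests enforces for you.
    """
    if not location:
        return None
    # Resolve a relative Location value against the URL that was requested
    target = urljoin(current_url, location)
    parsed = urlparse(target)
    # Only follow plain web URLs (rejects schemes such as javascript: or ftp:)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return None
    # Optionally restrict redirects to hosts you expect
    if allowed_hosts is not None and parsed.hostname not in allowed_hosts:
        return None
    return target
```

With redirects disabled, you could pass `response.url` and `response.headers.get('Location')` to this helper and only issue the next request when it returns a URL.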

import requests

# Example 1: Default behavior (follow redirects automatically)
response = requests.get('https://sandbox.oxylabs.io/products')
print(response.url) # Prints the final URL after redirects

# Example 2: Disable following redirects
response_no_redirect = requests.get('https://sandbox.oxylabs.io/products', allow_redirects=False)
print(response_no_redirect.status_code) # Prints 302 or 301, which are typical redirect codes

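# Example 3: Manually handle redirects
from urllib.parse import urljoin

session = requests.Session()
response_manual = session.get('https://sandbox.oxylabs.io/products', allow_redirects=False)
while 300 <= response_manual.status_code < 400 and 'Location' in response_manual.headers:
    # Resolve a relative Location against the URL that was just requested
    redirect_url = urljoin(response_manual.url, response_manual.headers['Location'])
    # Keep redirects disabled so that every hop passes through this loop
    response_manual = session.get(redirect_url, allow_redirects=False)
print(response_manual.url) # Prints final URL after manual redirect handling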
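
When redirects are followed automatically, Requests keeps the intermediate responses in `response.history`, ordered from oldest to newest. The sketch below summarizes that chain. The function name `describe_redirect_chain` is hypothetical, and it relies only on the `history`, `status_code`, and `url` attributes of a response.

```python
def describe_redirect_chain(response):
    """Return (status_code, url) pairs for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


# Usage with a real request (requires network access):
#   import requests
#   response = requests.get('https://sandbox.oxylabs.io/products')
#   for status, url in describe_redirect_chain(response):
#       print(status, url)
```

If no redirect took place, `response.history` is empty and the result contains a single pair for the final response.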

Common issues

  • Ensure that the URL in the `Location` header is absolute, or convert it to an absolute URL before following a redirect manually.

  • Monitor the number of redirects using a counter to avoid infinite redirect loops, which can occur in faulty server configurations.

  • For debugging, log each URL visited during the redirect process to trace the path and identify potential issues.

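  • When handling redirects manually, choose the HTTP method for the next request based on the status code: 307 and 308 require repeating the original method and body, 303 calls for a GET, and most clients also switch POST to GET after a 301 or 302.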

# Incorrect: Assuming 'Location' header contains an absolute URL
response_manual = session.get('https://example.com', allow_redirects=False)
if response_manual.status_code in (301, 302):
    next_url = response_manual.headers['Location']
    response_manual = session.get(next_url) # May fail if next_url is not absolute

# Correct: Ensure the URL is absolute before redirecting
from urllib.parse import urljoin
response_manual = session.get('https://example.com', allow_redirects=False)
if response_manual.status_code in (301, 302):
    next_url = urljoin(response_manual.url, response_manual.headers['Location'])
    response_manual = session.get(next_url)

# Incorrect: Not monitoring the number of redirects, risk of infinite loop
while response_manual.status_code in (301, 302):
    response_manual = session.get(response_manual.headers['Location'], allow_redirects=False)

# Correct: Use a counter to avoid infinite redirect loops
max_redirects = 10
redirect_count = 0
while response_manual.status_code in (301, 302) and redirect_count < max_redirects:
    # Keep redirects disabled so that each hop is counted here
    response_manual = session.get(response_manual.headers['Location'], allow_redirects=False)
    redirect_count += 1

# Incorrect: Not logging the URLs visited during redirects
while response_manual.status_code in (301, 302):
    response_manual = session.get(response_manual.headers['Location'], allow_redirects=False)

# Correct: Log each URL to trace the redirect path
import logging
logging.basicConfig(level=logging.DEBUG)
while response_manual.status_code in (301, 302):
    logging.debug(f"Redirecting to {response_manual.headers['Location']}")
    # Keep redirects disabled so that every hop is logged
    response_manual = session.get(response_manual.headers['Location'], allow_redirects=False)

# Incorrect: Ignoring the HTTP method during manual redirect handling
response_manual = session.post('https://example.com', allow_redirects=False)
if response_manual.status_code in (307, 308):
    response_manual = session.get(response_manual.headers['Location']) # Changes POST to GET, although 307 and 308 require the same method

# Correct: Preserve the HTTP method across 307 and 308 redirects
response_manual = session.post('https://example.com', allow_redirects=False)
method = response_manual.request.method
while response_manual.status_code in (307, 308):
    # Re-send the original body as well if the request had one
    response_manual = session.request(method, response_manual.headers['Location'], allow_redirects=False)
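
The four fixes above can be combined into a single helper. The following is a minimal sketch under stated assumptions, not a replacement for the redirect handling built into Requests: the name `follow_redirects`, the `RuntimeError` raised at the limit, and the simplified method rule (307 and 308 keep the method and body, other redirect codes switch to GET, HEAD stays HEAD) are all illustrative choices. The `session` argument is assumed to behave like `requests.Session`.

```python
import logging
from urllib.parse import urljoin

logger = logging.getLogger(__name__)

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(session, method, url, max_redirects=10, **kwargs):
    """Follow redirects one hop at a time and return (response, visited_urls).

    Hypothetical helper; `session` must offer request() like requests.Session.
    """
    method = method.upper()
    visited = [url]
    redirects = 0
    response = session.request(method, url, allow_redirects=False, **kwargs)
    while response.status_code in REDIRECT_CODES and 'Location' in response.headers:
        # Stop instead of looping forever on a misconfigured server
        if redirects >= max_redirects:
            raise RuntimeError(f"Exceeded {max_redirects} redirects: {visited}")
        # Resolve a relative Location against the URL that was just requested
        url = urljoin(response.url, response.headers['Location'])
        # Simplified rule (an assumption): only 307 and 308 keep method and body
        if response.status_code not in (307, 308) and method != 'HEAD':
            method = 'GET'
            kwargs.pop('data', None)
            kwargs.pop('json', None)
        logger.debug("Redirecting (%s) to %s", response.status_code, url)
        visited.append(url)
        redirects += 1
        response = session.request(method, url, allow_redirects=False, **kwargs)
    return response, visited


# Usage with a real session (requires network access):
#   import requests
#   response, visited = follow_redirects(requests.Session(), 'GET',
#                                        'https://sandbox.oxylabs.io/products')
```

Returning the list of visited URLs alongside the response keeps the redirect path available for debugging even when logging is turned off.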
