Best practices

  • Leave `allow_redirects=True` in place (it is the default for every method except HEAD) so that HTTP redirects are followed automatically, unless a specific workflow requires handling them manually.

  • When disabling redirects with `allow_redirects=False`, always check the response's status code and headers to handle the next steps appropriately.

  • Use a `requests.Session()` object so that cookies and session settings persist across the requests you make while handling redirects manually.

  • When manually following redirects, validate the `Location` header to ensure the URL is a valid redirection target before making a subsequent request.

import requests
from urllib.parse import urljoin

# Example 1: Default behavior (follow redirects automatically)
response = requests.get('https://sandbox.oxylabs.io/products')
print(response.url)  # Prints the final URL after redirects

# Example 2: Disable following redirects
response_no_redirect = requests.get('https://sandbox.oxylabs.io/products', allow_redirects=False)
print(response_no_redirect.status_code)  # Prints a 3xx code such as 301 or 302 if the server redirects

# Example 3: Manually handle redirects
session = requests.Session()
response_manual = session.get('https://sandbox.oxylabs.io/products', allow_redirects=False)
while response_manual.is_redirect:  # True for a redirect status that carries a Location header
    # Resolve a relative Location against the URL that produced it
    redirect_url = urljoin(response_manual.url, response_manual.headers['Location'])
    # Keep allow_redirects=False so that every hop passes through this loop
    response_manual = session.get(redirect_url, allow_redirects=False)
print(response_manual.url)  # Prints final URL after manual redirect handling
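
The last best practice, validating the `Location` header before following it, could be sketched roughly as below. This is only an illustration under stated assumptions: the function name `resolve_redirect_target` and the `allowed_hosts` allow-list are hypothetical and not part of Requests, and the checks shown (non-empty header, http/https scheme, optional host allow-list) are one possible definition of a "valid redirection target".

```python
from urllib.parse import urljoin, urlsplit


def resolve_redirect_target(current_url, location, allowed_hosts=None):
    """Return an absolute redirect URL, or raise ValueError if it looks unsafe.

    `allowed_hosts` is a hypothetical allow-list used only in this sketch;
    pass None to accept any host.
    """
    if not location or not location.strip():
        raise ValueError("Redirect response has no usable Location header")

    # Resolve relative targets such as '/login' against the current URL.
    target = urljoin(current_url, location.strip())
    parts = urlsplit(target)

    if parts.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported redirect scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"Redirect target has no host: {target!r}")
    if allowed_hosts is not None and parts.hostname not in allowed_hosts:
        raise ValueError(f"Redirect to unexpected host: {parts.hostname!r}")
    return target
```

In a manual loop, the result of `resolve_redirect_target(response_manual.url, response_manual.headers.get('Location'))` would replace the raw header value passed to `session.get`.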

Common issues

  • Ensure that the URL in the `Location` header is absolute, or convert it to an absolute URL before following a redirect manually.

  • Monitor the number of redirects using a counter to avoid infinite redirect loops, which can occur in faulty server configurations.

  • For debugging, log each URL visited during the redirect process to trace the path and identify potential issues.

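  • When handling redirects manually, let the status code decide the method of the follow-up request: 307 and 308 require the original method to be kept, 303 calls for a GET, and most clients (including Requests when it follows redirects automatically) also switch a POST to GET on 301 and 302.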

# Incorrect: Assuming 'Location' header contains an absolute URL
response_manual = session.get('https://example.com', allow_redirects=False)
if response_manual.status_code in (301, 302):
    next_url = response_manual.headers['Location']
    response_manual = session.get(next_url)  # Raises MissingSchema if next_url is relative, e.g. '/login'

# Correct: Ensure the URL is absolute before redirecting
from urllib.parse import urljoin
response_manual = session.get('https://example.com', allow_redirects=False)
if response_manual.status_code in (301, 302):
    next_url = urljoin(response_manual.url, response_manual.headers['Location'])
    response_manual = session.get(next_url)

# Incorrect: Not monitoring the number of redirects, risk of infinite loop
while response_manual.status_code in (301, 302):
    response_manual = session.get(response_manual.headers['Location'], allow_redirects=False)

# Correct: Use a counter to avoid infinite redirect loops
max_redirects = 10
redirect_count = 0
while response_manual.status_code in (301, 302) and redirect_count < max_redirects:
    next_url = urljoin(response_manual.url, response_manual.headers['Location'])
    # allow_redirects=False keeps each hop inside this loop so the counter sees it
    response_manual = session.get(next_url, allow_redirects=False)
    redirect_count += 1

# Incorrect: Not logging the URLs visited during redirects
while response_manual.status_code in (301, 302):
    response_manual = session.get(response_manual.headers['Location'], allow_redirects=False)

# Correct: Log each URL to trace the redirect path
import logging
logging.basicConfig(level=logging.DEBUG)
while response_manual.status_code in (301, 302):
    next_url = urljoin(response_manual.url, response_manual.headers['Location'])
    logging.debug("Redirecting from %s to %s", response_manual.url, next_url)
    response_manual = session.get(next_url, allow_redirects=False)

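# Incorrect: Ignoring the HTTP method during manual redirect handling
response_manual = session.post('https://example.com', allow_redirects=False)
if response_manual.status_code in (307, 308):
    response_manual = session.get(response_manual.headers['Location'])  # Changes POST to GET, which 307/308 do not permit

# Correct: Choose the method from the status code
method = response_manual.request.method
while response_manual.is_redirect:
    status = response_manual.status_code
    if (status == 303 and method != 'HEAD') or (status in (301, 302) and method == 'POST'):
        method = 'GET'  # Conventional for 303, and for POST on 301/302
    next_url = urljoin(response_manual.url, response_manual.headers['Location'])
    # 307 and 308 keep the original method; resend the original body with them as well
    response_manual = session.request(method, next_url, allow_redirects=False)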
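
The four fixes above (absolute URLs, a redirect counter, logging, and method handling) can be combined into a single helper. The following is a minimal sketch, not a definitive implementation: the name `follow_redirects`, its parameters, and the default limit of 10 are assumptions made for illustration, and `session` is only expected to behave like a `requests.Session` (it needs a `request(method, url, allow_redirects=..., **kwargs)` method returning an object with `status_code`, `headers`, and `url`).

```python
import logging
from urllib.parse import urljoin

logger = logging.getLogger(__name__)


def follow_redirects(session, method, url, max_redirects=10, **kwargs):
    """Follow redirects by hand and return (final_response, visited_urls).

    Hypothetical helper for illustration; `session` is assumed to behave
    like requests.Session.
    """
    method = method.upper()
    visited = [url]
    response = session.request(method, url, allow_redirects=False, **kwargs)

    while response.status_code in (301, 302, 303, 307, 308):
        location = response.headers.get("Location")
        if not location:
            break  # Nothing to follow; hand the 3xx response back as-is.
        if len(visited) > max_redirects:
            raise RuntimeError(f"Stopped after {max_redirects} redirects")

        # 303 becomes GET (HEAD stays HEAD); POST becomes GET on 301/302.
        # 307 and 308 keep the original method and body.
        switch_to_get = (
            (response.status_code == 303 and method != "HEAD")
            or (response.status_code in (301, 302) and method == "POST")
        )
        if switch_to_get and method != "GET":
            method = "GET"
            # The GET follow-up should not resend the original request body.
            kwargs.pop("data", None)
            kwargs.pop("json", None)

        # Resolve a relative Location against the URL that produced it.
        url = urljoin(response.url, location)
        logger.debug("%s redirect to %s", response.status_code, url)
        visited.append(url)
        response = session.request(method, url, allow_redirects=False, **kwargs)

    return response, visited
```

With a real session this might be called as `follow_redirects(requests.Session(), 'POST', 'https://example.com', data={'key': 'value'})`; the returned `visited` list gives the redirect path for debugging.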
