Best practices

  • Rely on allow_redirects=True (the default for most request methods) to handle HTTP redirects automatically, unless your workflow specifically requires manual redirect handling.

  • When disabling redirects with allow_redirects=False, always check the response's status code and headers to decide how to handle the next step.

  • Use a requests.Session() object to keep cookies and connection settings consistent across requests when manually handling redirects.

  • When manually following redirects, validate the Location header to ensure the URL is a valid redirection target before making a subsequent request; a minimal validation sketch follows the examples below.

# pip install requests
import requests


# Example 1: Default behavior (follow redirects automatically)
response = requests.get("https://httpbin.org/redirect/3")
print(response.url) # Prints the final URL after redirects


# Example 2: Disable following redirects
response_no_redirect = requests.get(
    "https://httpbin.org/redirect/3",
    allow_redirects=False
)
# Prints 302 or 301, which are typical redirect codes
print(response_no_redirect.status_code)


# Example 3: Manually handle redirects
from urllib.parse import urljoin

session = requests.Session()
response_manual = session.get(
    "https://httpbin.org/redirect/3",
    allow_redirects=False
)
while 300 <= response_manual.status_code < 400:
    # The Location header may be relative, so resolve it against the current URL
    redirect_url = urljoin(response_manual.url, response_manual.headers["Location"])
    # Keep allow_redirects=False so each hop is handled explicitly
    response_manual = session.get(redirect_url, allow_redirects=False)
# Prints the final URL after manual redirect handling
print(response_manual.url)
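
Building on the last best practice above, the snippet below is a minimal sketch of validating a redirect target before following it manually. The allowed_hosts set and the is_safe_redirect() helper are illustrative names, not part of the requests library; adapt the checks to your own policy.

# Validate the redirect target before following it (illustrative sketch)
from urllib.parse import urljoin, urlparse

import requests

allowed_hosts = {"httpbin.org"}  # hosts you are willing to be redirected to

def is_safe_redirect(url):
    parsed = urlparse(url)
    # Require an HTTP(S) scheme and a host on the allowlist
    return parsed.scheme in ("http", "https") and parsed.netloc in allowed_hosts

session = requests.Session()
response = session.get("https://httpbin.org/redirect/1", allow_redirects=False)
if 300 <= response.status_code < 400:
    target = urljoin(response.url, response.headers["Location"])
    if is_safe_redirect(target):
        response = session.get(target)
    else:
        print(f"Refusing to follow redirect to {target}")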

Common issues

  • Ensure that the URL in the Location header is absolute, or convert it to an absolute URL (for example, with urllib.parse.urljoin) before following a redirect manually.

  • Monitor the number of redirects with a counter to avoid infinite redirect loops, which can occur with faulty server configurations; requests' built-in limit is also noted after the counter example below.

  • For debugging, log each URL visited during the redirect process to trace the path and identify potential issues; response.history offers the same information when redirects are followed automatically, as shown after the logging example below.

  • When handling redirects manually, account for how different status codes treat the HTTP method: clients typically switch POST to GET on 301 and 302, while 307 and 308 preserve the original method (see the note after the last example below).

# pip install requests
import requests


# Incorrect: Assuming "Location" header contains an absolute URL
session = requests.Session()
response = session.get(
    "https://httpbin.org/redirect/3",
    allow_redirects=False
)
if response.status_code in (301, 302):
    next_url = response.headers["Location"]
    # next_url may be a relative path (e.g. /get) that a new request cannot use directly
    print(next_url)

# Correct: Ensure the URL is absolute before redirecting
from urllib.parse import urljoin
session = requests.Session()
response = session.get(
    "https://httpbin.org/redirect/3",
    allow_redirects=False
)
if response.status_code in (301, 302):
    next_url = urljoin(response.url, response.headers["Location"])
    print(next_url)


# Incorrect: Not monitoring the number of redirects, risk of infinite loop
session = requests.Session()
response = session.get("https://httpbin.org/redirect/10", allow_redirects=False)
while response.status_code in (301, 302):
    response = session.get(
        "https://httpbin.org" + response.headers["Location"],
        allow_redirects=False
    )
    print(response.url)

# Correct: Use a counter to avoid infinite redirect loops
session = requests.Session()
response = session.get("https://httpbin.org/redirect/10", allow_redirects=False)
max_redirects = 3
redirect_count = 0
while response.status_code in (301, 302) and redirect_count < max_redirects:
    response = session.get(
        "https://httpbin.org" + response.headers["Location"],
        allow_redirects=False
    )
    redirect_count += 1
    print(f"{redirect_count}, {response.url}")


# Incorrect: Not logging the URLs visited during redirects
session = requests.Session()
response = session.get("https://httpbin.org/redirect/2", allow_redirects=False)
while response.status_code in (301, 302):
    response = session.get(
        "https://httpbin.org" + response.headers["Location"],
        allow_redirects=False
    )
    print(response.status_code)

# Correct: Log each URL to trace the redirect path
import logging
logging.basicConfig(level=logging.DEBUG)
session = requests.Session()
response = session.get("https://httpbin.org/redirect/2", allow_redirects=False)
while response.status_code in (301, 302):
    logging.debug("Redirecting to %s", response.headers["Location"])
    response = session.get(
        "https://httpbin.org" + response.headers["Location"],
        allow_redirects=False
    )
    print(response.status_code)
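
If the goal is only to trace the redirect path rather than control each hop, requests already records every intermediate response in response.history when it follows redirects automatically, so the chain can be logged without a manual loop.

# Alternative: inspect the redirect chain recorded by requests
import requests

response = requests.get("https://httpbin.org/redirect/2")
for hop in response.history:
    print(hop.status_code, hop.url)  # each intermediate redirect response
print(response.status_code, response.url)  # the final response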


# Incorrect: Ignoring the HTTP method during manual redirect handling
session = requests.Session()
response = session.post("https://httpbin.org/redirect/2", allow_redirects=False)
if response.status_code in (301, 302):
    # Changes POST to GET (the Location path is relative, so prepend the host)
    response = session.get("https://httpbin.org" + response.headers["Location"])

# Correct: Preserve the HTTP method across redirects
from urllib.parse import urljoin
session = requests.Session()
response = session.post("https://httpbin.org/redirect/2", allow_redirects=False)
method = response.request.method
while response.status_code in (301, 302, 307, 308):
    next_url = urljoin(response.url, response.headers["Location"])
    # Re-issue the request with the original method instead of letting it become GET
    response = session.request(method, next_url, allow_redirects=False)
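
As a side note, HTTP itself distinguishes these cases: clients may switch POST to GET on 301 and 302, while 307 and 308 require the original method and body to be preserved. The sketch below illustrates the difference using httpbin's /redirect-to endpoint and its status_code query parameter (an httpbin feature, not part of requests).

# Sketch: requests switches POST to GET on 302 but preserves it on 307
import requests

resp_302 = requests.post(
    "https://httpbin.org/redirect-to",
    params={"url": "/anything", "status_code": 302},
    data={"key": "value"},
)
print(resp_302.request.method)  # GET - the method was changed on 302

resp_307 = requests.post(
    "https://httpbin.org/redirect-to",
    params={"url": "/anything", "status_code": 307},
    data={"key": "value"},
)
print(resp_307.request.method)  # POST - the method was preserved on 307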
