Best practices

  • Always set the Secure attribute on cookies that contain sensitive information so that they are only sent over HTTPS.

  • Set the HttpOnly attribute on cookies to prevent access via JavaScript, which limits what a cross-site scripting (XSS) attack can steal.

  • Utilize session cookies for data that should only persist during an active session to minimize data exposure risks.

  • Implement expiration dates for persistent cookies to manage how long data is stored on the user's device, aiding in privacy control.

# pip install requests
import requests

# Send a GET request to the website
response = requests.get('https://en.wikipedia.org/wiki/Roman_Empire')


# Extract cookies from the response
cookies = response.cookies
# Print all cookies
print(f"All Cookies:\n{cookies}\n")


# Access specific types of cookies
session_cookies = [cookie for cookie in cookies if cookie.expires is None]
persistent_cookies = [cookie for cookie in cookies if cookie.expires is not None]

# Print session cookies (expire with the session)
print(f"Session Cookies:\n{session_cookies}\n")
# Print persistent cookies (have an expiration date)
print(f"Persistent Cookies:\n{persistent_cookies}\n")


# Check for Secure cookies (transmitted over HTTPS)
secure_cookies = [cookie for cookie in cookies if cookie.secure]
# Print secure cookies
print(f"Secure Cookies:\n{secure_cookies}\n")


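# Check for HttpOnly cookies (not accessible via JavaScript)
# The cookie jar keeps non-standard attribute names as the server sent them,
# so check the common spellings rather than only 'HttpOnly'
httponly_cookies = [
    cookie for cookie in cookies
    if cookie.has_nonstandard_attr('HttpOnly') or cookie.has_nonstandard_attr('httponly')
]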
# Print HttpOnly cookies
print(f"HttpOnly Cookies:\n{httponly_cookies}\n")
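
The snippet above only inspects cookies that a server has already set, while the best practices describe how cookies should be set in the first place. As a rough sketch of the server side, the example below builds `Set-Cookie` header values with Python's standard `http.cookies` module. The helper name `build_set_cookie` and the cookie names and values are assumptions made for illustration, not part of any particular framework.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, max_age=None):
    """Return a Set-Cookie header value with Secure and HttpOnly enabled.

    Hypothetical helper for illustration only.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["secure"] = True      # only sent over HTTPS
    cookie[name]["httponly"] = True    # hidden from JavaScript
    cookie[name]["path"] = "/"
    if max_age is not None:
        # Persistent cookie: the browser keeps it for max_age seconds
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()


# Session cookie: no Max-Age or Expires, so it ends with the browser session
session_header = build_set_cookie("session_id", "abc123")

# Persistent cookie with an explicit 30-day lifetime
persistent_header = build_set_cookie("theme", "dark", max_age=30 * 24 * 3600)

print(session_header)
print(persistent_header)
```

Leaving out `max_age` produces a session cookie, which matches the advice to keep short-lived data out of persistent storage.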

Common issues

  • Ensure that the domain and path attributes of cookies are correctly set to restrict their scope and prevent them from being sent to unintended locations.

  • Regularly update and validate the expiration settings of persistent cookies to reflect changes in privacy policy and user preferences.

  • Use the Secure flag together with the HttpOnly flag to guard against both interception in transit and cookie theft through client-side scripts.

  • Review and periodically clean up the session and persistent cookies to avoid unnecessary data retention and potential compliance issues.
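
To make the first point concrete, the sketch below shows how the domain and path attributes decide whether a cookie is attached to a request. It works offline: it prepares requests without sending them and asks `requests.cookies.get_cookie_header` which `Cookie` header the jar would produce. The helper `cookie_header_for`, the `/secure` path, and the example URLs are assumptions chosen for illustration.

```python
import requests
from requests.cookies import RequestsCookieJar, get_cookie_header

# A cookie scoped to one host and one path prefix
jar = RequestsCookieJar()
jar.set('user_id', '12345', domain='en.wikipedia.org', path='/secure')


def cookie_header_for(url):
    """Return the Cookie header the jar would send to url (None if nothing matches).

    Hypothetical helper: the request is prepared but never sent.
    """
    prepared = requests.Request('GET', url).prepare()
    return get_cookie_header(jar, prepared)


# Domain and path both match, so the cookie is sent
in_scope = cookie_header_for('https://en.wikipedia.org/secure/page')

# Same domain, but the path is outside /secure
wrong_path = cookie_header_for('https://en.wikipedia.org/wiki/Roman_Empire')

# Matching path, but a different domain
wrong_domain = cookie_header_for('https://example.org/secure/page')

print(in_scope)      # user_id=12345
print(wrong_path)    # None
print(wrong_domain)  # None
```

A cookie passed as a plain dictionary carries no such scope, which is why the jar-based form is the safer default when a session talks to more than one host.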

import requests
from datetime import datetime, timedelta

# Send a GET request to the website
response = requests.get('https://en.wikipedia.org/wiki/Roman_Empire')

# Extract cookies from the response
cookies = response.cookies


# Bad: Not specifying domain and path when setting cookies
# When setting cookies in a request:
cookies_dict = {'user_id': '12345'}  # This would be used like: requests.get(url, cookies=cookies_dict)

# Good: Using cookie jar with proper domain and path settings
jar = requests.cookies.RequestsCookieJar()
jar.set('user_id', '12345', domain='en.wikipedia.org', path='/secure')
# Then use: requests.get(url, cookies=jar)


# Bad: Using outdated expiration for cookies
# Example of setting a cookie with expired date (not recommended)
expired_jar = requests.cookies.RequestsCookieJar()
# Using timestamp (seconds since epoch) for Jan 1, 1970 (0)
expired_jar.set('user_session', 'abcd', expires=0)

# Good: Set appropriate expiration date reflecting current policies
expiration_date = datetime.now() + timedelta(days=90)
good_jar = requests.cookies.RequestsCookieJar()
# Convert datetime to timestamp (seconds since epoch)
good_jar.set('user_session', 'abcd', expires=int(expiration_date.timestamp()))


# Bad: Setting cookies without Secure or HttpOnly flags
insecure_jar = requests.cookies.RequestsCookieJar()
insecure_jar.set('auth_token', 'secure123')

# Good: Use Secure and HttpOnly flags to enhance cookie security
secure_jar = requests.cookies.RequestsCookieJar()
# secure is a regular keyword argument; HttpOnly is a non-standard attribute,
# so it is passed through the `rest` dict that create_cookie() accepts
secure_jar.set('auth_token', 'secure123', secure=True, rest={'HttpOnly': True})


# Bad: Keeping session cookies indefinitely without review
session_cookies = [cookie for cookie in cookies if cookie.expires is None]

# Good: Periodically review and clean up session cookies
# Helper functions used by the cleanup loop below

def is_necessary(cookie):
    # Example logic: consider cookies with certain names as necessary
    necessary_cookie_names = ['auth_token', 'user_session', 'GeoIP']
    return cookie.name in necessary_cookie_names or 'auth' in cookie.name.lower()

def delete_cookie(cookie_jar, cookie):
    if cookie.name in cookie_jar:
        cookie_jar.clear(domain=cookie.domain, path=cookie.path, name=cookie.name)
        print(f"Cookie '{cookie.name}' has been deleted.")

# Implementation of the cookie cleanup logic
cookie_jar = requests.cookies.RequestsCookieJar()

for cookie in cookies:
    cookie_jar.set_cookie(cookie)

for cookie in session_cookies:
    if not is_necessary(cookie):
        delete_cookie(cookie_jar, cookie)
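
The cleanup above only looks at session cookies. Reviewing persistent cookies against a retention policy could look roughly like the sketch below, which sorts a jar's cookies into session, expired, over-long, and acceptable groups. The 90-day limit, the `audit_jar` helper, and the sample cookie names are assumptions for illustration; the timestamp is fixed so the result is reproducible.

```python
import time

from requests.cookies import RequestsCookieJar

# Hypothetical policy limit: persistent cookies may live at most 90 days
MAX_LIFETIME_SECONDS = 90 * 24 * 3600


def audit_jar(jar, now=None):
    """Group cookie names by expiration status relative to `now` (epoch seconds)."""
    now = time.time() if now is None else now
    report = {"session": [], "expired": [], "too_long": [], "ok": []}
    for cookie in jar:
        if cookie.expires is None:
            report["session"].append(cookie.name)
        elif cookie.expires <= now:
            report["expired"].append(cookie.name)
        elif cookie.expires - now > MAX_LIFETIME_SECONDS:
            report["too_long"].append(cookie.name)
        else:
            report["ok"].append(cookie.name)
    return report


# Fixed reference time so the example is deterministic
now = 1_700_000_000

jar = RequestsCookieJar()
jar.set("sid", "a")                                       # session cookie
jar.set("old", "b", expires=now - 60)                     # already expired
jar.set("tracker", "c", expires=now + 365 * 24 * 3600)    # exceeds the policy
jar.set("prefs", "d", expires=now + 30 * 24 * 3600)       # within the policy

report = audit_jar(jar, now=now)
print(report)
```

In a live session, omit the `now` argument so the audit uses the current time; the jar's `clear_expired_cookies()` method can then remove anything reported as expired.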
