Always set the Secure attribute on cookies that contain sensitive information so they are sent only over HTTPS.
Set the HttpOnly attribute on cookies to block access from JavaScript, which limits cookie theft through cross-site scripting (XSS) attacks.
Utilize session cookies for data that should only persist during an active session to minimize data exposure risks.
Implement expiration dates for persistent cookies to manage how long data is stored on the user's device, aiding in privacy control.
# pip install requests
import requests

# Send a GET request to the website
response = requests.get('https://en.wikipedia.org/wiki/Roman_Empire')

# Extract cookies from the response
cookies = response.cookies

# Print all cookies
print(f"All Cookies:\n{cookies}\n")

# Access specific types of cookies
session_cookies = [cookie for cookie in cookies if cookie.expires is None]
persistent_cookies = [cookie for cookie in cookies if cookie.expires is not None]

# Print session cookies (expire with the session)
print(f"Session Cookies:\n{session_cookies}\n")

# Print persistent cookies (have an expiration date)
print(f"Persistent Cookies:\n{persistent_cookies}\n")

# Check for Secure cookies (transmitted over HTTPS)
secure_cookies = [cookie for cookie in cookies if cookie.secure]

# Print secure cookies
print(f"Secure Cookies:\n{secure_cookies}\n")

# Check for HttpOnly cookies (not accessible via JavaScript)
httponly_cookies = [cookie for cookie in cookies if cookie.has_nonstandard_attr('HttpOnly')]

# Print HttpOnly cookies
print(f"HttpOnly Cookies:\n{httponly_cookies}\n")
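The tips above are applied on the server side, in the Set-Cookie header; a client such as requests only reads the resulting attributes. As a rough illustration, the sketch below uses Python's standard `http.cookies` module to build such headers. The helper `build_set_cookie` and the cookie names and values are hypothetical, chosen only for this example.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, max_age=None):
    """Return a Set-Cookie header value with Secure and HttpOnly set.

    Hypothetical helper for illustration only. Leaving max_age as None
    produces a session cookie; an integer produces a persistent cookie
    that expires after that many seconds.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["secure"] = True      # send only over HTTPS
    morsel["httponly"] = True    # hide from client-side JavaScript
    morsel["path"] = "/"
    if max_age is not None:
        morsel["max-age"] = max_age
    return morsel.OutputString()


# Session cookie: no expiration, discarded when the session ends
session_header = build_set_cookie("session_id", "abc123")

# Persistent cookie: kept for 30 days
persistent_header = build_set_cookie("theme", "dark", max_age=30 * 24 * 3600)

print("Set-Cookie:", session_header)
print("Set-Cookie:", persistent_header)
```

A real web framework would normally expose these attributes through its own response API; the standard-library version is used here only to keep the example self-contained.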
Ensure that the domain and path attributes of cookies are correctly set to restrict their scope and prevent them from being sent to unintended locations.
Regularly update and validate the expiration settings of persistent cookies to reflect changes in privacy policy and user preferences.
Use the Secure flag together with the HttpOnly flag to guard against both interception in transit and cookie theft by client-side scripts.
Periodically review and clean up session and persistent cookies to avoid unnecessary data retention and potential compliance issues.
import requests
from datetime import datetime, timedelta

# Send a GET request to the website
response = requests.get('https://en.wikipedia.org/wiki/Roman_Empire')

# Extract cookies from the response
cookies = response.cookies

# Bad: Not specifying domain and path when setting cookies
cookies_dict = {'user_id': '12345'}
# This would be used like: requests.get(url, cookies=cookies_dict)

# Good: Using a cookie jar with explicit domain and path settings
jar = requests.cookies.RequestsCookieJar()
jar.set('user_id', '12345', domain='en.wikipedia.org', path='/secure')
# Then use: requests.get(url, cookies=jar)

# Bad: Using an outdated expiration for cookies
expired_jar = requests.cookies.RequestsCookieJar()
# expires is a Unix timestamp; 0 is Jan 1, 1970, so this cookie is already expired
expired_jar.set('user_session', 'abcd', expires=0)

# Good: Set an appropriate expiration date reflecting current policies
expiration_date = datetime.now() + timedelta(days=90)
good_jar = requests.cookies.RequestsCookieJar()
# Convert datetime to timestamp (seconds since epoch)
good_jar.set('user_session', 'abcd', expires=int(expiration_date.timestamp()))

# Bad: Setting cookies without Secure or HttpOnly flags
insecure_jar = requests.cookies.RequestsCookieJar()
insecure_jar.set('auth_token', 'secure123')

# Good: Use Secure and HttpOnly flags to enhance cookie security
secure_jar = requests.cookies.RequestsCookieJar()
# There is no HttpOnly keyword, so pass it as a non-standard attribute via rest
cookie = requests.cookies.create_cookie(
    name='auth_token', value='secure123', secure=True, rest={'HttpOnly': None}
)
secure_jar.set_cookie(cookie)

# Bad: Keeping session cookies indefinitely without review
session_cookies = [cookie for cookie in cookies if cookie.expires is None]

# Good: Periodically review and clean up session cookies
def is_necessary(cookie):
    # Example logic: consider cookies with certain names as necessary
    necessary_cookie_names = ['auth_token', 'user_session', 'GeoIP']
    return cookie.name in necessary_cookie_names or 'auth' in cookie.name.lower()

def delete_cookie(cookie_jar, cookie):
    # Remove the cookie identified by its domain, path, and name
    if cookie.name in cookie_jar:
        cookie_jar.clear(domain=cookie.domain, path=cookie.path, name=cookie.name)
        print(f"Cookie '{cookie.name}' has been deleted.")

# Copy the response cookies into a working jar, then drop unnecessary session cookies
cookie_jar = requests.cookies.RequestsCookieJar()
for cookie in cookies:
    cookie_jar.set_cookie(cookie)

for cookie in session_cookies:
    if not is_necessary(cookie):
        delete_cookie(cookie_jar, cookie)
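To see what restricting a cookie's scope does in practice, the following sketch uses the standard library's `http.cookiejar` (which `RequestsCookieJar` builds on) to check which cookies a jar would attach to a given URL. The domain `example.com`, the cookie names, and the helpers `make_cookie` and `cookie_header_for` are all hypothetical, and no network request is made.

```python
from http.cookiejar import Cookie, CookieJar
from urllib.request import Request


def make_cookie(name, value, domain, path="/", secure=False):
    """Build a session cookie scoped to a domain and path (hypothetical helper)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=secure, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookie_header_for(jar, url):
    """Return the Cookie header the jar would send to url, or None."""
    request = Request(url)          # built locally, never sent
    jar.add_cookie_header(request)  # applies domain, path, and Secure rules
    return request.get_header("Cookie")


jar = CookieJar()
# Narrow scope: only under /account, and only over HTTPS
jar.set_cookie(make_cookie("session_id", "abc", "example.com",
                           path="/account", secure=True))
# Broad scope: every path on the domain
jar.set_cookie(make_cookie("theme", "dark", "example.com"))

for url in (
    "https://example.com/account/settings",
    "https://example.com/blog",
    "http://example.com/account/",
    "https://other.example.org/account/",
):
    print(url, "->", cookie_header_for(jar, url))
```

In this setup the narrowly scoped `session_id` cookie is attached only to HTTPS requests under `/account`, while `theme` goes to every path on the same domain, and nothing is sent to the unrelated host.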