Best practices

  • Ensure the requests library is up-to-date to avoid any compatibility issues with websites and to use the latest features and security fixes.

  • Use soup.find_all('a', href=True) to return only the <a> tags that actually carry an href attribute, which is more efficient than fetching all anchors and filtering them afterwards.

  • Validate and clean the extracted URLs to handle relative paths and possible malformed URLs properly.

  • Consider using requests.Session() to make multiple requests to the same host, as it can reuse underlying TCP connections, which is more efficient.
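Taken together, the practices above can be sketched as follows. This is a minimal example, not a production scraper; the target URL is a placeholder:

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

BASE_URL = "https://example.com"  # placeholder target site

# A Session reuses the underlying TCP connection for repeated
# requests to the same host.
session = requests.Session()

response = session.get(BASE_URL, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

links = []
# href=True skips <a> tags that have no href attribute at all.
for tag in soup.find_all("a", href=True):
    # urljoin resolves relative paths against the base URL.
    absolute = urljoin(BASE_URL, tag["href"])
    # Keep only well-formed http(s) URLs.
    parsed = urlparse(absolute)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        links.append(absolute)

print(links)
```

Resolving every href with urljoin before validating it means relative paths like /about and absolute URLs are handled by the same code path.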


Common issues

  • Ensure your script handles exceptions like requests.exceptions.RequestException to manage potential connectivity issues or HTTP errors gracefully.

  • Set a User-Agent in the header of your request to mimic a browser visit, which can help avoid being blocked by some websites that restrict script access.

  • Utilize response.raise_for_status() after your requests.get() call to immediately catch HTTP errors and handle them before parsing the HTML.

  • Incorporate a timeout in your requests.get() call to avoid hanging indefinitely if the server does not respond.


Try Oxylabs' Proxies & Scraper API

  • Residential Proxies (Self-Service): human-like scraping without IP blocking. From 8.

  • Datacenter Proxies (Self-Service): fast and reliable proxies for cost-efficient scraping. From 1.2.

  • Web Scraper API (Self-Service): public data delivery from a majority of websites. From 49.

Useful resources

  • Guide to Using Google Sheets for Basic Web Scraping (Vytenis Kaubrė, 2025-07-18)

  • Automated Web Scraper With Python AutoScraper [Guide] (Dovydas Vėsa, 2024-05-06)

  • 15 Tips on How to Crawl a Website Without Getting Blocked (Adelina Kiskyte, 2024-03-15)