Best practices

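  • Use wait_until='networkidle' in the page.goto() method when the page keeps fetching data after the load event. Playwright considers the network idle once there have been no network connections for at least 500 ms. The Playwright documentation discourages relying on this state, so prefer waiting for a specific element where possible.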

  • Set wait_until='domcontentloaded' to wait only for the HTML document to be fully loaded and parsed, which is faster when you don't need to wait for stylesheets, images, and subframes to finish loading.

  • Opt for wait_until='load' (the default) when you need the whole page, including dependent resources such as stylesheets and images, to finish loading. Content fetched by JavaScript after the load event is not covered.

  • Regularly update Playwright to leverage improvements and new features in handling page load strategies.

# pip install playwright
from playwright.sync_api import sync_playwright


url = 'https://sandbox.oxylabs.io/products'

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    # Navigate to the URL and wait until network is idle
    page.goto(url, wait_until='networkidle')
    print(page.title())

    # Alternative: wait until DOM is fully loaded
    page.goto(url, wait_until='domcontentloaded')
    print(page.title())

    # Another method: wait until everything is loaded
    page.goto(url, wait_until='load')
    print(page.title())
    
    browser.close()
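
Because relying on 'networkidle' is discouraged, one alternative is to navigate with the faster 'domcontentloaded' strategy and then wait for an element that signals the data has rendered. The following is a minimal sketch under that assumption, not a definitive implementation: goto_and_wait and main are hypothetical helper names, and the selector is the one used in the examples further down this page.

```python
def goto_and_wait(page, url, selector, timeout_ms=30_000):
    """Navigate to url, then wait until the first match of selector is visible.

    `page` is expected to be a Playwright sync Page (or any object exposing
    the same goto/locator/title methods).
    """
    # Return from goto() as soon as the HTML document has been parsed ...
    page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
    # ... then block until the content we actually need has rendered.
    page.locator(selector).first.wait_for(state="visible", timeout=timeout_ms)
    return page.title()


def main():
    # Imported here so the helper above can be reused or tested without a browser.
    from playwright.sync_api import sync_playwright  # pip install playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        title = goto_and_wait(
            page,
            "https://sandbox.oxylabs.io/products",
            ".in-stock, .out-of-stock",
        )
        print(title)
        browser.close()
```

Calling main() runs the sketch against the sandbox page. Both waits raise Playwright's TimeoutError if the page or the element does not appear within timeout_ms.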

Common issues

  • Ensure that your network conditions are stable and fast enough to avoid timeouts during page loads in Playwright.

  • If the page consistently takes longer to load than the default 30 seconds, pass a larger timeout (in milliseconds) to page.goto() to prevent premature termination.

  • Utilize page.wait_for_selector('your-selector') to wait for specific elements to appear on the page, ensuring that dynamic content is fully loaded.

  • Check for any JavaScript errors in the console after the page load that might indicate issues with complete page rendering.

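# The snippets below reuse the page and url objects from the example above.
from playwright.sync_api import TimeoutError as PlaywrightTimeoutError

# Incorrect: The timeout is in milliseconds, so 10 ms will almost always be exceeded
try:
    page.goto(url, timeout=10)
    print(page.title())
except PlaywrightTimeoutError:
    print('Timeout exceeded')

# Correct: Raising the timeout above the 30000 ms default to handle slow page loads
page.goto(url, timeout=60000)
print(page.title())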
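
When a single larger timeout is not enough, one option is to retry the navigation with progressively longer timeouts. The sketch below assumes that approach. goto_with_retries and the timeout values are hypothetical and not part of Playwright's API; only page.goto() and the TimeoutError class come from the library.

```python
try:
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
except ImportError:
    # Fallback so the helper stays importable (e.g. in unit tests that use a
    # fake page) in environments where Playwright is not installed.
    PlaywrightTimeoutError = TimeoutError


def goto_with_retries(page, url, timeouts_ms=(30_000, 60_000, 90_000)):
    """Call page.goto() once per timeout value; re-raise the last timeout."""
    if not timeouts_ms:
        raise ValueError("timeouts_ms must contain at least one value")

    last_error = None
    for attempt, timeout_ms in enumerate(timeouts_ms, start=1):
        try:
            # Return the navigation response as soon as one attempt succeeds.
            return page.goto(url, timeout=timeout_ms)
        except PlaywrightTimeoutError as error:
            last_error = error
            print(f"Attempt {attempt} timed out after {timeout_ms} ms")
    raise last_error
```

With a real page object this would be called as goto_with_retries(page, url). Errors other than timeouts (for example DNS failures) are deliberately not retried and propagate immediately.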


# Incorrect: Assuming page elements are available immediately after page.goto
page.goto(url)
# Might be None if the element loads dynamically
print(page.query_selector('.in-stock, .out-of-stock'))

# Correct: Waiting for a specific element to ensure it's loaded
page.goto(url)
page.wait_for_selector('.in-stock, .out-of-stock')
print(page.query_selector('.in-stock, .out-of-stock').text_content())


# Incorrect: Not checking for JavaScript errors which might affect page functionality
page.goto(url)

# Correct: Collecting uncaught JavaScript errors while the page loads.
# The listener must be registered before page.goto() is called.
errors = []
page.on('pageerror', lambda error: errors.append(str(error)))
page.goto(url)
if errors:
    print('JavaScript errors found:', errors)
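
Uncaught exceptions are not the only sign of incomplete rendering: messages logged with console.error and failed network requests can point to the same problem. As a rough sketch, the hypothetical helper below (collect_page_problems is not a Playwright function) registers listeners for Playwright's "console", "pageerror", and "requestfailed" page events and gathers what they report.

```python
def collect_page_problems(page):
    """Register listeners on page and return the dict they will fill in.

    Call this before page.goto(); events fired earlier are not captured.
    """
    problems = {"console_errors": [], "page_errors": [], "failed_requests": []}

    def on_console(message):
        # message.type is e.g. "log", "warning" or "error"
        if message.type == "error":
            problems["console_errors"].append(message.text)

    def on_page_error(error):
        # Uncaught exceptions thrown inside the page
        problems["page_errors"].append(str(error))

    def on_request_failed(request):
        # request.failure holds the failure text, or None if unavailable
        problems["failed_requests"].append((request.url, request.failure))

    page.on("console", on_console)
    page.on("pageerror", on_page_error)
    page.on("requestfailed", on_request_failed)
    return problems
```

A typical use would be problems = collect_page_problems(page), followed by page.goto(url) and then inspecting the three lists in problems.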


# Incorrect: Ignoring network conditions that might affect loading times
page.goto(url)
print(page.title())

# Correct: Adjusting settings based on network speed assumptions
slow_network = True
if slow_network:
    print('Slow network, adjusting settings.')
    # Raise the navigation timeout above the 30000 ms default for this page
    page.set_default_navigation_timeout(60000)
    page.goto(url)
    print(page.title())
else:
    page.goto(url)
    print(page.title())
