Best practices

  • Use `wait_until='networkidle'` in `page.goto()` to wait until there have been no network connections for at least 500 ms, ensuring all network requests have finished before proceeding.

  • Set `wait_until='domcontentloaded'` to wait only for the HTML document to be fully loaded and parsed, which is faster when you don't need to wait for stylesheets, images, and subframes to finish loading.

  • Opt for `wait_until='load'` when you need to ensure that the whole page, including all dependent resources, is fully loaded.

  • Regularly update Playwright to leverage improvements and new features in handling page load strategies.

from playwright.sync_api import sync_playwright

# Initialize Playwright and start a browser
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    # Navigate to the URL and wait until the network is idle
    page.goto('https://sandbox.oxylabs.io/products', wait_until='networkidle')

    # Alternative: wait only until the HTML document is loaded and parsed
    page.goto('https://sandbox.oxylabs.io/products', wait_until='domcontentloaded')

    # Another option: wait until the page and all its resources are loaded
    page.goto('https://sandbox.oxylabs.io/products', wait_until='load')

    # Perform actions or extract data
    # Example: print the page title
    print(page.title())

    # Close the browser
    browser.close()
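
If you need to apply the same wait strategy after the initial navigation, for example when a click triggers additional requests, Playwright also provides `page.wait_for_load_state()`. A minimal sketch reusing the sandbox URL above; the clicked element is a hypothetical example:

# Apply a load strategy after an action rather than only in page.goto()
page.goto('https://sandbox.oxylabs.io/products')
page.click('text=Next')  # hypothetical element that triggers more requests
page.wait_for_load_state('networkidle')  # also accepts 'load' or 'domcontentloaded'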

Common issues

  • Ensure that your network conditions are stable and fast enough to avoid timeouts during page loads in Playwright.

  • Increase the default timeout in `page.goto()` if your application consistently takes longer to load, to prevent premature termination.

  • Utilize `page.wait_for_selector(selector)` to wait for specific elements to appear on the page, ensuring that dynamic content is fully loaded.

  • Check the browser console for JavaScript errors after the page loads, since they can indicate that the page did not render or initialize completely.

# Incorrect: Using a very short timeout might lead to errors if the page takes longer to load
page.goto('https://example.com', timeout=1000)

# Correct: Increasing the timeout to handle slow page loads
page.goto('https://example.com', timeout=10000)

# Incorrect: Assuming page elements are available immediately after page.goto
page.goto('https://example.com')
button = page.query_selector('button#submit') # Might be None if the button loads dynamically

# Correct: Waiting for a specific element to ensure it's loaded
page.goto('https://example.com')
page.wait_for_selector('button#submit')
button = page.query_selector('button#submit')
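
# A sketch, not from the original article: recent Playwright versions also offer
# locators, which auto-wait for the element before acting on it
page.goto('https://example.com')
button = page.locator('button#submit')
button.wait_for()  # waits until the element is visible (the default state)
button.click()     # locator actions auto-wait and retry on their own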

# Incorrect: Not capturing JavaScript errors, which might affect page functionality
page.goto('https://example.com')

# Correct: Collecting uncaught JavaScript errors raised while the page loads
errors = []
page.on('pageerror', lambda error: errors.append(error))
page.goto('https://example.com')
if errors:
    print("JavaScript errors found:", errors)

# Incorrect: Ignoring network conditions that might affect loading times
page.goto('https://example.com')

# Correct: Adjusting settings based on network speed assumptions
# (slow_network is a placeholder flag for your own connection check)
if slow_network:
    page.set_extra_http_headers({'Cache-Control': 'no-cache'})
    page.goto('https://example.com', timeout=30000)
else:
    page.goto('https://example.com')
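
Rather than passing `timeout` to every call, you can also raise the defaults once per page. A minimal sketch, assuming the same example URL and selector as above:

# Set default timeouts once instead of repeating the timeout argument
page.set_default_navigation_timeout(60000)  # applies to page.goto() and other navigations
page.set_default_timeout(30000)             # applies to waits such as wait_for_selector()
page.goto('https://example.com')
page.wait_for_selector('button#submit')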
