1. Install Crawl4AI
Ensure you have installed Python 3.9 or later, which you can download from the official Python website. Once that's ready, install the main components of Crawl4AI using pip:
pip install crawl4ai
Once pip completes the installation, finalize the setup by running the command below:
crawl4ai-setup
It’ll install and verify Playwright browsers and system requirements to ensure your environment is ready for web crawling. See the Crawl4AI documentation to learn more.
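If you want to confirm which Crawl4AI version ended up installed, pip can print the package details:
pip show crawl4ai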
2. Make a test request
Before integrating proxies, let’s verify that the basic setup works correctly. The following script serves two purposes: it displays your current IP address and initializes a Chromium instance that will later handle proxies.
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.extraction_strategy import JsonXPathExtractionStrategy
from crawl4ai.async_configs import BrowserConfig


async def main():
    schema = {
        "name": "Check Your IP Address",
        "baseSelector": "//body",
        "fields": [
            {
                "name": "IP Address",
                "selector": "//pre",
                "type": "text"
            }
        ]
    }

    browser_config = BrowserConfig(headless=False)

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://ip.oxylabs.io/",
            config=CrawlerRunConfig(
                cache_mode=CacheMode.BYPASS,
                extraction_strategy=JsonXPathExtractionStrategy(schema)
            )
        )
        print(result.extracted_content)


if __name__ == "__main__":
    asyncio.run(main())
If any errors occur, please follow the Crawl4AI diagnostics instructions.
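At the time of writing, Crawl4AI also bundles a diagnostics command that checks your browser and environment setup; the exact checks it runs may differ between versions:
crawl4ai-doctor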
3. Configure proxies
To integrate proxies with Crawl4AI, pass a proxy_config dictionary to the BrowserConfig() class, specifying the proxy server address, username, and password.
Let’s see an example in action using free Datacenter Proxies, which you can claim by registering on the Oxylabs dashboard and creating a proxy user:
browser_config = BrowserConfig(
    proxy_config={
        "server": "https://dc.oxylabs.io:8001",
        "username": "user-USERNAME",
        "password": "PASSWORD123"
    }
)
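To verify that traffic is actually routed through the proxy, you can drop this configuration into the test script from step 2. Below is a minimal sketch that reuses the same IP-checking schema; user-USERNAME and PASSWORD123 are placeholders for your own proxy user credentials:
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.extraction_strategy import JsonXPathExtractionStrategy
from crawl4ai.async_configs import BrowserConfig


async def main():
    # Same schema as in step 2: grab the plain-text IP from the page body.
    schema = {
        "name": "Check Your IP Address",
        "baseSelector": "//body",
        "fields": [{"name": "IP Address", "selector": "//pre", "type": "text"}]
    }

    # Placeholder credentials: replace with your own proxy user.
    browser_config = BrowserConfig(
        proxy_config={
            "server": "https://dc.oxylabs.io:8001",
            "username": "user-USERNAME",
            "password": "PASSWORD123"
        }
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://ip.oxylabs.io/",
            config=CrawlerRunConfig(
                cache_mode=CacheMode.BYPASS,
                extraction_strategy=JsonXPathExtractionStrategy(schema)
            )
        )
        # The printed IP should now belong to the proxy, not your own network.
        print(result.extracted_content)


if __name__ == "__main__":
    asyncio.run(main())
If the printed IP still matches the one you saw in step 2, double-check the credentials and the server string.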
The following sections show the integration process for different Oxylabs proxies.
Residential and Mobile Proxies
{
    "server": "https://pr.oxylabs.io:7777",
    "username": "customer-USERNAME",
    "password": "PASSWORD123"
}
Our Residential and Mobile Proxies support HTTP, HTTPS, and SOCKS5 protocols, which you can specify in the server address, for example, socks5h://pr.oxylabs.io:7777.
Here, you can also configure your preferred geo-location. For instance, if you use de-pr.oxylabs.io combined with port 30000, you’ll get proxy IPs from Germany. Check out the Residential Proxies documentation and Mobile Proxies documentation to learn more.
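For example, a Germany-targeted configuration could look like this; customer-USERNAME and PASSWORD123 are placeholders for your own proxy user credentials:
browser_config = BrowserConfig(
    proxy_config={
        # The de- prefix and port 30000 select residential IPs in Germany.
        "server": "https://de-pr.oxylabs.io:30000",
        "username": "customer-USERNAME",
        "password": "PASSWORD123"
    }
)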
Datacenter Proxies
Enterprise Dedicated Datacenter Proxies
{
    "server": "http://1.2.3.4:60000",
    "username": "USERNAME",
    "password": "PASSWORD123"
}
Here, you have to use a specific IP address from your acquired proxy list and specify either the HTTP or SOCKS5 protocol. Take a look at our documentation to learn more.
Self-Service Dedicated Datacenter Proxies
{
    "server": "https://ddc.oxylabs.io:8001",
    "username": "user-USERNAME",
    "password": "PASSWORD123"
}
With self-service proxies, you can either use port 8000 for random proxy rotation or enter a specific port starting from 8001, where each port is sequentially mapped to an IP from your obtained proxy list. The supported protocols are HTTP, HTTPS, and SOCKS5. Check our documentation to find out more.
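For example, switching to random rotation only requires changing the port in the server string; the credentials below are placeholders:
browser_config = BrowserConfig(
    proxy_config={
        # Port 8000 picks a random IP on each connection; 8001 and up map to fixed IPs.
        "server": "https://ddc.oxylabs.io:8000",
        "username": "user-USERNAME",
        "password": "PASSWORD123"
    }
)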
Datacenter Proxies
{
    "server": "https://dc.oxylabs.io:8001",
    "username": "user-USERNAME",
    "password": "PASSWORD123"
}
In the case of free Datacenter Proxies and the pay-per-IP subscription, you can use port 8000 to enable proxy rotation. Proxy ports are also mapped to IPs sequentially, with port 8001 using the first available IP address in your proxy list.
For the pay-per-traffic subscription, port 8001 assigns a random IP that remains consistent throughout the session. To target a specific country, append the two-letter country code to the username string. For example, use user-USERNAME-country-DE to connect through a German proxy.
Datacenter Proxies support HTTP, HTTPS, and SOCKS5 protocols. See the documentation for more details.
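For example, with the pay-per-traffic subscription you could route the whole session through a German IP by extending the username; user-USERNAME and PASSWORD123 are placeholders for your own credentials:
browser_config = BrowserConfig(
    proxy_config={
        # Port 8001 keeps the same IP for the session; -country-DE limits it to Germany.
        "server": "https://dc.oxylabs.io:8001",
        "username": "user-USERNAME-country-DE",
        "password": "PASSWORD123"
    }
)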
ISP Proxies
{
    "server": "https://isp.oxylabs.io:8001",
    "username": "user-USERNAME",
    "password": "PASSWORD123"
}
ISP Proxies support HTTP, HTTPS, and SOCKS5 protocols and allow you to rotate IPs using port 8000 or pick specific IPs from your proxy list via sequentially mapped ports. For instance, port 8001 will always select the first IP in your list. Take a look at our documentation to learn more.
4. Configure custom proxy rotation
You can also implement custom proxy rotation logic in different ways. The simplest approach is to create an async function that picks a random proxy endpoint and returns it. You can then await this function and pass its result as the server value in the proxy_config parameter of the BrowserConfig() class. See the example shown below, which uses our free Datacenter Proxies:
import asyncio
import random

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.extraction_strategy import JsonXPathExtractionStrategy
from crawl4ai.async_configs import BrowserConfig


async def get_random_proxy():
    # Your custom proxy rotation logic.
    proxies = [
        "https://dc.oxylabs.io:8001",
        "https://dc.oxylabs.io:8002",
        "https://dc.oxylabs.io:8003",
        "https://dc.oxylabs.io:8004",
        "https://dc.oxylabs.io:8005",
    ]
    proxy = random.choice(proxies)
    return proxy


async def main():
    schema = {
        "name": "Check Your IP Address",
        "baseSelector": "//body",
        "fields": [
            {
                "name": "IP",
                "selector": "//pre",
                "type": "text"
            }
        ]
    }

    browser_config = BrowserConfig(
        proxy_config={
            "server": await get_random_proxy(),
            "username": "user-USERNAME",
            "password": "PASSWORD123"
        }
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://ip.oxylabs.io/",
            config=CrawlerRunConfig(
                cache_mode=CacheMode.BYPASS,
                extraction_strategy=JsonXPathExtractionStrategy(schema)
            )
        )
        print(result.extracted_content)


if __name__ == "__main__":
    asyncio.run(main())
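Note that BrowserConfig() is applied once, when the browser launches, so the script above selects a single proxy per run. If you need a different proxy for each crawl, one simple approach is to launch a new crawler per URL, as in the sketch below; the proxy list, URLs, and credentials are placeholders you’d replace with your own:
import asyncio
import random

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.async_configs import BrowserConfig

# Example pool of rotating datacenter endpoints; replace with your own list.
PROXIES = [
    "https://dc.oxylabs.io:8001",
    "https://dc.oxylabs.io:8002",
    "https://dc.oxylabs.io:8003",
]

URLS = ["https://ip.oxylabs.io/", "https://ip.oxylabs.io/"]


async def crawl_with_random_proxy(url):
    # A fresh browser is launched for every URL, each with its own proxy.
    browser_config = BrowserConfig(
        proxy_config={
            "server": random.choice(PROXIES),
            "username": "user-USERNAME",
            "password": "PASSWORD123"
        }
    )
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url=url,
            config=CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
        )
        # ip.oxylabs.io returns plain text, so the page content is just the IP.
        return result.markdown


async def main():
    for url in URLS:
        print(await crawl_with_random_proxy(url))


if __name__ == "__main__":
    asyncio.run(main())
Launching a browser per request adds overhead, so for larger jobs you may prefer the rotating port 8000 entry point described earlier instead.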
Wrapping up
While Crawl4AI offers the ability to make your scraping requests appear organic to websites, implementing Oxylabs premium proxies will help you overcome even the most stringent anti-scraping mechanisms. This approach will let you scale as much as needed without worrying about IP bans, CAPTCHAs, or geo-restricted content.
If you have any questions about Oxylabs proxies, don’t hesitate to contact our 24/7 support via live chat or email.
Please be aware that this is a third-party tool not owned or controlled by Oxylabs. Each third-party provider is responsible for its own software and services. Consequently, Oxylabs will have no liability or responsibility to you regarding those services. Please carefully review the third party's policies and practices and/or conduct due diligence before accessing or using third-party services.