Web scraping can be difficult because of obstacles such as CAPTCHAs and IP bans, so it pays to make the process as convenient as possible. One way to do so is to integrate proxies with a third-party web scraping application like Apify.
Using Apify with Oxylabs proxies
To ensure a smooth scraping process, it’s essential to integrate proxies. Not only will it keep you anonymous, but it’ll also help you avoid the above-mentioned technical challenges. With Oxylabs’ proxy solutions integrated with Apify, you can carry out your public web scraping project without hassle. You may use Datacenter or Residential Proxies, according to your preferences and the task at hand.
Let’s look at the exact steps for setting up Oxylabs’ proxies with Apify.
2. Navigate to the menu on the left and select Store:
3. Inside it, pick your desired tool depending on your scraping project goals. You can browse the categories or use the search.
4. In our example, we’ll select the Web Scraper actor.
5. In the Input section, click Basic configuration, where you’ll be able to enter your target URLs.
6. Scroll down to Proxy and browser configuration and locate the Proxy configuration section. Here, click Custom proxies to change the Apify proxy settings.
7. If you want to use Residential Proxies, in the Custom proxies section, enter your own Oxylabs sub-user credentials and other details, as shown in the example below.
Residential Proxies
Host: pr.oxylabs.io
Port: 7777
Username: your Oxylabs sub-user’s username
Password: your Oxylabs sub-user’s password
The final URL should look like this (just with your own sub-user credentials):
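As a minimal sketch, the four fields above can be assembled into a single proxy URL. It assumes the conventional scheme://username:password@host:port layout, which is not spelled out in this guide; the helper name build_proxy_url and the USERNAME and PASSWORD placeholders are purely illustrative. Check the documentation for the exact username format your account requires.

```python
from urllib.parse import quote


def build_proxy_url(host: str, port: int, username: str, password: str,
                    scheme: str = "http") -> str:
    """Assemble a proxy URL from the host, port, and sub-user credentials.

    Assumption: the proxy entry follows scheme://username:password@host:port.
    """
    # Percent-encode the credentials so characters such as '@' or ':' inside
    # a password cannot be mistaken for URL delimiters.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"


# Placeholder credentials; substitute your own sub-user's values.
residential_url = build_proxy_url("pr.oxylabs.io", 7777, "USERNAME", "PASSWORD")
print(residential_url)
```

The same helper works for the other proxy types described in this guide; only the host and port change.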
Note that you can also use country-specific entries. For instance, if you enter us-pr.oxylabs.io under Host and 10000 under Port, you’ll acquire a US exit node. To see the full list of country-specific entry nodes, or if you need a sticky session, please see our documentation.
Now, if your tasks need to use Datacenter Proxies, insert the details as described below:
Enterprise Dedicated Datacenter Proxies
Follow these instructions if you purchased Dedicated Datacenter Proxies via sales.
Host: A specific IP address (e.g., 1.2.3.4)
Port: 60000
Please see our Enterprise Datacenter Proxy documentation for more information. There, you’ll learn how to view the IP list, from which you can choose your preferred IP address.
Once again, the final URL should look like this except with your own sub-user credentials and your chosen IP address:
Self-Service Dedicated Datacenter Proxies
Follow these instructions if you purchased Dedicated Datacenter Proxies via the dashboard.
Host: ddc.oxylabs.io
Port: 8001
With Self-Service Dedicated Datacenter Proxies, the port number corresponds with the sequential number of the IP address from your acquired proxy list. Please refer to our documentation for more details.
Datacenter Proxies
Host: dc.oxylabs.io
Port: 8001
For the Pay-per-IP subscription, the port corresponds to the sequential number assigned to an IP address from the provided list. Hence, port 8001 uses the first IP address on your list. Refer to our documentation for further details.
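The port-to-IP relationship can be illustrated with a short sketch. It assumes the numbering continues linearly from the one case stated above (port 8001 for the first IP, so 8002 for the second, and so on); the helper names and the sample addresses, taken from the reserved 192.0.2.0/24 documentation range, are made up for illustration.

```python
from typing import Sequence

# Assumption: ports count up from 8001, so port = 8000 + position in the list.
BASE_PORT = 8000


def port_for_position(position: int) -> int:
    """Return the port for the IP at a 1-based position in the proxy list."""
    if position < 1:
        raise ValueError("position is 1-based and must be at least 1")
    return BASE_PORT + position


def ip_for_port(port: int, ip_list: Sequence[str]) -> str:
    """Return the IP address that a given port would select from the list."""
    position = port - BASE_PORT
    if not 1 <= position <= len(ip_list):
        raise ValueError(f"port {port} does not match any IP in the list")
    return ip_list[position - 1]


# Illustrative addresses only; use the list from your own dashboard.
sample_ips = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
print(port_for_position(1))            # 8001
print(ip_for_port(8003, sample_ips))   # 192.0.2.12
```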
For the Pay-per-traffic subscription, port 8001 randomly selects an IP address but remains consistent throughout a session. You can also specify geo-location, such as the US, in the user authentication string: user-USERNAME-country-US:PASSWORD. For more details, see our documentation.
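A minimal sketch of building that authentication string for a chosen country follows. The helper name geo_username is hypothetical, and the two-letter country code check is an assumption based on the single US example above; the documentation lists the values that are actually accepted.

```python
def geo_username(username: str, country: str) -> str:
    """Build the user part of the authentication string for one country."""
    code = country.strip().upper()
    # Assumption: countries are given as two-letter codes such as "US".
    if len(code) != 2 or not code.isalpha():
        raise ValueError("expected a two-letter country code such as 'US'")
    return f"user-{username}-country-{code}"


# Placeholder credentials; substitute your own.
auth_string = f"{geo_username('USERNAME', 'us')}:PASSWORD"
print(auth_string)  # user-USERNAME-country-US:PASSWORD
```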
ISP Proxies
Host: isp.oxylabs.io
Port: 8001
8. To finish the Apify proxy configuration process, click Save & Start.
9. Once the web scraping process is finished, you can access and preview the data or download it in your preferred format.
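The settings entered in the steps above can also be pictured as the actor's JSON input. The sketch below is only an approximation: the field names startUrls, proxyConfiguration, useApifyProxy, and proxyUrls are assumptions about how the actor describes its input, the helper web_scraper_input is hypothetical, and a real run would also need the actor's other required fields. Verify everything against the actor's own input schema before relying on it.

```python
import json
from typing import Iterable


def web_scraper_input(start_urls: Iterable[str], proxy_urls: Iterable[str]) -> dict:
    """Build an actor input that routes requests through custom proxies."""
    return {
        # Target URLs, as entered under Basic configuration.
        "startUrls": [{"url": url} for url in start_urls],
        # Custom proxies instead of the platform's built-in proxy service.
        "proxyConfiguration": {
            "useApifyProxy": False,
            "proxyUrls": list(proxy_urls),
        },
    }


# Placeholder target and credentials; substitute your own.
run_input = web_scraper_input(
    ["https://example.com"],
    ["http://USERNAME:PASSWORD@pr.oxylabs.io:7777"],
)
print(json.dumps(run_input, indent=2))
```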
And you’re done! Setting up your proxies is that easy. Still, remember to check your IP address before surfing the web to make sure you’re properly connected to the proxy server; you can do so at https://ip.oxylabs.io.
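The IP check can also be scripted. The following sketch uses only Python's standard library and assumes the proxy URL follows the scheme://username:password@host:port layout; the helper names are illustrative, and the example call is left commented out because it needs network access and real credentials.

```python
import urllib.request

IP_CHECK_URL = "https://ip.oxylabs.io"


def proxy_map(proxy_url: str) -> dict:
    """Route both HTTP and HTTPS traffic through the same proxy."""
    return {"http": proxy_url, "https": proxy_url}


def fetch_exit_ip(proxy_url: str, timeout: float = 10.0) -> str:
    """Return the response body of the IP-check page, fetched via the proxy."""
    handler = urllib.request.ProxyHandler(proxy_map(proxy_url))
    opener = urllib.request.build_opener(handler)
    with opener.open(IP_CHECK_URL, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# Example call (requires network access and your own credentials):
# print(fetch_exit_ip("http://USERNAME:PASSWORD@pr.oxylabs.io:7777"))
```

If the address returned differs from your own public IP, requests are leaving through the proxy.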
Conclusion
Integrating our proxy solutions with any of Apify’s actors gives your business a simple and convenient way to acquire valuable publicly available data. Although the process is quite straightforward, please feel free to contact Oxylabs’ support team via live chat or at support@oxylabs.io with any questions!
Please be aware that this is a third-party tool not owned or controlled by Oxylabs. Each third-party provider is responsible for its own software and services. Consequently, Oxylabs will have no liability or responsibility to you regarding those services. Please carefully review the third party's policies and practices and/or conduct due diligence before accessing or using third-party services.
Frequently asked questions
What is Apify?
Apify is a platform designed for automatic web data extraction on a large scale. The platform offers ready-to-use scraping tools for different websites and applications. As a customer, you choose your preferred scraper, enter a few necessary details, and run it; the scraper then returns the requested data in machine-readable formats like JSON and CSV.
How does Apify work?
The core purpose of Apify is to provide a platform and a set of tools that allow developers to build, run, and manage web scraping and automation tasks. In doing so, Apify streamlines the workflow for numerous scraping jobs.