How to Set Up Proxies With Octoparse

In this tutorial, we’ll walk you through integrating Octoparse with Oxylabs Residential, Datacenter, and ISP Proxies so you can get started quickly and scrape smoothly.

What is Octoparse?

Octoparse is an easy-to-use data extraction tool. It lets you scrape public data without coding and bypass most anti-scraping mechanisms through automatic IP rotation and extended session times. Backed by machine learning algorithms, Octoparse quickly locates the data you click on. It handles complex websites and captures many kinds of data, including text, links, image URLs, and HTML code.

How to configure proxy settings in Octoparse

1. Download, install, and launch Octoparse.

2. To create a new task, click +New in the top-left corner and choose Custom Task.

Setting up a custom task

3. Enter the URL of the webpage you want to extract data from in the URL Input field and click Save. In this example, we’ll use the Oxylabs scraping sandbox.

Adding the target URL

4. After your selected URL loads, go to Task Settings > Anti-blocking. Now, check Access websites via proxies, enable Use my own proxies, and click Configure.

Task settings

Anti-blocking

Access websites via proxies > Use my own proxies > Configure

5. Enter your Oxylabs proxies in the following format:

IP/host:port:username:password
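
If you are pasting several proxies, a small script can help keep each line in this order. The following is a minimal sketch: `format_proxy_entry` is a hypothetical helper of our own, not part of Octoparse or Oxylabs tooling, and USERNAME and PASSWORD are placeholders. It also assumes that a colon inside any field would make the colon-separated entry ambiguous, so it rejects such values rather than guessing how Octoparse parses them.

```python
def format_proxy_entry(host: str, port: int, username: str, password: str) -> str:
    """Join the fields in the IP/host:port:username:password order.

    Hypothetical helper for illustration only.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # Assumption: a literal colon inside a field would shift the remaining
    # fields, so reject it instead of producing an ambiguous entry.
    for name, value in (("host", host), ("username", username), ("password", password)):
        if ":" in value:
            raise ValueError(f"{name} must not contain ':'")
    return f"{host}:{port}:{username}:{password}"


if __name__ == "__main__":
    # Placeholder credentials; replace them with your own proxy user.
    print(format_proxy_entry("pr.oxylabs.io", 7777, "USERNAME", "PASSWORD"))
    # Prints: pr.oxylabs.io:7777:USERNAME:PASSWORD
```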

Residential Proxies

IP/host: pr.oxylabs.io

Port: 7777

You can also use country-specific entries. For example, entering ie-pr.oxylabs.io under IP/host and 25000 under Port gives you an Irish exit node. See our documentation for the complete list of country-specific entry nodes and for sticky session setup.

Enterprise Dedicated Datacenter Proxies

IP/host: a specific IP address (e.g., 1.2.3.4)

Port: 60000

For Enterprise Dedicated Datacenter Proxies, choose an IP address from the list of IPs you purchased. Visit our documentation for more details.

Self-Service Dedicated Datacenter Proxies

IP/host: ddc.oxylabs.io

Port: 8001

For Self-Service Dedicated Datacenter Proxies, the port number selects an IP address by its position in your purchased list. Check our documentation for more details.

Datacenter Proxies

IP/host: dc.oxylabs.io

Port: 8001

For the Pay-per-IP subscription, the port corresponds to the sequential number assigned to an IP address from the provided list. Hence, port 8001 uses the first IP address on your list. See our documentation for further details.
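
Both the Self-Service Dedicated Datacenter and Pay-per-IP notes describe the port as an IP’s position in your list, with port 8001 mapping to the first IP. Assuming the following positions continue as 8002, 8003, and so on (please confirm this in our documentation), you could generate one entry per IP as in this minimal sketch. `port_for_position` and `entries_for_ip_list` are hypothetical helper names, and USERNAME and PASSWORD are placeholders.

```python
def port_for_position(position: int, base_port: int = 8000) -> int:
    """Return the port for the IP at a 1-based position in your list.

    Assumption: position 1 -> 8001, position 2 -> 8002, and so on.
    """
    if position < 1:
        raise ValueError("positions start at 1")
    return base_port + position


def entries_for_ip_list(host: str, ip_count: int, username: str, password: str) -> list[str]:
    """Build one host:port:username:password line per IP in the list."""
    return [
        f"{host}:{port_for_position(i)}:{username}:{password}"
        for i in range(1, ip_count + 1)
    ]


if __name__ == "__main__":
    # Three IPs on a Pay-per-IP plan, using placeholder credentials.
    for line in entries_for_ip_list("dc.oxylabs.io", 3, "USERNAME", "PASSWORD"):
        print(line)
```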

For the Pay-per-traffic subscription, port 8001 assigns a random IP address that stays the same throughout a session. You can also specify a geo-location, such as the US, in the user authentication string: user-USERNAME-country-US:PASSWORD. For more details, see our documentation.
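
The geo-targeted authentication string follows a fixed pattern, so it can be built from your username, password, and a country code. Within the IP/host:port:username:password entry, this string takes the place of username:password. This is a minimal sketch assuming two-letter country codes like the US example; `geo_username` and `auth_string` are hypothetical helpers.

```python
def geo_username(username: str, country_code: str) -> str:
    """Build a username in the user-USERNAME-country-CC pattern."""
    code = country_code.strip().upper()
    # Assumption: a two-letter country code, as in the US example.
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    return f"user-{username}-country-{code}"


def auth_string(username: str, password: str, country_code: str) -> str:
    """Return the full username:password part with geo-targeting applied."""
    return f"{geo_username(username, country_code)}:{password}"


if __name__ == "__main__":
    print(auth_string("USERNAME", "PASSWORD", "us"))
    # Prints: user-USERNAME-country-US:PASSWORD
```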

ISP Proxies

IP/host: isp.oxylabs.io

Port: 8001

Proxy settings

6. Depending on whether you use a rotating or sticky session type, set up the Switch interval.

7. Enter your Oxylabs proxy user’s credentials: the same username and password you set when creating the proxy user in the Oxylabs dashboard.
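
How Octoparse applies the Switch interval from step 6 internally isn’t covered here, so treat the following as a conceptual model only: it assumes the task moves to the next proxy in your list each time the interval elapses and wraps around at the end. Under that model, a longer interval keeps each IP for longer (closer to a sticky session), while a shorter one rotates more often. `active_proxy` and the `ip-a`/`ip-b`/`ip-c` labels are hypothetical.

```python
def active_proxy(proxies: list[str], elapsed_seconds: float, interval_seconds: float) -> str:
    """Return the proxy in use after elapsed_seconds under a round-robin model.

    Conceptual model only; not Octoparse's actual implementation.
    """
    if not proxies:
        raise ValueError("proxy list is empty")
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    # The number of completed intervals decides how far the rotation has moved.
    switches = int(elapsed_seconds // interval_seconds)
    return proxies[switches % len(proxies)]


if __name__ == "__main__":
    pool = ["ip-a", "ip-b", "ip-c"]
    # With a 60-second interval: 0 s and 59 s use ip-a, 60 s uses ip-b,
    # 150 s uses ip-c, and 180 s wraps back to ip-a.
    for t in (0, 59, 60, 150, 180):
        print(t, active_proxy(pool, t, 60))
```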

Proxies are now set up.
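
If a task fails to connect, it can help to test the same proxy credentials outside Octoparse. This minimal sketch uses only Python’s standard library (`urllib.request.ProxyHandler`). It assumes the proxy accepts HTTP proxy connections at the host and port from step 5 and credentials embedded in the proxy URL. `build_proxy_url` and `check_proxy` are our own helper names, and https://example.com is just a neutral test page.

```python
import urllib.parse
import urllib.request


def build_proxy_url(host: str, port: int, username: str, password: str) -> str:
    """Return http://user:pass@host:port with credentials percent-encoded."""
    # Encode characters such as '@' or ':' so they don't break the URL.
    user = urllib.parse.quote(username, safe="")
    pwd = urllib.parse.quote(password, safe="")
    return f"http://{user}:{pwd}@{host}:{port}"


def check_proxy(proxy_url: str, test_url: str = "https://example.com", timeout: float = 15.0) -> int:
    """Fetch test_url through the proxy and return the HTTP status code."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(test_url, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Only builds the URL; check_proxy() makes a real network request.
    print(build_proxy_url("pr.oxylabs.io", 7777, "USERNAME", "PASSWORD"))
```

After replacing the placeholders, calling `check_proxy(build_proxy_url(...))` sends one real request. A 200 status suggests the proxy and credentials work, while an error mentioning 407 Proxy Authentication Required usually points to a credentials problem.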

Web scraping with Octoparse

1. Select the elements you want to scrape (in this example, video game titles). To extract all elements of the same type, choose Select all similar elements and then Text.

Selecting all similar elements

Text elements

2. Set up pagination to scrape multiple pages. This website uses numbered pages, so choose Next page button.

Next page button

3. To automate pagination, click the button in the page layout that opens the next page (here, Forward).

The Forward button

4. Complete the scraping setup and press ▶Run.

Completing the setup and running the scraper

5. Choose Run on your device with Standard Mode to receive data as a file on your PC.

Running on a local device

6. Let the scraper run until it finishes. It stops when it reaches the last product page or when you stop it manually.

Scraping progress

7. Export the collected data and choose a file format.

The scraping run is complete

Data export options

Here's the final result in a spreadsheet.

The final spreadsheet

That’s it – you are all set up and ready to focus on your web scraping tasks with Octoparse.

Conclusion

Combined with Oxylabs proxies, Octoparse can support businesses in their data extraction operations. The tool is simple to use and requires no coding, yet it is fast and efficient. If you still have questions about integrating proxies with Octoparse, don’t hesitate to contact us.

Check out our integrations for more similar web scraping tools, such as WebHarvy.

Please be aware that this is a third-party tool not owned or controlled by Oxylabs. Each third-party provider is responsible for its own software and services. Consequently, Oxylabs will have no liability or responsibility to you regarding those services. Please carefully review the third party's policies and practices and/or conduct due diligence before accessing or using third-party services.

Frequently asked questions

How do I use rotating IPs in Octoparse?

You can use Rotating Residential Proxies and set up the switch interval when configuring your proxy settings.