In this tutorial, we’ll guide you through the Octoparse integration process with Oxylabs Datacenter and Residential Proxies to ensure a quick start and smooth web scraping.
Octoparse is an easy-to-use data extraction tool. It lets you scrape public data without coding and bypass most anti-scraping mechanisms through automatic IP rotation and extended session time. Powered by machine learning algorithms, Octoparse quickly locates the data you click on. It handles complex websites and captures many kinds of data, including text, links, image URLs, and HTML code.
1. Download, install, and launch Octoparse.
2. To create a new task, click +New in the top-left corner and choose Custom Task.
Setting up a custom task
3. Enter the URL of the webpage you want to extract data from in the URL Input field and click Save. We'll use the Oxylabs scraping sandbox as an example.
Adding the target URL
4. After your selected URL loads, go to Task Settings > Anti-blocking. Now, check Access websites via proxies, enable Use my own proxies, and click Configure.
Task settings
Anti-blocking
Access websites via proxies > Use my own proxies > Configure
5. Configure your Oxylabs proxies by specifying the following format:
IP/host:port:username:password
Residential Proxies
IP/host: pr.oxylabs.io
Port: 7777
You can also use country-specific entries. For example, entering ie-pr.oxylabs.io under IP/host and 25000 under Port will acquire an Irish exit node. Refer to our documentation for a complete list of country-specific entry nodes or for sticky session setup.
Enterprise Dedicated Datacenter Proxies
IP/host: a specific IP address (e.g., 1.2.3.4)
Port: 60000
For Enterprise Dedicated Datacenter Proxies, you’ll have to choose an IP address from the acquired list. Visit our documentation for more details.
Self-Service Dedicated Datacenter Proxies
IP/host: ddc.oxylabs.io
Port: 8001
For Self-Service Dedicated Datacenter Proxies, the port indicates the sequential number of an IP address from the acquired list. Check our documentation for more details.
Datacenter Proxies
IP/host: dc.oxylabs.io
Port: 8001
For the Pay-per-IP subscription, the port corresponds to the sequential number assigned to an IP address from the provided list. Hence, port 8001 uses the first IP address on your list. See our documentation for further details.
For the Pay-per-traffic subscription, port 8001 randomly selects an IP address that remains the same throughout a session. You can also specify a geo-location, such as the US, in the user authentication string: user-USERNAME-country-US:PASSWORD. For more details, see our documentation.
ISP Proxies
IP/host: isp.oxylabs.io
Port: 8001
Proxy settings
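If you manage several proxy entries, it can help to generate the Octoparse lines rather than type them by hand. The following is a minimal Python sketch of the IP/host:port:username:password format and the user-USERNAME-country-US username form described above. The names ProxyEntry and with_country are made up for this illustration, and USERNAME and PASSWORD are placeholders for your own proxy user's credentials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyEntry:
    """One proxy line as entered in the Octoparse proxy configuration."""

    host: str
    port: int
    username: str
    password: str

    def as_octoparse_line(self) -> str:
        # Format from the tutorial: IP/host:port:username:password
        return f"{self.host}:{self.port}:{self.username}:{self.password}"


def with_country(username: str, country_code: str) -> str:
    """Build the user-USERNAME-country-XX form of the username."""
    return f"user-{username}-country-{country_code.upper()}"


# Placeholder credentials; replace with your own proxy user.
residential = ProxyEntry("pr.oxylabs.io", 7777, "USERNAME", "PASSWORD")
datacenter_us = ProxyEntry(
    "dc.oxylabs.io", 8001, with_country("USERNAME", "us"), "PASSWORD"
)

print(residential.as_octoparse_line())
print(datacenter_us.as_octoparse_line())
```

The printed lines can be pasted into the proxy configuration field one per line. Whether a given username form is accepted depends on your subscription, so check the documentation for your product before relying on it.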
6. Depending on whether you use a rotating or sticky session type, set up the Switch interval.
7. Enter your Oxylabs proxy user’s credentials. These are the username and password you set when creating the proxy user in the Oxylabs dashboard.
Proxies are now set up.
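If a task fails to load pages, it can be useful to confirm that the proxy credentials work outside Octoparse. Below is a hedged Python sketch using only the standard library. The helper names build_proxy_url and fetch_via_proxy are invented for this example, and the IP-check address in the commented usage is an assumption; any "what is my IP" service should work in its place.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_url(host: str, port: int, username: str, password: str) -> str:
    """Return an http:// proxy URL with percent-encoded credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}:{port}"


def fetch_via_proxy(url: str, proxy_url: str, timeout: float = 15.0) -> str:
    """Fetch url through the given proxy and return the body as text."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example usage (makes a real network request, so it is left commented out):
# proxy = build_proxy_url("pr.oxylabs.io", 7777, "USERNAME", "PASSWORD")
# print(fetch_via_proxy("https://ip.oxylabs.io/location", proxy))  # assumed endpoint
```

If the request returns an IP address other than your own, the same host, port, and credentials should work in the Octoparse proxy settings.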
1. Select the elements (video game titles) you want to scrape. To extract all elements from the same category, choose Select all similar elements and specify Text.
Selecting all similar elements
Text elements
2. Set up pagination to scrape multiple pages. This particular website uses numbered pages, so choose the Next page button option.
Next page button
3. Choose the exact button in the page layout that opens the following page – Forward – to automate pagination.
The Forward button
4. Complete the scraping setup and press ▶Run.
Completing the setup and running the scraper
5. Choose Run on your device with Standard Mode to receive data as a file on your PC.
Running on a local device
6. Let the scraping process run until complete. It ends when the final product page is reached or when you stop it manually.
Scraping progress
7. Export the collected data and select the file format.
The scraping run is complete
Data export options
Here's the final result in a spreadsheet.
The final spreadsheet
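For readers curious about what the steps above do conceptually, here is a rough Python sketch of the same flow: collect the text of all similar elements, follow the next-page link until none is left, then export the rows as CSV. This is not how Octoparse is implemented. The markup (a "title" class on titles and a "next" class on the pagination link), the example.com URLs, and the in-memory PAGES dictionary are all hypothetical stand-ins for a real site.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collect title texts and the href of the next-page link."""

    def __init__(self, title_class, next_class):
        super().__init__()
        self.title_class = title_class
        self.next_class = next_class
        self.titles = []
        self.next_href = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.title_class in classes:
            self._in_title = True
            self.titles.append("")
        elif tag == "a" and self.next_class in classes:
            self.next_href = attr_map.get("href")

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current title.
        self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


def scrape_all(fetch, start_url, max_pages=50):
    """Follow next-page links from start_url and return all titles."""
    titles, seen, url = [], set(), start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = PageParser("title", "next")
        parser.feed(fetch(url))
        parser.close()
        titles.extend(t.strip() for t in parser.titles)
        # Stop when the page has no next-page link.
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return titles


def to_csv(titles):
    """Serialize titles as a one-column CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["title"])
    writer.writerows([t] for t in titles)
    return buffer.getvalue()


# Hypothetical two-page site held in memory instead of fetched over HTTP.
PAGES = {
    "https://example.com/products?page=1":
        '<h4 class="title">Game A</h4><h4 class="title">Game B</h4>'
        '<a class="next" href="/products?page=2">Forward</a>',
    "https://example.com/products?page=2":
        '<h4 class="title">Game C</h4>',
}

all_titles = scrape_all(PAGES.__getitem__, "https://example.com/products?page=1")
print(to_csv(all_titles))
```

The loop stops under the same two kinds of conditions as the Octoparse run: the last page has no next-page link, or a manual limit (here max_pages) is reached.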
That’s it – you are all set up and ready to focus on your web scraping tasks with Octoparse.
Combined with Oxylabs proxies, Octoparse can assist businesses in their data extraction operations. The tool is simple and requires no coding, yet it is fast and efficient. If you still have questions about the Octoparse proxy integration process, don’t hesitate to contact us.
For similar web scraping tools, such as WebHarvy, check out our other integrations.
Please be aware that this is a third-party tool not owned or controlled by Oxylabs. Each third-party provider is responsible for its own software and services. Consequently, Oxylabs will have no liability or responsibility to you regarding those services. Please carefully review the third party's policies and practices and/or conduct due diligence before accessing or using third-party services.
You can use Rotating Residential Proxies and set up the switch interval when configuring your proxy settings.
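To illustrate the general idea behind a switch interval, here is a small Python sketch of time-based rotation: the active proxy changes every fixed number of seconds and wraps around the list. This is only a conceptual model with an invented function name (proxy_for_time), not a description of how Octoparse schedules proxies internally.

```python
def proxy_for_time(proxies, switch_interval_s, elapsed_s):
    """Return the proxy that is active after elapsed_s seconds."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    if switch_interval_s <= 0:
        raise ValueError("switch interval must be positive")
    # Each interval is one slot; slots cycle through the proxy list.
    slot = int(elapsed_s // switch_interval_s)
    return proxies[slot % len(proxies)]


# Hypothetical list of configured proxy lines.
proxies = ["proxy-a", "proxy-b", "proxy-c"]
for seconds in (0, 59, 60, 180):
    print(seconds, proxy_for_time(proxies, 60, seconds))
```

A shorter interval switches proxies more often, which suits rotating sessions; a longer one keeps the same proxy for more requests, which is closer to a sticky session.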