Building and maintaining custom web scrapers can make data extraction troublesome. Yet, as data collection grows in significance, more easy-to-use extraction tools are being developed, one of which is ParseHub. In this article, we’ll show you how to make the most of this tool by integrating it with Oxylabs Residential or Datacenter Proxies.
ParseHub is a convenient low-cost tool for scraping public data from websites. It allows users to extract information to easy-to-read spreadsheets or APIs.
Configuring the settings in ParseHub is straightforward. Before getting started, visit the Oxylabs dashboard and add your IP address to the whitelist under Residential Proxies > Whitelist. If you haven’t created a user yet, do that in the Residential Proxies > Users section; you may need the user credentials for authentication in ParseHub later on. For Datacenter Proxies, note that the port may differ depending on your setup (for example, Proxy Rotator, SOCKS connection, or using proxies via a whitelist), so see our Datacenter Proxy documentation for more information. Then, follow the steps below:
1. Download ParseHub and install it on your computer.
2. Launch ParseHub.
3. Create a new project from your home screen by clicking on the “+ New Project” button.
4. Insert the URL from which you would like to scrape public data. In this example, we’ll use oxylabs.io. Then, press “Start project on this URL”.
5. Wait for the project to be ready and switch to the “Browse” mode.
6. Once the “Browse” slider is green, open the drop-down list on the top-right side and click on “Preferences”.
7. Select “Advanced”, click on the “Network” tab, and choose “Settings”.
8. Select “Manual proxy configuration” and enter the details below that match your proxy type.
Proxy type: HTTP, HTTPS, or SOCKS5
IP/Host: pr.oxylabs.io
Port: 7777
You can also use country-specific entries. For example, entering ie-pr.oxylabs.io under IP/Host and 25000 under Port will give you an Irish exit node. Please refer to our documentation for a complete list of country-specific entry nodes or if you need a sticky session.
8.1. The above is an example of how Residential Proxies can be integrated. For Datacenter Proxies, only minor changes are needed.
Specify the following if you purchased Dedicated Datacenter Proxies via sales.
Proxy type: HTTP or SOCKS5
IP/Host: a specific IP address (e.g., 1.2.3.4)
Port: 60000
For Enterprise Dedicated Datacenter Proxies, you’ll have to choose an IP address from the acquired list. Visit our documentation for more details.
Specify the following if you purchased Dedicated Datacenter Proxies via the dashboard.
Proxy type: HTTP or HTTPS
IP/Host: ddc.oxylabs.io
Port: 8001
For Self-Service Dedicated Datacenter Proxies, the port indicates the sequential number of an IP address from the acquired list. Check our documentation for more details.
Proxy type: HTTP, HTTPS, SOCKS5
IP/Host: dc.oxylabs.io
Port: 8001
With the pay-per-IP subscription, the port represents the sequential number assigned to an IP address from the given list, so port 8001 will use the first IP address on your proxy list. For further information, please check our documentation.
With the pay-per-traffic subscription, port 8001 will pick a random IP address, but it will stay consistent for the session duration. You can specify the proxy's geo-location within the user authentication string, such as user-USERNAME-country-US:PASSWORD to connect to a US proxy. Refer to our documentation for more information.
Proxy type: HTTP, HTTPS, or SOCKS5
IP/Host: isp.oxylabs.io
Port: 8001
If you’re using a Proxy Rotator, put clientname.oxylabs.io in the HTTP Proxy field and 60000 in the Port field. Click OK to save the settings.
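For reference, the settings in this step can be kept in one place as data. The Python sketch below only restates the hosts, ports, and proxy types listed above. The names ProxyEndpoint, ENDPOINTS, port_for_ip, and country_auth_string are hypothetical helpers for illustration, not part of ParseHub or any Oxylabs library. It also assumes that ports continue sequentially after 8001 (8002 for the second IP, and so on), which is extrapolated from the statement that port 8001 uses the first IP address on your list; check the documentation before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyEndpoint:
    """One set of values for the manual proxy configuration form."""
    host: str
    port: int
    proxy_types: tuple


# Hosts, ports, and proxy types copied from the settings listed above.
# The dictionary keys are illustrative labels only.
ENDPOINTS = {
    "pr": ProxyEndpoint("pr.oxylabs.io", 7777, ("HTTP", "HTTPS", "SOCKS5")),
    "ddc": ProxyEndpoint("ddc.oxylabs.io", 8001, ("HTTP", "HTTPS")),
    "dc": ProxyEndpoint("dc.oxylabs.io", 8001, ("HTTP", "HTTPS", "SOCKS5")),
    "isp": ProxyEndpoint("isp.oxylabs.io", 8001, ("HTTP", "HTTPS", "SOCKS5")),
}


def port_for_ip(position, first_port=8001):
    """Return the port for the IP at a 1-based position in your proxy list.

    Assumption: ports increase by one per list entry, starting at 8001.
    """
    if position < 1:
        raise ValueError("position starts at 1")
    return first_port + position - 1


def country_auth_string(username, password, country):
    """Build the user-USERNAME-country-XX:PASSWORD string shown above."""
    return f"user-{username}-country-{country.upper()}:{password}"
```

For example, port_for_ip(1) gives 8001, and country_auth_string("USERNAME", "PASSWORD", "us") gives user-USERNAME-country-US:PASSWORD, matching the values quoted in this step.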
9. Afterwards, open a new tab and go to any website. You’ll get a message like this (the photo is taken from ParseHub; your exact details will be different):
10. A general formula for your custom proxy format is this:
1. Once you’ve started your project, click the settings button at the top.
2. When the Settings menu pops up, you’ll notice a checkbox next to the “Rotate IP address” text. Note that this premium ParseHub feature requires a paid plan.
Further below, you’ll find the “Custom Proxies” textbox. Paste your proxy there, including the realm obtained earlier in the tutorial (the custom proxy provided here is merely an example taken from ParseHub; yours will be different):
If you have multiple custom proxies you wish to rotate, add each one on its own line in the Custom Proxies field.
3. After setting these proxies up on your account, you can save and run the project.
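If you keep your proxies in a script or a file, the one-proxy-per-line text for the Custom Proxies field can be prepared as in the minimal Python sketch below. The function name custom_proxies_field is hypothetical, and the entries are treated as opaque strings: the sketch does not know or check the proxy format, it only trims whitespace, drops blank and duplicate entries, and joins the rest with newlines.

```python
def custom_proxies_field(proxies):
    """Join proxy strings into text with one proxy per line.

    Entries are treated as opaque strings; their format is not validated.
    """
    seen = set()
    lines = []
    for entry in proxies:
        cleaned = entry.strip()
        # Skip empty entries and anything already added.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            lines.append(cleaned)
    return "\n".join(lines)
```

The returned text can then be pasted into the Custom Proxies field as is.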
That’s it! You’ve successfully integrated Oxylabs Residential or Datacenter Proxies with ParseHub. To make sure it’s working properly, check your IP address on a website such as whatismyipaddress.com.
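The same check can be done outside ParseHub. The sketch below uses only Python’s standard library to send a request through a proxy and return whatever the target page responds with. The function names proxy_url and fetch_via_proxy are hypothetical, and check_url is an assumption: it should point to any service that echoes the caller’s IP address as plain text. Credentials are optional because a whitelisted IP address may not need them.

```python
import urllib.request
from urllib.parse import quote


def proxy_url(host, port, username=None, password=None):
    """Build an http:// proxy URL, with percent-encoded credentials if given."""
    if username is not None and password is not None:
        credentials = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    else:
        credentials = ""
    return f"http://{credentials}{host}:{port}"


def fetch_via_proxy(check_url, proxy, timeout=15):
    """Request check_url through the proxy and return the response body as text."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(check_url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace").strip()


# Example usage (needs network access and a working proxy; the check URL
# is a placeholder for an IP-echo service of your choice):
# proxy = proxy_url("pr.oxylabs.io", 7777, "USERNAME", "PASSWORD")
# print(fetch_via_proxy("https://example.com/ip", proxy))
```

If the returned address differs from your own IP address, the request went through the proxy.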
ParseHub, in combination with Oxylabs Residential and Datacenter Proxies, can serve as a powerful tool for public data scraping.
If you have any questions about Oxylabs proxy integration, don’t hesitate to drop us a line.
About the author
Jolita Pundzaite
Senior Product Marketing Manager
Jolita Pundzaite is a Senior Product Marketing Manager at Oxylabs. With almost 10 years of experience in marketing and tech, Jolita likes to call herself “a jack of all trades”, constantly looking for ways to improve people's lives through technology. She loves reading books, travelling to distant places, experimenting in a kitchen or sliding down the mountain on a snowboard. When she is not at work, most probably you will find Jolita doing some HIIT workouts or simply chilling in an ice hole testing out her limits.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.