
Automate Recurring Scraping and Parsing Jobs: Scheduler Overview


Roberta Aukstikalnyte

2023-03-20 | 3 min read

When it comes to gathering web data, automation is key, especially if you execute identical web scraping and parsing tasks periodically. Sending the exact same requests at regular intervals gets tedious, and at scale it becomes impossible to manage by hand.

In today’s blog post, we’ll demonstrate how to configure recurring web scraping and parsing jobs with Scheduler. First, we’ll briefly introduce Scheduler, explaining what it is and what it does. Then, we’ll go through each step on how you can use it to your advantage.  

What is Scheduler? 

Scheduler is a feature that automates recurring web scraping and parsing jobs by running them on a schedule. You can schedule jobs at any interval – every minute, every five minutes, hourly, daily, every two days, and so on. It’s included in all of our Scraper API subscriptions at no additional charge.

With Scheduler, you no longer need to send new requests with the very same parameters. After you schedule a job, we take care of the rest, ensuring that the data arrives however frequently you like.

How to schedule a new job? 

Before starting, we highly suggest using the Upload to Cloud Storage feature, so you can configure a schedule and get data sent to your storage without having to fetch results from our system.

Now, let’s get into the exact steps on how to use Scheduler. You can find all the endpoints, parameters, and their in-depth explanations in our documentation.

If you’d like to see Scheduler in action, check out our video below.

Creating a schedule 

To interact with our API and create a new schedule, you’ll need an API client – Postman, for example – a terminal, or any programming language with an HTTP requests library.  

Now, to create a new job with Scheduler, submit a payload specifying the following details (a complete request sketch follows the list):

1. Intervals

Set the intervals at which you want us to execute the scraping and parsing jobs. To do so, submit a cron schedule expression – a standard five-field syntax for describing how often a task should run (e.g., every Monday at 3 PM).
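For example, the “every Monday at 3 PM” schedule above corresponds to this five-field expression:

# minute | hour | day of month | month | day of week
0 15 * * 1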

2. Parameters 

Then, enter a set of scraping/parsing job parameters to be executed at the intervals you just scheduled. Here, specify the url you want to scrape data from and a callback_url, where we’ll send a notification once the job is done.

You can also give us a storage_url, and we’ll upload the scraped and parsed results to your cloud storage. To see the full list of parameter values along with their descriptions, see our documentation.

3. End time

Finally, use the end_time parameter to set a future date and time at which Scheduler should stop running the job.
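Putting the three pieces together, here’s a minimal sketch of a schedule-creation request in Python. The endpoint path, the source value, and all URLs are illustrative placeholders – check our documentation for the exact values that apply to your Scraper API subscription:

import requests

# NOTE: the endpoint path, source value, and URLs below are illustrative
# placeholders – consult the documentation for your exact setup.
payload = {
    # Cron expression: run every Monday at 3 PM.
    "cron": "0 15 * * 1",
    # The scraping/parsing job(s) to execute on that schedule.
    "items": [
        {
            "source": "universal",
            "url": "https://example.com",
            "callback_url": "https://your.server/callback",
            "storage_url": "s3://your-bucket/results",
        }
    ],
    # Date and time at which Scheduler should stop running the job.
    "end_time": "2024-12-31 00:00:00",
}

response = requests.post(
    "https://data.oxylabs.io/v1/schedules",  # assumed Scheduler endpoint
    auth=("USERNAME", "PASSWORD"),           # your API credentials
    json=payload,
)
print(response.json())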

Results

After creating the schedule, you should see these parameters along with their values in the output: schedule_id, active, items_count, cron, end_time, and next_run_at, indicating the schedule was created successfully.
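For illustration, a successful response might look like this (all values are placeholders):

{
  "schedule_id": 123,
  "active": true,
  "items_count": 1,
  "cron": "0 15 * * 1",
  "end_time": "2024-12-31 00:00:00",
  "next_run_at": "2023-03-27 15:00:00"
}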

Other endpoints 

Scheduler features several endpoints you can use once you’ve scheduled a job (or several of them).
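For instance, you can retrieve or deactivate existing schedules. The sketch below assumes the same base endpoint as above; the exact paths and payloads are listed in our documentation:

import requests

auth = ("USERNAME", "PASSWORD")
base = "https://data.oxylabs.io/v1/schedules"  # assumed base endpoint

# List all of your schedule IDs (illustrative path).
print(requests.get(base, auth=auth).json())

# Fetch the details of a single schedule.
print(requests.get(f"{base}/123", auth=auth).json())

# Deactivate a schedule (assumed state endpoint and payload).
requests.put(f"{base}/123/state", auth=auth, json={"active": False})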

Summary

Scheduler is a powerful tool that does the heavy lifting for users with recurring scraping and parsing jobs. You can test its capabilities free for one week by claiming a trial for one of our Scraper APIs.

We hope you found this article helpful, and if you have any questions regarding Scheduler, don’t hesitate to contact us via live chat on our website or send us an email.

About the author

Roberta Aukstikalnyte

Senior Content Manager

Roberta Aukstikalnyte is a Senior Content Manager at Oxylabs. Having worked various jobs in the tech industry, she especially enjoys finding ways to express complex ideas in simple ways through content. In her free time, Roberta unwinds by reading Ottessa Moshfegh's novels, going to boxing classes, and playing around with makeup.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
