
Automate Recurring Scraping and Parsing Jobs: Scheduler Overview


Roberta Aukstikalnyte

2023-03-20 | 3 min read

When it comes to gathering web data, automation is key, especially if you execute identical web scraping and parsing tasks periodically. Sending the exact same requests at regular intervals gets tedious, and at scale it becomes impossible to manage by hand.

In today’s blog post, we’ll demonstrate how to configure recurring web scraping and parsing jobs with Scheduler. First, we’ll briefly introduce Scheduler, explaining what it is and what it does. Then, we’ll go through each step on how you can use it to your advantage.  

What is Scheduler? 

Scheduler is a feature that automates recurring web scraping and parsing jobs by running them on a schedule. You can schedule jobs at any interval – every minute, every five minutes, hourly, daily, every two days, and so on. It’s included in all of our Scraper API subscriptions at no additional charge.

With Scheduler, you no longer need to send new requests with the very same parameters. After you schedule a job, we take care of the rest, ensuring that the data arrives however frequently you like.

How to schedule a new job? 

Before starting, we highly suggest using the Upload to Cloud Storage feature, so you can configure a schedule and get data sent to your storage without having to fetch results from our system.

Now, let’s get into the exact steps on how to use Scheduler. You can find all the endpoints, parameters, and their in-depth explanations in our documentation.

If you’d like to see Scheduler in action, check out our video below.

Creating a schedule 

To interact with our API and create a new schedule, you’ll need an API client – Postman, for example – a terminal, or any programming language with an HTTP requests library.  

Now, to create a new job with Scheduler, submit a payload specifying the following details (a complete request sketch follows the list):

1. Intervals

Set the intervals at which you want us to execute the scraping and parsing jobs. To do so, submit a cron schedule expression – a standard five-field syntax for describing how often a task should run (e.g., every Monday at 3 PM).
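For example, the “every Monday at 3 PM” schedule above corresponds to this five-field expression:

# minute | hour | day of month | month | day of week
0 15 * * 1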

2. Parameters 

Then, enter a set of scraping/parsing job parameters to be executed at the intervals you just scheduled. Here, specify the url you want to scrape data from and a callback_url, where we’ll send a notification once the job is done.

You can also give us a storage_url, and we’ll upload the scraped and parsed results to your cloud storage. To see the full list of parameter values along with their descriptions, see our documentation.

3. End time

Finally, use the end_time parameter to set a future date and time at which Scheduler should stop running the job.
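Putting the three pieces together, here’s a minimal sketch of a schedule-creation request in Python. The endpoint path, the source value, and all URLs are illustrative placeholders – check our documentation for the exact values that apply to your Scraper API subscription:

import requests

# NOTE: the endpoint path, source value, and URLs below are illustrative
# placeholders – consult the documentation for your exact setup.
payload = {
    # Cron expression: run every Monday at 3 PM.
    "cron": "0 15 * * 1",
    # The scraping/parsing job(s) to execute on that schedule.
    "items": [
        {
            "source": "universal",
            "url": "https://example.com",
            "callback_url": "https://your.server/callback",
            "storage_url": "s3://your-bucket/results",
        }
    ],
    # Date and time at which Scheduler should stop running the job.
    "end_time": "2024-12-31 00:00:00",
}

response = requests.post(
    "https://data.oxylabs.io/v1/schedules",  # assumed Scheduler endpoint
    auth=("USERNAME", "PASSWORD"),           # your API credentials
    json=payload,
)
print(response.json())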

Results

After creating the schedule, you should see these parameters along with their values in the output: schedule_id, active, items_count, cron, end_time, and next_run_at, indicating the schedule was created successfully.
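For illustration, a successful response might look like this (all values are placeholders):

{
  "schedule_id": 123,
  "active": true,
  "items_count": 1,
  "cron": "0 15 * * 1",
  "end_time": "2024-12-31 00:00:00",
  "next_run_at": "2023-03-27 15:00:00"
}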

Other endpoints 

Scheduler features several endpoints you can use once you’ve scheduled a job (or several of them).
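For instance, you can retrieve or deactivate existing schedules. The sketch below assumes the same base endpoint as above; the exact paths and payloads are listed in our documentation:

import requests

auth = ("USERNAME", "PASSWORD")
base = "https://data.oxylabs.io/v1/schedules"  # assumed base endpoint

# List all of your schedule IDs (illustrative path).
print(requests.get(base, auth=auth).json())

# Fetch the details of a single schedule.
print(requests.get(f"{base}/123", auth=auth).json())

# Deactivate a schedule (assumed state endpoint and payload).
requests.put(f"{base}/123/state", auth=auth, json={"active": False})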

Summary

Scheduler is a powerful tool that does the heavy lifting for users with recurring scraping and parsing jobs. You can test its capabilities free for one week by claiming a trial for one of our Scraper APIs.

We hope you found this article helpful, and if you have any questions regarding Scheduler, don’t hesitate to contact us via live chat on our website or send us an email.

About the author

Roberta Aukstikalnyte

Senior Content Manager

Roberta Aukstikalnyte is a Senior Content Manager at Oxylabs. Having worked various jobs in the tech industry, she especially enjoys finding ways to express complex ideas in simple ways through content. In her free time, Roberta unwinds by reading Ottessa Moshfegh's novels, going to boxing classes, and playing around with makeup.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
