When it comes to gathering web data, automation is key, especially if you run identical web scraping and parsing tasks periodically. Sending the exact same requests by hand at regular intervals gets tedious, and at short intervals it becomes practically impossible.
In today’s blog post, we’ll demonstrate how to configure recurring web scraping and parsing jobs with Scheduler. First, we’ll briefly introduce Scheduler, explaining what it is and what it does. Then, we’ll go through each step on how you can use it to your advantage.
Scheduler is a feature used to automate recurring web scraping and parsing jobs by scheduling them. You can schedule at any interval – every minute, every five minutes, hourly, daily, every two days, and so on. It’s a feature included in all of our Scraper API subscriptions without any additional charges.
With Scheduler, you no longer need to send new requests with the very same parameters. After you schedule a job, we take care of the rest, ensuring that the data arrives however frequently you like.
Before starting, we highly suggest using the Upload to Cloud Storage feature, so you can configure a schedule and get data sent to your storage without having to fetch results from our system.
Now, let’s get into the exact steps on how to use Scheduler. You can find all the endpoints, parameters, and their in-depth explanations in our documentation.
If you’d like to see a visualization of using Scheduler, see our video below.
To interact with our API and create a new schedule, you’ll need an API client – Postman, for example – a terminal, or any programming language with an HTTP requests library.
Now, to create a new job with Scheduler, add a payload specifying the following details:
1. Cron schedule expression
Set the intervals at which you want us to execute the scraping and parsing jobs. To do so, submit a cron schedule expression: a short string of time fields that tells a scheduler when to run a task periodically (e.g., every Monday at 3 PM).
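To make the cron format more concrete, here is a minimal sketch that splits an expression into named fields. It assumes the standard five-field cron layout (minute, hour, day of month, month, day of week); check the documentation for the exact syntax Scheduler accepts. The helper name `describe_cron` is hypothetical and only used for illustration.

```python
# Assumption: the standard five-field cron layout.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict:
    """Split a five-field cron expression into a dict of named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# "Every Monday at 3 PM": minute 0, hour 15, any day of month, any month, Monday (1).
every_monday_3pm = describe_cron("0 15 * * 1")
print(every_monday_3pm)
```

An asterisk means "every value" for that field, so `*/5 * * * *` would read as "every five minutes" under the same assumption.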
2. Job parameters
Then, enter the set of scraping/parsing job parameters that will be executed at the times you just scheduled. Here, provide the url you want to scrape data from and a callback_url, to which we'll send a notification once the job is done.
You can also give us a storage_url, and we’ll upload the scraped and parsed results to your cloud storage. To see the full list of parameter values along with their descriptions, see our documentation.
3. End time
Finally, use the end_time parameter and enter a future date and time at which Scheduler should stop running the schedule.
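The three parts above can be combined into one payload. The sketch below is illustrative only: the `cron`, `end_time`, `url`, `callback_url`, and `storage_url` names come from this post, while the `items` wrapper key, the `"universal"` source value, the `end_time` date format, and Basic authentication are assumptions to verify against the documentation. The `endpoint` argument is a placeholder for the schedule-creation URL listed there.

```python
import base64
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_schedule_payload(cron, url, end_time, callback_url=None, storage_url=None):
    """Assemble a schedule payload from the cron, job parameters, and end time."""
    job = {"source": "universal", "url": url}  # "source" value is an assumption
    if callback_url:
        job["callback_url"] = callback_url
    if storage_url:
        job["storage_url"] = storage_url
    return {
        "cron": cron,
        "items": [job],  # wrapper key is an assumption
        "end_time": end_time.strftime("%Y-%m-%d %H:%M:%S"),  # format is an assumption
    }


def create_schedule(endpoint, username, password, payload):
    """POST the payload as JSON; endpoint and auth scheme are assumptions."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build (but do not send) a payload that runs every Monday at 3 PM for 30 days.
payload = build_schedule_payload(
    cron="0 15 * * 1",
    url="https://example.com",
    end_time=datetime.now(timezone.utc) + timedelta(days=30),
    callback_url="https://your.callback.example/endpoint",
)
print(json.dumps(payload, indent=2))
```

Calling `create_schedule(endpoint, username, password, payload)` with your own credentials would then submit the schedule; it is left out of the example so the block runs without network access.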
After creating the schedule, you should see these parameters along with their values in the output: schedule_id, active, items_count, cron, end_time, and next_run_at, indicating that the schedule was created successfully.
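A quick way to confirm this in code is to check that the response contains all six fields. The sketch below assumes the response has already been parsed from JSON into a dictionary; the sample values are made up for illustration and are not real API output.

```python
# Field names taken from the list above.
EXPECTED_KEYS = {
    "schedule_id", "active", "items_count", "cron", "end_time", "next_run_at",
}


def missing_schedule_fields(response: dict) -> list:
    """Return the expected field names that are absent from the response."""
    return sorted(EXPECTED_KEYS - set(response))


# Hypothetical response; every value here is illustrative only.
sample_response = {
    "schedule_id": 1234567890,
    "active": True,
    "items_count": 1,
    "cron": "0 15 * * 1",
    "end_time": "2030-01-01 12:00:00",
    "next_run_at": "2029-12-31 15:00:00",
}

missing = missing_schedule_fields(sample_response)
if missing:
    print("Schedule response is missing:", ", ".join(missing))
else:
    print("Schedule created with id", sample_response["schedule_id"])
```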
Scheduler features several endpoints you can use once you’ve scheduled a job (or several of them).
Scheduler is a powerful tool that does the heavy lifting for users with recurring scraping and parsing jobs. You can test its capabilities for one week for free by claiming a trial for one of our Scraper APIs.
We hope you found this article helpful, and if you have any questions regarding Scheduler, don’t hesitate to contact us via live chat on our website or send us an email.
About the author
Senior Content Manager
Roberta Aukstikalnyte is a Senior Content Manager at Oxylabs. Having worked various jobs in the tech industry, she especially enjoys finding ways to express complex ideas in simple ways through content. In her free time, Roberta unwinds by reading Ottessa Moshfegh's novels, going to boxing classes, and playing around with makeup.
oxylabs.io© 2023 All Rights Reserved