To be of recurrent use, a Python web scraping script has to be automated to run repeatedly and periodically. There are ways to do this in Python directly, but a more accessible alternative is available: MS Windows has a built-in tool, Windows Task Scheduler, for running applications automatically.
In this article, you’ll learn how to set up Windows Task Scheduler (Task Scheduler) to schedule a Python script automatically and periodically.
If you’re using macOS or Linux, you can use cron instead of Windows Task Scheduler. For more details, see how to automate web scraping with Python and cron.
Before configuring the Task Scheduler, follow these guidelines to prepare your web scraping script and avoid the most common errors.
Here are the three tips to follow when preparing a Python script.
Use a virtual environment to ensure that the correct Python version and all the required libraries are available when you run your Python web scraper.
Use the absolute file paths to ensure that the script doesn’t break due to missing files.
Use loggers to redirect output to a file.
After importing the logging module, you can configure logging with only one line of code (the log file name is up to you):

logging.basicConfig(filename='scraper.log', level=logging.INFO)
After this, you can write to the log file as follows:
logging.info("informational message here")
You can read more about logging here.
In this article, the following Python web scraping script will be used:
import requests
from bs4 import BeautifulSoup

url = 'https://sandbox.oxylabs.io/products/1'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
price = soup.select_one('div.price').text
with open(r'c:\scraper\data.csv', 'a') as f:
    f.write(price + '\n')
Every time you run this script, it’ll append the latest price in a new line to the CSV file.
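Since every run appends a bare price, it can be hard to tell later when each value was captured. A small optional extension (not part of the original script) is to append a timestamp column as well; data_demo.csv below is a local stand-in for the c:\scraper\data.csv path:

```python
from datetime import datetime

# A stand-in for the value scraped above; in the real script this
# comes from soup.select_one('div.price').text.
price = '99.99'

# Append a timestamp alongside the price so each row records when
# it was captured. 'data_demo.csv' stands in for c:\scraper\data.csv.
with open('data_demo.csv', 'a') as f:
    f.write(f"{datetime.now().isoformat()},{price}\n")
```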
Even though it's possible to run a Python script without creating a .bat (batch) file, creating one is highly recommended. A batch file gives you better control over running the scraper.
Here is what a typical batch file would look like:
cmd /k "cd /d C:\scraper & venv\Scripts\python.exe price.py"
Note that this command has two parts – the first is the cmd command itself, and the second is the string of commands it executes.
Here is a quick explanation of the command:
cmd /k – creates a new command-line shell and executes the commands that follow, keeping the window open afterward. If you'd rather have the window close once the script finishes, use cmd /c instead.
cd /d c:\scraper – changes the current directory and drive to the folder where you have placed the Python executable.
venv\Scripts\python.exe price.py – runs the Python web scraper.
Since the command changes into the directory, you can use relative file paths from that point on. If you skip this step, specify the full paths of both the Python executable and the Python script file.
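Alternatively, you can make the script itself independent of the working directory by building absolute paths from the script's own location. A sketch using the standard library's pathlib; data.csv is the file name assumed from the script above:

```python
from pathlib import Path

# Directory containing this script, regardless of the process's
# current working directory when Task Scheduler launches it.
BASE_DIR = Path(__file__).resolve().parent

# Build the data file path next to the script instead of relying
# on the working directory. 'data.csv' matches the script above.
data_file = BASE_DIR / 'data.csv'

with open(data_file, 'a') as f:
    f.write('99.99\n')  # stand-in for the scraped price
```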
Once the batch file is ready, locate the Task Scheduler through Windows Search.
Locating Windows Task Scheduler
From the right panel of the Task Scheduler, select Create Task.
Create Task (not Basic Task)
Note that you shouldn’t select Create Basic Task, as it offers only a limited set of options.
The Create Task window contains various settings across the General, Triggers, Actions, Conditions, and Settings tabs.
The General tab has two settings to check. The first one is the name of the task that you’re creating. Choose a name that helps you remember its purpose.
Run whether user is logged on or not is an essential setting. Select it if you want your web scraper to run even when you are not logged on. Note that if you select this option, you’ll be asked for your password towards the end of the task creation process.
Next, open the Triggers tab. In this tab, click the New button. You’ll see the New Trigger window.
Trigger defines when the scraper runs
For example, say you want the web scraper to run every hour. To do so, select Daily under Settings. Then, in the Advanced settings, check Repeat task every, set it to 1 hour, and set for a duration of to Indefinitely.
For error handling, select Stop task if it runs longer than and choose a value.
Ensure that Enabled is selected and click OK to save the trigger.
Next, open the Actions tab and click New. You’ll see the New Action window.
Actions define what has to be done
In this window, keep the action set to Start a program and specify the full path of the batch script.
As defined previously, the batch file contains only one line:
cmd /k "cd /d C:\scraper & venv\Scripts\python.exe price.py"
Important: If you have spaces in your path, surround the entire path with double quotes.
Alternatively, you can still run a Python script directly if you aren’t using a batch file.
In the Program/script field, specify the complete path of the python.exe file. In the Add arguments textbox, indicate the complete path of the Python script. Once again, make sure that if you have spaces in any of these paths, surround them with double quotes.
Lastly, if you’re using relative paths anywhere in the code, it’s a good idea to set the Start in textbox to the folder containing the Python script. You can skip the Conditions tab and jump directly to the Settings tab.
All the settings are self-explanatory and optional. You can choose to select the defaults and click OK.
Once you click OK, you’ll be asked for Windows credentials. Enter your username and password and click OK to save the task. The web scraper will be executed during the next scheduled run.
The most common setback is when the Task Scheduler can’t find the python.exe file. To fix this, find the complete path of the Python executable. Open the Command Prompt and run the following command:

where python

The output will be one or more lines, each containing the full path of a Python executable.
Take note of the Python executable that you want to use. Unless you’re using a virtual environment, you must specify the complete path of the python.exe file.
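If Command Prompt is inconvenient, you can also ask Python itself which interpreter is running; sys.executable prints the same full path you’d paste into the Program/script field:

```python
import sys

# Full path of the interpreter currently running this script.
# Inside a virtual environment, this points at the venv's python.exe.
print(sys.executable)
```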
Another common reason for failure is when the script path is incorrect. When working with the Task Scheduler, always use absolute paths.
Lastly, surround the path with double quotes, especially when you have spaces in the path or file names.
Windows Task Scheduler is a tool exclusive to Windows. If you use macOS or Linux, the Cron tool is the most prominent alternative.
Cron doesn’t have a user interface, but it’s capable of doing almost everything that the Task Scheduler can do. All you have to do is run crontab -e from the terminal and enter the details. The process is covered in detail in another blog post.
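For comparison, a crontab entry equivalent to the hourly trigger configured above might look like this; the Linux paths are assumptions mirroring the Windows example:

```
0 * * * * cd /home/user/scraper && venv/bin/python price.py
```

The five fields are minute, hour, day of month, month, and day of week; 0 * * * * means "at minute 0 of every hour."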
Other scheduling tools available on Linux distributions include systemd (read as system-d) and Anacron.
Since Windows Task Scheduler ships with every PC running Windows, scheduling tasks from within Python itself may be an unnecessary burden. Automating Python web scraping scripts with Windows Task Scheduler is a quick and easy-to-use solution for running tasks at predefined intervals – even while the computer is sleeping, provided the wake condition is set.
Exclusive to Windows, the Task Scheduler has an analogous automation solution on macOS and Linux called Cron.
If you're looking for an easy way to scrape websites, check out our proxies for scraping and Scraper API solutions that are specifically built to get the data you need with ease. Don’t forget to take a look at other solutions for Python web scraping, such as how to make it faster, and at specific use cases – how to build a price tracker or how to automate competitor analysis.
Windows has a better and easier-to-use tool – Windows Task Scheduler. It’s the Windows alternative to Cron, which is available on Unix and Unix-like operating systems such as macOS and Linux. Windows Task Scheduler, in fact, has more features than Cron.
Yes, it does. When creating a new task or editing an existing one, go to the Conditions tab and select Wake the computer to run this task.
Note that this will work only if the computer is sleeping. If you shut down the computer, Windows Task Scheduler can’t start the computer to run this task.
About the author
Augustas Pelakauskas is a Senior Copywriter at Oxylabs. Coming from an artistic background, he is deeply invested in various creative ventures - the most recent one being writing. After testing his abilities in the field of freelance journalism, he transitioned to tech content creation. When at ease, he enjoys sunny outdoors and active recreation. As it turns out, his bicycle is his fourth best friend.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.