Back to blog
Did you know that Microsoft Excel has a built-in feature to extract data from website to Excel using Web Query? It allows you to perform web scraping and automatically collect public data from the internet.
This article covers all the possible ways to extract public web data from websites with the help of Web Query.
Now that you've learned about Web Query, let's try to scrape data from the internet. In this tutorial, you'll see an example of how to scrape the "books.toscrape.com" website, extract the book's information and save it to an Excel spreadsheet. You'll also learn and explore various Web Query features along the way. Before starting the scraping process, you need to make sure that:
You have an active internet connection;
You have Microsoft Office installed so that you can use Microsoft Excel. If you don't have Microsoft Office, you can download and install it from here.
Once you have Microsoft Office installed, you can follow these steps:
Step 1: Open a blank spreadsheet
You need to open a blank spreadsheet in Microsoft Excel and click on "Data" from the menu.
Step 2: Click on "From Web" option
Once you click on "Data," you'll see a new menu with various options for extracting data. Pick a "From Web" option, and click on it. It will open a new window.
Step 3: Type the web page URL in the address bar, and click "Go"
You'll see an address bar in the "New Web Query" window. In this text box, you need to type the URL "https://books.toscrape.com" and click "Go."
Step 4: Navigate to the book page
After clicking a "Go" button, you'll see a website in the mini web browser. You can interact and browse the website here. Scroll down and click on a book link to open the book page.
Step 5: Select the desired table to scrape
If you scroll down a bit, you'll find a table on the book page. There will be a small yellow arrow icon that you can click. Clicking the button will select the table associated with it. Once the table is selected, click the "Import" button below.
Step 6: Select the "Existing worksheet" option and click "OK"
When you click the "Import" button, a small window will appear, similar to the screenshot below. Make sure the "Existing worksheet" option is selected and click "OK."
And that's it! Web Query will create a background process to run the web request and fetch the website. After fetching the website, it will parse the table and extract the data into the excel columns. The output will be similar to the example below.
You can match it with the website and validate that all the data from the table is correct.
All the columns and rows will be linked to the web query, so whenever you refresh the data manually or automatically, Microsoft Excel will know which rows or columns to update. The next section explores multiple ways to refresh & update data.
There are mainly two different refresh mechanisms available for the Web Query:
In the automatic mode, Excel will periodically pull the data in the background and keep the sheet up to date. You can also customize the duration according to your needs – you'll find the explanation later in this article.
Before that, let's explore multiple ways of manual data refreshes. First, let's delete a few items from the extracted data to validate the refresh works as expected.
Notice that we removed price, tax, and availability from the data.
Option 1: Click "Refresh" from the top menu
On the top menu, under the "Data" submenu, there's a button named "Refresh." You can pull the latest data by clicking this button or choosing the shortcut on your keyboard – the "CTRL + ALT + F5" button combination. Once you pull the latest data, the missing values of price, tax & availability are refreshed with the latest information.
Option 2: Refresh from the context menu
Let's remove a few fields again to test a different way of data refresh. This time, we'll use the "Refresh" button of the context menu instead of the menu button. First, you'll need to right-click on a cell and select "Refresh." Note that you'll have to click cells managed by Web Query. If you click on other cells, this "Refresh" button won't appear.
Option 3: Edit & rerun the Query
Now, we'll remove a few fields and then try another method to update the data. After removing some data, right-click on the cell, and you'll see an option "Edit Query." Click on it.
It will open a new window called "Edit Web Query." If you click "Import," the web query will run a background process to fetch the latest updates from the website and refresh the existing data, replacing obsolete data with new information.
This method is useful if you want to update the scraper, e.g., change the website URL, update the query to fetch a different table or page, etc.
Now, let's find out how to automate the refresh/update process.
The steps of configuring the automatic refresh process from properties are pretty simple. First, open the context menu again by right-clicking on a cell associated with Web Query. From the menu, select "Data Range Properties."
This will open up the "External Data Range Properties" window, similar to the example below:
In this window, look for the "Refresh Controls" section. By customizing this section, you can automatically refresh the data. Go ahead and click the "Enable background refresh" checkbox. Then, by using the second checkbox, you can tell Excel to refresh data periodically. For example, if you set it to refresh every 5 minutes, Excel will automatically pull the data every 5 minutes using a background process and update the table automatically.
If you check the third checkbox, "Refresh data when opening the file," every time you open the spreadsheet, Excel will pull the web data and update the sheet for you.
About the author
Lead Content Manager
Iveta Vistorskyte is a Lead Content Manager at Oxylabs. Growing up as a writer and a challenge seeker, she decided to welcome herself to the tech-side, and instantly became interested in this field. When she is not at work, you'll probably find her just chillin' while listening to her favorite music or playing board games with friends.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
Get the latest news from data gathering world
Forget about complex web scraping processes
Choose Oxylabs' advanced web scraping solutions to gather real-time public data hassle-free.
Scale up your business with Oxylabs®
GET IN TOUCH
Certified data centers and upstream providers
Connect with us
Advanced proxy solutions
oxylabs.io© 2023 All Rights Reserved