Back to blog

Guide to Scraping Data from Websites to Excel with Web Query

Iveta Vistorskyte

2022-11-10
Share

Did you know that Microsoft Excel has a built-in feature to extract data from websites called Web Query? It allows you to perform web scraping and automatically collect public data from the internet.  

This article covers all the possible ways to extract public web data from websites with the help of Web Query. 

How web scraping in Excel works

Microsoft Web Query uses the Operating System's Web Browser, usually Internet Explorer in older Windows and Edge in the latest ones, to load the website. This allows Web Query to render JavaScript-heavy websites hassle-free. Once the website finishes loading, the Web Query will automatically parse the whole page and find all relevant static HTML tables. It will highlight all of the tables and make them selectable. Once the user picks the desired table, a Web Query helps extract the table data. 

How to scrape website data using Excel Web Query

Now that you've learned about Web Query, let's try to scrape data from the internet. In this tutorial, you'll see an example of how to scrape the "books.toscrape.com" website, extract the book's information and save it to an Excel spreadsheet. You'll also learn and explore various Web Query features along the way. Before starting the scraping process, you need to make sure that:

  1. You have an active internet connection;

  2. You have Microsoft Office installed so that you can use Microsoft Excel. If you don't have Microsoft Office, you can download and install it from here.

Once you have Microsoft Office installed, you can follow these steps:

Step 1: Open a blank spreadsheet

You need to open a blank spreadsheet in Microsoft Excel and click on "Data" from the menu.

Step 2: Click on "From Web" option

Once you click on "Data," you'll see a new menu with various options for extracting data. Pick a "From Web" option, and click on it. It will open a new window.

Step 3: Type the web page URL in the address bar, and click "Go"

You'll see an address bar in the "New Web Query" window. In this text box, you need to type the URL "https://books.toscrape.com" and click "Go."

Step 4: Navigate to the book page

After clicking a "Go" button, you'll see a website in the mini web browser. You can interact and browse the website here. Scroll down and click on a book link to open the book page.

Step 5: Select the desired table to scrape

If you scroll down a bit, you'll find a table on the book page. There will be a small yellow arrow icon that you can click. Clicking the button will select the table associated with it. Once the table is selected, click the "Import" button below.

Step 6: Select the "Existing worksheet" option and click "OK"

When you click the "Import" button, a small window will appear, similar to the screenshot below. Make sure the "Existing worksheet" option is selected and click "OK."

And that's it! Web Query will create a background process to run the web request and fetch the website. After fetching the website, it will parse the table and extract the data into the excel columns. The output will be similar to the example below.

Output

You can match it with the website and validate that all the data from the table is correct.

All the columns and rows will be linked to the web query, so whenever you refresh the data manually or automatically, Microsoft Excel will know which rows or columns to update. The next section explores multiple ways to refresh & update data. 

How to update and refresh data

There are mainly two different refresh mechanisms available for the Web Query: 

  1. Automatic

  2. Manual

In the automatic mode, Excel will periodically pull the data in the background and keep the sheet up to date. You can also customize the duration according to your needs – you'll find the explanation later in this article. 

Before that, let's explore multiple ways of manual data refreshes. First, let's delete a few items from the extracted data to validate the refresh works as expected.

Notice that we removed price, tax, and availability from the data.

Option 1: Click "Refresh" from the top menu

On the top menu, under the "Data" submenu, there's a button named "Refresh." You can pull the latest data by clicking this button or choosing the shortcut on your keyboard – the "CTRL + ALT + F5" button combination. Once you pull the latest data, the missing values of price, tax & availability are refreshed with the latest information.

Option 2: Refresh from the context menu

Let's remove a few fields again to test a different way of data refresh. This time, we'll use the "Refresh" button of the context menu instead of the menu button. First, you'll need to right-click on a cell and select "Refresh." Note that you'll have to click cells managed by Web Query. If you click on other cells, this "Refresh" button won't appear.

Option 3: Edit & rerun the Query

Now, we'll remove a few fields and then try another method to update the data. After removing some data, right-click on the cell, and you'll see an option "Edit Query." Click on it.

It will open a new window called "Edit Web Query." If you click "Import," the web query will run a background process to fetch the latest updates from the website and refresh the existing data, replacing obsolete data with new information.

This method is useful if you want to update the scraper, e.g., change the website URL, update the query to fetch a different table or page, etc.

Now, let's find out how to automate the refresh/update process.

Configure automatic refresh from properties

The steps of configuring the automatic refresh process from properties are pretty simple. First, open the context menu again by right-clicking on a cell associated with Web Query. From the menu, select "Data Range Properties."

This will open up the "External Data Range Properties" window, similar to the example below:

In this window, look for the "Refresh Controls" section. By customizing this section, you can automatically refresh the data. Go ahead and click the "Enable background refresh" checkbox. Then, by using the second checkbox, you can tell Excel to refresh data periodically. For example, if you set it to refresh every 5 minutes, Excel will automatically pull the data every 5 minutes using a background process and update the table automatically. 

If you check the third checkbox, "Refresh data when opening the file," every time you open the spreadsheet, Excel will pull the web data and update the sheet for you.

Conclusion

Web Query simplifies web data extraction in Excel, especially for websites with tables. It enables you to automate simple tasks and extract web data with minimal or no interaction. Web Query also allows you to scrape data from dynamic websites with JavaScript. For your convenience, you can also access this tutorial in our GitHub repository

However, Web Query isn't suitable for developing custom sophisticated web scrapers, capable of interacting with web elements or allowing proxy integration to perform large-scale scraping. In such cases, you have multiple alternatives, such as developing a web scraper using Python, Javascript, or Go. Last but not least, you can also use VBA script to interact with websites from Excel; however, it's not as flexible as the other options we mentioned earlier.

About the author

Iveta Vistorskyte

Lead Content Manager

Iveta Vistorskyte is a Lead Content Manager at Oxylabs. Growing up as a writer and a challenge seeker, she decided to welcome herself to the tech-side, and instantly became interested in this field. When she is not at work, you'll probably find her just chillin' while listening to her favorite music or playing board games with friends.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.

Related articles

Get the latest news from data gathering world

I’m interested

IN THIS ARTICLE


  • How web scraping in Excel works

  • How to scrape website data using Excel Web Query

  • How to update and refresh data

  • Configure automatic refresh from properties

  • Conclusion

Scale up your business with Oxylabs®