Back to blog

Guide to Scraping Data from Websites to Excel with Web Query

Iveta Vistorskyte

2024-04-056 min read
Share

Did you know that Microsoft Excel has a built-in feature to extract data from website to Excel using Web Query? It allows you to perform web scraping and automatically collect public data from the internet.  

This article covers all the possible ways to extract public web data from websites with the help of Web Query. 

How web scraping in Excel works

Microsoft Web Query uses the Operating System's Web Browser, usually Internet Explorer in older Windows and Edge in the latest ones, to load the website. This allows Web Query to render JavaScript-heavy websites hassle-free. Once the website finishes loading, the Web Query will automatically parse the whole page and find all relevant static HTML tables. It will highlight all of the tables and make them selectable. Once the user picks the desired table, a Web Query helps extract the table data. 

How to scrape website data using Web Query Excel

Now that you've learned about Web Query, let's try to scrape data from the internet. In this tutorial, you'll see an example of how to scrape the "books.toscrape.com" website, extract the book's information and save it to an Excel spreadsheet. You'll also learn and explore various Web Query features along the way. Before starting the scraping process, you need to make sure that:

  1. You have an active internet connection;

  2. You have Microsoft Office installed so that you can use Microsoft Excel. If you don't have Microsoft Office, you can download and install it from here.

Once you have Microsoft Office installed, you can follow these steps:

Step 1: Open a blank spreadsheet

You need to open a blank spreadsheet in Microsoft Excel and click on "Data" from the menu.

Click on "Data" from the menu

Step 2: Click on "From Web" option

Once you click on "Data," you'll see a new menu with various options for extracting data. Pick a "From Web" option, and click on it. It will open a new window.

Pick a "From Web" option, and click on it

Step 3: Type the web page URL in the address bar, and click "Go"

You'll see an address bar in the "New Web Query" window. In this text box, you need to type the URL "https://books.toscrape.com" and click "Go."

Type the URL "https://books.toscrape.com" and click "Go"

Step 4: Navigate to the book page

After clicking a "Go" button, you'll see a website in the mini web browser. You can interact and browse the website here. Scroll down and click on a book link to open the book page.

Scroll down and click on a book link to open the book page

Step 5: Select the desired table to scrape

If you scroll down a bit, you'll find a table on the book page. There will be a small yellow arrow icon that you can click. Clicking the button will select the table associated with it. Once the table is selected, click the "Import" button below.

Select the desired table to scrape

Step 6: Select the "Existing worksheet" option and click "OK"

When you click the "Import" button, a small window will appear, similar to the screenshot below. Make sure the "Existing worksheet" option is selected and click "OK."

Select the "Existing worksheet" option and click "OK"

And that's it! Web Query will create a background process to run the web request and fetch the website. After fetching the website, it will parse the table and extract the data into the excel columns. The output will be similar to the example below.

Output

You can match it with the website and validate that all the data from the table is correct.

Output

All the columns and rows will be linked to the web query, so whenever you refresh the data manually or automatically, Microsoft Excel will know which rows or columns to update. The next section explores multiple ways to refresh & update data. 

How to update and refresh data

There are mainly two different refresh mechanisms available for the Web Query: 

  1. Automatic

  2. Manual

In the automatic mode, Excel will periodically pull the data in the background and keep the sheet up to date. You can also customize the duration according to your needs – you'll find the explanation later in this article. 

Before that, let's explore multiple ways of manual data refreshes. First, let's delete a few items from the extracted data to validate the refresh works as expected.

Delete a few items from the extracted data

Notice that we removed price, tax, and availability from the data.

Option 1: Click "Refresh" from the top menu

On the top menu, under the "Data" submenu, there's a button named "Refresh." You can pull the latest data by clicking this button or choosing the shortcut on your keyboard – the "CTRL + ALT + F5" button combination. Once you pull the latest data, the missing values of price, tax & availability are refreshed with the latest information.

You can pull the latest data by clicking the "Refresh" button

Option 2: Refresh from the context menu

Let's remove a few fields again to test a different way of data refresh. This time, we'll use the "Refresh" button of the context menu instead of the menu button. First, you'll need to right-click on a cell and select "Refresh." Note that you'll have to click cells managed by Web Query. If you click on other cells, this "Refresh" button won't appear.

Right-click on a cell and select "Refresh"

Option 3: Edit & rerun the Query

Now, we'll remove a few fields and then try another method to update the data. After removing some data, right-click on the cell, and you'll see an option "Edit Query." Click on it.

 Right-click on the cell, and you'll see an option "Edit Query"

It will open a new window called "Edit Web Query." If you click "Import," the web query will run a background process to fetch the latest updates from the website and refresh the existing data, replacing obsolete data with new information.

This method is useful if you want to update the scraper

This method is useful if you want to update the scraper, e.g., change the website URL, update the query to fetch a different table or page, etc.

Now, let's find out how to automate the refresh/update process.

Configure automatic refresh from properties

The steps of configuring the automatic refresh process from properties are pretty simple. First, open the context menu again by right-clicking on a cell associated with Web Query. From the menu, select "Data Range Properties."

From the menu, you need to select "Data Range Properties"

This will open up the "External Data Range Properties" window, similar to the example below:

You need to look for the "Refresh Controls" section

In this window, look for the "Refresh Controls" section. By customizing this section, you can automatically refresh the data. Go ahead and click the "Enable background refresh" checkbox. Then, by using the second checkbox, you can tell Excel to refresh data periodically. For example, if you set it to refresh every 5 minutes, Excel will automatically pull the data every 5 minutes using a background process and update the table automatically. 

If you check the third checkbox, "Refresh data when opening the file," every time you open the spreadsheet, Excel will pull the web data and update the sheet for you.

Setting up proxies

As an additional step, you can also configure proxy servers via your computer settings. Often, proxies are an inseparable part of the web scraping process, allowing you to change your IP address or even rotate between them. This way, you can elevate the web scraping process and avoid getting your own IP banned. 

If you use Windows, here are the instructions for manually configuring proxies on your PC:

  • Go to this website to see what your original IP address is. 

  • From your proxy provider, you’ll need to get the IP address (or the proxy server name) and port.

  • Select the Start button, then select Settings, Network & internet, Proxy.

  • Under Manual proxy setup, next to Use a proxy server, select Set up.

  • In the Edit proxy server dialog box, do the following:

  • Turn on Use a proxy server.

  • In the Proxy IP address and Port boxes, enter the proxy server name or IP address and port (optional) in the respective boxes.

  • Select or clear the Don’t use the proxy server for local (intranet) addresses check box.
    Select Save.

  • Check if your IP address has changed on this website.  

If you use macOS: 

  • Go to this website to see what your original IP address is. 

  • Get the IP address (or server name) and port from your proxy provider. 

  • Click on the magnifying glass in the top-right corner and enter System Settings into the search field.

  • On the left side, click Network

  • Select a network service on the right.

  • Click Details, then Proxies

  • Depending on your proxy type, toggle the necessary button. A window will open up, requiring you to enter the IP address and port. Click OK.

  • Again go to this website to see if your IP address has changed. 

Conclusion

Web Query simplifies web data extraction in Excel, especially for websites with tables. It enables you to automate simple tasks and extract web data with minimal or no interaction. Web Query also allows you to scrape data from dynamic websites with JavaScript. For your convenience, you can also access this tutorial in our GitHub repository

However, Web Query isn't suitable for developing custom sophisticated web scrapers, capable of interacting with web elements or allowing proxy integration to perform large-scale scraping. In such cases, you have multiple alternatives, such as developing a web scraper using Python, Javascript, or Go.

Get a free trial for Web Scraper API 

Sign up for a 7-day trial now via our self-service dashboard. No hidden fees, cancel anytime.

  • 5K results
  • Data from the majority of websites
  • Custom Parser available
  • Last but not least, you can also use VBA script to interact with websites from Excel; however, it's not as flexible as the other options we mentioned earlier.

    Frequently asked questions

    How do I extract data from a website to Excel?

    To extract data from a website to Excel, you can use various methods, including manual copy-pasting or using web scraping tools like Oxylabs Web Scraper API (you can get a free trial for it) or Excel's built-in web query feature. If your project is on a smaller scale and you have some programming knowledge, you can also build your own web scraping script with languages like Python or JavaScript.

    How do I scrape a table from a website to Excel?

    To scrape a table from a website to Excel, you can use web scraping tools (e.g., Web Scraper API) or build your own script using Python. 

    First, you have to inspect the HTML structure of the webpage to locate the table element. Then, use the web scraping tool to select and extract the table data. Finally, export the extracted data to Excel format using the tool's export feature, or copy-paste it directly into Excel for further formatting and analysis.

    Can Excel pull data from a URL?

    Yes, indeed, Excel can pull data from a URL using its built-in web query feature. You can create a web query by selecting Data > Get Data > From Other Sources > From Web in Excel. Simply enter the URL from which you want to pull data, specify the data you want to import, and Excel will retrieve and display the data in a new workshee

    About the author

    Iveta Vistorskyte

    Lead Content Manager

    Iveta Vistorskyte is a Lead Content Manager at Oxylabs. Growing up as a writer and a challenge seeker, she decided to welcome herself to the tech-side, and instantly became interested in this field. When she is not at work, you'll probably find her just chillin' while listening to her favorite music or playing board games with friends.

    All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.

    Related articles

    Get the latest news from data gathering world

    I’m interested