Back to blog
Guide to Scraping Data from Websites to Excel with Web Query
Iveta Vistorskyte
Back to blog
Iveta Vistorskyte
Did you know that Microsoft Excel has a built-in feature to extract data from website to Excel using Web Query? It allows you to perform web scraping and automatically collect public data from the internet.
This article covers all the possible ways to extract public web data from websites with the help of Web Query.
Microsoft Web Query uses the Operating System's Web Browser, usually Internet Explorer in older Windows and Edge in the latest ones, to load the website. This allows Web Query to render JavaScript-heavy websites hassle-free. Once the website finishes loading, the Web Query will automatically parse the whole page and find all relevant static HTML tables. It will highlight all of the tables and make them selectable. Once the user picks the desired table, a Web Query helps extract the table data.
Now that you've learned about Web Query, let's try to scrape data from the internet. In this tutorial, you'll see an example of how to scrape the "books.toscrape.com" website, extract the book's information and save it to an Excel spreadsheet. You'll also learn and explore various Web Query features along the way. Before starting the scraping process, you need to make sure that:
You have an active internet connection;
You have Microsoft Office installed so that you can use Microsoft Excel. If you don't have Microsoft Office, you can download and install it from here.
Once you have Microsoft Office installed, you can follow these steps:
Step 1: Open a blank spreadsheet
You need to open a blank spreadsheet in Microsoft Excel and click on "Data" from the menu.
Step 2: Click on "From Web" option
Once you click on "Data," you'll see a new menu with various options for extracting data. Pick a "From Web" option, and click on it. It will open a new window.
Step 3: Type the web page URL in the address bar, and click "Go"
You'll see an address bar in the "New Web Query" window. In this text box, you need to type the URL "https://books.toscrape.com" and click "Go."
Step 4: Navigate to the book page
After clicking a "Go" button, you'll see a website in the mini web browser. You can interact and browse the website here. Scroll down and click on a book link to open the book page.
Step 5: Select the desired table to scrape
If you scroll down a bit, you'll find a table on the book page. There will be a small yellow arrow icon that you can click. Clicking the button will select the table associated with it. Once the table is selected, click the "Import" button below.
Step 6: Select the "Existing worksheet" option and click "OK"
When you click the "Import" button, a small window will appear, similar to the screenshot below. Make sure the "Existing worksheet" option is selected and click "OK."
And that's it! Web Query will create a background process to run the web request and fetch the website. After fetching the website, it will parse the table and extract the data into the excel columns. The output will be similar to the example below.
You can match it with the website and validate that all the data from the table is correct.
All the columns and rows will be linked to the web query, so whenever you refresh the data manually or automatically, Microsoft Excel will know which rows or columns to update. The next section explores multiple ways to refresh & update data.
There are mainly two different refresh mechanisms available for the Web Query:
Automatic
Manual
In the automatic mode, Excel will periodically pull the data in the background and keep the sheet up to date. You can also customize the duration according to your needs – you'll find the explanation later in this article.
Before that, let's explore multiple ways of manual data refreshes. First, let's delete a few items from the extracted data to validate the refresh works as expected.
Notice that we removed price, tax, and availability from the data.
Option 1: Click "Refresh" from the top menu
On the top menu, under the "Data" submenu, there's a button named "Refresh." You can pull the latest data by clicking this button or choosing the shortcut on your keyboard – the "CTRL + ALT + F5" button combination. Once you pull the latest data, the missing values of price, tax & availability are refreshed with the latest information.
Option 2: Refresh from the context menu
Let's remove a few fields again to test a different way of data refresh. This time, we'll use the "Refresh" button of the context menu instead of the menu button. First, you'll need to right-click on a cell and select "Refresh." Note that you'll have to click cells managed by Web Query. If you click on other cells, this "Refresh" button won't appear.
Option 3: Edit & rerun the Query
Now, we'll remove a few fields and then try another method to update the data. After removing some data, right-click on the cell, and you'll see an option "Edit Query." Click on it.
It will open a new window called "Edit Web Query." If you click "Import," the web query will run a background process to fetch the latest updates from the website and refresh the existing data, replacing obsolete data with new information.
This method is useful if you want to update the scraper, e.g., change the website URL, update the query to fetch a different table or page, etc.
Now, let's find out how to automate the refresh/update process.
The steps of configuring the automatic refresh process from properties are pretty simple. First, open the context menu again by right-clicking on a cell associated with Web Query. From the menu, select "Data Range Properties."
This will open up the "External Data Range Properties" window, similar to the example below:
In this window, look for the "Refresh Controls" section. By customizing this section, you can automatically refresh the data. Go ahead and click the "Enable background refresh" checkbox. Then, by using the second checkbox, you can tell Excel to refresh data periodically. For example, if you set it to refresh every 5 minutes, Excel will automatically pull the data every 5 minutes using a background process and update the table automatically.
If you check the third checkbox, "Refresh data when opening the file," every time you open the spreadsheet, Excel will pull the web data and update the sheet for you.
As an additional step, you can also configure proxy servers via your computer settings. Often, proxies are an inseparable part of the web scraping process, allowing you to change your IP address or even rotate between them. This way, you can elevate the web scraping process and avoid getting your own IP banned.
If you use Windows, here are the instructions for manually configuring proxies on your PC:
Go to this website to see what your original IP address is.
From your proxy provider, you’ll need to get the IP address (or the proxy server name) and port.
Select the Start button, then select Settings, Network & internet, Proxy.
Under Manual proxy setup, next to Use a proxy server, select Set up.
In the Edit proxy server dialog box, do the following:
Turn on Use a proxy server.
In the Proxy IP address and Port boxes, enter the proxy server name or IP address and port (optional) in the respective boxes.
Select or clear the Don’t use the proxy server for local (intranet) addresses check box.
Select Save.
Check if your IP address has changed on this website.
If you use macOS:
Go to this website to see what your original IP address is.
Get the IP address (or server name) and port from your proxy provider.
Click on the magnifying glass in the top-right corner and enter System Settings into the search field.
On the left side, click Network.
Select a network service on the right.
Click Details, then Proxies.
Depending on your proxy type, toggle the necessary button. A window will open up, requiring you to enter the IP address and port. Click OK.
Again go to this website to see if your IP address has changed.
Web Query simplifies web data extraction in Excel, especially for websites with tables. It enables you to automate simple tasks and extract web data with minimal or no interaction. Web Query also allows you to scrape data from dynamic websites with JavaScript. For your convenience, you can also access this tutorial in our GitHub repository.
However, Web Query isn't suitable for developing custom sophisticated web scrapers, capable of interacting with web elements or allowing proxy integration to perform large-scale scraping. In such cases, you have multiple alternatives, such as developing a web scraper using Python, Javascript, or Go.
Sign up for a 7-day trial now via our self-service dashboard. No hidden fees, cancel anytime.
Last but not least, you can also use VBA script to interact with websites from Excel; however, it's not as flexible as the other options we mentioned earlier.
To extract data from a website to Excel, you can use various methods, including manual copy-pasting or using web scraping tools like Oxylabs Web Scraper API (you can get a free trial for it) or Excel's built-in web query feature. If your project is on a smaller scale and you have some programming knowledge, you can also build your own web scraping script with languages like Python or JavaScript.
To scrape a table from a website to Excel, you can use web scraping tools (e.g., Web Scraper API) or build your own script using Python.
First, you have to inspect the HTML structure of the webpage to locate the table element. Then, use the web scraping tool to select and extract the table data. Finally, export the extracted data to Excel format using the tool's export feature, or copy-paste it directly into Excel for further formatting and analysis.
Yes, indeed, Excel can pull data from a URL using its built-in web query feature. You can create a web query by selecting Data > Get Data > From Other Sources > From Web in Excel. Simply enter the URL from which you want to pull data, specify the data you want to import, and Excel will retrieve and display the data in a new workshee
About the author
Iveta Vistorskyte
Lead Content Manager
Iveta Vistorskyte is a Lead Content Manager at Oxylabs. Growing up as a writer and a challenge seeker, she decided to welcome herself to the tech-side, and instantly became interested in this field. When she is not at work, you'll probably find her just chillin' while listening to her favorite music or playing board games with friends.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
Get the latest news from data gathering world
Scale up your business with Oxylabs®