Microsoft Excel is undoubtedly one of the most used software applications around the world and across various disciplines. It not only stores, organizes, and manipulates the data using different functions and formulas but also allows users to access web pages and extract data from them.
In this tutorial, we are going to focus on the last mentioned feature by demonstrating how to implement a web scraper in Excel using VBA. We will briefly go through the installation and preparation of the environment and then write a scraper using VBA macro to successfully fetch data from a web page into Excel. Let’s get started.
VBA web scraping is a special scraping technique that allows for automatic data gathering from websites to Excel. The scraping itself becomes possible with the use of such external applications like Microsoft Edge browser.
What is VBA?
VBA stands for Visual Basic Application. It is a programming language of Microsoft Corporation. VBA extends the capabilities of Microsoft Office tools and allows users to develop advanced functions and complex automation. VBA can also be used to write macros to pull data from websites into Excel.
Before we move on to the tutorial part it is essential to highlight some advantages and disadvantages of web scraping with Excel VBA.
Ready to use – VBA is bundled with Microsoft Office which basically means that if you already have MS Office installed, you don’t have to worry about installing anything else. You can use VBA right away in all the Microsoft Office tools.
Reliable – Both Microsoft Excel & VBA are developed and maintained by Microsoft. Unlike other development environments, these tools can be upgraded together to the latest version without much hassle.
Out-of-the-box support for browser – VBA web scrapers can take advantage of Microsoft’s latest browser Microsoft Edge which makes scraping dynamic websites pretty convenient.
Complete automation – When running the VBA script, you don’t have to perform any additional tasks or interact with the browser. Everything will be taken care of by the VBA script including log-in, scrolling, button clicks, etc.
Only works in Windows – VBA scrapers are not cross-platform. They only work in a Windows environment. While MS Office does have support for Mac, it is way harder to write a working VBA scraper on it. The library supports are also limited, for example, you will be unable to use Microsoft Edge.
Tightly coupled with MS Office – VBA scrapers are highly dependent on MS Office tools. Third-party useful scraping tools are hard to integrate with it.
Overall, if you are looking to develop a web scraper for the Windows operating system that automatically pulls data from a website into, then VBA-based web scraping will be a good choice.
Before we begin, let us make sure we’ve installed all the prerequisites and set up our environment properly so that it will be easier to follow along.
In this tutorial, we’ll be using Windows 10 and Microsoft Office 10. However, the steps will be the same or similar for other versions of Windows. You’ll only need a computer with Windows Operating System. In addition, it’s necessary to install Microsoft Office if you don’t have it already. Detailed installation instructions can be found in the Microsoft’s Official documentation.
Now, that you’ve installed MS Office, complete the steps below to set up the development environment:
From the start menu or Cortana search, find Microsoft Excel and open the application. You will see a similar interface as below:
Click on File
By default, Excel doesn’t show the developer button in the top ribbon. To enable this, we’ll have to go to “Options” from the File menu.
Once you click the “Options,” a dialog will pop up where, from the side menu, you’ll need to select “Customize Ribbon”. Click on the check box next to “developer.” Make sure it is ticked and then click on “OK.”
Now, you’ll see a new developer button on the top ribbon, clicking on it will expand the developer menu. From the menu, select “Visual Basic.”
Once you click on “Visual Basic", it will open a new window as demonstrated below:
Click on “Insert” and select “Module” to insert a new module. It will open the module editor.
From the top menu select Tools > References… which will open a new window like the one below. Make sure to scroll through the available list of references and find Microsoft HTML Client Library and Microsoft Internet Control. Click on the check box next to both of them to enable these references. Once done, click OK.
That’s it! Our development environment is all set. Let’s write our first Excel VBA scraper.
Now, it’s time to update our newly created module to open the following website: https://quotes.toscrape.com. In the module editor, insert the following code:
Sub scrape_quotes() Dim browser As InternetExplorer Dim page As HTMLDocument Set browser = New InternetExplorer browser.Visible = True browser.navigate ("https://quotes.toscrape.com") End Sub
We are defining a subroutine named scrape_quotes(). This function will be executed when we run this script. Inside the subroutine, we are defining two objects “browser” and “page”.
The “browser” object will allow us to interact with Microsoft Edge. Next, we also set the browser as visible so that we can see it in action. The browser.navigate() function tells the VBA browser object to open the URL. The output will be similar to this:
Note: You might be wondering why we are writing “InternetExplorer” to interact with Microsoft Edge. VBA initially only supported Internet Explorer-based automation, but once Microsoft discontinued Internet Explorer, they deployed some updates so that VBA’s InternetExplorer module can run the Microsoft Edge browser in IEMode without any issues. The above code will also work in older Windows that have Internet Explorer still available instead of Edge.
The next step is to scrape the quotes and authors from the website. For simplicity, we’ll store it in the first sheet of the Excel spreadsheet and grab the top 5 quotes for now.
Let’s begin by defining two new objects – one for quotes and another for authors.
Dim quotes As Object Dim authors As Object
After navigating to the website, we’ll also add a little bit of pause so that the website loads properly by using Loop.
Do While browser.Busy: Loop
Next, grab the quotes and authors from the HTML document.
Set page = browser.document Set quotes = page.getElementsByClassName("quote") Set authors = page.getElementsByClassName("author")
Use a for loop to populate the excel rows with the extracted data by calling the Cells function and passing the row and column position:
For num = 1 To 5 Cells(num, 1).Value = quotes.Item(num).innerText Cells(num, 2).Value = authors.Item(num).innerText Next num
Finally, close the browser by calling the quit function. The below code will close the browser window.
Now, if we run the script again, it’ll open Microsoft Edge, browse to the quotes.toscrape.com website, grab the top 5 quotes from the list, and save them to the current excel file’s first sheet.
Below is an example of a full source code:
Sub scrape_quotes() Dim browser As InternetExplorer Dim page As HTMLDocument Dim quotes As Object Dim authors As Object Set browser = New InternetExplorer browser.Visible = True browser.navigate ("https://quotes.toscrape.com") Do While browser.Busy: Loop Set page = browser.document Set quotes = page.getElementsByClassName("quote") Set authors = page.getElementsByClassName("author") For num = 1 To 5 Cells(num, 1).Value = quotes.Item(num).innerText Cells(num, 2).Value = authors.Item(num).innerText Next num browser.Quit End Sub
Web scraping with VBA is a great choice for Windows automation and web extraction. It’s reliable and ready to use which means you won’t have to worry about any unexpected issues or additional steps. For your convenience, you can also access this tutorial in our GitHub repository.
The biggest disadvantage of VBA web scraping that was highlighted in the article is the lack of cross-platform support. However, if you want to develop web scrapers that can be used on multiple operating systems such as Linux or Mac, Excel Web Query can also be an option. Of course, we also recommend exploring web scraping with Python – one of the most popular programming languages that is capable of developing complex network applications while maintaining its simplified syntax.
About the author
Yelyzaveta Nechytailo is a Content Manager at Oxylabs. After working as a writer in fashion, e-commerce, and media, she decided to switch her career path and immerse in the fascinating world of tech. And believe it or not, she absolutely loves it! On weekends, you’ll probably find Yelyzaveta enjoying a cup of matcha at a cozy coffee shop, scrolling through social media, or binge-watching investigative TV series.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
Get the latest news from data gathering world
Scale up your business with Oxylabs®
GET IN TOUCH
Certified data centers and upstream providers
Connect with us