How to Scrape Google Finance with Python

Roberta Aukstikalnyte

2024-04-03 · 3 min read

Most investors, financial analysts, and stock market enthusiasts are probably familiar with Google Finance. The website contains financial news, real-time stock quotes, and other financial data. It pulls information from several sources, giving you a comprehensive view of the finance world. Keeping up with all the changes on Google Finance can be a daunting task, so in today’s article, we’re going to demonstrate how to collect its public information, including stock titles, pricing, and price changes in percentages.

Scraping Google Finance with Oxylabs’ Google Finance API and Python

Step one: installing prerequisite libraries

We’ll begin by installing the prerequisites:

pip install bs4 requests

We’ll use BeautifulSoup4 to parse and extract information from the HTML that we’ll be scraping, and Requests to send the API calls.
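To see how BeautifulSoup locates elements before we touch a live page, here’s a minimal, self-contained sketch; the HTML snippet and the price-box class name are made up for illustration:

```python
from bs4 import BeautifulSoup

# A tiny stand-in for a scraped page (the class name is hypothetical)
html = '<main><div class="price-box">186.42</div></main>'

soup = BeautifulSoup(html, 'html.parser')

# find() returns the first matching tag; passing a string as the second
# argument filters by CSS class, and get_text() strips the markup
price = soup.find('main').find('div', 'price-box').get_text()
print(price)  # 186.42
```

We’ll use exactly this find/get_text pattern against the real Google Finance page later on.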

Step two: building core structure

Next, we’re going to define the general logic for our finance data scraper. We’ll create functionality for defining multiple Google Finance URLs that we’d like to scrape. Afterwards, we’ll take these URLs one by one, collect the information we need, and save it as a JSON file.
Let’s create a function that will take a URL as a parameter and scrape it with the Oxylabs Google Finance API.

Our API will now return the scraped HTML:

    def get_finance_html(url):
       payload = {
           'source': 'google',
           'render': 'html',
           'url': url,
       }
    
       response = requests.request(
           'POST',
           'https://realtime.oxylabs.io/v1/queries',
           auth=('username', 'password'),
           json=payload,
       )
    
       response_json = response.json()
    
       html = response_json['results'][0]['content']
    
       return html

Note: don’t forget to replace 'username' and 'password' with your own Oxylabs credentials.
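Rather than hard-coding the credentials, you may prefer to read them from environment variables; here’s one possible sketch (the variable names OXYLABS_USERNAME and OXYLABS_PASSWORD are our own choice, not anything the API requires):

```python
import os

def get_oxylabs_auth():
    # Read the credentials from the environment so they never end up
    # in version control; the variable names are arbitrary
    return (os.environ['OXYLABS_USERNAME'], os.environ['OXYLABS_PASSWORD'])
```

get_finance_html() could then pass auth=get_oxylabs_auth() to requests instead of the literal tuple.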

For the next step, we’ll create a function that accepts a BeautifulSoup object created from the HTML of the whole page. This function will create and return an object containing stock information. Let’s form the function in a way that makes it easy to extend (in case we need to).

def extract_finance_information_from_soup(soup_of_the_whole_page):
    # Put data extraction here
    listing = {}

    return listing

    Since we can now get the HTML and have a function to hold our information extraction, we can combine both of those into one:

    def extract_finance_data_from_urls(urls):
       constructed_finance_results = []
    
       for url in urls:
           html = get_finance_html(url)
    
           soup = BeautifulSoup(html,'html.parser')
      
           finance = extract_finance_information_from_soup(soup)
    
           constructed_finance_results.append({
               'url': url,
               'data': finance
           })
    
       return constructed_finance_results

This function takes a list of URLs as a parameter and returns a list of objects containing the extracted financial data.

    Last but not least, we need a function that takes this data and saves it as a file: 

    def save_results(results, filepath):
        with open(filepath, 'w', encoding='utf-8') as file:
            json.dump(results, file, ensure_ascii=False, indent=4)
    
        return
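You can sanity-check the saving logic in isolation by round-tripping a dummy result through a temporary file; the sample payload below is made up:

```python
import json
import os
import tempfile

def save_results(results, filepath):
    with open(filepath, 'w', encoding='utf-8') as file:
        json.dump(results, file, ensure_ascii=False, indent=4)

# Write a dummy payload and read it back to confirm the file is valid JSON
sample = [{'url': 'https://example.com', 'data': {'price': '$1.00'}}]
path = os.path.join(tempfile.gettempdir(), 'sample_results.json')
save_results(sample, path)

with open(path, encoding='utf-8') as file:
    loaded = json.load(file)

print(loaded == sample)  # True
```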

    To wrap this up, we’ll create a simple main function that invokes all that we’ve built so far. 

    def main():
       results_file = 'data.json'
    
       urls = [
           'https://www.google.com/finance/quote/BNP:EPA?hl=en',
           'https://www.google.com/finance/quote/.DJI:INDEXDJX?hl=en',
           'https://www.google.com/finance/quote/.INX:INDEXSP?hl=en'
       ]
    
       constructed_finance_results = extract_finance_data_from_urls(urls)
    
       save_results(constructed_finance_results, results_file)
    

    We’ve successfully built the core of the application. Now, let’s move on to creating functions for extracting specific data from Google Finance.

    1) Collecting prices

First on the list is the pricing data. Navigating the HTML of Google Finance can get tricky (the markup is dynamic, and class names may change over time), so let’s see how we can pinpoint the price.

    We can see that most of the information about the stock is located inside a container named main.


    Then, we’ll specify the div with the price itself – AHmHk.


    Now that we’ve gathered everything, let’s write the function itself:

    def get_price(soup_element):
       price = soup_element.find('main').find('div','AHmHk').get_text()
    
       return price
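Since Google may rename these class names at any time, a slightly more defensive variant (a sketch; returning None on a missing element is our own choice) avoids an AttributeError crash when the layout changes:

```python
from bs4 import BeautifulSoup

def get_price_safe(soup_element):
    # Return None instead of raising when the expected elements are absent
    main = soup_element.find('main')
    if main is None:
        return None
    price_div = main.find('div', 'AHmHk')
    return price_div.get_text() if price_div else None
```

The same guard can be applied to the change and name extractors below.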

    2) Getting the stock price change in % 

Another important piece of information is the stock’s price change in percent. We’ll begin with the same main container that we’ve found earlier and specify an inner div that contains only the price change – JwB6zf.


    We’ve got all of the needed HTML information, so let’s extract it.

    def get_change(soup_element):
       change = soup_element.find('main').find('div','JwB6zf').get_text()
    
       return change
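The change comes back as a display string such as '0.53%'; if you plan to analyze it numerically, a small helper (our own addition, not part of Google Finance or the API) can convert it to a float:

```python
def change_to_float(change_text):
    # Turn a display string like '+0.53%' or '-1.20%' into a float,
    # dropping the percent sign and any thousands separators
    return float(change_text.strip().rstrip('%').replace(',', ''))

print(change_to_float('+0.53%'))  # 0.53
print(change_to_float('-1.20%'))  # -1.2
```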

    3) Retrieving the stock title

    For the last piece of information, we’ll need the name of the stock.
    Again, we begin with the same main container. Then, we can specify an inner div that contains the name, which is zzDege.


    The final step is to put this into a function for extraction.

    def get_name(soup_element):
       name = soup_element.find('main').find('div','zzDege').get_text()
      
       return name

    Having all of these functions for financial data extraction, we just need to add them to the place we designated earlier to finish up our code:

    def extract_finance_information_from_soup(soup_of_the_whole_page):
       price = get_price(soup_of_the_whole_page)
       change = get_change(soup_of_the_whole_page)
       name = get_name(soup_of_the_whole_page)
    
       listing = {
           "name": name,
           "change": change,
           "price": price
       }
    
       return listing

    Final result

    After adding all the code parts together, the final product should look something like this: 

    from bs4 import BeautifulSoup
    import requests
    import json
    
    def get_price(soup_element):
       price = soup_element.find('main').find('div','AHmHk').get_text()
    
       return price
    
    
    def get_change(soup_element):
       change = soup_element.find('main').find('div','JwB6zf').get_text()
    
       return change
    
    
    def get_name(soup_element):
       name = soup_element.find('main').find('div','zzDege').get_text()
      
       return name
    
    
    def save_results(results, filepath):
       with open(filepath, 'w', encoding='utf-8') as file:
           json.dump(results, file, ensure_ascii=False, indent=4)
    
       return
    
    
    def get_finance_html(url):
       payload = {
           'source': 'google',
           'render': 'html',
           'url': url,
       }
    
       response = requests.request(
           'POST',
           'https://realtime.oxylabs.io/v1/queries',
           auth=('username', 'password'),
           json=payload,
       )
    
       response_json = response.json()
    
       html = response_json['results'][0]['content']
    
       return html
    
    
    def extract_finance_information_from_soup(soup_of_the_whole_page):
       price = get_price(soup_of_the_whole_page)
       change = get_change(soup_of_the_whole_page)
       name = get_name(soup_of_the_whole_page)
    
       listing = {
           "name": name,
           "change": change,
           "price": price
       }
    
       return listing
    
    
    def extract_finance_data_from_urls(urls):
       constructed_finance_results = []
    
       for url in urls:
           html = get_finance_html(url)
    
           soup = BeautifulSoup(html,'html.parser')
      
           finance = extract_finance_information_from_soup(soup)
    
           constructed_finance_results.append({
               'url': url,
               'data': finance
           })
    
       return constructed_finance_results
    
    
    def main():
       results_file = 'data.json'
    
       urls = [
           'https://www.google.com/finance/quote/BNP:EPA?hl=en',
           'https://www.google.com/finance/quote/.DJI:INDEXDJX?hl=en',
           'https://www.google.com/finance/quote/.INX:INDEXSP?hl=en'
       ]
    
       constructed_finance_results = extract_finance_data_from_urls(urls)
    
       save_results(constructed_finance_results, results_file)
    
    
    if __name__ == "__main__":
       main()

    That’s all! We’ve successfully gathered publicly available data from Google Finance, which is now ready for analysis. 
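For reference, data.json should end up with roughly the following shape; the values shown here are placeholders, not real quotes:

```json
[
    {
        "url": "https://www.google.com/finance/quote/BNP:EPA?hl=en",
        "data": {
            "name": "Example Stock Name",
            "change": "0.53%",
            "price": "€60.00"
        }
    }
]
```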

    Wrapping up 

    We hope that you’ve found our tutorial on scraping public Google Finance data useful. If you have any questions or any topics you’d like us to cover, be sure to drop us a message at support@oxylabs.io. We review every inquiry received and would be happy to explore topics our readers find interesting! 

    For web scraping, proxies are an essential anti-blocking measure. To avoid detection by the target website, you can buy proxies of various types to fit any scraping scenario.

    About the author

    Roberta Aukstikalnyte

    Senior Content Manager

    Roberta Aukstikalnyte is a Senior Content Manager at Oxylabs. Having worked various jobs in the tech industry, she especially enjoys finding ways to express complex ideas in simple ways through content. In her free time, Roberta unwinds by reading Ottessa Moshfegh's novels, going to boxing classes, and playing around with makeup.

    All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
