How to Scrape Google Shopping Results: A Step-by-Step Guide

Yelyzaveta Nechytailo

2023-03-16 · 6 min read

In today’s competitive business environment, it’s hard to imagine an e-commerce company or retailer staying in demand without turning to web scraping. The short answer to why: gathering accurate public data from thousands of targets worldwide, often with the help of proxies, is what lets them draw actionable insights and, ultimately, present customers with the best deals.

This tutorial will demonstrate how you can scrape publicly available data from Google Shopping hassle-free. In addition to the guide itself, we’ll briefly cover whether it’s legal to scrape Google Shopping and what difficulties you may encounter in the process.

What is Google Shopping?

Formerly known as Google Products Search, Google Products, and Froogle, Google Shopping is a service that allows users to browse, compare, and shop for products from different suppliers who have paid to be featured on the website. 

While giving consumers an opportunity to choose the best offers among thousands of brands, Google Shopping is also beneficial for retailers. When a user clicks on a product link, they are redirected to the vendor’s website for purchasing; thus, Google Shopping acts as a solution for businesses to advertise their products online.

More information on how Google Shopping works can be found here.

Google Shopping results page structure overview

The data you get when browsing Google Shopping depends on which page type you scrape: Search, Product, or Pricing. Let's briefly discuss each of these:

  • Search: A list of the items on Google Shopping with information about each item, such as its ID, title, description, price, and availability.

  • Product: Information on a single product's listing, details about other retailers selling it, and the costs at which it’s offered.

  • Price: A list of all the product retailers along with the prices they offer and other details like delivery information, total costs, store name, etc. 
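These three page types correspond to the three scraper `source` values used in the payloads later in this guide:

```python
# The three Google Shopping page types and the 'source' value each one maps to
SOURCES = {
    'search': 'google_shopping_search',    # search results page
    'product': 'google_shopping_product',  # single product page
    'pricing': 'google_shopping_pricing',  # pricing (sellers) page
}

print(SOURCES['search'])  # → google_shopping_search
```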

Search page 

The Google Shopping search results page lists all the relevant items available for the searched product. The screenshot below highlights different attributes of a results page for the query “levis.”

Results page
  • Search bar: Allows a user to search for any product on Google Shopping. 

  • List of products: Shows all products matching the search, along with their key details. 

  • Filters: Allows you to apply any filter to your search, for example, price range, color, style, etc. 

  • Sorting options: This drop-down list enables you to sort results by various attributes, for example, ascending price, descending price, or popularity.

  • Each entry in the list of products shows an individual product with the following attributes: product name, price, name of the retailer or store, and delivery information.

Products page

When you select a specific item from the search page, you are directed to the Products page. This page contains detailed information about that particular product, such as its pictures, key features, product details, product reviews, retailers and prices information, and much more.

Products page
Reviews
General specifications
  • Product name: Title of the product. 

  • Product Highlights: Main features to have a quick product overview. 

  • Product details: Detailed description of the product.

  • Prices: List of different retailers and their prices. 

  • Product reviews: Product rating and customer reviews. 

  • Min and max prices: The product’s price range across different sellers.

  • General specifications: General information about the product. 

Pricing page

This page lists the prices offered by different retailers for a given product. It also shows whether a store or retailer is a trusted one, and whether the retailer is backed by a Google Guarantee. 

Pricing page
  • Product name: Name of the searched product.

  • Rating: Overall rating of the product and number of reviews. 

  • Prices from different stores: List of retailers, along with their offers, prices, and the link to visit their website to buy the product. 

  • Filters: These filters can be applied to the retailers’ list. 

Is it legal to scrape Google Shopping?

In general, web scraping is legal as long as you strictly follow all the regulations surrounding the public data you wish to gather. However, we still recommend seeking professional legal advice to rule out any possible risks.

If you wish to dive deeper into the topic of web scraping legality, check out our extensive blog post.

The pain of scraping Google Shopping

Though doable, scraping Google Shopping might not be the easiest task to take on. Not only is Google Shopping good at detecting automated requests, but it also requires parsing JavaScript, which is an “expensive” operation that slows down the scraping process. 

Therefore, to make sure you effortlessly scrape and parse a variety of Google Shopping page types, it’s best to rely on a high-quality scraping solution, such as Oxylabs’ Google Shopping API. This API is specifically designed to deal with the challenges of the Google scraping process and lets you gather accurate real-time data globally. If you want to extract data from the Google search engine, check out our other tutorial on how to scrape Google search results.

Step-by-step guide for scraping Google Shopping results using Google Shopping API

    Step 1: Set up Python and install required libraries

    To get started, you must have Python 3.6+ installed on your system. Then, you need to install the following packages to code the scraper. 

    • Requests - to send the request to the API.

    • Pandas - to populate the data in the DataFrame data structure. 

    To install the packages, use the following command:

    pip install requests pandas

    Step 2: Set up a payload

    Search page

    The first step is creating a structured payload containing different query parameters. Below is a list of the query parameters and a brief description of each. 

    • source: The type of scraper to use. Default: google_shopping_search
    • domain: The Google domain to target. Default: com
    • start_page: The page number to start from. Default: 1
    • pages: The number of pages to retrieve from the search results. Default: 1
    • locale: The Accept-Language header value, which changes the interface language of the Google Shopping page. Default: -
    • results_language: The language of the results; must be a language supported by Google. Default: -
    • geo_location: The region for which the output should be adjusted. Setting this correctly is important for accurate localized results. Default: -
    • user_agent_type: The type of device and browser. Default: desktop
    • render: Enables JavaScript execution. Default: -
    • callback_url: The URL to which the response will be sent via POST. Default: -
    • parse: If set to true, returns structured (parsed) data. Default: -
    • context: nfpr: If set to true, turns off spelling auto-correction. Default: false
    • context: sort_by: Sorts the product list: r for default sorting, rv for review score, p for ascending price, and pd for descending price. Default: r
    • context: min_price: Filters results by minimum price. Default: -
    • context: max_price: Filters results by maximum price. Default: -

    For more detailed information on the parameters, check out our documentation for product, search, and pricing.

    Using the parameters mentioned in the table, we can create a payload structure as follows:

    payload = {
       'source': 'google_shopping_search',
       'domain': 'com',
       'query': 'levis',
       'pages': 1,
       'context': [
           {'key': 'sort_by', 'value': 'pd'},
           {'key': 'min_price', 'value': 30},
       ],
       'parse': 'true',
    }
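Note that the context entries are a list of key/value dictionaries rather than a plain mapping. If you find that verbose, a small hypothetical helper (not part of the API) can build it from keyword arguments:

```python
def build_context(**filters):
    """Convert keyword filters into the list of key/value dicts the payload expects."""
    return [{'key': k, 'value': v} for k, v in filters.items()]

# Produces the same 'context' value as in the payload above
context = build_context(sort_by='pd', min_price=30)
print(context)  # → [{'key': 'sort_by', 'value': 'pd'}, {'key': 'min_price', 'value': 30}]
```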

    Step 3: Send a POST request

    After the payload structure is ready, you can create the request by passing your authentication key.

    response = requests.request(
       'POST',
       'https://realtime.oxylabs.io/v1/queries',
       auth=('username', 'password'),
       json=payload,
    )
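The snippet above assumes the request succeeds. In practice, it's worth failing fast on a non-200 status before attempting to parse the body; a minimal sketch of such a check (a hypothetical helper, not part of the API) might look like this:

```python
def check_response(resp):
    """Return the parsed JSON body, or raise a readable error on a failed request."""
    if resp.status_code != 200:
        raise RuntimeError(
            f'API request failed with status {resp.status_code}: {resp.text[:200]}'
        )
    return resp.json()

# Usage: data = check_response(response)
```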

    Step 4: Extract product data from a JSON response

    We will be extracting the product title, price, and store name from the response. Since we set the parse parameter to true in the payload, the API returns structured JSON, and all of this data can be read directly from it.

    The code below extracts the data from JSON format and stores it in DataFrame.

    # Get the content from the response
    result = response.json()['results'][0]['content']
    products = result['results']['organic']

    # Create a DataFrame
    df = pd.DataFrame(columns=['Product Title', 'Price', 'Store'])

    # Iterate through all the products
    for p in products:
        title = p['title']
        price = p['price_str']
        store = p['merchant']['name']
        df = pd.concat([pd.DataFrame([[title, price, store]], columns=df.columns),
                        df], ignore_index=True)

    The script extracts relevant product information from the response and stores it in the df DataFrame.
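The code above reads only the first entry in results. When the pages parameter is greater than 1, the API returns one entry per page, so you'll want to flatten them. A sketch of this, assuming the same response shape as in the single-page extraction:

```python
def collect_organic(api_response):
    """Gather organic product listings across all returned pages.

    The response shape (results -> content -> results -> organic) is assumed
    from the single-page extraction shown above.
    """
    products = []
    for page in api_response.get('results', []):
        content = page.get('content') or {}
        products.extend(content.get('results', {}).get('organic', []))
    return products
```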

    Step 5: Save extracted data to a CSV using Pandas

    Using the following script, we can easily export the DataFrame to CSV or JSON files:

    df.to_csv('google_shopping_search.csv', index=False)
    df.to_json('google_shopping_search.json', orient='split', index=False)
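A quick way to verify the export worked is to read the file back and compare it with the original DataFrame. Here's a self-contained round-trip check using dummy data (the file name is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Product Title': ['Sample jeans'],
                   'Price': ['$39.99'],
                   'Store': ['Example Store']})
df.to_csv('sample_output.csv', index=False)

restored = pd.read_csv('sample_output.csv')
assert restored.equals(df)  # same columns, same rows
```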

    Let’s put all the code together and see the output. 

    import pandas as pd
    import requests
    
    
    # Structure payload
    payload = {
       'source': 'google_shopping_search',
       'domain': 'com',
       'query': 'levis',
       'pages': 1,
       'context': [
           {'key': 'sort_by', 'value': 'pd'},
           {'key': 'min_price', 'value': 30},
       ],
       'parse': 'true',
    }
    
    
    # Get response
    response = requests.request(
        'POST',
        'https://realtime.oxylabs.io/v1/queries',
        auth=('username', 'password'),
        json=payload,
    )
    
    
    # Get the content from the response
    result = response.json()['results'][0]['content']
    products = result['results']['organic']

    # Create a DataFrame
    df = pd.DataFrame(columns=['Product Title', 'Price', 'Store'])

    # Iterate through all the products
    for p in products:
        title = p['title']
        price = p['price_str']
        store = p['merchant']['name']
        df = pd.concat([pd.DataFrame([[title, price, store]], columns=df.columns),
                        df], ignore_index=True)

    # Write the DataFrame to CSV and JSON files
    df.to_csv('google_shopping_search.csv', index=False)
    df.to_json('google_shopping_search.json', orient='split', index=False)

    The script doesn’t print anything to the screen; instead, it writes everything to the CSV and JSON files. Let’s look at a portion of the output CSV file.

    CSV file

    As expected, the output CSV contains the Product Titles, Prices, and Store information for all the products listed on the search page.

    Now, let’s scrape a specific product page. 

    Product page

    For the product page, the payload is built with a slightly different set of parameters. Below is a list of the query parameters and a brief description of each. 

    • source: The type of scraper to use. Default: google_shopping_product
    • domain: The Google domain to target. Default: com
    • locale: The Accept-Language header value, which changes the interface language of the Google Shopping page. Default: -
    • results_language: The language of the results; must be a language supported by Google. Default: -
    • geo_location: The region for which the output should be adjusted. Setting this correctly is important for accurate localized results. Default: -
    • user_agent_type: The type of device and browser. Default: desktop
    • render: Enables JavaScript execution. Default: -
    • callback_url: The URL to which the response will be sent via POST. Default: -
    • parse: If set to true, returns structured (parsed) data. Default: -

    Once again, for more detailed information on the parameters, check out our documentation.

    We will be using product ID 4505166624001087642 for scraping. Using the parameters mentioned in the table, we can create a payload structure like this:

    payload = {
      'source': 'google_shopping_product',
      'domain': 'com',
      'query': '4505166624001087642',
      'parse': 'true',
    }

    After the payload structure is ready, you can create the request by passing your authentication key. 

    response = requests.request(
      'POST',
      'https://realtime.oxylabs.io/v1/queries',
      auth=('username', 'password'),
      json=payload,
    )

    We’ll extract the Product Title, Product Details, Highlights, Rating, and Reviews Count from the response. As in the previous section, we’ll parse the JSON response to extract our desired output. You can see the structure of the JSON output here.

    # Get the content from the response
    product = response.json()['results'][0]['content']

    # Create a DataFrame
    df = pd.DataFrame(columns=['Product Title', 'Product Details',
                               'Highlights', 'Rating', 'Reviews Count'])

    # Get the elements from the response object
    title = product['title']
    details = product['description']
    highlights = product['highlights']
    rating = product['reviews']['rating']
    reviews_count = product['reviews']['reviews_count']

    # Add all the elements to the DataFrame
    df = pd.concat([pd.DataFrame([[title, details, highlights, rating, reviews_count]],
                                 columns=df.columns), df], ignore_index=True)

    In the above code, we’ve created a DataFrame object to hold all the extracted data. We can print this DataFrame or write it to CSV or JSON files. 

    # Write the data to CSV and JSON files
    df.to_csv('google_shopping_product.csv', index=False)
    df.to_json('google_shopping_product.json', orient='split', index=False)

    # Print the data on screen
    print('Product Name: ' + title)
    print('Product Details: ' + details)
    print('Product Highlights: ' + str(highlights))
    print('Product Rating: ' + str(rating))
    print('Reviews Count: ' + str(reviews_count))
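Direct key access like product['reviews']['rating'] raises a KeyError if a product happens to have no reviews. A defensive variant of the extraction (the field names are the ones used above; their optionality is an assumption worth guarding against) could look like this:

```python
def extract_product(product):
    """Pull the fields used above, tolerating missing keys."""
    reviews = product.get('reviews') or {}
    return {
        'Product Title': product.get('title'),
        'Product Details': product.get('description'),
        'Highlights': product.get('highlights', []),
        'Rating': reviews.get('rating'),
        'Reviews Count': reviews.get('reviews_count'),
    }
```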

    Let’s put all the code together and see the output. 

    import pandas as pd
    import requests
    
    
    # Structure payload.
    payload = {
      'source': 'google_shopping_product',
      'domain': 'com',
      'query': '4505166624001087642',
      'parse': 'true',
    }
    
    
    # Get response.
    response = requests.request(
      'POST',
      'https://realtime.oxylabs.io/v1/queries',
      auth=('username', 'password'),
      json=payload,
    )
    
    
    # Get the content from the response
    product = response.json()['results'][0]['content']

    # Create a DataFrame
    df = pd.DataFrame(columns=['Product Title', 'Product Details',
                               'Highlights', 'Rating', 'Reviews Count'])

    # Get the elements from the response object
    title = product['title']
    details = product['description']
    highlights = product['highlights']
    rating = product['reviews']['rating']
    reviews_count = product['reviews']['reviews_count']

    # Add all the elements to the DataFrame
    df = pd.concat([pd.DataFrame([[title, details, highlights, rating, reviews_count]],
                                 columns=df.columns), df], ignore_index=True)

    # Write the data to CSV and JSON files
    df.to_csv('google_shopping_product.csv', index=False)
    df.to_json('google_shopping_product.json', orient='split', index=False)

    # Print the data on screen
    print('Product Name: ' + title)
    print('Product Details: ' + details)
    print('Product Highlights: ' + str(highlights))
    print('Product Rating: ' + str(rating))
    print('Reviews Count: ' + str(reviews_count))
    Output

    We’ve just successfully scraped a product page on Google Shopping. Let’s move on to scraping the pricing page.

    Pricing page

    For the pricing page, the payload is built with yet another set of parameters. Below is a list of the query parameters and a brief description of each. 

    • source: The type of scraper to use. Default: google_shopping_pricing
    • domain: The Google domain to target. Default: com
    • start_page: The page number to start from. Default: 1
    • pages: The number of pages to retrieve from the search results. Default: 1
    • locale: The Accept-Language header value, which changes the interface language of the Google Shopping page. Default: -
    • results_language: The language of the results; must be a language supported by Google. Default: -
    • geo_location: The region for which the output should be adjusted. Setting this correctly is important for accurate localized results. Default: -
    • user_agent_type: The type of device and browser. Default: desktop
    • render: Enables JavaScript execution. Default: -
    • callback_url: The URL to which the response will be sent via POST. Default: -
    • parse: If set to true, returns structured (parsed) data. Default: -

    More information on the parameters can be found in our documentation.

    We’ll be using product ID 4505166624001087642 for scraping. Using the parameters mentioned in the table, we can create a payload structure like this:

    payload = {
       'source': 'google_shopping_pricing',
       'domain': 'com',
       'query': '4505166624001087642',
       'parse': 'true'
    }

    After the payload structure is ready, you can create the request by passing your authentication key. 

    response = requests.request(
       'POST',
       'https://realtime.oxylabs.io/v1/queries',
       auth=('username', 'password'),
       json=payload,
    )

    We’ll be extracting the Product Name, Special Offer, Item Price, Total Price, and Shipping charges from the JSON response. You can find the structure of the JSON response here.

    result = response.json()['results'][0]['content']
    title = result['title']
    pricing = result['pricing']

    # Create a DataFrame
    df = pd.DataFrame(columns=['Product Name', 'Special Offer',
                               'Item Price', 'Total Price', 'Shipping'])
    
    
    for p in pricing:
       offer = p['details']
       item_price = p['price']
       total_price = p['price_total']
       shipping = p['price_shipping']
       df = pd.concat([pd.DataFrame([[title, offer, item_price,
                                      total_price, shipping]], columns=df.columns), df],
                      ignore_index=True)

    The above script stores the extracted data in a DataFrame object. Therefore, saving data in CSV, JSON, or other formats is easy. Just execute the following code to save the whole data in CSV and JSON files.

    df.to_csv('google_shopping_pricing.csv', index=False)
    df.to_json('google_shopping_pricing.json', orient='split', index=False)
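Fields such as the item price and shipping charges come back as display strings (e.g., "$39.99"). If you want to sort numerically or sum totals, a small parser helps. The exact string format is an assumption here, so adjust the pattern to what you actually receive:

```python
import re

def parse_price(price_str):
    """Extract the first number from a price string; return None if there isn't one."""
    match = re.search(r'[\d,]+(?:\.\d+)?', price_str or '')
    return float(match.group().replace(',', '')) if match else None

print(parse_price('$1,299.00'))  # → 1299.0
print(parse_price('Free'))       # → None
```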

    Let’s put all the code together and see the output. 

    import pandas as pd  # include the pandas library for DataFrame
    import requests  # Include the requests library
    
    
    # Structure payload.
    payload = {
       'source': 'google_shopping_pricing',
       'domain': 'com',
       'query': '4505166624001087642',
       'parse': 'true'
    }
    
    
    # Get response.
    response = requests.request(
       'POST',
       'https://realtime.oxylabs.io/v1/queries',
       auth=('username', 'password'),
       json=payload,
    )
    
    
    # Get the content from the response
    result = response.json()['results'][0]['content']
    title = result['title']
    pricing = result['pricing']
    # Create a DataFrame
    df = pd.DataFrame(columns=['Product Name', 'Special Offer',
                              'Item Price', 'Total Price', 'Shipping'])
    
    
    for p in pricing:
       offer = p['details']
       item_price = p['price']
       total_price = p['price_total']
       shipping = p['price_shipping']
       df = pd.concat([pd.DataFrame([[title, offer, item_price,
                                      total_price, shipping]], columns=df.columns), df],
                      ignore_index=True)
    
    
    # Copy the DataFrame to CSV and JSON files
    df.to_csv('google_shopping_pricing.csv', index=False)
    df.to_json('google_shopping_pricing.json', orient='split', index=False)
    Pricing output

    Conclusion

    Scraping Google Shopping is essential if you’re looking to retrieve accurate data on your biggest competitors’ products and prices and make data-driven decisions to scale your business. If you're aiming to enhance your scraping capabilities, you can buy proxies to ensure smoother and more efficient data extraction. We hope this tutorial was clear and will make your data-gathering activities smoother. You can also find all the necessary code files on our GitHub. If you still have any questions, don’t hesitate to contact us – Oxylabs’ professional team is always ready to assist you. 

    Want to broaden your Google data scraping skills? Explore our step-by-step guides for scraping Jobs, Search, Images, Trends, News, Flights, Scholar, and Maps.

    About the author

    Yelyzaveta Nechytailo

    Senior Content Manager

    Yelyzaveta Nechytailo is a Senior Content Manager at Oxylabs. After working as a writer in fashion, e-commerce, and media, she decided to switch her career path and immerse in the fascinating world of tech. And believe it or not, she absolutely loves it! On weekends, you’ll probably find Yelyzaveta enjoying a cup of matcha at a cozy coffee shop, scrolling through social media, or binge-watching investigative TV series.

    All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
