Yelyzaveta Nechytailo
In today’s competitive business environment, it’s hard to imagine an e-commerce company or retailer staying in demand without turning to web scraping. The short answer as to why: gathering accurate public data from thousands of targets worldwide, often with the help of proxies, is what allows them to draw actionable insights and, ultimately, present customers with the best deals.
This tutorial will demonstrate how you can scrape publicly available data from Google Shopping hassle-free. In addition to the guide itself, we’ll briefly cover whether it’s legal to scrape Google Shopping and what difficulties you may encounter in the process.
For your convenience, we’ve also prepared this tutorial in video format.
Formerly known as Google Products Search, Google Products, and Froogle, Google Shopping is a service that allows users to browse, compare, and shop for products from different suppliers who have paid to be featured on the website.
While giving consumers an opportunity to choose the best offers among thousands of brands, Google Shopping is also beneficial for retailers. When a user clicks on a product link, they’re redirected to the vendor’s website to make a purchase; in this way, Google Shopping serves as a channel for businesses to advertise their products online.
More information on how Google Shopping works can be found here.
The data you get when browsing Google Shopping depends on three input parameters: Search, Product, and Price. Let’s briefly discuss each of them:
Search: A list of the items on Google Shopping with information about each item, such as its ID, title, description, price, and availability.
Product: Information on a single product's listing, details about other retailers selling it, and the costs at which it’s offered.
Price: A list of all the product retailers along with the prices they offer and other details like delivery information, total costs, store name, etc.
The Google Shopping search results page lists all the relevant items available for the required product. The below screenshot highlights different attributes of a results page for the query “levis.”
Search bar: Allows a user to search for any product on Google Shopping.
List of products: Displays all the products matching the search query, along with their details.
Filters: Allow you to narrow down your search by, for example, price range, color, style, etc.
Sorting options: This drop-down list enables you to sort the results by various attributes, for example, increasing price, decreasing price, popularity, etc.
Each product in the list is displayed with the following attributes: product name, price, name of the retailer or store, and delivery information.
When you select a specific item from the search page, you’re directed to the product page. This page contains detailed information about that particular product, such as its pictures, key features, product details, product reviews, retailer and price information, and much more.
Product name: Title of the product.
Product Highlights: Main features that provide a quick product overview.
Product details: Detailed description of the product.
Prices: List of different retailers and their prices.
Product reviews: Product rating and customer reviews.
Min and max prices: The product’s price range, from minimum to maximum, across different sellers.
General specifications: General information about the product.
This page lists the prices offered by different retailers for the product. It also shows whether a store or retailer is trusted and whether it offers a Google Guarantee.
Product name: Name of the searched product.
Rating: Overall rating of the product and number of reviews.
Prices from different stores: List of retailers, along with their offers, prices, and the link to visit their website to buy the product.
Filters: Can be applied to narrow down the retailers’ list.
In general, web scraping is legal as long as you strictly follow all the regulations surrounding the public data you wish to gather. However, we still recommend seeking professional legal advice to rule out any possible risks.
If you wish to dive deeper into the topic of web scraping legality, check out our extensive blog post.
Though doable, scraping Google Shopping might not be the easiest task to take on. Not only is Google Shopping good at detecting automated requests, but it also requires parsing JavaScript, which is an “expensive” operation that slows down the scraping process.
Therefore, to make sure you effortlessly scrape and parse a variety of Google Shopping page types, it’s best to rely on a high-quality scraping solution, such as Oxylabs’ Google Shopping API. This API is specifically designed to deal with the challenges of the Google scraping process and lets you gather accurate real-time data globally. If you want to extract data from the Google search engine, check out our other tutorial on how to scrape Google search results.
Claim a free trial to test Web Scraper API for your use case.
To get started, you must have Python 3.6+ installed on your system. Then, you need to install the following packages to code the scraper.
Requests - to send the request to the API.
Pandas - to populate the data in the DataFrame data structure.
To install the packages, use the following command:
pip install requests pandas
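Optionally, if you’d like to keep these dependencies isolated from your system-wide Python, you can create a virtual environment first (the environment name below is just an example):

python -m venv shopping-scraper
source shopping-scraper/bin/activate  # on Windows: shopping-scraper\Scripts\activate
pip install requests pandas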
The first step is creating a payload structure containing different query parameters. Below is a list of the query parameters with a brief description of each.
Parameter | Description | Default Value
---|---|---
source | Sets the type of scraper to use. | google_shopping_search
domain | Domain name. | com
start_page | Starting page number. | 1
pages | Number of pages to retrieve from the search results. | 1
locale | Accept-Language header value, which changes the web interface language of the Google Shopping page. | -
results_language | Results language. One of the languages supported by Google. | -
geo_location | The geographical region the results should be adapted for. Using this parameter correctly is important to get the right data. | -
user_agent_type | The type of device and browser. | desktop
render | Enables JavaScript rendering. | -
callback_url | The URL to which the response to your POST request will be sent. | -
parse | If set to true, returns structured (parsed) data. | -
context: nfpr | If set to true, turns off spelling auto-correction. | false
context: sort_by | Sorts the product list: r for default sorting, rv for review score, p for increasing price, and pd for decreasing price. | r
context: min_price | Applies a filter for the minimum price value. | -
context: max_price | Applies a filter for the maximum price value. | -
For more detailed information on the parameters, check out our documentation for product, search, and pricing.
Using the parameters mentioned in the table, we can create a payload structure as follows:
payload = {
    'source': 'google_shopping_search',
    'domain': 'com',
    'query': 'levis',
    'pages': 1,
    'context': [
        {'key': 'sort_by', 'value': 'pd'},
        {'key': 'min_price', 'value': 30},
    ],
    'parse': 'true',
}
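As an illustration of the other parameters from the table above, the same payload could also be extended with, for instance, a geographical location and a maximum price filter; the values below are placeholders to adapt to your own use case:

payload = {
    'source': 'google_shopping_search',
    'domain': 'com',
    'query': 'levis',
    'pages': 1,
    'geo_location': 'United States',  # placeholder region
    'context': [
        {'key': 'sort_by', 'value': 'pd'},
        {'key': 'min_price', 'value': 30},
        {'key': 'max_price', 'value': 100},  # adds an upper price bound
    ],
    'parse': 'true',
}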
After the payload structure is ready, you can create the request by passing your authentication credentials.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('username', 'password'),
    json=payload,
)
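Before moving on to parsing, it’s worth confirming that the request actually succeeded. A minimal sanity check using the standard requests API could look like this:

# Raise an exception if the API returned an error code (e.g., 401 for wrong credentials)
response.raise_for_status()
print(response.status_code)  # 200 means the request was processed successfully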
We’ll extract the Product Title, Price, and Store name from the response. Since we set the payload parameter parse to true, the API returns a JSON response, from which all of this data can be retrieved.
The code below extracts the data from the JSON response and stores it in a DataFrame.
# Get the content from the response
result = response.json()['results'][0]['content']
products = result['results']['organic']

# Create a DataFrame
df = pd.DataFrame(columns=['Product Title', 'Price', 'Store'])

# Iterate through all the products
for p in products:
    title = p['title']
    price = p['price_str']
    store = p['merchant']['name']
    df = pd.concat([pd.DataFrame([[title, price, store]], columns=df.columns),
                    df], ignore_index=True)
The script extracts relevant product information from the response and stores it in the df DataFrame.
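As a side note, calling pd.concat() inside a loop rebuilds the DataFrame on every iteration and, since new rows are prepended, stores the products in reverse order. A common alternative, sketched below, collects the rows in a plain list first and builds the DataFrame once, preserving the original order:

# Collect rows in a list, then build the DataFrame in a single call
rows = []
for p in products:
    rows.append([p['title'], p['price_str'], p['merchant']['name']])
df = pd.DataFrame(rows, columns=['Product Title', 'Price', 'Store'])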
Using the following script, we can easily export the DataFrame to CSV or JSON files:
df.to_csv('google_shopping_search.csv', index=False)
df.to_json('google_shopping_search.json', orient='split', index=False)
Let’s put all the code together and see the output.
import pandas as pd
import requests

# Structure payload
payload = {
    'source': 'google_shopping_search',
    'domain': 'com',
    'query': 'levis',
    'pages': 1,
    'context': [
        {'key': 'sort_by', 'value': 'pd'},
        {'key': 'min_price', 'value': 30},
    ],
    'parse': 'true',
}

# Get response
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('username', 'password'),
    json=payload,
)

# Get the content from the response
result = response.json()['results'][0]['content']
products = result['results']['organic']

# Create a DataFrame
df = pd.DataFrame(columns=['Product Title', 'Price', 'Store'])

# Iterate through all the products
for p in products:
    title = p['title']
    price = p['price_str']
    store = p['merchant']['name']
    df = pd.concat([pd.DataFrame([[title, price, store]], columns=df.columns),
                    df], ignore_index=True)

# Write the DataFrame to CSV and JSON files
df.to_csv('google_shopping_search.csv', index=False)
df.to_json('google_shopping_search.json', orient='split', index=False)
The script doesn’t contain any print statements; it writes everything to CSV and JSON files. Let’s look at a portion of the output CSV file.
As expected, the output CSV contains the Product Titles, Prices, and Store information for all the products listed on the search page.
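If you’d rather inspect the results in the terminal instead of opening the file, you can also print a preview of the DataFrame:

# Show the first five scraped products
print(df.head())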
Now, let’s scrape a specific product page.
For the product page, the payload structure is created using a different set of parameters. Below is a list of the query parameters with a brief description of each.
Parameter | Description | Default Value
---|---|---
source | Sets the type of scraper to use. | google_shopping_product
domain | Domain name. | com
locale | Accept-Language header value, which changes the web interface language of the Google Shopping page. | -
results_language | Results language. One of the languages supported by Google. | -
geo_location | The geographical region the results should be adapted for. Using this parameter correctly is important to get the right data. | -
user_agent_type | The type of device and browser. | desktop
render | Enables JavaScript rendering. | -
callback_url | The URL to which the response to your POST request will be sent. | -
parse | If set to true, returns structured (parsed) data. | -
Once again, for more detailed information on the parameters, check out our documentation.
We will be using product ID 4505166624001087642 for scraping. Using the parameters mentioned in the table, we can create a payload structure like this:
payload = {
    'source': 'google_shopping_product',
    'domain': 'com',
    'query': '4505166624001087642',
    'parse': 'true',
}
After the payload structure is ready, you can create the request by passing your authentication credentials.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('username', 'password'),
    json=payload,
)
We’ll extract the Product Title, Product Details, Highlights, Rating, and Reviews Count from the response received. As in the previous section, we’ll extract our desired output from the JSON response. You can see the structure of the JSON output here.
# Get the content from the response
product = response.json()['results'][0]['content']

# Create a DataFrame
df = pd.DataFrame(columns=['Product Title', 'Product Details',
                           'Highlights', 'Rating', 'Reviews Count'])

# Get the elements from the response object
title = product['title']
details = product['description']
highlights = product['highlights']
rating = product['reviews']['rating']
reviews_count = product['reviews']['reviews_count']

# Add all the elements to the DataFrame
df = pd.concat([pd.DataFrame([[title, details, highlights, rating, reviews_count]],
                             columns=df.columns), df], ignore_index=True)
In the above code, we’ve created a DataFrame object that stores all the extracted data. We can print this DataFrame or write it to CSV or JSON files.
# Write the data to CSV and JSON files
df.to_csv('google_shopping_product.csv', index=False)
df.to_json('google_shopping_product.json', orient='split', index=False)

# Print the data on screen
print('Product Name: ' + title)
print('Product Details: ' + details)
print('Product Highlights: ' + str(highlights))
print('Product Rating: ' + str(rating))
print('Reviews Count: ' + str(reviews_count))
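Keep in mind that not every product listing necessarily includes all of these fields; for example, a product without reviews may lack the reviews data, and direct dictionary indexing would then raise a KeyError. A defensive variant of the extraction step, shown here as a sketch, uses dict.get() with fallback values:

# Use .get() with defaults so missing fields don't crash the script
title = product.get('title', '')
details = product.get('description', '')
highlights = product.get('highlights', [])
reviews = product.get('reviews', {})
rating = reviews.get('rating', None)
reviews_count = reviews.get('reviews_count', 0)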
Let’s put all the code together and see the output.
import pandas as pd
import requests

# Structure payload
payload = {
    'source': 'google_shopping_product',
    'domain': 'com',
    'query': '4505166624001087642',
    'parse': 'true',
}

# Get response
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('username', 'password'),
    json=payload,
)

# Get the content from the response
product = response.json()['results'][0]['content']

# Create a DataFrame
df = pd.DataFrame(columns=['Product Title', 'Product Details',
                           'Highlights', 'Rating', 'Reviews Count'])

# Get the elements from the response object
title = product['title']
details = product['description']
highlights = product['highlights']
rating = product['reviews']['rating']
reviews_count = product['reviews']['reviews_count']

# Add all the elements to the DataFrame
df = pd.concat([pd.DataFrame([[title, details, highlights, rating, reviews_count]],
                             columns=df.columns), df], ignore_index=True)

# Write the data to CSV and JSON files
df.to_csv('google_shopping_product.csv', index=False)
df.to_json('google_shopping_product.json', orient='split', index=False)

# Print the data on screen
print('Product Name: ' + title)
print('Product Details: ' + details)
print('Product Highlights: ' + str(highlights))
print('Product Rating: ' + str(rating))
print('Reviews Count: ' + str(reviews_count))
We’ve just successfully scraped a product page on Google Shopping. Let’s move on to scraping the pricing page.
For the pricing page, the payload structure is created using a different set of parameters. Below is a list of the query parameters with a brief description of each.
Parameter | Description | Default Value
---|---|---
source | Sets the type of scraper to use. | google_shopping_pricing
domain | Domain name. | com
start_page | Starting page number. | 1
pages | Number of pages to retrieve from the search results. | 1
locale | Accept-Language header value, which changes the web interface language of the Google Shopping page. | -
results_language | Results language. One of the languages supported by Google. | -
geo_location | The geographical region the results should be adapted for. Using this parameter correctly is important to get the right data. | -
user_agent_type | The type of device and browser. | desktop
render | Enables JavaScript rendering. | -
callback_url | The URL to which the response to your POST request will be sent. | -
parse | If set to true, returns structured (parsed) data. | -
More information on the parameters can be found in our documentation.
We’ll be using product ID 4505166624001087642 for scraping. Using the parameters mentioned in the table, we can create a payload structure like this:
payload = {
    'source': 'google_shopping_pricing',
    'domain': 'com',
    'query': '4505166624001087642',
    'parse': 'true',
}
After the payload structure is ready, you can create the request by passing your authentication credentials.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('username', 'password'),
    json=payload,
)
We’ll be extracting the Product Name, Special Offer, Item Price, Total Price, and Shipping charges from the JSON response received. You can find the structure of the JSON response here.
# Get the content from the response
result = response.json()['results'][0]['content']
title = result['title']
pricing = result['pricing']

# Create a DataFrame
df = pd.DataFrame(columns=['Product Name', 'Special Offer',
                           'Item Price', 'Total Price', 'Shipping'])

# Iterate through all the offers
for p in pricing:
    offer = p['details']
    item_price = p['price']
    total_price = p['price_total']
    shipping = p['price_shipping']
    df = pd.concat([pd.DataFrame([[title, offer, item_price,
                                   total_price, shipping]], columns=df.columns), df],
                   ignore_index=True)
The above script stores the extracted data in a DataFrame object, which makes it easy to save the data in CSV, JSON, or other formats. Just execute the following code to save all of it to CSV and JSON files.
df.to_csv('google_shopping_pricing.csv', index=False)
df.to_json('google_shopping_pricing.json', orient='split', index=False)
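With the offers in a DataFrame, you can also run simple comparisons right away. The sketch below, which assumes the total prices arrive as dollar-formatted strings such as "$59.99", converts them to numbers and picks out the cheapest offer:

# Strip currency symbols/commas and convert to floats (assumes "$1,234.56"-style strings)
df['Total Price (num)'] = pd.to_numeric(
    df['Total Price'].astype(str).str.replace(r'[$,]', '', regex=True),
    errors='coerce',
)
# Locate the row with the lowest total price
cheapest = df.loc[df['Total Price (num)'].idxmin()]
print('Cheapest offer:', cheapest['Special Offer'], '-', cheapest['Total Price'])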
Let’s put all the code together and see the output.
import pandas as pd  # pandas library for the DataFrame
import requests  # requests library to call the API

# Structure payload
payload = {
    'source': 'google_shopping_pricing',
    'domain': 'com',
    'query': '4505166624001087642',
    'parse': 'true',
}

# Get response
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('username', 'password'),
    json=payload,
)

# Get the content from the response
result = response.json()['results'][0]['content']
title = result['title']
pricing = result['pricing']

# Create a DataFrame
df = pd.DataFrame(columns=['Product Name', 'Special Offer',
                           'Item Price', 'Total Price', 'Shipping'])

# Iterate through all the offers
for p in pricing:
    offer = p['details']
    item_price = p['price']
    total_price = p['price_total']
    shipping = p['price_shipping']
    df = pd.concat([pd.DataFrame([[title, offer, item_price,
                                   total_price, shipping]], columns=df.columns), df],
                   ignore_index=True)

# Write the DataFrame to CSV and JSON files
df.to_csv('google_shopping_pricing.csv', index=False)
df.to_json('google_shopping_pricing.json', orient='split', index=False)
Scraping Google Shopping is essential if you’re looking to retrieve accurate data on your biggest competitors’ products and prices and make data-driven decisions to scale your business. If you’re aiming to enhance your scraping capabilities, you can buy proxies to ensure smoother and more efficient data extraction. We hope this tutorial was clear and will contribute to effortless, smooth data gathering. You can also find all the necessary code files on our GitHub. In case you still have any questions, don’t hesitate to contact us – Oxylabs’ professional team is always ready to assist you.
Want to broaden your Google data scraping skills? Explore our step-by-step guides for scraping Jobs, Search, Images, Trends, News, Flights, Scholar, and Maps.
About the author
Yelyzaveta Nechytailo
Senior Content Manager
Yelyzaveta Nechytailo is a Senior Content Manager at Oxylabs. After working as a writer in fashion, e-commerce, and media, she decided to switch her career path and immerse in the fascinating world of tech. And believe it or not, she absolutely loves it! On weekends, you’ll probably find Yelyzaveta enjoying a cup of matcha at a cozy coffee shop, scrolling through social media, or binge-watching investigative TV series.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.