E-commerce scraping can be overwhelming, especially when faced with countless options on platforms like Amazon. Luckily, Oxylabs E-Commerce Scraper API (now part of a Web Scraper API solution) combined with Python offers an optimal web scraping solution for retrieving Amazon price data. With E-Commerce Scraper API, you can schedule daily price scrapes to stay on top of current pricing models, price changes, and competitor pricing strategies. By scraping product prices from multiple Amazon pages, you can also simplify your search and find the best deals without the hassle. It's a practical way to streamline your shopping experience and save time.
In this tutorial, we’ll scrape Amazon price data based on:
Best-selling items
Search results
Currently available deals
You can find the following code on our GitHub.
You can download the latest version of Python from the official website.
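If Python is already installed, you can verify the version from your terminal. This tutorial only assumes a reasonably recent Python 3 release.

python --version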
To store your Python code, run the following command to create a new Python file in your current directory.
touch main.py
Next, run the command below to install the dependencies required for sending HTTP requests and data processing. We will use Requests and Pandas.
pip install requests pandas
Now, open the previously created Python file and import the installed libraries.
import requests
import pandas as pd
Start by declaring your API credentials. Since we’ll be using the E-Commerce API, you’ll need to retrieve the credentials for authenticating with the API from your Oxylabs dashboard. Replace USERNAME and PASSWORD with the credentials you retrieved.
USERNAME = "USERNAME"
PASSWORD = "PASSWORD"
Now, let’s fetch the Amazon price data for best-selling items in a category on Amazon. First, let’s choose a category and retrieve its ID. For this tutorial, we’ll use the dog food category.
Go to the category page on your browser and inspect the URL. You should see a query parameter called node.
https://www.amazon.com/gp/browse.html?node=2975359011&ref_=nav_em__sd_df_0_2_19_4
The value of the node parameter is the ID of the dog food category. Save it to a variable called dog_food_category_id; we’ll use it in the payload of our API request. You can adjust the variable name based on your preferred category.
USERNAME = "USERNAME"
PASSWORD = "PASSWORD"
dog_food_category_id = "2975359011"
Let’s start by implementing a function called get_best_seller_results. It should accept an argument called category_id.
def get_best_seller_results(category_id):
    ...
Next, we’ll be adding our API request. Declare a payload and send a POST request to Oxylabs E-Commerce API. Don’t forget to include your authentication credentials.
payload = {
    "source": "amazon_bestsellers",
    "domain": "com",
    "query": category_id,
    "start_page": 1,
    "parse": True,
}

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=(USERNAME, PASSWORD),
    json=payload,
)
response.raise_for_status()

results = response.json()["results"][0]["content"]["results"]
Make sure the source parameter is set to amazon_bestsellers and the parse parameter is set to True. Feel free to adjust other parameters to your preference.
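For reference, here’s a simplified, illustrative sketch of the response shape this code expects. The actual parsed response contains many more fields per product, and the exact structure may vary:

{
    "results": [
        {
            "content": {
                "results": [
                    {"title": "Example product title", "price": 54.99, "currency": "USD"}
                ]
            }
        }
    ]
}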
Next, let’s extract the data we need from the retrieved results.
return [
    {
        "price": result["price"],
        "title": result["title"],
        "currency": result["currency"],
    }
    for result in results
]
Here’s the full code of the get_best_seller_results function:
def get_best_seller_results(category_id):
    payload = {
        "source": "amazon_bestsellers",
        "domain": "com",
        "query": category_id,
        "start_page": 1,
        "parse": True,
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response.raise_for_status()
    results = response.json()["results"][0]["content"]["results"]
    return [
        {
            "price": result["price"],
            "title": result["title"],
            "currency": result["currency"],
        }
        for result in results
    ]
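As a quick sanity check, you can call the function with the category ID saved earlier and print a couple of entries:

# Preview the first two best-seller entries for the dog food category
for item in get_best_seller_results(dog_food_category_id)[:2]:
    print(item)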
Next, let’s scrape prices for Amazon search results. We can reuse most of the code from the get_best_seller_results function, changing only the payload and results variables.
Let’s adjust the payload parameter first.
payload = {
    "source": "amazon_search",
    "domain": "com",
    "query": "couch",
    "start_page": 1,
    "parse": True,
}
The source parameter should be amazon_search. The query is now a plain search term, just like one you would type on the Amazon website. In this example, we’ll be scraping couch prices.
Next, the results variable should be extracted with an additional key called organic. Here’s how it should look.
results = response.json()["results"][0]["content"]["results"]["organic"]
Finally, we can put it all together in a function called get_search_results. The function should accept a query parameter and use it in the payload. Here’s the full code for the function.
def get_search_results(query):
    payload = {
        "source": "amazon_search",
        "domain": "com",
        "query": query,
        "start_page": 1,
        "parse": True,
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response.raise_for_status()
    results = response.json()["results"][0]["content"]["results"]["organic"]
    return [
        {
            "price": result["price"],
            "title": result["title"],
            "currency": result["currency"],
        }
        for result in results
    ]
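If you need more than the first page of results, the start_page parameter in the payload gives you a simple way to paginate. Here’s a minimal sketch of a hypothetical helper (not part of the code above) that loops over several pages and combines the organic results:

# Hypothetical helper: collect search results from several pages by
# incrementing start_page in the payload.
def get_search_results_pages(query, pages=3):
    all_results = []
    for page in range(1, pages + 1):
        payload = {
            "source": "amazon_search",
            "domain": "com",
            "query": query,
            "start_page": page,
            "parse": True,
        }
        response = requests.post(
            "https://realtime.oxylabs.io/v1/queries",
            auth=(USERNAME, PASSWORD),
            json=payload,
        )
        response.raise_for_status()
        all_results.extend(
            response.json()["results"][0]["content"]["results"]["organic"]
        )
    return all_results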
Next, let’s get prices for deals in a category. Oxylabs E-Commerce API doesn’t have a dedicated source for deals, so we can simply use the amazon source parameter to scrape prices from any specific Amazon URL.
Any Amazon page can be scraped with the same code when using amazon as the source parameter. Such pages include:
New Releases
Wish Lists and Gift Guides
Category-Specific Pages
Amazon Outlet
Amazon Warehouse Deals
Amazon Pantry and Grocery
Amazon Brand Stores
International Amazon Sites
Here’s an example URL of Amazon deals for camping supplies.
https://www.amazon.com/s?i=sporting&rh=n%3A3400371%2Cp_n_deal_type%3A23566064011&s=exact-aware-popularity-rank&pf_rd_i=10805321&pf_rd_m=ATVPDKIKX0DER&pf_rd_p=bf702ff1-4bf6-4c17-ab26-f4867bf293a9&pf_rd_r=ER3N9MGTCESZPZ0KRV8R&pf_rd_s=merchandised-search-3&pf_rd_t=101&ref=s9_acss_bw_cg_SODeals_3e1_w
The payload should look like this.
payload = {
    "source": "amazon",
    "url": "https://www.amazon.com/s?i=sporting&rh=n%3A3400371%2Cp_n_deal_type%3A23566064011&s=exact-aware-popularity-rank&pf_rd_i=10805321&pf_rd_m=ATVPDKIKX0DER&pf_rd_p=bf702ff1-4bf6-4c17-ab26-f4867bf293a9&pf_rd_r=ER3N9MGTCESZPZ0KRV8R&pf_rd_s=merchandised-search-3&pf_rd_t=101&ref=s9_acss_bw_cg_SODeals_3e1_w",
    "parse": True,
}
Now, we can implement another function called get_deals_results. The function should accept a url parameter that is then used in the payload. The rest of the code can be identical to the get_search_results function.
def get_deals_results(url):
    payload = {
        "source": "amazon",
        "url": url,
        "parse": True,
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response.raise_for_status()
    results = response.json()["results"][0]["content"]["results"]["organic"]
    return [
        {
            "price": result["price"],
            "title": result["title"],
            "currency": result["currency"],
        }
        for result in results
    ]
Now that we have our three price scraping functions, we can use them to retrieve our data and dump it into CSV files.
We’ll utilize the previously installed pandas library for this. Create a pandas DataFrame from each list of results and use the to_csv method to write a CSV file.
Here’s how it could look.
dog_food_category_id = "2975359011"
best_seller_results = get_best_seller_results(dog_food_category_id)
best_seller_df = pd.DataFrame(best_seller_results)
best_seller_df.to_csv("best_seller.csv")
search_results = get_search_results("couch")
search_df = pd.DataFrame(search_results)
search_df.to_csv("search.csv")
camping_deal_url = "https://www.amazon.com/s?i=sporting&rh=n%3A3400371%2Cp_n_deal_type%3A23566064011&s=exact-aware-popularity-rank&pf_rd_i=10805321&pf_rd_m=ATVPDKIKX0DER&pf_rd_p=bf702ff1-4bf6-4c17-ab26-f4867bf293a9&pf_rd_r=ER3N9MGTCESZPZ0KRV8R&pf_rd_s=merchandised-search-3&pf_rd_t=101&ref=s9_acss_bw_cg_SODeals_3e1_w"
deal_results = get_deals_results(camping_deal_url)
deal_df = pd.DataFrame(deal_results)
deal_df.to_csv("deals.csv")
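By default, pandas also writes the DataFrame’s numeric index as the first CSV column. If you only want the scraped fields in the file, you can pass index=False to to_csv:

best_seller_df.to_csv("best_seller.csv", index=False)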
You should have three separate CSV files in your directory after running the code, each containing the scraped titles, prices, and currencies.
Finally, to make our code cleaner, let’s create a parser function called parse_price_results to reuse the result parsing code in each scraping function. The function should accept an argument called results.
def parse_price_results(results):
    return [
        {
            "price": result["price"],
            "title": result["title"],
            "currency": result["currency"],
        }
        for result in results
    ]
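Note that some result entries may be missing one of these fields, for example, listings without a displayed price. If you run into KeyError exceptions, a more defensive variant of the parser could use dict.get() so missing fields come back as None instead:

def parse_price_results(results):
    # .get() returns None for missing keys instead of raising KeyError
    return [
        {
            "price": result.get("price"),
            "title": result.get("title"),
            "currency": result.get("currency"),
        }
        for result in results
    ]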
Here’s the full code utilizing the parse_price_results function.
import requests
import pandas as pd

USERNAME = "USERNAME"
PASSWORD = "PASSWORD"


def parse_price_results(results):
    return [
        {
            "price": result["price"],
            "title": result["title"],
            "currency": result["currency"],
        }
        for result in results
    ]


def get_best_seller_results(category_id):
    payload = {
        "source": "amazon_bestsellers",
        "domain": "com",
        "query": category_id,
        "start_page": 1,
        "parse": True,
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response.raise_for_status()
    results = response.json()["results"][0]["content"]["results"]
    return parse_price_results(results)


def get_search_results(query):
    payload = {
        "source": "amazon_search",
        "domain": "com",
        "query": query,
        "start_page": 1,
        "parse": True,
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response.raise_for_status()
    results = response.json()["results"][0]["content"]["results"]["organic"]
    return parse_price_results(results)


def get_deals_results(url):
    payload = {
        "source": "amazon",
        "url": url,
        "parse": True,
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response.raise_for_status()
    results = response.json()["results"][0]["content"]["results"]["organic"]
    return parse_price_results(results)


dog_food_category_id = "2975359011"
best_seller_results = get_best_seller_results(dog_food_category_id)
best_seller_df = pd.DataFrame(best_seller_results)
best_seller_df.to_csv("best_seller.csv")

search_results = get_search_results("couch")
search_df = pd.DataFrame(search_results)
search_df.to_csv("search.csv")

deal_url = "https://www.amazon.com/s?i=sporting&rh=n%3A3400371%2Cp_n_deal_type%3A23566064011&s=exact-aware-popularity-rank&pf_rd_i=10805321&pf_rd_m=ATVPDKIKX0DER&pf_rd_p=bf702ff1-4bf6-4c17-ab26-f4867bf293a9&pf_rd_r=ER3N9MGTCESZPZ0KRV8R&pf_rd_s=merchandised-search-3&pf_rd_t=101&ref=s9_acss_bw_cg_SODeals_3e1_w"
deal_results = get_deals_results(deal_url)
deal_df = pd.DataFrame(deal_results)
deal_df.to_csv("deals.csv")
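As mentioned at the start, this setup lends itself to scheduled daily price scrapes. Here’s a minimal sketch using only the standard library, assuming a long-running process is acceptable (a system scheduler such as cron is usually a better fit for production):

import time

while True:
    # Re-run one of the scraping functions and overwrite the CSV once a day
    search_df = pd.DataFrame(get_search_results("couch"))
    search_df.to_csv("search.csv")
    time.sleep(86400)  # 24 hours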
In this article, we’ve covered how to scrape Amazon price information with Python and Oxylabs E-Commerce API and learned to use the pandas library to export the price data to CSV files. This method of data retrieval makes it a lot easier to track down the best deals on any Amazon page. The resulting product data can be invaluable if you want to automate price adjustments, identify common price points, anticipate market movements, make informed pricing decisions, and shape your overall pricing strategy. Additionally, scraped Amazon price data can help you analyze competitor pricing strategies and gain a better understanding of price elasticity.
We also have a tutorial for building a custom Amazon price tracker, along with other scraping guides on our blog.
About the author
Maryia Stsiopkina
Senior Content Manager
Maryia Stsiopkina is a Senior Content Manager at Oxylabs. As her passion for writing was developing, she was writing either creepy detective stories or fairy tales at different points in time. Eventually, she found herself in the tech wonderland with numerous hidden corners to explore. At leisure, she does birdwatching with binoculars (some people mistake it for stalking), makes flower jewelry, and eats pickles.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.