Yelyzaveta Nechytailo
If you’re an e-commerce professional, you know that keeping up with the ever-changing prices is a must. However, it would be nearly impossible to do it without an automated solution, especially if we’re talking data at scale.
In today’s article, we’ll demonstrate how to build a scalable price tracker for Best Buy, one of the largest e-commerce websites for electronic devices.
For the tutorial, we’ll be using Python and Oxylabs’ Best Buy API.
You can find the following code examples on our GitHub.
Let’s begin by installing the libraries we will be using throughout the following tutorial.
pip install requests
pip install pandas
pip install matplotlib
We’ll be using requests to call the API, pandas for easier dict management and saving our results, and matplotlib for plotting the pricing history.
Now that we have all the prerequisites installed, we can begin working on the code.
To start off, we need to establish a connection to Oxylabs' E-Commerce Scraper API, which will fetch the data we need from Best Buy.
import requests

USERNAME = "username"
PASSWORD = "password"

# Structure payload.
payload = {
    'source': 'universal_ecommerce',
    'url': "https://www.bestbuy.com/site/samsung-galaxy-z-flip4-128gb-unlocked-graphite/6512618.p?skuId=6512618&intl=nosplash",
    'geo_location': 'United States',
    'parse': True,
}

# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=(USERNAME, PASSWORD),
    json=payload,
)

print(response.json())
Here we have a simple script that sends a request to the E-Commerce Scraper API and prints the result. Note that we don’t have to write our own parser, as the API delivers already structured data when scraping Best Buy.
If we check the response after running the code, we should see our scraping results as structured JSON.
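Before building on the response, it can help to pull out only the fields we care about and to handle a response that doesn't have them. The sketch below assumes the layout that the get_product function later in this tutorial relies on (results[0]["content"] holding "title" and a nested "price" object). The helper name extract_product and the sample values are made up for illustration.

```python
def extract_product(response_json):
    """Pull title, price and currency out of a parsed API response.

    Returns None when the expected fields are missing.
    """
    results = response_json.get("results") or []
    if not results:
        return None

    content = results[0].get("content")
    if not isinstance(content, dict):
        return None

    price = content.get("price")
    if not isinstance(price, dict) or "price" not in price:
        return None

    return {
        "title": content.get("title"),
        "price": price["price"],
        "currency": price.get("currency"),
    }


# Hypothetical response with made-up values, shaped like the fields used in this tutorial.
sample = {
    "results": [
        {
            "content": {
                "title": "Example phone",
                "price": {"price": 499.99, "currency": "USD"},
            }
        }
    ]
}

print(extract_product(sample))
print(extract_product({"results": []}))
```

Returning None instead of raising lets the caller decide whether to skip a product for the day or stop the run.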
Having the connection to the scraper established, we can start building the core logic of our price tracker. A basic specification for a price tracker could be a script that runs once a day to fetch today's price, adds it to the historical data we already have and saves it somewhere. Let’s do just that.
First, let’s create a function that would read the historical price tracker data.
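import os

import pandas as pd


def read_past_data(filepath):
    results = {}

    # Create the data file on the first run so later steps can rely on it.
    if not os.path.isfile(filepath):
        open(filepath, 'a').close()

    # Only parse the file if it has content; an empty file means no history yet.
    if os.stat(filepath).st_size != 0:
        results_df = pd.read_json(filepath, convert_axes=False)
        results = results_df.to_dict()

    return results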
The function takes the file path to our historical data file as an argument and returns the data it reads as a Python dictionary. It also handles two edge cases:
If there is no data file, one should be created;
If the data file is empty, we should return an empty dictionary.
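To make the two edge cases concrete, here is a minimal sketch that swaps pandas for the standard library's json module. It is not the function used in this tutorial: the name read_past_data_json is hypothetical, the stored values are made up, and unlike read_past_data it does not create a missing file.

```python
import json
import os
import tempfile


def read_past_data_json(filepath):
    """Return stored price history as a dict, or {} if there is none yet."""
    # A missing file and a zero-byte file both mean "no history recorded so far".
    if not os.path.isfile(filepath) or os.path.getsize(filepath) == 0:
        return {}
    with open(filepath, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")

    # Case 1: the file does not exist yet.
    print(read_past_data_json(path))

    # Case 2: the file exists but is empty.
    open(path, "a").close()
    print(read_past_data_json(path))

    # Case 3: the file holds history (made-up values).
    history = {"Example phone": {"01 January, 2024": {"price": 10.0, "currency": "USD"}}}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(history, f)
    print(read_past_data_json(path))
```

The nested layout (product title, then date, then price and currency) matches the dictionary the rest of the tutorial works with.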
Now that we have the historical price data loaded, we can think about a function that would take the past price tracker data and add today’s price to it.
from datetime import date


def add_todays_prices(results, tracked_product_links):
    today = date.today()

    for link in tracked_product_links:
        # get_product() is defined in the full listing further down.
        product = get_product(link)

        # First time we see this product: start an empty history for it.
        if product["title"] not in results:
            results[product["title"]] = {}

        results[product["title"]][today.strftime("%d %B, %Y")] = {
            "price": product["price"],
            "currency": product["currency"],
        }

    return results
This function takes the past Best Buy price tracking results and a list of product page URLs as arguments, adds today’s price for each of the provided products to the existing history, and returns the updated results.
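Because add_todays_prices calls the API directly, it is hard to try out without credentials. One possible variation, sketched here as an assumption rather than as part of the tutorial's code, passes the fetching function and the date in as arguments so the update logic can be exercised offline. The names add_prices_for_day and fake_fetch, and all values, are hypothetical.

```python
from datetime import date


def add_prices_for_day(results, tracked_product_links, fetch_product, today=None):
    """Add one price entry per tracked link under the given day's key."""
    today = today or date.today()
    key = today.strftime("%d %B, %Y")

    for link in tracked_product_links:
        product = fetch_product(link)
        # setdefault creates the product's history on first sight.
        results.setdefault(product["title"], {})[key] = {
            "price": product["price"],
            "currency": product["currency"],
        }

    return results


def fake_fetch(link):
    # Stand-in for get_product; returns made-up data instead of calling the API.
    return {"title": f"Product at {link}", "price": 100.0, "currency": "USD"}


history = add_prices_for_day(
    {}, ["https://example.com/a"], fake_fetch, today=date(2024, 1, 2)
)
print(history)
```

In the real tracker, get_product would be passed as fetch_product and today would be left at its default.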
Having the prices updated for today, we can move on to saving our results back to the file we started from, thus finishing our process loop.
def save_results(results, filepath):
    # Products become columns and dates become rows in the saved JSON.
    df = pd.DataFrame.from_dict(results)
    df.to_json(filepath)
Finally, we can move the connection to the Scraper API to a separate function and combine all we have done so far:
import os
from datetime import date

import pandas as pd
import requests


def get_product(link):
    USERNAME = "username"
    PASSWORD = "password"

    # Structure payload.
    payload = {
        'source': 'universal_ecommerce',
        'url': link,
        'geo_location': 'United States',
        'parse': True,
    }

    # Get response.
    response = requests.request(
        'POST',
        'https://realtime.oxylabs.io/v1/queries',
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response_json = response.json()

    # Keep only the fields the tracker needs.
    content = response_json["results"][0]["content"]
    product = {
        "title": content["title"],
        "price": content["price"]["price"],
        "currency": content["price"]["currency"]
    }
    return product


def read_past_data(filepath):
    results = {}

    if not os.path.isfile(filepath):
        open(filepath, 'a').close()

    if os.stat(filepath).st_size != 0:
        results_df = pd.read_json(filepath, convert_axes=False)
        results = results_df.to_dict()

    return results


def save_results(results, filepath):
    df = pd.DataFrame.from_dict(results)
    df.to_json(filepath)


def add_todays_prices(results, tracked_product_links):
    today = date.today()

    for link in tracked_product_links:
        product = get_product(link)

        if product["title"] not in results:
            results[product["title"]] = {}

        results[product["title"]][today.strftime("%d %B, %Y")] = {
            "price": product["price"],
            "currency": product["currency"],
        }

    return results


def main():
    results_file = "data.json"

    tracked_product_links = [
        "https://www.bestbuy.com/site/samsung-galaxy-z-flip4-128gb-unlocked-graphite/6512618.p?skuId=6512618&intl=nosplash",
        "https://www.bestbuy.com/site/samsung-galaxy-z-flip5-256gb-unlocked-graphite/6548838.p?skuId=6548838"
    ]

    past_results = read_past_data(results_file)
    updated_results = add_todays_prices(past_results, tracked_product_links)
    save_results(updated_results, results_file)


if __name__ == "__main__":
    main()
We coordinate all the logic of the application in the main() function. The results_file variable holds the path to the historical price tracker data, and tracked_product_links lists all the Best Buy product links we want to track. Our application then reads the past data from the file, fetches the new prices, and saves the results back to the file.
After we run the code, we can inspect our Best Buy product prices in the specified results file.
Having the core functionality already done, we can start adding a few useful features to our price tracker, like plotting the Best Buy product price changes over time.
We can do this with the matplotlib Python library that we installed earlier.
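import matplotlib.pyplot as plt


def plot_history_chart(results):
    for product in results:
        dates = []
        prices = []

        for entry_date in results[product]:
            entry = results[product][entry_date]
            # Skip dates with no entry for this product (e.g. it was added to tracking later).
            if not isinstance(entry, dict):
                continue
            dates.append(entry_date)
            prices.append(entry["price"])

        # One line per product, labeled with the product title.
        plt.plot(dates, prices, label=product)

    plt.xlabel("Date")
    plt.ylabel("Price")
    plt.title("Product prices over time")
    plt.legend()
    plt.show()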
The function above plots the price changes of multiple products over time in a single diagram and then shows it. When we add a call to the plot_history_chart function to our existing main and run our code again, the chart is displayed.
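plot_history_chart passes the date strings to matplotlib in the order they are stored. A possible refinement, sketched here under our own assumptions, is to parse the strings into real dates, sort them, and skip days without a price before plotting. The helper name build_series is hypothetical and the sample values are made up; the date format is the one used by add_todays_prices.

```python
from datetime import datetime

DATE_FORMAT = "%d %B, %Y"  # same format add_todays_prices uses for its keys


def build_series(history):
    """Return chronologically sorted (dates, prices) lists for one product."""
    points = []
    for day, entry in history.items():
        # Skip gaps: entries that are missing or carry no price.
        if not isinstance(entry, dict) or entry.get("price") is None:
            continue
        points.append((datetime.strptime(day, DATE_FORMAT).date(), entry["price"]))

    points.sort(key=lambda point: point[0])
    dates = [day for day, _ in points]
    prices = [price for _, price in points]
    return dates, prices


# Made-up history, deliberately out of order and with one gap.
sample_history = {
    "03 January, 2024": {"price": 480.0, "currency": "USD"},
    "01 January, 2024": {"price": 500.0, "currency": "USD"},
    "02 January, 2024": None,
}

dates, prices = build_series(sample_history)
print(dates)
print(prices)
```

The two returned lists can be handed to plt.plot(dates, prices, label=product) in place of the raw strings, which lets matplotlib space the points by actual time.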
Another useful feature is price drop alerts. They direct our attention to a specific product, which is especially helpful when tracking many products at the same time.
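from datetime import date, timedelta


def check_for_pricedrop(results):
    today = date.today()
    today_key = today.strftime("%d %B, %Y")
    yesterday_key = (today - timedelta(days=1)).strftime("%d %B, %Y")
    for product in results:
        todays_entry = results[product].get(today_key)
        yesterdays_entry = results[product].get(yesterday_key)
        # Without both entries (e.g. on the first run) there is nothing to compare.
        if not isinstance(todays_entry, dict) or not isinstance(yesterdays_entry, dict):
            continue
        change = todays_entry["price"] - yesterdays_entry["price"]
        if change < 0:
            print(f'Price for {product} has dropped by {abs(change)}!')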
Here, we have created a function that compares yesterday's price entry with today's and reports the product if the price went down. When we add a call to the check_for_pricedrop function to our existing main and run our code again, any price drops are printed to the command line.
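Comparing strictly against yesterday assumes the script ran on both days. If a run is missed, one alternative is to compare the two most recent entries that actually exist. The following is a sketch of that idea under our own assumptions, not the tutorial's implementation; latest_price_change and find_price_drops are hypothetical names and the prices are made up.

```python
from datetime import datetime

DATE_FORMAT = "%d %B, %Y"  # same format add_todays_prices uses for its keys


def latest_price_change(history):
    """Return latest price minus the previous recorded price, or None."""
    dated = sorted(
        (datetime.strptime(day, DATE_FORMAT), entry["price"])
        for day, entry in history.items()
        if isinstance(entry, dict) and entry.get("price") is not None
    )
    # At least two recorded prices are needed for a comparison.
    if len(dated) < 2:
        return None
    return dated[-1][1] - dated[-2][1]


def find_price_drops(results):
    """Map each product whose latest price fell to the size of the drop."""
    drops = {}
    for product, history in results.items():
        change = latest_price_change(history)
        if change is not None and change < 0:
            drops[product] = -change
    return drops


# Made-up data: phone A dropped, phone B rose, phone C has a single entry.
sample_results = {
    "Phone A": {
        "01 January, 2024": {"price": 500.0, "currency": "USD"},
        "04 January, 2024": {"price": 450.0, "currency": "USD"},
    },
    "Phone B": {
        "01 January, 2024": {"price": 300.0, "currency": "USD"},
        "02 January, 2024": {"price": 320.0, "currency": "USD"},
    },
    "Phone C": {
        "02 January, 2024": {"price": 200.0, "currency": "USD"},
    },
}

for product, amount in find_price_drops(sample_results).items():
    print(f"Price for {product} has dropped by {amount}!")
```

Returning the drops as a dictionary instead of printing inside the loop keeps the detection separate from how the alert is delivered.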
Putting together everything we have done so far, our code looks like this:
import os
from datetime import date, timedelta

import matplotlib.pyplot as plt
import pandas as pd
import requests


def get_product(link):
    USERNAME = "username"
    PASSWORD = "password"

    # Structure payload.
    payload = {
        'source': 'universal_ecommerce',
        'url': link,
        'geo_location': 'United States',
        'parse': True,
    }

    # Get response.
    response = requests.request(
        'POST',
        'https://realtime.oxylabs.io/v1/queries',
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response_json = response.json()

    # Keep only the fields the tracker needs.
    content = response_json["results"][0]["content"]
    product = {
        "title": content["title"],
        "price": content["price"]["price"],
        "currency": content["price"]["currency"]
    }
    return product


def read_past_data(filepath):
    results = {}

    if not os.path.isfile(filepath):
        open(filepath, 'a').close()

    if os.stat(filepath).st_size != 0:
        results_df = pd.read_json(filepath, convert_axes=False)
        results = results_df.to_dict()

    return results


def save_results(results, filepath):
    df = pd.DataFrame.from_dict(results)
    df.to_json(filepath)


def add_todays_prices(results, tracked_product_links):
    today = date.today()

    for link in tracked_product_links:
        product = get_product(link)

        if product["title"] not in results:
            results[product["title"]] = {}

        results[product["title"]][today.strftime("%d %B, %Y")] = {
            "price": product["price"],
            "currency": product["currency"],
        }

    return results


def plot_history_chart(results):
    for product in results:
        dates = []
        prices = []

        for entry_date in results[product]:
            entry = results[product][entry_date]
            # Skip dates with no entry for this product (e.g. it was added to tracking later).
            if not isinstance(entry, dict):
                continue
            dates.append(entry_date)
            prices.append(entry["price"])

        plt.plot(dates, prices, label=product)

    plt.xlabel("Date")
    plt.ylabel("Price")
    plt.title("Product prices over time")
    plt.legend()
    plt.show()


def check_for_pricedrop(results):
    today = date.today()
    today_key = today.strftime("%d %B, %Y")
    yesterday_key = (today - timedelta(days=1)).strftime("%d %B, %Y")

    for product in results:
        todays_entry = results[product].get(today_key)
        yesterdays_entry = results[product].get(yesterday_key)

        # Without both entries (e.g. on the first run) there is nothing to compare.
        if not isinstance(todays_entry, dict) or not isinstance(yesterdays_entry, dict):
            continue

        change = todays_entry["price"] - yesterdays_entry["price"]
        if change < 0:
            print(f'Price for {product} has dropped by {abs(change)}!')


def main():
    results_file = "data.json"

    tracked_product_links = [
        "https://www.bestbuy.com/site/samsung-galaxy-z-flip4-128gb-unlocked-graphite/6512618.p?skuId=6512618&intl=nosplash",
        "https://www.bestbuy.com/site/samsung-galaxy-z-flip5-256gb-unlocked-graphite/6548838.p?skuId=6548838"
    ]

    past_results = read_past_data(results_file)
    updated_results = add_todays_prices(past_results, tracked_product_links)

    # Save first, so today's prices are kept even if a later step fails.
    save_results(updated_results, results_file)

    check_for_pricedrop(updated_results)
    plot_history_chart(updated_results)


if __name__ == "__main__":
    main()
Having the code already laid out, we can see that while the core application is quite simple, it can be easily extended to accommodate more scale and complexity as time goes on:
Alerting could have an improved price change tracking algorithm and have the notifications be sent to some external channel like Telegram.
Plotting could be extended to save the resulting diagrams to a file or load them up on some external webpage to be viewed.
Result saving could be remade to use a database instead of saving to a file.
…and so on.
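As one illustration of the database idea, here is a minimal sketch that uses SQLite from the Python standard library. The table layout, the function names, and the choice of ISO-formatted dates are our own assumptions for the example, not something the tracker above prescribes, and the prices are made up.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a price store; ':memory:' keeps it in RAM for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS prices (
            product  TEXT NOT NULL,
            day      TEXT NOT NULL,  -- ISO date such as 2024-01-02, sorts chronologically
            price    REAL NOT NULL,
            currency TEXT NOT NULL,
            PRIMARY KEY (product, day)
        )
        """
    )
    return conn


def save_price(conn, product, day, price, currency):
    # INSERT OR REPLACE keeps one row per product per day, so reruns are safe.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO prices VALUES (?, ?, ?, ?)",
            (product, day, price, currency),
        )


def price_history(conn, product):
    """Return (day, price) rows for one product, oldest first."""
    rows = conn.execute(
        "SELECT day, price FROM prices WHERE product = ? ORDER BY day",
        (product,),
    )
    return rows.fetchall()


conn = open_store()
save_price(conn, "Example phone", "2024-01-01", 500.0, "USD")
save_price(conn, "Example phone", "2024-01-02", 480.0, "USD")
# Running the tracker twice on the same day overwrites that day's row.
save_price(conn, "Example phone", "2024-01-02", 475.0, "USD")
print(price_history(conn, "Example phone"))
```

Storing one row per product per day avoids rewriting the whole history on every run, which is what the JSON file approach does.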
By combining Python and Oxylabs’ Best Buy Scraper API, we were able to build a Best Buy price tracker that is easy to scale. Our tracker can report price drops, gather historical price data, and plot a diagram of the price changes.
If you have any questions about this or any other topic related to web scraping, don't hesitate to reach out to us at hello@oxylabs.io or through the live chat. Our professional team is always ready to assist you!