Roberta Aukstikalnyte
Whether you’re a buyer or a seller in the e-commerce world, you know how crucial it is to stay updated on price changes. If you’re interested in tracking pricing data specifically on Wayfair, today’s article will demonstrate how to do just that. By the end of this tutorial, you'll have a scalable Wayfair price tracker that not only delivers the data but also sends price change alerts and generates price fluctuation diagrams.
Let’s get started!
For this tutorial, we’re going to use Python and Oxylabs’ Wayfair Scraper API.
First, let’s install the libraries we’ll be using throughout the tutorial.
pip install requests
pip install pandas
pip install matplotlib
pip install beautifulsoup4
We'll use requests to make HTTP calls to the API. To parse the HTML of the scraped page, we’ll use beautifulsoup4. Meanwhile, pandas will help with dictionary management and saving the results. Finally, matplotlib will be used for plotting price histories.
Now that we have all the prerequisites installed, we can start writing the code. Let’s first establish our connection with the scraper API.
import requests

USERNAME = "username"
PASSWORD = "password"

# Structure payload.
payload = {
    "source": "universal_ecommerce",
    "url": "https://www.wayfair.com/appliances/pdp/unique-appliances-classic-retro-30-frost-free-177-cu-ft-energy-star-certified-bottom-freezer-refrigerator-unqe1173.html",
    "user_agent_type": "desktop",
    "render": "html",
    "browser_instructions": [
        {
            "type": "wait_for_element",
            "selector": {
                "type": "css",
                "value": "[data-enzyme-id='PriceBlock']"
            },
            "timeout_s": 30
        }
    ]
}

# Get response.
response = requests.request(
    "POST",
    "https://data.oxylabs.io/v1/queries",
    auth=(USERNAME, PASSWORD),
    json=payload,
)

print(response.json())
Here, we have a simple script that sets up a request to the Wayfair Scraper API and creates a scraping job. Note that we don’t get the result instantly – rather, we get information about the job that we’ve just created.
Note the "render": "html" parameter in our payload – at the moment of writing, Wayfair pages don’t load prices without JavaScript, so we need the API to render the page. To learn more about the parameters, please refer to the documentation for Web Scraper API.
If we check the response after running the code, we should see the job information. Trimmed down to the fields our code relies on, it should look something like this (the id value is just an illustration):
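{
    "id": "12345678900987654321",
    "status": "pending"
}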
For the next step, we need to add logic that waits for the job to finish and then fetches the results.
import time

# Get response.
response = requests.request(
    "POST",
    "https://data.oxylabs.io/v1/queries",
    auth=(USERNAME, PASSWORD),
    json=payload,
)
print(response.json())

response_json = response.json()
job_id = response_json["id"]

status = ""
# Wait until the job is done
while status != "done":
    time.sleep(5)
    response = requests.request(
        "GET",
        f"https://data.oxylabs.io/v1/queries/{job_id}",
        auth=(USERNAME, PASSWORD),
    )
    response_json = response.json()
    status = response_json.get("status")
    print(f"Job {job_id} status is {status}")

# Fetch the job results
response = requests.request(
    "GET",
    f"https://data.oxylabs.io/v1/queries/{job_id}/results",
    auth=(USERNAME, PASSWORD),
)
response_json = response.json()
print(response_json)
In the code above, we create a while loop that keeps polling the API for updates on the job status until it’s done. Then, we fetch the job results.
With the connection to the scraper established, we can start building the core logic of our price tracking tool. To summarize, our tracker is going to be a script that runs once a day, fetches today's prices, adds them to the Wayfair price history data we already have, and then saves the updated results.
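How you run the script daily is up to you – as a minimal sketch, on Linux or macOS you could add a crontab entry along these lines (the interpreter and script paths here are hypothetical):

# Run the price tracker every day at 9 AM
0 9 * * * /usr/bin/python3 /home/user/wayfair_tracker.py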
First, let’s create a function that reads the historical price tracker data.
def read_past_data(filepath):
    results = {}

    if not os.path.isfile(filepath):
        open(filepath, "a").close()

    if not os.stat(filepath).st_size == 0:
        results_df = pd.read_json(filepath, convert_axes=False)
        results = results_df.to_dict()
        return results

    return results
The function takes the file path to our historical data file as an argument and returns the read data as a Python dictionary. Also, it features a few logical considerations:
If there is no data file, one should be created,
If the data file is empty, we should return an empty dictionary.
Now that we have the historical price data loaded, we can come up with a function that adds today’s price to the past price tracker data.
def add_todays_prices(results, tracked_product_links):
    today = date.today()

    for link in tracked_product_links:
        product = get_product(link)
        if product["title"] not in results:
            results[product["title"]] = {}
        results[product["title"]][today.strftime("%d %B, %Y")] = {
            "price": product["price"],
        }

    return results
This function takes past Wayfair price tracking results and a list of product page URLs as arguments. Afterwards, it adds today’s prices for the provided products to the existing Wayfair price data and returns the updated results.
With the prices updated for today, we can move on to saving our results back to the file we started from, thus finishing our process loop.
def save_results(results, filepath):
    df = pd.DataFrame.from_dict(results)
    df.to_json(filepath)
    return
At last, we can move the connection to the Scraper API into a separate function and combine all we have done so far. While we're at it, let's convert the scraped price text into a float, so that prices can be compared and plotted numerically later on. Note that in the username and password areas, you'll need to insert your own credentials.
import os
import time
from datetime import date

import pandas as pd
import requests
from bs4 import BeautifulSoup


def get_product(link):
    USERNAME = "username"
    PASSWORD = "password"

    # Structure payload.
    payload = {
        "source": "universal_ecommerce",
        "url": link,
        "user_agent_type": "desktop",
        "render": "html",
        "browser_instructions": [
            {
                "type": "wait_for_element",
                "selector": {
                    "type": "css",
                    "value": "[data-enzyme-id='PriceBlock']"
                },
                "timeout_s": 30
            }
        ]
    }

    # Post the scraping job
    response = requests.request(
        "POST",
        "https://data.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )

    response_json = response.json()
    job_id = response_json["id"]

    status = ""
    # Wait until the job is done
    while status != "done":
        time.sleep(5)
        response = requests.request(
            "GET",
            f"https://data.oxylabs.io/v1/queries/{job_id}",
            auth=(USERNAME, PASSWORD),
        )
        response_json = response.json()
        status = response_json.get("status")
        print(f"Job {job_id} status is {status}")

    # Fetch the job results
    response = requests.request(
        "GET",
        f"https://data.oxylabs.io/v1/queries/{job_id}/results",
        auth=(USERNAME, PASSWORD),
    )
    response_json = response.json()
    content = response_json["results"][0]["content"]

    soup = BeautifulSoup(content, "html.parser")
    title = soup.select_one("header h1").text
    try:
        price = soup.select_one(".SFPrice div span:first-of-type").text
    except AttributeError:
        try:
            price = soup.select_one("div [data-enzyme-id='PriceBlock'] span").text
        except AttributeError as err:
            price = None
            print(err)

    if price is not None:
        # Turn the scraped price text (e.g., "$1,599.99") into a float
        # so prices can be compared and plotted numerically.
        price = float(price.replace("$", "").replace(",", "").strip())

    product = {
        "title": title,
        "price": price,
    }
    return product


def read_past_data(filepath):
    results = {}

    if not os.path.isfile(filepath):
        open(filepath, "a").close()

    if not os.stat(filepath).st_size == 0:
        results_df = pd.read_json(filepath, convert_axes=False)
        results = results_df.to_dict()
        return results

    return results


def save_results(results, filepath):
    df = pd.DataFrame.from_dict(results)
    df.to_json(filepath)
    return


def add_todays_prices(results, tracked_product_links):
    today = date.today()

    for link in tracked_product_links:
        product = get_product(link)
        if product["title"] not in results:
            results[product["title"]] = {}
        results[product["title"]][today.strftime("%d %B, %Y")] = {
            "price": product["price"],
        }

    return results


def main():
    results_file = "data.json"
    tracked_product_links = [
        "https://www.wayfair.com/appliances/pdp/unique-appliances-classic-retro-30-frost-free-177-cu-ft-energy-star-certified-bottom-freezer-refrigerator-unqe1173.html",
        "https://www.wayfair.com/appliances/pdp/samsung-bespoke-30-cu-ft-3-door-refrigerator-with-beverage-center-and-custom-panels-included-smsg1754.html"
    ]

    past_results = read_past_data(results_file)
    updated_results = add_todays_prices(past_results, tracked_product_links)
    save_results(updated_results, results_file)


if __name__ == "__main__":
    main()
Here, we coordinate all the logic of the application in the main() function. The results_file variable holds the path to the historical price tracking data, while tracked_product_links contains all the Wayfair product links we want to track. Our application then reads past data from the file, fetches new prices, and saves the results back to the file.
After we run the code, we can examine our Wayfair product prices in the specified results file:
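The exact contents depend on the products and dates you track, but with the float conversion above, data.json will follow this shape (the product title and price below are illustrative):

{
    "Classic Retro Bottom Freezer Refrigerator": {
        "12 June, 2024": {
            "price": 1599.99
        }
    }
}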
With the core functionality successfully established, we can start adding a few useful features to our Wayfair price tracking system, e.g., plotting the Wayfair product price changes over time.
We can do this by utilizing the matplotlib Python library that we installed earlier:
def plot_history_chart(results):
    for product in results:
        dates = []
        prices = []
        for entry_date in results[product]:
            dates.append(entry_date)
            prices.append(results[product][entry_date]["price"])
        plt.plot(dates, prices, label=product)

    plt.xlabel("Date")
    plt.ylabel("Price")
    plt.title("Product prices over time")
    plt.legend()
    plt.show()
The function above plots the price changes of multiple products over time on a single diagram and then displays it. When we add a call to the plot_history_chart function to our existing main() and run the code again, it’ll show how the prices fluctuate over time.
Another useful functionality could be receiving price drop alerts. These are especially helpful when you’re tracking multiple product prices simultaneously.
def check_for_pricedrop(results):
    for product in results:
        today = date.today()
        yesterday = today - timedelta(days=1)
        if yesterday.strftime("%d %B, %Y") in results[product]:
            change = (
                results[product][today.strftime("%d %B, %Y")]["price"]
                - results[product][yesterday.strftime("%d %B, %Y")]["price"]
            )
            if change < 0:
                print(f"Price for {product} has dropped by {change}!")
Here, we’ve created a function that checks the price change between yesterday’s and today’s price entries and reports if the change was negative. When we add a call to the check_for_pricedrop function to our existing main() and run the code again, we’ll see any alerts in the command line:
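If one of the tracked products got cheaper overnight, the output would look along these lines (the title and amount are illustrative):

Price for Classic Retro Bottom Freezer Refrigerator has dropped by -100.0!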
Here’s what our code looks like all compiled together:
import os
import time
from datetime import date, timedelta

import matplotlib.pyplot as plt
import pandas as pd
import requests
from bs4 import BeautifulSoup


def get_product(link):
    USERNAME = "username"
    PASSWORD = "password"

    # Structure payload.
    payload = {
        "source": "universal_ecommerce",
        "url": link,
        "user_agent_type": "desktop",
        "render": "html",
        "browser_instructions": [
            {
                "type": "wait_for_element",
                "selector": {
                    "type": "css",
                    "value": "[data-enzyme-id='PriceBlock']"
                },
                "timeout_s": 30
            }
        ]
    }

    # Post the scraping job
    response = requests.request(
        "POST",
        "https://data.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )

    response_json = response.json()
    job_id = response_json["id"]

    status = ""
    # Wait until the job is done
    while status != "done":
        time.sleep(5)
        response = requests.request(
            "GET",
            f"https://data.oxylabs.io/v1/queries/{job_id}",
            auth=(USERNAME, PASSWORD),
        )
        response_json = response.json()
        status = response_json.get("status")
        print(f"Job {job_id} status is {status}")

    # Fetch the job results
    response = requests.request(
        "GET",
        f"https://data.oxylabs.io/v1/queries/{job_id}/results",
        auth=(USERNAME, PASSWORD),
    )
    response_json = response.json()
    content = response_json["results"][0]["content"]

    soup = BeautifulSoup(content, "html.parser")
    title = soup.select_one("header h1").text
    try:
        price = soup.select_one(".SFPrice div span:first-of-type").text
    except AttributeError:
        try:
            price = soup.select_one("div [data-enzyme-id='PriceBlock'] span").text
        except AttributeError as err:
            price = None
            print(err)

    if price is not None:
        # Turn the scraped price text (e.g., "$1,599.99") into a float
        # so prices can be compared and plotted numerically.
        price = float(price.replace("$", "").replace(",", "").strip())

    product = {
        "title": title,
        "price": price,
    }
    return product


def read_past_data(filepath):
    results = {}

    if not os.path.isfile(filepath):
        open(filepath, "a").close()

    if not os.stat(filepath).st_size == 0:
        results_df = pd.read_json(filepath, convert_axes=False)
        results = results_df.to_dict()
        return results

    return results


def save_results(results, filepath):
    df = pd.DataFrame.from_dict(results)
    df.to_json(filepath)
    return


def add_todays_prices(results, tracked_product_links):
    today = date.today()

    for link in tracked_product_links:
        product = get_product(link)
        if product["title"] not in results:
            results[product["title"]] = {}
        results[product["title"]][today.strftime("%d %B, %Y")] = {
            "price": product["price"],
        }

    return results


def plot_history_chart(results):
    for product in results:
        dates = []
        prices = []
        for entry_date in results[product]:
            dates.append(entry_date)
            prices.append(results[product][entry_date]["price"])
        plt.plot(dates, prices, label=product)

    plt.xlabel("Date")
    plt.ylabel("Price")
    plt.title("Product prices over time")
    plt.legend()
    plt.show()


def check_for_pricedrop(results):
    for product in results:
        today = date.today()
        yesterday = today - timedelta(days=1)
        if yesterday.strftime("%d %B, %Y") in results[product]:
            change = (
                results[product][today.strftime("%d %B, %Y")]["price"]
                - results[product][yesterday.strftime("%d %B, %Y")]["price"]
            )
            if change < 0:
                print(f"Price for {product} has dropped by {change}!")


def main():
    results_file = "data.json"
    tracked_product_links = [
        "https://www.wayfair.com/appliances/pdp/unique-appliances-classic-retro-30-frost-free-177-cu-ft-energy-star-certified-bottom-freezer-refrigerator-unqe1173.html",
        "https://www.wayfair.com/appliances/pdp/samsung-bespoke-30-cu-ft-3-door-refrigerator-with-beverage-center-and-custom-panels-included-smsg1754.html"
    ]

    past_results = read_past_data(results_file)
    updated_results = add_todays_prices(past_results, tracked_product_links)
    plot_history_chart(updated_results)
    check_for_pricedrop(updated_results)
    save_results(updated_results, results_file)


if __name__ == "__main__":
    main()
With the code laid out, we can see that while the core application is quite simple, it can easily be extended to accommodate more scale and complexity as time goes on.
Here are a few ideas:
Alerting could use a more sophisticated price change tracking algorithm, with notifications sent to an external channel like Telegram (see the sketch after this list).
Plotting could be extended to save the resulting diagrams to a file or serve them on an external webpage.
Result saving could be reworked to use a database instead of a file.
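As a minimal sketch of the alerting idea, a notification could be pushed through the Telegram Bot API instead of printed – the token and chat ID below are placeholders you'd obtain from @BotFather and your own chat:

import requests

TELEGRAM_TOKEN = "your-bot-token"  # placeholder: create a bot via @BotFather
CHAT_ID = "your-chat-id"  # placeholder: the chat that should receive alerts

def send_price_alert(message):
    # The Telegram Bot API exposes a sendMessage endpoint for this.
    requests.post(
        f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage",
        json={"chat_id": CHAT_ID, "text": message},
    )

You could then call send_price_alert() in place of the print() inside check_for_pricedrop().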
In today’s article, we successfully built a Wayfair price tracker that collects data, sends price change alerts, and visually presents price fluctuations.
Hopefully, you found this tutorial helpful and easy to follow. For more similar tutorials, check out our blog post on scraping Wayfair product data. If you have any questions or feedback regarding this article, please feel free to drop us a line at support@oxylabs.io, and our professionals will get back to you within a day.
To track prices on Wayfair, you’ll need an automated, scalable web scraping solution. For example, the Wayfair Scraper API helps collect prices and any other type of public data from Wayfair: search results, product information, and more. You can then use this data for such use cases as competitor intelligence or e-commerce MAP monitoring.
Wayfair's pricing algorithm continuously adapts product listings in response to a range of factors, such as seasonal promotions, competitor pricing, and stock availability. This is a common practice in the e-commerce industry; its purpose is to balance keeping prices competitive with maintaining healthy profit margins.
To be able to keep up with constant price changes, you’ll need an automated solution that gathers the data and sends you alerts to inform you about price drops or increases. With a web scraping tool like Oxylabs’ E-Commerce Scraper API (now part of a Web Scraper API) and basic Python knowledge, you should be able to successfully set up a price tracking system.
About the author
Roberta Aukstikalnyte
Senior Content Manager
Roberta Aukstikalnyte is a Senior Content Manager at Oxylabs. Having worked various jobs in the tech industry, she especially enjoys finding ways to express complex ideas in simple ways through content. In her free time, Roberta unwinds by reading Ottessa Moshfegh's novels, going to boxing classes, and playing around with makeup.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.