If you come from the US, Target probably needs no introduction. But just in case, it’s one of the biggest in-store and online retailers, specializing in everything from groceries to electronic devices. Given how large it is, it’s only natural that Target possesses a lot of valuable pricing data and price changes for e-commerce professionals.
At the same time, there can be thousands of such price changes, which makes it impossible for e-commerce professionals to keep up without an automated solution. Because of this, in today’s article, we’re going to show how to build a dedicated Target price tracker. Our Target price scraper will monitor the lowest prices, items on sale, and price drops, send you price change alerts, generate pricing diagrams, and deliver historical data.
Let’s begin by installing the libraries we will be using throughout the following tutorial.
pip install pandas
pip install matplotlib
pip install requests
We will use pandas for easier dict management and saving of results, matplotlib for plotting price histories, and requests for sending HTTP requests to the API.
As we have all the prerequisites installed, we can start working on the code. To start off, we need to connect to the Oxylabs E-Commerce Scraper API, which will help us to fetch the data we need from Target.
import requests

USERNAME = "username"
PASSWORD = "password"

url = "https://www.target.com/p/playstation-5-console-marvel-39-s-spider-man-2-bundle/-/A-89981366"

# Structure payload.
payload = {
    'source': 'universal_ecommerce',
    'url': url,
    'geo_location': 'United States',
    'render': 'html',
    'parse': True,
}

# Submit the scraping job.
response = requests.request(
    'POST',
    'https://data.oxylabs.io/v1/queries',
    auth=(USERNAME, PASSWORD),
    json=payload,
)
print(response.json())
Here, we have some code that sets up a request to the E-Commerce Scraper API and creates a scraping job.
We will use the universal_ecommerce source to extract the information we need. We also set geo_location to the country from which the scraping should be performed. The API already provides us with structured data from the page, but if we need something additional, we can get that information using the Custom Parser functionality. You can also read more about all the available parameters in the E-Commerce Scraper API documentation for Target.
If we check the response after running the code, we should see the job information:
The next step would be to create some logic that would wait for our job to finish and then fetch the results.
import time

# Get response.
response = requests.request(
    'POST',
    'https://data.oxylabs.io/v1/queries',
    auth=(USERNAME, PASSWORD),
    json=payload,
)
print(response.json())

response_json = response.json()
job_id = response_json["id"]
status = ""

# Poll until the job is done
while status != "done":
    time.sleep(5)
    response = requests.request(
        'GET',
        f"https://data.oxylabs.io/v1/queries/{job_id}",
        auth=(USERNAME, PASSWORD),
    )
    response_json = response.json()
    status = response_json.get("status")
    print(f"Job status is {status}")

# Fetch the job results
response = requests.request(
    'GET',
    f"https://data.oxylabs.io/v1/queries/{job_id}/results",
    auth=(USERNAME, PASSWORD),
)
response_json = response.json()
print(response_json)
In the code above, we create a while loop that keeps polling the API for updates on the job status until it’s done, and then we fetch the job results.
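The loop above runs until the status becomes "done", so it would spin forever if a job never completes. Below is a minimal sketch of a more defensive variant. The helper name `wait_for_job`, the `timeout` value, and the "faulted" failure status are assumptions made for illustration; check the API documentation for the exact status values before relying on them.

```python
import time


def wait_for_job(fetch_status, poll_interval=5.0, timeout=300.0,
                 failed_statuses=("faulted",), sleep=time.sleep):
    """Call fetch_status() repeatedly until it returns "done".

    fetch_status is any zero-argument callable that returns the job's
    status string, e.g. a small wrapper around the GET request above.
    failed_statuses is an assumption; adjust it to the statuses the API
    actually reports.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status == "done":
            return status
        if status in failed_statuses:
            raise RuntimeError(f"Scraping job ended with status '{status}'")
        if waited >= timeout:
            raise TimeoutError(f"Job still '{status}' after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
```

Passing the status lookup in as a callable keeps the waiting logic separate from the HTTP call, which also makes it possible to try out without touching the network.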
Having the connection to the Scraper API established, we can start building the core logic of our Target price tracker. The basic requirements for a price scraper could be a script that runs once a day to fetch today's price, then adds it to the historical data we already have and saves it. So, let’s start with that.
We will begin by creating a function that would read the historical data about past Target prices we could have already gathered.
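import os
import pandas as pd

def read_past_data(filepath):
    results = {}

    # Create an empty data file if none exists yet.
    if not os.path.isfile(filepath):
        open(filepath, 'a').close()

    # Only parse the file if it actually contains data.
    if not os.stat(filepath).st_size == 0:
        results_df = pd.read_json(filepath, convert_axes=False)
        results = results_df.to_dict()
        return results

    return results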
The function takes the file path to our historical data file as an argument and returns the read data as a Python dictionary. It also has a few logical considerations:
If there is no data file, one should be created.
If the data file is empty, we should return an empty dictionary.
Now that we have the historical price data loaded, we can think about a function that would take the past price scraper data and add today’s product price.
from datetime import date

def add_todays_prices(results, tracked_product_urls):
    today = date.today()

    for url in tracked_product_urls:
        product = get_product(url)

        # Start a new history for products we have not seen before.
        if product["title"] not in results:
            results[product["title"]] = {}

        results[product["title"]][today.strftime("%d %B, %Y")] = {
            "price": product["price"],
        }

    return results
This function takes past Target price tracking results and a list of product URLs as arguments, then adds today’s price for the provided products to the already existing Target prices and returns the results.
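To make the resulting data layout concrete, here is a small sketch of the same logic with the fetcher and the date passed in as arguments, so it can run without calling the API. The `fake_get_product` stub, its title, its price, and the example URL are all made up for illustration.

```python
from datetime import date


def add_todays_prices(results, tracked_product_urls, get_product, today=None):
    """Same logic as above, with the fetcher and date injectable."""
    today = today or date.today()
    key = today.strftime("%d %B, %Y")
    for url in tracked_product_urls:
        product = get_product(url)
        # setdefault creates the per-product history on first sight.
        results.setdefault(product["title"], {})[key] = {"price": product["price"]}
    return results


def fake_get_product(url):
    # Hypothetical stand-in for the API call; values are invented.
    return {"title": "Example console", "price": 499.99}


history = add_todays_prices(
    {}, ["https://example.com/item"], fake_get_product, today=date(2024, 1, 15)
)
print(history)
# {'Example console': {'15 January, 2024': {'price': 499.99}}}
```

Each product title maps to a dictionary keyed by the formatted date, and each date holds a small dictionary with the price.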
Having the prices updated for today, we can move on to saving our results back to the file we started from, thus finishing our process loop.
def save_results(results, filepath):
    df = pd.DataFrame.from_dict(results)
    df.to_json(filepath)
    return
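Because the history file is overwritten on every run, an interruption in the middle of a write could leave it truncated. One possible safeguard, sketched here with the standard json module instead of pandas (a swapped-in technique, not what the tutorial uses), is to write to a temporary file first and then swap it into place. The function name `save_results_atomic` is hypothetical, and whether the output stays readable by the pandas-based loader should be verified before adopting it.

```python
import json
import os
import tempfile


def save_results_atomic(results, filepath):
    """Write results as JSON to a temp file, then replace the target."""
    directory = os.path.dirname(os.path.abspath(filepath))
    # Create the temp file next to the target so both share a filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp_file:
            json.dump(results, tmp_file, indent=2)
        # os.replace swaps the file in a single step on the same filesystem.
        os.replace(tmp_path, filepath)
    except BaseException:
        # Clean up the temp file if anything went wrong, then re-raise.
        os.unlink(tmp_path)
        raise
```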
Finally, we can move the connection to the Scraper API to a separate function and combine all we have done so far:
import os
import time
from datetime import date

import pandas as pd
import requests


def get_product(url):
    USERNAME = "username"
    PASSWORD = "password"

    # Structure payload.
    payload = {
        'source': 'universal_ecommerce',
        'url': url,
        'geo_location': 'United States',
        'render': 'html',
        'parse': True,
    }

    # Post the scraping job
    response = requests.request(
        'POST',
        'https://data.oxylabs.io/v1/queries',
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response_json = response.json()
    job_id = response_json["id"]
    status = ""

    # Wait until the job is done
    while status != "done":
        time.sleep(5)
        response = requests.request(
            'GET',
            f"https://data.oxylabs.io/v1/queries/{job_id}",
            auth=(USERNAME, PASSWORD),
        )
        response_json = response.json()
        status = response_json.get("status")
        print(f"Job status is {status}")

    # Fetch the job results
    response = requests.request(
        'GET',
        f"https://data.oxylabs.io/v1/queries/{job_id}/results",
        auth=(USERNAME, PASSWORD),
    )
    response_json = response.json()

    content = response_json["results"][0]["content"]
    title = content["title"]
    price = content["price"]

    product = {
        "title": title,
        "price": price,
    }
    return product


def read_past_data(filepath):
    results = {}

    if not os.path.isfile(filepath):
        open(filepath, 'a').close()

    if not os.stat(filepath).st_size == 0:
        results_df = pd.read_json(filepath, convert_axes=False)
        results = results_df.to_dict()
        return results

    return results


def save_results(results, filepath):
    df = pd.DataFrame.from_dict(results)
    df.to_json(filepath)
    return


def add_todays_prices(results, tracked_product_urls):
    today = date.today()

    for url in tracked_product_urls:
        product = get_product(url)

        if product["title"] not in results:
            results[product["title"]] = {}

        results[product["title"]][today.strftime("%d %B, %Y")] = {
            "price": product["price"],
        }

    return results


def main():
    results_file = "data.json"

    tracked_product_urls = [
        "https://www.target.com/p/sony-playstation-4-pro-gaming-console-1tb-spider-man-limited-edition-with-wireless-controller-manufacturer-refurbished/-/A-89167778#lnk=sametab",
        "https://www.target.com/p/playstation-5-console-marvel-39-s-spider-man-2-bundle/-/A-89981366"
    ]

    past_results = read_past_data(results_file)
    updated_results = add_todays_prices(past_results, tracked_product_urls)
    save_results(updated_results, results_file)


if __name__ == "__main__":
    main()
We coordinate all the logic of the application in the main() function. The results_file variable holds the path to the historical price tracker data, and tracked_product_urls lists all the Target product URLs we should track. Our script then reads past data from the file, fetches new prices, and saves the results back to the file.
After we run the code, we can inspect our Target product prices in the specified results file:
Having the prices scraped and saved to a file, we can start adding a few useful features to our price tracker, like plotting the Target product price changes over time.
We can do this with the matplotlib Python library that we installed earlier.
import matplotlib.pyplot as plt

def plot_history_chart(results):
    for product in results:
        dates = []
        prices = []

        # Collect one (date, price) point per recorded day.
        for entry_date in results[product]:
            dates.append(entry_date)
            prices.append(float(results[product][entry_date]["price"]))

        # Truncate long product titles so the legend stays readable.
        plt.plot(dates, prices, label=product[:30])

    plt.xlabel("Date")
    plt.ylabel("Price")
    plt.title("Product prices over time")
    plt.legend(loc='lower center', bbox_to_anchor=(0.5, 1.05), ncol=3, fancybox=True, shadow=True)
    plt.show()
The function above will plot multiple product price changes over time into a single diagram and then show it. When we add a call to plot_history_chart function to our existing main and run our code again, we will see the results:
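The plotting function above uses the date strings in whatever order the dictionary holds them, and it expects every date to have a price entry. As a hedged sketch, the hypothetical helper below parses the keys back into dates, sorts them chronologically, and skips empty cells, which may appear after the pandas round trip when products were tracked on different days. The `DATE_FORMAT` constant mirrors the format string used when the prices are stored.

```python
from datetime import datetime

DATE_FORMAT = "%d %B, %Y"  # same format the tracker uses for its keys


def price_series(history):
    """Return (dates, prices) for one product, sorted chronologically."""
    points = []
    for day, entry in history.items():
        # Skip gaps (for example None or NaN) where no price was recorded.
        if not isinstance(entry, dict):
            continue
        points.append((datetime.strptime(day, DATE_FORMAT), float(entry["price"])))
    points.sort()
    dates = [day for day, _ in points]
    prices = [price for _, price in points]
    return dates, prices
```

The two returned lists can be passed to plt.plot in place of the raw strings, since matplotlib accepts datetime objects on the x-axis.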
Another useful functionality could be to get a notification about price drops. This would help direct our attention to a specific product, which becomes especially useful when price tracking multiple product prices at the same time.
from datetime import date, timedelta

def check_for_pricedrop(results):
    today = date.today()
    yesterday = today - timedelta(days=1)
    today_key = today.strftime("%d %B, %Y")
    yesterday_key = yesterday.strftime("%d %B, %Y")

    for product in results:
        if yesterday_key in results[product]:
            change = float(results[product][today_key]["price"]) - float(results[product][yesterday_key]["price"])
            if change < 0:
                # change is negative here, so report its absolute value.
                print(f'Price for {product} has dropped by {abs(change)}!')
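The check above only fires when an entry for exactly yesterday exists, so a skipped run hides any drop. A possible variation, sketched below under the assumption that the same date format is used for the keys, compares today's price with the most recent earlier entry instead. The function name `find_price_drops` is hypothetical, and it returns the drops rather than printing them so the caller can decide how to report them.

```python
from datetime import date, datetime

DATE_FORMAT = "%d %B, %Y"  # same format the tracker uses for its keys


def find_price_drops(results, today=None):
    """Return {product: drop_amount} versus the latest earlier entry."""
    today = today or date.today()
    drops = {}
    for product, history in results.items():
        # Parse the string keys into dates, ignoring empty cells.
        dated = {}
        for day, entry in history.items():
            if isinstance(entry, dict):
                parsed = datetime.strptime(day, DATE_FORMAT).date()
                dated[parsed] = float(entry["price"])
        earlier = [d for d in dated if d < today]
        if today not in dated or not earlier:
            continue  # nothing to compare against
        previous = dated[max(earlier)]
        if dated[today] < previous:
            drops[product] = round(previous - dated[today], 2)
    return drops
```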
If we add all that we have done, our code will look like this:
import os
import time
from datetime import date
from datetime import timedelta

import matplotlib.pyplot as plt
import pandas as pd
import requests


def get_product(url):
    USERNAME = "username"
    PASSWORD = "password"

    # Structure payload.
    payload = {
        'source': 'universal_ecommerce',
        'url': url,
        'geo_location': 'United States',
        'render': 'html',
        'parse': True,
    }

    # Post the scraping job
    response = requests.request(
        'POST',
        'https://data.oxylabs.io/v1/queries',
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response_json = response.json()
    job_id = response_json["id"]
    status = ""

    # Wait until the job is done
    while status != "done":
        time.sleep(5)
        response = requests.request(
            'GET',
            f"https://data.oxylabs.io/v1/queries/{job_id}",
            auth=(USERNAME, PASSWORD),
        )
        response_json = response.json()
        status = response_json.get("status")
        print(f"Job status is {status}")

    # Fetch the job results
    response = requests.request(
        'GET',
        f"https://data.oxylabs.io/v1/queries/{job_id}/results",
        auth=(USERNAME, PASSWORD),
    )
    response_json = response.json()

    content = response_json["results"][0]["content"]
    title = content["title"]
    price = content["price"]

    product = {
        "title": title,
        "price": price,
    }
    return product


def read_past_data(filepath):
    results = {}

    if not os.path.isfile(filepath):
        open(filepath, 'a').close()

    if not os.stat(filepath).st_size == 0:
        results_df = pd.read_json(filepath, convert_axes=False)
        results = results_df.to_dict()
        return results

    return results


def save_results(results, filepath):
    df = pd.DataFrame.from_dict(results)
    df.to_json(filepath)
    return


def add_todays_prices(results, tracked_product_urls):
    today = date.today()

    for url in tracked_product_urls:
        product = get_product(url)

        if product["title"] not in results:
            results[product["title"]] = {}

        results[product["title"]][today.strftime("%d %B, %Y")] = {
            "price": product["price"],
        }

    return results


def plot_history_chart(results):
    for product in results:
        dates = []
        prices = []

        for entry_date in results[product]:
            dates.append(entry_date)
            prices.append(float(results[product][entry_date]["price"]))

        plt.plot(dates, prices, label=product[:30])

    plt.xlabel("Date")
    plt.ylabel("Price")
    plt.title("Product prices over time")
    plt.legend(loc='lower center', bbox_to_anchor=(0.5, 1.05), ncol=3, fancybox=True, shadow=True)
    plt.show()


def check_for_pricedrop(results):
    today = date.today()
    yesterday = today - timedelta(days=1)
    today_key = today.strftime("%d %B, %Y")
    yesterday_key = yesterday.strftime("%d %B, %Y")

    for product in results:
        if yesterday_key in results[product]:
            change = float(results[product][today_key]["price"]) - float(results[product][yesterday_key]["price"])
            if change < 0:
                # change is negative here, so report its absolute value.
                print(f'Price for {product} has dropped by {abs(change)}!')


def main():
    results_file = "data.json"

    tracked_product_urls = [
        "https://www.target.com/p/sony-playstation-4-pro-gaming-console-1tb-spider-man-limited-edition-with-wireless-controller-manufacturer-refurbished/-/A-89167778#lnk=sametab",
        "https://www.target.com/p/playstation-5-console-marvel-39-s-spider-man-2-bundle/-/A-89981366"
    ]

    past_results = read_past_data(results_file)
    updated_results = add_todays_prices(past_results, tracked_product_urls)
    plot_history_chart(updated_results)
    check_for_pricedrop(updated_results)
    save_results(updated_results, results_file)


if __name__ == "__main__":
    main()
Having the code already laid out, we can see that while the core application is quite simple, it can be easily extended to accommodate more scale and complexity as time goes on:
Alerting could have an improved price change tracking algorithm and have the notifications be sent to some external channel like Telegram.
Plotting could be extended to save the resulting diagrams to a file or load them up on some external webpage to be viewed.
Result saving could be remade to use a database instead of saving to a file.
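As one illustration of the database idea above, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the function names, and the use of ISO-formatted dates are all assumptions chosen for the example rather than part of the tracker built in this article.

```python
import sqlite3


def open_price_db(path=":memory:"):
    """Open (or create) a database with one row per product per day."""
    conn = sqlite3.connect(path)
    # day is stored as an ISO date string such as 2024-01-15,
    # so sorting the text also sorts chronologically.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        " title TEXT NOT NULL,"
        " day TEXT NOT NULL,"
        " price REAL NOT NULL,"
        " PRIMARY KEY (title, day))"
    )
    return conn


def record_price(conn, title, day, price):
    # INSERT OR REPLACE keeps a single row per product per day.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO prices (title, day, price) VALUES (?, ?, ?)",
            (title, day, float(price)),
        )


def price_history(conn, title):
    """Return [(day, price), ...] for one product, oldest first."""
    rows = conn.execute(
        "SELECT day, price FROM prices WHERE title = ? ORDER BY day",
        (title,),
    )
    return rows.fetchall()
```

Running the tracker twice on the same day then simply overwrites that day's row instead of duplicating it.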
Today, we’ve successfully built a multifunctional Target price crawler. Not only does it send you a price change notification, but it also creates price fluctuation charts and provides historical pricing data.
About the author
Maryia Stsiopkina
Senior Content Manager
Maryia Stsiopkina is a Senior Content Manager at Oxylabs. As her passion for writing was developing, she was writing either creepy detective stories or fairy tales at different points in time. Eventually, she found herself in the tech wonderland with numerous hidden corners to explore. At leisure, she does birdwatching with binoculars (some people mistake it for stalking), makes flower jewelry, and eats pickles.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.