Roberta Aukstikalnyte
eBay is a large US-based e-commerce company that brokers business-to-consumer and consumer-to-consumer sales. Because of its scale, its listings hold a lot of pricing and product data that is valuable to e-commerce businesses.
In today’s article, we’re going to demonstrate how to successfully build a price tracker using Python and eBay Scraper API. By the end of this tutorial, you’ll have a tool that sends price drop alerts, creates price change diagrams, and provides historical information.
Let’s get started.
To build the eBay price tracker, we’ll begin by installing the libraries we will be using throughout the following tutorial.
pip install requests
pip install pandas
pip install matplotlib
We’ll use requests to talk to the API, pandas for easier dict management and saving of results, and matplotlib for plotting price histories.
As we have all the prerequisites installed, we can start working on the code. To start off, we need to connect to Oxylabs E-Commerce Scraper API (part of Web Scraper API), which will help us to fetch the data we need from eBay.
import requests

USERNAME = "username"
PASSWORD = "password"

# URL of the eBay listing we want to scrape.
url = "https://www.ebay.com/itm/296030363868"

# Structure payload.
payload = {
    'source': 'universal',
    'url': url,
    'geo_location': 'United States',
    'parse': True,
    'render': 'html',
    "parsing_instructions": {
        "title": {
            "_fns": [
                {"_fn": "css_one", "_args": ["h1.x-item-title__mainTitle"]},
                {"_fn": "element_text"}
            ]
        },
        "price": {
            "_fns": [
                {"_fn": "css_one", "_args": ["div.x-price-primary"]},
                {"_fn": "element_text"}
            ]
        }
    }
}

# Post the scraping job.
response = requests.request(
    'POST',
    'https://data.oxylabs.io/v1/queries',
    auth=(USERNAME, PASSWORD),
    json=payload,
)
response_json = response.json()
print(response_json)
Here, we have some code that sets up a request to the API and creates a scraping job.
We use the universal source to extract the information we need, and set geo_location to the country we want the scraping to be performed from.
We define the information we want the scraper to collect from the page using the Custom Parser functionality. In our case, that means passing the CSS selectors for the price and title to the css_one function, combined with element_text to extract the text from these elements. You can read more about all the available parameters in the documentation for eBay.
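If we later want to track more fields, writing each parsing instruction by hand gets repetitive. A minimal sketch of a helper that builds the same structure from a plain mapping of field names to CSS selectors is shown below. The name build_parsing_instructions is our own and is not part of the API; the output simply mirrors the css_one plus element_text pattern used in the payload above.

```python
def build_parsing_instructions(selectors):
    """Turn {field_name: css_selector} into Custom Parser instructions.

    Hypothetical helper: it only reproduces the css_one + element_text
    pattern from the payload in this tutorial.
    """
    return {
        field: {
            "_fns": [
                {"_fn": "css_one", "_args": [selector]},
                {"_fn": "element_text"},
            ]
        }
        for field, selector in selectors.items()
    }


instructions = build_parsing_instructions({
    "title": "h1.x-item-title__mainTitle",
    "price": "div.x-price-primary",
})
```

The resulting dictionary can be assigned to the parsing_instructions key of the payload as is.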
If we check the response after running the code, we should see the job information, including the job id that we’ll use in the next step.
The next step would be to create some logic that would wait for our job to finish and then fetch the results.
import requests
import time

# Post the scraping job.
response = requests.request(
    'POST',
    'https://data.oxylabs.io/v1/queries',
    auth=(USERNAME, PASSWORD),
    json=payload,
)
response_json = response.json()
print(response_json)

job_id = response_json["id"]
status = ""

# Poll until the job is done.
while status != "done":
    time.sleep(5)
    response = requests.request(
        'GET',
        f"https://data.oxylabs.io/v1/queries/{job_id}",
        auth=(USERNAME, PASSWORD),
    )
    response_json = response.json()
    status = response_json.get("status")
    print(f"Job status is {status}")

# Fetch the job results.
response = requests.request(
    'GET',
    f"https://data.oxylabs.io/v1/queries/{job_id}/results",
    auth=(USERNAME, PASSWORD),
)
response_json = response.json()
In the code above, we create a while loop that keeps polling the API for updates on the job status until it’s done, and then we fetch the job results.
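A loop that waits for "done" forever will hang if a job fails or never completes. As a rough sketch, the polling step could be wrapped in a function with a timeout. The function name wait_for_job, the fetch_status callable, and the "faulted" status value are our assumptions for illustration; check the API documentation for the exact status names.

```python
import time


def wait_for_job(fetch_status, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Poll fetch_status() until it returns "done".

    fetch_status is any zero-argument callable that returns the current
    job status string. The sleep argument exists so the waiting can be
    replaced in tests.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status == "done":
            return status
        if status == "faulted":  # assumed name of the failure status
            raise RuntimeError("Scraping job failed")
        if waited >= timeout:
            raise TimeoutError(f"Job not done after {waited:.0f} seconds")
        sleep(interval)
        waited += interval
```

Here, fetch_status could be a small function that performs the GET request from the snippet above and returns response.json().get("status").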
Having the connection to the API established, we can start building the core logic of our eBay price tracker. The basic requirements for a price tracker could be a script that runs once a day to fetch today's price, then adds it to the historical data we already have, and saves it. So, let’s start with that.
We’ll begin by creating a function that reads the historical eBay price data we may have already gathered.
import os
import pandas as pd


def read_past_data(filepath):
    results = {}
    # Create the data file on the first run.
    if not os.path.isfile(filepath):
        open(filepath, 'a').close()
    # Only parse the file if it has any content.
    if os.stat(filepath).st_size != 0:
        results_df = pd.read_json(filepath, convert_axes=False)
        results = results_df.to_dict()
    return results
The function takes the file path to our historical data file as an argument and returns the read data as a Python dictionary. It also has a few logical considerations:
If there is no data file, one is created.
If the data file is empty, an empty dictionary is returned.
Now that we have the historical price data loaded, we can think about a function that would take the historical price tracker data and add today’s listing’s price.
from datetime import date


def add_todays_prices(results, tracked_product_urls):
    today = date.today()
    for url in tracked_product_urls:
        # get_product() wraps the Scraper API call; it is defined in the full script below.
        product = get_product(url)
        if product["title"] not in results:
            results[product["title"]] = {}
        results[product["title"]][today.strftime("%d %B, %Y")] = {
            "price": product["price"],
        }
    return results
This function takes past eBay price tracking results and a list of product listings’ URLs as arguments, then adds today’s price for the provided products to the already existing eBay prices and returns the results.
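To make the shape of the stored data concrete, here is a small sketch that builds the same nested dictionary without calling the API. The add_price helper, the listing title, and the prices are made up for illustration; only the date format string comes from the function above.

```python
from datetime import date

DATE_FORMAT = "%d %B, %Y"  # same format string as in add_todays_prices


def add_price(results, title, price, day):
    """Record one price under results[title][formatted date]."""
    results.setdefault(title, {})[day.strftime(DATE_FORMAT)] = {"price": price}
    return results


history = {}
add_price(history, "Example listing", "19.99", date(2024, 3, 4))
add_price(history, "Example listing", "17.49", date(2024, 3, 5))
print(history)
```

Each product title maps to a dictionary keyed by date, and each date holds a small dictionary with the price, which leaves room for adding more fields later.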
Having the prices updated for today, we can move on to saving our results back to the file we started from, thus finishing our process loop.
def save_results(results, filepath):
    df = pd.DataFrame.from_dict(results)
    df.to_json(filepath)
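As a side note, the same read and save steps can be sketched with the standard json module instead of pandas. This is an alternative we are swapping in for illustration, not what the tutorial's script uses. One possible reason to consider it: as far as we can tell, when products have entries for different sets of dates, the DataFrame step may fill the gaps with empty values, which later functions would then need to handle. The function names below are hypothetical.

```python
import json
import os
import tempfile


def save_results_json(results, filepath):
    """Write the nested results dictionary to disk as plain JSON."""
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)


def read_past_data_json(filepath):
    """Return saved results, or an empty dict if the file is missing or empty."""
    if not os.path.isfile(filepath) or os.path.getsize(filepath) == 0:
        return {}
    with open(filepath, encoding="utf-8") as f:
        return json.load(f)


# Round trip in a temporary directory with made-up sample data.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    first_read = read_past_data_json(path)
    sample = {"Example listing": {"04 March, 2024": {"price": "19.99"}}}
    save_results_json(sample, path)
    round_trip = read_past_data_json(path)
```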
Finally, we can move the connection to the Scraper API to a separate function and combine all we have done so far:
import os
import time
from datetime import date
from datetime import timedelta

import requests
import pandas as pd
import matplotlib.pyplot as plt


def get_product(url):
    USERNAME = "username"
    PASSWORD = "password"

    # Structure payload.
    payload = {
        'source': 'universal',
        'url': url,
        'geo_location': 'United States',
        'parse': True,
        'render': 'html',
        "parsing_instructions": {
            "title": {
                "_fns": [
                    {"_fn": "css_one", "_args": ["h1.x-item-title__mainTitle"]},
                    {"_fn": "element_text"}
                ]
            },
            "price": {
                "_fns": [
                    {"_fn": "css_one", "_args": ["div.x-price-primary"]},
                    {"_fn": "element_text"}
                ]
            }
        }
    }

    # Post the scraping job.
    response = requests.request(
        'POST',
        'https://data.oxylabs.io/v1/queries',
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response_json = response.json()
    print(response_json)

    job_id = response_json["id"]
    status = ""

    # Wait until the job is done.
    while status != "done":
        time.sleep(5)
        response = requests.request(
            'GET',
            f"https://data.oxylabs.io/v1/queries/{job_id}",
            auth=(USERNAME, PASSWORD),
        )
        response_json = response.json()
        status = response_json.get("status")
        print(f"Job status is {status}")

    # Fetch the job results.
    response = requests.request(
        'GET',
        f"https://data.oxylabs.io/v1/queries/{job_id}/results",
        auth=(USERNAME, PASSWORD),
    )
    response_json = response.json()
    print(response_json)

    content = response_json["results"][0]["content"]
    title = content["title"]
    # Keep only the number after the "$" sign and drop thousands separators,
    # so that float() can parse values such as "1,299.00" later on.
    price = content["price"].split("$")[1].replace(",", "").strip()

    product = {
        "title": title,
        "price": price,
    }
    return product


def read_past_data(filepath):
    results = {}
    # Create the data file on the first run.
    if not os.path.isfile(filepath):
        open(filepath, 'a').close()
    # Only parse the file if it has any content.
    if os.stat(filepath).st_size != 0:
        results_df = pd.read_json(filepath, convert_axes=False)
        results = results_df.to_dict()
    return results


def save_results(results, filepath):
    df = pd.DataFrame.from_dict(results)
    df.to_json(filepath)


def add_todays_prices(results, tracked_product_urls):
    today = date.today()
    for url in tracked_product_urls:
        product = get_product(url)
        if product["title"] not in results:
            results[product["title"]] = {}
        results[product["title"]][today.strftime("%d %B, %Y")] = {
            "price": product["price"],
        }
    return results


def main():
    results_file = "data.json"
    tracked_product_urls = [
        "https://www.ebay.com/itm/296030363868",
        "https://www.ebay.com/itm/293608130360"
    ]
    past_results = read_past_data(results_file)
    updated_results = add_todays_prices(past_results, tracked_product_urls)
    save_results(updated_results, results_file)


if __name__ == "__main__":
    main()
We coordinate all the logic of the application in the main() function. The results_file variable holds the path to the historical price data, and tracked_product_urls lists the eBay product listing URLs we want to track. The script then reads the historical data from the file, fetches new prices, and saves the results back to the file.
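The split("$")[1] step in get_product assumes the price text contains a dollar sign followed by nothing but the number. A slightly more defensive sketch, assuming US-style formatting with commas as thousands separators, could use a regular expression instead. The parse_price name and the sample strings in the comments are ours, not taken from real listings.

```python
import re

# First run of digits, allowing thousands separators and an optional decimal part.
PRICE_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Extract the first number from a price string such as "US $1,299.00".

    Assumes commas are thousands separators and the dot is the decimal mark.
    """
    match = PRICE_PATTERN.search(text)
    if match is None:
        raise ValueError(f"No price found in {text!r}")
    return float(match.group().replace(",", ""))
```

Raising an error when no number is found keeps a changed page layout from silently writing a bad value into the price history.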
After we run the code, we can inspect our eBay product listings’ prices in the specified results file.
Now that we have the prices scraped and saved to a file, we can start adding a few useful features to our price tracker, like plotting the eBay product price changes over time.
We can do this by utilizing the matplotlib Python library that we installed earlier.
def plot_history_chart(results):
    for product in results:
        dates = []
        prices = []
        for entry_date in results[product]:
            dates.append(entry_date)
            # Prices are stored as strings, so convert them for a numeric axis.
            prices.append(float(results[product][entry_date]["price"]))
        # One line per product, labeled with a shortened title.
        plt.plot(dates, prices, label=product[:50])
    plt.xlabel("Date")
    plt.ylabel("Price")
    plt.title("Product prices over time")
    plt.legend()
    plt.show()
The function above plots the price changes of multiple product listings over time in a single diagram and then shows it. When we add a call to the plot_history_chart function to our existing main and run our code again, we’ll see the resulting chart.
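One detail worth noting: the dates are passed to matplotlib as strings, so they are treated as categories and drawn evenly spaced in the order they appear. A possible refinement, sketched below under the assumption that all keys follow the "%d %B, %Y" format used in this tutorial, is to convert one product's history into sorted date objects before plotting. The to_series name and the sample data are hypothetical.

```python
from datetime import datetime

DATE_FORMAT = "%d %B, %Y"  # same format the tracker uses for its keys


def to_series(history):
    """Convert {date_string: {"price": str}} into sorted dates and float prices."""
    points = sorted(
        (datetime.strptime(day, DATE_FORMAT).date(), float(entry["price"]))
        for day, entry in history.items()
    )
    dates = [day for day, _ in points]
    prices = [price for _, price in points]
    return dates, prices


# Made-up history with the entries deliberately out of order.
sample_history = {
    "05 March, 2024": {"price": "17.49"},
    "28 February, 2024": {"price": "19.99"},
}
dates, prices = to_series(sample_history)
```

The returned lists can be passed to plt.plot(dates, prices, ...) in place of the string dates, so that gaps between tracking days show up as real distances on the x-axis.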
Another useful functionality for our eBay price tracker could be to get alerts for price drops. This would help direct our attention to specific eBay listings, which becomes especially useful when tracking multiple listings at the same time.
def check_for_pricedrop(results):
    today = date.today().strftime("%d %B, %Y")
    yesterday = (date.today() - timedelta(days=1)).strftime("%d %B, %Y")
    for product in results:
        # We can only compare if yesterday's price was recorded.
        if yesterday in results[product]:
            change = float(results[product][today]["price"]) - float(results[product][yesterday]["price"])
            if change < 0:
                # Report the drop as a positive amount.
                print(f'Price for {product} has dropped by {abs(change):.2f}!')
Here, we have created a function that checks the price change between yesterday's price entry and today's one and reports if the change was negative. When we add a call to the check_for_pricedrop function to our existing main and run our code again, we’ll see the price drop messages in the command line.
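Because check_for_pricedrop only compares today with yesterday, a drop goes unnoticed whenever the script skips a day. One possible variation, sketched here with a hypothetical find_price_drops function and made-up data, compares the two most recent entries for each product regardless of how far apart they are.

```python
from datetime import datetime

DATE_FORMAT = "%d %B, %Y"  # same format the tracker uses for its keys


def find_price_drops(results):
    """Return {product: drop_amount} for products whose latest price is lower
    than the entry recorded just before it."""
    drops = {}
    for product, history in results.items():
        entries = sorted(
            (datetime.strptime(day, DATE_FORMAT), float(entry["price"]))
            for day, entry in history.items()
        )
        if len(entries) < 2:
            continue  # nothing to compare against yet
        previous, latest = entries[-2][1], entries[-1][1]
        if latest < previous:
            drops[product] = round(previous - latest, 2)
    return drops


sample_results = {
    "Listing A": {"01 March, 2024": {"price": "20.00"}, "04 March, 2024": {"price": "17.50"}},
    "Listing B": {"01 March, 2024": {"price": "5.00"}, "02 March, 2024": {"price": "6.00"}},
    "Listing C": {"01 March, 2024": {"price": "9.99"}},
}
drops = find_price_drops(sample_results)
```

Returning the drops instead of printing them makes it easier to forward them to another channel later.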
If we add all that we have done, our code will look like this:
import os
import time
from datetime import date
from datetime import timedelta

import requests
import pandas as pd
import matplotlib.pyplot as plt


def get_product(url):
    USERNAME = "username"
    PASSWORD = "password"

    # Structure payload.
    payload = {
        'source': 'universal',
        'url': url,
        'geo_location': 'United States',
        'parse': True,
        'render': 'html',
        "parsing_instructions": {
            "title": {
                "_fns": [
                    {"_fn": "css_one", "_args": ["h1.x-item-title__mainTitle"]},
                    {"_fn": "element_text"}
                ]
            },
            "price": {
                "_fns": [
                    {"_fn": "css_one", "_args": ["div.x-price-primary"]},
                    {"_fn": "element_text"}
                ]
            }
        }
    }

    # Post the scraping job.
    response = requests.request(
        'POST',
        'https://data.oxylabs.io/v1/queries',
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response_json = response.json()
    print(response_json)

    job_id = response_json["id"]
    status = ""

    # Wait until the job is done.
    while status != "done":
        time.sleep(5)
        response = requests.request(
            'GET',
            f"https://data.oxylabs.io/v1/queries/{job_id}",
            auth=(USERNAME, PASSWORD),
        )
        response_json = response.json()
        status = response_json.get("status")
        print(f"Job status is {status}")

    # Fetch the job results.
    response = requests.request(
        'GET',
        f"https://data.oxylabs.io/v1/queries/{job_id}/results",
        auth=(USERNAME, PASSWORD),
    )
    response_json = response.json()
    print(response_json)

    content = response_json["results"][0]["content"]
    title = content["title"]
    # Keep only the number after the "$" sign and drop thousands separators,
    # so that float() can parse values such as "1,299.00" later on.
    price = content["price"].split("$")[1].replace(",", "").strip()

    product = {
        "title": title,
        "price": price,
    }
    return product


def read_past_data(filepath):
    results = {}
    # Create the data file on the first run.
    if not os.path.isfile(filepath):
        open(filepath, 'a').close()
    # Only parse the file if it has any content.
    if os.stat(filepath).st_size != 0:
        results_df = pd.read_json(filepath, convert_axes=False)
        results = results_df.to_dict()
    return results


def save_results(results, filepath):
    df = pd.DataFrame.from_dict(results)
    df.to_json(filepath)


def add_todays_prices(results, tracked_product_urls):
    today = date.today()
    for url in tracked_product_urls:
        product = get_product(url)
        if product["title"] not in results:
            results[product["title"]] = {}
        results[product["title"]][today.strftime("%d %B, %Y")] = {
            "price": product["price"],
        }
    return results


def plot_history_chart(results):
    for product in results:
        dates = []
        prices = []
        for entry_date in results[product]:
            dates.append(entry_date)
            prices.append(float(results[product][entry_date]["price"]))
        # One line per product, labeled with a shortened title.
        plt.plot(dates, prices, label=product[:30])
    plt.xlabel("Date")
    plt.ylabel("Price")
    plt.title("Product prices over time")
    plt.legend(loc='lower center', bbox_to_anchor=(0.5, 1.05),
               ncol=3, fancybox=True, shadow=True)
    plt.show()


def check_for_pricedrop(results):
    today = date.today().strftime("%d %B, %Y")
    yesterday = (date.today() - timedelta(days=1)).strftime("%d %B, %Y")
    for product in results:
        # We can only compare if yesterday's price was recorded.
        if yesterday in results[product]:
            change = float(results[product][today]["price"]) - float(results[product][yesterday]["price"])
            if change < 0:
                # Report the drop as a positive amount.
                print(f'Price for {product} has dropped by {abs(change):.2f}!')


def main():
    results_file = "data.json"
    tracked_product_urls = [
        "https://www.ebay.com/itm/296030363868",
        "https://www.ebay.com/itm/293608130360"
    ]
    past_results = read_past_data(results_file)
    updated_results = add_todays_prices(past_results, tracked_product_urls)
    plot_history_chart(updated_results)
    check_for_pricedrop(updated_results)
    save_results(updated_results, results_file)


if __name__ == "__main__":
    main()
Having the code already laid out, we can see that while the core application is quite simple, it can be easily extended to accommodate more scale and complexity as time goes on. For example:
Alerting could have an improved price change tracking algorithm and have the notifications be sent to some external channel like Telegram.
Plotting could be extended to save the resulting diagrams to a file or load them up on an external webpage to be viewed.
Result saving could be remade to use a database instead of saving to a file.
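To give an idea of the database option, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the function names, and the sample values are our assumptions, and dates are stored in ISO format (rather than the "%d %B, %Y" strings used above) so that sorting by day works as plain text.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the price database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS prices (
            title TEXT NOT NULL,
            day   TEXT NOT NULL,
            price REAL NOT NULL,
            PRIMARY KEY (title, day)
        )
        """
    )
    return conn


def save_price(conn, title, day, price):
    """Store one price; a second run on the same day overwrites the first."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO prices (title, day, price) VALUES (?, ?, ?)",
            (title, day, price),
        )


def price_history(conn, title):
    """Return [(day, price), ...] for one product, oldest first."""
    rows = conn.execute(
        "SELECT day, price FROM prices WHERE title = ? ORDER BY day", (title,)
    )
    return rows.fetchall()


conn = open_store()
save_price(conn, "Example listing", "2024-03-04", 19.99)
save_price(conn, "Example listing", "2024-03-05", 17.49)
save_price(conn, "Example listing", "2024-03-05", 18.00)  # same day, replaces 17.49
history_rows = price_history(conn, "Example listing")
```

Passing a file path such as "prices.db" to open_store instead of the in-memory default would keep the data between runs.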
That’s it! By this point, you should have successfully built a multifunctional eBay price tracker.
We hope you found this tutorial easy to follow. However, if you have any questions or feedback, feel free to drop us an email at support@oxylabs.io or leave a message on the live chat.
About the author
Roberta Aukstikalnyte
Senior Content Manager
Roberta Aukstikalnyte is a Senior Content Manager at Oxylabs. Having worked various jobs in the tech industry, she especially enjoys finding ways to express complex ideas in simple ways through content. In her free time, Roberta unwinds by reading Ottessa Moshfegh's novels, going to boxing classes, and playing around with makeup.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.