How to Scrape Amazon Prices With Python

Maryia Stsiopkina

2024-03-15, 4 min read

E-commerce scraping can be overwhelming, especially when faced with countless options on platforms like Amazon. Luckily, Oxylabs' Web Scraper API (previously known as E-Commerce Scraper API), combined with Python, offers a practical way to retrieve Amazon price data. With the API, you can schedule daily price scrapes to stay aware of current prices, price changes, and competitor pricing strategies. By scraping product prices from multiple Amazon pages, you can simplify your search and find the best deals without the hassle, which streamlines your shopping experience and saves time.

In this tutorial, we’ll scrape Amazon price data based on:

  • Best-selling items

  • Search results

  • Currently available deals

You can find the following code on our GitHub.

1. Prepare the environment

You can download the latest version of Python from the official website.

To store your Python code, run the following command to create a new Python file in your current directory.

touch main.py

2. Install dependencies

Next, run the command below to install the dependencies required for sending HTTP requests and processing data. We will use Requests and pandas.

pip install requests pandas

3. Import libraries

Now, open the previously created Python file and import the installed libraries.

import requests
import pandas as pd

4. Preparing API credentials

Start by declaring your API credentials. Since we’ll be using the E-Commerce API, you’ll need to retrieve the credentials for authenticating with the API from your Oxylabs dashboard. Replace USERNAME and PASSWORD with the credentials you retrieved.

USERNAME = "USERNAME"
PASSWORD = "PASSWORD"
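
If you would rather not keep credentials in the source file, one option is to read them from environment variables. The following is a minimal sketch, not part of the original tutorial code: the variable names OXYLABS_USERNAME and OXYLABS_PASSWORD and the load_credentials helper are assumptions made for illustration.

```python
import os


def load_credentials(env=None):
    """Return (username, password) read from environment variables.

    OXYLABS_USERNAME and OXYLABS_PASSWORD are hypothetical names chosen
    for this sketch; use whatever names suit your setup.
    """
    env = os.environ if env is None else env
    try:
        return env["OXYLABS_USERNAME"], env["OXYLABS_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Demo with an explicit mapping instead of the real environment.
USERNAME, PASSWORD = load_credentials(
    {"OXYLABS_USERNAME": "USERNAME", "OXYLABS_PASSWORD": "PASSWORD"}
)
```

In a real run you would call load_credentials() without arguments after exporting both variables in your shell.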

5. Getting best-seller prices by category

Now, let’s start by fetching the Amazon price data for best-selling items in a category on Amazon. First, let’s choose a category and retrieve its ID. For this tutorial, we’ll use the dog food category. 

Go to the category page on your browser and inspect the URL. You should see a query parameter called node.

https://www.amazon.com/gp/browse.html?node=2975359011&ref_=nav_em__sd_df_0_2_19_4

The value of the node parameter is the ID of the dog food category. Save it to a variable called dog_food_category_id; we’ll use it in the payload of our API request. You can adjust the variable name based on your preferred category.

USERNAME = "USERNAME"
PASSWORD = "PASSWORD"

dog_food_category_id = "2975359011"
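
Instead of copying the ID by hand, the node value can also be read from the URL programmatically. Here is a small sketch using the standard library; extract_node_id is a hypothetical helper name, and it assumes the category URL carries the ID in a node query parameter, as in the example above.

```python
from urllib.parse import parse_qs, urlsplit


def extract_node_id(category_url):
    """Return the value of the `node` query parameter, or None if absent."""
    query = parse_qs(urlsplit(category_url).query)
    values = query.get("node")
    return values[0] if values else None


category_url = (
    "https://www.amazon.com/gp/browse.html"
    "?node=2975359011&ref_=nav_em__sd_df_0_2_19_4"
)
dog_food_category_id = extract_node_id(category_url)
print(dog_food_category_id)  # 2975359011
```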

Let’s start by implementing a function called get_best_seller_results. It should accept an argument called category_id.

def get_best_seller_results(category_id):
    ...

Next, we’ll be adding our API request. Declare a payload and send a POST request to Oxylabs E-Commerce API. Don’t forget to include your authentication credentials.

payload = {
    "source": "amazon_bestsellers",
    "domain": "com",
    "query": category_id,
    "start_page": 1,
    "parse": True,
}
response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=(USERNAME, PASSWORD),
    json=payload,
)
response.raise_for_status()
results = response.json()["results"][0]["content"]["results"]

Make sure the source parameter is set to amazon_bestsellers and the parse parameter is set to True. Feel free to adjust other parameters to your preference.

Next, let’s extract the data we need from the retrieved results.

return [
    {
        "price": result["price"],
        "title": result["title"],
        "currency": result["currency"],
    }
    for result in results
]
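
Indexing with result["price"] raises a KeyError if an item happens to lack that key. Whether that ever occurs in the parsed output is not confirmed here, so treat the following as an optional, defensive sketch. The helper name parse_price_fields and the sample data are made up for illustration.

```python
def parse_price_fields(results):
    """Keep price, title, and currency; use None when a key is missing."""
    fields = ("price", "title", "currency")
    return [{field: result.get(field) for field in fields} for result in results]


# Made-up sample data for illustration, not real API output.
sample_results = [
    {"price": 19.99, "title": "Sample dog food", "currency": "USD", "rating": 4.5},
    {"title": "Item without a price"},
]
rows = parse_price_fields(sample_results)
print(rows[1])  # {'price': None, 'title': 'Item without a price', 'currency': None}
```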

Here’s the full code of the get_best_seller_results function:

def get_best_seller_results(category_id):
    payload = {
        "source": "amazon_bestsellers",
        "domain": "com",
        "query": category_id,
        "start_page": 1,
        "parse": True,
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response.raise_for_status()
    results = response.json()["results"][0]["content"]["results"]
    return [
        {
            "price": result["price"],
            "title": result["title"],
            "currency": result["currency"],
        }
        for result in results
    ]
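
The payload requests only start_page 1. To gather more than one page, you could loop over page numbers and merge the results. The sketch below assumes a hypothetical variant of get_best_seller_results that takes the page number and passes it as start_page; here a stand-in fetcher replaces it so the example runs without network access.

```python
def collect_pages(fetch_page, first_page, last_page):
    """Call fetch_page(n) for each page number and merge the returned lists."""
    rows = []
    for page in range(first_page, last_page + 1):
        rows.extend(fetch_page(page))
    return rows


# Stand-in fetcher with made-up data; a real one would call the API.
def fake_fetch(page):
    return [{"title": f"Item from page {page}", "price": 10.0 * page, "currency": "USD"}]


all_rows = collect_pages(fake_fetch, 1, 3)
print(len(all_rows))  # 3
```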

6. Getting prices from search results

Next, let’s scrape prices for Amazon search results. We can reuse most of the code from the get_best_seller_results function, changing only the payload and results variables.

Let’s adjust the payload parameter first.

payload = {
    "source": "amazon_search",
    "domain": "com",
    "query": "couch",
    "start_page": 1,
    "parse": True,
}

The source parameter should be amazon_search. The query is now a simple search query that you would use in the Amazon website. In this example, we’ll be scraping couch prices.

Next, the results variable should be extracted with an additional key called organic. Here’s how it should look.

results = response.json()["results"][0]["content"]["results"]["organic"]
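
To make the difference between the two response shapes concrete, here is a small sketch with made-up dictionaries that mimic only the nesting used in this tutorial. The extract_results helper and its organic flag are hypothetical names, and real responses contain more fields than shown.

```python
def extract_results(response_json, organic=False):
    """Walk the nested response down to the list of parsed items."""
    results = response_json["results"][0]["content"]["results"]
    return results["organic"] if organic else results


# Made-up responses that copy only the nesting, not real API output.
best_seller_json = {"results": [{"content": {"results": [{"title": "A"}]}}]}
search_json = {"results": [{"content": {"results": {"organic": [{"title": "B"}]}}}]}

print(extract_results(best_seller_json))           # [{'title': 'A'}]
print(extract_results(search_json, organic=True))  # [{'title': 'B'}]
```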

Finally, we can put it all together in a function called get_search_results. The function should accept a query parameter and use it in the payload. Here’s the full code for the function.

def get_search_results(query):
    payload = {
        "source": "amazon_search",
        "domain": "com",
        "query": query,
        "start_page": 1,
        "parse": True,
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response.raise_for_status()
    results = response.json()["results"][0]["content"]["results"]["organic"]
    return [
        {
            "price": result["price"],
            "title": result["title"],
            "currency": result["currency"],
        }
        for result in results
    ]
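
Since both functions send the same kind of request, the shared part could be pulled into one helper. This is a sketch under assumptions rather than the tutorial's own structure: fetch_content_results, its post parameter, FakeResponse, and fake_post are hypothetical names, and the fake objects exist only so the example runs offline.

```python
API_URL = "https://realtime.oxylabs.io/v1/queries"


def fetch_content_results(payload, auth, post=None):
    """POST the payload and return the nested content results."""
    if post is None:
        import requests  # imported lazily so the sketch can run offline

        post = requests.post
    response = post(API_URL, auth=auth, json=payload)
    response.raise_for_status()
    return response.json()["results"][0]["content"]["results"]


class FakeResponse:
    """Minimal stand-in for a requests response, for the offline demo."""

    def __init__(self, data):
        self._data = data

    def raise_for_status(self):
        pass

    def json(self):
        return self._data


def fake_post(url, auth, json):
    # Echo the query back so the demo shows the payload was passed through.
    return FakeResponse(
        {"results": [{"content": {"results": {"organic": [{"title": json["query"]}]}}}]}
    )


payload = {
    "source": "amazon_search",
    "domain": "com",
    "query": "couch",
    "start_page": 1,
    "parse": True,
}
organic = fetch_content_results(payload, ("USERNAME", "PASSWORD"), post=fake_post)["organic"]
print(organic)  # [{'title': 'couch'}]
```

Leaving post unset makes the helper fall back to requests.post, so the same function serves both real calls and offline checks.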

7. Getting prices for other categories

Next, let’s get prices for deals in a category. Oxylabs E-Commerce API doesn’t have a dedicated source setting for deals, so we can use the amazon source, which takes a URL, to get prices from a specific page.

Any Amazon page can be scraped with the same code when using amazon as the source parameter. These pages include:

  • New Releases

  • Wish Lists and Gift Guides

  • Category-Specific Pages

  • Amazon Outlet

  • Amazon Warehouse Deals

  • Amazon Pantry and Grocery

  • Amazon Brand Stores

  • International Amazon Sites

Here’s an example URL of Amazon deals for camping supplies.

https://www.amazon.com/s?i=sporting&rh=n%3A3400371%2Cp_n_deal_type%3A23566064011&s=exact-aware-popularity-rank&pf_rd_i=10805321&pf_rd_m=ATVPDKIKX0DER&pf_rd_p=bf702ff1-4bf6-4c17-ab26-f4867bf293a9&pf_rd_r=ER3N9MGTCESZPZ0KRV8R&pf_rd_s=merchandised-search-3&pf_rd_t=101&ref=s9_acss_bw_cg_SODeals_3e1_w
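
Long URLs like this are easier to read once decoded. The sketch below uses the standard library to inspect the query string. The URL is shortened here for readability only; whether Amazon serves the same page without the omitted parameters has not been verified, and the interpretation of each key is left to the reader.

```python
from urllib.parse import parse_qs, urlsplit

# Shortened copy of the example URL; the pf_rd_* and ref parameters are omitted.
deal_url = (
    "https://www.amazon.com/s?i=sporting"
    "&rh=n%3A3400371%2Cp_n_deal_type%3A23566064011"
    "&s=exact-aware-popularity-rank"
)

params = parse_qs(urlsplit(deal_url).query)
# parse_qs percent-decodes values, so %3A becomes ":" and %2C becomes ",".
filters = dict(part.split(":", 1) for part in params["rh"][0].split(","))
print(params["i"][0])  # sporting
print(filters)         # {'n': '3400371', 'p_n_deal_type': '23566064011'}
```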

The payload should look like this.

payload = {
    "source": "amazon",
    "url": "https://www.amazon.com/s?i=sporting&rh=n%3A3400371%2Cp_n_deal_type%3A23566064011&s=exact-aware-popularity-rank&pf_rd_i=10805321&pf_rd_m=ATVPDKIKX0DER&pf_rd_p=bf702ff1-4bf6-4c17-ab26-f4867bf293a9&pf_rd_r=ER3N9MGTCESZPZ0KRV8R&pf_rd_s=merchandised-search-3&pf_rd_t=101&ref=s9_acss_bw_cg_SODeals_3e1_w",
    "parse": True,
}

Now, we can implement another function called get_deals_results. The function should accept a url parameter that is then used in the payload. The rest of the code can be identical to the get_search_results function.

def get_deals_results(url):
    payload = {
        "source": "amazon",
        "url": url,
        "parse": True,
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response.raise_for_status()
    results = response.json()["results"][0]["content"]["results"]["organic"]
    return [
        {
            "price": result["price"],
            "title": result["title"],
            "currency": result["currency"],
        }
        for result in results
    ]

8. Save to a CSV file

Now that we have our three price scraping functions, we can use them to retrieve our data and dump it into CSV files. 

We’ll use the previously installed pandas library for this. Create a pandas data frame from each list of result dictionaries and use the to_csv method to create a CSV file.

Here’s how it could look.

dog_food_category_id = "2975359011"

best_seller_results = get_best_seller_results(dog_food_category_id)
best_seller_df = pd.DataFrame(best_seller_results)
best_seller_df.to_csv("best_seller.csv")

search_results = get_search_results("couch")
search_df = pd.DataFrame(search_results)
search_df.to_csv("search.csv")

camping_deal_url = "https://www.amazon.com/s?i=sporting&rh=n%3A3400371%2Cp_n_deal_type%3A23566064011&s=exact-aware-popularity-rank&pf_rd_i=10805321&pf_rd_m=ATVPDKIKX0DER&pf_rd_p=bf702ff1-4bf6-4c17-ab26-f4867bf293a9&pf_rd_r=ER3N9MGTCESZPZ0KRV8R&pf_rd_s=merchandised-search-3&pf_rd_t=101&ref=s9_acss_bw_cg_SODeals_3e1_w"

deal_results = get_deals_results(camping_deal_url)
deal_df = pd.DataFrame(deal_results)
deal_df.to_csv("deals.csv")
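
Note that to_csv writes the data frame index as an extra first column by default; passing index=False omits it. If you prefer to skip pandas for this step, the standard library csv module can produce similar output. The following is an alternative sketch, not the tutorial's method; rows_to_csv and the sample rows are made up for illustration.

```python
import csv
import io


def rows_to_csv(rows, fieldnames=("title", "price", "currency")):
    """Serialize a list of result dictionaries to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows for illustration, not real API output.
sample_rows = [
    {"title": "Sample couch", "price": 299.0, "currency": "USD"},
    {"title": "Sofa, 3-seat", "price": 450.5, "currency": "USD"},
]
csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Titles that contain commas are quoted automatically, so the columns stay aligned when the file is opened in a spreadsheet.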

You should have three separate CSV files in your directory after running the code.

The complete code

First off, to make our code cleaner, let’s create a parser function called parse_price_results to reuse the result parsing code in each scraping function. The function should accept an argument called results.

def parse_price_results(results):
    return [
        {
            "price": result["price"],
            "title": result["title"],
            "currency": result["currency"],
        }
        for result in results
    ]

Here’s the full code utilizing the parse_price_results function.

import requests
import pandas as pd

USERNAME = "USERNAME"
PASSWORD = "PASSWORD"

def parse_price_results(results):
    return [
        {
            "price": result["price"],
            "title": result["title"],
            "currency": result["currency"],
        }
        for result in results
    ]

def get_best_seller_results(category_id):
    payload = {
        "source": "amazon_bestsellers",
        "domain": "com",
        "query": category_id,
        "start_page": 1,
        "parse": True,
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response.raise_for_status()
    results = response.json()["results"][0]["content"]["results"]
    return parse_price_results(results)

def get_search_results(query):
    payload = {
        "source": "amazon_search",
        "domain": "com",
        "query": query,
        "start_page": 1,
        "parse": True,
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response.raise_for_status()
    results = response.json()["results"][0]["content"]["results"]["organic"]
    return parse_price_results(results)

def get_deals_results(url):
    payload = {
        "source": "amazon",
        "url": url,
        "parse": True,
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response.raise_for_status()
    results = response.json()["results"][0]["content"]["results"]["organic"]
    return parse_price_results(results)

dog_food_category_id = "2975359011"

best_seller_results = get_best_seller_results(dog_food_category_id)
best_seller_df = pd.DataFrame(best_seller_results)
best_seller_df.to_csv("best_seller.csv")

search_results = get_search_results("couch")
search_df = pd.DataFrame(search_results)
search_df.to_csv("search.csv")

deal_url = "https://www.amazon.com/s?i=sporting&rh=n%3A3400371%2Cp_n_deal_type%3A23566064011&s=exact-aware-popularity-rank&pf_rd_i=10805321&pf_rd_m=ATVPDKIKX0DER&pf_rd_p=bf702ff1-4bf6-4c17-ab26-f4867bf293a9&pf_rd_r=ER3N9MGTCESZPZ0KRV8R&pf_rd_s=merchandised-search-3&pf_rd_t=101&ref=s9_acss_bw_cg_SODeals_3e1_w"

deal_results = get_deals_results(deal_url)
deal_df = pd.DataFrame(deal_results)
deal_df.to_csv("deals.csv")

Conclusion

In this article, we’ve covered how to scrape Amazon price information with Python and Oxylabs E-Commerce API and how to use the pandas library to export the price data to CSV files. This approach makes it much easier to track down the best deals on any Amazon page. The resulting product data can be invaluable if you want to automate price adjustments, identify common price points and future market movements, make informed pricing decisions, and shape an overall business pricing strategy. Additionally, scraped Amazon price data can help you identify competitor pricing strategies and better understand price elasticity.

We also have a tutorial for building a custom Amazon price tracker.

About the author

Maryia Stsiopkina

Senior Content Manager

Maryia Stsiopkina is a Senior Content Manager at Oxylabs. As her passion for writing was developing, she was writing either creepy detective stories or fairy tales at different points in time. Eventually, she found herself in the tech wonderland with numerous hidden corners to explore. At leisure, she does birdwatching with binoculars (some people mistake it for stalking), makes flower jewelry, and eats pickles.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
