
How to Scrape Amazon Reviews With Python


Enrika Pavlovskytė

2025-01-31 · 5 min read

As sellers pack the digital shelves with goods, customers become fickle and quickly change between brands and items in search of something that meets their expectations best. They’re also more vocal about product experiences, often sharing feedback to help other consumers decide on their next purchase. For companies, this uncovers an excellent opportunity to tune into customers’ needs and improve their products accordingly.

In this blog post, we want to shed more light on scraping reviews from one of the biggest e-commerce sites — Amazon. We’ve already explored such topics as Amazon scraping and automated Amazon price tracking. This time, we'll present two approaches to capturing customer feedback from reviews: a custom-built Amazon review scraper and an automated solution.

Let's get to it!


Setting up

For this tutorial, you'll be using Python, so make sure you have Python 3.8 or above installed, along with four packages: Requests, Pandas, Beautiful Soup, and lxml. We've detailed the installation process in our previous blog post about Amazon product data scraping.

After that, start by importing all the necessary libraries, specifying the product ASIN, and creating a custom headers dictionary. 

import requests
from bs4 import BeautifulSoup
import pandas as pd


asin = "B098FKXT8L"

custom_headers = {
    "Accept": (
        "text/html,application/xhtml+xml,application/xml;"
        "q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:135.0) "
        "Gecko/20100101 Firefox/135.0"
    )
}

Implementing custom headers is a crucial step that ensures you don’t get blocked while scraping Amazon reviews — we’ve covered this aspect in detail in our product scraping blog post.
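To reduce the odds of blocks further, you can also rotate the User-Agent between requests. Below is a minimal sketch, not part of the original tutorial; the pool holds example strings you can swap for whichever browsers you want to mimic:

```python
import random

# Base headers shared by every request (trimmed copy of custom_headers).
base_headers = {
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
}

# Example User-Agent strings; substitute your own as needed.
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:135.0) "
    "Gecko/20100101 Firefox/135.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def rotated_headers():
    # Copy the base headers and pick a random User-Agent for this request.
    headers = dict(base_headers)
    headers["User-Agent"] = random.choice(user_agents)
    return headers
```

You'd then pass rotated_headers() instead of a fixed headers dictionary to each request.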

Making a request

Next, define a get_soup() function to send a request to the Amazon product URL and return a BeautifulSoup instance that will make the HTML of the web page ready for parsing.

def get_soup(url):
    response = requests.get(url, headers=custom_headers)

    if response.status_code != 200:
        print(f"Error fetching {url}: HTTP {response.status_code}")
        exit(-1)

    return BeautifulSoup(response.text, "lxml")
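Amazon may throttle or intermittently reject requests, so a simple retry with exponential backoff can make the scraper more resilient. Here's a hedged sketch (an addition to the tutorial, not a guaranteed anti-blocking measure):

```python
import time

import requests


def get_with_retries(url, headers=None, retries=3, backoff=2.0):
    # Try the request up to `retries` times, sleeping backoff, 2*backoff,
    # 4*backoff, ... seconds between failed attempts.
    for attempt in range(retries):
        try:
            response = requests.get(url, headers=headers, timeout=10)
            if response.status_code == 200:
                return response
        except requests.RequestException:
            pass  # network error; fall through to the backoff sleep
        time.sleep(backoff * (2 ** attempt))
    return None  # every attempt failed
```

get_soup() could then call get_with_retries() and only give up after the final attempt.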

Getting the review objects

Now that you're ready to start scraping, get all the review objects and extract the information you'll need from them. Find a CSS selector for the product reviews, then use the .select() method to extract all of them.

You can use this selector to identify the local Amazon reviews:

#cm-cr-dp-review-list > li

For global Amazon reviews, use this selector:

#cm-cr-global-review-list > li

And the following code to collect them:

local_reviews = soup.select("#cm-cr-dp-review-list > li")
global_reviews = soup.select("#cm-cr-global-review-list > li")

This will leave you with a list of all the reviews, over which you'll iterate to gather the required information.

You need a list to hold the processed reviews and a for loop for each review type to start iterating:

def get_reviews(soup):
    reviews = []
    
    # Get both local and global reviews using the same function.
    local_reviews = soup.select("#cm-cr-dp-review-list > li")
    global_reviews = soup.select("#cm-cr-global-review-list > li")
    
    for review in local_reviews:
        reviews.append(extract_review(review, is_local=True))
    
    for review in global_reviews:
        reviews.append(extract_review(review, is_local=False))
    
    return reviews

Note: In the next steps, you'll see how to create the extract_review() function, which will have to be placed above the get_reviews() function.

Author name

The first in our list is the author's name. Use the following CSS selector to select the name:

.a-profile-name

You can collect the names in plain text with the following snippet:

author = review.select_one(".a-profile-name").text.strip()

Review rating

The next thing to extract is the review rating. It can be located with the following CSS:

.review-rating > span

The rating string has some extra text that you won’t need, so let’s remove that:

rating = (
    review.select_one(".review-rating > span").text
    .replace("out of 5 stars", "")
    .strip()
)
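Note that the cleaned-up rating is still a string. If you plan to average or filter ratings later, convert it to a float. A small example using a sample string (the exact text Amazon renders may vary):

```python
# Sample of what .review-rating > span typically contains.
raw_rating = "4.0 out of 5 stars"

# Strip the label and convert the remainder to a number.
rating_value = float(raw_rating.replace("out of 5 stars", "").strip())
```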

Date

One more thing to fetch from the review is the date. It can be located using the following CSS selector:

.review-date

Here’s the code that fetches the date value from the object:

date = review.select_one(".review-date").text.strip()
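The raw date string usually embeds the reviewer's country, e.g. "Reviewed in the United States on March 2, 2023". Treat that format as an assumption and verify it against the pages you scrape; here's a sketch that splits out the country and parses the actual date:

```python
from datetime import datetime

# Example of the usual format (an assumption; confirm on real pages).
raw_date = "Reviewed in the United States on March 2, 2023"

# Drop the "Reviewed in " prefix, then split country from the date part.
country, _, date_part = raw_date.replace("Reviewed in ", "", 1).partition(" on ")
parsed_date = datetime.strptime(date_part, "%B %d, %Y")
```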

Title

The process for extracting the title differs between local and global reviews, so you need to handle the two cases separately with an if-else statement.

To get the title of the local review, use this selector:

.review-title span:not([class])

For a global review, utilize this selector:

.review-title .cr-original-review-content

Here's the logic you should have by now to get the text value of a title:

if is_local:
    title = (
        review.select_one(".review-title")
        .select_one("span:not([class])")
        .text.strip()
    )
else:
    title = (
        review.select_one(".review-title")
        .select_one(".cr-original-review-content")
        .text.strip()
    )

Review text

The review text also requires two different approaches for local and global reviews.

The local review text can be found with the following selector:

.review-text

On the other hand, the global review text can be found with this CSS selector:

.review-text .cr-original-review-content

You can then scrape Amazon review text accordingly:

if is_local:
    content = ' '.join(
        review.select_one(".review-text").stripped_strings
    )
else:
    content = ' '.join(
        review.select_one(".review-text")
        .select_one(".cr-original-review-content")
        .stripped_strings
    )

Images

If any pictures are added to the local review, you first select their elements with this selector:

.review-image-tile

To select a global review image element, use the following selector:

.linkless-review-image-tile

Let's add these image selectors to the if-else statements block defined previously:

if is_local:
    img_selector = ".review-image-tile"
else:
    img_selector = ".linkless-review-image-tile"

After that, you can extract each image URL from the element's data-src attribute, as shown below:

image_elements = review.select(img_selector)
images = (
    [img.attrs["data-src"] for img in image_elements] 
    if image_elements else None
)

Verification

Another thing you can do is check whether the review is verified. The element holding this information can be accessed with this selector:

span.a-size-mini

And extracted using the following code:

verified_element = review.select_one("span.a-size-mini")
verified = verified_element.text.strip() if verified_element else None
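The value you end up with is label text such as "Verified Purchase" (the exact wording is an assumption and may vary by locale). For analysis, a boolean flag is often more convenient, for example:

```python
# Example label as extracted from span.a-size-mini.
verified_text = "Verified Purchase"

# None (no badge present) counts as unverified.
is_verified = verified_text is not None and "Verified" in verified_text
```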

Putting everything together

Now that you have all this information gathered, assemble it into an extract_review() function and return a dictionary of reviews for this product:

def extract_review(review, is_local=True):
    author = review.select_one(".a-profile-name").text.strip()
    rating = (
        review.select_one(".review-rating > span").text
        .replace("out of 5 stars", "")
        .strip()
    )
    date = review.select_one(".review-date").text.strip()
    
    if is_local:
        title = (
            review.select_one(".review-title")
            .select_one("span:not([class])")
            .text.strip()
        )
        content = ' '.join(
            review.select_one(".review-text").stripped_strings
        )
        img_selector = ".review-image-tile"
    else:
        title = (
            review.select_one(".review-title")
            .select_one(".cr-original-review-content")
            .text.strip()
        )
        content = ' '.join(
            review.select_one(".review-text")
            .select_one(".cr-original-review-content")
            .stripped_strings
        )
        img_selector = ".linkless-review-image-tile"
    
    verified_element = review.select_one("span.a-size-mini")
    verified = verified_element.text.strip() if verified_element else None

    image_elements = review.select(img_selector)
    images = (
        [img.attrs["data-src"] for img in image_elements] 
        if image_elements else None
    )

    return {
        "type": "local" if is_local else "global",
        "author": author,
        "rating": rating,
        "title": title,
        "content": content.replace("Read more", ""),
        "date": date,
        "verified": verified,
        "images": images
    }

Exporting data 

Once you have all the data scraped, the last step is to export it to a file. You can export the data in CSV format using the code below:

def main():
    search_url = f"https://www.amazon.com/dp/{asin}"
    soup = get_soup(search_url)
    reviews = get_reviews(soup)
    
    df = pd.DataFrame(reviews)
    df.to_csv(f"reviews_{asin}.csv", index=False)


if __name__ == "__main__":
    main()

After running the script, you'll see your data in the file reviews_B098FKXT8L.csv:

Scraped Amazon reviews
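If you'd rather have JSON than CSV, pandas can write that too. A small sketch using sample dictionaries shaped like extract_review() output (trimmed to three keys for brevity):

```python
import pandas as pd

# Sample rows shaped like extract_review() output (trimmed for brevity).
reviews = [
    {"author": "Alice", "rating": "5.0", "verified": "Verified Purchase"},
    {"author": "Bob", "rating": "3.0", "verified": None},
]

# orient="records" writes one JSON object per review.
pd.DataFrame(reviews).to_json("reviews_sample.json", orient="records", indent=2)
```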

Full code for Amazon reviews scraper

import requests
from bs4 import BeautifulSoup
import pandas as pd


asin = "B098FKXT8L"

custom_headers = {
    "Accept": (
        "text/html,application/xhtml+xml,application/xml;"
        "q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:135.0) "
        "Gecko/20100101 Firefox/135.0"
    )
}


def get_soup(url):
    response = requests.get(url, headers=custom_headers)

    if response.status_code != 200:
        print(f"Error fetching {url}: HTTP {response.status_code}")
        exit(-1)

    return BeautifulSoup(response.text, "lxml")


def extract_review(review, is_local=True):
    author = review.select_one(".a-profile-name").text.strip()
    rating = (
        review.select_one(".review-rating > span").text
        .replace("out of 5 stars", "")
        .strip()
    )
    date = review.select_one(".review-date").text.strip()
    
    if is_local:
        title = (
            review.select_one(".review-title")
            .select_one("span:not([class])")
            .text.strip()
        )
        content = ' '.join(
            review.select_one(".review-text").stripped_strings
        )
        img_selector = ".review-image-tile"
    else:
        title = (
            review.select_one(".review-title")
            .select_one(".cr-original-review-content")
            .text.strip()
        )
        content = ' '.join(
            review.select_one(".review-text")
            .select_one(".cr-original-review-content")
            .stripped_strings
        )
        img_selector = ".linkless-review-image-tile"
    
    verified_element = review.select_one("span.a-size-mini")
    verified = verified_element.text.strip() if verified_element else None

    image_elements = review.select(img_selector)
    images = (
        [img.attrs["data-src"] for img in image_elements] 
        if image_elements else None
    )

    return {
        "type": "local" if is_local else "global",
        "author": author,
        "rating": rating,
        "title": title,
        "content": content.replace("Read more", ""),
        "date": date,
        "verified": verified,
        "images": images
    }


def get_reviews(soup):
    reviews = []
    
    # Get both local and global reviews using the same function.
    local_reviews = soup.select("#cm-cr-dp-review-list > li")
    global_reviews = soup.select("#cm-cr-global-review-list > li")
    
    for review in local_reviews:
        reviews.append(extract_review(review, is_local=True))
    
    for review in global_reviews:
        reviews.append(extract_review(review, is_local=False))
    
    return reviews


def main():
    search_url = f"https://www.amazon.com/dp/{asin}"
    soup = get_soup(search_url)
    reviews = get_reviews(soup)
    
    df = pd.DataFrame(reviews)
    df.to_csv(f"reviews_{asin}.csv", index=False)


if __name__ == "__main__":
    main()

Scrape Amazon product reviews with an API

As an alternative to building your own scraper, you can look into ready-made solutions like Amazon Scraper API. Our Scraper API is specifically designed to handle various Amazon data sources, including Amazon review data. It also boasts additional features like:

  • Product data localization in 195 locations worldwide;

  • Results delivered in raw HTML or structured JSON formats;

  • Convenient automation features like bulk scraping and automated jobs;

  • Maintenance-free web scraping infrastructure.

Check out the documentation or, for more information, our ready-to-use Amazon Reviews Scraper page.

Setting up payload 

Start by creating a new file and setting up a payload. You can use our amazon_reviews data source and provide the product ASIN in the payload, for example:

import requests
from pprint import pprint

payload = {
    'source': 'amazon_reviews',
    'domain': 'com',
    'query': 'B098FKXT8L',
    'start_page': 1,
    'pages': 3,
    'parse': True
}

The above payload instructs Amazon Scraper API to start from the first page and scrape three pages in total. Setting parse to True returns structured data.

Send a POST request

Once the payload is done, create the request by passing your authentication key.

# Get response
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('USERNAME', 'PASSWORD'),
    json=payload,
)

Then, simply print the response:

# Print prettified response to stdout.
pprint(response.json())

This is what the full code should look like:

import requests
from pprint import pprint

# Structure payload.
payload = {
    'source': 'amazon_reviews',
    'domain': 'com',
    'query': 'B098FKXT8L',
    'start_page': 1,
    'pages': 3,
    'parse': True
}

# Get response
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('USERNAME', 'PASSWORD'),
    json=payload,
)

# Print prettified response to stdout.
pprint(response.json())

Here, you can see a snapshot of one of the reviews in the output:

Amazon product reviews output sample
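To work with the response programmatically rather than just printing it, you can flatten the nested results into a single list of reviews. The "results" / "content" / "reviews" nesting below is an assumption about the parsed schema; check the API documentation for the exact field names:

```python
def flatten_reviews(data):
    # Collect the review lists from every result in the response.
    # The "results", "content", and "reviews" keys are assumptions about
    # the parsed schema; verify them against the API documentation.
    reviews = []
    for result in data.get("results", []):
        content = result.get("content", {})
        reviews.extend(content.get("reviews", []))
    return reviews
```

Calling flatten_reviews(response.json()) would then give you one list, ready to load into a DataFrame.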

Free proxies for scraping

For those looking for a free option for their smaller-scale projects, free proxies can be a viable alternative. They provide a starting point for web scraping, though you’ll need to write your own script and manage aspects like connection stability and bypassing anti-scraping systems.

For larger-scale scraping projects, we also offer premium Residential Proxies. With Residential Proxies, you can mimic organic human behavior and avoid blocks, making them an ideal option for bigger projects.
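Whichever proxies you choose, the Requests library accepts them through a proxies dictionary. A minimal sketch; the endpoint and credentials below are placeholders, not real values:

```python
import requests

# Placeholder endpoint and credentials; substitute your own proxy details.
proxies = {
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080",
}


def get_via_proxy(url, headers=None):
    # Route the request through the configured proxy.
    return requests.get(url, headers=headers, proxies=proxies, timeout=10)
```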

Conclusion 

There are multiple approaches to scraping Amazon product reviews. While a custom scraper gives you more flexibility, a commercial option like Amazon Scraper API significantly saves time and effort. You can also check out datasets if ready-to-use data is enough to satisfy your needs.

If you found this article helpful, be sure to check out our blog for resources on scraping Best Buy, Wayfair, or eBay.

Lastly, for web scraping, proxies are an essential anti-blocking measure. To avoid detection by the target website, you can buy proxies of various types to fit any scraping scenario, such as datacenter or residential proxies.

About the author


Enrika Pavlovskytė

Former Copywriter

Enrika Pavlovskytė was a Copywriter at Oxylabs. With a background in digital heritage research, she became increasingly fascinated with innovative technologies and started transitioning into the tech world. On her days off, you might find her camping in the wilderness and, perhaps, trying to befriend a fox! Even so, she would never pass up a chance to binge-watch old horror movies on the couch.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.


Frequently asked questions

Is there a way to export Amazon reviews?

Yes, you can export Amazon reviews, but Amazon does not provide a built-in feature for downloading reviews directly. To achieve this, you'll need to use web scraping tools or APIs, such as Oxylabs' Web Scraper API, which can automate the process of collecting reviews from product pages. This method allows you to extract review details like ratings, titles, and comments.

How do I get a CSV file from Amazon reviews?

To get a CSV file of Amazon reviews, you can scrape the reviews using a web scraping tool or write a custom script. A typical process involves extracting review data (e.g., reviewer name, rating, review text, and date) and saving it in a structured format, like a CSV. Tools like Oxylabs' Web Scraper API can simplify this by automatically collecting and organizing the data for you.

If you’re coding your own scraper, you’ll need to ensure the data is properly parsed and written into a CSV file using a library like pandas in Python. Always ensure compliance with Amazon’s terms of service when scraping their platform.
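If you'd rather not depend on pandas, the standard library's csv module handles the same job. A minimal sketch with sample review dictionaries (the rows and filename are illustrative):

```python
import csv

# Sample review rows; in practice these come from your scraper.
reviews = [
    {"author": "Alice", "rating": "5.0", "text": "Great product"},
    {"author": "Bob", "rating": "2.0", "text": "Broke after a week"},
]

with open("faq_reviews.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["author", "rating", "text"])
    writer.writeheader()  # column names first
    writer.writerows(reviews)
```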
