How to Scrape Amazon Reviews With Python

Enrika Pavlovskytė

Last updated on

2025-01-31

4 min read

As sellers pack the digital shelves with goods, customers become fickle and quickly change between brands and items in search of something that meets their expectations best. They’re also more vocal about product experiences, often sharing feedback to help other consumers decide on their next purchase. For companies, this uncovers an excellent opportunity to tune into customers’ needs and improve their products accordingly.

In this blog post, we want to shed more light on scraping reviews from one of the biggest e-commerce sites — Amazon. We’ve already explored such topics as Amazon scraping and automated Amazon price tracking. This time, we'll present how to build a custom-built Amazon review scraper.

Let's get to it!

Try for free

Get a free trial to test Web Scraper API for your projects.

Up to 2K results
No credit card required

Setting up

For this tutorial, you'll be using Python, so make sure you have Python 3.8 or above installed and three packages — Requests, Pandas, Beautiful Soup, and lxml. We've detailed the installation process in our previous blog post about Amazon product data scraping.

After that, start by importing all the necessary libraries, specifying the product ASIN, and creating a custom headers dictionary.

Copy

import requests
from bs4 import BeautifulSoup
import pandas as pd


asin = "B098FKXT8L"

custom_headers = {
    "Accept": (
        "text/html,application/xhtml+xml,application/xml;"
        "q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "Gecko/20100101 Firefox/135.0"
    )
}

Implementing custom headers is a crucial step that ensures you don’t get blocked while scraping Amazon reviews — we’ve covered this aspect in detail in our product scraping blog post.

Making a request

Next, define a get_soup() function to send a request to the Amazon product URL and return a BeautifulSoup instance that will make the HTML of the web page ready for parsing.

Copy

def get_soup(url):
    response = requests.get(url, headers=custom_headers)

    if response.status_code != 200:
        print("Error in getting webpage")
        exit(-1)

    return BeautifulSoup(response.text, "lxml")

Getting the review objects

Now that you're ready to start scraping get all the review objects and extract the information you'll need from them. You'll need to find a CSS selector for the product reviews and then use the .select method to extract all of them.

You can use this selector to identify the local Amazon reviews:

Copy

#cm-cr-dp-review-list > li

For global Amazon reviews, use this selector:

Copy

#cm-cr-global-review-list > li

And the following code to collect them:

Copy

local_reviews = soup.select("#cm-cr-dp-review-list > li")
global_reviews = soup.select("#cm-cr-global-review-list > li")

This will leave you with an array of all the reviews over which you'll iterate and gather the required information.

You need an array where you can add the processed reviews and a for loop for each review type to start iterating:

Copy

def get_reviews(soup):
    reviews = []
    
    # Get both local and global reviews using the same function.
    local_reviews = soup.select("#cm-cr-dp-review-list > li")
    global_reviews = soup.select("#cm-cr-global-review-list > li")
    
    for review in local_reviews:
        reviews.append(extract_review(review, is_local=True))
    
    for review in global_reviews:
        reviews.append(extract_review(review, is_local=False))
    
    return reviews

Note: In the next steps, you'll see how to create the extract_review() function which will have to placed above the get_reviews() function.

Author name

The first in our list is the author's name. Use the following CSS selector to select the name:

Copy

.a-profile-name

You can collect the names in plain text with the following snippet:

Copy

author = review.select_one(".a-profile-name").text.strip()

Review rating

The next thing to extract is the review rating. It can be located with the following CSS:

Copy

.review-rating > span

The rating string has some extra text that you won’t need, so let’s remove that:

Copy

rating = (
    review.select_one(".review-rating > span").text
    .replace("out of 5 stars", "")
    .strip()
)

Date

One more thing to fetch from the review is the date. It can be located using the following CSS selector:

Copy

.review-date

Here’s the code that fetches the date value from the object:

Copy

date = review.select_one(".review-date").text.strip()

Title

The process for extracting the title of a local or global review is different. Therefore, you need to code these two distinct methods separately. Here, you can use if-else statements.

To get the title of the local review, use this selector:

Copy

.review-title span:not([class]

For a global review, utilize this selector:

Copy

.review-title .cr-original-review-content

Here's the logic you should have by now to get the text value of a title:

Copy

if is_local:
    title = (
        review.select_one(".review-title")
        .select_one("span:not([class])")
        .text.strip()
    )
else:
    title = (
        review.select_one(".review-title")
        .select_one(".cr-original-review-content")
        .text.strip()
    )

Review text

The review text also requires two different approaches for local and global reviews.

The local review text can be found with the following selector:

Copy

.review-text

On the other hand, the global review text can be found with this CSS selector:

Copy

.review-text .cr-original-review-content

You can then scrape Amazon review text accordingly:

Copy

if is_local:
    content = ' '.join(
        review.select_one(".review-text").stripped_strings
    )
else:
    content = ' '.join(
        review.select_one(".review-text")
        .select_one(".cr-original-review-content")
        .stripped_strings
    )

Images

If any pictures are added to the local review, you first select their elements with this selector:

Copy

.review-image-tile

To select a global review image element, use the following selector:

Copy

.linkless-review-image-tile

Let's add these image selectors to the if-else statements block defined previously:

Copy

if is_local:
    img_selector = ".review-image-tile"
else:
    img_selector = ".linkless-review-image-tile"

After that, you can extract each image URL from the element's src attribute, as shown below:

Copy

image_elements = review.select(img_selector)
images = (
    [img.attrs["data-src"] for img in image_elements] 
    if image_elements else None
)

Verification

Another thing you can do is check if the review is verified or not. The object holding this information can be accessed with this selector:

Copy

span.a-size-mini

And extracted using the following code:

Copy

verified_element = review.select_one("span.a-size-mini")
verified = verified_element.text.strip() if verified_element else None

Putting everything together

Now that you have all this information gathered, assemble it into an extract_review() function and return a dictionary of reviews for this product:

Copy

def extract_review(review, is_local=True):
    author = review.select_one(".a-profile-name").text.strip()
    rating = (
        review.select_one(".review-rating > span").text
        .replace("out of 5 stars", "")
        .strip()
    )
    date = review.select_one(".review-date").text.strip()
    
    if is_local:
        title = (
            review.select_one(".review-title")
            .select_one("span:not([class])")
            .text.strip()
        )
        content = ' '.join(
            review.select_one(".review-text").stripped_strings
        )
        img_selector = ".review-image-tile"
    else:
        title = (
            review.select_one(".review-title")
            .select_one(".cr-original-review-content")
            .text.strip()
        )
        content = ' '.join(
            review.select_one(".review-text")
            .select_one(".cr-original-review-content")
            .stripped_strings
        )
        img_selector = ".linkless-review-image-tile"
    
    verified_element = review.select_one("span.a-size-mini")
    verified = verified_element.text.strip() if verified_element else None

    image_elements = review.select(img_selector)
    images = (
        [img.attrs["data-src"] for img in image_elements] 
        if image_elements else None
    )

    return {
        "type": "local" if is_local else "global",
        "author": author,
        "rating": rating,
        "title": title,
        "content": content.replace("Read more", ""),
        "date": date,
        "verified": verified,
        "images": images
    }

Exporting data

When you already have all the data scraped, the last thing to do is to export it to a file. You can export the data in CSV format using the code below:

Copy

def main():
    search_url = f"https://www.amazon.com/dp/{asin}"
    soup = get_soup(search_url)
    reviews = get_reviews(soup)
    
    df = pd.DataFrame(reviews)
    df.to_csv(f"reviews_{asin}.csv", index=False)


if __name__ == "__main__":
    main()

After running the script, you'll see your data in the file reviews_B098FKXT8L.csv:

Full code for Amazon reviews scraper

Copy

import requests
from bs4 import BeautifulSoup
import pandas as pd


asin = "B098FKXT8L"

custom_headers = {
    "Accept": (
        "text/html,application/xhtml+xml,application/xml;"
        "q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "Gecko/20100101 Firefox/135.0"
    )
}


def get_soup(url):
    response = requests.get(url, headers=custom_headers)

    if response.status_code != 200:
        print("Error in getting webpage")
        exit(-1)

    return BeautifulSoup(response.text, "lxml")


def extract_review(review, is_local=True):
    author = review.select_one(".a-profile-name").text.strip()
    rating = (
        review.select_one(".review-rating > span").text
        .replace("out of 5 stars", "")
        .strip()
    )
    date = review.select_one(".review-date").text.strip()
    
    if is_local:
        title = (
            review.select_one(".review-title")
            .select_one("span:not([class])")
            .text.strip()
        )
        content = ' '.join(
            review.select_one(".review-text").stripped_strings
        )
        img_selector = ".review-image-tile"
    else:
        title = (
            review.select_one(".review-title")
            .select_one(".cr-original-review-content")
            .text.strip()
        )
        content = ' '.join(
            review.select_one(".review-text")
            .select_one(".cr-original-review-content")
            .stripped_strings
        )
        img_selector = ".linkless-review-image-tile"
    
    verified_element = review.select_one("span.a-size-mini")
    verified = verified_element.text.strip() if verified_element else None

    image_elements = review.select(img_selector)
    images = (
        [img.attrs["data-src"] for img in image_elements] 
        if image_elements else None
    )

    return {
        "type": "local" if is_local else "global",
        "author": author,
        "rating": rating,
        "title": title,
        "content": content.replace("Read more", ""),
        "date": date,
        "verified": verified,
        "images": images
    }


def get_reviews(soup):
    reviews = []
    
    # Get both local and global reviews using the same function.
    local_reviews = soup.select("#cm-cr-dp-review-list > li")
    global_reviews = soup.select("#cm-cr-global-review-list > li")
    
    for review in local_reviews:
        reviews.append(extract_review(review, is_local=True))
    
    for review in global_reviews:
        reviews.append(extract_review(review, is_local=False))
    
    return reviews


def main():
    search_url = f"https://www.amazon.com/dp/{asin}"
    soup = get_soup(search_url)
    reviews = get_reviews(soup)
    
    df = pd.DataFrame(reviews)
    df.to_csv(f"reviews_{asin}.csv", index=False)


if __name__ == "__main__":
    main()

Free proxies for scraping

For those looking for a free option for their smaller-scale projects, free proxies can be a viable alternative. They provide a starting point for web scraping, though you’ll need to write your own script and manage aspects like connection stability and bypassing anti-scraping systems.

For larger-scale scraping projects, we also have premium Residential Proxies available. With Residential Proxies, you can mimic organic human behavior, avoid blocks, making them an ideal option for projects of bigger scale.

Conclusion

There are multiple approaches to scrape Amazon product reviews. While a custom scraper will give you more flexibility, a commercial choice like Amazon Scraper API will significantly save time and effort. You can also check out datasets if you decide that getting read-to-use data is enough to satisfy your needs.

If you found this article helpful, be sure to check out our blog for resources on scraping Best Buy, Wayfair, or eBay. We also have tutorials for:

Lastly, for web scraping, proxies are an essential anti-blocking measure. To avoid detection by the target website, you can buy proxies of various types to fit any scraping scenario, such as datacenter or residential proxies.

About the author

Enrika Pavlovskytė

Former Copywriter

Enrika Pavlovskytė was a Copywriter at Oxylabs. With a background in digital heritage research, she became increasingly fascinated with innovative technologies and started transitioning into the tech world. On her days off, you might find her camping in the wilderness and, perhaps, trying to befriend a fox! Even so, she would never pass up a chance to binge-watch old horror movies on the couch.

Learn more about Enrika Pavlovskytė Learn more about Enrika Pavlovskytė

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.

Data acquisition

15 Tips on How to Crawl a Website Without Getting Blocked

Adelina Kiskyte

2024-03-15

How to Make Amazon Price Tracker With Python

Tutorials Scrapers Python

How to Track Amazon Prices With Python

Yelyzaveta Hayrapetyan

2023-11-22

Tutorials Scrapers Python

How to Build a Price Tracker With Python

Augustas Pelakauskas

2022-03-23

Frequently asked questions

Yes, you can export Amazon reviews, but Amazon does not provide a built-in feature for downloading reviews directly. To achieve this, you'll need to use web scraping tools or APIs, which can automate the process of collecting reviews from product pages. This method allows you to extract review details like ratings, titles, and comments.

To get a CSV file of Amazon reviews, you can scrape the reviews using a web scraping tool or write a custom script. A typical process involves extracting review data (e.g., reviewer name, rating, review text, and date) and saving it in a structured format, like a CSV. If you’re coding your own scraper, you’ll need to ensure the data is properly parsed and written into a CSV file using a library like pandas in Python. Always ensure compliance with Amazon’s terms of service when scraping their platform.

Free proxies

Get 5GB of traffic per month across 5 US IPs and 20 concurrent sessions.

Premium proxies

Choose from several premium proxy categories, including Residential, Datacenter, and more.

Get the latest news from data gathering world

I’m interested

ISO/IEC 27001:2022 certified products:

Proxy Solutions

Scraper APIs

Scale up your business with Oxylabs®

Company

About us Our values Affiliate program Service partners Press area Residential Proxies sourcing Careers OxyCon®Project 4beta Sustainability Community

Proxies

Datacenter Proxies Dedicated Datacenter Proxies Residential Proxies SOCKS5 Proxies Mobile Proxies ISP Proxies Private Proxies Free Proxies

Advanced proxy solutions

Web Unblocker

Data Collection

Web Scraper API Proxy Servers

Datasets

Resources

Documentation Use cases Scraper APIs Playground Release notes Developers Hub Open source tools Blog OxyCopilot Prompts Library FAQ

Innovation hub

Features Intellectual Property OxyCopilot story

Corporate governance

hello@oxylabs.io support@oxylabs.io career@oxylabs.io

Free proxies

Get 5GB of traffic per month across 5 US IPs and 20 concurrent sessions.

Premium proxies

Choose from several premium proxy categories, including Residential, Datacenter, and more.

How to Scrape Amazon Reviews With Python

Try for free

Setting up

Making a request

Getting the review objects

Author name

Review rating

Date

Title

Review text

Images

Verification

Putting everything together

Exporting data

Full code for Amazon reviews scraper

Free proxies for scraping

Conclusion

Related articles

15 Tips on How to Crawl a Website Without Getting Blocked

How to Track Amazon Prices With Python

How to Build a Price Tracker With Python

Frequently asked questions