How to Bypass Amazon Captcha When Scraping

Danielius Radavicius

2024-01-16, 2 min read

If you’ve done any scraping on Amazon, you’ve likely run into blocks. That’s because Amazon, like many other e-commerce websites, uses CAPTCHA to prevent bots and automated scripts from accessing its content. Without a specialized scraping tool to bypass CAPTCHAs, extracting data from Amazon is a nearly impossible task.

Thankfully, such tools are easily accessible, and below, we show a step-by-step guide on how to extract data with our Amazon Product Data API solution.

You can find the following code on our GitHub.

Setting up a simple scraper

Let’s begin by setting up a simple Amazon scraper and see if it runs into any CAPTCHAs. For the purpose of this tutorial, we’ll be using Python, but this could be done in almost any other language, too.

import requests

custom_headers = {
    "Accept-language": "en-GB,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Cache-Control": "max-age=0",
    "Connection": "keep-alive",
    "User-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15",
}

url = "https://www.amazon.com/SAMSUNG-Border-Less-Compatible-Adjustable-LS24AG302NNXZA/dp/B096N2MV3H?ref_=Oct_DLandingS_D_fe3953dd_2"

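# A timeout keeps the script from hanging if the server never responds.
response = requests.get(url, headers=custom_headers, timeout=30)

# Write as UTF-8 explicitly so the page saves correctly on any platform.
with open('with_captcha.html', 'w', encoding='utf-8') as file:
    file.write(response.text)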

Here, we have a very simple script that sends a request to Amazon, fetches the HTML of the page, and saves it as a file for inspection. We also set custom headers for the request; without them, the request would get rejected right away.
If we open the resulting HTML file, we can see that we ran into the issue we were expecting: Amazon served a CAPTCHA page instead of the product page.
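
Instead of opening the file by hand, the same check can be done in code. The following is a minimal sketch, not an official detection method: the helper name `looks_like_captcha` and the marker strings are assumptions based on what Amazon CAPTCHA pages have commonly contained, and Amazon may change them at any time.

```python
# Marker strings are assumptions based on commonly observed Amazon CAPTCHA
# pages; they are not documented by Amazon and may change.
CAPTCHA_MARKERS = (
    "/errors/validateCaptcha",
    "Enter the characters you see below",
)


def looks_like_captcha(html: str) -> bool:
    """Return True if the HTML contains any of the assumed CAPTCHA markers."""
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in CAPTCHA_MARKERS)


# Hypothetical snippets standing in for a blocked and a successful response.
sample_blocked = '<form method="get" action="/errors/validateCaptcha"></form>'
sample_ok = '<span id="productTitle">SAMSUNG monitor</span>'

print(looks_like_captcha(sample_blocked))  # True
print(looks_like_captcha(sample_ok))       # False
```

In the script above, you could pass `response.text` to this helper right after the request to see whether the page was blocked.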

Using Amazon Product Data API

While there are many different ways to approach this issue, let’s use the Oxylabs Amazon Product Data API. This tool is specifically built to avoid Amazon CAPTCHAs while scraping. Here’s a short script that uses the API:

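import requests

# Request the product page by URL and ask the API to return parsed data.
payload = {
    'source': 'amazon',
    'url': 'https://www.amazon.com/dp/B096N2MV3H',
    'parse': True
}

# Replace 'username' and 'password' with your API credentials.
response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=('username', 'password'),
    json=payload,
)

# Write as UTF-8 explicitly so the results save correctly on any platform.
with open('without_captcha.json', 'w', encoding='utf-8') as file:
    file.write(response.text)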

If we look at our results file, we can see that the page was scraped successfully, and we didn’t have to solve any CAPTCHA ourselves. We even retrieved the information in a structured format.
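
To work with the structured data rather than just save it, you can load the JSON and pick out the parsed fields. The sketch below is illustrative only: it assumes the response wraps the parsed product under `results` and then `content`, and the sample response and its `title` and `price` fields are made up for this example. Check your own `without_captcha.json` for the exact layout.

```python
import json

# Hypothetical, trimmed-down response used only for illustration.
# The "results" -> "content" layout is an assumption about the API output.
sample_response_text = json.dumps({
    "results": [
        {
            "content": {"title": "Example monitor", "price": 199.99},
            "status_code": 200,
        }
    ]
})


def extract_content(response_text: str) -> dict:
    """Return the parsed content of the first result, or an empty dict."""
    data = json.loads(response_text)
    results = data.get("results") or []
    if not results:
        return {}
    content = results[0].get("content")
    # If the content is not parsed, it may be a raw HTML string, not a dict.
    return content if isinstance(content, dict) else {}


product = extract_content(sample_response_text)
print(product.get("title"))  # Example monitor
```

Returning an empty dict for missing or unparsed content keeps the calling code simple, since `.get()` lookups work either way.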

There you have it: this simple script, combined with our Amazon Product Data API, lets you scrape Amazon without running into CAPTCHAs.
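
Note that the API script uses a shortened product URL (`/dp/` followed by the product ID) rather than the long URL from the first script. If you start from long URLs, a small helper can build the payload for you. This is a sketch under stated assumptions: `build_payload` is a hypothetical helper, the product ID is assumed to be the 10-character code after `/dp/`, and the `amazon.com` domain is hardcoded.

```python
import re

# Assumption: the product ID (ASIN) is the 10-character alphanumeric code
# that follows "/dp/" in an Amazon product URL.
ASIN_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?]|$)")


def build_payload(product_url: str) -> dict:
    """Build an API payload with a shortened URL for the given product page."""
    match = ASIN_PATTERN.search(product_url)
    if match is None:
        raise ValueError(f"No product ID found in URL: {product_url}")
    return {
        "source": "amazon",
        # Assumption: the product is listed on amazon.com.
        "url": f"https://www.amazon.com/dp/{match.group(1)}",
        "parse": True,
    }


long_url = (
    "https://www.amazon.com/SAMSUNG-Border-Less-Compatible-Adjustable-"
    "LS24AG302NNXZA/dp/B096N2MV3H?ref_=Oct_DLandingS_D_fe3953dd_2"
)
print(build_payload(long_url)["url"])  # https://www.amazon.com/dp/B096N2MV3H
```

The returned dictionary has the same keys as the `payload` in the script above, so it can be passed to the request as `json=build_payload(long_url)`.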

Conclusion

As you can see, scraping Amazon data is a relatively straightforward and quick process with a dedicated scraping tool. Other options you may want to consider include CAPTCHA proxies and browser automation tools such as Selenium, Playwright, or Puppeteer. If you have any questions about this tutorial or want to learn more about our solutions or scraping in general, don’t hesitate to contact us at hello@oxylabs.io.

Frequently asked questions

Why does Amazon require CAPTCHA?

Without CAPTCHA, even the most basic automated scripts would get through to Amazon, significantly affecting the website's stability and worsening the user experience.

What kind of CAPTCHA does Amazon use?

Amazon uses a variety of CAPTCHA types. Common ones include text-based, image-based, interactive, and checkbox CAPTCHAs. Note that the specific types Amazon employs will likely change over time as it strengthens its anti-scraping measures, so using dedicated CAPTCHA-solving tools such as CapSolver is recommended.

About the author

Danielius Radavicius

Former Copywriter

Danielius Radavičius was a Copywriter at Oxylabs. Having grown up on films, music, and books and having a keen interest in the defense industry, he decided to move his career toward tech-related subjects and quickly became interested in all things technology. In his free time, you'll probably find Danielius watching films, listening to music, and planning world domination.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
