
How to Scrape Home Depot Data: Step-by-Step Guide


Augustas Pelakauskas

2023-09-15 · 3 min read

Home Depot is one of the largest home goods retailers in the US. With its high popularity and vast online catalog, this e-commerce website holds valuable data for business insights. Home Depot data can fuel market research, price comparison, or inventory management.

In this guide, you’ll learn how to collect public data from Home Depot at scale with the help of automation, including the following Home Depot product data:

  • Product names

  • Prices

  • Availability

  • Review scores

1. Connecting to the API

    To collect data from Home Depot, pair Python with the Home Depot Scraper API.

    Let’s start by creating a Python file to hold the code. Run this command in the terminal:

    touch main.py

    In the file you just created, define the initial parameters needed for a connection to the API:

    import requests
    
    USERNAME = "yourUsername"
    PASSWORD = "yourPassword"
    
    payload = {
       'source': 'universal',
       'url': 'https://www.homedepot.com/p/Lifestyle-Solutions-Wesley-80-3-in-Round-Arm-Polyester-Rectangle-3-Seater-Sofa-in-Dark-Grey-CCWENKS3M26DGRA/304602855',
       'geo_location': 'United States',
    }
    
    response = requests.request(
       'POST',
       'https://realtime.oxylabs.io/v1/queries',
       auth=(USERNAME, PASSWORD),
       json=payload,
    )

    USERNAME and PASSWORD are needed for authentication – these are your API credentials. Next comes the payload, where you can configure various parameters for the request you’re sending.

    To begin with, the request needs source, url, and geo_location. The source parameter defines which scraper (depending on the scraping target) should be used for the provided URL – in this case, it should be universal. The url is the link to the page you want to scrape.

    Finally, the geo_location parameter determines the country (or city) you’ll be scraping data from. As Home Depot only operates in the US, set it to United States. You can read more about all the custom request parameters in our documentation.
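The payload is a plain dictionary, so if you plan to scrape several product pages, it can help to wrap its construction in a small helper. The function below is a hypothetical convenience wrapper for illustration, not part of the API itself:

```python
def build_payload(url, geo_location='United States', source='universal'):
    """Build a request payload for the Web Scraper API.

    Only the three required parameters are included; extend the
    returned dict with 'parse', 'render', etc. as needed.
    """
    return {
        'source': source,
        'url': url,
        'geo_location': geo_location,
    }

# Build a payload for any product page URL.
payload = build_payload(
    'https://www.homedepot.com/p/Lifestyle-Solutions-Wesley-80-3-in-Round-Arm-Polyester-Rectangle-3-Seater-Sofa-in-Dark-Grey-CCWENKS3M26DGRA/304602855'
)
```

This keeps the request-specific part (the URL) separate from the settings that stay the same across requests.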

    The request is now set up. Let’s send it and check the response with a simple print:

    print(response.json())

    A successful response is a JSON object with a results list holding the scraped content.

    2. Extracting product name

    With the connection ready, you can start configuring the scraper to extract specific data from a product page. You can do so with the help of the Custom Parser feature, which is part of the API.

    First off, let’s get the product name. You’ll need to find the CSS selector by inspecting the HTML of the product page.

    Now, let’s modify the payload with the Custom Parser instructions:

    import requests
    
    USERNAME = "yourUsername"
    PASSWORD = "yourPassword"
    
    payload = {
       'source': 'universal',
       'url': 'https://www.homedepot.com/p/Lifestyle-Solutions-Wesley-80-3-in-Round-Arm-Polyester-Rectangle-3-Seater-Sofa-in-Dark-Grey-CCWENKS3M26DGRA/304602855',
       'geo_location': 'United States',
       'parse': 'true',
       'render': 'html',
       'parsing_instructions': {
           'product_name': {
               "_fns": [
                   {"_fn": "css", "_args": ["div.product-details__badge-title--wrapper"]},
                   {"_fn": "element_text"}
               ]
           },
       }
    }
    
    response = requests.request(
      'POST',
      'https://realtime.oxylabs.io/v1/queries',
      auth=(USERNAME, PASSWORD),
      json=payload,
    )
    
    print(response.json())

    Add these additional parameters to your payload: render with the value html makes the API execute the JavaScript on the site and load data that isn’t present in the initial HTML, while parse with the value true means that you’ll use the Custom Parser. Finally, parsing_instructions contains what the name suggests – the parsing instructions, where you put the CSS selector found earlier.
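Each parsing instruction follows the same pattern: one or more css functions that select an element, followed by element_text to read its text. If you end up writing many instructions, a small helper (hypothetical, shown only for illustration) keeps the payload readable:

```python
def css_text(*selectors):
    """Build a Custom Parser instruction: apply one or more CSS
    selectors in sequence, then extract the element's text."""
    fns = [{'_fn': 'css', '_args': [selector]} for selector in selectors]
    fns.append({'_fn': 'element_text'})
    return {'_fns': fns}

# The same instructions as in the payloads below, built with the helper.
parsing_instructions = {
    'product_name': css_text('div.product-details__badge-title--wrapper'),
    'price': css_text('div.price-detailed__qty-limit-wrapper', 'div.price'),
}
```

Passing several selectors chains them, which matches how the price is narrowed down in the next step.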

    If you check the response, you’ll find the product name:

    3. Scraping product price

    Next, let’s gather pricing data. Find the CSS selector for it on the product page.

    As the page lists many products with their own prices, select the unique main price wrapper element first and then narrow it down to the specific price div within it.

    As for the code, you already have most of the payload set up from the previous request. You only need to make some minor adjustments by adding the parsing instructions:

    import requests
    
    USERNAME = "yourUsername"
    PASSWORD = "yourPassword"
    
    payload = {
       'source': 'universal',
       'url': 'https://www.homedepot.com/p/Lifestyle-Solutions-Wesley-80-3-in-Round-Arm-Polyester-Rectangle-3-Seater-Sofa-in-Dark-Grey-CCWENKS3M26DGRA/304602855',
       'geo_location': 'United States',
       "parse": 'true',
       'render': 'html',
       "parsing_instructions": {
           "price": {
               "_fns": [
                   {"_fn": "css", "_args": ["div.price-detailed__qty-limit-wrapper"]},
                   {"_fn": "css", "_args": ["div.price"]},
                   {"_fn": "element_text"}
               ]
           },
       }
    }
    
    response = requests.request(
      'POST',
      'https://realtime.oxylabs.io/v1/queries',
      auth=(USERNAME, PASSWORD),
      json=payload,
    )
    
    print(response.json())

    Let’s run the code and check the response:

    4. Getting availability

    Getting the availability will be a tad trickier, as it’s something you have to derive logically from the information on the product page. Let’s assume the product is available if the Add to Cart button is present.

    Let’s find the CSS selector for this button.

    And now for the code:

    import requests
    
    USERNAME = "yourUsername"
    PASSWORD = "yourPassword"
    
    payload = {
       'source': 'universal',
       'url': 'https://www.homedepot.com/p/Lifestyle-Solutions-Wesley-80-3-in-Round-Arm-Polyester-Rectangle-3-Seater-Sofa-in-Dark-Grey-CCWENKS3M26DGRA/304602855',
       'geo_location': 'United States',
       "parse": 'true',
       'render': 'html',
       "parsing_instructions": {
           "availability": {
               "_fns": [
                   {"_fn": "css", "_args": ["div.buybox__atc"]},
                   {"_fn": "element_text"}
               ]
           },
       }
    }
    
    response = requests.request(
      'POST',
      'https://realtime.oxylabs.io/v1/queries',
      auth=(USERNAME, PASSWORD),
      json=payload,
    )
    
    print(response.json())

    These instructions fetch the element for the Add to Cart button and get its text. Now, implement the logic that checks whether the parsed field has content: if it does, the product can be added to the cart, so it’s available; if not, the product isn’t available:

    raw_scraped_data = response.json()
    
    results = []
    
    for result in raw_scraped_data["results"]:
       r = {
           "availability": result["content"]["availability"] is not None,
       }
       results.append(r)
    
    print(results)
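Note that result["content"]["availability"] assumes the availability key is always present in the parsed content. A slightly more defensive sketch reads the key with dict.get() and also treats empty text as unavailable; the sample dictionaries below are made up for illustration:

```python
def is_available(content):
    """Treat the product as available when the Add to Cart element
    was found and produced non-empty text. Unlike a plain
    'is not None' check, this also treats an empty string as
    unavailable."""
    text = content.get('availability')
    return bool(text)

# Hypothetical parsed contents, for illustration only.
print(is_available({'availability': 'Add to Cart'}))  # True
print(is_available({'availability': None}))           # False
print(is_available({}))                               # False
```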

    Let’s look at the printed output:

    5. Collecting review score

    Finally, to get the review score, find its CSS selector.

    The only thing left is writing the code to fetch the score:

    import requests
    
    USERNAME = "yourUsername"
    PASSWORD = "yourPassword"
    
    payload = {
       'source': 'universal',
       'url': 'https://www.homedepot.com/p/Lifestyle-Solutions-Wesley-80-3-in-Round-Arm-Polyester-Rectangle-3-Seater-Sofa-in-Dark-Grey-CCWENKS3M26DGRA/304602855',
       'geo_location': 'United States',
       "parse": 'true',
       'render': 'html',
       "parsing_instructions": {
           "review_score": {
               "_fns": [
                   {"_fn": "css", "_args": ["div.ratings-reviews__accordion-subheader"]},
                   {"_fn": "element_text"}
               ]
           }
       }
    }
    
    response = requests.request(
      'POST',
      'https://realtime.oxylabs.io/v1/queries',
      auth=(USERNAME, PASSWORD),
      json=payload,
    )
    
    print(response.json())

    If you look at the printed output after running the code, you should see the review score.
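Since element_text returns the element’s raw text, the review score arrives as a string. If you need a numeric value, a small regex helper can extract it; the exact text format depends on the page, so the sample strings below are assumptions for illustration:

```python
import re

def extract_score(text):
    """Pull the first decimal number out of a review-score string,
    or return None when there is no text or no number."""
    if not text:
        return None
    match = re.search(r'\d+(?:\.\d+)?', text)
    return float(match.group()) if match else None

# Hypothetical review text, for illustration only.
print(extract_score('4.5 out of 5'))  # 4.5
print(extract_score(None))            # None
```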

    6. Saving results

    Now you can merge all the separate parsing instructions into a single request and save the results to a file for later use:

    import requests
    import json
    
    USERNAME = "yourUsername"
    PASSWORD = "yourPassword"
    
    # Structure payload.
    payload = {
       'source': 'universal',
       'url': 'https://www.homedepot.com/p/Lifestyle-Solutions-Wesley-80-3-in-Round-Arm-Polyester-Rectangle-3-Seater-Sofa-in-Dark-Grey-CCWENKS3M26DGRA/304602855',
       'geo_location': 'United States',
       "parse": 'true',
       'render': 'html',
       "parsing_instructions": {
           "product_name": {
               "_fns": [
                   {"_fn": "css", "_args": ["div.product-details__badge-title--wrapper"]},
                   {"_fn": "element_text"}
               ]
           },
           "price": {
               "_fns": [
                   {"_fn": "css", "_args": ["div.price-detailed__qty-limit-wrapper"]},
                   {"_fn": "css", "_args": ["div.price"]},
                   {"_fn": "element_text"}
               ]
           },
           "availability": {
               "_fns": [
                   {"_fn": "css", "_args": ["div.buybox__atc"]},
                   {"_fn": "element_text"}
               ]
           },
           "review_score": {
               "_fns": [
                   {"_fn": "css", "_args": ["div.ratings-reviews__accordion-subheader"]},
                   {"_fn": "element_text"}
               ]
           }
       }
    }
    
    # Get response.
    response = requests.request(
       'POST',
       'https://realtime.oxylabs.io/v1/queries',
       auth=(USERNAME, PASSWORD),
       json=payload,
    )
    
    raw_scraped_data = response.json()
    
    results = []
    
    for result in raw_scraped_data["results"]:
       r = {
           "product_name": result["content"]["product_name"],
           "availability": result["content"]["availability"] is not None,
           "price": result["content"]["price"],
           "review_score": result["content"]["review_score"],
       }
       results.append(r)
    
    json_file_path = 'data.json'
    
    with open(json_file_path, 'w') as json_file:
       json.dump(results, json_file, indent=4)
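If a spreadsheet-friendly format suits you better, the same results list can be written to CSV with Python’s standard library; the values below are placeholders standing in for real scraped data:

```python
import csv

# The same structure the scraper produces; values are placeholders.
results = [
    {
        'product_name': 'Wesley 3-Seater Sofa',
        'availability': True,
        'price': '$389.44',
        'review_score': '4.5',
    },
]

# DictWriter maps each dict's keys onto the CSV columns.
with open('data.csv', 'w', newline='') as csv_file:
    writer = csv.DictWriter(
        csv_file,
        fieldnames=['product_name', 'availability', 'price', 'review_score'],
    )
    writer.writeheader()
    writer.writerows(results)
```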

    Wrapping up

    As the results show, pairing Python with the Home Depot Scraper API is a straightforward way to collect Home Depot data while avoiding common web scraping challenges. If you need to scale your scraping efforts, you can also buy proxy services to enhance efficiency. Make sure to check our technical documentation for all the API parameters and variables mentioned in this tutorial.

    Also, check how to extract data from other popular targets, such as Best Buy, Zillow, eBay, Walmart, YouTube, Google Trends, and many more on our blog.

    If you have any questions regarding the tutorial, feel free to drop us a message at support@oxylabs.io or via the live chat on our homepage, and our scraping experts will get back to you in a timely manner.

    About the author

    Augustas Pelakauskas

    Senior Copywriter

    Augustas Pelakauskas is a Senior Copywriter at Oxylabs. Coming from an artistic background, he is deeply invested in various creative ventures - the most recent one being writing. After testing his abilities in the field of freelance journalism, he transitioned to tech content creation. When at ease, he enjoys sunny outdoors and active recreation. As it turns out, his bicycle is his fourth best friend.

    All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
