
How to Scrape Airbnb Listing Data With Python

Vytenis Kaubrė

2024-03-06 · 7 min read

Manual public data extraction is in the past, yet scraping Airbnb data automatically isn’t always a breeze. With Airbnb being one of the most popular online marketplaces for homestay rentals, it also has robust anti-scraping defenses that pose a challenge. Hence, a basic web scraping tool may not be enough for successful data extraction. 

See this in-depth article to learn how to easily scrape Airbnb listings with Python.

    Why scrape Airbnb data?

    Airbnb is abundant with valuable data, so scraping publicly available Airbnb data opens up opportunities for both business and personal ventures. For a traveler seeking a cozy getaway on a budget, web scraping saves significant time and effort, as hundreds or thousands of Airbnb listings can be collected with a single script run. For a business, Airbnb listing data can provide heaps of valuable insights for growth: think competitive intelligence, pricing analysis, sentiment analysis, market trends research, and much more.

    How to scrape Airbnb listing data

    Let’s get to the fun part: building a scraper that collects and parses data for as many Airbnb listings as you provide. You can also download all the files created in this tutorial by cloning our GitHub repository. Open your terminal and run:

    git clone https://github.com/oxylabs/how-to-scrape-airbnb

    1. Install prerequisites

    Start by installing Python from the official website if you don’t have it already. Additionally, we recommend using an Integrated Development Environment (IDE) like PyCharm or VS Code for easier development and debugging processes. Then, install the required Python libraries for this project:

    python -m pip install aiohttp requests

    The asyncio module, which ships with Python’s standard library, and the aiohttp library will be used to make asynchronous requests and thus speed up the scraping process. Additionally, the requests library will be used to send a simple test request.

    2. Get a free API trial and test your connection

    Head to the Oxylabs dashboard and claim your 7-day free trial for Web Scraper API, which includes Airbnb Scraper API. See the steps here on how to get your free trial once you’re logged in to the dashboard.

    Web Scraper API comes with a worldwide proxy pool, a Headless Browser, a Custom Parser, batch queries, and other smart features for block-free web scraping. As a result, you won’t need any additional Python libraries like Beautiful Soup, Selenium, or Puppeteer since dynamic page rendering, human-like requests, and data parsing will be done via Web Scraper API.

    Send a test request

    Once your API authentication credentials are ready, you can send a basic POST request to test your connection. Run the following code and check the JSON output, which should include rendered HTML data of this Airbnb listing:

    import requests
    from pprint import pprint
    
    payload = {
        "source": "universal",
        "url": "https://www.airbnb.com/rooms/639705836870039306?adults=1&category_tag=Tag%3A8536&children=0&enable_m3_private_room=true&infants=0&pets=0&photo_id=1437197361&search_mode=flex_destinations_search&check_in=2024-06-28&check_out=2024-07-03&source_impression_id=p3_1708944446_F%2FuHvpf5A7Gvt8Pi&previous_page_section_name=1000",
        "geo_location": "United States",
        "render": "html"
    }
    
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=("USERNAME", "PASSWORD"), # Replace with your API credentials
        json=payload
    )
    
    pprint(response.json())

    If you see a status_code of 200 within the response, your scraping job has been executed successfully.
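
    You can also verify this programmatically with a quick optional check after the request:

    # Optional sanity check: a 200 HTTP status means the API accepted and executed the job
    if response.status_code == 200:
        print("Connection OK, rendered HTML received.")
    else:
        print(f"Request failed with status {response.status_code}: {response.text}")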

    3. Import the libraries

    The next step is to import the libraries required for scraping Airbnb data:

    import json, asyncio, aiohttp
    from aiohttp import BasicAuth

    4. Store API credentials and Airbnb listing URLs

    Next, store your API credentials in the USERNAME and PASSWORD variables for easier management, and create the urls list to hold the Airbnb listing URLs you want to process:

    USERNAME, PASSWORD = "USERNAME", "PASSWORD"
    
    urls = [
       "https://www.airbnb.com/rooms/639705836870039306?adults=1&category_tag=Tag%3A8536&children=0&enable_m3_private_room=true&infants=0&pets=0&photo_id=1437197361&search_mode=flex_destinations_search&check_in=2024-06-28&check_out=2024-07-03&source_impression_id=p3_1708944446_F%2FuHvpf5A7Gvt8Pi&previous_page_section_name=1000",
       "https://www.airbnb.com/rooms/685374557739707093?adults=1&category_tag=Tag%3A8536&children=0&enable_m3_private_room=true&infants=0&pets=0&photo_id=1514770868&search_mode=flex_destinations_search&check_in=2024-03-17&check_out=2024-03-22&source_impression_id=p3_1708944446_iBXKC59AR9NTQc4y&previous_page_section_name=1000",
       "https://www.airbnb.com/rooms/51241506?adults=1&category_tag=Tag%3A8536&children=0&enable_m3_private_room=true&infants=0&pets=0&photo_id=1264515417&search_mode=flex_destinations_search&check_in=2024-04-07&check_out=2024-04-12&source_impression_id=p3_1708944446_zo%2FqBnbRPhn7zqAr&previous_page_section_name=1000",
    ]

    5. Create the payload

    Oxylabs’ APIs take scraping and parsing instructions as a JSON payload, so create the payload variable as shown below:

    payload = {
        "source": "universal",
        "url": None,
        "geo_location": "United States",
        "user_agent_type": "desktop",
        "render": "html",
        "browser_instructons": [],
        "parse": True,
        "parsing_instructions": {}
    }

    Note that the url parameter is set to None since the Python scraper you’ll build will assign the target URLs dynamically. Moreover, if you want to make sure specific elements load before the page is captured, you can use the browser_instructions parameter to wait_for_element:

        "browser_instructons": [
            {
                "type": "wait_for_element",
                "selector": {
                    "type": "xpath",
                    "value": "//div[@data-section-id='HOST_PROFILE_DEFAULT']"
                },
                "timeout_s": 30
            }
        ],

    See the documentation to learn more about the available Web Scraper API parameters and their purpose.

    6. Create parsing instructions

    By default, the API returns the raw HTML of every scraped page; defining parsing instructions takes care of that by extracting the data points you need directly from the HTML.

    So, let’s define custom parsing logic for Airbnb listing pages using XPath selectors (you can also use CSS selectors). If you aren’t familiar with how to write and structure parsing instructions, feel free to check our documentation and this GitHub repository for step-by-step guidelines. Alternatively, you can use the Beautiful Soup library instead of Custom Parser. For the sections below, open up your browser's Developer Tools by pressing Ctrl + Shift + I (Windows) or Option + Command + I (macOS) and follow along.

    Titles

    Inspecting Airbnb titles via Developer Tools

    Airbnb listings have two titles, one with an h1 tag and another with an h2 tag. You can extract them by defining parsing instructions like so:

        "parsing_instructions": {
            "titles": {
                "title_one": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": ["//span/h1/text()"]
                        }
                    ]
                },
                "title_two": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": ["//div[contains(@data-section-id, 'OVERVIEW')]//h2/text()"]
                        }
                    ]
                }
            },

    Pricing

    Inspecting Airbnb pricing via Developer Tools

    For the Airbnb pricing details, you can get the price per night and the total before taxes as shown below:

            "pricing": {
                "price_per_night": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": ["//span[@class='_tyxjp1']/text()", "//span[@class='_1y74zjx']/text()"]
                        }
                    ]
                },
                "price_total": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": ["//span/span[@class='_j1kt73']/text()"]
                        }
                    ]
                }
            },

    Note the two XPath selectors for the price per night. Some listings may display the pricing section differently; thus, the second selector serves as a fallback in case the first one fails. Moreover, if your use case requires more data points, like the cleaning fee, you can always add more functions in the same manner with appropriate selectors.
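
    For instance, here’s a sketch of a cleaning fee field. The XPath selector below is illustrative rather than taken from the live page, so inspect the pricing breakdown in Developer Tools and adjust it before use:

            # Hypothetical data point; verify the selector against the live page
            "cleaning_fee": {
                "_fns": [
                    {
                        "_fn": "xpath_one",
                        "_args": ["//div[contains(text(), 'Cleaning fee')]/following::span[1]/text()"]
                    }
                ]
            },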

    Host profile

    Inspecting Airbnb host profile URL via Developer Tools

    The host profile URL is relatively easy to extract since the whole host section uses the data-section-id="HOST_PROFILE_DEFAULT" attribute. You can simply retrieve the URL from the href attribute:

            "host_url": {
                "_fns": [
                    {
                        "_fn": "xpath_one",
                        "_args": ["//div[contains(@data-section-id, 'HOST')]//a/@href"]
                    }
                ]
            },

    Overall rating

    The overall rating of an Airbnb listing requires a more complex XPath selector:

            "overall_rating": {
                "_fns": [
                    {
                        "_fn": "xpath_one",
                        "_args": ["//*[contains(text(), 'Rated')]/following::div[1]/text()"]
                    }
                ]
            },

    Guest reviews

    Inspecting Airbnb listing reviews via Developer Tools

    If you want to scrape Airbnb listing reviews, you can do so by first defining the reviews section for processing:

            "reviews": {
                "_fns": [
                    {
                        "_fn": "xpath",
                        "_args": ["//div[contains(@data-section-id, 'REVIEWS')]//div[@role='listitem']"]
                    }
                ],

    Then, use the _items iterator to process each review as an item and extract the review rating, date, and body text:

                "_items": {
                    "rating": {
                        "_fns": [
                            {
                                "_fn": "xpath_one",
                                "_args": [".//div[contains(@class, 'c5dn5hn')]/span/text()"]
                            },
                            {"_fn": "amount_from_string"}
                        ]
                    },
                    "date": {
                        "_fns": [
                            {
                                "_fn": "xpath_one",
                                "_args": [".//div[contains(@class, 's78n3tv')]/text()"]
                            }
                        ]
                    },
                    "review": {
                        "_fns": [
                            {
                                "_fn": "xpath_one",
                                "_args": [".//span[contains(@class, 'lrl13de')]/text()"]
                            }
                        ]
                    }
                }
            },

    If you need to access all of the reviews, you can instruct the headless browser to click the “Show all 165 reviews” button and then process each review with appropriate selectors.
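
    As a rough sketch, such a click could be added to the payload’s browser_instructions before the page is parsed. The selector below is an assumption based on the button’s visible text, so verify it in Developer Tools first:

        "browser_instructions": [
            {
                "type": "click",
                # Hypothetical selector: matches a button whose text mentions reviews
                "selector": {
                    "type": "xpath",
                    "value": "//button[contains(., 'reviews')]"
                }
            },
            {"type": "wait", "wait_time_s": 3}
        ],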

    Images

    Inspecting Airbnb listing images via Developer Tools

    Finally, you can extract the listing images by scraping the data-original-uri attribute value, which holds the image in its original size:

            "images": {
                "_fns": [
                    {
                        "_fn": "xpath",
                        "_args": ["//picture//*[@data-original-uri]/@data-original-uri"]
                    }
                ]
            }
        }
    }

    Complete payload

    Once you have your payload ready, copy it into a new Python file and then save the payload into a separate listing_payload.json file to keep the scraper code shorter:

    import json
    
    payload = {
        "source": "universal",
        "url": None,
        "geo_location": "United States",
        "user_agent_type": "desktop",
        "render": "html",
        "browser_instructons": [
            {
                "type": "wait_for_element",
                "selector": {
                    "type": "xpath",
                    "value": "//div[@data-section-id='HOST_PROFILE_DEFAULT']"
                },
                "timeout_s": 30
            }
        ],
        "parse": True,
        "parsing_instructions": {
            "titles": {
                "title_one": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": ["//span/h1/text()"]
                        }
                    ]
                },
                "title_two": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": ["//div[contains(@data-section-id, 'OVERVIEW')]//h2/text()"]
                        }
                    ]
                }
            },
            "pricing": {
                "price_per_night": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": ["//span[@class='_tyxjp1']/text()", "//span[@class='_1y74zjx']/text()"]
                        }
                    ]
                },
                "price_total": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": ["//span/span[@class='_j1kt73']/text()"]
                        }
                    ]
                }
            },
            "host_url": {
                "_fns": [
                    {
                        "_fn": "xpath_one",
                        "_args": ["//div[contains(@data-section-id, 'HOST')]//a/@href"]
                    }
                ]
            },
            "overall_rating": {
                "_fns": [
                    {
                        "_fn": "xpath_one",
                        "_args": ["//*[contains(text(), 'Rated')]/following::div[1]/text()"]
                    }
                ]
            },
            "reviews": {
                "_fns": [
                    {
                        "_fn": "xpath",
                        "_args": ["//div[contains(@data-section-id, 'REVIEWS')]//div[@role='listitem']"]
                    }
                ],
                "_items": {
                    "rating": {
                        "_fns": [
                            {
                                "_fn": "xpath_one",
                                "_args": [".//div[contains(@class, 'c5dn5hn')]/span/text()"]
                            },
                            {"_fn": "amount_from_string"}
                        ]
                    },
                    "date": {
                        "_fns": [
                            {
                                "_fn": "xpath_one",
                                "_args": [".//div[contains(@class, 's78n3tv')]/text()"]
                            }
                        ]
                    },
                    "review": {
                        "_fns": [
                            {
                                "_fn": "xpath_one",
                                "_args": [".//span[contains(@class, 'lrl13de')]/text()"]
                            }
                        ]
                    }
                }
            },
            "images": {
                "_fns": [
                    {
                        "_fn": "xpath",
                        "_args": ["//picture//*[@data-original-uri]/@data-original-uri"]
                    }
                ]
            }
        }
    }
    
    with open("listing_payload.json", "w") as f:
        json.dump(payload, f, indent=4)

    Then replace the payload in your main scraper file with the following lines:

    payload = {}
    with open("listing_payload.json", "r") as f:
        payload = json.load(f)

    A payload that scrapes more data points from the Airbnb listing page is available in our Airbnb GitHub repository.

    7. Create coroutines to process API jobs asynchronously

    Submit a job

    Oxylabs’ APIs support batch processing of query or url parameter values. Therefore, you can send a single request to process up to 5,000 Airbnb URLs with the same scraping and parsing instructions. So, let’s define an asynchronous coroutine which will return job IDs for each submitted Airbnb URL:

    async def submit_job(payload):
        async with aiohttp.ClientSession(auth=BasicAuth(USERNAME, PASSWORD)) as session:
            async with session.post("https://data.oxylabs.io/v1/queries/batch", json=payload) as response:
                r = await response.json()
                try:
                    ids = [query["id"] for query in r["queries"]]
                    print(ids)
                    return ids
                except (KeyError, TypeError):
                    # The response may lack a "queries" key (e.g., when rate limited)
                    print(f"Error: unexpected API response: {r}")

    Important: Make sure not to exceed the free trial’s rate limit. Use up to 10 URLs; otherwise, the API will return the “Too many requests” response code. The try-except block will print this error in your terminal. If you want to bypass this rate limit, head to the pricing page and pick a plan that suits you.
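
    If you’re on a paid plan and need to process more URLs, one simple way to stay within your plan’s rate limit is to submit them in chunks. Here’s a minimal sketch (the chunk size of 10 mirrors the trial limit and is adjustable):

    def chunk_urls(urls, size=10):
        # Split the URL list into consecutive chunks of at most `size` items
        return [urls[i:i + size] for i in range(0, len(urls), size)]

    # Usage idea: submit one batch payload per chunk
    # for batch in chunk_urls(urls):
    #     payload["url"] = batch
    #     job_ids = await submit_job(payload)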

    Check job status

    Once you have the job IDs from the previous coroutine, you can use them to send asynchronous requests to return the status (done, pending, or failed) of each job in progress:

    async def check_job_status(job_id):
        async with aiohttp.ClientSession(auth=BasicAuth(USERNAME, PASSWORD)) as session:
            async with session.get(f"https://data.oxylabs.io/v1/queries/{job_id}") as response:
                return (await response.json())["status"]

    Get job results

    When the submitted jobs process successfully, you can again send asynchronous requests to retrieve the scraped and parsed data of each Airbnb listing:

    async def get_job_results(job_id):
        async with aiohttp.ClientSession(auth=BasicAuth(USERNAME, PASSWORD)) as session:
            async with session.get(f"https://data.oxylabs.io/v1/queries/{job_id}/results") as response:
                return (await response.json())["results"][0]["content"]

    Process jobs

    Next, create an asynchronous coroutine with a while loop that’ll keep checking the status of each job, retrieve the results if the status equals “done”, and then append the results to the results_list:

    async def process_job(job_id, url, results_list):
        await asyncio.sleep(5)
    
        while True:
            status = await check_job_status(job_id)
    
            if status == "done":
                print(f"Job {job_id} done.")
                results = await get_job_results(job_id)
                results["listing_url"] = url
                results_list.append(results)
                break
    
            elif status == "failed":
                print(f"Job {job_id} failed.")
                break
    
            await asyncio.sleep(5)

    This coroutine also attaches the original URL from the urls list to each result so you can easily keep track of which data belongs to which listing.

    8. Save results to JSON

    Create another asynchronous coroutine to save parsed results to a JSON file using the json module's .dump() function:

    async def save_to_json(results_list):
        with open("parsed_listings.json", "a") as f:
            json.dump(results_list, f, indent=4)
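
    If you also want a spreadsheet-friendly copy, here’s a minimal sketch that flattens a few top-level fields into a CSV file, assuming each listing parsed successfully. The field names follow the parsing instructions defined earlier; nested data like reviews and images is left out for brevity:

    import csv

    def save_to_csv(results_list, path="parsed_listings.csv"):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["title", "price_per_night", "overall_rating", "listing_url"])
            for item in results_list:
                writer.writerow([
                    item["titles"]["title_one"],
                    item["pricing"]["price_per_night"],
                    item.get("overall_rating"),
                    item.get("listing_url"),
                ])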

    9. Bring everything together

    Define the last asynchronous function by passing the urls list to the payload and running the submit_job() coroutine. Use asyncio.gather() to concurrently execute the process_job() coroutine for each URL and its corresponding job ID. Once that’s done, run the save_to_json() function:

    async def parse(urls):
        payload["url"] = urls
        job_ids = await submit_job(payload)
    
        results_list = []
    
        await asyncio.gather(*(process_job(job_id, url, results_list) for job_id, url in zip(job_ids, urls)))  
        await save_to_json(results_list)
    
        print("Airbnb URLs parsed.")

    Finally, add the main check to run the Python file when it’s called directly. Here’s the full code snippet that scrapes and parses public Airbnb listing data:

    ### parse_urls.py
    import json, asyncio, aiohttp
    from aiohttp import BasicAuth
    
    USERNAME, PASSWORD = "username", "password" # Replace with your API credentials
    
    urls = [
       "https://www.airbnb.com/rooms/639705836870039306?adults=1&category_tag=Tag%3A8536&children=0&enable_m3_private_room=true&infants=0&pets=0&photo_id=1437197361&search_mode=flex_destinations_search&check_in=2024-06-28&check_out=2024-07-03&source_impression_id=p3_1708944446_F%2FuHvpf5A7Gvt8Pi&previous_page_section_name=1000",
       "https://www.airbnb.com/rooms/685374557739707093?adults=1&category_tag=Tag%3A8536&children=0&enable_m3_private_room=true&infants=0&pets=0&photo_id=1514770868&search_mode=flex_destinations_search&check_in=2024-03-17&check_out=2024-03-22&source_impression_id=p3_1708944446_iBXKC59AR9NTQc4y&previous_page_section_name=1000",
       "https://www.airbnb.com/rooms/51241506?adults=1&category_tag=Tag%3A8536&children=0&enable_m3_private_room=true&infants=0&pets=0&photo_id=1264515417&search_mode=flex_destinations_search&check_in=2024-04-07&check_out=2024-04-12&source_impression_id=p3_1708944446_zo%2FqBnbRPhn7zqAr&previous_page_section_name=1000",
    ]
    
    payload = {}
    with open("listing_payload.json", "r") as f:
        payload = json.load(f)
    
    async def submit_job(payload):
        async with aiohttp.ClientSession(auth=BasicAuth(USERNAME, PASSWORD)) as session:
            async with session.post("https://data.oxylabs.io/v1/queries/batch", json=payload) as response:
                r = await response.json()
                try:
                    ids = [query["id"] for query in r["queries"]]
                    print(ids)
                    return ids
                except (KeyError, TypeError):
                    # The response may lack a "queries" key (e.g., when rate limited)
                    print(f"Error: unexpected API response: {r}")
    
    async def check_job_status(job_id):
        async with aiohttp.ClientSession(auth=BasicAuth(USERNAME, PASSWORD)) as session:
            async with session.get(f"https://data.oxylabs.io/v1/queries/{job_id}") as response:
                return (await response.json())["status"]
    
    async def get_job_results(job_id):
        async with aiohttp.ClientSession(auth=BasicAuth(USERNAME, PASSWORD)) as session:
            async with session.get(f"https://data.oxylabs.io/v1/queries/{job_id}/results") as response:
                return (await response.json())["results"][0]["content"]
    
    async def process_job(job_id, url, results_list):
        await asyncio.sleep(5)
        while True:
            status = await check_job_status(job_id)
            if status == "done":
                print(f"Job {job_id} done.")
                results = await get_job_results(job_id)
                results["listing_url"] = url
                results_list.append(results)
                break
            elif status == "failed":
                print(f"Job {job_id} failed.")
                break
            await asyncio.sleep(5)
    
    async def save_to_json(results_list):
        with open("parsed_listings.json", "a") as f:
            json.dump(results_list, f, indent=4)
    
    async def parse(urls):
        payload["url"] = urls
        job_ids = await submit_job(payload)
        results_list = []
        await asyncio.gather(*(process_job(job_id, url, results_list) for job_id, url in zip(job_ids, urls)))  
        await save_to_json(results_list)
        print("Airbnb URLs parsed.")
    
    if __name__ == "__main__":
        asyncio.run(parse(urls))

    Below you can see the JSON output of one of the scraped and parsed Airbnb listings:

    [
        {
            "images": [
                "https://a0.muscache.com/pictures/miso/Hosting-639705836870039306/original/a6e9e75e-8fe3-44d9-bd1b-9c20e111f09b.jpeg",
                "https://a0.muscache.com/pictures/miso/Hosting-639705836870039306/original/78a13367-2edd-4ad6-83f3-09c468cd2389.jpeg",
                "https://a0.muscache.com/pictures/miso/Hosting-639705836870039306/original/cca081b9-b868-4dbe-bdd2-585b3dd3b96a.jpeg",
                "https://a0.muscache.com/pictures/miso/Hosting-639705836870039306/original/7c553d75-b316-4961-920a-92c6327b1287.jpeg",
                "https://a0.muscache.com/pictures/miso/Hosting-639705836870039306/original/36a1d733-cea9-4802-bd96-f4a81a93c941.jpeg"
            ],
            "titles": {
                "title_one": "Private Rooftop Hidden Gem Studio",
                "title_two": "Entire rental unit in New York, United States"
            },
            "pricing": {
                "price_total": "$2,840",
                "price_per_night": "$478\u00a0"
            },
            "reviews": [
                {
                    "date": "2 weeks ago",
                    "rating": 5,
                    "review": "Located in the heart of New York City, these apartments offer a prime location with easy access to everything you need. The highlight is definitely the amazing terrace, perfect for enjoying the city skyline. Henry, the host, is exceptionally proactive, providing excellent communication and clear instructions throughout the stay. A top choice for anyone looking for a comfortable and convenient stay in NYC."
                },
                {
                    "date": "2 weeks ago",
                    "rating": 5,
                    "review": "Great location, easy to find the apartment"
                },
                {
                    "date": "2 weeks ago",
                    "rating": 1,
                    "review": "Although the rooftop was very special, there was significant clanging noise from the heating pipes that made sleeping impossible.  We left after the first night, as Henry was not able to solve the problem.  Walls were also paper thin and could hear the conversations of neighbors.  Can't recommend."
                },
                {
                    "date": "3 weeks ago",
                    "rating": 5,
                    "review": "Great midtown location near everything!"
                },
                {
                    "date": "3 weeks ago",
                    "rating": 5,
                    "review": "Henry\u2019s home was absolutely gorgeous. Skyview\u2019s of surrounding buildings and the Empire State building right in your backyard was just breathtaking."
                },
                {
                    "date": "3 weeks ago",
                    "rating": 5,
                    "review": "This place was perfect! It was in a great location and exactly as described. Henry was very communicative and quick to respond."
                }
            ],
            "host_url": "/users/show/461252637",
            "overall_rating": "4.84",
            "parse_status_code": 12000,
            "listing_url": "https://www.airbnb.com/rooms/639705836870039306?adults=1&category_tag=Tag%3A8536&children=0&enable_m3_private_room=true&infants=0&pets=0&photo_id=1437197361&search_mode=flex_destinations_search&check_in=2024-06-28&check_out=2024-07-03&source_impression_id=p3_1708944446_F%2FuHvpf5A7Gvt8Pi&previous_page_section_name=1000"
        }
    ]

    Scrape Airbnb search page

    Getting the listing URLs from Airbnb search pages by hand is a daunting task. For this reason, you can build a very simple scraper that gathers listing URLs from any Airbnb search page.

    1. Build the Airbnb search results scraper

    By default, the Airbnb website loads the first 20 listings on the search page. If you want to load more listings, you must scroll the page, which we'll show how to do later. To scrape an Airbnb search page, you need to instruct the Headless Browser to wait until the 20th listing loads, and then collect the URLs of those 20 Airbnb listings:

    ### scrape_20_urls.py
    import asyncio, aiohttp, json
    
    USERNAME, PASSWORD = "username", "password" # Replace with your API credentials
    
    payload = {
        "source": "universal",
        "url": "https://www.airbnb.com/",
        "geo_location": "United States",
        "render": "html",
        "browser_instructons": [
            {
                "type": "wait_for_element",
                "selector": {
                    "type": "xpath",
                    "value": "//div[@data-testid='card-container']/a"
                },
                "timeout_s": 30
            }
        ],
        "parse": True,
        "parsing_instructions": {
            "links": {
                "_fns": [
                    {
                        "_fn": "xpath",
                        "_args": ["//div[@data-testid='card-container']/a/@href"]
                    }
                ]
            }
        }
    }
    
    async def scrape_search():
        async with aiohttp.ClientSession(auth=aiohttp.BasicAuth(USERNAME, PASSWORD)) as session:
            async with session.post("https://realtime.oxylabs.io/v1/queries", json=payload) as response:
                hrefs = (await response.json())["results"][0]["content"]["links"]
                urls = ["https://www.airbnb.com" + url for url in hrefs]
                with open("20_airbnb_urls.json", "w") as f:
                    json.dump(urls, f, indent=4)
                    print("Airbnb URLs saved.")
                return urls
    
    if __name__ == "__main__":
        asyncio.run(scrape_search())

    The asyncio and aiohttp libraries are used to send a request to the API and return the URLs as a list, prepending the https://www.airbnb.com string to each relative href to create complete URLs. You should see the list saved in your local directory as a JSON file.

    As mentioned earlier, if you’re using the free API trial, make sure to provide up to 10 URLs so you don’t exceed the free trial’s rate limit.

    2. Combine Airbnb listing and URL scrapers

    To simplify the process, you can create another Python file that imports from the scrape_20_urls.py and parse_urls.py files and runs both scrapers together:

    ### airbnb_scraper.py
    import asyncio
    from scrape_20_urls import scrape_search
    from parse_urls import parse
    
    
    async def main():
        urls = await scrape_search()
        await parse(urls)
    
    if __name__ == "__main__":
        asyncio.run(main())

    Note that airbnb_scraper.py passes the freshly scraped URLs directly to parse(), so the hard-coded urls list in parse_urls.py is only used when that file is run on its own; you can safely empty it to avoid confusion.

    After it finishes running, you should have two files in your local directory that include scraped Airbnb listing URLs from the search page and parsed listing data.

    3. Scrape more listings by scrolling the page

    If you want to increase the number of Airbnb listings from 20 to around 250, you can instruct the browser to scroll the page by 1000 pixels, wait for 1 second, and then, after doing it 6 times – click the button that says “Show more.” Afterward, instruct the browser to scroll the page 20 more times:

    ### scrape_250_urls.py
    
    payload = {
        "source": "universal",
        "url": "https://www.airbnb.com/?tab_id=home_tab&refinement_paths%5B%5D=/homes&search_mode=flex_destinations_search&flexible_trip_lengths%5B%5D=one_week&location_search=MIN_MAP_BOUNDS&monthly_start_date=2023-07-01&monthly_length=3&price_filter_input_type=0&price_filter_num_nights=5&channel=EXPLORE&search_type=category_change&category_tag=Tag:8522",
        "geo_location": "Canada",
        "render": "html",
        "browser_instructions": [
            {"type": "scroll", "x": 0, "y": 1000},
            {"type": "wait", "wait_time_s": 1}
        ] * 6 + [
            {"type": "click", "selector": {"type": "xpath", "value": "//button[text()='Show more']"}},
            {"type": "wait", "wait_time_s": 5}
        ] + [
            {"type": "scroll", "x": 0, "y": 1000},
            {"type": "wait", "wait_time_s": 1}
        ] * 20,
        "parse": True,
        "parsing_instructions": {
            "links": {
                "_fns": [
                    {
                    "_fn": "xpath",
                    "_args": ["//div[@data-testid='card-container']/a/@href"]
                    }
                ]
            }
        }
    }

    Since Airbnb is strict about how you scroll the page, a simpler scroll_to_bottom function, unfortunately, won’t work in this situation. Airbnb detects automated scrolling: when the browser scrolls too far too fast via JavaScript, the listing data isn’t loaded at all, so you must tell the browser to scroll in small increments. If you’re dealing with pagination instead, the process is much easier, as you only have to instruct the headless browser to click the next-page button, as sketched below.
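
    For illustration, such a pagination setup could look like the following sketch. The aria-label value is an assumption, so check the actual next-page button in Developer Tools first:

        "browser_instructions": [
            # Hypothetical selector for the next-page button
            {"type": "click", "selector": {"type": "xpath", "value": "//a[@aria-label='Next']"}},
            {"type": "wait", "wait_time_s": 5}
        ],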

    Conclusion

    Congrats! You’ve built two web scrapers for two different Airbnb scraping processes. Combined, they let you easily scrape data from more than 250 Airbnb listings and retrieve structured results in a few minutes. Feel free to build upon the outlined code samples and adjust them for your use case.

    Evidently, Oxylabs’ solution makes the data scraping process significantly more straightforward and scalable. If you want to build a truly custom Airbnb scraper without Oxylabs’ API, you may want to use a headless browser like Puppeteer to bypass anti-scraping systems and consider implementing proxy servers.

    When building your own tools for web scraping, proxies are an essential anti-blocking measure. To avoid detection by the target website, you can buy proxies of various types to fit any scraping scenario.

    Frequently asked questions

    How do I fetch data from Airbnb?

    While Airbnb offers its own API, it’s only available to hosts that want to manage their listings programmatically. If you’re looking to scrape data from Airbnb for personal reasons or for your data-driven business processes, you can utilize web scraping. It’s a programmatic data-harvesting approach that can extract data from thousands of Airbnb listings simultaneously. However, it does come with its challenges, like IP address blocks and restricted content; hence, you’ll most likely need to use a headless browser and proxies to retrieve desired data successfully.

    Does Airbnb block web scraping?

    Yes, Airbnb blocks automated website access via web scrapers. Airbnb achieves this by constantly monitoring incoming visitors’ IP addresses, HTTP headers, the frequency of requests, and other connection parameters that are used to fingerprint users.

    Additionally, Airbnb employs JavaScript rendering to dynamically display content on their pages. Thus, if you’re scraping without a headless browser, you may observe low success rates.

    In case you’re struggling with web scraping Airbnb data, consider Oxylabs’ proxy servers or Web Scraper API for hassle-free projects. 

    How to analyze data on Airbnb?

    Airbnb data analysis typically involves several steps (a short code sketch follows the list below):

    • Define your goals;

    • Gather public Airbnb data with a scraping tool;

    • Clean and format the data;

    • Perform Exploratory Data Analysis (EDA);

    • Engineer features like the nightly price, occupancy rate, revenue per available room, etc.;

    • Carry out descriptive analysis to identify patterns and relationships;

    • Analyze the market and competitors.
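
    For illustration, here’s a minimal EDA sketch that loads the parsed_listings.json file produced earlier and prints summary statistics. It assumes every listing parsed successfully in the format shown above and that pandas is installed:

    import json
    import pandas as pd

    with open("parsed_listings.json") as f:
        listings = json.load(f)

    df = pd.DataFrame([
        {
            "title": item["titles"]["title_one"],
            # Strip "$", commas, and non-breaking spaces before converting to a number
            "price_per_night": float(
                item["pricing"]["price_per_night"].replace("$", "").replace(",", "").strip()
            ),
            "overall_rating": float(item["overall_rating"]),
        }
        for item in listings
    ])

    print(df.describe())  # Count, mean, min/max, and quartiles per numeric column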

    About the author

    Vytenis Kaubrė

    Technical Copywriter

    Vytenis Kaubrė is a Technical Copywriter at Oxylabs. His love for creative writing and a growing interest in technology fuels his daily work, where he crafts technical content and web scrapers with Oxylabs’ solutions. Off duty, you might catch him working on personal projects, coding with Python, or jamming on his electric guitar.

    All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
