
How to Scrape Tripadvisor Data

Augustas Pelakauskas

2023-10-06 · 3 min read

Tripadvisor is a prominent platform in the travel and hospitality industry. It offers a wealth of data on hotels, restaurants, and attractions, along with user reviews, making it a good target for web scraping and a valuable resource for market research, competitor analysis, and, in turn, decision-making. Using proxies can help ensure smooth data extraction and avoid blocking while gathering large volumes of information.

You can scrape data like names, addresses, contact info, ratings, user-generated reviews, images, pricing, and geographic coordinates to enhance your understanding of the industry.

In this tutorial, you’ll learn how to scrape Tripadvisor data with Web Scraper API and Python.

    1. Prepare the environment

    You can download the latest version of Python from the official website.

    Install dependencies

    Install scraping-related Python libraries. Run the command below.

    pip install beautifulsoup4 requests pandas

    It’ll automatically download and install Beautiful Soup, Requests, and Pandas.

    Import libraries

    Import the libraries for use at a later step.

    from bs4 import BeautifulSoup
    import requests
    import pandas as pd

    Get API credentials

    To use Web Scraper API, you’ll need an Oxylabs account. With a one-week free trial, you’ll have ample time to fine-tune your scraping task. Once signed up, you'll receive your API credentials. Save them in a tuple, as shown below.

    credentials = ('USERNAME', 'PASSWORD')

    Don’t forget to replace USERNAME and PASSWORD with your credentials.
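
Hardcoding credentials in a script makes them easy to leak, for example by committing the file to a repository. As an alternative, here is a minimal sketch that reads them from environment variables. The variable names OXYLABS_USERNAME and OXYLABS_PASSWORD are hypothetical, not an official convention; use whatever names suit your setup.

```python
import os


def load_credentials() -> tuple:
    """Read API credentials from environment variables (hypothetical names)."""
    # OXYLABS_USERNAME / OXYLABS_PASSWORD are illustrative names only
    username = os.environ.get("OXYLABS_USERNAME")
    password = os.environ.get("OXYLABS_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set OXYLABS_USERNAME and OXYLABS_PASSWORD first.")
    return (username, password)


# Usage: export both variables in your shell, then
# credentials = load_credentials()
```

The returned tuple has the same shape as the hardcoded one, so it can be passed to auth= unchanged.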

    2. Prepare payload

    Prepare a payload to make a POST request to the API. For Tripadvisor, the source must be set to universal. You’ll also have to set render to html.

    NOTE: You can always find all of the parameters and examples in our documentation.

    url = "https://www.tripadvisor.com/Search?searchSessionId=000a97712c5c1aad.ssid&searchNearby=false&ssrc=e&q=Nearby&sid=6786CB884ED642F4A91E6E9AD932BE131695517577013&blockRedirect=true&geo=1&rf=1"
    payload = {
        'source': 'universal',
        'render': 'html',
        'url': url,
    }

    Replace the URL above with the URL of your own Tripadvisor search.
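
If you'd rather build the search URL from a plain query than copy it from the browser, a small helper can assemble the payload. This is a sketch that assumes Tripadvisor accepts a bare q parameter on its /Search page; the URL above also carries session parameters (searchSessionId, sid, and others) that are omitted here, so verify the result in a browser first. The build_payload name is hypothetical.

```python
from urllib.parse import urlencode


def build_payload(query: str) -> dict:
    """Build a Web Scraper API payload for a Tripadvisor search query.

    Assumes /Search works with only the `q` parameter (unverified).
    """
    # urlencode escapes spaces and special characters in the query
    url = "https://www.tripadvisor.com/Search?" + urlencode({"q": query})
    return {
        "source": "universal",  # required for Tripadvisor, per this tutorial
        "render": "html",       # ask the API to render JavaScript
        "url": url,
    }


payload = build_payload("restaurants in Vilnius")
print(payload["url"])  # https://www.tripadvisor.com/Search?q=restaurants+in+Vilnius
```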

    3. Send POST request

    Use credentials and payload to send a POST request to the API. Passing the payload dict through the json argument makes Requests serialize it to JSON before sending it to the API.

    response = requests.post(
        'https://realtime.oxylabs.io/v1/queries',
        auth=credentials,
        json=payload,
    )
    print(response.status_code)

    You should expect a status_code with a value of 200, indicating success. If you get a different code, check your credentials and payload to make sure they’re correct.

    4. Extract data

    The API sends the response in JSON format. You can extract the HTML content of the page as follows.

    content = response.json()["results"][0]["content"]
    soup = BeautifulSoup(content, "html.parser")
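
The one-liner above raises a bare KeyError or IndexError if the API returns an unexpected body, for example when the payload is wrong. A slightly more defensive sketch could look like this; the extract_html name and the error messages are illustrative.

```python
def extract_html(response_json: dict) -> str:
    """Return the page HTML from a Web Scraper API response body.

    Equivalent to response_json["results"][0]["content"], but fails
    with a clearer message when that structure is missing.
    """
    results = response_json.get("results") or []
    if not results:
        raise ValueError("Response contains no results; check the payload.")
    content = results[0].get("content")
    if not isinstance(content, str):
        raise ValueError("First result has no HTML content.")
    return content


# Usage with the response from the previous step:
# soup = BeautifulSoup(extract_html(response.json()), "html.parser")
```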

    The soup object will contain parsed HTML content. You can use CSS selectors to grab specific elements.

    Let’s collect the following data from the Restaurants category.

    Name

    To extract a restaurant name, you’ll first need to find the corresponding CSS selector. Open the page in your web browser, right-click the element, and select Inspect to view it in the developer tools.

    If you inspect a name, you’ll notice it’s wrapped in a <span> inside a <div> with the result-title class. Using this information, you can construct the Beautiful Soup selectors.

    name = soup.find('div', {"class": "result-title"}).find('span').get_text(strip=True)
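
The tutorial uses find(), but Beautiful Soup also accepts CSS selectors directly through select_one(). Here is the same lookup as a sketch, run against a tiny hand-written HTML fragment that only imitates the structure described above; the live Tripadvisor markup may differ and can change over time.

```python
from bs4 import BeautifulSoup

# Hand-written sample imitating the described structure, not real Tripadvisor HTML
html = """
<div class="result">
  <div class="result-title"><span> Example Bistro </span></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "div.result-title span" matches a <span> inside <div class="result-title">
title = soup.select_one("div.result-title span")
name = title.get_text(strip=True) if title is not None else None
print(name)  # Example Bistro
```

Unlike the chained find() call, this version returns None instead of raising AttributeError when the element is missing.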

    Rating

    Similarly, for rating, inspect the rating bubbles.

    As you can see, the <span> element has the class ui_bubble_rating, and the rating is stored in its alt attribute. Use find() to locate the element, then read the attribute with square-bracket indexing.

    rating = soup.find('span', {"class": "ui_bubble_rating"})['alt']
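
The alt attribute holds text rather than a number, which makes sorting or averaging awkward. A minimal sketch for converting it to a float, assuming the text looks something like "4.5 of 5 bubbles" (check the live page for the actual wording); parse_rating is a hypothetical helper.

```python
import re
from typing import Optional


def parse_rating(alt_text: Optional[str]) -> Optional[float]:
    """Return the first number in a rating string, e.g. '4.5 of 5 bubbles' -> 4.5.

    The alt-text format is an assumption; inspect the page to confirm it.
    """
    match = re.search(r"\d+(?:\.\d+)?", alt_text or "")
    return float(match.group()) if match else None


print(parse_rating("4.5 of 5 bubbles"))  # 4.5
```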

    Reviews

    The review count can be extracted from the <a> tag with the class review_count:

    review = soup.find('a', {"class": "review_count"}).get_text(strip=True)
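
Similarly, the review count comes back as text. A sketch for turning it into an integer, assuming a format such as "1,234 reviews" with comma thousands separators; parse_review_count is a hypothetical helper.

```python
import re
from typing import Optional


def parse_review_count(text: Optional[str]) -> Optional[int]:
    """Convert text such as '1,234 reviews' to 1234 (format assumed)."""
    # Grab the first run of digits, allowing comma thousands separators
    match = re.search(r"\d[\d,]*", text or "")
    return int(match.group().replace(",", "")) if match else None


print(parse_review_count("1,234 reviews"))  # 1234
```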

    NOTE: In all three cases, find() returns only the first matching element, so these snippets extract data from the first search result only. See the following section for extracting all results.

    Search results

    To extract all the search results, grab each result and loop over them. Each result is wrapped in a <div> with the class result, so that's the element to select first.

    Now, update the code to grab all the search results.

    data = []
    for div in soup.find_all("div", {"class": "result"}):
        name = div.find('div', {"class": "result-title"}).find('span').get_text(strip=True)
        rating = div.find('span', {"class": "ui_bubble_rating"})['alt']
    
        review = div.find('a', {"class": "review_count"}).get_text(strip=True)
        data.append({
            "name": name,
            "rating": rating,
            "review": review,
        })

    The code above extracts all the search results and stores them in the data list.
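
On real pages, some results may lack a rating or review count. Calling .find('span') or ['alt'] on a missing element raises an error and stops the whole loop. A defensive variant of the same loop could look like this; the safe_text helper is hypothetical, and the sample HTML is hand-written to imitate the described structure rather than copied from Tripadvisor.

```python
from bs4 import BeautifulSoup

# Hand-written sample: the second result has no rating or review count
html = """
<div class="result">
  <div class="result-title"><span>Cafe A</span></div>
  <span class="ui_bubble_rating" alt="4.5 of 5 bubbles"></span>
  <a class="review_count">120 reviews</a>
</div>
<div class="result">
  <div class="result-title"><span>Cafe B</span></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def safe_text(element):
    """Return the element's stripped text, or None if the element is missing."""
    return element.get_text(strip=True) if element is not None else None


data = []
for div in soup.find_all("div", {"class": "result"}):
    title = div.select_one("div.result-title span")
    bubbles = div.find("span", {"class": "ui_bubble_rating"})
    data.append({
        "name": safe_text(title),
        # .get() returns None instead of raising KeyError when alt is absent
        "rating": bubbles.get("alt") if bubbles is not None else None,
        "review": safe_text(div.find("a", {"class": "review_count"})),
    })

print(data)
```

Missing fields end up as None, which Pandas writes as empty cells in the CSV file.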

    Save to CSV

    Lastly, use Pandas to export data to a CSV file using the to_csv() method.

    df = pd.DataFrame(data)
    df.to_csv("search_results.csv", index=False)
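
If you only need a CSV file, the standard library's csv module can write the same list of dictionaries without Pandas. This is an alternative technique, not the one the tutorial uses; the sample rows below are placeholders with the same keys as the data list.

```python
import csv

# Placeholder rows with the same shape as the `data` list built earlier
data = [
    {"name": "Cafe A", "rating": "4.5 of 5 bubbles", "review": "120 reviews"},
    {"name": "Cafe B", "rating": None, "review": None},
]

with open("search_results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "rating", "review"])
    writer.writeheader()    # column names on the first row
    writer.writerows(data)  # None values are written as empty cells
```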

    The complete code

    Here’s the full source code.

    from bs4 import BeautifulSoup
    import requests
    import pandas as pd
    
    credentials = ('USERNAME', 'PASSWORD')
    url = "https://www.tripadvisor.com/Search?searchSessionId=000a97712c5c1aad.ssid&searchNearby=false&ssrc=e&q=Nearby&sid=6786CB884ED642F4A91E6E9AD932BE131695517577013&blockRedirect=true&geo=1&rf=1"
    payload = {
        'source': 'universal',
        'render': 'html',
        'url': url,
    }
    response = requests.post(
        'https://realtime.oxylabs.io/v1/queries',
        auth=credentials,
        json=payload,
    )
    print(response.status_code)
    
    content = response.json()["results"][0]["content"]
    soup = BeautifulSoup(content, "html.parser")
    
    data = []
    for div in soup.find_all("div", {"class": "result"}):
        name = div.find('div', {"class": "result-title"}).find('span').get_text(strip=True)
        rating = div.find('span', {"class": "ui_bubble_rating"})['alt']
    
        review = div.find('a', {"class": "review_count"}).get_text(strip=True)
        data.append({
            "name": name,
            "rating": rating,
            "review": review,
        })
    
    df = pd.DataFrame(data)
    df.to_csv("search_results.csv", index=False)

    Conclusion

    Pairing Python with Tripadvisor Scraper API lets you scrape Tripadvisor data while avoiding common web scraping challenges. For even better results and to scale your scraping efforts, you can buy proxies to enhance performance and bypass anti-bot measures. Check our technical documentation for all the API parameters and variables mentioned in this tutorial.

    Additionally, explore our blog to learn how to scrape data from popular targets like YouTube, Best Buy, Zillow, eBay, Walmart, and many others.
    If you have inquiries about the tutorial or web scraping in general, don't hesitate to reach out either by sending a message to support@oxylabs.io or using the live chat.

    Frequently asked questions

    Is scraping Tripadvisor legal?

    Generally, yes: scraping publicly available data, including Tripadvisor's, is usually legal. Still, make sure to adhere to the website's terms and regulations, and consider legal differences based on geographic location. To learn more, see our article on the legalities of web scraping.

    How do I crawl data from Tripadvisor?

    To scrape data at scale, you can either build and maintain your own web scraping infrastructure using a preferred programming language or outsource an all-in-one solution, such as a scraper API.

    Can you scrape Tripadvisor reviews?

    Yes. With Python’s Beautiful Soup, inspect the page to locate the HTML elements that contain the reviews, then extract them with CSS selectors.

    About the author

    Augustas Pelakauskas

    Senior Copywriter

    Augustas Pelakauskas is a Senior Copywriter at Oxylabs. Coming from an artistic background, he is deeply invested in various creative ventures - the most recent one being writing. After testing his abilities in the field of freelance journalism, he transitioned to tech content creation. When at ease, he enjoys sunny outdoors and active recreation. As it turns out, his bicycle is his fourth best friend.

    All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
