Tripadvisor is a prominent platform in the travel and hospitality industry. It offers a wealth of data on hotels, restaurants, and attractions, along with user reviews, which makes it a valuable source for market research, competitor analysis, and data-driven decision-making. When you gather large volumes of this data, proxies help keep extraction smooth and reduce the risk of being blocked.
You can scrape data like names, addresses, contact info, ratings, user-generated reviews, images, pricing, and geographic coordinates to enhance your understanding of the industry.
In this tutorial, you’ll learn how to scrape Tripadvisor data with Web Scraper API and Python.
Request a free trial to test our Web Scraper API for your use case.
You can download the latest version of Python from the official website.
Install the Python libraries you’ll need for scraping by running the command below.
pip install beautifulsoup4 requests pandas
It’ll automatically download and install Beautiful Soup, Requests, and Pandas.
Import the libraries you’ll use in later steps.
from bs4 import BeautifulSoup
import requests
import pandas as pd
To use Web Scraper API, you’ll need an Oxylabs account. The one-week free trial gives you ample time to fine-tune your scraping task. Once you sign up, you’ll receive your API credentials. Save them in a tuple, as shown below.
credentials = ('USERNAME', 'PASSWORD')
Don’t forget to replace USERNAME and PASSWORD with your credentials.
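Hardcoding credentials is fine for a quick test, but if you plan to share or commit your script, you may prefer to read them from environment variables. Here’s a minimal sketch; the variable names OXYLABS_USERNAME and OXYLABS_PASSWORD and the load_credentials() helper are hypothetical choices for this example, not part of the API.

```python
import os


def load_credentials() -> tuple[str, str]:
    """Read API credentials from environment variables.

    OXYLABS_USERNAME and OXYLABS_PASSWORD are hypothetical variable names
    chosen for this sketch; use whatever names suit your setup.
    """
    username = os.environ.get("OXYLABS_USERNAME")
    password = os.environ.get("OXYLABS_PASSWORD")
    if not username or not password:
        # Fail early with a readable message instead of a 401 from the API
        raise RuntimeError(
            "Set OXYLABS_USERNAME and OXYLABS_PASSWORD before running the scraper."
        )
    return (username, password)
```

With both variables set, credentials = load_credentials() replaces the hardcoded tuple.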
Prepare a payload for the POST request. For Tripadvisor, set source to universal and render to html; the latter tells the API to render the page, JavaScript included, before returning its HTML.
NOTE: You can always find all of the parameters and examples in our documentation.
url = "https://www.tripadvisor.com/Search?searchSessionId=000a97712c5c1aad.ssid&searchNearby=false&ssrc=e&q=Nearby&sid=6786CB884ED642F4A91E6E9AD932BE131695517577013&blockRedirect=true&geo=1&rf=1"
payload = {
    'source': 'universal',
    'render': 'html',
    'url': url,
}
Replace the URL above with the URL of your own Tripadvisor search.
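Instead of copying a URL from the browser, you could build one from a search phrase. This sketch assumes Tripadvisor’s /Search page accepts the query in the q parameter, as it does in the example URL above; the session-specific parameters (searchSessionId, sid) are left out, and build_search_url() is a hypothetical helper, so check that the resulting page returns the results you expect.

```python
from urllib.parse import urlencode


def build_search_url(query: str) -> str:
    """Build a Tripadvisor search URL for a free-text query.

    Assumption: the /Search page reads the query from the `q` parameter,
    as in the tutorial's example URL. Session-specific parameters such as
    searchSessionId and sid are omitted.
    """
    # urlencode() percent-encodes the value and turns spaces into "+"
    return "https://www.tripadvisor.com/Search?" + urlencode({"q": query})


url = build_search_url("pizza in New York")
```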
Send a POST request to the API with your credentials and payload. Passing the dict through the json argument makes Requests serialize it to JSON and send it as the request body.
response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=credentials,
    json=payload,
)
print(response.status_code)
You should expect a status_code with a value of 200, indicating success. If you get a different code, check your credentials and payload to make sure they’re correct.
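Requests can fail intermittently when you scrape at volume. The sketch below wraps the same POST request in a hypothetical fetch_content() helper that retries non-200 responses with a linearly growing delay and returns the page HTML. The retry policy and the 180-second timeout are assumptions of this example, not documented API recommendations; the post argument lets you pass requests.post in real use or a stand-in when testing.

```python
import time


def fetch_content(url, credentials, post, retries=3, backoff=2.0):
    """Query the API for `url` and return the rendered page HTML.

    `post` is any callable with the same signature as requests.post;
    pass requests.post in real use. Retrying on non-200 responses is an
    assumption of this sketch, not documented API behavior.
    """
    payload = {"source": "universal", "render": "html", "url": url}
    last_status = None
    for attempt in range(1, retries + 1):
        response = post(
            "https://realtime.oxylabs.io/v1/queries",
            auth=credentials,
            json=payload,
            timeout=180,  # seconds; an arbitrary limit for rendered pages
        )
        if response.status_code == 200:
            results = response.json().get("results", [])
            if not results:
                raise ValueError(f"API returned no results for {url}")
            # Same field the tutorial reads: the first result's content
            return results[0]["content"]
        last_status = response.status_code
        if attempt < retries:
            # Linear backoff: wait backoff, then 2 * backoff, and so on
            time.sleep(backoff * attempt)
    raise RuntimeError(
        f"Request failed after {retries} attempts (last status: {last_status})"
    )
```

In the tutorial’s flow, content = fetch_content(url, credentials, requests.post) would take the place of the request and the content extraction.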
The API sends the response in JSON format. You can extract the HTML content of the page as follows.
content = response.json()["results"][0]["content"]
soup = BeautifulSoup(content, "html.parser")
The soup object will contain parsed HTML content. You can use CSS selectors to grab specific elements.
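Beautiful Soup lets you locate elements either with find() and tag/class arguments, as the following steps do, or with CSS selector strings through select_one() and select(). The sketch below runs the CSS-selector style against a small, hypothetical HTML snippet that mirrors the structure described in this tutorial; Tripadvisor’s real markup is much larger and may change at any time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the structure described in this tutorial;
# Tripadvisor's real HTML is larger and may change at any time.
sample_html = """
<div class="result">
  <div class="result-title"><span>Joe's Pizza</span></div>
  <span class="ui_bubble_rating bubble_45" alt="4.5 of 5 bubbles"></span>
  <a class="review_count" href="#">1,234 reviews</a>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# "a b" matches b nested inside a; ".cls" matches any element with that class
name = sample_soup.select_one("div.result-title span").get_text(strip=True)
rating = sample_soup.select_one("span.ui_bubble_rating")["alt"]
review = sample_soup.select_one("a.review_count").get_text(strip=True)

print(name, rating, review, sep=" | ")
```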
Let’s collect the name, rating, and review count of each result in the Restaurants category.
To extract a restaurant name, you’ll first need to find the corresponding CSS selector with your web browser’s developer tools: open the web page, right-click the element, and select Inspect.
If you inspect a name, you’ll notice it’s wrapped in a <span> inside a <div> with the result-title class. Using this information, you can construct the Beautiful Soup selectors.
name = soup.find('div', {"class": "result-title"}).find('span').get_text(strip=True)
Similarly, to get the rating, inspect the rating bubbles.
The bubbles are a <span> element with the ui_bubble_rating class, and the rating itself is stored in its alt attribute. Use find() to locate the element, then read the alt attribute by key.
rating = soup.find('span', {"class": "ui_bubble_rating"})['alt']
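The alt value is a string, not a number. If you want to sort or average ratings, a small helper like the hypothetical parse_rating() below can convert it. It assumes the alt text follows a pattern such as "4.5 of 5 bubbles"; that format is an assumption, so check the values you actually receive.

```python
import re
from typing import Optional


def parse_rating(alt_text: str) -> Optional[float]:
    """Turn alt text such as "4.5 of 5 bubbles" into 4.5.

    Assumes the rating is the first number in the text (the
    "<rating> of 5 bubbles" format is an assumption). Returns None
    if no number is found.
    """
    match = re.search(r"\d+(?:\.\d+)?", alt_text)
    return float(match.group()) if match else None
```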
The review count is inside the <a> tag with the review_count class, so the code looks like this.
review = soup.find('a', {"class": "review_count"}).get_text(strip=True)
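Likewise, the review count comes back as text. The hypothetical parse_review_count() helper below converts text such as "1,234 reviews" into an integer, assuming the count is the first run of digits, with optional comma thousands separators, in that text.

```python
import re
from typing import Optional


def parse_review_count(text: str) -> Optional[int]:
    """Turn review-count text such as "1,234 reviews" into 1234.

    Assumes the count is the first run of digits (with optional comma
    separators) in the text. Returns None if there are no digits.
    """
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None
```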
NOTE: find() returns only the first matching element, so in all three cases the code extracts data from the first search result only. See the following section for extracting all results.
To extract all the search results, select every result container and loop through them. Each result is wrapped in a <div> with the result class.
Now, update the code to grab all the search results.
data = []
for div in soup.find_all("div", {"class": "result"}):
    name = div.find('div', {"class": "result-title"}).find('span').get_text(strip=True)
    rating = div.find('span', {"class": "ui_bubble_rating"})['alt']
    review = div.find('a', {"class": "review_count"}).get_text(strip=True)
    data.append({
        "name": name,
        "rating": rating,
        "review": review,
    })
The code above extracts all the search results and stores them in the data list.
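If a result lacks one of these elements (for example, a listing with no reviews yet), find() returns None and the loop above stops with an AttributeError or TypeError. The more defensive sketch below, built around hypothetical helpers text_or_none() and extract_results(), records None for missing fields instead.

```python
from bs4 import BeautifulSoup


def text_or_none(element):
    """Return the stripped text of a tag, or None if the tag is missing."""
    return element.get_text(strip=True) if element is not None else None


def extract_results(soup: BeautifulSoup) -> list[dict]:
    """Collect name, rating, and review count from every search result.

    Missing elements become None instead of raising, so one incomplete
    result does not stop the whole run.
    """
    data = []
    for div in soup.find_all("div", {"class": "result"}):
        title = div.find("div", {"class": "result-title"})
        rating_tag = div.find("span", {"class": "ui_bubble_rating"})
        data.append({
            "name": text_or_none(title.find("span")) if title is not None else None,
            # Tag.get() returns None when the alt attribute is absent
            "rating": rating_tag.get("alt") if rating_tag is not None else None,
            "review": text_or_none(div.find("a", {"class": "review_count"})),
        })
    return data
```

With the soup from earlier steps, data = extract_results(soup) produces the same list of dictionaries as the loop above.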
Lastly, load the data into a pandas DataFrame and export it to a CSV file with the to_csv() method.
df = pd.DataFrame(data)
df.to_csv("search_results.csv", index=False)
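If you’d rather not depend on pandas for this step, Python’s built-in csv module can write the same list of dictionaries. This is a swapped-in alternative to the tutorial’s pandas export, sketched with a hypothetical save_csv() helper.

```python
import csv


def save_csv(rows, path="search_results.csv"):
    """Write a list of dicts to CSV using only the standard library.

    Column order follows the keys of the first row. Does nothing when
    `rows` is empty.
    """
    if not rows:
        return
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Values containing commas, such as "1,234 reviews", are quoted automatically, so they stay in a single column.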
Here’s the full source code.
from bs4 import BeautifulSoup
import requests
import pandas as pd
# Replace with your Oxylabs API credentials
credentials = ('USERNAME', 'PASSWORD')
# Tripadvisor search to scrape; replace with your own search URL
url = "https://www.tripadvisor.com/Search?searchSessionId=000a97712c5c1aad.ssid&searchNearby=false&ssrc=e&q=Nearby&sid=6786CB884ED642F4A91E6E9AD932BE131695517577013&blockRedirect=true&geo=1&rf=1"
# Ask the API to render the page and return its HTML
payload = {
    'source': 'universal',
    'render': 'html',
    'url': url,
}
response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=credentials,
    json=payload,
)
print(response.status_code)
# Stop with a clear error instead of failing later on a missing key
response.raise_for_status()
# The page HTML is in the first result of the JSON response
content = response.json()["results"][0]["content"]
soup = BeautifulSoup(content, "html.parser")
# Collect the name, rating, and review count of every search result
data = []
for div in soup.find_all("div", {"class": "result"}):
    name = div.find('div', {"class": "result-title"}).find('span').get_text(strip=True)
    rating = div.find('span', {"class": "ui_bubble_rating"})['alt']
    review = div.find('a', {"class": "review_count"}).get_text(strip=True)
    data.append({
        "name": name,
        "rating": rating,
        "review": review,
    })
# Save the results to a CSV file without the DataFrame index
df = pd.DataFrame(data)
df.to_csv("search_results.csv", index=False)
Pairing Python with Tripadvisor Scraper API lets you scrape Tripadvisor data while avoiding common web scraping challenges. To scale your scraping efforts further, you can buy proxies to improve performance and get past anti-bot measures. Check our technical documentation for all the API parameters and variables mentioned in this tutorial.
Additionally, explore our blog to learn how to scrape data from popular targets like YouTube, Best Buy, Zillow, eBay, Walmart, and many others.
If you have inquiries about the tutorial or web scraping in general, don't hesitate to reach out either by sending a message to support@oxylabs.io or using the live chat.
Scraping publicly available data, including data from Tripadvisor, is generally allowed, but make sure to follow the website’s rules and consider legal differences between jurisdictions. To learn more about the legal side of web scraping, check our dedicated guide.
To scrape data at scale, you can either build and maintain your own web scraping infrastructure using a preferred programming language or outsource an all-in-one solution, such as a scraper API.
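If you go the scraper API route, one simple way to scale within Python is to send several queries concurrently. The sketch below uses a thread pool from the standard library; scrape_many() is a hypothetical helper, fetch stands for any function that takes a URL and returns its content (for instance, one wrapping the requests.post call from this tutorial), and max_workers=5 is an arbitrary default that you should keep within the request rate your plan allows.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_many(urls, fetch, max_workers=5):
    """Fetch several pages concurrently and return {url: result}.

    `urls` should be a list (it is iterated twice). `fetch` is any
    function that takes a URL and returns its content. An exception
    raised by `fetch` propagates when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map() yields results in the same order as `urls`
        results = executor.map(fetch, urls)
        return dict(zip(urls, results))
```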
When using Python’s Beautiful Soup, you need to inspect the page, locate the relevant HTML elements, and target them with CSS selectors to extract their data.
About the author
Augustas Pelakauskas
Senior Copywriter
Augustas Pelakauskas is a Senior Copywriter at Oxylabs. Coming from an artistic background, he is deeply invested in various creative ventures - the most recent one being writing. After testing his abilities in the field of freelance journalism, he transitioned to tech content creation. When at ease, he enjoys sunny outdoors and active recreation. As it turns out, his bicycle is his fourth best friend.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.