How to Scrape Bing Search Results using Python
Danielius Radavicius
Like many other search engines, Bing is filled with valuable data, including numerous product listings, images, articles, frequent searches, and more. Among the use cases where scraping Bing can be highly beneficial, one stands out: SERP analysis. Analyzing what makes top-ranking pages perform, whether that's the keywords they choose or how they craft their titles and descriptions, is invaluable. More generally, scraping Bing gives you SEO insights that can inform detailed, data-based decisions.
Overall, scraping is not illegal as long as it is done without breaking any applicable rules or laws surrounding the targeted websites or gathered data. As such, we recommend conducting appropriate legal consultation before engaging in any scraping activities. Once the necessary legal analysis has been done, publicly available Bing search results and terms can be scraped.
If you’ve ever scraped Bing, you’re likely aware of how difficult it is to achieve consistent, successful scraping jobs. Bing is especially good at detecting automated requests, which eventually leads to a ban. Therefore, constantly changing your setup is necessary if you want to keep gathering data without hitting CAPTCHAs.
We will be using Python. If you haven’t installed Python yet, download it from the official website. Once Python is installed, add the following dependencies by executing the command below in a terminal or command prompt:
python -m pip install requests pandas
The above command will install the requests and pandas libraries. We will use these modules to interact with the Web Scraper API and store data.
Before we begin, let’s discuss some of the most useful query parameters of Oxylabs Web Scraper API. The API operates in two modes:
The first method allows you to search using any Bing URL, meaning you will have to pass two required parameters: `url` and `source`. It also takes optional parameters such as `user_agent_type`, `geo_location`, and `callback_url`.
To scrape a Bing SERP with Web Scraper API, the `source` parameter should be set to bing, and the `url` should be a valid Bing website URL. The `user_agent_type` parameter tells the API which device type the user agent should mimic, e.g., desktop. The `geo_location` parameter determines the geographical location of the user making the request.
Finally, the callback_url parameter is used to specify a URL to which the server should send a response after processing the request.
payload = {
    "url": "https://www.bing.com/search?q=tomato",
    "source": "bing",
    "geo_location": "New York,New York,United States",
    "user_agent_type": "desktop",
    # "callback_url": "https://your.callback.url",
}
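To see how this payload fits into a request, here is a minimal sketch of a URL-mode job; the endpoint is the same one used later in this tutorial, and the credentials are placeholders you would replace with your own:

```python
import requests

# Placeholder credentials for the Oxylabs Web Scraper API.
USERNAME = "API_username"
PASSWORD = "API_password"

payload = {
    "url": "https://www.bing.com/search?q=tomato",
    "source": "bing",
    "geo_location": "New York,New York,United States",
    "user_agent_type": "desktop",
}

def scrape_bing_url(payload: dict) -> dict:
    """Send a URL-mode job to the API and return the parsed JSON response."""
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
        timeout=60,
    )
    response.raise_for_status()
    return response.json()

# results = scrape_bing_url(payload)  # requires valid credentials
```

The helper function name here is illustrative; the only fixed pieces are the endpoint, the payload keys, and HTTP basic authentication.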
The second method allows you to scrape Bing search pages using queries. It also requires two parameters: `source` and `query`. The `source` must be set to `bing_search` since we will use a query this time. In the `query` parameter, we specify the terms we want to search for.
In addition to the optional parameters mentioned above, this endpoint also supports additional parameters such as `domain`, `start_page`, `pages`, `limit`, `parse`, and `locale`.
The `domain` parameter allows users to choose a specific TLD to narrow down search results. The `start_page` parameter tells the API which result page to begin with, and the `pages` parameter sets how many pages to retrieve, starting from `start_page`. The `limit` parameter specifies the number of results per page. The `parse` parameter automatically extracts structured data from the raw HTML document. Lastly, the `locale` parameter is used to localize and display results in a specific language.
payload = {
    "query": "tomato",
    "source": "bing_search",
    "geo_location": "New York,New York,United States",
    "user_agent_type": "mobile",
    "locale": "de",
    "start_page": 2,
    "pages": 2,
    "parse": True,
    # "callback_url": "https://your.callback.url",
}
Now, let’s write a Python script to interact with the Web Scraper API. Let’s say we will be searching for the keyword oxylabs proxy. First, we will have to import the necessary libraries:
import requests
import pandas as pd
We imported the requests library to create network requests and pandas to store and export the results later. Next, we need to prepare the payload. We will use the query-based method described above, so we need to specify the necessary job parameters like below:
payload = {
    "source": "bing_search",
    "domain": "io",
    "query": "oxylabs proxy",
    "start_page": 1,
    "pages": 10,
    "parse": True,
}
Next, we will send this payload to the Web Scraper API using the requests module. We will use the post method:
USERNAME = "API_username"
PASSWORD = "API_password"
response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=(USERNAME, PASSWORD),
    json=payload,
)
The post method takes an additional parameter, auth, which accepts a tuple object. In this tuple object, you need to put your API account credentials. The payload is sent as JSON using the json parameter. We also store the result in the response object. Now, let’s print the status code to see whether it is working correctly or not.
print(response.status_code)
If we run this code, it will print 200 if everything works. If you see a different value, make sure you have followed the steps correctly and that the credentials you used are valid. In the unlikely case that you receive an HTTP 500 error message, the API failed to process the request. In case of any errors or questions, you can always contact our support team, available 24/7.
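As a convenience, the status-code checks described above can be collected into a small helper; this is a sketch, and the exact messages are just illustrative hints, not official API error text:

```python
def describe_status(status_code: int) -> str:
    """Map an HTTP status code from the API to a human-readable hint."""
    if status_code == 200:
        return "OK: the job was processed successfully."
    if status_code in (401, 403):
        return "Auth problem: check your API username and password."
    if status_code >= 500:
        return "Server error: the API failed to process the request."
    return f"Unexpected status code: {status_code}"

print(describe_status(200))
```

You could call `describe_status(response.status_code)` right after the POST request to get a quick diagnosis.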
In the next section, we will explore how to extract the data into JSON.
First, we will parse the API response into a Python object and gather the content of each result, as shown below:
data = response.json()["results"]
content_list = []
for result in data:
    content_list.append(result["content"])
The data object contains the search results as a list of Python dictionaries, one per scraped page, which can be used for further processing. We loop over this list to retrieve the scraped content of each page. Let's now export the data into JSON format.
For this purpose, we will use the pandas library as shown below:
df = pd.DataFrame(content_list)
df.to_json("search_results.json", indent=4, orient="records")
This will create a new file named search_results.json in the current directory, which will contain the search results.
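The same DataFrame can just as easily be written to other formats, such as CSV. The sketch below uses a small stand-in for content_list, since the real entries depend on your search and the parsed SERP structure:

```python
import pandas as pd

# Hypothetical stand-in rows; real content_list entries hold parsed SERP data.
content_list = [
    {"url": "https://example.com/a", "title": "Result A"},
    {"url": "https://example.com/b", "title": "Result B"},
]

df = pd.DataFrame(content_list)
df.to_csv("search_results.csv", index=False)
```

This produces a search_results.csv file in the current directory, which is often handier than JSON for spreadsheet tools.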
In this tutorial, we’ve covered why someone may want to scrape Bing, what issues you could encounter during your scraping projects, and how to solve them by setting up Web Scraper API. If you run into any issues or simply have questions, make sure to contact our support team, available 24/7.
About the author
Danielius Radavicius
Former Copywriter
Danielius Radavičius was a Copywriter at Oxylabs. Having grown up in films, music, and books and having a keen interest in the defense industry, he decided to move his career toward tech-related subjects and quickly became interested in all things technology. In his free time, you'll probably find Danielius watching films, listening to music, and planning world domination.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.