How to Scrape Yandex Search Results: A Step-by-Step Guide

Vytenis Kaubrė

2023-03-01 | 5 min read
In this tutorial, you’ll learn how to use Yandex Scraper API to scrape Yandex search results. Before we begin, let’s briefly discuss what Yandex Search Engine Results Pages (SERPs) look like and why it's difficult to scrape them.

Yandex SERP overview 

Like Google, Bing, or any other search engine, Yandex provides a way to search the web. Yandex SERP displays search results based on various factors, including the relevance of the content to the search query, the website's quality and authority, the user's language and location, and other personalized factors. Users can refine their search results by using filters and advanced search options. 

Let's say we searched for the term “iPhone.”

Notice the results page has two different sections: Advertisements on top and organic search results below. The organic search results section includes web pages that are not paid for and are displayed based on their relevance to the search query, as determined by Yandex's search algorithm.

Ads, on the other hand, carry a label such as "Sponsored" or "Advertisement." They are displayed based on the keywords used in the search query and the advertiser's bid for those keywords. The ads usually include basic details, such as the title, the price, and the link to the product on Yandex Market.

The pain points of scraping Yandex

One of the key challenges of scraping Yandex is its CAPTCHA protection.

Yandex has a strict anti-bot system to prevent scrapers from extracting data programmatically from its search engine. If your requests trigger the CAPTCHA frequently, Yandex can block your IP address. Moreover, the anti-bot system is updated constantly, so it's hard to keep up with. This makes scraping SERPs at scale complicated, and raw scripts require frequent maintenance to adapt to the changes.

Fortunately, our Yandex Scraper API is an excellent solution to bypass Yandex’s anti-bot system. The Scraper API can scale on demand by using sophisticated crawling methods and rotating proxies. In the next section, we’ll explore how you can take advantage of it to scrape Yandex using Python. 

Setting up the environment

Begin by downloading and installing Python from the official website. If you already have Python installed, make sure you have the latest version. 

To scrape Yandex, we’ll use two Python libraries: requests and pandas. You can install them using Python’s package manager pip with the following command: 

python -m pip install requests pandas

The requests module will enable you to interact with the API by making network requests, and you’ll be able to store the results using pandas. 

Yandex Scraper API query parameters 

Since the Yandex Scraper API is part of our SERP Scraper API, let’s get to know some query parameters for a smooth start. Essentially, the API supports two different ways of searching Yandex: 

1. Search by URL 

When searching by URL, you must set the source to yandex, and the url should be a valid Yandex URL. You can also tell the API what user agent type to use by adding an extra parameter: user_agent_type. If needed, you can enable JavaScript rendering by using the render parameter. Lastly, you can use the callback_url parameter to specify a URL where the server should send a response after processing the request.
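As a rough illustration of the URL-based method, a payload could look like the sketch below. The search URL and the values 'desktop' and 'html' are assumptions chosen for illustration, so check the documentation for the values that apply to your setup. The callback_url parameter is left out to keep the sketch short.

```python
# Sketch of a payload for the "search by URL" method.
# The URL and the parameter values are illustrative assumptions.
url_payload = {
    'source': 'yandex',  # required when scraping by URL
    'url': 'https://yandex.com/search/?text=web+scraping',  # any valid Yandex URL
    'user_agent_type': 'desktop',  # assumed value
    'render': 'html',  # assumed value for JavaScript rendering
}

print(url_payload)
```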

2. Search by query 

In this tutorial, we’ll use this method. Here, you need to set the source to yandex_search, since you’ll be looking up a term in Yandex search results, and specify the term you want to search for in the query parameter.

The yandex_search source also supports additional parameters such as domain, pages, start_page, limit, locale, and geo_location. The domain parameter selects the Yandex top-level domain (TLD) that the search runs on. For example, if you set it to com, the query is sent to yandex.com. Available domains include com, ru, ua, by, kz, and tr.

The pages parameter sets the number of result pages to retrieve, and start_page sets the page to start from. The limit parameter sets the number of results per page. With geo_location, you can tell the API to use a specific geographical location. Lastly, the locale parameter sets the Accept-Language header, which lets you gather data in a different language. Currently, it supports the following values: en, ru, by, fr, de, id, kk, tt, tr, uk. Visit our documentation to find out more about parameters and their values.
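To make the parameter list more concrete, here is a minimal sketch of a helper that assembles a yandex_search payload. The function name build_search_payload and its defaults are hypothetical and not part of the API; the supported domain and locale values are the ones listed in this article.

```python
# Values listed in this article; the helper itself is a hypothetical convenience.
SUPPORTED_DOMAINS = {'com', 'ru', 'ua', 'by', 'kz', 'tr'}
SUPPORTED_LOCALES = {'en', 'ru', 'by', 'fr', 'de', 'id', 'kk', 'tt', 'tr', 'uk'}


def build_search_payload(query, domain='com', start_page=1, pages=1,
                         limit=None, locale=None, geo_location=None):
    """Build a yandex_search payload, skipping optional parameters left unset."""
    if domain not in SUPPORTED_DOMAINS:
        raise ValueError(f'Unsupported domain: {domain}')
    if locale is not None and locale not in SUPPORTED_LOCALES:
        raise ValueError(f'Unsupported locale: {locale}')

    payload = {
        'source': 'yandex_search',
        'domain': domain,
        'query': query,
        'start_page': start_page,
        'pages': pages,
    }
    # Add optional parameters only when the caller provided them.
    optional = {'limit': limit, 'locale': locale, 'geo_location': geo_location}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


example = build_search_payload('what is web scraping', pages=2, locale='en')
print(example)
```

Validating the domain and locale locally is a design choice that catches typos before a request is sent; the API remains the source of truth for what is accepted.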

Scraping Yandex Search Pages for any keyword

Now that everything’s ready, let’s write a Python script to interact with the Yandex SERP and retrieve results for any keyword.

1. Import required libraries

Start by importing the libraries that you’ve installed in the previous step:

import requests
import pandas as pd

2. Prepare a payload

Next, prepare a payload as shown below:

payload = {
    'source': 'yandex_search',
    'domain': 'com',
    'query': 'what is web scraping',
    'start_page': 1,
    'pages': 5
}

Using the above payload, we’re searching Yandex for the term “what is web scraping.” We’re telling the scraper to run the search on the .com domain and retrieve results from the first to the fifth page.

3. Send a POST request

Next, we need to make a POST request to the Yandex Scraper API. To do that, use the requests library you’ve imported previously:

credentials = ('USERNAME', 'PASSWORD')
response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=credentials,
    json=payload,
)

Note that we have declared a tuple named credentials. For the code to work, you’ll have to replace the USERNAME and PASSWORD with the authentication credentials you’ve received from us. If you don’t have them, you can sign up and get a 1-week free trial.

We use the post() function of the requests library to send the payload to the URL https://realtime.oxylabs.io/v1/queries. We also pass the authentication credentials and the payload as JSON.

Next, let’s print the result with the following line:

print(response.status_code, response.content) 

It’ll print the HTTP status code and the content of the response. A successful Yandex scraping request will return a 200 status code, but if you encounter a different response, we recommend visiting our documentation, where we’ve detailed common response codes.
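If you'd rather fail loudly than print the raw response, the status check can be sketched as a small helper. The name summarize_response is hypothetical, and FakeResponse is only a stand-in for the object returned by requests.post() so that the sketch runs without network access; it relies on the status_code, text, and json() members that requests responses provide.

```python
def summarize_response(response):
    """Return the parsed JSON for a 200 response; raise RuntimeError otherwise."""
    if response.status_code != 200:
        # Keep only the beginning of the body so the error stays readable.
        snippet = response.text[:200]
        raise RuntimeError(
            f'Request failed with status {response.status_code}: {snippet}'
        )
    return response.json()


class FakeResponse:
    """Stand-in for a requests response, used here only for demonstration."""

    def __init__(self, status_code, body):
        self.status_code = status_code
        self._body = body
        self.text = str(body)

    def json(self):
        return self._body


ok = summarize_response(FakeResponse(200, {'results': []}))
print(ok)
```

In the real script, you would pass the response object from the POST request in step 3 instead of FakeResponse.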

4. Export data into a CSV/JSON file

To export the data into a CSV or JSON format, you must first create a data frame:

df = pd.DataFrame(response.json())

With this code, you’re passing the parsed response, returned by the json() method, to the pandas DataFrame constructor. Now, you can export the data frame into JSON as below:

df.to_json("yandex_result.json", orient="records")

Similarly, you can export the results into CSV as well using the following code:

df.to_csv("yandex_result.csv", index=False)

Once you execute the code, the script will create two new files in the current directory with the response results. 
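If you want one row per scraped page rather than the raw response, you could flatten the JSON first. The sketch below assumes the response holds a top-level "results" list whose items carry "page", "url", "status_code", and "content" fields; treat that shape, the sample data, and the results_to_rows name as assumptions to verify against an actual response.

```python
def results_to_rows(data):
    """Turn the parsed API response into a list of flat rows, one per result page."""
    rows = []
    for item in data.get('results', []):
        rows.append({
            'page': item.get('page'),
            'url': item.get('url'),
            'status_code': item.get('status_code'),
            'content': item.get('content'),
        })
    return rows


# Made-up sample that mimics the assumed response shape.
sample = {
    'results': [
        {'page': 1, 'url': 'https://yandex.com/search/?text=test',
         'status_code': 200, 'content': '<html>page 1</html>'},
        {'page': 2, 'url': 'https://yandex.com/search/?text=test&p=1',
         'status_code': 200, 'content': '<html>page 2</html>'},
    ]
}

rows = results_to_rows(sample)
print(len(rows), rows[0]['page'])
```

The resulting list can be passed to pd.DataFrame(rows) before calling to_json() or to_csv() as shown above.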

Conclusion 

While scraping Yandex SERPs is extremely challenging, by following the steps outlined in this article and using the provided Python code, you can easily scrape Yandex search results for any chosen keyword and export the data into a CSV or JSON file. With the help of Yandex Scraper API, you can bypass Yandex's anti-bot measures and scrape SERPs at scale.

If you require assistance or want to know more, feel free to contact us via email or live chat.

About the author

Vytenis Kaubrė

Copywriter

Vytenis Kaubrė is a Copywriter at Oxylabs. His love for creative writing and a growing interest in technology fuels his daily work, where he crafts technical content and web scrapers with Oxylabs’ solutions. Off duty, you might catch him working on personal projects, coding with Python, or jamming on his electric guitar.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
