How to Scrape Yandex Search Results: A Step-by-Step Guide

Vytenis Kaubrė

Last updated on 2025-04-29

5 min read

In this tutorial, you’ll learn how to build a custom Yandex scraper with proxies and how to use Web Scraper API to scrape Yandex search results. Before we begin, let’s briefly discuss what Yandex Search Engine Results Pages (SERPs) look like, why they're difficult to scrape, and how proxy servers can help overcome these challenges.

Yandex SERP overview

Like Google, Bing, or any other search engine, Yandex provides a way to search the web. The Yandex SERP ranks search results based on various factors, including the relevance of the content to the search query, the website's quality and authority, the user's language and location, and other personalized factors. Users can refine their search results by using filters and advanced search options.

Let's say you search for the term “iPhone.”

The results page has two different sections: advertisements at the top and organic search results below. The organic search results section includes web pages that are not paid for and are displayed based on their relevance to the search query, as determined by Yandex's search algorithm.

Ads, on the other hand, can be identified by a label, such as "Sponsored" or "Advertisement." They are displayed based on the keywords used in the search query and the advertiser's bid for those keywords. The ads usually include basic details, such as the title, the price, and the link to the product on Yandex Market.

The pain points of scraping Yandex

One of the key challenges of scraping Yandex is its CAPTCHA protection.

Yandex has a strict anti-bot system to prevent scrapers from programmatically extracting data from its search pages. It can block your IP address if the CAPTCHA is triggered frequently. Moreover, Yandex constantly updates the anti-bot system, which is tough to keep up with. This makes scraping SERPs at scale complicated, and hand-written scripts require frequent maintenance to adapt to the changes.

Fortunately, our Web Scraper API is an excellent solution to bypass Yandex’s anti-bot system. Web Scraper API can scale on demand by using sophisticated crawling methods and rotating proxy solutions. In the next sections, we’ll explore how you can take advantage of it to scrape Yandex search engine results using Python.
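Whichever approach you choose, it helps to notice when a response is a CAPTCHA page rather than a results page. The function below is a heuristic sketch only: the `showcaptcha` URL marker, the `serp-item` check, and the name `looks_like_captcha` are assumptions made for illustration, not documented Yandex behavior, so adjust them to what you actually observe.

```python
# Assumed marker: blocked requests are expected to land on a URL like this.
CAPTCHA_URL_MARKERS = ('showcaptcha',)


def looks_like_captcha(final_url, html):
    """Return True when a response appears to be a CAPTCHA challenge."""
    url = final_url.lower()
    if any(marker in url for marker in CAPTCHA_URL_MARKERS):
        return True
    # Fallback heuristic: no result cards, but the page mentions a CAPTCHA.
    return 'serp-item' not in html and 'captcha' in html.lower()
```

With the requests library, you could call it as `looks_like_captcha(response.url, response.text)` and skip parsing when it returns True.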


Set up the environment

Begin by downloading and installing Python from the official website. If you already have Python installed, make sure you have the latest version.

To scrape Yandex, we’ll use three Python libraries: requests, Beautiful Soup, and pandas. You can install them using Python’s package manager pip with the following command:

pip install requests pandas beautifulsoup4

The requests module will enable you to make network requests, the Beautiful Soup library will help you extract specific data, and pandas will let you store the results in a CSV file.

How to scrape Yandex using proxies

In this section, you’ll learn how to scrape Yandex search data by building a simple scraper that utilizes Residential Proxies to overcome CAPTCHAs and IP blocks.

1. Set up proxies and request headers

In a new Python file, import the necessary libraries:

import requests
from bs4 import BeautifulSoup
import pandas as pd

Next, create a proxies dictionary that we’ll use to route requests through:

USERNAME = 'PROXY_USERNAME'
PASSWORD = 'PROXY_PASSWORD'

proxies = {
    'http': f'https://{USERNAME}:{PASSWORD}@pr.oxylabs.io:7777',
    'https': f'https://{USERNAME}:{PASSWORD}@pr.oxylabs.io:7777'
}
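If you'd rather not hard-code credentials, the same dictionary can be built from environment variables. This is a minimal sketch, assuming you export `PROXY_USERNAME` and `PROXY_PASSWORD` yourself; the helper name `build_proxies` is hypothetical. It also percent-encodes the credentials so that characters like `@` or `:` don't break the proxy URL.

```python
import os
from urllib.parse import quote


def build_proxies(username, password, host='pr.oxylabs.io', port=7777):
    """Build a requests-style proxies dictionary for a single proxy endpoint."""
    # Percent-encode credentials so special characters stay inside the URL.
    user = quote(username, safe='')
    pwd = quote(password, safe='')
    proxy_url = f'https://{user}:{pwd}@{host}:{port}'
    return {'http': proxy_url, 'https': proxy_url}


# Fall back to the tutorial's placeholders when the variables are not set.
proxies = build_proxies(
    os.environ.get('PROXY_USERNAME', 'PROXY_USERNAME'),
    os.environ.get('PROXY_PASSWORD', 'PROXY_PASSWORD'),
)
```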

It’s essential to make HTTP requests look like they come from a real web browser. So, let’s create a basic HTTP headers dictionary:

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:137.0) '
                  'Gecko/20100101 Firefox/137.0',
    'Accept': 'text/html,application/xhtml+xml,'
              'application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9,ru;q=0.8',
    'Connection': 'keep-alive'
}

2. Send a GET request

Send a GET request to your desired Yandex SERP URL and make sure to use the proxies and headers dictionaries:

response = requests.get(
    'https://yandex.com/search/?text=what%20is%20web%20scraping',
    proxies=proxies,
    headers=headers
)
response.raise_for_status()
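The URL above has the query encoded by hand. To search for arbitrary keywords, you can let the standard library do the encoding. The sketch below uses a hypothetical helper, `build_search_url`; the optional `p` parameter is an assumption about how Yandex paginates (a zero-based page index), so verify it against real result URLs before relying on it.

```python
from urllib.parse import quote, urlencode

BASE_URL = 'https://yandex.com/search/'


def build_search_url(query, page=None):
    """Return a Yandex search URL with the query percent-encoded."""
    params = {'text': query}
    if page is not None:
        # Assumption: 'p' is a zero-based page index.
        params['p'] = page
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f'{BASE_URL}?{urlencode(params, quote_via=quote)}'
```

The resulting string can be passed to `requests.get()` in place of the hard-coded URL.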

3. Parse Yandex search results

After getting a response back, use the BeautifulSoup class to read the raw HTML document:

soup = BeautifulSoup(response.text, 'html.parser')

After that, you can iterate through each search result card and extract the data you need. For instance, a great starting point is to retrieve the titles and links:

data = []
for listing in soup.select('li.serp-item_card'):
    title_el = listing.select_one('h2 > span')
    title = title_el.text if title_el else None
    link_el = listing.select_one('.organic__url')
    link = link_el.get('href') if link_el else None

    data.append({'Title': title, 'Link': link})

After extracting the title and link from an individual listing, the code appends the result to the data list.
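Because the selectors fall back to None when an element is missing, the list may contain rows without a link, and the same link may appear more than once. A small clean-up step, sketched here with a hypothetical `clean_results` helper, drops such rows before saving:

```python
def clean_results(rows):
    """Drop rows without a link and keep only the first row per link."""
    seen = set()
    cleaned = []
    for row in rows:
        link = row.get('Link')
        if not link or link in seen:
            continue
        seen.add(link)
        # Trim whitespace; an empty or missing title stays None.
        title = (row.get('Title') or '').strip() or None
        cleaned.append({'Title': title, 'Link': link})
    return cleaned
```

Calling `data = clean_results(data)` before the export keeps the CSV free of empty and duplicate rows.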

4. Save results to a CSV file

It’s time to use the pandas library to store the scraped data in a file. You may save the data to any format that’s useful to you, but for this tutorial, let’s stick to CSV:

df = pd.DataFrame(data)
df.to_csv('yandex_results.csv', index=False)
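If pandas is more than you need for a two-column export, Python's built-in csv module can produce the same kind of file. The following is a sketch rather than a replacement for the step above; `rows_to_csv` is a hypothetical helper that returns the CSV text so you can inspect it or write it to disk.

```python
import csv
import io


def rows_to_csv(rows, fieldnames=('Title', 'Link')):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO(newline='')
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `open('yandex_results.csv', 'w', newline='', encoding='utf-8').write(rows_to_csv(data))` saves the output to a file.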

Full Yandex scraper code with proxies

import requests
from bs4 import BeautifulSoup
import pandas as pd


USERNAME = 'PROXY_USERNAME'
PASSWORD = 'PROXY_PASSWORD'

proxies = {
    'http': f'https://{USERNAME}:{PASSWORD}@pr.oxylabs.io:7777',
    'https': f'https://{USERNAME}:{PASSWORD}@pr.oxylabs.io:7777'
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:137.0) '
                  'Gecko/20100101 Firefox/137.0',
    'Accept': 'text/html,application/xhtml+xml,'
              'application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9,ru;q=0.8',
    'Connection': 'keep-alive'
}

response = requests.get(
    'https://yandex.com/search/?text=what%20is%20web%20scraping',
    proxies=proxies,
    headers=headers
)
response.raise_for_status()

soup = BeautifulSoup(response.text, 'html.parser')

data = []
for listing in soup.select('li.serp-item_card'):
    title_el = listing.select_one('h2 > span')
    title = title_el.text if title_el else None
    link_el = listing.select_one('.organic__url')
    link = link_el.get('href') if link_el else None

    data.append({'Title': title, 'Link': link})

df = pd.DataFrame(data)
df.to_csv('yandex_results.csv', index=False)

Running the code will produce a yandex_results.csv file with one row per search result and two columns, Title and Link.

Figure: Scraped Yandex search results in a CSV file

How to scrape Yandex using Web Scraper API

Building your own web scraping tool can become burdensome, especially when you want to scale your data scraping processes. That’s where Oxylabs’ robust web scraping infrastructure comes in handy, allowing you to scrape thousands and even millions of Yandex pages without worrying about scaling, IP blocks, CAPTCHAs, and other hurdles.

Web Scraper API boasts plenty of features, including built-in proxy servers as well as dedicated scrapers and parsers for popular targets such as Google, Bing, Amazon, and more. Take a look at our documentation for a smooth start.

1. Prepare a request payload

Begin by importing the requests and pandas libraries:

import requests
import pandas as pd

Next, create a payload dictionary that will provide all the search parameters to the API required to scrape Yandex data:

payload = {
    'source': 'universal',
    'url': 'https://yandex.com/search/?text=what%20is%20web%20scraping',
}

You can also add more parameters to set a specific geo-location, enable JavaScript rendering, and more. Check out the supported API parameters for additional details.
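As an illustration, a payload with extra parameters could be assembled like this. Treat it as a sketch under assumptions: the `geo_location` and `render` parameter names and their values are written from memory of the API documentation and should be confirmed there, and `build_payload` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def build_payload(query, geo_location=None, render_js=False):
    """Build a Web Scraper API payload for a Yandex search query."""
    url = 'https://yandex.com/search/?' + urlencode(
        {'text': query}, quote_via=quote
    )
    payload = {'source': 'universal', 'url': url}
    if geo_location:
        # Assumed parameter name; check the supported API parameters.
        payload['geo_location'] = geo_location
    if render_js:
        # Assumed parameter and value for JavaScript rendering.
        payload['render'] = 'html'
    return payload
```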

2. Parse the data

Web Scraper API allows you to define your own parsing logic through the Custom Parser feature. So, let’s modify the payload dictionary with parsing_instructions:

payload = {
    'source': 'universal',
    'url': 'https://yandex.com/search/?text=what%20is%20web%20scraping',
    'parse': True,
    'parsing_instructions': {
        'listings': {
            '_fns': [{'_fn': 'css', '_args': ['li.serp-item_card']}],
            '_items': {
                'title': {
                    '_fns': [
                        {'_fn': 'css_one', '_args': ['h2 > span']},
                        {'_fn': 'element_text'}
                    ]
                },
                'link': {
                    '_fns': [
                        {
                            '_fn': 'xpath_one',
                            '_args': [
                                './/a[contains(@class, "organic__url")]/@href'
                            ]
                        }
                    ]
                }
            }
        }
    }
}

Custom Parser supports both CSS and XPath selectors. Hence, you can easily extract the result link from the href attribute using XPath. You can also ease the process of writing your own parsing logic by generating a Yandex parser with our AI-powered OxyCopilot.
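The nested dictionaries get verbose quickly, so you may prefer to assemble them with small helper functions. The sketch below only uses the function names already shown in the payload (`css`, `css_one`, `element_text`, `xpath_one`); the helper names themselves are hypothetical.

```python
def css_text(selector):
    """Instruction: take the first CSS match and return its text."""
    return {
        '_fns': [
            {'_fn': 'css_one', '_args': [selector]},
            {'_fn': 'element_text'},
        ]
    }


def xpath_value(expression):
    """Instruction: return the first value matched by an XPath expression."""
    return {'_fns': [{'_fn': 'xpath_one', '_args': [expression]}]}


def listing_instructions(card_selector, fields):
    """Wrap per-item field instructions in a 'listings' block."""
    return {
        'listings': {
            '_fns': [{'_fn': 'css', '_args': [card_selector]}],
            '_items': fields,
        }
    }


parsing_instructions = listing_instructions(
    'li.serp-item_card',
    {
        'title': css_text('h2 > span'),
        'link': xpath_value('.//a[contains(@class, "organic__url")]/@href'),
    },
)
```

The resulting `parsing_instructions` dictionary is identical to the one written out by hand above and can be assigned to `payload['parsing_instructions']`.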

3. Send a POST request

Next, make a POST request to Web Scraper API and send the configured payload for processing:

response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=('API_USERNAME', 'API_PASSWORD'),
    json=payload
)
response.raise_for_status()

Make sure to replace the API_USERNAME and API_PASSWORD with the API user credentials you’ve created in the Oxylabs dashboard.

4. Export data to a CSV file

To save the data to a CSV format, you must first access the results from the API’s response:

data = response.json()['results'][0]['content']['listings']
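The one-liner above raises an exception if any of the expected keys is missing. A more defensive variant, sketched here with a hypothetical `extract_listings` helper, returns an empty list instead. It assumes only the response shape used in this tutorial (`results`, then `content`, then `listings`).

```python
def extract_listings(response_json):
    """Return the parsed listings, or an empty list if they are missing."""
    results = response_json.get('results') or []
    if not results:
        return []
    content = results[0].get('content')
    # Content that is not a dictionary means parsing produced no structure.
    if not isinstance(content, dict):
        return []
    listings = content.get('listings')
    return listings if isinstance(listings, list) else []
```

You would then write `data = extract_listings(response.json())` and check whether the list is empty before exporting.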

Finally, create a data frame and export the search results to CSV:

df = pd.DataFrame(data)
df.to_csv('yandex_results_API.csv', index=False)

Complete Yandex API scraping code

import requests
import pandas as pd


payload = {
    'source': 'universal',
    'url': 'https://yandex.com/search/?text=what%20is%20web%20scraping',
    'parse': True,
    'parsing_instructions': {
        'listings': {
            '_fns': [{'_fn': 'css', '_args': ['li.serp-item_card']}],
            '_items': {
                'title': {
                    '_fns': [
                        {'_fn': 'css_one', '_args': ['h2 > span']},
                        {'_fn': 'element_text'}
                    ]
                },
                'link': {
                    '_fns': [
                        {
                            '_fn': 'xpath_one',
                            '_args': [
                                './/a[contains(@class, "organic__url")]/@href'
                            ]
                        }
                    ]
                }
            }
        }
    }
}

response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=('API_USERNAME', 'API_PASSWORD'),
    json=payload
)
response.raise_for_status()

data = response.json()['results'][0]['content']['listings']

df = pd.DataFrame(data)
df.to_csv('yandex_results_API.csv', index=False)

Running the above code will output a yandex_results_API.csv file with a title and a link column for each parsed listing.

Comparing different scraping methods

Approach	Advantages	Disadvantages
No proxies	Straightforward implementation, zero proxy-related expenses	Frequent IP blocking and CAPTCHA challenges, unable to access geo-restricted content, poor performance at larger scales
With proxies	Significantly reduces blocking risks, enables access to location-specific content, enhanced performance for high-volume scraping	Additional proxy service costs, requires managing proxy infrastructure (unless handled by provider)
Using a scraping API	Automatic IP rotation and CAPTCHA bypass, enterprise-grade scalability, browser emulation capabilities, rapid development and deployment	Recurring subscription fees, vendor lock-in concerns, may have constraints on certain data extraction scenarios
Custom solutions (Selenium, etc.)	Complete customization possibilities, particularly effective for JavaScript-heavy websites, no ongoing costs with self-hosted infrastructure	Requires substantial technical expertise, developer must implement anti-blocking strategies, often slower performance than specialized solutions

Conclusion

Scraping Yandex SERPs is challenging, but by following the steps outlined in this article and using the provided Python code, you can scrape Yandex organic results for any chosen keyword and export the data into a CSV file. With the help of Web Scraper API, residential proxies, or a reliable free proxy list, you can get past Yandex's anti-bot measures and scrape real-time search data at scale. If you need an even more robust setup, you can buy proxy services to further improve your scraping efficiency.

If you require assistance or want to know more, feel free to contact us via email or live chat.

About the author

Vytenis Kaubrė

Technical Content Researcher

Vytenis Kaubrė is a Technical Content Researcher at Oxylabs. Creative writing and a growing interest in technology fuel his daily work, where he researches and crafts technical content, all the while honing his skills in Python. Off duty, you may catch him working on personal projects, learning all things cybersecurity, or relaxing with a book.


All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.