
How to Scrape Google News: Step-by-Step Guide


Danielius Radavicius

2025-02-28 · 6 min read

Google News is a personalized news aggregation platform that curates and highlights relevant stories worldwide based on user interests. It compiles news and headlines from various sources, ensuring easy access from any device. An essential feature is "Full Coverage," which delves deeper into stories by presenting diverse perspectives from different outlets and mediums.

In this tutorial, you'll learn how to scrape Google News data in two ways: by writing a custom scraper and by utilizing a ready-made Google News Scraper. By following the steps outlined, you'll also learn how to mitigate the anti-bot challenges that Google News raises. Before continuing, you can also check out this article to learn more about news scraping.

Project requirements

Make sure you have Python installed from the official website. The code samples shown in this blog post are written using Python 3.12.0. You'll also need to install the following libraries:

beautifulsoup4==4.13.3
pandas==2.2.3
requests==2.32.3

You can use pip to install these modules via your terminal:

pip install beautifulsoup4 pandas requests

The requests library simplifies making HTTP calls to Google, while beautifulsoup4 parses the raw HTML to extract your needed data, and pandas provides an easy way to save your results to CSV files.

Scrape Google News using requests and Beautiful Soup

There are a few distinct ways Google offers news results:

  • Through the dedicated Google News website:
    https://news.google.com/home

  • By accessing the News tab on Google search results:
    https://www.google.com/search?q=stock+market&tbm=nws

  • And the RSS feed URL:
    https://news.google.com/rss/headlines/section/topic/WORLD
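As a side note, the RSS feed listed above can be read with requests and Python's standard library alone. Here's a minimal sketch, assuming the feed follows the usual RSS 2.0 item structure (title, link, pubDate):

import requests
import xml.etree.ElementTree as ET

# Fetch the Google News RSS feed for the WORLD topic.
rss = requests.get('https://news.google.com/rss/headlines/section/topic/WORLD')
root = ET.fromstring(rss.content)

# Standard RSS 2.0 items live under channel/item.
for item in root.findall('./channel/item'):
    print(item.findtext('title'), item.findtext('link'), item.findtext('pubDate'))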

For this tutorial, let’s scrape news articles from the search result pages.

1. Send a request

Create a new Python file and import the libraries:

import requests
from bs4 import BeautifulSoup
import pandas as pd

Next, make a GET request to the Google News URL. Once the request returns a response, use the BeautifulSoup instance to prepare the HTML for parsing. 

This approach doesn't use a headless browser, so any dynamic data loaded via JavaScript won't appear in the HTML file. To work around this limitation, either save the scraped HTML document and open it in your browser or use Dev Tools to disable JavaScript when viewing the Google News page. This step ensures your CSS selectors match the actual HTML you're scraping, not the JavaScript-enhanced version.

response = requests.get(
    url='https://www.google.com/search?q=stock+market&tbm=nws'
)

with open('page.html', 'w') as f:
    f.write(response.text)

soup = BeautifulSoup(response.text, 'html.parser')

You can open the saved HTML file in your browser to view the data you're working with. If you don't see the expected search results and instead encounter a policy page, CAPTCHA, or other unexpected content, refer to the next step below.
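As a quick, illustrative check, you can also inspect the status code and scan the HTML for common block markers before parsing (the exact wording and status codes Google uses may vary):

# Illustrative only: Google's block, consent, and CAPTCHA pages differ over time.
if response.status_code != 200 or 'unusual traffic' in response.text.lower():
    print('The response looks blocked or unexpected - see the next step.')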

2. Bypass content blocks

If the content you want to scrape is blocked, your IP address might be the issue. To solve this problem, you can use proxy servers to replace your actual IP address with a proxy’s IP.

Residential Proxies offer superior performance as they utilize IP addresses provided by established Internet Service Providers (ISPs) with excellent online reputation. For this reason, let’s use this proxy type to make requests by modifying the previous code:

USER = 'proxy_username'
PASS = 'proxy_password'

response = requests.get(
    url='https://www.google.com/search?q=stock+market&tbm=nws',
    proxies={
        'http': f'https://customer-{USER}:{PASS}@us-pr.oxylabs.io:10000',
        'https': f'https://customer-{USER}:{PASS}@us-pr.oxylabs.io:10000'
    }
)

Visit our documentation to learn how to use Residential Proxies and see more code examples. Additionally, you may want to use a headless browser such as Selenium or Playwright to make your requests even more resilient to blocks.
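For reference, here's a minimal headless-browser sketch using Selenium. It's not required for this tutorial and assumes Selenium 4+ is installed and can download a matching Chrome driver for you:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

# Run Chrome in headless mode so no browser window opens.
options = Options()
options.add_argument('--headless=new')
driver = webdriver.Chrome(options=options)

driver.get('https://www.google.com/search?q=stock+market&tbm=nws')

# Hand the rendered HTML over to Beautiful Soup for parsing.
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

Keep in mind that a real browser renders JavaScript, so the resulting HTML (and therefore the CSS selectors) may differ from the plain requests response used in the following steps.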

Note: contact our Customer Support Team to enable Google domains for your acquired proxies.

3. Extract news data

Once you’re able to access news headlines, the next step is to create the parsing logic with CSS selectors. The idea is to select all news article cards and then iterate through each card to extract specific data. Let’s start by finding a way to select all news articles on the page.

Select all article cards

Open your browser and either load the saved page.html file or navigate to https://www.google.com/search?q=stock+market&tbm=nws and disable JavaScript rendering (see this tutorial for the Chrome browser). 

Next, open Developer Tools by right-clicking anywhere on the web page and selecting Inspect. Make sure you’re inside the Elements tab (or Inspector) and have enabled element selection by clicking the pointer icon in the top-left corner of the developer tools panel. With this option enabled, you can click on any element on the page, and its corresponding HTML section will be highlighted in the panel. 

Since there are two different types of news cards on the page, the CSS selector we’ll use is div.X7NTVe > a, div.pkphOe > a. Open the search function in the Dev Tools panel by pressing CTRL + F (or Cmd + F on Mac), then paste your CSS selector. This lets you quickly verify the selector works correctly.

Inspecting the different article cards via Dev Tools
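You can also confirm the same selector from Python by counting how many cards it matches:

# Quick sanity check: how many article cards does the selector find?
print(len(soup.select('div.X7NTVe > a, div.pkphOe > a')))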

In your Python file, add these lines of code:

articles = []
for article in soup.select('div.X7NTVe > a, div.pkphOe > a'):

Extract the title

You should find the news article title inside each card’s <h3> element.

Inspecting the title element

With this in mind, you can update the code like so:

articles = []
for article in soup.select('div.X7NTVe > a, div.pkphOe > a'):
    title = article.select_one('h3').text

Extract the link

Next, the article’s link can be parsed from the href attribute.

Inspecting the <a> element that contains the link

Add this additional line inside the for loop:

    href = article.get('href').replace('/url?q=', '')

Extract the source name

You can find the name of the publisher by selecting two different element classes: .aJyiOc, .lRVwie.

Inspecting the element that contains article source name

Include the following line in your code:

    source = article.select_one('.aJyiOc, .lRVwie').text

Extract the time of publication

For all articles, you can find the time of publication by selecting span.r0bn4c.rQMQod.

Inspecting the element that contains time of publication

Your for loop should look like this now:

articles = []
for article in soup.select('div.X7NTVe > a, div.pkphOe > a'):
    title = article.select_one('h3').text
    href = article.get('href').replace('/url?q=', '')
    source = article.select_one('.aJyiOc, .lRVwie').text
    time = article.select_one('span.r0bn4c.rQMQod').text

Next, inside the loop, append all the parsed data for each article to the articles list:

    articles.append({
        'title': title,
        'link': href,
        'source': source,
        'published': time
    })
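Google changes these class names fairly often, so some cards may be missing one of the elements selected above. Purely as an illustration (this variant isn't part of the final sample below), you can make the loop more defensive and skip cards that don't match the expected structure:

articles = []
for article in soup.select('div.X7NTVe > a, div.pkphOe > a'):
    title_el = article.select_one('h3')
    source_el = article.select_one('.aJyiOc, .lRVwie')
    time_el = article.select_one('span.r0bn4c.rQMQod')

    # Skip cards that don't match the expected structure instead of raising an error.
    if not (title_el and source_el and time_el):
        continue

    articles.append({
        'title': title_el.text,
        'link': article.get('href', '').replace('/url?q=', ''),
        'source': source_el.text,
        'published': time_el.text
    })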

4. Save parsed articles to CSV

Finally, you can save all the extracted news articles to a CSV file using pandas:

df = pd.DataFrame(articles)
df.to_csv('news_1.csv', index=False)

Full code sample

The final version of your code should be:

import requests
from bs4 import BeautifulSoup
import pandas as pd


USER = 'proxy_username'
PASS = 'proxy_password'

response = requests.get(
    url='https://www.google.com/search?q=stock+market&tbm=nws',
    proxies={
        'http': f'https://customer-{USER}:{PASS}@us-pr.oxylabs.io:10000',
        'https': f'https://customer-{USER}:{PASS}@us-pr.oxylabs.io:10000'
    }
)

soup = BeautifulSoup(response.text, 'html.parser')
with open('page.html', 'w') as f:
    f.write(response.text)

articles = []
for article in soup.select('div.X7NTVe > a, div.pkphOe > a'):
    title = article.select_one('h3').text
    href = article.get('href').replace('/url?q=', '')
    source = article.select_one('.aJyiOc, .lRVwie').text
    time = article.select_one('span.r0bn4c.rQMQod').text

    articles.append({
        'title': title,
        'link': href,
        'source': source,
        'published': time
    })

df = pd.DataFrame(articles)
df.to_csv('news_1.csv', index=False)

After executing this custom news scraper, you’ll have all the articles neatly scraped into a CSV file, which can be opened in Excel, Google Sheets, or any other program that supports CSV:

Scraped output example

Scrape Google News using Oxylabs’ Web API

Our Google News scraper streamlines your current and future scraping projects and deals with the usual hassles for you. The Oxylabs Web API lets you access real-time data and scrape Google News results localized for almost any location. On top of that, a single purchase gives you access to multiple ready-made scrapers, including Google SERP, Amazon, and others, so you don't have to worry about anti-scraping measures.

Oxylabs also provides a 1-week free trial to thoroughly test and develop your scraper and explore all the functionalities of the Google News API. Visit our documentation to learn more.

1. Claim your API credentials

Sign up and log in to the dashboard. From there, you can create and grab your user credentials for the Web API. They will be needed in later steps.

2. Import the libraries

Create a new Python file and import the modules:

import requests
import pandas as pd

3. Send a request to API

Let's prepare the payload dictionary and credentials to send API requests and start scraping data. First, replace the USERNAME and PASSWORD with your sub-account credentials. 

credentials = ('USERNAME', 'PASSWORD')

What’s neat about Web Scraper API is that it automatically parses the data when you set the parse parameter to True, so you don’t have to inspect the HTML yourself. Additionally, you can easily scrape multiple pages by utilizing the pages parameter. To scrape Google News, set the following parameters:

payload = {
    'source': 'google_search',    # Define Google Search as source.
    'query': 'stock market',      # Your search query.
    'pages': '5',                 # Number of pages to scrape.
    'parse': True,                # Enable automatic data parsing.
    'context': [
        {'key': 'tbm', 'value': 'nws'},  # Enable News results.
    ]
}

You can also enable JavaScript rendering by setting the render parameter to html if needed. Next, using the post() method of the requests module, POST the payload and credentials to the API.

response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=credentials,
    json=payload,
)

print(response.json())

If everything works, you should see the status code 200 in the JSON response. If you get any other response code, please refer to the documentation.
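For example, a small, illustrative guard before moving on could look like this:

# Illustrative check; see the documentation for the full list of response codes.
if response.status_code != 200:
    raise RuntimeError(f'Request failed with status {response.status_code}: {response.text}')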

4. Export data into CSV

Finally, you can save the parsed news results to a file. It’s a good idea to first clean up the JSON response by extracting only the relevant information:

data = [item['content'] for item in response.json()['results']]

all_news = []
for page_data in data:
    page_num = page_data['page']
    for news in page_data['results']['main']:
        news['page'] = page_num
        all_news.append(news)

This will ensure that the results aren’t nested and include only the page number and news data from each page. 

Next, let’s store the all_news list into a data frame object. Then, you can export it to a CSV file using the to_csv() method. You can also set the index to False so that the CSV won’t include an extra index column.

df = pd.DataFrame(all_news)
df.to_csv('news_2.csv', index=False)

Full API code sample

Here’s the complete API code sample:

import requests
import pandas as pd


credentials = ('USERNAME', 'PASSWORD')

payload = {
    'source': 'google_search',
    'query': 'stock market',
    'pages': '5',
    'parse': True,
    'context': [
        {'key': 'tbm', 'value': 'nws'},
    ]
}

response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=credentials,
    json=payload,
)

print(response.json())

data = [item['content'] for item in response.json()['results']]

all_news = []
for page_data in data:
    page_num = page_data['page']
    for news in page_data['results']['main']:
        news['page'] = page_num
        all_news.append(news)

df = pd.DataFrame(all_news)
df.to_csv('news_2.csv', index=False)

Running the code will produce a CSV file that will present data like so:

Scraped data using API output sample

Using other tools

When scraping Google News or other challenging targets, selecting the appropriate technique is essential. The following table provides a comparison of different approaches:

| Scraping method | Success rate | Handling blocks | Speed | Ease of use | Maintenance effort |
|---|---|---|---|---|---|
| No proxies | Low | Frequent IP bans | Fast | Simple | High – needs manual fixes due to blocks |
| With proxies | Medium | Better, requires IP rotation | Moderate | Moderate | Low – may need proxy management if not provided by the proxy provider |
| Headless browser | Medium | Can handle some blocks but may be detected | Slow | Complex | High – requires CAPTCHA handling and anti-scraping evasion |
| Web Scraper API | High | Bypasses anti-scraping systems | Fast | Easy | Low – no need for manual adjustments |

Conclusion

Using Oxylabs web scraping solutions, you can keep up to date with the latest news from Google News. Take advantage of Oxylabs' powerful Scraper API to enhance your overall scraping experience. By using the techniques described in this article, you can harness the power of Google News data without worrying about proxy rotation or anti-bot challenges.

Proxies are essential for block-free web scraping. To resemble organic traffic, you can buy proxy solutions, most notably residential proxies and datacenter IPs, or get a reliable free proxy server.

Want to broaden your Google data scraping skills? Take a look at our guides for scraping Jobs, Search, Images, Trends, Scholar, Flights, Shopping, and Maps.

Frequently Asked Questions

Can Google News be scraped?

The answer isn’t a simple “yes” or “no”. Before scraping public data available on Google News, you should consult with legal professionals to make sure your use case and the data you want to scrape don’t violate any laws or regulations.

How to extract data from Google News?

Scraping Google News data involves building a custom scraper or utilizing a dedicated web scraping API. The latter is the best option if you want to avoid the hassle of dealing with complex coding, proxy management, headless browsers, and other common web scraping difficulties.

Is scraping Google news legal?

It depends on the data you want to scrape and how you use it. Scraping public web data is generally considered legal when it’s performed without violating any local or international laws and regulations. However, you should always seek legal advice and review the relevant terms before engaging in scraping activities. To learn more on this topic, check out this in-depth article on whether web scraping is legal.

What tools or libraries can be used to scrape Google News articles effectively?

To scrape Google News articles effectively, you may want to equip yourself with a web scraping tool that handles blocks, CAPTCHAs, and infrastructure management so you can focus on results. Alternatively, you can create your own web scraper using a preferred programming language, a headless browser (Selenium, Playwright, Puppeteer, or similar), a parser, and rotating HTTP proxies to overcome IP blocks.

About the author


Danielius Radavicius

Former Copywriter

Danielius Radavičius was a Copywriter at Oxylabs. Having grown up in films, music, and books and having a keen interest in the defense industry, he decided to move his career toward tech-related subjects and quickly became interested in all things technology. In his free time, you'll probably find Danielius watching films, listening to music, and planning world domination.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
