How to Scrape Google News: Step-by-Step Guide
Danielius Radavicius
Google News is a personalized news aggregation platform that curates and highlights relevant stories worldwide based on user interests. It compiles news and headlines from various sources, ensuring easy access from any device. An essential feature is "Full Coverage," which delves deeper into stories by presenting diverse perspectives from different outlets and mediums.
In this tutorial, you'll learn how to scrape Google News data using a Google News scraper. By following the steps outlined below, you'll also learn how to mitigate Google News' anti-bot scraping challenges. Before continuing, you may want to check out this article to learn more about news scraping.
Our scraper significantly streamlines your current and future scraping projects and handles the common hassles for you. The Oxylabs Web Scraper API lets you access real-time data and scrape Google News results localized for almost any location, so you don't have to worry about anti-bot systems. Last but not least, Oxylabs provides a 1-week free trial to thoroughly test and develop your scraper and explore all the functionalities of the Google News API.
Alternatively, you may want to use the RSS feed URL to scrape Google News results in a structured format, for example: https://news.google.com/rss/headlines/section/topic/WORLD. Another option is to scrape data from Google Search pages to gather Google News article results. Visit our documentation to learn more.
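If you go the RSS route, here's a minimal sketch that fetches the feed above and prints each article title using only the requests module and Python's standard library; the item/title layout is standard RSS.

import requests
import xml.etree.ElementTree as ET

# Fetch the World headlines RSS feed from Google News.
rss_url = "https://news.google.com/rss/headlines/section/topic/WORLD"
rss_response = requests.get(rss_url, timeout=10)
rss_response.raise_for_status()

# RSS wraps each article in an <item> element with a <title> child.
root = ET.fromstring(rss_response.content)
for item in root.iter("item"):
    print(item.findtext("title"))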
Start a free trial to test our Web Scraper API.
Sign up and log in to the dashboard. From there, you can create and copy your user credentials for the Web Scraper API. They will be needed in later steps.
Install the requests, bs4, and pandas modules:
pip install requests bs4 pandas
Using the pandas Python library, you'll create a CSV file to store the headlines of Google News results. Once the installation is complete, create a new Python file and import the libraries:
import requests
from bs4 import BeautifulSoup
import pandas as pd
Now, let's prepare the payload and credentials for sending API requests and start scraping data. Since Google News relies on JavaScript, set render to html; this tells the Web Scraper API to render the page before returning it. Apart from that, set the source to google and pass the target URL as url. Also, don't forget to replace USERNAME and PASSWORD with your sub-account credentials.
payload = {
    'source': 'google',
    'render': 'html',
    'url': 'https://news.google.com/home',
}
credentials = ('USERNAME', 'PASSWORD')
Next, using the post() method of the requests module, you’ll POST the payload and credentials to the API.
response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=credentials,
    json=payload,
)
print(response.status_code)
If everything works, you should see the status code 200. If you get any other response code, please refer to the documentation.
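If you'd rather fail fast on anything other than a 200, you can let the requests library raise an exception for you, for example:

# Print the message returned by the API, then raise for any non-2xx response.
if response.status_code != 200:
    print(response.text)
response.raise_for_status()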
Before you begin parsing the news headlines, you'll have to locate the target HTML elements using a web browser. Open the Google News home page in a web browser, right-click anywhere on the page, and select Inspect. Alternatively, you can press CTRL + SHIFT + I on Windows or COMMAND + OPTION + I on macOS to open the developer tools. It'll look similar to what's shown below:
Thoroughly inspect the source HTML: the Elements tab shows the tags and attributes of every element on the page. In the screenshot above, you can see that the Top Stories headlines are wrapped in <h4> tags. Use your browser's developer tools to check the source HTML in the same way and plan the parser accordingly.
To parse these headlines, you can use the Beautiful Soup module that you’ve imported in the previous steps. Alternatively, you can utilize the API's Custom Parser feature that's straightforward to use. Let’s create a data list to store all the headlines.
data = []

soup = BeautifulSoup(response.json()["results"][0]["content"], "html.parser")
for headline in soup.find_all("h4"):
    data.append(headline.text)
The find_all() method grabs all the headlines in one go. You can then append them to the data list for exporting to CSV.
Now, let’s store the data in a DataFrame object first. Then, you can export it to a CSV file using the to_csv() method. Set index to False so that the CSV file won’t include an extra index column.
df = pd.DataFrame(data)
df.to_csv("google_news_data.csv", index=False)
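As a quick sanity check, you can read the exported file back with pandas and preview the first few rows (the file name matches the one used above):

# Reload the CSV and print the first few headlines to confirm the export worked.
print(pd.read_csv("google_news_data.csv").head())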
Using Oxylabs web scraping solutions, you can keep up to date with the latest news from Google News. Take advantage of Oxylabs' powerful Scraper API to enhance your overall scraping experience. By using the techniques described in this article, you can harness the power of Google News data without worrying about proxy rotation or anti-bot challenges.
Proxies are essential for block-free web scraping. To resemble organic traffic, you can buy proxy solutions, most notably residential and datacenter IPs.
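As an illustration, here's a minimal sketch of routing requests traffic through a proxy; the endpoint and credentials below are placeholders, not real values.

# Hypothetical proxy endpoint - replace with the details from your proxy provider.
proxies = {
    "http": "http://USERNAME:PASSWORD@proxy.example.com:8080",
    "https": "http://USERNAME:PASSWORD@proxy.example.com:8080",
}
response = requests.get("https://news.google.com/home", proxies=proxies, timeout=10)
print(response.status_code)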
Want to broaden your Google data scraping skills? Take a look at our guides for scraping Jobs, Search, Images, Trends, Scholar, Flights, Shopping, and Maps.
The answer isn’t a simple “yes or no”. Before scraping public data available on Google News, you should consult with legal professionals to make sure your use case and the data you want to scrape don’t violate any laws or regulations.
Extracting publicly available Google News data involves building a custom scraper or utilizing a dedicated web scraping API. The latter is the best option if you want to avoid the hassle of dealing with complex coding, proxy management, headless browsers, and other common web scraping difficulties.
It depends on the data you want to scrape and how you use it. Scraping public web data is generally considered legal when it’s performed without violating any local and international laws and regulations. However, you should always seek legal advice and review any terms before engaging in scraping activities. To learn more on this topic, check out this in-depth article about the legality of web scraping.
To scrape Google News articles effectively, you may want to equip yourself with a web scraping tool that handles blocks, CAPTCHAs, and infrastructure management so you can focus on results. Alternatively, you can create your own web scraper using a preferred programming language, a headless browser (Selenium, Playwright, Puppeteer, or similar), a parser, and rotating proxy servers to overcome IP blocks.
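To illustrate the DIY route, here's a minimal sketch using Playwright's synchronous API (assuming you've run pip install playwright and playwright install chromium); proxy rotation and error handling are left out for brevity.

from playwright.sync_api import sync_playwright

# Render Google News in a headless browser and collect the <h4> headlines.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://news.google.com/home", wait_until="networkidle")
    headlines = page.locator("h4").all_text_contents()
    browser.close()

for headline in headlines:
    print(headline)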
About the author
Danielius Radavicius
Former Copywriter
Danielius Radavičius was a Copywriter at Oxylabs. Having grown up with films, music, and books, and having a keen interest in the defense industry, he decided to move his career toward tech-related subjects and quickly became interested in all things technology. In his free time, you'll probably find Danielius watching films, listening to music, and planning world domination.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.