
How to Scrape Google News: Step-by-Step Guide

Danielius Radavicius


2023-08-07, 3 min read

Google News is a personalized news aggregator that curates and highlights relevant stories from around the world based on user interests. It compiles articles and headlines from many sources and makes them easy to access from any device. A key feature is "Full Coverage," which explores a story in more depth by presenting perspectives from different outlets and media. In this tutorial, you’ll build a Google News scraper from scratch using Python and learn how to get past the anti-bot measures that make Google News hard to scrape. Before continuing, you may also want to read our article on news scraping for more background.

Scrape Google News using Oxylabs’ SERP API

Our SERP API is designed to streamline your current and future scraping projects by handling the usual hassles for you. It manages everything from gathering real-time data to accessing search results from almost any location, so you don’t have to build your own anti-bot workarounds. Oxylabs also offers a 1-week free trial, so you can test and develop your scraper and explore all of the API’s features.

    Step 1 - sign up for Oxylabs’ SERP API credentials

    Sign up and log in to the dashboard. From there, you can create and copy your user credentials for the SERP API. You’ll need them in later steps.
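Hard-coding credentials in a script makes them easy to leak, for example through version control. As a minimal sketch, you could read them from environment variables instead. The variable names OXYLABS_USERNAME and OXYLABS_PASSWORD and the helper load_credentials are hypothetical choices for illustration, not something Oxylabs requires.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (username, password) for the SERP API from environment variables.

    The variable names are illustrative assumptions; rename them to suit your setup.
    """
    username = env.get("OXYLABS_USERNAME")
    password = env.get("OXYLABS_PASSWORD")
    if not username or not password:
        raise RuntimeError(
            "Set OXYLABS_USERNAME and OXYLABS_PASSWORD before running the scraper"
        )
    return username, password
```

The returned tuple has the same (username, password) shape as the credential variable used later in the tutorial, so it could be assigned with credential = load_credentials().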

    Step 2 - install dependencies

    Install the requests, bs4 (Beautiful Soup), and pandas modules. Using pandas, you’ll create a CSV file to store the headlines of Google News results.

    pip install requests beautifulsoup4 pandas

    Step 3 - send network requests through SERP API

    Now, let’s prepare the payload and credentials for the API requests. Since Google News needs JavaScript rendering, set render to html, which tells the SERP API to render JavaScript before returning the page. Also set source to google and pass the target URL as url. Finally, replace USERNAME and PASSWORD with your sub-account credentials.

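    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    # Ask the SERP API to fetch and render the Google News homepage
    payload = {
        'source': 'google',
        'render': 'html',  # render JavaScript before returning the HTML
        'url': 'https://news.google.com/home',
    }
    credential = ('USERNAME', 'PASSWORD')  # your sub-account credentials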

    Next, using the post() method of the requests module, you’ll POST the payload and credential to the API.

    response = requests.post(
       'https://realtime.oxylabs.io/v1/queries',
       auth=credential,
       json=payload,
    )
    print(response.status_code)

    If everything works, you should see the status code 200. If you get any other response code, refer to the documentation.
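Before parsing, it can help to fail early with a clear message instead of a KeyError deep in the parser. The sketch below checks the API’s HTTP status and then pulls the HTML out of the same results[0]["content"] location the parsing step reads. It assumes that each result may also carry a status_code field for the fetched page; treat that as an assumption and confirm the response fields in the SERP API documentation. The function name extract_content is hypothetical.

```python
from typing import Any, Mapping


def extract_content(api_status: int, body: Mapping[str, Any]) -> str:
    """Return the rendered HTML from a SERP API response, or raise on failure.

    api_status: HTTP status of the API call itself (response.status_code).
    body: the decoded JSON body (response.json()).
    """
    if api_status != 200:
        raise RuntimeError(f"SERP API request failed with HTTP {api_status}")

    results = body.get("results") or []
    if not results:
        raise RuntimeError("SERP API response contained no results")

    first = results[0]
    # Assumption: a result may report the target page's own HTTP status.
    # If the field is absent, treat the page as fetched successfully.
    page_status = first.get("status_code", 200)
    if page_status != 200:
        raise RuntimeError(f"Target page returned HTTP {page_status}")

    return first["content"]
```

With the tutorial’s variables, the call would look like html = extract_content(response.status_code, response.json()).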

    Step 4 - inspect elements

    Before you begin parsing the news headlines, locate the target HTML elements using a web browser. Open the Google News homepage in a web browser, right-click on the page, and select Inspect. Alternatively, press CTRL + SHIFT + I to open the developer tools. It’ll look similar to what is shown below:

    Examine the source HTML carefully: the Elements tab shows the tags and attributes of each element. In the screenshot above, you can see that the Top Stories headlines are wrapped in <h4> tags.

    Step 5 - parse data

    As you’ve already seen, the news headlines are wrapped in <h4> tags. You can use Chrome’s developer tools to check the source HTML and plan the parser accordingly.

    To parse these headlines, use the Beautiful Soup module you imported earlier. First, create a list called data to store all the headlines.

    data = []
    # The rendered HTML is in the "content" field of the first result
    soup = BeautifulSoup(response.json()["results"][0]["content"], "html.parser")
    for headline in soup.find_all("h4"):
        data.append(headline.text.strip())

    The find_all() method grabs every <h4> element in one go, and the loop appends each headline’s text to `data` so it can be exported to CSV.
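Beautiful Soup is what this tutorial uses. If you’d rather avoid the extra dependency, a rough stdlib-only alternative can be built on Python’s html.parser module. This is a swapped-in technique, not part of the original workflow, and like the Beautiful Soup version it assumes the headlines are still in <h4> tags, which Google can change at any time. Unlike headline.text, it also collapses runs of whitespace inside each headline. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class H4TextCollector(HTMLParser):
    """Collect the text of every <h4> element in a document."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.headlines: List[str] = []
        self._inside_h4 = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self._inside_h4 = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "h4" and self._inside_h4:
            self._inside_h4 = False
            # Join text from nested tags (e.g. <a>) and collapse whitespace
            text = " ".join("".join(self._buffer).split())
            if text:  # skip empty headings
                self.headlines.append(text)

    def handle_data(self, data):
        if self._inside_h4:
            self._buffer.append(data)


def extract_h4_text(html: str) -> List[str]:
    """Return the text of all <h4> elements, in document order."""
    parser = H4TextCollector()
    parser.feed(html)
    parser.close()
    return parser.headlines
```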

    Step 6 - export data into CSV

    Now, load the data into a pandas DataFrame. Then export it to a CSV file with the to_csv() method, setting index to False so the file won’t include an extra index column.

    # Name the column so the CSV gets a readable header instead of "0"
    df = pd.DataFrame(data, columns=["headline"])
    df.to_csv("google_news_data.csv", index=False)
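For a single column of headlines, pandas isn’t strictly necessary. As an alternative sketch, Python’s built-in csv module can produce an equivalent file. The helper names and the headline column name are illustrative choices, not part of the original tutorial.

```python
import csv
import io
from typing import Iterable


def headlines_to_csv(headlines: Iterable[str], header: str = "headline") -> str:
    """Render headlines as CSV text with a single named column."""
    buffer = io.StringIO()
    # Use "\n" line endings so the output is identical across platforms
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([header])
    for headline in headlines:
        writer.writerow([headline])  # values with commas or quotes get quoted
    return buffer.getvalue()


def write_headlines_csv(path: str, headlines: Iterable[str]) -> None:
    """Write the headlines to a CSV file at `path`."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(headlines_to_csv(headlines))
```

With the tutorial’s list, this would be write_headlines_csv("google_news_data.csv", data). Splitting out headlines_to_csv keeps the formatting logic testable without touching the filesystem.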

    Conclusion

    With Oxylabs’ web scraping solutions, you can keep up with the latest stories on Google News. Oxylabs’ Scraper API takes care of the heavy lifting in your scraping projects, and with the techniques described in this article, you can put Google News data to work without worrying about proxy rotation or anti-bot challenges.

    About the author

    Danielius Radavicius


    Former Copywriter

    Danielius Radavičius was a Copywriter at Oxylabs. Having grown up in films, music, and books and having a keen interest in the defense industry, he decided to move his career toward tech-related subjects and quickly became interested in all things technology. In his free time, you'll probably find Danielius watching films, listening to music, and planning world domination.

    All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
