
Project 4β Supports Large-Scale Research Using Archived Public Web Data

4Beta + University of Birmingham
Narmin Mammadova


Last updated on 2026-06-19

3 min read

AI Summary:

Project 4β partnered with Jagadishen Pragassen Mooneesawmy, a PhD student at the University of Birmingham, to support academic research using archived public web data. With access to Oxylabs’ Web Scraper API and proxy infrastructure, he collected large-scale historical data on corporate websites for his doctoral research on how corporate messaging changes over time.

Oxylabs’ pro bono initiative, Project 4β, has partnered with Jagadishen Mooneesawmy, a PhD student in the Department of Economics at the University of Birmingham. His doctoral research, titled “The Political Economy of Woke Capitalism,” uses historical web data to examine how corporate public communication evolves over time and how changes in external conditions may relate to organizational messaging and business behavior. Project 4β provided technical infrastructure to support data collection and did not participate in the research design, analysis, or conclusions.

A Powerful Tool for Research

Through Project 4β, Jagadishen received access to Oxylabs’ Web Scraper API and proxy infrastructure. These tools enabled large-scale collection of publicly available archived corporate website data from the Internet Archive’s Wayback Machine, producing a rich textual dataset that would have been extremely difficult to assemble manually. Using them, Jagadishen collected 663,840 unique archived corporate website URLs covering 2,302 publicly listed U.S. firms between 2000 and 2024, and assembled a corpus of 500 million words. The data was collected solely for academic research purposes, focused on publicly available corporate website material, and followed the relevant research requirements. According to the researcher:
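The article does not describe the collection pipeline in detail. As a rough illustration of how archived pages can be enumerated, the sketch below builds a query for the Internet Archive’s public CDX server API, which lists Wayback Machine captures of a URL, and turns a JSON response into replayable snapshot URLs. It is a hypothetical example, not the project’s method (which ran through Oxylabs’ Web Scraper API); the domain, the helper names, and the sample response are assumptions, and the code makes no network calls.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain: str, start_year: int, end_year: int) -> str:
    """Build a CDX query listing successful captures of pages under `domain`."""
    params = {
        "url": f"{domain}/*",            # prefix match: every page under the domain
        "output": "json",                # first row of the response is a header
        "from": str(start_year),         # CDX accepts year-prefix timestamps
        "to": str(end_year),
        "fl": "timestamp,original,statuscode",
        "filter": "statuscode:200",      # keep successful captures only
        "collapse": "urlkey",            # one row per unique normalized URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body: str) -> list[str]:
    """Convert a CDX JSON response into Wayback Machine replay URLs."""
    rows = json.loads(body)
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    ts_i = header.index("timestamp")
    url_i = header.index("original")
    return [f"https://web.archive.org/web/{r[ts_i]}/{r[url_i]}" for r in records]
```

For example, parsing the hypothetical response `[["timestamp","original","statuscode"],["20050101000000","http://example.com/about","200"]]` yields a single replay URL, `https://web.archive.org/web/20050101000000/http://example.com/about`. Fetching and cleaning the page text behind each URL would be a separate step.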

Our research uses archived company websites to understand whether firms ‘walk the talk’: how their public rhetoric and internal practices change when they face a political shock. Project 4β made the data collection process very easy to use, even for someone without prior knowledge of web scraping.

Jagadishen Pragassen Mooneesawmy, PhD student in the Department of Economics at the University of Birmingham

The project is based on the idea that a firm’s website can serve as a public reflection of its internal priorities, heuristics, and cultural shifts. By analyzing historical website data, the research explores how corporate messaging changes over time and how these shifts connect to broader business, political, and societal developments.

Examining Changes in Corporate Communications Over Time

One strand of the research examines associations between changes in state-level political conditions and changes in firms’ public communications. To study this, Jagadishen built a Woke Narrative Index using archived corporate website pages from 2,302 publicly listed U.S. firms between 2000 and 2024, collected with Oxylabs’ tools.

Using a regression discontinuity design around closely contested gubernatorial elections, the study compares firms in states where a party narrowly won the governorship with firms in states where it narrowly lost, to estimate how firms adjust their public positioning when political control changes. In the researcher’s analysis, these close election outcomes were associated with measurable changes in corporate public communications, including changes in diversity, equity, and inclusion-related language.
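To make the design concrete, here is a minimal sketch of a sharp regression discontinuity estimate. It assumes the running variable is one party’s vote margin (in percentage points) with a cutoff at zero, and that the outcome is some firm-level change in a communication measure. It fits a separate straight line on each side of the cutoff within a fixed bandwidth (a uniform kernel) and reports the jump between the two fitted values at the cutoff. The function names, bandwidth, and synthetic data are hypothetical and are not the study’s specification; applied work typically uses data-driven bandwidths, triangular kernels, and robust inference, for example via the `rdrobust` package.

```python
from typing import Sequence


def _intercept_at_zero(xs: Sequence[float], ys: Sequence[float]) -> float:
    """OLS fit y = a + b*x; return a, the fitted value at x = 0."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations on each side of the cutoff")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("running variable has no variation on one side")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x


def sharp_rd_estimate(
    margins: Sequence[float],
    outcomes: Sequence[float],
    bandwidth: float,
    cutoff: float = 0.0,
) -> float:
    """Local linear sharp RD estimate with a uniform kernel.

    Observations with margin >= cutoff are treated (the party won);
    only observations within `bandwidth` of the cutoff are used.
    """
    left_x, left_y, right_x, right_y = [], [], [], []
    for m, y in zip(margins, outcomes):
        d = m - cutoff  # center the running variable at the cutoff
        if abs(d) > bandwidth:
            continue  # outside the estimation window
        if d >= 0:
            right_x.append(d)
            right_y.append(y)
        else:
            left_x.append(d)
            left_y.append(y)
    # Jump in the fitted outcome exactly at the cutoff
    return _intercept_at_zero(right_x, right_y) - _intercept_at_zero(left_x, left_y)


# Synthetic data with a true jump of 2.0 at the cutoff
margins = [-20.0, -4.0, -3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0, 4.0, 20.0]
outcomes = [1 + 0.5 * m + (2.0 if m >= 0 else 0.0) for m in margins]
print(round(sharp_rd_estimate(margins, outcomes, bandwidth=5.0), 6))  # 2.0
```

The bandwidth restricts the comparison to elections that were genuinely close, which is what lets narrow winners and narrow losers serve as comparable groups; observations at margins of ±20 are simply ignored here.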

With Oxylabs’ support, this project has become a success. We were able to create a new large-scale historical dataset from corporate websites, a source that remains underexplored in economics, and use it to observe how corporate narratives shift over time.

Jagadishen Pragassen Mooneesawmy, PhD student in the Department of Economics at the University of Birmingham

The research also finds that public-facing rhetoric and internal diversity practices can move in opposite directions. After the same political shock, DEI language on corporate websites increases, while workforce diversity scores decline by 10.0% and board gender diversity declines by 10.6%. In other words, the analysis identified a gap between firms’ public communications and selected organizational indicators following the observed events.

Why Public Web Data Matters

Access to public web data is essential for this kind of academic research because it lets researchers observe real-world corporate behavior as it unfolds. Public web data provides an additional source of longitudinal information that complements traditional datasets and makes it possible to track how public communication changes.

Without scalable web data collection tools, research of this kind would be substantially harder to carry out. Manually collecting decades of archived webpages across thousands of firms would be labor-intensive, error-prone, and difficult to scale. Oxylabs’ infrastructure made that process possible and helped create a comprehensive dataset with meaningful real-world applications.

Partner with Project 4β 

We’re always open to new partnerships where Oxylabs’ web intelligence collection solutions and expertise can help answer critical research questions and advance important missions.

Project 4β supports researchers, academics, journalists, NGOs, and organizations that use public web data to drive positive impact. We provide free access to Oxylabs’ tools so partners can collect and analyze large-scale public web data and focus on what matters most: impact.

If your organization’s work or research could benefit from web intelligence tools, we’d love to hear from you. Reach out to 4beta@oxylabs.io or share your details through our form.

Forget about complex web scraping processes

Choose Oxylabs' advanced web intelligence collection solutions to gather real-time public data hassle-free.

About the author

Narmin Mammadova


PR Content Manager

Narmin is the PR Content Manager for Project 4β at Oxylabs. She enjoys the challenge of getting people to care, and pro bono work gives her good stories to tell. In her spare time, she travels whenever possible or indulges her love of poetry and reciting.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
