Project 4β Impact: How FactCheck.LT Built an AI-Powered Platform to Monitor Disinformation Across Eastern Europe

Narmin Mammadova

Last updated on

2026-06-30

4 min read

Overview

FactCheck.LT, a Vilnius-based research organization focused on media analysis, fact-checking, and countering disinformation, has developed FORESIGHT, an AI-powered multi-agent platform for geopolitical analysis and disinformation detection. As a long-standing partner of Oxylabs’ pro bono initiative Project 4β, the organization monitors one of Europe’s most heavily controlled information spaces: the Belarusian media ecosystem.

Using Oxylabs’ Web Scraper API provided through Project 4β, FactCheck.LT built a systematic data-collection pipeline to gather publicly available content from state-owned and independent media websites, along with video data, search data, and public social media data. The platform now contains over 2.1 million documents from more than 700 sources, making it one of the most comprehensive datasets on the Belarusian information space.

The challenge

Monitoring the Belarusian information space is not a simple research task. It requires following a fragmented ecosystem shaped by censorship, exile, platform migration, and state-controlled messaging.

After the events of 2020, many independent Belarusian media outlets were forced to relocate outside the country. At the same time, state-owned media retained institutional backing and expanded their digital presence across social media platforms. The result was a scattered yet highly active information environment spanning websites, channels, accounts, and formats.

For FactCheck.LT, the challenge was not only to follow individual stories but to understand how narratives are built, repeated, adapted, and amplified across the wider ecosystem. That meant monitoring dozens of outlets, hundreds of video platform channels, and large volumes of short-form and social content over extended periods. Manual collection was insufficient to support that kind of work, and basic scraping tools were unreliable as well.

Our biggest technical challenge was reliably collecting publicly available web content at scale. Many of the sources we monitored involved dynamic, JavaScript-rendered content or required a more stable collection setup than standard tools could provide. Before Oxylabs, our data collection was inconsistent, slow, and incomplete.

Mikhail Doroshevich, Editor-in-Chief and Co-founder of FactCheck.LT

The limitations were practical as much as technical. Consistently collecting public web data across different sources was difficult, especially when some pages relied on dynamic content rendering. Public video transcripts, an important source for analyzing broadcast messaging and propaganda narratives, were also not always readily available through standard APIs. For a small organization, building and maintaining custom collection workflows would have taken valuable time away from the research itself.

The web intelligence behind the research

Through Project 4β, FactCheck.LT integrated Oxylabs’ Web Scraper API into its data collection workflow and built a more stable foundation for long-term research.
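As a rough illustration of what such an integration can look like, the Python sketch below prepares a request for a JavaScript-rendered page and extracts the returned HTML. The endpoint, the `universal` source, the `render` parameter, and the `results`/`content`/`status_code` response fields are assumptions based on Oxylabs' public documentation, not details confirmed in this case study; the helper names and credentials are placeholders, and this is not FactCheck.LT's actual code.

```python
import base64
import json
from urllib import request

# Assumed real-time endpoint; verify against the current Oxylabs documentation.
API_ENDPOINT = "https://realtime.oxylabs.io/v1/queries"


def build_payload(url, render_js=True):
    """Build a job description for one public page (assumed parameter names)."""
    payload = {"source": "universal", "url": url}
    if render_js:
        # Ask the service to render JavaScript before returning the HTML.
        payload["render"] = "html"
    return payload


def build_request(url, username, password):
    """Prepare, but do not send, an authenticated POST request."""
    body = json.dumps(build_payload(url)).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    }
    return request.Request(API_ENDPOINT, data=body, headers=headers, method="POST")


def extract_content(response_json):
    """Keep only the bodies of results that report a successful fetch."""
    return [
        result["content"]
        for result in response_json.get("results", [])
        if result.get("status_code") == 200
    ]
```

Sending the prepared request with `urllib.request.urlopen` and decoding the JSON body would complete the round trip. Keeping payload construction and response parsing in separate functions makes both easy to test without network access.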

This support enabled the team to collect and structure publicly available web data from several parts of the information ecosystem, including:

  • news articles from state and independent media outlets

  • social media video transcripts from state television channels

  • public social media profiles and posts

  • search engine data used to contextualize shifts in public attention

This breadth was essential to the work. Looking at a single article or account in isolation rarely reveals how information campaigns operate. What matters is the ability to compare coverage, detect repetition, identify timing patterns, and place sudden narrative shifts into a broader historical context.
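To make the comparison step concrete, here is a minimal Python sketch of how items from the source types listed above could be folded into one record shape and checked for verbatim repetition across sources. The `Document` schema, its field names, and the `find_repeats` helper are illustrative assumptions, not FORESIGHT's actual data model.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Document:
    """One collected item, whatever its origin (hypothetical schema)."""
    source_id: str        # outlet, channel, or account identifier
    source_type: str      # e.g. "news", "video_transcript", "social_post", "search"
    published_at: datetime
    text: str

    @property
    def fingerprint(self):
        # Hash of case- and whitespace-normalized text, used to spot verbatim repeats.
        normalized = " ".join(self.text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_repeats(docs):
    """Return groups of identical texts that appear in at least two distinct sources."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc.fingerprint, []).append(doc)
    return [
        sorted(group, key=lambda d: d.published_at)  # earliest publication first
        for group in groups.values()
        if len({d.source_id for d in group}) >= 2
    ]
```

Sorting each group by publication time shows which source carried a text first, which is one simple way to look at timing patterns. Exact hashing only catches verbatim copies; detecting paraphrased repetition would need fuzzier matching.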

With a reliable data collection layer in place, we could redirect effort toward what actually matters: analysis, publication, and impact. The difference is between spending days manually collecting data and having a systematic pipeline that automatically feeds our analytical platform. Oxylabs essentially removed the data collection bottleneck.

Mikalai Kvantaliani, Co-founder and Project Manager at FactCheck.LT

Built on top of that data pipeline, FORESIGHT supports multiple types of analysis simultaneously. The platform tracks how propaganda themes rise and fall over time, compares how different media groups cover the same events, detects signs of synchronized behavior across platforms, and supports forecasting based on historical patterns.
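Tracking how themes rise and fall can be pictured with a very small Python sketch that counts matching documents per month. The theme names and keywords below are hypothetical placeholders, and keyword matching is a deliberate simplification; nothing here describes how FORESIGHT actually classifies content.

```python
from collections import Counter
from datetime import date

# Hypothetical theme lexicon; a production system would likely use trained classifiers.
THEME_KEYWORDS = {
    "eu_integration": ("european union", "brussels"),
    "eastern_turn": ("eurasian", "union state"),
}


def monthly_theme_counts(records):
    """Count documents per (month, theme); records are (date, text) pairs."""
    counts = Counter()
    for published, text in records:
        lowered = text.lower()
        month = published.strftime("%Y-%m")
        for theme, keywords in THEME_KEYWORDS.items():
            # A document counts once per theme, however many keywords it contains.
            if any(keyword in lowered for keyword in keywords):
                counts[(month, theme)] += 1
    return counts
```

Plotting these counts month by month for each media group would give a first, coarse view of when a theme gains or loses prominence.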

What the research made possible

With a more dependable collection pipeline in place, FactCheck.LT was able to move from fragmented monitoring to systematic analysis.

The scale alone changed what became possible. The organization built a corpus of more than 2.1 million documents across 700+ sources and expanded video platform monitoring to roughly 1,000 channels. Automated transcript analysis enabled examination of how video messaging was framed, repeated, and adapted over time.

That broader visibility helped reveal not only what was being said, but also how messaging changed over time and how narratives were coordinated across formats.

One example was the way state media framed relations with the European Union. FactCheck.LT identified shifts between rhetoric favoring European integration and messaging emphasizing an Eastern turn, pointing to a pattern of strategic ambiguity rather than a fixed line. This kind of analysis is difficult to produce without both historical depth and large-scale cross-source comparison.

The platform also supported cross-platform coordination analysis. In one study focused on a social media platform, FactCheck.LT identified highly synchronized behavior, providing strong evidence of coordinated inauthentic activity affecting several countries, including Poland, Moldova, Romania, and Hungary.
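For readers wondering how synchronized behavior can translate into the extremely small p-value reported in the key findings, the Python sketch below works through one simple possibility: a binomial tail probability under the assumption that accounts act independently. The model, the function name, and any numbers passed to it are illustrative assumptions; the case study does not describe the statistical test FactCheck.LT used.

```python
import math


def log10_binomial_tail(n, k, p):
    """Return log10 of P(X >= k) for X ~ Binomial(n, p), computed in log space.

    n: number of accounts observed
    k: number of accounts that acted inside the same narrow time window
    p: assumed chance that one independent account acts inside that window
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if k <= 0:
        return 0.0  # P(X >= 0) is 1
    if k > n:
        return float("-inf")  # impossible event
    # Natural log of each binomial term from k to n.
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    # Log-sum-exp keeps the sum stable when every term underflows as a float.
    peak = max(log_terms)
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return log_sum / math.log(10)
```

With made-up inputs such as 200 accounts, 150 of them acting in the same window, and a 1% chance per account, the returned exponent falls far below -80. Working in log space matters here because probabilities this small cannot be represented reliably as ordinary floating-point numbers.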

Key Findings

Metric | Result
Corpus scale | 2.1 million+ documents across 700+ sources
Video platform monitoring | Around 1,000 channels with automated transcript analysis
Coordination detection | Statistical evidence of coordinated behavior at p < 10⁻⁸⁰
Data collection reliability | Improved from roughly 40% with manual tools to 98%+ in one monitored workflow

Why this matters

Disinformation research often receives attention at the point of publication, when a report is released or a network is exposed. Less visible is the infrastructure that makes those findings possible in the first place.

Investigating disinformation requires more than identifying individual examples. It depends on being able to follow patterns over time, compare sources, and detect coordination across platforms. That kind of work is only possible when researchers have reliable access to public web data.

Denas Grybauskas, Chief Governance and Strategy Officer of Oxylabs

In environments like Belarus, where state messaging is systematic and cross-border influence activity affects neighboring EU countries, researchers need more than strong analytical frameworks. They need dependable access to public data. Without that, investigations become partial, slow, and difficult to scale.

That is why this case matters beyond a single organization. FactCheck.LT’s work shows what becomes possible when civil society researchers have access to infrastructure that is usually out of reach for smaller teams. Instead of spending limited time troubleshooting blockers and rebuilding collectors, they can focus on interpretation, publication, and public-interest impact.

The results have already extended beyond internal research. FactCheck.LT’s findings have informed diplomatic briefings across the Baltic and Eastern European region, supported journalism, contributed to broader discussions around foreign information manipulation and interference, and helped train practitioners working in AI and digital security.

For civil society organizations, investigative journalists, and academic researchers working on countering disinformation, reliable data collection is not a luxury. It is a prerequisite. Project 4β makes professional-grade infrastructure accessible to organizations doing critical work on limited budgets.

Mikhail Doroshevich, Editor-in-Chief and Co-founder, FactCheck.LT

This case highlights the role reliable data collection plays in serious disinformation research. Through Project 4β, Oxylabs supports organizations working to monitor information threats, strengthen independent research, and turn public web data into meaningful analysis. To learn more, contact the team via the form or at 4beta@oxylabs.io.

About the author

Narmin Mammadova

PR Content Manager

Narmin is the PR Content Manager for Project 4β at Oxylabs. She enjoys the challenge of getting people to care, and pro bono work gives her good stories to tell. In her spare time, she travels whenever possible or indulges her love of poetry and reciting.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
