

Narmin Mammadova
Last updated on 2026-06-30 · 4 min read
FactCheck.LT, a Vilnius-based research organization focused on media analysis, fact-checking, and countering disinformation, has developed FORESIGHT, an AI-powered multi-agent platform for geopolitical analysis and disinformation detection. As a long-standing partner of Oxylabs' pro bono initiative Project 4β, the organization monitors one of Europe's most heavily controlled information spaces: the Belarusian media ecosystem.
Using Oxylabs' Web Scraper API, provided through Project 4β, FactCheck.LT built a systematic data-collection pipeline to gather publicly available content from state-owned and independent media websites, video platforms, search engines, and social media. The platform now contains over 2.1 million documents from more than 700 sources, making it one of the most comprehensive datasets on the Belarusian information space.
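For readers unfamiliar with the service, a single collection request might look roughly like the sketch below. It assumes the real-time endpoint (`https://realtime.oxylabs.io/v1/queries`), the `universal` source, and the `render` parameter for JavaScript-heavy pages as described in Oxylabs' public documentation at the time of writing; check those against the current docs. The helper names, credentials, and target URL are hypothetical, and the case study does not describe FactCheck.LT's actual pipeline configuration.

```python
import base64
import json
import urllib.request

# Assumed real-time endpoint of Oxylabs' Web Scraper API; verify against current docs.
ENDPOINT = "https://realtime.oxylabs.io/v1/queries"


def build_payload(url: str, render_js: bool = True) -> dict:
    """Build the JSON body for one public page (hypothetical helper)."""
    payload = {"source": "universal", "url": url}
    if render_js:
        # Ask the service to execute JavaScript before returning the HTML.
        payload["render"] = "html"
    return payload


def fetch_page(url: str, username: str, password: str, timeout: float = 180.0) -> str:
    """POST one request with HTTP Basic auth and return the first result's content."""
    body = json.dumps(build_payload(url)).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    # The response is assumed to wrap page content in a "results" list.
    return data["results"][0]["content"]


if __name__ == "__main__":
    # Print the request body only; calling fetch_page needs real credentials.
    print(build_payload("https://example.com/news"))
```

With valid credentials, `fetch_page("https://example.com/news", "USERNAME", "PASSWORD")` would return the rendered HTML, which a downstream parser could then turn into structured documents.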
Monitoring the Belarusian information space is not a simple research task. It requires following a fragmented ecosystem shaped by censorship, exile, platform migration, and state-controlled messaging.
After the events of 2020, many independent Belarusian media outlets were forced to relocate outside the country. At the same time, state-owned media retained institutional backing and expanded their digital presence across social media platforms. The result was a scattered yet highly active information environment spanning websites, channels, accounts, and formats.
For FactCheck.LT, the challenge was not only to follow individual stories but to understand how narratives are built, repeated, adapted, and amplified across the wider ecosystem. That meant monitoring dozens of outlets, hundreds of video platform channels, and large volumes of short-form and social content over extended periods. Manual collection was insufficient to support that kind of work, and basic scraping tools were unreliable as well.
Our biggest technical challenge was reliably collecting publicly available web content at scale. Many of the sources we monitored involved dynamic, JavaScript-rendered content or required a more stable collection setup than standard tools could provide. Before Oxylabs, our data collection was inconsistent, slow, and incomplete.
Mikhail Doroshevich, Editor-in-Chief and Co-founder of FactCheck.LT
The limitations were practical as much as technical. Consistently collecting public web data across different sources was difficult, especially when some pages relied on dynamic content rendering. Public video transcripts, an important source for analyzing broadcast messaging and propaganda narratives, were also not always readily available through standard APIs. For a small organization, building and maintaining custom collection workflows would have taken valuable time away from the research itself.
Through Project 4β, FactCheck.LT integrated Oxylabs’ Web Scraper API into its data collection workflow and built a more stable foundation for long-term research.
This support enabled the team to collect and structure publicly available web data from several parts of the information ecosystem, including:
- news articles from state and independent media outlets
- social media video transcripts from state television channels
- public social media profiles and posts across platforms
- search engine data used to contextualize shifts in public attention
This breadth was essential to the work. Looking at a single article or account in isolation rarely reveals how information campaigns operate. What matters is the ability to compare coverage, detect repetition, identify timing patterns, and place sudden narrative shifts into a broader historical context.
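Detecting repetition across outlets can be approximated with a standard near-duplicate check. The sketch below uses word shingles compared with Jaccard similarity; this is a generic technique, not FactCheck.LT's documented method, and the threshold, shingle length, and function names are illustrative assumptions. Python's `\w` matches Unicode letters by default, so Cyrillic text is tokenized as well.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i : i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def repeated_pairs(docs: dict, threshold: float = 0.5, k: int = 3) -> list:
    """List (doc_id_a, doc_id_b, score) pairs whose shingle overlap meets threshold."""
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    ids = sorted(sets)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1 :]:
            score = jaccard(sets[a], sets[b])
            if score >= threshold:
                pairs.append((a, b, round(score, 3)))
    return pairs


docs = {
    "a": "the west is preparing a provocation on the border",
    "b": "The West is preparing a provocation on the border today",
    "c": "weather will be sunny this weekend",
}
print(repeated_pairs(docs))  # [('a', 'b', 0.875)]
```

The pairwise loop is quadratic in the number of documents, so at corpus scale an approximate index such as MinHash with locality-sensitive hashing would typically replace it.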
With a reliable data collection layer in place, we could redirect effort toward what actually matters: analysis, publication, and impact. The difference is between spending days manually collecting data and having a systematic pipeline that automatically feeds our analytical platform. Oxylabs essentially removed the data collection bottleneck.
Mikalai Kvantaliani, Co-founder and Project Manager at FactCheck.LT
Built on top of that data pipeline, FORESIGHT supports multiple types of analysis simultaneously. The platform tracks how propaganda themes rise and fall over time, compares how different media groups cover the same events, detects signs of synchronized behavior across platforms, and supports forecasting based on historical patterns.
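Tracking how a theme rises and falls can start with something as simple as counting, per week, the documents that mention it. The sketch below assumes a hand-built keyword lexicon per theme and ISO-week buckets; FORESIGHT's actual method is not described here, and a production system might use a trained classifier instead of keyword matching. The function name and inputs are hypothetical.

```python
import re
from collections import Counter
from datetime import date


def weekly_theme_counts(docs, keywords):
    """Count documents per ISO week that contain any theme keyword as a whole word.

    docs: iterable of (publication_date, text) pairs.
    keywords: iterable of theme terms (a hypothetical theme lexicon).
    Returns a dict mapping (iso_year, iso_week) to a document count, ordered by week.
    """
    terms = {k.lower() for k in keywords}
    counts = Counter()
    for published, text in docs:
        tokens = set(re.findall(r"\w+", text.lower()))
        if tokens & terms:
            iso = published.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


docs = [
    (date(2024, 1, 2), "EU sanctions discussed"),
    (date(2024, 1, 3), "Eastern partners meet"),
    (date(2024, 1, 10), "Talks with the EU resume"),
]
print(weekly_theme_counts(docs, ["eu"]))  # {(2024, 1): 1, (2024, 2): 1}
```

Plotting these counts for several themes side by side is one way to see when a narrative gains or loses prominence.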
With a more dependable collection pipeline in place, FactCheck.LT was able to move from fragmented monitoring to systematic analysis.
The scale alone changed what became possible. The organization built a corpus of more than 2.1 million documents across 700+ sources and expanded video platform monitoring to roughly 1,000 channels. Automated transcript analysis made it possible to examine how video messaging was framed, repeated, and adapted over time.
That broader visibility helped reveal not only what was being said, but also how messaging changed over time and how narratives were coordinated across formats.
One example was the way state media framed relations with the European Union. FactCheck.LT identified shifts between rhetoric favoring European integration and messaging emphasizing an "Eastern turn", pointing to a pattern of strategic ambiguity rather than a fixed line. This kind of analysis is difficult to produce without both historical depth and large-scale cross-source comparison.
The platform also supported cross-platform coordination analysis. In one social media study, FactCheck.LT identified highly synchronized behavior, providing strong evidence of coordinated inauthentic activity affecting several countries, including Poland, Moldova, Romania, and Hungary.
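One simple way to surface synchronized behavior of this kind is to look for account pairs that repeatedly publish the same text within seconds of each other. The sketch below is a generic heuristic; the 60-second window, the minimum of three shared messages, the input format, and the function name are illustrative assumptions, not the parameters of the FactCheck.LT study.

```python
from collections import Counter, defaultdict
from itertools import combinations


def synchronized_pairs(posts, window_seconds=60, min_shared=3):
    """Find account pairs that repeatedly post identical text within a short window.

    posts: iterable of (account_id, text, unix_timestamp) tuples.
    Returns {(account_a, account_b): count} for pairs with at least min_shared messages.
    """
    by_text = defaultdict(list)
    for account, text, ts in posts:
        # Normalize case and whitespace so trivial edits still match.
        key = " ".join(text.lower().split())
        by_text[key].append((ts, account))

    pair_counts = Counter()
    for entries in by_text.values():
        hits = set()
        for (t1, a1), (t2, a2) in combinations(sorted(entries), 2):
            if a1 != a2 and abs(t2 - t1) <= window_seconds:
                hits.add(tuple(sorted((a1, a2))))
        # Count each pair at most once per shared message.
        pair_counts.update(hits)

    return {pair: n for pair, n in pair_counts.items() if n >= min_shared}


posts = [
    ("acc1", "Msg one", 100), ("acc2", "msg  one", 110),
    ("acc1", "Msg two", 500), ("acc2", "Msg two", 505),
    ("acc1", "Msg three", 900), ("acc2", "Msg three", 930),
    ("acc3", "Msg one", 5000),
]
print(synchronized_pairs(posts))  # {('acc1', 'acc2'): 3}
```

A heuristic like this only flags candidates; deciding whether the overlap is unusual still requires comparing it with what chance alone would produce.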
| Metric | Result |
|---|---|
| Corpus scale | 2.1 million+ documents across 700+ sources |
| Video platform monitoring | Around 1,000 channels with automated transcript analysis |
| Coordination detection | Statistical evidence of coordinated behavior at p < 10⁻⁸⁰ |
| Data collection reliability | Improved from roughly 40% with manual tools to 98%+ in one monitored workflow |
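The coordination p-value reported in the table above is extremely small, and computing probabilities of that magnitude naively can overflow in intermediate steps. As an illustration only (the case study does not say which statistical test FactCheck.LT used), the sketch below assumes a Poisson model and computes the log-probability of observing at least `k` synchronized events when chance alone predicts `lam` of them. The function name and model choice are assumptions.

```python
import math


def log10_poisson_tail(k: int, lam: float) -> float:
    """Return log10 P(X >= k) for X ~ Poisson(lam), with lam > 0.

    Computing lam**k and k! directly overflows for large k, and tail
    probabilities can fall below the smallest positive float, so every
    term is kept as a logarithm and combined with log-sum-exp.
    """
    if k <= 0:
        return 0.0
    # Sum terms j = k .. upper; the mass beyond upper is negligible.
    upper = int(max(k, lam) + 20 * math.sqrt(lam) + 50)
    log_terms = [
        j * math.log(lam) - lam - math.lgamma(j + 1) for j in range(k, upper + 1)
    ]
    peak = max(log_terms)
    # Log-sum-exp: factor out the largest term for numerical stability.
    total = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return total / math.log(10)


print(round(log10_poisson_tail(100, 2.0), 1))
```

For example, `log10_poisson_tail(100, 2.0)` is roughly -128.7: if only about 2 co-occurrences were expected by chance, seeing 100 or more would be vanishingly unlikely.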
Disinformation research often receives attention at the point of publication, when a report is released, or a network is exposed. Less visible is the infrastructure that makes those findings possible in the first place.
Investigating disinformation requires more than identifying individual examples. It depends on being able to follow patterns over time, compare sources, and detect coordination across platforms. That kind of work is only possible when researchers have reliable access to public web data.
Denas Grybauskas, Chief Governance and Strategy Officer of Oxylabs
In environments like Belarus, where state messaging is systematic and cross-border influence activity affects neighboring EU countries, researchers need more than strong analytical frameworks. They need dependable access to public data. Without that, investigations become partial, slow, and difficult to scale.
That is why this case matters beyond a single organization. FactCheck.LT’s work shows what becomes possible when civil society researchers have access to infrastructure that is usually out of reach for smaller teams. Instead of spending limited time troubleshooting blockers and rebuilding collectors, they can focus on interpretation, publication, and public-interest impact.
The results have already extended beyond internal research. FactCheck.LT’s findings have informed diplomatic briefings across the Baltic and Eastern European region, supported journalism, contributed to broader discussions around foreign information manipulation and interference, and helped train practitioners working in AI and digital security.
For civil society organizations, investigative journalists, and academic researchers working on countering disinformation, reliable data collection is not a luxury. It is a prerequisite. Project 4β makes professional-grade infrastructure accessible to organizations doing critical work on limited budgets.
Mikhail Doroshevich, Editor-in-Chief and Co-founder, FactCheck.LT
This case highlights the role reliable data collection plays in serious disinformation research. Through Project 4β, Oxylabs supports organizations working to monitor information threats, strengthen independent research, and turn public web data into meaningful analysis. To learn more, contact the team via the form or at 4beta@oxylabs.io.
Forget about complex web scraping processes
Choose Oxylabs' advanced web intelligence collection solutions to gather real-time public data hassle-free.