Building an Ethical Future: Using Publicly Available Data to Beat the Bad Guys


Julius Cerniauskas
Last updated on
2026-02-20
4 min read


Many of today’s internet services, with price comparison sites as a key example, are built on publicly available web data. Yet the practice still attracts skepticism from those unsure of how far this data collection goes. Earlier this year, Cloudflare announced an offering to ‘block the bots’, and others have called out the pressure some forms of scraping can put on sites. While the concerns in these cases should be addressed to balance everyone’s interests, they should not overshadow the positive outcomes that data collection can bring.
For years now, web scraping has been a respected practice among journalists, watchdog organizations, and NGOs. Take the Global Investigative Journalism Network’s recent piece on how AI can help non-coding journalists access publicly available data for large-scale investigations. Others are using the technology to expose corruption, shed light on societal issues, and foster transparency. We need to ensure all sides are heard in the great data debate while protecting those who need web data to unearth injustice and build a net-positive future.
Public web data, defined as “information that can be shared, used, reused and redistributed without restriction” online, has the potential to bring positive change and benefits to the public. For example, it enhances access to information, promotes transparency and fosters innovation. However, as with any resource, there is a potential for misuse. This is the root of negative attitudes toward web scraping, with a few bad actors casting a shadow over its positive impact.
The most vital step is collaboration between stakeholders to establish ethical guidelines and prevent the misuse of web scraping. Governments, NGOs, and technology companies should come together and agree on industry standards and regulations that strike a balance between keeping the internet open for all and allowing users to feel safe online.
In early 2025, the UK parliament debated the Data (Use and Access) Act back and forth in the House of Lords. With this legislation, the UK government aimed to make training data more accessible, as AI innovation has underlined how important such data is to growth. However, the act was held up by concerns that more consideration should be given to protecting content creators, with the government eventually committing to a dedicated review and further action to factor this in. This was a good start towards open conversations, but the pushback also shows the complexity of finding an ethical approach; it is not something to rush.
One organisation, the "Ethical Web Data Collection Initiative" (EWDCI), aims to help solve such complex issues by building better systems of cooperation and communication within the data aggregation industry. It's working to construct a framework that facilitates an open and inclusive process for formulating principles governing legal and ethical web scraping practices. Currently, an organisation can display EWDCI's certification badge, telling other enterprises it has signed on to a robust ethical approach to the use of public web data.
In tandem with joining organisations like the EWDCI to build a better industry as a whole, web intelligence providers should promote ethical web scraping practices among their clients. They can do this by putting in place robust usage policies and KYC practices, alongside offering tools that enable responsible data extraction and actively monitoring and addressing any potential misuse of the services they provide.
When public web data is being collected ethically, it can be applied to an endless number of positive use cases. These include:
Advocating for social justice and accountability: For years now, web scraping has played a key role in advocating for social justice and holding organisations accountable. By analysing publicly available data, journalists and researchers have revealed disparities in public services, discriminatory practices, and financial irregularities. One key example was when public web data was used to shed light on wrongful property tax charges that led to home foreclosures. These findings have prompted corrective actions and legal interventions to fix flaws within the system, protecting those who operate within it.
Combating illegal activities: Web scraping has aided, and continues to aid, police in identifying crimes taking place online. By analysing message boards and online marketplaces, web data has exposed underground markets for human trafficking and illegal firearm sales, to name a few. These efforts have led to successful prosecutions, dismantled criminal networks, and ultimately saved lives.
Monitoring online hate speech: Online hate speech is an increasingly difficult problem for authorities, but because it operates on the internet, scraping tools can collect insights on patterns, trends, and the spread of harmful ideologies. By monitoring online platforms and forums, researchers can identify emerging threats, put in place strategies for countering hate speech, and promote online safety.
Cleaning the internet from illegal content: The Communications Regulatory Authority of the Republic of Lithuania (RRT) is a key example of the potential positive impact web scraping can have. They moved beyond a previously established reliance on public hotlines with large-scale web data collection and AI-based image recognition to detect illegal material across open sources, mainly related to child sexual abuse, improving coverage and speed of response.
Monitoring air pollution levels: Web search analytics can be utilised to observe and track air pollution trends, offering a deeper understanding of environmental conditions based on public interest. By analysing aggregated search data from various platforms and integrating it with meteorological information available online, researchers can make more accurate estimates of pollution levels.
Looking at this as a whole, it’s hard to ignore the fact that public web data gathering can bring immense value to society. From uncovering corruption to safeguarding public health and promoting fair competition, there is an ever-growing list of positive use cases. However, we can’t forget how crucial it is to strike a balance between protecting against data misuse and preserving the ability to leverage web scraping for good.
In order to protect these important use cases, the industry must embrace responsible web scraping practices, safeguard user privacy, and respect the websites that host public data instead of overloading them with requests. Only then will companies and researchers unlock the full potential of this powerful tool with the knowledge that these systems are built to last.
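In practical terms, respecting a host site usually means honouring its robots.txt directives and throttling request rates. The sketch below, using only Python's standard library, shows one minimal way to turn a robots.txt body into a fetch policy; the robots.txt content, function names, and default delay are illustrative assumptions, not any particular provider's implementation.

```python
import urllib.robotparser

# Hypothetical robots.txt body a site might serve; used here for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

def build_policy(robots_txt, user_agent="ethical-research-bot"):
    """Parse a robots.txt body into a parser plus the pause (seconds)
    a polite scraper should wait between requests."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # Fall back to a conservative 1-second pause if no Crawl-delay is set.
    delay = rp.crawl_delay(user_agent) or 1.0
    return rp, float(delay)

def allowed(rp, user_agent, url):
    """True if robots.txt permits this agent to fetch the URL's path."""
    return rp.can_fetch(user_agent, url)
```

A scraping loop would then call `allowed()` before each request and `time.sleep(delay)` between requests, skipping any path the site has disallowed rather than overriding it.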