New Research Uncovers the Main Challenges of Public Web Data Gathering

Gabija Birgile

2024-09-13 · 2 min read

New research conducted by Oxylabs in partnership with Censuswide has surveyed scraping professionals to uncover the key issues they face when gathering public web data and the role of artificial intelligence (AI) in addressing them. The survey also revealed growing demand for public web data: 74% of respondents reported an increase over the last 12 months.

The survey confirms what Oxylabs, as a leading web scraping solutions provider, have observed over the past year – the demand for high-quality public web data is accelerating. With the advancements in AI model development and increased usage of tools based on these models, we expect this trend to continue well into the future.

 Julius Černiauskas, CEO at Oxylabs

Methodology of the research

The market survey of web scraping professionals was carried out by leading research consultancy Censuswide at Oxylabs’ request. Censuswide surveyed 506 developers and other technical professionals who conduct web scraping as part of their job in the United States (US) and the United Kingdom (UK), with both countries represented equally.

While the respondents held various job titles, software developers (41%) formed the largest group, followed by full-stack developers (22%) and data engineers (20%). The annual turnover of their companies, expressed in British pounds, ranged from under £100,000 (3%) to £500 million or over (8%), with almost half of the represented companies falling between £1 million and £49.99 million. The survey took the pulse of the two key markets for web scraping, with a special focus on challenges, their business impact, and the potential of applying AI in data scraping.

Key findings

The research revealed that demand for public web data increased for almost three-quarters (74%) of the surveyed US and UK businesses. The increase was significant for almost one quarter (24%). Other key findings of the research include:

  • 61% of respondents identified building the necessary infrastructure as the main challenge of large-scale web data collection. 

  • Building and maintaining data parsers was the second most recognized challenge, named by half of the respondents, with 86% acknowledging that it is a time- and resource-intensive task.

  • 98% of businesses face some negative impact when data collection is delayed. For 21%, the impact is severe.

  • If parsing is interrupted, 95% of businesses face negative impacts within 24 hours.

  • 75% of developers spend from 10 to 40 hours weekly on parsing processes.

  • Identifying complex parsing patterns when dealing with multiple URLs is the most common challenge (58%) associated with data parsing, with dynamic website layouts and changing structures (56%) being a close second.

  • 57% say that due to changing website layouts, they have to fix parsers several times a week. For the same reason, 31% fix parsers every day.

  • Over 50% of developers mention time as the main parsing-related business cost.

  • 77% of respondents have tried using AI-powered web scraping solutions. Of them, 86% say that it helped them save time and resources.

Most leading web scraping solutions providers experiment with AI and ML to find out how they can streamline infrastructure maintenance, target unblocking, and other crucial tasks. The satisfaction users report with AI-powered data scraping will encourage continued research in this area.

Martynas Juravičius, R&D tech lead at Oxylabs

Summing up

The survey of web scraping professionals in the UK and the US by Oxylabs and Censuswide shows the growing importance of public web data for businesses. Most businesses experience a rapid negative impact when data collection is delayed. The survey's key findings uncover the main challenges behind such delays and the business costs they incur. Stay tuned, as a more detailed analysis of these results is forthcoming.

About the author

Gabija Birgile

Senior PR Manager

Gabija Birgile is a Senior PR Manager at Oxylabs. After working in a PR agency and juggling various projects for quite some time, she wanted to try a role in the tech industry. Making a positive impact with her work was always on top of her mind, so managing "Project 4β" pro bono partnerships now definitely does the job. If you have a project in mind, drop her a message at 4beta@oxylabs.io.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
