OxyCon 2025: Key Takeaways

Agnė Matusevičiūtė

Last updated on 2025-10-03

AI Summary:

OxyCon 2025 explored how AI is transforming web scraping workflows, addressing challenges like data access restrictions and large-scale data structuring. Discussions highlighted innovations such as self-healing parsers, AI-powered development tools, and advanced unblocking techniques, emphasizing scraping's critical role in maintaining an open and innovative AI ecosystem.

OxyCon 2025 brought the global web scraping community together once again – this time under the bold theme “Power Up for the AI-Driven Era of Web Scraping.”

The event explored how AI is transforming scraping workflows, driving smarter data use, and reshaping the industry as we know it. From technical deep dives to strategic insights, speakers tackled everything from resilient infrastructure and adaptive scrapers to the growing need for ethical data practices in a rapidly evolving digital landscape.

Web Scraping: The Key to an Open AI Future?

As AI continues reshaping industries, access to data is quickly becoming a defining factor in who leads and who gets left behind. In his talk, Oxylabs Engineering Manager Giedrius Šteimantas explored how data restrictions imposed by major platforms are threatening openness in the AI ecosystem, and why web scraping is a crucial tool to keep AI innovation accessible to all.

Major platforms are tightening access – Cloudflare now blocks AI crawlers by default, Amazon has shut down Google’s shopping agents, and many sites are charging “pay‑per‑crawl” fees or suing scrapers. The consequence? Innovation concentrates in the hands of a few tech giants that control both data and funding.

“By blocking an opportunity this big, you block the innovation, and hurt the end consumer the most.”

– Giedrius Šteimantas

But there’s still a way forward. In his view, scraping isn’t just about gathering information – it’s about keeping the future of AI open, diverse, and innovative.

From Chaos to Clarity: Data Structuring in Large-Scale Scraping

While unblocking websites often takes center stage in web scraping discussions, another challenge looms quietly behind the scenes – data structuring. At scale, it's one of the biggest bottlenecks teams face, and in his presentation, Aleksandras Šulženko, Product Owner at Oxylabs, made the case that structuring may be even trickier than unblocking.

“Data without structure is just noise.”

– Aleksandras Šulženko

To highlight this, Aleksandras walked through a typical large-scale scraping workflow – from URL building and content unblocking to data delivery and processing. But the spotlight was on how Oxylabs’ Web Scraper API helps structure data more efficiently and reliably, especially when scraping operations grow in complexity.

The newest Oxylabs feature, Self-healing parser presets, stood out. Its self-healing logic adapts parsers to site changes over time, letting users scale structured data consumption quickly and outsource parser maintenance. Asked about the future of parsers, Aleksandras said this is it: better output, less maintenance, more data points, and more structured data for everyone.

To wrap up the presentation, Aleksandras showed a live demo of this feature, demonstrating just how far data structuring has come – and how much easier it is to maintain clarity in even the most chaotic scraping setups.
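The talk presented Self-healing parser presets as a managed feature and did not describe its internals. To make the general idea of a parser that "heals" more concrete, here is a deliberately simple Python sketch of one possible approach: each field keeps an ordered list of fallback extraction strategies, and when the primary one stops matching after a layout change, the first strategy that still works is promoted to the front. The `SelfHealingField` class, the `regex_strategy` helper, and the HTML patterns are all hypothetical. This is not how Oxylabs implements the feature, and a production system would likely go further, for example by generating new selectors automatically. Regexes stand in for a real HTML parser only to keep the sketch dependency-free.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# An extraction strategy takes raw HTML and returns a value, or None if it no longer matches.
Strategy = Callable[[str], Optional[str]]


def regex_strategy(pattern: str) -> Strategy:
    """Hypothetical helper: build a strategy from a regex whose first group captures the value."""
    compiled = re.compile(pattern, re.DOTALL)

    def extract(html: str) -> Optional[str]:
        match = compiled.search(html)
        return match.group(1).strip() if match else None

    return extract


@dataclass
class SelfHealingField:
    """Hypothetical field parser that falls back to, then prefers, whichever strategy still works."""
    name: str
    strategies: list[Strategy]
    healed_count: int = 0  # how many times a fallback had to be promoted

    def parse(self, html: str) -> Optional[str]:
        for index, strategy in enumerate(self.strategies):
            value = strategy(html)
            if value:
                if index > 0:
                    # "Heal": move the working strategy to the front for future pages.
                    self.strategies.insert(0, self.strategies.pop(index))
                    self.healed_count += 1
                return value
        return None  # every strategy failed; a real system would alert a human here


if __name__ == "__main__":
    price = SelfHealingField(
        name="price",
        strategies=[
            regex_strategy(r'<span class="price">(.*?)</span>'),  # original layout
            regex_strategy(r'data-price="([^"]+)"'),             # assumed post-redesign layout
        ],
    )
    print(price.parse('<span class="price">19.99</span>'))
    print(price.parse('<div class="product" data-price="21.50"></div>'))
    print(price.healed_count)
```

Promoting the working strategy means later pages with the new layout hit it first, so the fallback cost is paid only once per layout change.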

Aleksandras Šulženko, Product Owner @ Oxylabs

Scaling E-Commerce Data Extraction: From Zero to 10 Billion Products a Day

As e-commerce continues to dominate global trade, efficiently scaling data collection has become a make-or-break factor for businesses. In his presentation, Fred de Villamil, Chief Technology Officer at NielsenIQ Digital Shelf, provided a comprehensive guide to building robust, scalable scraping strategies for e-commerce websites. He shared how his team extracts data for billions of products daily while maintaining cost-efficiency, data quality, and security.

“There’s no good strategy without a good process.”

– Fred de Villamil

Fred broke down the challenges of location-based scraping – such as identifying stores, tracking new openings and closures, and dynamically switching scraping sources as needed. His team’s efforts have successfully scaled operations to 4,500 stores for a single retailer, showcasing the flexibility needed to adapt to ever-changing data sources.

But scaling doesn’t stop at data acquisition. To truly succeed, teams need monitoring at scale. Fred explained how NielsenIQ’s systems track dozens of KPIs across thousands of retailers and stores, automating issue detection and response handling. From spotting anomalies in real time to creating tickets and assigning them for action with minimal human intervention, the workflow is a striking example of efficiency at scale.
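The talk did not detail how NielsenIQ's anomaly detection works. As an illustration only, here is a minimal Python sketch of one common technique swapped in for the purpose: flag a KPI reading whose z-score against its recent history exceeds a threshold, and open a ticket for it. The KPI names, the 3.0 threshold, the `detect_anomalies` function, and the `Ticket` type are all hypothetical stand-ins; a real system would push tickets to an issue tracker and likely use more robust statistics.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Ticket:
    """Hypothetical ticket record; a real system would send this to an issue tracker."""
    retailer: str
    kpi: str
    observed: float
    expected: float


def detect_anomalies(
    history: dict[tuple[str, str], list[float]],
    latest: dict[tuple[str, str], float],
    threshold: float = 3.0,  # assumed cut-off, in standard deviations
) -> list[Ticket]:
    """Open a ticket for every (retailer, KPI) whose latest value is a z-score outlier."""
    tickets = []
    for (retailer, kpi), value in latest.items():
        past = history.get((retailer, kpi), [])
        if len(past) < 2:
            continue  # stdev needs at least two past readings
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            is_anomaly = value != mu  # flat history: any change is suspicious
        else:
            is_anomaly = abs(value - mu) / sigma > threshold
        if is_anomaly:
            tickets.append(Ticket(retailer, kpi, value, mu))
    return tickets


if __name__ == "__main__":
    history = {
        ("retailer-a", "products_scraped"): [1000, 1010, 990, 1005, 995],
        ("retailer-b", "success_rate"): [0.95, 0.96, 0.94],
    }
    latest = {
        ("retailer-a", "products_scraped"): 400,  # sudden drop, e.g. a broken scraper
        ("retailer-b", "success_rate"): 0.95,     # within normal range
    }
    for ticket in detect_anomalies(history, latest):
        print(ticket)
```

Keying every series by retailer and KPI keeps the check independent per store feed, so one noisy retailer does not mask a failure at another.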

Creating an AI-Powered Price Comparison Tool With Cursor and Oxylabs’ AI Studio

Gone are the days when building a price comparison tool required advanced coding expertise or extensive knowledge of scraping and parsing. In his session, Rytis Ulys, Head of Data & Analytics at Oxylabs, showcased the transformative capabilities of AI-powered tools, illustrating how even those with limited coding experience can build sophisticated systems based on web data. Using Cursor (an AI code editor) and Oxylabs' groundbreaking AI Studio, he developed a fully functional price comparison tool – with only minimal manual coding required.

Throughout the demo, Rytis showed the audience how simple it can be to integrate web data into personal or professional projects. Starting with a high-level overview of the architecture and the two AI Studio apps (AI-Scraper and Browser Agent), he demonstrated how tasks that traditionally required complex scripting – such as scraping, parsing, and data analysis – can now be automated with little effort.

With AI doing the heavy lifting, Rytis illustrated how anyone can bypass the technical hurdles of web scraping and focus on bringing their projects to life.

Rytis Ulys, Head of Data & Analytics @ Oxylabs

How Machine Learning Improves Web Scraping (and Vice Versa)

At the intersection of web scraping and AI lies a powerful, two-way relationship. In his presentation, Zia Ahmad, Data Scientist at Turing, explored how web scraping feeds machine learning models with data, while AI increasingly enhances the scraping process itself.

Zia showcased how AI techniques, such as natural language processing (NLP) and computer vision, are revolutionizing web scraping pipelines. From intelligent content extraction to anomaly detection and adaptive crawling, these advancements make scraping faster, more resilient, and more precise. He addressed the brittleness of traditional scraping methods and introduced the idea of AI agents capable of self-healing when page structures change.

Beyond enhancing scraping, Zia highlighted how scraped data powers core ML use cases, including sentiment analysis, healthcare applications, competitive intelligence, and predictive analytics. By shifting the focus from “how” scraping works to “what” insights it can unlock, this talk underscored a virtuous cycle: better scraping enables better AI models, and those models in turn make scraping smarter, faster, and more robust.

When asked whether AI can truly change a person’s professional role, he responded:

“It’s not a question if we will be replaced by AI or not. But we will surely be replaced by someone who knows AI better than us.”

– Zia Ahmad

Web Scraping and AI: Legal Touchpoints and Ways Forward

Oxylabs Chief Governance and Strategy Officer, Denas Grybauskas, hosted a thought-provoking panel discussion on the evolving legal and ethical landscape of web scraping in the age of AI and automation. Joined by leading experts in law, ethics, and technology, the panel unraveled the complexities of AI-driven web data extraction and its business implications.

The panel explored liability challenges when AI scrapers inadvertently collect sensitive or protected data, the shifting boundaries of fair use and copyright, and how new regulatory proposals may reshape the field. While AI hasn’t created new legal questions, panelists agreed it has intensified old ones – from copyright preemption to the use of copyrighted content in model training – bringing sharper scrutiny to familiar gray areas.

The conversation also explored governance mechanisms, from the long-established robots.txt to newer ones like Cloudflare’s AI-focused tools and the IETF’s “AI Preferences” proposal. These initiatives aim to give content owners more control but raise concerns about fragmenting the open web and creating new gatekeepers.

The debate underscored that as web scraping evolves alongside AI, so too must the ethical and regulatory frameworks that guide it – requiring collaboration, adaptability, and accountability.
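For readers less familiar with robots.txt, the governance mechanism the panel mentioned first, here is a minimal Python sketch using the standard library's `urllib.robotparser` to show how a site can set different rules per crawler, such as blocking one AI crawler entirely while leaving the rest of the site open to others. The rules, the example.com URLs, and the `MyScraper` user agent are hypothetical; `GPTBot` is the user-agent token OpenAI publishes for its crawler. Note that robots.txt compliance is voluntary on the crawler's side, and the sketch does not model Cloudflare's tools or the IETF "AI Preferences" vocabulary.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler entirely,
# and keep every other crawler out of the checkout pages only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /checkout/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from text instead of fetching them over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("GPTBot", "https://example.com/products/1"),
        ("MyScraper", "https://example.com/products/1"),
        ("MyScraper", "https://example.com/checkout/cart"),
    ]:
        # can_fetch applies the first group whose User-agent matches, else the "*" group.
        print(agent, url, parser.can_fetch(agent, url))
```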

How AI Reshaped My Workflow As a Scraper Developer and Content Creator

The web scraping landscape is undergoing a transformation, and at the forefront of that change is how developers leverage AI-powered tools to redefine their workflows. In his session, Pierluigi Vinciguerra, Co-Founder & Chief Technology Officer at DataBoutique.com, shared how AI tools like LLMs, agents, Cursor, and Model Context Protocol (MCP) servers are transforming web scraping workflows – and sparking creativity.

Through live demos and real-world examples, he revealed how AI helps automate time-consuming tasks – like assigning roles for paid users across GitHub and Discord, or building a daily press review tool that scrapes, summarizes, and tags new articles for his notes.

He introduced a helpful framework: horizontal scraping (many sources, a few URLs each) is now LLM territory, while vertical scraping (millions of URLs from a few sites) still favors traditional tools. But even there, AI speeds up development – in his words:

“90% of the code I deployed in the past year was written by AI.”

– Pierluigi Vinciguerra

Looking ahead, he sees a near future of self-healing scrapers powered by LLMs and human supervision. And while AI handles the parsing, he reminded the audience: you still need to reach the data – so proxies and unblockers aren’t going anywhere.

Advanced Web Scraping: Techniques to Stay Unblocked

To wrap up OxyCon 2025, Juras Juršėnas, COO at Oxylabs, moderated a panel discussion featuring five industry experts who unpacked the evolving challenges and innovations in web scraping amid growing anti-bot defenses.

The panel explored how anti-bot defenses have become more sophisticated across multiple layers – from TLS fingerprinting and HTTP/2 quirks to advanced IP reputation scoring and dynamic content blocking. Tools like Playwright remain central, but even they require frequent patching due to browser leakage and detection. Panelists described LLMs as increasingly valuable allies and agreed that scraping at scale now requires not just better tools, but custom strategies and in-house infrastructure like reverse proxies, alerting pipelines, and domain-specific monitoring.

But unblocking isn’t just technical – it's about people. The conversation highlighted the need for collaborative teams, a hacker mindset, and knowing when to ask for help. Building a strong R&D culture and hiring curious, puzzle-loving developers was framed as essential for success.

On a broader level, the “open internet” debate took center stage as panelists questioned why smaller scrapers face harsher restrictions than major AI players. This question of fairness and accessibility underscored the need for ethical frameworks and industry collaboration to ensure a level playing field.

As anti-bot technology advances, so must scrapers – through smarter tech, stronger teams, and cross-industry collaboration.

Final thoughts

And with that, we wrap up OxyCon 2025 – we hope you enjoyed it! A heartfelt thank you to everyone who joined us: whether you presented, participated in panel discussions, asked questions, or simply tuned in – your energy and enthusiasm made this event truly special. From all of us at Oxylabs, thank you for being part of it.

Join our growing Discord community to network and get the latest updates. If you registered for OxyCon ’25, you’ll receive the on-demand videos shortly. If you didn’t, stay tuned for updates on the OxyCon page!

Have any questions? Reach out to us at events@oxylabs.io.

About the author

Agnė Matusevičiūtė

Technical Copywriter

With a background in philology and pedagogy, Agnė focuses on making complicated tech simple.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.