OxyCon® 2023: The Top Takeaways

Maryia Stsiopkina

Last updated on

2023-09-15

AI Summary:

OxyCon 2023 explored technical advancements in web scraping, focusing on easy, scalable web data access, leveraging machine learning for adaptive parsing, and optimizing infrastructure. Key discussions also covered legal considerations, extracting insights from video data, and future industry trends influenced by AI and data accessibility challenges.

OxyCon 2023 was truly a blast, bringing together the sharpest minds of the web scraping industry and more than 2000 attendees.

If you couldn’t make it to the conference or want to experience the thrill again, let’s revisit its best moments in a brief event summary. And if you’re hungry for more in-depth OxyCon content and exclusive Q&A sessions with the experts, hold on just a little bit longer; very soon, OxyCon on-demand videos will be available.

Before jumping in, join our Discord to network and stay tuned for the updates.

Cracking the Code: Overcoming Blocks in Large-Scale Web Scraping

The conference kicked off with an insightful presentation by Denis Zyk, a Golang Developer at Oxylabs, about overcoming roadblocks in large-scale scraping operations.

How do you get the data you need and do it right away? How do you deal with geo-specific websites, dynamic content, and sophisticated anti-bot systems? Denis' presentation is a must-watch if any of these questions pique your interest.

Here’s a sneak peek at some of the pro tips from Denis:

Use a headless browser for dynamic targets;
Be mindful of your browsing flow;
Rotate your static fingerprints.
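
To make the last tip a bit more concrete, here is a minimal sketch of what rotating static fingerprints could look like in Python. The recap doesn't describe how Denis implements it, so everything here is an assumption for illustration: the `FingerprintRotator` class, the `PROFILES` list, and the header values are all hypothetical. The idea is simply to hand each new session a different, internally consistent set of headers.

```python
import itertools
import random

# Hypothetical header profiles; the values are illustrative, not taken from the talk.
PROFILES = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    },
    {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
                      "(KHTML, like Gecko) Version/16.6 Safari/605.1.15",
        "Accept-Language": "en-GB,en;q=0.8",
    },
    {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/117.0",
        "Accept-Language": "de-DE,de;q=0.9,en;q=0.5",
    },
]


class FingerprintRotator:
    """Hands out one header profile per session, cycling in shuffled order."""

    def __init__(self, profiles, seed=None):
        if not profiles:
            raise ValueError("at least one profile is required")
        shuffled = list(profiles)
        random.Random(seed).shuffle(shuffled)
        self._cycle = itertools.cycle(shuffled)

    def next_headers(self):
        # Return a copy so callers cannot mutate the shared profile.
        return dict(next(self._cycle))


rotator = FingerprintRotator(PROFILES, seed=0)
session_headers = rotator.next_headers()  # pass these to your HTTP client of choice
```

Keeping each profile as a whole unit, rather than mixing headers at random, avoids combinations that no real browser would send.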

At the end of the presentation, the expert answered a couple of questions from the audience, including whether one can use Tor proxies for scraping, how to handle websites that keep updating their structure, how to identify a site's rate limit without hitting it too often, and more.

Denis Zyk (Golang Developer at Oxylabs) and Gabija Fatėnaitė (OxyCon host)

Cybercriminal Footprint Erasure: Response Strategies

Javier Velandia, RA Product Manager at Appgate, captured the audience’s attention from the very first seconds by bringing up one of the hottest topics in the cyber defense world: the ongoing war between cybercriminals and cybersecurity companies.

He provided some alarming cyberattack figures: 4.7M attacks reported to APWG in 2022 and an overall 150% increase in phishing attacks since 2019. Pretty disturbing, isn't it?

According to Javier, cybercriminals never stop improving their strategies, which include hidden redirects, search engine poisoning, typosquatting or URL hijacking, and more.
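
For a rough feel of how typosquatting can be spotted, here is a small Python sketch that flags domains sitting within a couple of character edits of a trusted one. This is not Javier's or Appgate's method; the function names, the `max_distance` threshold, and the example domains are all assumptions made for illustration, and a real detector would look at far more signals.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def looks_like_typosquat(candidate: str, trusted: str, max_distance: int = 2) -> bool:
    """True when candidate is close to, but not equal to, the trusted domain."""
    candidate, trusted = candidate.lower(), trusted.lower()
    return candidate != trusted and edit_distance(candidate, trusted) <= max_distance


# Hypothetical example: a digit "1" swapped in for the letter "l".
suspicious = looks_like_typosquat("examp1e.com", "example.com")
```

The threshold is a trade-off: a higher value catches more lookalikes but also flags more unrelated short domains.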

So, what actions can cybersecurity companies take to confront cyber fraudsters? Javier stressed the importance of reinforcing ML usage and growing multinational cooperation and mentioned some other proven tactics he and his team successfully employ to fight cybercrime.

During the Q&A session, the cybersecurity expert gave tips on how to validate a domain, avoid sharing sensitive information with a fraudster website, and discussed the latest cybercrime trends.

Leveraging Machine Learning for Web Scraping

It would be considered bad form in 2023 to skip the omnipresent duo of web scraping and machine learning. Andrius Kūkšta, Data Engineer at Oxylabs, was the ML ambassador this year and explained how adaptive parsing and scraper block detection heavily depend on machine learning.

Andrius started with the basics: how to build a robust machine learning model. Then, he explained how ML is employed to develop reinforced fingerprinting, ML-based proxy rotation, and CAPTCHA solving.

Later, the expert discussed how Oxylabs successfully implemented machine learning in the adaptive parser, block detection tools, and ML-based proxy management.

And here's a little teaser for you: Andrius touched upon the untapped ML potential and mentioned some key areas where ML has yet to pay off.

Some curious questions popped up during the Q&A session, such as the potential of ML for product matching across e-commerce platforms, how LLMs are used in advanced parsing techniques, and others.

Andrius Kūkšta (Data Engineer at Oxylabs)

Open-Source Technology for Extracting High-Quality Data at Scale

Browser or no browser? That is the question. If you've ever felt lost in the data extraction world and its multiple choices, Glen De Cauwsemaecker, Senior Lead Crawler Engineer at OTA Insight, has got your back. In his presentation, Glen guided us through the critical scraping choices everyone has to make.

And even though Glen doesn’t have a crystal ball, he tries to see into the future to answer the main question, “Can we continue extracting data without using browsers and accounts?” Well, let’s wait for the on-demand video with Glen's presentation to find a definite answer.

Web Scraping, AI, and Evolving Legal Landscapes

To spice things up, after a flow of presentations, the Head of Legal at Oxylabs, Denas Grybauskas, held a lively panel discussion with top legal experts.

Among many other topics, the panel discussed which laws will define the future of web scraping now that the hiQ Labs v. LinkedIn case is over and what is important to consider about data scraping from a legal perspective.

Additionally, the experts raised some thought-provoking questions, like how to determine whether data is public and whether IP blocking measures imply that the data isn’t publicly accessible. Join the expert discussion to learn their thoughts once the on-demand videos are live.

Finally, here's a bonus from the Q&A session: the legal experts explained what you should do if you receive an official complaint letter from website owners asking you to stop scraping their data.

Denas Grybauskas (Head of Legal at Oxylabs)

Accelerating Data-on-Demand Services with Async Python and AWS

Our legal experts passed the mic to Alexander Lebedev, Software Engineer at Hotjar, who talked about all things tech.

How do you collect data on demand from tens of thousands of pages in a few minutes? Alexander’s presentation was just about that. He walked us through server and AWS matters, proxies, and how to save money while building the scraping infrastructure. He also explained what architecture to pick depending on one’s specific needs.
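
To give a flavor of the async side of this, here is a minimal Python sketch of fetching many pages concurrently while capping the number of requests in flight. It is not Alexander's or Hotjar's actual setup: `fetch_page` is a hypothetical stand-in that only simulates latency, and `max_concurrency` and the example URLs are assumed for illustration. In a real pipeline you would swap in an async HTTP client and route requests through proxies.

```python
import asyncio


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for a real HTTP call; sleeps to simulate network latency."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def fetch_all(urls, max_concurrency: int = 100):
    """Fetch every URL, keeping at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        async with semaphore:
            return await fetch_page(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


def collect(urls, max_concurrency: int = 100):
    """Synchronous entry point that runs the async pipeline to completion."""
    return asyncio.run(fetch_all(urls, max_concurrency))


urls = [f"https://example.com/page/{i}" for i in range(50)]
pages = collect(urls, max_concurrency=10)
```

The semaphore is the knob to tune here: raise it to go faster, lower it if the target or your proxies start pushing back.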

Alexander illustrated his presentation with a real-life example from Hotjar. He also shared some tips and tricks that have saved him quite a few times:

The easiest and safest way to extract data faster is to pay for more servers and better proxies;
It’s almost never worth dealing with anti-bots yourself, as engineering time costs more than good proxies.

In the Q&A session, Alexander shared some valuable resources that helped him improve his skills in Celery and FastAPI. So make sure you don't miss his presentation once we release it.

Unlocking Insights from Video Data: Challenges and Solutions

Allen O’Neill, Co-founder at SocialVoice.ai, challenged our minds with another web scraping frontier to be unlocked: video data.

According to Allen, video data is the new goldmine of valuable insights, such as pricing, availability, and sentiment, to name a few. The statistics further emphasize its importance: 93% of video data is locked inside the video and not in the hashtags or descriptions under it. So, we should ask, “How do we unlock and analyze video data at scale?” Let’s find out the answer in the upcoming on-demand presentation by Allen.

Web Scraping in 2023 and Beyond

Ever dreamed of time-traveling? Juras Juršėnas, COO at Oxylabs, and other industry experts made it possible. In their discussion, the panelists took us to the future to see what the web scraping industry will look like.

The experts warned: be prepared for challenges. On the one hand, lightning-fast technological advancements and LLM development will make retrieving and consuming data easier. On the other hand, the same advancements will contribute to data becoming more locked down and harder to access.

Is moving content behind a login a real possibility we should consider? Time will tell; meanwhile, you can check what our experts predict in the upcoming on-demand videos.

Juras Juršėnas (COO at Oxylabs)

And that's a wrap. Stay put, and don’t worry about missing out on the on-demand videos: if you’ve registered for the event, the content will reach you safely. If you haven’t registered, wait for the updates on the OxyCon page. Got questions? Contact us at events@oxylabs.io.

See you at OxyCon 2024!

About the author

Maryia Stsiopkina

Former Senior Content Manager

Maryia Stsiopkina was a Senior Content Manager at Oxylabs. As her passion for writing was developing, she was writing either creepy detective stories or fairy tales at different points in time. Eventually, she found herself in the tech wonderland with numerous hidden corners to explore. At leisure, she does birdwatching with binoculars (some people mistake it for stalking), makes flower jewelry, and eats pickles.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.