Data Parsing: The Basic, the Easy, and the Difficult | OxyCast #3

Yelyzaveta Hayrapetyan

Last updated on 2022-03-23

AI Summary:

In OxyCast episode 3, Augustinas Kalvis (Software Developer at Oxylabs) and Povilas Kudriavcevas (Software Engineer at Oxylabs) dig into data parsing in web scraping. They cover how parsing difficulty varies with the way websites protect their information, the role of CSS and XPath selectors, and how Machine Learning may automate more of the process.

Ready to receive some new web scraping insights? A fresh episode of OxyCast is available on multiple platforms!

This time, our beloved host Augustinas Kalvis (Software Developer at Oxylabs) and an expert guest, Povilas Kudriavcevas (Software Engineer at Oxylabs), have an engaging conversation about an integral part of any web scraping activity: parsing.

Now, what exactly did we discuss in episode #3? Let’s take a closer look.

Easy vs. hard parsing

In simple terms, parsing is the part of web scraping where raw data is analyzed to extract the necessary information, which can then be structured into JSON, CSV, and other data formats. Parsing is easy when all you have to do is parse plain HTML, but it can get more complicated depending on how different websites protect their information. By discussing some real-life examples, our experts work out what makes parsing hard and suggest ways to get the data you need in the right format.
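To make the easy case concrete, here is a minimal sketch of parsing plain HTML and structuring the result as JSON and CSV. It is an illustration only and not taken from the episode: the `RAW_HTML` snippet, the `ProductParser` class, and the `name` and `price` fields are all hypothetical, and it uses only Python's standard library.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical raw HTML, standing in for a page a scraper has already downloaded.
RAW_HTML = """
<ul>
  <li class="product"><span class="name">Laptop</span><span class="price">999.00</span></li>
  <li class="product"><span class="name">Mouse</span><span class="price">19.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self.products = []   # one dict per <li class="product">
        self._field = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self.products.append({})
        elif tag == "span" and css_class in ("name", "price") and self.products:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a span we care about.
        if self._field:
            self.products[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_products(html):
    """Filter the raw HTML down to a list of {"name", "price"} dicts."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products


def to_json(rows):
    return json.dumps(rows, indent=2)


def to_csv(rows):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


products = parse_products(RAW_HTML)
print(to_json(products))
print(to_csv(products))
```

Real pages are rarely this tidy, which is where the harder cases discussed in the episode come in.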

Let’s talk selectors

Anyone who has dug deeper into parsing probably knows the essential role selectors play in locating and selecting the needed elements in HTML.

To give you a deeper understanding of these tools, the third episode of OxyCast answers the following questions in detail:

What is a selector?
What is the difference between CSS and XPath selectors?
How to choose the right selector?
How to write a good selector?
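
As a rough illustration of the CSS versus XPath question (not the answers given in the episode), the sketch below runs a few XPath expressions with Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset, and shows approximately equivalent CSS selectors in comments. The `MARKUP` snippet and its ids and class names are hypothetical, and the markup is deliberately well-formed; real pages usually need a more forgiving HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup. ElementTree is an XML parser, so this
# sketch would fail on the sloppy HTML that real websites often serve.
MARKUP = """
<html>
  <body>
    <div id="catalog">
      <div class="product"><h2>Laptop</h2><span class="price">999.00</span></div>
      <div class="product"><h2>Mouse</h2><span class="price">19.50</span></div>
    </div>
    <div id="footer"><span class="price">0.00</span></div>
  </body>
</html>
"""

root = ET.fromstring(MARKUP)

# A broad selector matches every price span, including the one in the footer.
# Roughly equivalent CSS: span.price
# (CSS .price matches a class token; @class='price' needs the exact attribute value.)
broad = [el.text for el in root.findall(".//span[@class='price']")]

# A scoped selector anchors on a stable container and spells out the path.
# Roughly equivalent CSS: #catalog > div.product > span.price
scoped = [
    el.text
    for el in root.findall(
        ".//div[@id='catalog']/div[@class='product']/span[@class='price']"
    )
]

# XPath can also filter on text content, which standard CSS selectors cannot do.
mouse_price = root.findtext(
    ".//div[@class='product'][h2='Mouse']/span[@class='price']"
)

print(broad)
print(scoped)
print(mouse_price)
```

The scoped expression returns only the catalog prices, while the broad one also picks up the footer value, which is one simple way to see why selector precision matters.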

What does the future hold for parsing?

As a finishing note, Augustinas Kalvis and Povilas Kudriavcevas briefly talk about the future of parsing. To heighten your excitement a little, here’s a sneak peek of Povilas’ thoughts on what we can expect from parsing in the coming years:

“A lot of things are impacted by Machine Learning today. The same is with the parsing field. I think that Machine Learning will start replacing more and more manual tasks, and eventually, people engaged in web scraping will be just chilling and drinking margaritas while parsing happens on its own.”

– Povilas Kudriavcevas, Software Engineer at Oxylabs

To sum up

We hope you find this episode insightful: apart from the topics mentioned above, our experts also shed light on parser failures, tests, and duties. So, get ready to pick up plenty of valuable information and apply it in your future public web scraping activities. Additionally, you can check out this practical Python data parsing tutorial.

And while we aim to explore the most in-demand web scraping topics in our podcast, you can always propose topics or ask questions by contacting us at events@oxylabs.io. We’ll try our best to cover your ideas in future OxyCast episodes.

About the author

Yelyzaveta Hayrapetyan

Former Senior Technical Copywriter

Yelyzaveta Hayrapetyan was a Senior Technical Copywriter at Oxylabs. After working as a writer in fashion, e-commerce, and media, she decided to switch her career path and immerse herself in the fascinating world of tech. And believe it or not, she absolutely loves it! On weekends, you’ll probably find Yelyzaveta enjoying a cup of matcha at a cozy coffee shop, scrolling through social media, or binge-watching investigative TV series.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.