Machine Learning: The Driving Force of Web Scraping | OxyCast #6

Iveta Vistorskyte

2022-07-13 | 3 min read

A brand new episode of OxyCast is live! This time, our favorite host, Augustinas Kalvis (Software Developer), and a special guest, Jurijus Gorskovas (Machine Learning Engineer), delve deeper into the world of machine learning!

From Jurijus' experience working with machine learning to what his "OxyBrain" team does at Oxylabs, we cover it all in episode #6. Watch it to understand the details of machine learning and how it can make web scraping processes more efficient.

Started from the bottom, now we're here

Not exactly from the bottom, as the popular song implies, but breaking into the field of machine learning is a tough nut to crack. In this episode, Jurijus talks about his career journey. In short, it started back in 2015 with his excitement about a revolutionary technology: machine learning. For his final thesis, Jurijus created a machine learning model capable of classifying emotions from pictures of people's faces.

Even though his studies gave Jurijus valuable experience, finding an entry-level job in the machine learning field is hard: most of the time, companies are looking for specialists who can introduce something new and teach other teams from the start. In episode #6 of OxyCast, our special guest explains what he did to reach his current Machine Learning Engineer position at Oxylabs.

Explaining the basics of machine learning models

"If we think about developing a machine learning model, it's similar to building a startup. You have an idea in the beginning, and you sort of understand some particular areas that you have to work on, but you have no idea if it's going to work eventually or not. At the same time, it's very uncertain, but you have high expectations from it."

– Jurijus Gorskovas, Machine Learning Engineer at Oxylabs

Augustinas and Jurijus on the set of OxyCast #6 episode

For starters, Jurijus explains in detail what a machine learning model is. Augustinas then pitches a specific model he would like to develop, one that recognizes the price on an e-commerce web page, and Jurijus walks through what needs to be done to build it from scratch. Essential terms, such as "feature," are also explained during the episode.
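
To make the term "feature" more concrete: in machine learning, a feature is a measurable property of an input that a model can learn from. The sketch below is only an illustration and is not taken from the episode or from any Oxylabs model. The function name `extract_features` and the chosen properties (currency symbol, number pattern, digit ratio, length) are assumptions about what might help tell a price apart from other text on a page.

```python
import re

# Hypothetical patterns: a few common currency symbols and a simple number shape.
CURRENCY = re.compile(r"[$€£¥]")
NUMBER = re.compile(r"\d+(?:[.,]\d{2})?")


def extract_features(text):
    """Turn one candidate text snippet into a dict of numeric features."""
    stripped = text.strip()
    digits = sum(ch.isdigit() for ch in stripped)
    return {
        "has_currency_symbol": int(bool(CURRENCY.search(stripped))),
        "has_number": int(bool(NUMBER.search(stripped))),
        "digit_ratio": digits / len(stripped) if stripped else 0.0,
        "length": len(stripped),
    }


# Made-up snippets, as they might appear on a product page.
candidates = ["Blue Mug", "$12.99", "Free shipping on orders over 50"]
feature_rows = [extract_features(c) for c in candidates]

for text, row in zip(candidates, feature_rows):
    print(text, row)
```

A model would be trained on many such rows, each labeled as "price" or "not a price", rather than on the raw text itself.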

"OxyBrain" – the team inspired by machine learning

As mentioned above, Jurijus works in one of the most mysterious teams at Oxylabs, "OxyBrain." No one knows exactly what projects they work on, but when they have something to announce, everyone who is even slightly interested in the web scraping industry is excited about it.

One of the innovations his team maintains is the universal e-commerce parser. It's a piece of software that takes HTML content and finds various fields, such as descriptions, prices, and titles. During this episode, Jurijus explains that you must go through a vast number of e-commerce pages to see their similarities. For example, to learn where the title usually sits, you can either take a picture of the page and detect where that particular field is or use XPath to understand how deep in the page structure it lies. Once you have information about many e-commerce pages, you can see statistically that the structure often repeats. Simply put, this data helps to improve the parser.
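
The idea of using a path to measure how deep a field sits in a page can be sketched with Python's standard library alone. This is a minimal illustration, not the OxyBrain parser: the class `PathCollector`, the helper `text_paths`, and the sample HTML are all hypothetical, and the paths are simplified XPath-like strings without indexes or attributes.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Records an XPath-like path and nesting depth for every text node."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags, outermost first
        self.records = []  # (path, depth, text) tuples

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            path = "/" + "/".join(self.stack)
            self.records.append((path, len(self.stack), text))


def text_paths(html):
    """Return (path, depth, text) for each non-empty text node in the HTML."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.records


# Made-up product page fragment.
SAMPLE = """
<html><body>
  <div><h1>Blue Mug</h1></div>
  <div><div><span>$12.99</span></div></div>
</body></html>
"""

records = text_paths(SAMPLE)
for path, depth, text in records:
    print(depth, path, text)
```

Collecting these paths and depths for the same field across many pages is one way to check how often a structure repeats.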

In episode #6 of OxyCast, you can expect more details on parsing e-commerce pages with the help of machine learning.

Wrapping it up

If you want to better understand how machine learning can help you gather structured public information, or how to develop a machine learning model yourself, episode #6 of OxyCast has it covered. You'll also learn the critical aspects of building a machine learning model for recognizing prices on e-commerce pages.

If you have any suggestions or questions for our podcast, don't hesitate to contact us at events@oxylabs.io, and we'll try our best to discuss them in OxyCast episodes. You can also check out other OxyCast episodes that cover different aspects of the web scraping world. 

About the author

Iveta Vistorskyte

Lead Content Manager

Iveta Vistorskyte is a Lead Content Manager at Oxylabs. Growing up as a writer and a challenge seeker, she decided to move to the tech side and instantly became interested in the field. When she is not at work, you'll probably find her just chillin' while listening to her favorite music or playing board games with friends.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
