
Machine Learning: The Driving Force of Web Scraping | OxyCast #6

Machine Learning in Web Scraping

[0:00:00.0] Jurijus: We tried to use machine learning as this holistic approach, so that we are giving it some features, some information about the similarities between those domains across the whole internet, and then we're trying to use those similarities to sort of automate our tasks.

[0:00:38.6] Augustinas: Hey there, pals! Welcome to the OxyCast, a webcast where we talk about everything web scraping related. My name is Oggy, your host, as always, and on the other side of this table, we have Jurijus Gorskovas. Is that right?

[0:00:48.8] He is a developer at a team called "Oxybrains." They are a team that, here in Oxylabs, we have incredible respect for, mainly because they work with a technology that every developer kind of, sort of, knows about, but not completely. We are always curious about it, always. It is the technology that solves the most mystical problems that we really want to know about, but we just never get to it. Jurijus is an expert in machine learning. 

[0:01:17.6] Even saying that name out loud sounds a little bit magical to me. So Jurijus, why don't you tell me a little bit about what exactly machine learning is and how you got to learn about it in general? How did you get to be where you are today - a developer in the most magical team here at Oxylabs?

[0:01:37.7] Jurijus: Hey, hey, everyone, thanks for having me. So, if we talk about machine learning, my journey began in 2015, just from what I had heard about it. Mainly, it was something revolutionary back then, with image classification and maybe object recognition from images, and that really looked magical at the time. What happened around 2010 is that we finally reached the expected progress in hardware. So that is one thing, and the second is that we were able to gather a lot of data from the internet. So we have the data, and we have the processing power, and this is where those machine learning algorithms came back to life, you know, because actually they were developed something like in the 1960s. So it's been a while since people have known about machine learning.

[0:02:45.4] That was something that excited me and sort of directed me to a proper education path, so my final thesis was in machine learning, specifically in image recognition. I had to build a model which would classify the emotion from facial images of people. I had, like, seven different emotions that my model could recognize. When I worked on it back then, TensorFlow was still just a beta version, and I managed to connect it to an Android camera and take input from there. I don't know, it was very interesting for both practical and theoretical knowledge. The theory isn't always very clear. As you said, it looks magical, it sounds magical, but it's all just math. It's weird, but it works, let's say.

[0:03:52.8] Augustinas: Seven emotions, is that right? 

[0:03:53.0] Jurijus: Yeah.

[0:03:54.0] Augustinas: What kind of emotions? 

[0:03:56.5] Jurijus: Yeah. Anger, happiness, sadness. I don't know, maybe envy, jealousy. Some of the big ones, I don't remember, to be honest.

[0:04:10.5] Augustinas: That was a long time ago? The 2010s, you mentioned? Any idea what year exactly that was? 

[0:04:16.1] Jurijus: When I graduated? 

[0:04:16.9] Augustinas: Yeah. 

[0:04:19.2] Jurijus: It was 2018.

[0:04:22.5] Augustinas: Meaning, you have been doing machine learning for four years already, or were you doing something else in the meantime? 

[0:04:27.5] Jurijus: Yeah, exactly.

[0:04:29.6] Augustinas: I mean, between now and the time you completed your thesis. 

[0:04:31.6] Jurijus: Back then, it was quite a difficult task to find an entry-level position for a machine learning job. Everyone was looking for specialists who could introduce new things to their teams and teach somebody else. So, it was really hard to find something for a person who doesn't have experience. Actually, that final thesis wasn't only required as an academic thing, but it was also very useful from a practical point of view, because after my graduation, when I was going to interviews or looking for machine learning jobs, I could talk about it. I had experience, because it takes over half a year to build a model, document it properly, write the code, and everything else. So, it really was helpful when I had to look for a job.

[0:05:33.3] But anyway, I didn't manage to find an entry-level position, so I had to sort of travel and find my own path through experience in other fields. I worked in France after my graduation as a sort of systems administrator or monitoring specialist. I used Splunk, a sort of paid version of Elasticsearch, to collect a lot of monitoring data - access logs, security logs, and so on.

[0:06:11.7] Augustinas: So, just to clarify, Splunk is like a data storage system?

[0:06:15.4] Jurijus: Not only. It's something like Elasticsearch: it is able to index the data that you are passing to it, and then you use their own language to query that data. Since that data is indexed, it can offer good speed too.

[0:06:40.2] Augustinas: How did your system administrator job help you with your machine learning knowledge? 

[0:06:50.8] Jurijus: Machine learning was always in my head. I was following the trends in the development of all the open source models.

[0:06:56.8] Augustinas: So, I'm guessing that even while working as a system administrator, you would still hope that you will find a machine learning path.

[0:07:05.2] Jurijus: You know, I could have chosen a different path, going from sysadmin to DevOps, and it would be a sort of similar level of specialty, just another tree of knowledge. But machine learning is fun - that's what separates it from the other things. It's fun when you manage to do something, when you put in a lot of effort and time without being sure what the results are going to be, and it actually works. That's an amazing feeling.

[0:07:42.7] Augustinas: Sure, but that's kind of like the passion that I have for, like, software engineering in general. Like, I love to spend days and days working on something and then finally see it do something that I meant for it to do in the first place. 

[0:07:56.7] Jurijus: Yeah, but isn't it that developer jobs sort of become repetitive after some time or maybe after a short time? 

[0:08:06.0] Augustinas: I'd say not really, because you are still solving different problems every single time. I mean, the problems are a little bit related, as far as web developer jobs go. You are always designing APIs, and you are always thinking about how those APIs will interact with databases, but you are reaching for a different result. You know, maybe I'm the wrong person to ask. Because, you know, I've played League of Legends for 5000 hours in my life, and maybe I'm just used to doing repetitive tasks.

[0:08:44.5] Jurijus: I think I spent over a hundred days in it too.

[0:08:48.5] Augustinas: Anyways, continuing the topic of machine learning, let's get back to it. So Splunk, you did some system administration and decided that you still wanted to go for machine learning jobs. What then?

[0:09:04.0] Jurijus: I sort of tried not to go too far from that topic, because I knew that if you want to get a job in it, you have to go through interviews where they're pretty much gonna ask a lot of theory, recent studies maybe, because it is a kind of recently exploded field. So I had to follow it. And after working in France, when I came back to Lithuania, I found a job in systems operations at Nasdaq. And, funny or not, I listened to some machine learning-related videos or paper reviews pretty much every day, you know, instead of music or something, as a part of multitasking.

[0:09:59.8] Augustinas: So you were highly into it, you know, listening to machine learning podcasts every single second you weren't doing something that required your full attention. Yeah, I mean, I can see why you managed to become a machine learning engineer.

[0:10:15.9] Jurijus: And, you know, it always comes from the idea: if you build something in your head and you sort of try to go in that direction, then once you have an opportunity, it is more likely that you are going to use it.

[0:10:30.9] Augustinas: The word manifestation comes to mind. Okay, so alright, you got a job at Oxylabs. Right? Or was that before we even had that particular name? 

[0:10:50.1] Jurijus: It was Oxylabs, like a year and a half ago. 

[0:11:00.4] Augustinas: So why did they decide to pick you? They probably interviewed you, and they understood that you have a big knowledge base from listening to these podcasts. That's a pitch for every single one of our listeners. I mean, try and listen to podcasts and especially our podcast. But yeah, I guess you just gathered up this huge amount of knowledge over the years. You came into the interview, they looked at you and said: "Yeah, you're cool, you have got the job." 

[0:11:26.5] Jurijus: Well, pretty much the same, but a part of the success was actually, as I mentioned, that final thesis, because a project like this requires a lot of understanding, a lot of paper reading, everything. So theoretical knowledge was significant, but practical experience is, let's say, a common problem for people who look for entry or junior positions. So I had to prove that I could do something first, and I got a task to do for the interview.

[0:12:08.1] And, you know, to be honest, I was happy where I was, even when I received the task. The only intention that I had was to check if I could do it, you know. This can be difficult when you work somewhere and you get an opportunity to do something that maybe you would like to do, but you are busy with the things that you are currently working on, and sometimes you choose not to try. And this time, I chose to try it. I chose to check my knowledge, basically, because I knew that there were people working there who actually understand more than me, and that they needed people at the moment.

[0:12:59.8] So I could use this opportunity to check my knowledge, to see if I could do it at all. I spent a couple of days, maybe a week, on that task, and I passed the interview too. There were a couple of theoretical questions, but mostly it was about the task, because machine learning is like experimenting. It is: try, fail, try, fail. So the task was more about trying to do it the best way and then explaining why you couldn't do it any better, for example. So, I don't know, Oxylabs were happy with the results, and they had a couple of ongoing machine learning projects that sounded pretty interesting to me. So after I passed the interview, I gave it a second thought and chose to try to get back to the main path that I was planning before - machine learning.

[0:14:10.4] Augustinas: So, I know that there are a lot of things you can't talk about when it comes to your job because you guys are doing top secret kind of stuff. Maybe there is some wisdom that you can share. Something that you have been working with that is already, like, possible to talk about with us?

[0:14:35.3] Jurijus: For example, the e-commerce parser. 

[0:14:38.6] Augustinas: Yeah, sure, I would love to hear about that part. I'm assuming that as soon as you came here to Oxylabs, that was the very first project you worked on.

[0:14:48.1] Jurijus: Yes, one of a few.

[0:14:52.5] Augustinas: So, what exactly is that? Can you tell me about it, this universal e-commerce parser project?

[0:14:57.7] Jurijus: So, the point of the team overall was to get the knowledge in machine learning first, then apply that knowledge to everyday, let's say, situations, everyday problems. So once you sort of understand the potential, you start to look for the spots where you can use it to either save costs or save time, which is also money.

[0:15:27.7] So, first, we targeted some particular points where, for example, you as a developer spend a lot of time. Or maybe not a lot of time, but some time reacting to failures in the parser, because every domain that you are trying to parse is similar in a sense - for example, by the categories - but still different in the structure of the page. You know it better than I do.

[0:16:03.7] So, we tried to use machine learning as this holistic approach, so that we are giving it some features, some information about the similarities between those domains across the whole internet, and then we're trying to use those similarities to sort of automate our tasks. So, instead of the developer going to look for the problem and see why a particular field or piece of information didn't parse, first, we use a machine learning algorithm which, let's say, solves 80% of our problems - I mean, that would be ideal, I guess. And the 20% still remains for the developers, for the manual job, but we've reduced that work by four-fifths. So that was one of the goals, just to apply this as a tool, you know, because machine learning, AI - these buzzwords are always somewhere there.

[0:17:10.5] Augustinas: I still feel like we haven't really tackled what exactly is the universal e-commerce project. From my understanding, it's a parser, so, say, a piece of software that takes in just HTML content and finds a few fields in that content that are like descriptions. Maybe the price, maybe something else. And I'm struggling to understand, like, if I wanted to make something similar at home, how would I start moving towards that direction?

[0:17:52.5] Jurijus: Yeah, so basically, that universal e-commerce parser is able to parse the most generic and major fields that you are trying to parse from a page. So, in this case, since it is e-commerce related, it is able to parse the main fields of the product items that are most of the time on sale on the internet. So it's like the title of those items, the main pictures, the description, maybe sizes, or any meta information that you can get. As for how to do it: you need data, you need to basically go through a lot of different e-commerce pages and actually visualize those similarities that you see.

[0:18:49.2] So, for example, if the title is in the appropriate position on the page - you can either take a picture and detect where that specific field is, or you can use XPath to sort of work out the depth of that item. And then, once you have a lot of those pages in the dataset, let's say you see statistically, for example, that over 70% of titles were always above the picture.

[0:19:17.6] So when your model later receives a new webpage, new code, at first it looks for the title, and if it sees it in the right position, that means, most of the time, that it's gonna be a success. If it doesn't, then it goes for the other features. For example: are there enough images, or is there one big image or three smaller images? Is the price in bold, for example, or is it in italics? Or is it the old price, which most of the time is crossed out with a red line or something like that? So, you sort of collect all those features, and when you kind of stack them up, it gives you that probability, that success probability. And the more data you have, the better. The more features you recognize in the data, the better. Did that explain something?
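
For illustration, a rough sketch of this kind of hand-crafted feature extraction might look like the following Python; the lxml library, the specific XPath expressions, and the dollar-sign check are assumptions made for the example, not the actual parser's code:

```python
# Illustrative only: a few page-level signals of the kind described above.
from lxml import html

def page_features(page_source: str) -> dict:
    tree = html.fromstring(page_source)
    return {
        # is there a candidate title in a prominent header?
        "has_h1_or_h2": bool(tree.xpath("//h1 | //h2")),
        # one big image vs. several smaller ones
        "image_count": len(tree.xpath("//img")),
        # bold text containing a dollar sign - a crude "price in bold" signal
        "bold_price_like": bool(tree.xpath("//b[contains(., '$')] | //strong[contains(., '$')]")),
        # struck-through text often marks the old, discounted price
        "has_strikethrough": bool(tree.xpath("//s | //del | //strike")),
    }
```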

[0:20:19.3] Augustinas: Yeah. How do you describe a feature? So, okay, I still don't quite understand how exactly a machine learning algorithm would find it if we were talking about the title. For example, let's say that you were trying to describe a feature that tells the machine learning algorithm whether or not there is text above the image. So how is that, like, done? Do you think that it would be possible to just, off the top of your head, throw out a code example or something? So, do you use XPaths, or…

[0:21:05.3] Jurijus: Yeah, exactly. I think I mentioned that to recognize the depth of an item within the code, you use XPath. Sometimes we use metadata and JavaScript code as well.

[0:21:17.7] Augustinas: So, would it be correct to say that the way you would do something like this is - first you find all the elements that are header one texts or… 

[0:21:31.1] Jurijus: Yeah, that was one of the features, I think. In something like 70% or maybe more of the cases in the data that we had for training, we noticed that the title of the item was either a header one or a header two. So this was one of the features: if the page contains those headers, there is a chance that it's a product item, for example - let's say it's an e-commerce page.
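
As a sketch of how the header-as-title observation and the earlier XPath depth idea could become concrete features (again assuming lxml; the depth heuristic here is only an illustration):

```python
# Illustrative only: does the page have h1/h2 headers, and how shallow are they?
from lxml import html

def title_features(page_source: str) -> dict:
    tree = html.fromstring(page_source)
    headers = tree.xpath("//h1 | //h2")
    depths = [sum(1 for _ in h.iterancestors()) for h in headers]
    return {
        "has_header": bool(headers),                        # candidate product title present
        "min_header_depth": min(depths) if depths else -1,  # shallow headers look more title-like
    }
```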

[0:22:01.9] Augustinas: And then once you have a feature, does the feature itself basically only tell you whether or not there is header one text inside an HTML piece of content? Is the result a boolean, or is the result a number? What exactly is the result of that feature function?

[0:22:27.8] Jurijus: Most of the time, I would imagine it as a tree. You build features from the top, and you have a lot of nodes in the tree, for example. Each split in two is one of your features. And when you build those features, the model is able to evaluate which feature is more important than the other - which feature brings more difference to your model and doesn't overfit it, for example.

[0:22:59.7] So, when you have that list of features, the first feature of the tree will be the most decisive. For example, what appears on an e-commerce page all the time? I would say it is the title, at least one picture of the item - because most of the time we want to see what we are buying - and the price. If there is no price, you cannot buy it. So, if we look for those three things and we find them, then most likely it is a product page that we want to parse. So when you have this tree, and a new HTML page, your e-commerce page, is received, it finds, for example, those three items that we were talking about, and it just decides it goes this way: it is one, not zero. Then it checks all the remaining features and generates a probability.
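
A minimal sketch of that tree idea, using scikit-learn as a stand-in; the toy data, feature names, and labels below are invented for illustration:

```python
# Toy example: three boolean features per page; label 1 = product page, 0 = not.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# columns: has_title, has_image, has_price
X = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
])
y = np.array([1, 0, 1, 1, 0, 0])

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(model.feature_importances_)        # which split decides the most
print(model.predict_proba([[1, 1, 1]]))  # probability for a new, unseen page
```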

[0:23:54.8] Augustinas: Okay.

[0:23:55.6] Jurijus: And once you have, like, a significant probability from your model on the test data, for example, over 90%, then it is a working model.

[0:24:08.4] Augustinas: There are still a few things that I am struggling to understand. I'm assuming, okay, so if we are trying to create a machine learning model that parses an e-commerce webpage, is that just one model or multiple models? 

[0:24:28.0] Jurijus: It depends. It depends on the experiments. First, basically, you try to find the most holistic model for your data because, first, you have just data, you know, and you try to clean it, you try to build features, and then you have to build an algorithm, basically, that's gonna take that data and learn from it. So you are trying to get your data to the best shape to make it clean, to make it very different from each other. It is also important that, for example, if you build an e-commerce web classifier, let's say, or parser, universal parser, you try to make training data as different as possible. To make it, for example, from different countries, in different languages. You are interested in it to be different, although you want to find the similarities, you know. So that when your model later is going to receive a very different webpage, a very new webpage that it didn't see before, there is a high chance that still it's gonna work holistically, and parse your needed details instead of just, you know, passing and training it on, let's say, only some big e-commerce sellers, for example. And being able to parse data from those sellers very well but doing it very badly from the other sources, you know. 

[0:26:08.8] Augustinas: So what exactly is a machine learning model? Can you give me a very simplified version of what the whole process looks like? From an idea that I think the machine learning algorithm could work for and the result—something tangible, something that I could try out at home, maybe even. 

[0:26:33.2] Jurijus: I see, yeah. If we think about developing a machine learning model, it's similar to, I don't know, building a startup. Because you have an idea in the beginning, you sort of understand some particular areas that you have to work on, but you have no idea if it's eventually gonna work or not. So there is very high uncertainty but very high expectations as well. And when you take the data - I mean, there is a lot of publicly accessible data, probably on any topic that you want to apply it to.

[0:27:11.8] Augustinas: Okay, what if I give you a particular idea? I want to make a machine learning model that would recognize the price on an e-commerce website.

[0:27:28.3] Jurijus: Alright. So, your data will be the HTML code of each e-commerce page. Your goal is to gather a lot of different e-commerce pages and see how similar they are. It sounds weird, but it works from both sides: you have to have varied data for your model, but you have to find similarities in that varied data. So, for example, for the price, what are your signals, the things that are identical between those HTML codes? It's the position, for example, the type of text in it, the type of item. You don't need the text if you look for the price - you know that you have to have some sort of number there. Then, if we look for the price, what comes after the price? It's the currency. On each webpage, there is a currency, so you look for the currency item, and most of the time, you know that the price is before the currency. Or in the UK, let's say, the pound sign is before the number. So you just look for those two cases, and that's your signal.
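
As a hedged sketch, that "currency next to a number" signal could be encoded roughly like this; the symbol set and regular expressions are simplifications invented for the example:

```python
# Illustrative only: two boolean price signals based on currency position.
import re

CURRENCY = r"[$€£¥]"
NUMBER = r"\d[\d.,]*"

def price_signals(text: str) -> dict:
    return {
        # e.g. "£19.99" - currency symbol before the number, as in the UK example
        "currency_before_number": bool(re.search(CURRENCY + r"\s*" + NUMBER, text)),
        # e.g. "19,99 €" - number first, currency symbol after
        "currency_after_number": bool(re.search(NUMBER + r"\s*" + CURRENCY, text)),
    }

print(price_signals("Now only £19.99 instead of £25.00"))
# {'currency_before_number': True, 'currency_after_number': False}
```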

[0:28:50.3] Augustinas: Do I need to, I mean, if I were making such a model, right, would I need first to establish if there's a currency at all, or would I just, basically, try to tell my model: "please find me the currency for this webpage." And what happens if there is no currency on the current webpage? What are the failure scenarios of such a model, and what would they possibly look like?

[0:29:19.6] Jurijus: I believe that in some countries, they have web pages that don't have, for example, the currency written because they are focused only on their people. So they have just a number, you know. So in such cases, maybe not very popular or widely used webpages, it can be challenging to find that, but it's a case of probability. If it's not gonna find a currency sign on your webpage that you are giving it, it's gonna look for other features because it is not the only feature. It's gonna say that "this thing doesn't exist, but do those other things that I am looking for exist?" If most of them do, it's still gonna generate a good probability that it's a success.

[0:30:10.6] Augustinas: So, just to clarify, what exactly is a feature?

[0:30:16.1] Jurijus: A feature is exactly that similarity you look for when you have an idea. For example, if you have an idea that the price is next to a currency, you have to gather the data, clean the data, and see if it's actually true. So you write, let's say, a text analysis script that checks every HTML document that you have and tells you that on this page, the currency goes after the price, and on that page, the currency goes before the price. So you get the statistical probability, and you have to decide from that probability whether your feature is good or not. Because if your training data only contains prices where the currency is after the number, then the model is gonna do very badly on the web pages that you pass it later where the currency is before the price. It's just going to say: "No, in 100% of my training data the currency was after the price, so if it's before the price, it's probably not valid." But you know this is not the case in real life. That's why, when we collect the data, we have to make the data points as different as possible.
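
A self-contained sketch of that validation step, run over a tiny made-up sample; in practice this would go over the full page dataset:

```python
# Illustrative only: how often does the currency come before vs. after the number?
import re

pages = ["Price: $25.00", "Nur 19,99 €", "Sale price £5.00",
         "3,99 € statt 5,99 €", "$12 with shipping"]

before = sum(bool(re.search(r"[$€£¥]\s*\d", p)) for p in pages)
after = sum(bool(re.search(r"\d[\d.,]*\s*[$€£¥]", p)) for p in pages)

print(f"currency before number: {before}/{len(pages)}, after: {after}/{len(pages)}")
# A split close to 50/50 (or to 100/0) would make this a weak feature,
# per the rule of thumb discussed later in the conversation.
```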

[0:31:33.5] Augustinas: So let's say I have data for both cases. For when there is a… for all three cases. 

[0:31:40.8] Jurijus: Is it 50/50%? 

[0:31:43.4] Augustinas: Let's say 30/70, right? Just for the sake of this conversation, okay? So, in fact, let's say 30/75%, no, 25/75%. The reason I'm saying these particular numbers is that I'm currently imagining that I have a data set where I have a bunch of pages with a currency sign at the beginning, a bunch of pages with the currency sign at the very end, and a few web pages without any currency sign at all because they are specialized for a particular country.

[0:32:21.3] So, I'm assuming from what you're telling me right now that I have to encode two features, right? One feature would be: "is there a currency sign at the beginning of a number?" Okay? And that could give me a good signal that, well, maybe that is actually the price. And another feature would be: "is there a currency sign at the end of the number?", which would be, you know, another good signal for a price. And, I guess, a third one for when there's no currency sign at all: "is it a number?", and that would be another feature, I guess?

[0:33:04.1] Jurijus: Well, it depends. If you are sure, 100% sure, that the information that you're passing, like the web page that you're passing, is e-commerce indeed, then you can do something like this and pass the pages that don't have currency signs, for example, and just look for the numbers, which makes it quite a difficult task. 

[0:33:26.0] Augustinas: Because it could be a phone number. 

[0:33:28.3] Jurijus: Yeah, yeah, exactly. That's what I'm trying to get to. So, for example, if you're just gonna look for a currency sign and a number, it's gonna say that pretty much all the pages that contain those things are valid product pages or e-commerce pages. So every financial page, every, I don't know, stock market page - they all have currencies, they all have numbers. So everything is going to look fine for these three particular features, but if you have 35 features, they all sort of merge and generate a single probability, and then you can be more sure than just based on these three, you know.

[0:34:13.7] Augustinas: So, features… Are they booleans? Are they numbers? Are they a function, a single function that returns some kind of value? What exactly are they, and how do they look in code?

[0:34:29.4] Jurijus: Yeah, well, ideally, for the ML model, you want to have a boolean data point. A boolean feature, for example: does this exist - yes or no. And the best thing to do is to get rid of those which are 50/50. For example, take the example we had before with the currency sign. If half of our data points have the currency sign before the number and half after, this feature doesn't give us anything - it's just 50/50, you know, it's like giving a random probability. Just 50/50% - take this or that. So this feature is bad, and the same thing goes for a feature that is 99% on one side and one percent on the other - that means this feature is very biased towards this particular signal that you're checking. So you have to find something like you mentioned, 70/30 or 80/20, which gives you a very significant statistical difference between success and failure, basically.
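
For illustration, that rule of thumb could be written as a simple filter; the cut-off values below are arbitrary choices, not anything stated in the conversation:

```python
# Illustrative only: keep boolean features that are neither a coin flip nor nearly constant.
def is_informative(positive_rate: float) -> bool:
    too_even = 0.45 <= positive_rate <= 0.55                       # ~50/50: behaves like random noise
    too_constant = positive_rate <= 0.01 or positive_rate >= 0.99  # ~99/1: almost no variation
    return not (too_even or too_constant)

for rate in (0.50, 0.70, 0.99, 0.30):
    print(rate, is_informative(rate))  # 0.5 -> False, 0.7 -> True, 0.99 -> False, 0.3 -> True
```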

[0:35:42.6] Augustinas: Okay, so for a feature that has a 50/50 success rate, "bad" isn't exactly the right word for it?

[0:35:57.2] Jurijus: Yeah, it just doesn't give any result. If you were somehow able to print all the, let's say, feature importances for your model, a feature that has 50/50 data most of the time doesn't make any difference. If you take it out, you're just gonna get the same result.

[0:36:20.5] Augustinas: Right, there was another question that I had in mind. You know, like a feature that would only tell me whether or not there are numbers on a web page. Obviously, it's not as strong of an indicator as having a currency symbol, you know, right next to a number. So, can I manually specify maybe the weight of a feature?

[0:36:40.6] Jurijus: You need all of them. You have those holistic features which only give the position, for example, of the price on the web page - is it visual, or is it XPath-related, according to the depth? It can be both as well. So you look for these holistic features. For example, I don't know, maybe it's a bad example, but when you give a kid a picture and ask what he sees, he's gonna notice something and miss something else. So first, you use this very holistic approach, as if you are just trying to recognize the scene of the web page, you know. And this gives you the position, for example, of an item. And with the next feature, you're checking if that item is a number. If there is a currency next to that item. If there is a title next to the price, let's say. Is it far from the price? Are there pictures somewhere next to this data as well? And when these are packed together, they bring the difference, you know.

[0:37:50.0] Once you have the features that you assume are going to bring you some use later, you have to check them. You have to basically prove that they're useful. So when you have the data and you have the features, you check them on some very generic, very default model, let's say XGBoost. Basically, a model gives you a probability. If your result is a classification, it tells you whether it's a positive or negative result; if it's some kind of prediction, let's say, it also gives you a percentage for your predicted value, basically how likely it is to be true or not. So when you have your features, you check them against some default model, and when you get the best results possible on the default model, you are also able to fine-tune that model. There is a list of, let's say, hyperparameters that those models or those algorithms consist of, and you can, to some point, tweak them a little bit to get a slightly better result on your data, for example. So that's something that you do at the very end, when you are sure that the features that remain in your model are bringing you use, bringing you a difference.
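
A minimal sketch of that "default model first, tune later" workflow, assuming the xgboost Python package; the data is random and the hyperparameter values are only examples:

```python
# Illustrative only: baseline XGBoost on boolean page features, then a small tune.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 35))   # 35 boolean features per page, as in the example above
y = rng.integers(0, 2, size=200)         # toy labels: 1 = product page, 0 = not

# Step 1: a default model to see whether the features carry any signal at all.
baseline = XGBClassifier(n_estimators=100).fit(X, y)
print(baseline.feature_importances_)     # which features actually matter
print(baseline.predict_proba(X[:1]))     # class probabilities for one page

# Step 2: once the feature set is settled, tweak hyperparameters for a small gain.
tuned = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05).fit(X, y)
```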

[0:39:30.4] Augustinas: So, okay, let's say that I do have a complete model, right? The model checks against the features to see if it's predicting the right things, right? And that's how, like, at the end of the day, after encoding a bunch of features that are highly likely or highly unlikely to give me an indication of whether or not there's a price field on my website. You know, after I write a bunch of these features, my model does, like, a bunch of checks against those particular features, it forms some decisions about the data. Afterward, it can more or less accurately guess which websites contain a price point and which websites don't. Is that about right?

[0:40:19.2] Jurijus: Yeah.

[0:40:22.3] Augustinas: Okay, so that was difficult, honestly. Extremely. So, it makes me respect you more for what you do and, like, just exactly how much science there is in this whole thing. I think I'm gonna stick to web development. Thank you very much. 

[0:40:41.3] No, it's interesting, don't get me wrong. I've wanted to do gaming-related machine learning projects for as long as I can remember, but you don't exactly need machine learning for most of the things that you do in life. For example, I've been interested in aim bots for, you know, games like Counter-Strike, and that, you know, shooters in general, and you don't need machine learning to do, like, those kinds of tasks. It's about, you know, getting the player position and finding out where exactly the head is, and, you know, finding how to get your mouse from one point to another point. And I think where I was going with this particular thought is that you don't need machine learning to solve 90% of the problems that you have in your day-to-day life. But what is a problem that you would like to solve that is related to machine learning?

[0:41:40.7] Jurijus: I don't know, you're probably right about this statistic that in most cases it's not needed, but when you have a sort of very repetitive problem on a large scale, it can be very handy, you know. What would I like to solve with machine learning? I probably would say something naive, like, recognizing tumors or cancer, you know, in people's x-rays or something like that.

[0:42:13.3] Augustinas: I don't think that's naive at all. Like, the biggest problems that are currently solved in our lifetime right now are machine learning related. I think that the changes that we're gonna have, like, from self-driving cars, for example, are gonna be huge. And, you know, it's just gonna change our economy forever when that problem is…

[0:42:30.9] Jurijus: But don't you feel this tendency that still, like, people invest in things that are easy to commercialize, not those that are actually useful for humanity?

[0:42:42.7] Augustinas: I don't think that transportation is just about being commercial or not. I think that transportation was a problem that was, you know, necessary to solve for as long as humans have been around. We invented the wheel, I don't know how many thousands of years ago, and it wasn't for commercial purposes back then. It was because, you know, humans have always had the need to transport things from one place to another. And, you know, at some level, I think all problems that are currently being solved are like that. Surely, there's a lot of research in medical fields where machine learning is applicable, and, you know, surely, there's a lot of research going into, you know, capitalistic projects. Machine learning is a world-changing technology. Well, it can change the world, but right now, it's, unfortunately, used for commercial things only.

[0:43:41.0] Jurijus: For now. It's on us to change that.

[0:43:41.5] Augustinas: For now, absolutely. This is, in part, the reason why I think it's important to talk about machine learning in general. To teach our audience what exactly machine learning is and how it works. And how you can use machine learning to solve the problems that you have. Or not just you, you know - maybe think about the bigger picture. What is a problem that everybody has, that you could solve with machine learning, but, you know, nobody is motivated enough to solve at this point yet? It sounds a bit gloomy, but I'd like to think that there's a silver lining here somewhere: that we're teaching, you know, our community to tackle these problems and that we ourselves will do something, you know, to help out with this kind of huge humanitarian problem that we have in life right now.

[0:44:36.3] With that being said, I'd like to remind everyone to listen to us on Youtube, on Spotify, and on Apple Podcasts. And scrape responsibly and parse safely.
