[0:00:00.0] Augustinas: The way that I was imagining things were going to be is that I'm just going to slap another load balancer between the microservices.
[0:00:09.9] Eivydas: That's a very good approach.
[0:00:11.5] Augustinas: Yeah, what other approaches do you happen to know?
[0:00:18.9] Eivydas: You could also think of asynchronous approaches, like message queues and messaging protocols, where you have some kind of broker that receives the scraped HTML data. From there, the parsers consume the data from the queues and then put the result somewhere where the web server can provide the data.
[0:00:49.5] Augustinas: Hello guys, and welcome to the OxyCast - the podcast where we talk about everything web scraping related. Today on the other side of this table, I have my colleague Eivydas. He's been here with us at Oxylabs for around five years now.
[0:01:05.8] Eivydas: Yes.
[0:01:10.2] Augustinas: And those five years are exactly the reason why he's the perfect person to talk about today's topic, which is scalability. The thing about scalability is that you only really get to know the topic when you start doing it yourself, and, from what I understand, you've been here since the very beginning, since the times when we were handling, like, 10 requests per second?
[0:01:32.2] Eivydas: Exactly.
[0:01:36.1] Augustinas: All right, and yeah, why don't you tell us about your journey a little bit?
[0:01:42.2] Eivydas: So when I joined the company, I joined directly into this web scraping-related project that we are currently still developing, and I joined at the very beginning of it. Back then, there were only a handful of people. It was me as a back-end developer, a systems administrator that handled our servers, a product owner that defined what we were going to do, and a front-end developer. So those four people were all that it took for us to begin the project. And we developed it from the very initial stages, where we had to choose our architecture and development direction, which led us from the beginnings of tens of requests per second to hundreds and then to thousands of requests per second.
[0:02:52.6] Augustinas: How do you go from a little service that is only supposed to scrape a website and then parse it ten times per second into something that can do the same thing but at an incredible scale of, you know, thousands, tens of thousands, hundreds of thousands per second?
[0:03:20.3] Eivydas: Well, let's not go all the way to the hundreds of thousands, because that scale is reachable only after many more years than just five, and we're talking about the scale of the likes of Google. I believe that company is the only one operating at such a high scale, and the scalability problems they experience are very different from what we see with scraping and parsing solutions that start at a very small scale of ten requests per second. I think it is possible for us to talk about scaling that infrastructure up to tens of thousands, not hundreds of thousands, of requests per second.
[0:04:15.2] Augustinas: That already sounds pretty good to me.
[0:04:20.7] Eivydas: So, basically, sorry if I'm going to use this word a lot, but we have to understand the underlying concepts before we go into the details of scalability. To begin with, scalability, as I personally understand it, is the process of getting enough resources to satisfy the needs of the clients while also having the capability to manage those resources on your back-end side. That involves both the infrastructure and the software parts.
[0:05:01.7] Augustinas: In its most basic sense, the way that I understand it, scalability is just making something bigger. It doesn't necessarily have anything to do with requests or outputs. It's just a matter of doing something more.
[0:05:17.2] Eivydas: Yes, and the scalability concept comes up when you have some process that you do: you have some initial data inputs, materials, you process them, and then you have your output, your product - your frying pans, for example, or your scraping output, which is a JSON, a structured data set.
[0:05:51.7] Augustinas: Let's use that. I've had this particular example that I thought would be really good for thinking about scalability in general. Let's say we have a little application that's a web scraper, right? It scrapes a website I used as an example in one of the previous episodes here on the OxyCast. And the website that we are going to scrape today is potatomarketplace.com. Let's say that this particular website has lots of kinds of potatoes, right, and lots of sellers that are trying to sell them. And the idea is that we have this web scraper application that, given a link, can scrape a particular web page from potatomarketplace.com, and the result would be a JSON output that the client would just see as an HTTP response. So, I was thinking maybe we could use that particular application to think about scalability, right? I have an application, but I can see that it only handles, let's use the previously mentioned ten requests per second. It turns out potatomarketplace.com is such a popular service that a lot of people want to use it more, and suddenly we're seeing, you know, more clients coming in, more clients wanting to use our little application for one reason or another, and now we need 50 requests per second. That's a huge jump, I know, but let's go with that for now. How do I make my application handle 50 requests per second?
[0:07:44.6] Eivydas: Okay, so let's start with the requirements of your application. When you are starting to think about scalability, you have to begin from your bottlenecks. Because scalability is making something bigger, faster, better. Well, bigger and faster, rather than better. When scaling, you are trying to address a particular problem that is called a bottleneck, and your application, which has enough resources to handle ten requests per second, has some kind of bottleneck that you're trying to resolve.
[0:08:34.9] Augustinas: When it comes to web scrapers, I know that they use a lot of CPU power, so we could start from that. Let's say that I went into my server and ran htop, which is this application that shows me how much RAM and CPU my applications are using up. Let's say I'm noticing that my web scrapers are at the very top of CPU usage, and that would be a particularly strong hint for me that something's wrong with the CPU, I guess. Or, rather than something being wrong with it, maybe it's just not enough, right? Maybe I need a better one?
[0:09:18.7] Eivydas: Yes, so it depends. Well, I can only guess what your application is doing, but from your words, you scrape the website, and then you parse it. Of these two processes, the scraping part should be quite CPU light. What is the opposite of CPU intensive? It should be almost idle when you're scraping your websites, because the scraping process involves making HTTP requests, which involves the network, and that network takes a lot of time. Those requests take a lot of time, but they do not take a lot of CPU power to initiate or handle. So your scraping part is very easy for the CPU to handle.
[0:10:26.0] Augustinas: Okay
[0:10:31.9] Eivydas: The only thing that can take a lot of CPU power in your application is the parsing part.
[0:10:32.9] Augustinas: Okay
[0:10:35.5] Eivydas: It is very CPU intensive because you have a lot of data to sift through. Depending on how heavy your potatomarketplace.com is, it can contain from a couple of tens of kilobytes of data all the way to tens of megabytes of data, and you have to parse it, understand it, find the particular points in that huge amount of data that you need, and extract them. That extraction process is very CPU intensive. So this is the part you are actually scaling for, because that is the bottleneck that tells you that you have a resource deficiency.
[0:11:24.3] Augustinas: Well, I obviously didn't know that scraping was the CPU-light part of my application. I guess, in order to understand my bottlenecks, I would also need some kind of monitoring. I would need to be able to analyze the parts that are actually heavy in my application. Would it be correct to say that before even thinking about scaling, I should be able to monitor my particular application?
[0:12:03.3] Eivydas: It depends, again, because you had a tool that you used to monitor your CPU usage - htop. It already qualifies as a monitoring tool. It is not a particularly well-chosen tool for this case because you have to do it manually. You do not have any notification that would inform you that your CPU load is going through the ceiling. This manual process detaches you from the real-time situation that might be happening while you are asleep, and that would not be enough if you want to provide a good service for the clients. The automated responses, the automated notifications and alerts, and the morning call that you are so happy to take would require additional tools.
[0:13:13.5] Augustinas: Any tools that come to your mind? Something basic enough that would suit, like, my particular small use case where I'm only getting started. I have this little service - it's just doing 10 requests per second, tops.
[0:13:29.6] Eivydas: There is a particular specialized tool for that. It is quite simple to set up, and it scrapes, well, it again scrapes, but it scrapes the data from your machine. It handles and monitors your CPU usage, your hard drive disk space, network load and other system component resources - it is called Prometheus. Combined with the node exporter and the alert manager, which are all in the Prometheus ecosystem, you can ask the alert manager to send you an email, for example.
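For illustration, here is a minimal sketch of what such an alerting rule might look like in Prometheus, assuming the node exporter's CPU metrics are already being scraped; the rule name, threshold, and durations are hypothetical:

```yaml
# alert_rules.yml - a minimal sketch of a Prometheus alerting rule.
groups:
  - name: cpu-alerts
    rules:
      - alert: HighCpuUsage
        # Percentage of CPU time spent busy, averaged across all cores
        # over the last 5 minutes (node exporter metric).
        expr: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage has been above 90% for 10 minutes"
```

Alertmanager would then pick up the firing alert and route it to a receiver such as email.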
[0:14:13.5] Augustinas: Cool!
[0:14:14.4] Eivydas: So that would be enough for your use case, and it scales very well up to quite a large scale, and we are using it in our production as well.
[0:14:30.3] Augustinas: So it's not just for small applications - it works for big ones as well?
[0:14:33.8] Eivydas: Yes. There is a limit to how much data one Prometheus instance can handle, but it can also be federated so that it can span multiple servers, and that way, you can scale even more. But your suggestion that monitoring would be the first thing you would need before scaling is a good one, though not necessarily a requirement for scaling. Because in the first instance, you had only one process, only one machine that you are monitoring, which you can do by setting up a monitor on your screen or on your wall in the office and then reacting to the changes in the graphs there - htop would be enough for that use case. But once you start scaling, meaning that you are putting more resources into this, you might only be able to manage all that work and the resource management through automated tools.
[0:15:53.4] Augustinas: Okay, well, before we get deep into the monitoring parts of this particular conversation, I want to go back a little bit, and I want to think once again about how we actually scale our application in this particular sense. Let's say that, okay, we've established that the CPU is the bottleneck. You would say that the easiest way to scale an application at this point would be to just add a better CPU. Right?
[0:16:20.6] Eivydas: That would be a very respectable choice, and it is one of a few possible choices. The first one, getting a better CPU, means that you can handle more CPU-intensive tasks in the same process. The other solution would be to spawn more processes, more applications, that would divide the work between themselves and make use of multiple CPU cores or even multiple CPUs on multiple machines. And this concept of whether you scale one resource to be better and more capable, versus multiplying the same resource many times, is called vertical versus horizontal scaling.
[0:17:23.4] Augustinas: Where vertical is just adding more resources into a machine, and horizontal scaling is spawning new processes. Right?
[0:17:32.2] Eivydas: Correct. So when you are vertically scaling, you are trying to do more work on the same resource, which is limited to one particular use case. In this instance, you have a single process that can be single-threaded or multi-threaded. If you have a single-threaded application, you would have to get a CPU that has better single-threaded performance. If your application is multi-threaded, you have the option of getting a CPU that handles single-threaded loads better - so your thread count does not increase, but the capacity of a single thread increases - or you get a CPU that has more cores, so you can have more threads. This is still a vertical approach because you have a constraint of one particular process, and you give that single process access to more resources. In the horizontal approach, you would have to create more processes. More processes do not necessarily require more CPUs, because you can have a single-threaded application that uses a single core, but you can scale the number of applications so that they make use of multiple cores. You're scaling the application horizontally, but you're still scaling the machine vertically, so you are still confined to a single machine. That concept can go beyond a single machine to multiple machines, where you have one application per machine or multiple applications per machine, but you have multiple machines. Horizontal scaling is easier to do, in a sense, when you need a lot of resources, because, as you can understand, in this day and age we have physical limitations on how many cores we can have in a single machine.
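To make the single-machine case concrete, here is a minimal Python sketch of that last idea - a single-threaded parse function multiplied across cores by running more processes; the parse logic and page data are hypothetical placeholders:

```python
from multiprocessing import Pool, cpu_count

def parse_page(html: str) -> dict:
    # Stand-in for the CPU-intensive extraction work a real parser does.
    return {"size": len(html)}

if __name__ == "__main__":
    # Hypothetical scraped pages waiting to be parsed.
    pages = [f"<html>potato listing {i}</html>" for i in range(100)]
    # One single-threaded worker per core: each process is confined to a
    # single core, but together the pool can use the whole CPU.
    with Pool(processes=cpu_count()) as pool:
        results = pool.map(parse_page, pages)
    print(f"parsed {len(results)} pages")
```

Scaling the application count this way while staying on one machine is exactly the hybrid Eivydas describes: horizontal in processes, still vertical in the machine.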
[0:20:05.5] Augustinas: Those Ryzen CPUs sure are expensive.
[0:20:07.6] Eivydas: Exactly, and there is a hard limit on how many cores we can have. So when you have a simple application and very small loads that you want to scale, scaling your virtual CPU from one or two cores up to four, eight, sixteen, thirty-two, sixty-four cores would still make sense economically for you.
[0:20:37.7] Augustinas: That's assuming I'm running my service on a VPS or something?
[0:20:44.0] Eivydas: Yes. If you put your service on a hardware server, you would have to balance the cost of changing the server against how many benefits you get from that new server, because setup fees are a thing, and the server cost is a bit higher for hardware servers.
[0:21:06.9] Augustinas: Now, when it comes to scalability, it's definitely about a price-to-time ratio. You always need to consider how much money it is going to cost you to scale your particular service because nothing is free, really.
[0:21:23.3] Eivydas: That is the reason why I said that when you have low initial requirements and you are scaling to a moderate level - at least in the scope of one service or one machine - putting more money into scaling one machine vertically makes more sense than having to redesign your application to be able to handle horizontal loads.
[0:21:51.0] Augustinas: It's a lot of work, right?
[0:21:53.4] Eivydas: It is a lot of work, and that work comes from the developers, whose time is usually much more expensive than the monthly bill difference between a two-core virtual CPU machine and a 32-core one.
[0:22:09.4] Augustinas: Right, how many more requests do you think we would get from a better CPU for this particular imaginary application of ours?
[0:22:21.2] Eivydas: In this case, when we have a parsing task that is the CPU bottleneck, I would guess that we can scale linearly with the number of cores that we have.
[0:22:41.5] Augustinas: You mentioned that we can also have single-threaded applications for our parsing, right? So I can just basically tell my single-threaded application to run on one CPU core, right?
[0:22:58.3] Eivydas: You have no other choice.
[0:23:00.4] Augustinas: Okay, how do I tell my server to route my requests to one of those particular applications? Because, usually, right, the default is - let's say I'm very new to web application development, right? I spawned a new application that's just listening on port 8000. Right? I go to my router, and I create something called port forwarding, which means that any traffic that comes in through port 80 is going to be routed to my particular server at port 8000. So that's just one, you know, little route for how the internet can reach my particular service. How do I make it so that my requests from the outside would be able to reach any one of those application instances?
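For reference, a minimal Python sketch of the single-instance baseline Augustinas describes - one application listening on port 8000, which the router's port-forwarding rule (external port 80 to internal port 8000) would point at; the handler body is a hypothetical placeholder:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class ScrapeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A real handler would scrape the requested page and return the
        # parsed result; this placeholder answers with fixed JSON.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"status": "ok"}')

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), ScrapeHandler).serve_forever()
```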
[0:24:00.2] Eivydas: So you are talking about horizontal scaling already?
[0:24:04.0] Augustinas: Because we already went there, so we might as well, you know, get that out of the way?
[0:24:09.2] Eivydas: So when you're horizontally scaling, you have multiple processes, and those processes, if we assume that those processes are HTTP servers as well, they...
[0:24:23.2] Augustinas: It's the same application.
[0:24:24.4] Eivydas: It's the same application, okay. So they receive the HTTP requests through a single port on that machine, because operating systems limit the applications' access to ports. They require that each application has one or more ports open, and the request has to reach that port, and that port is limited to one single application - it is exclusive. And you said that you have a router that points to that particular port on your machine, so the application which listens on that port receives the request. If you have multiple applications, they have to use different ports to work; otherwise, the operating system does not allow them to use the same port. There are two different approaches here. The first one, and the easier one, at least in the beginning, is to have someone who listens on a single port but is capable of routing those requests to other ports. Those applications are called load balancers, and one of them is called HAProxy. So it listens on that same port 8000, takes the request, and it knows what ports are available to it.
[0:26:12.8] Augustinas: And what ports are the services running on?
[0:26:14.6] Eivydas: Yes, which ports the services are running on, and then it routes the request to that particular port in a round-robin or randomized way. Round robin means you go one by one and then start from the beginning again.
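As an illustration, a minimal sketch of an HAProxy configuration for this setup - HAProxy listening on port 8000 and round-robining between application instances on other ports; the ports and instance count are hypothetical:

```
# haproxy.cfg - a minimal sketch of round-robin load balancing.
defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend scraper_frontend
    bind *:8000
    default_backend scraper_instances

backend scraper_instances
    balance roundrobin               # one by one, then start from the beginning
    server app1 127.0.0.1:8001 check
    server app2 127.0.0.1:8002 check
    server app3 127.0.0.1:8003 check
```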
[0:26:39.4] Augustinas: Just to make sure: my traffic will go to the router first, then the router would redirect all of the traffic to HAProxy, for example, and HAProxy would just route the traffic to one of my applications, right?
[0:27:00.2] Eivydas: Yes, it would distribute the load.
[0:27:01.0] Augustinas: Okay, and HAProxy would understand that it needs to route the responses back to the client.
[0:27:08.4] Eivydas: Yes, because the connection that comes through the router to the application is still open, and the router knows how to return the response. The same applies to HAProxy as well.
[0:27:24.2] Augustinas: So I think that this particular change in how we're running our application - it's not just one application now, it's 10 - should be a huge jump in how many requests per second we can handle. Right? Not linear anymore. It's probably going to be times 10. Would that be realistic?
[0:27:45.6] Eivydas: It would have to be linear. Well, it is definitely not times ten because I do not know how many applications you are going to run additionally.
[0:27:59.5] Augustinas: So let's say we just went from one application to...
[0:28:01.1] Eivydas: Yes, so if the scaling is linear, you should expect a 10x improvement. It all boils down to the bottleneck that you are currently solving, and the CPU - and parsing in particular - usually tends to scale linearly with the amount of resources available on the CPU. If you increase the CPU capability by ten times, it is likely that you're going to see a ten times improvement as well.
[0:28:41.5] Augustinas: Okay, so for most use cases, this little jump in how many requests per second we can handle would already be enough for a lot of applications out there. If I were developing a web scraper today, going from ten to a hundred is huge, really. But I want to go even further. Is it even possible without simply running more applications, or is that the best we can do as far as scalability goes?
[0:29:15.8] Eivydas: This approach can be scaled even more, but we have to start from the current state, where we have one single machine which can house ten of your applications. Let's say that you have 10 cores, all the CPU is used by the parsers, and you have your router that points to that single machine, which contains the load balancer, which distributes the load to those ten applications.
[0:29:49.7] Augustinas: What if we had a load balancer before even that machine, one that would distribute the traffic to different machines?
[0:29:54.6] Eivydas: Exactly.
[0:29:57.5] Augustinas: Which would then have load balancers inside those machines that would distribute the traffic to the various applications running on each one.
[0:30:02.8] Eivydas: Now you're thinking.
[0:30:05.3] Augustinas: Now that's the big meat of scalability, I suppose - that's where the whole thing is at. We can use this particular strategy as much as we want, really?
[0:30:18.0] Eivydas: Yes.
[0:30:18.7] Augustinas: For as long as we have the resources to do it, we can just, basically, keep scaling that way.
[0:30:22.6] Eivydas: Exactly, and that particular point that you made - "as long as we have enough resources" - stands. Currently, we had a bottleneck in CPU power, and scaling the CPU count and the application count, and then scaling the number of machines, also scales the performance gains linearly. But now we have to start thinking about the other components of the resources that we have.
[0:30:55.2] Augustinas: Okay.
[0:30:57.7] Eivydas: Because we only talked about the CPU, but we have other resources, like hard disk drive space, the amount of reads and writes it can sustain, the memory constraints, and the network throughput.
[0:31:13.0] Augustinas: The most realistic scenario right now sounds like network throughput for our particular application.
[0:31:22.7] Eivydas: It is most likely going to be the next bottleneck. So when we are starting to scale those multiple machines, each of those machines contains a load balancer, so we can be sure that we have enough resources to handle the requests that come to that machine. Once we have an additional load balancer in front of multiple machines instead of applications, we now have to look at that load balancer's resources. If we assume that request processing has very low CPU requirements, it is possible that one load balancer is capable of managing a lot of machines from the CPU standpoint, but the network use can be the limit. The bottleneck is going to be there because, at the beginning, we established that your potatomarketplace.com is going to have some data that you have to scrape. It can range from a couple of kilobytes to tens of megabytes, and when you scale your scraping operations enough, that is enough to saturate a network interface on that load balancer server - not necessarily on the machines that house your applications. A hundred requests per second is not really a lot, but if you have one megabyte per scrape, and if we assume that one scrape takes one second, you are looking at 100 megabytes per second.
[0:33:24.7] Augustinas: Yeah, but that's the thing - well, at the very least here in Lithuania, a lot of the standard internet connections are 100 Mb per second. That's megabits, not megabytes, so we would probably already max that out.
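A quick back-of-the-envelope check of those numbers, assuming one megabyte of HTML per scrape as in Eivydas's example:

```python
# 100 requests per second at ~1 MB of scraped HTML per request.
requests_per_second = 100
megabytes_per_request = 1

throughput_mb_per_s = requests_per_second * megabytes_per_request  # 100 MB/s
throughput_mbit_per_s = throughput_mb_per_s * 8                    # 800 Mbit/s

# A 100 Mbit/s residential line carries only 12.5 MB/s, so this load
# would need eight times that capacity.
print(f"{throughput_mb_per_s} MB/s = {throughput_mbit_per_s} Mbit/s")
```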
[0:33:41.9] Eivydas: Exactly, exactly. You are now thinking correctly about your resources and your requirements. You have multiple bottlenecks ever present in your system design. The CPU is usually the application that is very heavily processing something. You have input and output requirements. Your network is going to be the main link that connects your devices and your applications with everything else. Your internal network capacity could be much higher than what you have when connecting to the external internet. So for this example, right now, if you had a hundred-megabit download and upload to the external network, your router internally should still be able to process one gigabit and, in some cases, maybe even ten gigabits. Well, not necessarily, but usually, our routers in this day and age are strong enough for one gigabit.
[0:34:59.5] Augustinas: That's not necessarily true.
[0:35:01.4] Eivydas: That is not necessarily true. But consider your internal network, where you can have multiple machines: if you had machines capable of a hundred megabits each, and an external link of a hundred megabits, you would have a match between your available external resources and what you have available on your machines. So you have a bottleneck in both places, and they are matched. If you wanted to scale either of those, you would have to scale the other part as well.
[0:35:44.5] Augustinas: Okay, so for a hundred requests per second with my hundred-megabit connection, I'm probably gonna need a better one at this point.
[0:35:53.5] Eivydas: Yes, so this is, again, scaling vertically. You have to have enough resources at one of your points for it to handle the load.
[0:36:08.9] Augustinas: How do you horizontally scale a network connection?
[0:36:12.0] Eivydas: You get more lines.
[0:36:13.2] Augustinas: How does that work?
[0:36:16.4] Eivydas: If you're talking residential - that means your house's internet - you can ask the provider to get you multiple lines, or you can ask multiple providers to provide you lines. Then each of those lines would go into its own router, or into a router capable of managing multiple incoming external networks; those are more expensive than getting multiple routers. And those routers then serve your machines. But I can anticipate your next question: how would the clients reach your machines through multiple lines, since each of those lines would have a different IP address?
[0:37:12.4] Augustinas: I am going to guess the answer is somehow related to DNS?
[0:37:16.7] Eivydas: It would be related to DNS and/or load balancing. The internet and all of the services on it are, basically, one very large load-balanced network. Each of the entry points anywhere is likely to have something underneath it, and, in this particular case, we can use something called DNS. I'm sure you know what DNS stands for - domain name system - which resolves your hostname into a single IP address or multiple IP addresses. Using those DNS services, you can treat DNS as a kind of load balancer. DNS load balancing makes sure that when a client connects to a hostname, it gets an IP address to use: either the client chooses which IP address it uses from the multiple available, or the DNS sends it a single IP address out of the multiple that are available. An alternative would be to use a load balancer that is outside of your residential area, outside of your providers, which would then load balance across your lines just like your router does across your applications. Cloudflare, for example, provides such a service.
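For illustration, DNS round-robin in its simplest form is just one hostname resolving to several A records, one per external line; the hostname and addresses below are hypothetical:

```
; Zone file sketch: clients resolving scraper.example.com receive the
; records in rotating order, spreading connections across the lines.
scraper.example.com.  300  IN  A  203.0.113.10
scraper.example.com.  300  IN  A  203.0.113.20
scraper.example.com.  300  IN  A  198.51.100.30
```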
[0:39:03.6] Augustinas: Okay.
[0:39:04.6] Eivydas: So you could have a single hostname, and Cloudflare would know about your separate lines, just like HAProxy knows about your applications, and then the requests would be routed to your applications: through your load balancer in Cloudflare, through those multiple lines, each of them having its own router, which load balances to your single machine, which has another load balancer, which then points to the applications.
[0:39:47.7] Augustinas: Right.
[0:39:48.7] Eivydas: And that approach is applicable all the way to the top. Well, Cloudflare load balancers are usually enough layer stacking for most use cases, because through that cloud provider's load-balancing solutions you can push hundreds of gigabytes per second - I would suppose, given enough money for them. But the solution is always some kind of load balancing.
[0:40:25.6] Augustinas: It's just that while you've been telling me this, you know, the whole approach, I've been thinking that this is a really good experiment for people wanting to get into multi-cloud solutions. If you ever wanted to, you know, make a multi-cloud or learn how multi-cloud solutions are managed, really, maybe you should get yourself a cheap internet connection at home and see.
[0:40:50.0] Eivydas: Yes, that's a good idea. But I would have to backtrack a little bit and put a disclaimer here, because this approach of multiple load-balancing solutions is for us to scale a particular bottleneck, which is network capacity - the ability to route to multiple points underneath the load balancer - and we can use it only because we have a pretty simple application.
[0:41:28.0] Augustinas: Okay.
[0:41:28.7] Eivydas: It only requires an incoming signal. It does whatever it has to do, meaning scrape and parse, and then returns the data through those links back to the client.
[0:41:41.4] Augustinas: Would you say it's time to make our application a little bit more difficult?
[0:41:45.3] Eivydas: It could be, to make your scaling efforts more efficient. In this particular case, as I said, we have two very different processes that have different CPU requirements. Well, in this case, we can add the network requirements as well, because we started talking about that. We have scraping, and we have parsing. Scraping is CPU light and network intensive. On the other hand, parsing is CPU intensive and usually not very network intensive. But as long as those two are right next to each other, we can ignore network requirements for the parser for the time being; it only matters that we have a CPU-heavy task and a CPU-light task. So if we separated those two parts, we could look at requiring only a small number of scrapers and a large number of parsers to satisfy our needs. Because if we were to separate the scraping part into its own application, it could possibly handle many more requests per second than it does when combined with parsing.
[0:43:27.8] Augustinas: So, the point is that parsers are very CPU intensive. One particular instance of that application can only handle ten HTML files, or ten HTML inputs, per second.
[0:43:44.1] Eivydas: Yes.
[0:43:47.0] Augustinas: It makes sense to split them because you can then horizontally scale the parsers, right? And then, you know, one scraper that can handle a hundred requests per second can just split its work across the parsers?
[0:44:03.9] Eivydas: Yes. So by doing this, we observe our bottleneck and then scale the bottleneck to handle more requests per second, and if we are to split those two parts, we have two different bottlenecks. The scraper and parser together have a bottleneck in the parsing part, which is ten requests per second. If we separate them, we have a parser that still does ten requests per second, and we have a scraping part that currently has no such limit of CPU power. And, as we said, handling HTTP requests is CPU light, so we can cram a lot of requests through that one single scraper instance. For example, it would not be an exaggeration, as you said, to handle a hundred requests per second through the scraper. So you would be able to have one scraper and ten instances of parsers doing the same job, just as if you had ten instances of the combined application, but you would not be required to have those additional nine scraper instances.
[0:45:38.6] Augustinas: So this is more about utilizing your resources rather than scaling the application? Or, at the very least, that's what it sounds like to me.
[0:45:48.1] Eivydas: Yes, because once you're getting to a much larger scale - well, at least to the scale where you are starting to get price conscious and your economics start to matter a lot more - you are looking into optimizing your processes, and this optimization, where you scale only the particular bottleneck to match your requirements, is the more efficient approach.
[0:46:17.1] Augustinas: Okay, one question that I've had while you were explaining this concept: how are these now-separated parts going to communicate? You know, are they also going to use just HTTP requests and responses? Because if that was the case, then, I'm thinking, is the communication also going to be a CPU-intensive part? Or could it be?
[0:46:52.7] Eivydas: The communication is not CPU intensive. If we look at load balancing, which only does the communication part, it can handle thousands and tens of thousands of requests per second. So internal communication is not the issue here. It is not going to add enough communication and CPU overhead for us to bother including it in our calculations. What is going to pose a bigger problem is that, at first, we had a single application, and now we have two applications that have to communicate between themselves. This is the difference between a monolith and a multi-service approach, and those two approaches have very different advantages and disadvantages. Splitting your monolith should not be taken lightly, because by introducing a multi-service architecture, yes, of course, you gain the ability to deploy, test, scale and develop your applications and microservices separately from each other, but you have to start thinking about communication between the services and how you are going to manage the state between them. Because, at first, one application can have a singular state in memory, and now we have different states across the microservices. You have to think about deploying them, about versioning, about having multiple services where there was only one. But the advantages are also heavily skewed toward microservices once you reach a particular scale.
[0:49:14.7] Augustinas: When it comes to scalability, we're usually optimizing for resource consumption when we split services into, like, microservices. Right?
[0:49:24.0] Eivydas: Yes. Additionally, with a monolithic approach, it is quite simple for us to talk about only two parts, but with time, it is very likely that your application is going to grow in feature set. So instead of two parts, you're going to have three, five, ten, twenty parts that do different things and have different resource requirements and different scaling requirements.
[0:50:00.8] Augustinas: Is that how you should always look at microservices? Like, basically, the only thing that you should be thinking about is resource consumption - that you're always just scaling or optimizing for bottlenecks?
[0:50:13.2] Eivydas: So, at the very beginning, I said that there are a couple of aspects we have to look at. The first one is your infrastructure, and this resource allocation is the infrastructure part. The other is the software part, and managing software is also a part of the discussion about which is better, microservices or the monolith. In this case, the bigger the monolith, the bigger the coupling of the software parts inside it. It might be very difficult to develop a singular monolithic application once it gets to a certain size, and then you should ask yourself whether the split and the multi-service approach start to make sense for you. Because, well, multiple development teams could handle multiple services, but those multiple teams working on a single monolith could pose some additional challenges as well.
[0:51:44.0] Augustinas: Okay, so I want to backtrack a little bit. My question was: how should these microservices communicate with each other? The way that I was imagining things were going to be is that I'm just going to slap another load balancer between the microservices.
[0:51:50.0] Eivydas: That's a very good approach.
[0:52:03.3] Augustinas: Yeah, what other approaches do you happen to know?
[0:52:05.6] Eivydas: You could also think of asynchronous approaches, like message queues and messaging protocols, where you have some kind of broker that receives the scraped HTML data. From there, the parsers consume the data from the queues and then put the result somewhere where the web server can provide the data to the client.
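A minimal sketch of that broker pattern, using Python's standard-library queues as in-process stand-ins for a real message broker such as RabbitMQ; the page data and parse logic are hypothetical placeholders:

```python
import json
import queue
import threading

raw_html_queue = queue.Queue()   # scraper -> parsers
results_queue = queue.Queue()    # parsers -> whoever serves the client

def scraper() -> None:
    for i in range(5):
        # A real scraper would fetch these pages over HTTP.
        raw_html_queue.put(f"<html>potato listing {i}</html>")
    raw_html_queue.put(None)     # sentinel: no more work

def parser() -> None:
    while True:
        html = raw_html_queue.get()
        if html is None:
            results_queue.put(None)
            break
        # The CPU-intensive extraction would happen here.
        results_queue.put(json.dumps({"size": len(html)}))

threading.Thread(target=scraper).start()
threading.Thread(target=parser).start()

while (result := results_queue.get()) is not None:
    print(result)
```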
[0:52:40.7] Augustinas: What would that look like then? The scraper would put HTML content into the queue as a message. The parser would then take that message, parse the whole thing, and spit out a JSON output. Does that mean that I would need a third microservice to keep the connection alive with the client that requested the website to be scraped in the first place, take the result from the parsed-results queue, and then just return it? That sounds difficult.
[0:53:29.3] Eivydas: Yes, so that is one part of the decision of how you are going to split your application.
[0:53:38.7] Augustinas: No, but at the point where you're already introducing some sort of service like that, you already need some service that would basically coordinate everything, right? Because, you know, scrapers are supposed to be just doing the scraping part of the whole thing, and then parsers are supposed to be parsing the website. At the point where I already have microservices, how do I make sure that my client gets the end result? Where do I put my entry point into the whole application?
[0:54:15.1] Eivydas: You would have to redesign your approach to this initially simple solution. You would, as you say, have to introduce a lot of parts in between, and that decision should not come lightly. Because, as we have been discussing, introducing microservices into your architecture requires you to rethink your approach. But it provides you with additional features: with that asynchronous approach and the additional layers deciding who gets what job when, you have a separation of concerns, and you have different time frames in which the results happen. You can start implementing delayed queues. You can start providing the data after the fact, because you are no longer limited to the time frame of a single synchronous request. Those features come with additional architectural improvements. As long as your application is simple, I would suggest you stick to the monolithic approach. Otherwise, you have to think about microservices.
[0:55:53.4] Augustinas: Eivydai, thank you so much for coming. This is an incredibly broad topic - it sounds to me like we could keep going and talk about this whole thing for hours and hours. That's a lot to unpack, and, you know, just today, I've learned so much. Guys, if you're watching us on YouTube, feel free to leave any comments - I'm sure you have many questions. As always, I would like to thank you for being with us here, and I would like to remind you that you can also find us on Spotify, YouTube, SoundCloud, and Apple Podcasts. I'm pretty sure I'm not forgetting anything this time, and, guys, remember - scrape responsibly and parse safely!