On the first day of OxyCon, Rimgaudas Mazgelis, head of the Research Department at Oxylabs, presented how scraping trends vary between verticals and discussed the overall sustainability of data center proxies.
Why are Trends Important for Businesses?
There are many reasons for a business to be interested in scraping trends. We selected a couple of highlights:
- Market trends are crucial knowledge for any business. For example, traffic grows every year during the holiday period (Thanksgiving, Black Friday, Christmas, and so on). This means that companies have to plan their time and resources accordingly.
- Businesses need to know the trends of their source websites. Knowing when a data source will be having a sale, an update, or anything similar lets a business strategically reallocate both infrastructure and human resources (increasing them if necessary).
- Advertisement businesses have to know trends to do their job effectively. Why would anyone want their ad on an unpopular website? If you have the advantage of knowing trends, you can place your targeted ads on the most relevant sites of the time.
Data Research at Oxylabs
It all began 4 years ago, in 2015. The Oxylabs data analyst team started out by looking for basic patterns and at how websites respond to scraping (i.e., how websites can tell that they are being accessed by bots). It was all done using Microsoft Excel – surprising, right? The data was then filtered to remove all unnecessary information and shared with clients without looking for any trends or insights.
Later on, the team started analyzing the data with Excel and some other simple tools, found some tendencies, adapted formulas, and kept on working.
And here we are, 4 years later, in 2019 – we now have internal dashboards at Oxylabs. There we can get trends calculated with statistical methods in R and Python by entering the client, data source, or date interval.
A couple of months ago, an interview with the Oxylabs Data Research team was published on this blog, so give it a read for more insights into how everything changed.
Oxylabs Insights on Scraping Trends
Now that we know why scraping trends matter, let's get down to some of the trends the Oxylabs Data Research Department noticed:
- Summer is a slow season for scraping. Why? There are many articles on this topic on the web, but the main idea is that people slow down with their purchases in the summertime. What's interesting is that, according to a report by SumAll, online shopping numbers drop 30 percent between December and July.
- Cybersecurity was the most stable vertical in Q3. With all the leaks and hacks happening in the world, cybersecurity stayed the most stable business vertical of them all: no significant drops in requests, no major growth either. That said, based on the trends, the Oxylabs Research Department expects growth in this vertical this fall.
- September is bringing back the request growth. As mentioned above, the fall season is a favorite for all businesses as all the celebrations roll back in: Halloween, Thanksgiving, Black Friday sales, Singles' Day (celebrated in most Asian countries on 11-11), Christmas, and more.
Q4 is the quarter which requires the most resources from every business, so be prepared.
- Average request size has grown immensely. The Oxylabs client base keeps growing, and so does the average request size. Businesses are investing more and more time, money, and resources into collecting and analyzing data. Rimgaudas mentioned, both at the conference and in the interview, that he doesn't see it slowing down in the foreseeable future.
Data Sources by Country
While some information has to stay secret, R. Mazgelis shared some source websites by country, based on their lucky and unlucky numbers (what can we say, a very creative way of selection):
Most popular data sources based on lucky numbers:
8 – ctrip.com
1988 – china-ef.com
Most popular data sources based on unlucky numbers:
4 – yangkeduo.com
14 – baidu.com
And also, because China is always a country of interest, Mr. Mazgelis shared some of the web sources that experienced the most noticeable changes there from Q3:
Interest in Cathaypacific grew by 216%
Interest in Shenzhenair dropped by 94%
Interest in Taobao grew by 1594%
Interest in Baidu grew by 253%
Number 4 is a lucky number in Germany, and the data source attached to it is Instagram.com
The lucky Italian number is 3, and its data source is stanleybet.it
Lucky and unlucky numbers pretty much match the ones from China, so here we go:
8 – gmarket.co.kr (lucky number)
4 – lottemart.com (unlucky number)
9 – trip.com (unlucky number)
Rimgaudas included some numbers that would resonate with all of the OxyCon attendees. For example, the 666th data source is champssports.com!
And some numbers specifically for our IT people:
64 – amazon.co.uk
128 – google.co.uk
256 – kohls.de
1024 – rentalcars.com
Sustainability of Data Center Proxies
To help Oxylabs clients get the best results from the solutions they are using, R. Mazgelis' team of data analysts makes sure that clients are not overusing their data center proxies.
Using statistical methods such as ordinary least squares, the team can see when a website starts blocking IPs and then advise clients on how to keep their proxies unblocked.
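As a rough illustration of the idea, ordinary least squares can fit a straight line to a proxy's success rate over time; a clearly negative slope suggests the target website has started blocking. The sketch below is not Oxylabs' actual method. The function names, the hourly buckets, the sample numbers, and the `slope_threshold` value are all assumptions made up for this example.

```python
def ols_slope_intercept(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def blocking_suspected(success_rates, slope_threshold=-0.02):
    """Return (flag, slope) for success rates in equally spaced time buckets.

    The threshold is a hypothetical value: the flag is raised when the
    fitted success rate falls by more than 2 percentage points per bucket.
    """
    buckets = list(range(len(success_rates)))
    slope, _ = ols_slope_intercept(buckets, success_rates)
    return slope < slope_threshold, slope


# Hypothetical hourly success rates for one proxy pool on one website.
stable = [0.98, 0.97, 0.98, 0.99, 0.97, 0.98]
declining = [0.98, 0.95, 0.90, 0.82, 0.75, 0.66]

print(blocking_suspected(stable))
print(blocking_suspected(declining))
```

In practice the threshold would have to be tuned per website, and a real analysis would likely also account for noise and sample size before raising an alert.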
Also, using statistical methods (which we won't get too deep into) and their collectively accumulated know-how, Oxylabs data analysts can spot influential tendencies: changes in the number of individual websites scraped, changes in scraping for a specific location, and changes in scraping across different verticals (e-commerce, travel fare aggregation, SEO, and others).
Based on the collected information, Oxylabs can take measures accordingly. For example, if traffic starts growing in any location, we can acquire more resources there to make sure that the infrastructure keeps up with the demand.
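The location example can be sketched as a simple period-over-period comparison. This is only an illustration under stated assumptions: the function name, the country codes, the request counts, and the 25% `growth_threshold` are all hypothetical and do not come from the presentation.

```python
def locations_needing_capacity(previous, current, growth_threshold=0.25):
    """Return {location: relative growth} for locations growing past the threshold.

    `previous` and `current` map a location to its request count in two
    consecutive periods. Locations without a positive baseline are skipped,
    because relative growth is undefined for them.
    """
    flagged = {}
    for location, now in current.items():
        before = previous.get(location, 0)
        if before <= 0:
            continue  # no baseline to compare against
        growth = (now - before) / before
        if growth > growth_threshold:
            flagged[location] = growth
    return flagged


# Hypothetical request counts per location for two consecutive weeks.
last_week = {"DE": 1000, "US": 4000, "JP": 500}
this_week = {"DE": 1500, "US": 4100, "JP": 800, "BR": 200}

print(locations_needing_capacity(last_week, this_week))
```

A new location with no baseline (here "BR") would need a separate rule, such as an absolute request count threshold.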
To sum up the presentation of Rimgaudas Mazgelis: knowing the scraping trends is essential for any business, and knowing the capabilities of the solutions you are using is crucial for the success of your company's scraping operations.
The Oxylabs Research Department is always improving and adapting to meet the ever-growing demand from our clients. While internal dashboards with usage statistics are provided to users, our team is always ready to help with custom solutions.
To end this – a fun fact. Based on Oxylabs stats and Internet Live Stats, every 14th search query made to bing.com comes from Oxylabs data center proxies!