Scraping Trends and Infrastructure Sustainability

Vytautas Kirjazovas

2019-10-09
On the first day of OxyCon, Rimgaudas Mazgelis, Head of the Research Department at Oxylabs, explained how scraping trends vary between verticals and discussed the overall sustainability of datacenter proxies.

There are many reasons for a business to be interested in scraping trends. We selected a few highlights:

  • Market trends are crucial knowledge for any business. For example, traffic grows every year during the holiday period (Thanksgiving, Black Friday, Christmas, and so on). This means that companies have to plan their time and resources accordingly.

  • Businesses need to know the trends of their source websites. Knowing when a data source will have a sale, an update, or a similar event lets them strategically reallocate both infrastructure and human resources (increasing them if necessary).

  • Advertisement businesses have to know trends to do their job effectively. Why would anyone want their ad on an unpopular website? If you have the advantage of knowing trends, you can place your targeted ads on the most relevant sites of the time.

Data research at Oxylabs

It all began 4 years ago, in 2015. The Oxylabs data analyst team started out by looking for basic patterns and studying how websites respond to scraping (i.e., how websites can tell that they are being accessed by bots). It was all done in Microsoft Excel (surprising, right?). The data was then filtered to remove all unnecessary information and shared with clients without looking for any trends or insights.

Later on, the team started analyzing the data with the same Excel and some other simple tools, found some tendencies, adapted formulas, and kept on working.

And here we are, 4 years later, in 2019, with internal dashboards at Oxylabs. There we can get trends calculated with statistical methods in R and Python by entering the client, data source, or date interval.

A couple of months ago, an interview with the Oxylabs Data Research team was published on this blog, so give it a read for more insights into how everything changed.

Rimgaudas Mazgelis, Head of Research Department @Oxylabs

Now that we know why scraping trends matter, let’s get down to some of the trends the Oxylabs Data Research Department noticed:

  1. Summer is a slow season for scraping. Why? There are many articles on this topic on the web, but the main idea is that people slow down with their purchases in the summertime. Interestingly, according to a SumAll report, online shopping numbers drop 30 percent between December and July.

E-commerce, price scraping, market research: all of these and more affect scraping and are affected by it, so it is no big surprise that when one slows down, the others do too.

  2. Cybersecurity was the most stable vertical in Q3. With all that’s happening in the world with leaks and hacks, cybersecurity stayed the most stable business vertical of them all. No significant drops in requests, no major growth either. That said, based on the trends, the Oxylabs Research Department expects growth in this vertical this fall.

  3. September is bringing back the request growth. As mentioned above, the fall season is a favorite for all businesses as the celebrations roll back in: Halloween, Thanksgiving, Black Friday sales, Singles Day (happens in most Asian countries on 11-11), Christmas, and more.
    Q4 is the quarter that requires the most resources from every business, so be prepared.

  4. Average request size has grown immensely. The Oxylabs client circle is ever-growing, and so is the average request size. Businesses are investing more and more time, money, and resources into collecting and analyzing data. Rimgaudas mentioned, both at the conference and in the interview, that he doesn’t see it slowing down in the foreseeable future.

Data sources by country

While some information has to stay secret, R. Mazgelis shared some source websites by country, based on their lucky and unlucky numbers (what can we say, a very creative way of selection):

China

Most popular data sources based on lucky numbers:

8 – ctrip.com

1988 – china-ef.com

Most popular data sources based on unlucky numbers:

4 – yangkeduo.com

14 – baidu.com

And also, because China is always a country of interest, Mr. Mazgelis shared some of the web sources that experienced the most noticeable changes there from Q3:

Travel:

Interest in Cathaypacific grew by 216%

Interest in Shenzhenair dropped by 94%

E-commerce:

Interest in Taobao grew by 1594%

Search engines:

Interest in Baidu grew by 253%

Italy

The lucky Italian number is 3, and its data source is stanleybet.it

South Korea

Lucky and unlucky numbers pretty much match the ones from China, so here we go:

8 – gmarket.co.kr (lucky number)

4 – lottemart.com (unlucky number)

9 – trip.com (unlucky number)

Worldwide numbers

Rimgaudas included some numbers that would resonate with all of the OxyCon attendees. For example, the 666th data source is champssports.com!

And some numbers specifically for our IT people:

64 – amazon.co.uk

128 – google.co.uk

256 – kohls.de

1024 – rentalcars.com

Sustainability of datacenter proxies

To help Oxylabs clients get the best results from the solutions they use, R. Mazgelis’ team of data analysts makes sure that clients are not abusing their datacenter proxies.

Using statistical methods such as ordinary least squares, the analysts can see when a website starts blocking IPs and then advise our clients on how to keep their proxies unblocked.
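As a rough illustration of how an ordinary least squares fit might flag the onset of blocking, the sketch below fits a straight line to a series of hourly block rates and reports whether the slope is clearly positive. The sample data, the `slope_threshold` value, and the function names are assumptions made for this example. They do not describe the method Oxylabs actually uses.

```python
from typing import Sequence, Tuple


def ols_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def blocking_is_rising(block_rates: Sequence[float], slope_threshold: float = 0.01) -> bool:
    """Flag a rising block rate when the fitted slope exceeds the threshold.

    block_rates: share of blocked requests per time step (hypothetical data).
    slope_threshold: assumed cut-off, in block-rate points per time step.
    """
    hours = list(range(len(block_rates)))
    slope, _ = ols_fit(hours, block_rates)
    return slope > slope_threshold


# Hypothetical hourly block rates for one target website.
stable = [0.02, 0.03, 0.02, 0.03, 0.02, 0.03]
rising = [0.02, 0.03, 0.05, 0.08, 0.12, 0.15]

print(blocking_is_rising(stable))
print(blocking_is_rising(rising))
```

A single slope is a blunt signal. In practice one would probably also look at how well the line fits and at how many observations support it before alerting a client.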

Also, using statistical methods (which we won’t get too deep into) and their collectively accumulated know-how, Oxylabs data analysts can see influential tendencies: changes in the number of individual websites scraped, changes in scraping from specific locations, and changes in scraping across different verticals (e-commerce, travel fare aggregation, SEO, and others).
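One simple way to express such tendencies is the percent change between two periods, the same kind of figure quoted for the Chinese data sources earlier. The sketch below is only an illustration: the request counts are made up, and `percent_change` and `changes_by_group` are hypothetical helper names.

```python
from typing import Dict


def percent_change(previous: float, current: float) -> float:
    """Relative change from previous to current, in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


def changes_by_group(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Percent change for every group present in both periods."""
    return {
        name: percent_change(previous[name], current[name])
        for name in previous
        if name in current
    }


# Hypothetical request counts per vertical for two consecutive quarters.
q2 = {"e-commerce": 1000.0, "travel": 800.0, "seo": 500.0}
q3 = {"e-commerce": 1500.0, "travel": 400.0, "seo": 500.0}

for vertical, change in changes_by_group(q2, q3).items():
    print(f"{vertical}: {change:+.1f}%")
```

Read this way, growth of 216% means the later value is roughly 3.16 times the earlier one, and a drop of 94% means only about 6% of the earlier volume remains.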

Based on the collected information, Oxylabs can take measures accordingly. For example, if traffic starts growing in any location, we can acquire more resources there, this way making sure that the infrastructure would keep up with the demand.
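To make the capacity reasoning concrete, here is a minimal sketch of how projected demand per location could be compared with existing capacity. Everything in it is an assumption for illustration: the location codes, the numbers, the `headroom` safety margin, and the `extra_capacity_needed` function are hypothetical and do not represent Oxylabs' planning process.

```python
import math
from typing import Dict


def extra_capacity_needed(
    demand: Dict[str, float],
    growth: Dict[str, float],
    capacity: Dict[str, float],
    headroom: float = 0.25,
) -> Dict[str, int]:
    """Return the additional capacity each location would need, if any.

    demand: current requests per second by location (hypothetical).
    growth: assumed growth per location for the next period, e.g. 0.5 for +50%.
    capacity: requests per second each location can serve today (hypothetical).
    headroom: assumed safety margin kept on top of projected demand.
    """
    shortfall: Dict[str, int] = {}
    for location, current in demand.items():
        projected = current * (1 + growth.get(location, 0.0)) * (1 + headroom)
        gap = projected - capacity.get(location, 0.0)
        if gap > 0:
            # Round up so the shortfall is always fully covered.
            shortfall[location] = math.ceil(gap)
    return shortfall


# Hypothetical figures: traffic in "us" is growing, traffic in "de" is flat.
demand = {"us": 1000.0, "de": 400.0}
growth = {"us": 0.5, "de": 0.0}
capacity = {"us": 1500.0, "de": 600.0}

print(extra_capacity_needed(demand, growth, capacity))
```

Locations that already have enough capacity are left out of the result, so an empty dictionary means no action is needed.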

Conclusion

To sum up Rimgaudas Mazgelis’ presentation: knowing scraping trends is essential for any business, and knowing the capabilities of the solutions you use is crucial for the success of your company’s scraping operations.

The Oxylabs Research Department is always improving and adapting to meet the ever-growing demand from our clients. While internal dashboards with usage statistics are provided to users, our team is always ready to help with custom solutions.

To finish, a fun fact: based on Oxylabs stats and Internet Live Stats, every 14th search query made to bing.com comes from Oxylabs datacenter proxies!

About the author

Vytautas Kirjazovas

Head of PR

Vytautas Kirjazovas is Head of PR at Oxylabs, and he places a strong personal interest in technology due to its magnifying potential to make everyday business processes easier and more efficient. Vytautas is fascinated by new digital tools and approaches, in particular, for web data harvesting purposes, so feel free to drop him a message if you have any questions on this topic. He appreciates a tasty meal, enjoys traveling and writing about himself in the third person.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
