
ML-based Adaptive Parser for Product Page Scraping

Gabija Fatenaite

2020-08-21 · 2 min read

A little while ago, Oxylabs launched Next-Gen Residential Proxies, an AI-based proxy type. It did not take long to premiere its first new feature: we're introducing Adaptive Parser, an ML-based HTML parser.

Currently, it parses product pages from any e-commerce website. As the name implies, the parser is designed to adapt to whatever HTML an e-commerce product page provides.

Let us dive into how this parser works and why it was built.

Parsing product pages: main challenges

Companies that scrape e-commerce sites and their product pages usually encounter these parser-based pain points:

  • Each e-commerce site has a different layout. Companies have to build custom parsers for each separate site to extract pricing and product intelligence.

  • E-commerce sites often change their own layouts. This means that new parsers will need to be built and maintained each time a website decides to renew its design.

  • Due to localization, e-commerce sites might change their layout depending on the country you're scraping from. This once again forces companies to adapt their existing parsers to different localized product page layouts.

Even when two product pages look alike to a visitor, their underlying code can be entirely different.

And if we take not two but ten, or even fifty, different e-commerce page layouts, that is a lot of custom parsers to build, maintain, and adapt whenever these websites make changes.
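To make the maintenance burden concrete, here is a minimal sketch of the conventional approach. The two HTML snippets, the site names, and the rule table are invented for illustration and do not come from any real shop. Both pages show the same product and price, yet each needs its own hand-written extraction rule.

```python
from html.parser import HTMLParser

# Two made-up product pages that would render similarly but are marked up differently.
SHOP_A = '<div class="product"><h1>Blue Mug</h1><span class="price">12.99</span></div>'
SHOP_B = '<section><p id="title">Blue Mug</p><b data-role="cost">12.99</b></section>'


class AttrTextParser(HTMLParser):
    """Collect the text of the first element whose attribute matches a rule."""

    def __init__(self, attr, value):
        super().__init__()
        self.attr, self.value = attr, value
        self._capture = False
        self.result = None

    def handle_starttag(self, tag, attrs):
        # Start capturing when the site-specific attribute/value pair is seen.
        if self.result is None and dict(attrs).get(self.attr) == self.value:
            self._capture = True

    def handle_data(self, data):
        if self._capture:
            self.result = data.strip()
            self._capture = False


# One hand-written rule per site: this is the part that has to be maintained.
PRICE_RULES = {
    "shop_a": ("class", "price"),
    "shop_b": ("data-role", "cost"),
}


def extract_price(site, html):
    """Apply the rule registered for `site`; returns None if the rule does not match."""
    parser = AttrTextParser(*PRICE_RULES[site])
    parser.feed(html)
    return parser.result
```

Applying the rule for one shop to the other shop's markup finds nothing, which is what happens in practice when a site is redesigned or serves a localized layout: the rule silently stops matching until someone rewrites it.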

How Adaptive Parser works

Adaptive Parser is built to parse any e-commerce product page, no matter what language the page is in, and to keep working when a website changes its layout.

When using Next-Gen Residential Proxies for scraping product pages, the Adaptive Parser feature will let you return these values from the page:

  • Price

  • Old-price

  • Currency

  • Title

  • Product description

  • Product ID

  • Image URLs

  • Product IDs from URLs

  • URLs from the page
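The article does not document the format in which these values are returned, so the following is only a sketch of how the listed fields could be held in client code. The class name `ParsedProduct`, every key name, and the `discount` helper are hypothetical; check the actual response before relying on any of them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedProduct:
    """Container mirroring the values listed above (key names are assumed)."""

    price: Optional[float] = None
    old_price: Optional[float] = None
    currency: Optional[str] = None
    title: Optional[str] = None
    description: Optional[str] = None
    product_id: Optional[str] = None
    image_urls: list = field(default_factory=list)
    product_ids_from_urls: list = field(default_factory=list)
    page_urls: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, data):
        # Keep only keys that match a declared field; ignore anything else.
        known = {name: data[name] for name in cls.__dataclass_fields__ if name in data}
        return cls(**known)

    def discount(self):
        """Difference between old and current price, or None if either is missing."""
        if self.price is None or not self.old_price:
            return None
        return round(self.old_price - self.price, 2)
```

Giving every field a default means a page without, say, an old price still produces a usable object instead of raising an error.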

If you’re more of a visual person, check out our video with Nedas Visniauskas, our Lead of Commercial Product Owners, in which he shows how Adaptive Parser works in practice.

How to use Adaptive Parser

Adaptive Parser is a feature of Next-Gen Residential Proxies, so to use it you first need access to Next-Gen Residential Proxies. Once you have that, enabling the parser takes only a few lines of code.

Basic cURL command:

curl -k -v -x ngrp.oxylabs.io:60000 -U user:pass1 "https://example.com"

Additional headers to enable parsing:

-H "X-Oxylabs-parse: 1"

-H "X-Oxylabs-parser-type: ecommerce_product"

Full cURL command:

curl -k -v -x ngrp.oxylabs.io:60000 -U user:pass1 "https://example.com" -H "X-Oxylabs-parse: 1" -H "X-Oxylabs-parser-type: ecommerce_product"
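For readers who would rather call the proxy from a script, the same request can be sketched in Python using only the standard library. The proxy address and the two header names and values are taken from the cURL command above. The function names, the timeout, and the assumption that the proxy treats this request the same way as the cURL one are ours. The credentials are placeholders.

```python
import ssl
import urllib.request
from urllib.parse import quote

PROXY_HOST = "ngrp.oxylabs.io:60000"  # entry point from the cURL command


def parser_headers(parser_type="ecommerce_product"):
    """Headers that switch Adaptive Parser on, as in the cURL example."""
    return {
        "X-Oxylabs-parse": "1",
        "X-Oxylabs-parser-type": parser_type,
    }


def build_opener(user, password):
    """Route traffic through the proxy; mirrors curl's -x, -U and -k flags."""
    # Percent-encode credentials so special characters survive URL parsing.
    proxy_url = f"http://{quote(user, safe='')}:{quote(password, safe='')}@{PROXY_HOST}"
    proxies = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # Equivalent of curl -k: skip certificate verification for this opener only.
    insecure = ssl.create_default_context()
    insecure.check_hostname = False
    insecure.verify_mode = ssl.CERT_NONE
    https = urllib.request.HTTPSHandler(context=insecure)
    return urllib.request.build_opener(proxies, https)


def fetch_parsed(url, user, password, timeout=30):
    """Send one request with the parsing headers and return the body as text."""
    request = urllib.request.Request(url, headers=parser_headers())
    with build_opener(user, password).open(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

A call would look like `fetch_parsed("https://example.com", "user", "pass1")`. Certificate verification is disabled only because the cURL command uses `-k`; keep that setting confined to this opener rather than turning it off globally.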

Wrapping up

Adaptive Parser is currently in beta, so the immediate aim is to keep developing it to be as adaptive and powerful as possible. The next big step is to make it parse not only e-commerce websites but other websites as well.

If you want to learn more about Next-Gen Residential Proxies, their features, or how to implement Adaptive Parser – contact our sales team or email us at hello@oxylabs.io. 

About the author

Gabija Fatenaite

Lead Product Marketing Manager

Gabija Fatenaite is a Lead Product Marketing Manager at Oxylabs. Having grown up on video games and the internet, she grew to find the tech side of things more and more interesting over the years. So if you ever find yourself wanting to learn more about proxies (or video games), feel free to contact her - she’ll be more than happy to answer you.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
