
Gabija Fatenaite

Aug 21, 2020 3 min read

Recently, Oxylabs launched an innovative AI-based proxy type called Next-Gen Residential Proxies. It did not take long for its first new feature to premiere as well. Currently in beta, Adaptive Parser is an ML-based HTML parser.

Currently, it can parse product pages from any e-commerce website. As the name implies, the parser will be able to adapt to any type of HTML code provided from an e-commerce product page.

Let us dive into how this parser works and why it was built.

Parsing product pages: main challenges 

Companies that scrape e-commerce sites and their product pages usually encounter these parser-based pain points:

  • Each e-commerce site has a different layout. Companies have to build custom parsers for each separate site to extract pricing and product intelligence.
  • E-commerce sites often change their own layouts. This means that new parsers will need to be built and maintained each time a website decides to renew its design.
  • Due to localization, e-commerce sites might change their layout depending on the country you are scraping from. This once again forces companies to adapt their existing parsers to different localized product page layouts.

For example, even though two product pages can look alike visually, their code is entirely different:

[Image: code example no. 1. Source: https://www.ebooks.com/en-lt/book/210034739/python-automation-cookbook/jaime-buelta/]
[Image: code example no. 2. Source: https://www.kobo.com/lt/en/ebook/python-web-scraping-projects]

And if we take not two but ten or, say, fifty different e-commerce page layouts, that is a lot of custom parsers to build, maintain, and adapt whenever these websites make changes.
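To make the maintenance burden concrete, here is a minimal Python sketch of two site-specific parsers. The two HTML fragments, their class and attribute names, and the helper names are all made up for illustration; they are not the actual markup of the pages shown above.

```python
from html.parser import HTMLParser

# Two made-up product-page fragments (NOT real markup from any shop).
# Both show the same product, but the markup differs completely.
SHOP_A_HTML = (
    '<div class="product"><h1 class="title">Python Cookbook</h1>'
    '<span class="price">EUR 25.99</span></div>'
)
SHOP_B_HTML = (
    '<section id="item"><h2 data-role="name">Python Cookbook</h2>'
    '<p data-role="cost">25,99 €</p></section>'
)


class AttrTextExtractor(HTMLParser):
    """Collect the text of the first element whose attribute `attr` equals `value`."""

    def __init__(self, attr, value):
        super().__init__()
        self.attr = attr
        self.value = value
        self._capturing = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        if self.text is None and dict(attrs).get(self.attr) == self.value:
            self._capturing = True

    def handle_data(self, data):
        if self._capturing:
            self.text = data.strip()
            self._capturing = False


def extract(html, attr, value):
    parser = AttrTextExtractor(attr, value)
    parser.feed(html)
    return parser.text


def parse_shop_a(html):
    # Hypothetical parser hard-coded to shop A's class names.
    return {"title": extract(html, "class", "title"),
            "price": extract(html, "class", "price")}


def parse_shop_b(html):
    # Hypothetical parser hard-coded to shop B's data-role attributes.
    return {"title": extract(html, "data-role", "name"),
            "price": extract(html, "data-role", "cost")}
```

Each parser only works on the layout it was written for: calling `parse_shop_a(SHOP_B_HTML)` finds nothing and returns `None` for both fields, which is what happens in practice when a site changes its design or serves a localized layout.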

How Adaptive Parser works

Adaptive Parser will be able to parse any e-commerce product page, no matter what language the webpage is in, and will be able to adapt when a website changes its layout.

When using Next-Gen Residential Proxies for scraping product pages, the Adaptive Parser feature will let you return these values from the page:

  • Price
  • Old-price
  • Currency
  • Title
  • Product description
  • Product ID
  • Image URLs
  • Product IDs from URLs
  • URLs from the page
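If you plan to store these values in your own code, one option is a small container like the Python sketch below. The response schema is not described in this post, so the class name and every field name here are hypothetical and only mirror the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedProduct:
    """Hypothetical container; field names are illustrative, not real response keys."""

    price: Optional[str] = None
    old_price: Optional[str] = None
    currency: Optional[str] = None
    title: Optional[str] = None
    description: Optional[str] = None
    product_id: Optional[str] = None
    image_urls: List[str] = field(default_factory=list)
    product_ids_from_urls: List[str] = field(default_factory=list)
    page_urls: List[str] = field(default_factory=list)
```

Every field is optional because a given product page may not contain all of the values, for example an old price.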

How to use Adaptive Parser

Adaptive Parser is a feature of Next-Gen Residential Proxies. To use it, you must first get Next-Gen Residential Proxies. Once you have them, you can enable the parser with a few lines of code.

Basic cURL command:

curl -k -v -x ngrp.oxylabs.io:60000 -U user:pass1 "https://example.com"

Additional headers to enable parsing:

-H "X-Oxylabs-parse: 1"
-H "X-Oxylabs-parser-type: ecommerce_product"

Full cURL command:

curl -k -v -x ngrp.oxylabs.io:60000 -U user:pass1 "https://example.com" -H "X-Oxylabs-parse: 1" -H "X-Oxylabs-parser-type: ecommerce_product"
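If you would rather send the same request from a script, the cURL command above can be approximated in Python with only the standard library. This is a rough sketch, not an official client: the function names are our own, the proxy is assumed to be addressed over plain HTTP (cURL's default for `-x` without a scheme), and the response is returned as raw text because its format is not described here.

```python
import ssl
import urllib.request

# Proxy endpoint and header values are copied from the cURL command above.
PROXY_ENDPOINT = "ngrp.oxylabs.io:60000"
PARSE_HEADERS = {
    "X-Oxylabs-parse": "1",
    "X-Oxylabs-parser-type": "ecommerce_product",
}


def build_opener(username, password):
    """Return an opener that routes traffic through the proxy (curl -x and -U)."""
    # Assumption: the proxy itself is reached over plain HTTP.
    proxy_url = f"http://{username}:{password}@{PROXY_ENDPOINT}"
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )

    # Mirror curl's -k flag: skip TLS certificate verification.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    https_handler = urllib.request.HTTPSHandler(context=context)

    return urllib.request.build_opener(proxy_handler, https_handler)


def build_request(url):
    """Attach the two parsing headers to a GET request (curl -H)."""
    return urllib.request.Request(url, headers=PARSE_HEADERS)


def fetch_parsed(url, username, password, timeout=60):
    """Send the request and return the raw response body as text."""
    opener = build_opener(username, password)
    with opener.open(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With real credentials in place of the placeholders, `print(fetch_parsed("https://example.com", "user", "pass1"))` would send the request and print whatever the proxy returns.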

Wrapping up

Adaptive Parser is currently in beta, so the current aim is to keep developing it to be as adaptive and powerful as possible. The next big step is to make it parse not only e-commerce websites but other websites as well.

If you want to learn more about Next-Gen Residential Proxies, their features, or how to implement Adaptive Parser – contact our sales team or email us at [email protected] 


About Gabija Fatenaite

Gabija Fatenaite is a Senior Content Manager at Oxylabs. Having grown up on video games and the internet, she grew to find the tech side of things more and more interesting over the years. So if you ever find yourself wanting to learn more about proxies (or video games), feel free to contact her - she’ll be more than happy to answer you.


All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.