Recently, Oxylabs launched an innovative AI-based proxy type, Next-Gen Residential Proxies, and it did not take long for its first new feature to premiere: Adaptive Parser, an ML-based HTML parser that is currently in beta.
For now, it parses product pages from any e-commerce website. As the name implies, the parser is designed to adapt to whatever HTML an e-commerce product page uses.
Let us dive into how this parser works and why it was built.
Parsing product pages: main challenges
Companies that scrape e-commerce sites and their product pages usually run into these parsing pain points:
- Each e-commerce site has a different layout, so companies have to build a custom parser for every site to extract pricing and product intelligence.
- E-commerce sites change their layouts often, so parsers have to be rebuilt or updated every time a site redesigns its pages.
- Because of localization, an e-commerce site may serve a different layout depending on the country you're scraping from, so existing parsers must again be adapted to each localized product page layout.
Even though product pages from different sites can look alike visually, their underlying HTML is often entirely different.
Scale that from two sites to ten, or fifty, different e-commerce page layouts, and that is a lot of custom parsers to build, maintain, and update whenever those websites change.
How Adaptive Parser works
Being ML-based, Adaptive Parser is designed to parse any e-commerce product page, whatever language the page is in, and to keep working when a website changes its layout.
When you scrape product pages through Next-Gen Residential Proxies, Adaptive Parser can return the following values from each page:
- Product description
- Product ID
- Image URLs
- Product IDs from URLs
- URLs from the page
How to use Adaptive Parser
Adaptive Parser is a feature of Next-Gen Residential Proxies, so you first need access to Next-Gen Residential Proxies. Once you have it, enabling the parser takes only a couple of extra request headers.
Basic cURL command:
curl -k -v -x ngrp.oxylabs.io:60000 -U user:pass1 "https://example.com"
Additional headers to enable parsing:
-H "X-Oxylabs-parse: 1"
-H "X-Oxylabs-parser-type: ecommerce_product"
Full cURL command:
curl -k -v -x ngrp.oxylabs.io:60000 -U user:pass1 "https://example.com" \
  -H "X-Oxylabs-parse: 1" \
  -H "X-Oxylabs-parser-type: ecommerce_product"
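If you send these requests often, it can help to keep the proxy address and parsing headers in one place. The Bash function below is a minimal sketch that wraps the full cURL command from this article. The function name `ngrp_fetch` and the `NGRP_USER`, `NGRP_PASS`, and `DRY_RUN` variables are hypothetical names chosen for this example, not part of any Oxylabs tooling. The function passes the same proxy, credentials, and headers as the original command (minus `-v`), keeps `-k` (which disables TLS certificate verification) as the original does, and with `DRY_RUN=1` prints the arguments instead of sending a request.

```shell
#!/usr/bin/env bash
# Minimal sketch: wrap the Adaptive Parser cURL request in a reusable function.
# ngrp_fetch, NGRP_USER, NGRP_PASS and DRY_RUN are hypothetical names for this
# example; the proxy address and headers come from the article's cURL command.

NGRP_PROXY="ngrp.oxylabs.io:60000"

ngrp_fetch() {
  # The product page URL to fetch and parse is required.
  local url="${1:?usage: ngrp_fetch URL}"

  # Same flags as the full cURL command, minus -v (verbose output).
  # -k disables TLS certificate verification, as in the original command.
  local args=(
    -k
    -x "$NGRP_PROXY"
    -U "${NGRP_USER:?NGRP_USER is not set}:${NGRP_PASS:?NGRP_PASS is not set}"
    -H "X-Oxylabs-parse: 1"
    -H "X-Oxylabs-parser-type: ecommerce_product"
    "$url"
  )

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Print the command, one argument per line, instead of sending it.
    printf '%s\n' curl "${args[@]}"
  else
    curl "${args[@]}"
  fi
}
```

For example, after sourcing the script, `NGRP_USER=user NGRP_PASS=pass1 ngrp_fetch "https://example.com"` sends the same request as the full cURL command above, with `user` and `pass1` standing in for your own credentials.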
Adaptive Parser is currently in beta, and development continues to make it as adaptive and powerful as possible. The next big step is to extend it beyond e-commerce to other types of websites.