A little while ago, Oxylabs launched an innovative AI-based proxy type dubbed Next-Gen Residential Proxies. It did not take long to premiere its first new feature as well! We’re introducing an ML-based HTML parser: Adaptive Parser.
At the moment, it can parse product pages from any e-commerce website. As the name implies, the parser will be able to adapt to any type of HTML code provided from an e-commerce product page.
Let us dive into how this parser works and why it was built.
Companies that scrape e-commerce sites and their product pages usually encounter these parser-based pain points:
Each e-commerce site has a different layout. Companies have to build custom parsers for each separate site to extract pricing and product intelligence.
E-commerce sites often change their own layouts. This means that new parsers will need to be built and maintained each time a website decides to renew its design.
Due to localization, e-commerce sites might change their layout depending on the country you’re scraping from. This once again forces companies to adapt their already built parsers to different localized product page layouts.
Even when two product pages look alike to a visitor, in code they can be entirely different.
And if we take not two, but ten or maybe, say, fifty different e-commerce page layouts, that is a lot of custom parsers to build, maintain, and adapt whenever these websites make changes.
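To make the pain points above concrete, here is a minimal Python sketch of what hand-written, per-site parsing tends to look like. Everything in it is made up for illustration: the two HTML snippets, the shop names, and the `SITE_RULES` table are assumptions and do not come from any real website or from Oxylabs.

```python
from html.parser import HTMLParser

# Two hypothetical product pages that show the same price with different markup.
SHOP_A_HTML = '<div class="product"><span class="price">19.99</span></div>'
SHOP_B_HTML = '<section id="item"><p data-role="cost">19.99</p></section>'


class TextByAttribute(HTMLParser):
    """Collects the text of the first element whose attribute equals a value."""

    def __init__(self, attr, value):
        super().__init__()
        self.attr, self.value = attr, value
        self.capturing = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        # Start capturing once the first matching element is opened.
        if self.text is None and dict(attrs).get(self.attr) == self.value:
            self.capturing = True

    def handle_data(self, data):
        # Store the first text chunk that follows the matching start tag.
        if self.capturing:
            self.text = data.strip()
            self.capturing = False


def extract(html, attr, value):
    parser = TextByAttribute(attr, value)
    parser.feed(html)
    return parser.text


# Each site needs its own hand-written rule (hypothetical names).
SITE_RULES = {
    "shop_a": ("class", "price"),
    "shop_b": ("data-role", "cost"),
}

price_a = extract(SHOP_A_HTML, *SITE_RULES["shop_a"])
price_b = extract(SHOP_B_HTML, *SITE_RULES["shop_b"])
# Applying shop A's rule to shop B's page finds nothing.
mismatch = extract(SHOP_B_HTML, *SITE_RULES["shop_a"])
```

Every new site, redesign, or localized layout would mean another entry in a table like `SITE_RULES`, and a stale entry fails silently, as the `mismatch` case shows.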
Adaptive Parser will be able to parse any product page, no matter what language the webpage is in, and will be able to adapt to any website changes.
When using Next-Gen Residential Proxies for scraping product pages, the Adaptive Parser feature will let you return these values from the page:
Product IDs from URLs
URLs from the page
If you’re more of a visual person, check out our video with Nedas Visniauskas, our Lead of Commercial Product Owners. He shows how Adaptive Parser works in practice:
Adaptive Parser is a feature of Next-Gen Residential Proxies, so to use it you must first get Next-Gen Residential Proxies. Once you have them, you can enable the parser with a few lines of code.
Basic cURL command:
curl -k -v -x ngrp.oxylabs.io:60000 -U user:pass1 "https://example.com"
Additional headers to enable parsing:
-H "X-Oxylabs-parse: 1" -H "X-Oxylabs-parser-type: ecommerce_product"
Full cURL command:
curl -k -v -x ngrp.oxylabs.io:60000 -U user:pass1 "https://example.com" -H "X-Oxylabs-parse: 1" -H "X-Oxylabs-parser-type: ecommerce_product"
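If you would rather call the proxy from code, the same request could be sketched in Python roughly as follows. This is only an illustration that mirrors the cURL flags above using the standard library: the function names are hypothetical, the credentials are the placeholders from the cURL command, and the format of the parsed response is not described in this post, so the sketch simply returns the response body as text.

```python
import ssl
import urllib.request
from urllib.parse import quote

# Proxy endpoint and headers taken from the cURL command above.
PROXY_ENDPOINT = "ngrp.oxylabs.io:60000"
PARSE_HEADERS = {
    "X-Oxylabs-parse": "1",
    "X-Oxylabs-parser-type": "ecommerce_product",
}


def build_request(url):
    """Attach the two parsing headers to the request (like cURL's -H)."""
    return urllib.request.Request(url, headers=PARSE_HEADERS)


def make_opener(user, password):
    """Build an opener that routes traffic through the proxy (like -x and -U)."""
    # Percent-encode credentials so special characters survive in the URL.
    proxy_url = f"http://{quote(user, safe='')}:{quote(password, safe='')}@{PROXY_ENDPOINT}"
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    # Equivalent of cURL's -k: skip TLS certificate verification.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    https_handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(proxy_handler, https_handler)


def fetch_parsed(url, user, password, timeout=60):
    """Send the request through the proxy and return the body as text."""
    opener = make_opener(user, password)
    with opener.open(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_parsed("https://example.com", "user", "pass1")` would then send the same request as the full cURL command, assuming valid credentials are substituted for the placeholders.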
Adaptive Parser is currently in beta, so the current aim is to keep developing it to be as adaptive and powerful as possible. The next big step is to make it parse not only e-commerce websites but other websites as well.
If you want to learn more about Next-Gen Residential Proxies, their features, or how to implement Adaptive Parser – contact our sales team or email us at email@example.com.
About the author
Lead Product Marketing Manager
Gabija Fatenaite is a Lead Product Marketing Manager at Oxylabs. Having grown up on video games and the internet, she grew to find the tech side of things more and more interesting over the years. So if you ever find yourself wanting to learn more about proxies (or video games), feel free to contact her - she’ll be more than happy to answer you.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.