
E-Commerce Scraper API Quick Start Guide

Iveta Vistorskyte

2023-04-06 • 7 min read

Oxylabs’ E-Commerce Scraper API is a public data scraper API designed to collect real-time localized data and search information from most e-commerce websites at scale. This data gathering tool serves as a trustworthy solution for gathering public information from even the most complex e-commerce websites. E-Commerce Scraper API is a perfect fit for business use cases such as price monitoring, product catalog mapping, and competitor analysis.

Try E-Commerce Scraper API right away with our 1-week free trial and start scraping today. Simply go to the E-Commerce Scraper API page, register, and get 5,000 free results.

This quick start guide explains how the E-Commerce Scraper API works. We’ll also go through the process of getting started using this data gathering tool hassle-free.

What you get with E-Commerce Scraper API

  • High success rate – get your scraping results efficiently. Our ML-based patented Proxy Rotator, AI-powered fingerprinting, and auto-retry system help you achieve a high success rate. Your web scraping operations will run with almost zero IP blocks.

  • Proxy pool management – leave proxy management to us and focus on collecting and analyzing the required public data. E-Commerce Scraper API is powered by one of the largest proxy pools on the market, with 102M+ IPs worldwide.

  • JavaScript rendering – gather e-commerce public data even from the most complex websites. Our professional team automatically runs headless browsers for you to get public data from the most advanced e-commerce targets. 

  • Structured e-commerce data – don’t worry about e-commerce websites’ constantly changing layouts. The ML-based adaptive parsing feature adjusts to the changes, automatically detects product attributes on any e-commerce target, and provides parsed data in JSON.

  • Various integration options – choose from asynchronous (push-pull), synchronous (realtime), or proxy-like (proxy endpoint) integration options. Get your public data delivered via REST API or choose delivery to the cloud (Amazon S3 or Google Cloud Storage). Oxylabs’ professional team ensures 99.9% uptime for consistent data streams 24/7.

  • 24/7 support – get all your questions answered at any time. Our support team or your Dedicated Account Manager will ensure that your web scraping process won’t be interrupted with unexpected issues or errors.

Data sources*

With E-Commerce Scraper API, you can get parsed data in JSON from various sources. Get the required e-commerce data efficiently, and be sure that you have everything needed for convenient analysis.

World’s leading e-commerce marketplaces

Public data sources from search pages: 

  • Product title

  • Price

  • Position

  • URL

  • Sponsored products

  • Pagination

Public data sources from product pages: 

  • Title & description

  • Price

  • Category

  • Discount & coupons

  • Images & availability

  • Seller information 

Additional 1000+ e-commerce websites

Public data from product pages: 

  • Title & description

  • Discounted price

  • Regular price

  • Currency

  • Availability

  • Image URL

  • Product ID

*All data sources will be provided after purchasing the product.

Free trial & purchase information

We provide two plans – Regular and Enterprise – each with four subscription options based on the number of results you wish to gather:

Regular:

  1. 1-week Free trial (5,000)

  2. Micro (17,500)

  3. Starter (38,077)

  4. Advanced (103,750)

Enterprise:

  1. Venture (226,818)

  2. Business (525,789)

  3. Corporate (1,250,000)

  4. Custom+ (10M+)

All plans, except for Corporate and Custom+, can be purchased through our self-service dashboard in just a few clicks. To purchase a Corporate or Custom+ plan, please contact our sales team. 

You will also get a Dedicated Account Manager for support when you choose the Business plan or above. Visit the E-Commerce Scraper API pricing page for more detailed information about each plan.

E-Commerce Scraper API – how does it work?

After purchasing your desired plan, you can start using E-Commerce Scraper API right away. The setup consists of just a few simple steps:

  1. Log in to the dashboard.

  2. Create an API user.

  3. Run a test query and continue setup.

E-Commerce Scraper API is an easy-to-use tool that doesn’t require any particular infrastructure or resources on your side. A typical scraping workflow consists of three steps:

  1. Select product IDs, URLs, or search phrases.

  2. Submit a GET or POST request.

  3. Receive the required public data via REST API directly or uploaded to the cloud.
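As a rough sketch of the steps above, the request body for a single product URL can be assembled like this in Python (the `build_job_payload` helper is our own illustration, not part of the API):

```python
import json

def build_job_payload(url: str, geo_location: str = "United States") -> dict:
    """Assemble the JSON body for one scraping job (steps 1 and 2 above)."""
    return {
        "source": "universal_ecommerce",     # generic e-commerce source
        "url": url,                          # fully formed product or search URL
        "geo_location": geo_location,        # localize the result
        "parse": True,                       # return structured JSON
        "parser_type": "ecommerce_product",  # adaptive product parser
    }

payload = build_job_payload(
    "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
)
print(json.dumps(payload, indent=2))

# Step 3: POST this body to https://data.oxylabs.io/v1/queries with your
# API user credentials, then fetch the results once the job is done.
```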

For a visual example of how to use E-Commerce Scraper API for public web data scraping, check out the step-by-step tutorial below.

What will you find on the dashboard

If you choose to work with Oxylabs’ E-Commerce Scraper API, you’ll gain access to a convenient dashboard where you can keep an eye on your data usage statistics and track your subscription details. You can also contact Oxylabs’ customer service team and get assistance at any time of day.

Authentication

E-Commerce Scraper API employs basic HTTP authentication, which requires a username and password – the easiest way to get started with the tool. The code example below shows how you can send a request to books.toscrape.com using the Realtime delivery method we’ll discuss later in this guide.

If you observe low success rates or retrieve empty content, try adding the "render": "html" parameter to your request. More information about the render parameter can be found here.

curl --user "USERNAME:PASSWORD" 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "universal_ecommerce", "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html", "geo_location": "United States", "parse": true, "parser_type": "ecommerce_product"}'
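The same request can be sent from Python using only the standard library. This is a minimal sketch – USERNAME and PASSWORD are placeholders for your API user credentials, and the actual network call is commented out:

```python
import base64
import json
import urllib.request

USERNAME, PASSWORD = "USERNAME", "PASSWORD"  # your API user credentials

payload = {
    "source": "universal_ecommerce",
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    "geo_location": "United States",
    "parse": True,
    "parser_type": "ecommerce_product",
}

# Basic HTTP authentication: "username:password", Base64-encoded, placed in
# the Authorization header -- the same thing curl's --user flag does.
token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
request = urllib.request.Request(
    "https://realtime.oxylabs.io/v1/queries",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    },
)
# with urllib.request.urlopen(request) as response:  # blocks until the scrape finishes
#     print(json.load(response))
```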

You can try this request and start scraping right away with our free trial. Simply go to the E-Commerce Scraper API page and register for a 1-week free trial that offers 5,000 free results.

Integration methods

Oxylabs’ E-Commerce Scraper API offers various integration methods, each of them having unique benefits. You can choose the one that fits your needs best and get the required e-commerce data efficiently.

Push-Pull

When using the Push-Pull integration method, you don’t need to maintain a stable connection with our endpoint to fetch the required public data. You simply send us a request, and we return your job id. Once the job is done, you can use this id to get the data from the /results endpoint.

You can check the status of your job yourself or set up a listener accepting POST requests. In the latter case, we’d send you a callback message once the job is ready to be retrieved.

This method allows for better scalability and offers the following possibilities:

  • Single query. Our endpoint will deal with single queries for one keyword or URL. The API will send you a confirmation message with the job id and other information. With the help of this id, you can check your job status manually.

  • Check job status. If you include callback_url in your query, we’ll send you a link to the content once the scraping task is completed. In case your query didn’t contain callback_url, you should check the job status yourself. You’ll need to use the URL in href under rel:self in the response message. 

  • Retrieve job content. Once the job content is ready to be fetched, you can obtain it using the URL in href under rel:results.

  • Batch query. E-Commerce Scraper API can execute queries for multiple keywords – up to 1,000 keywords per batch. For this, you’ll have to post the query parameters as data in the JSON body. The system will process every keyword as a separate request and return a unique job id for each one.

  • Get notifier IP address list. To whitelist the IPs sending you callback messages, you should send a GET request to the https://data.oxylabs.io/v1/info/callbacker_ips endpoint.

  • Upload to storage. The scraped content is stored in our databases by default. To retrieve the results, you’ll need to query our endpoint. You can also get all your data directly to your storage space by using the custom storage feature. 

  • Callback. When the data collection task is finished, we send a callback request to your device and provide you with a URL to get the scraped data.
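For instance, a batch query body listing several URLs might be assembled as below. This is a sketch: the second book URL is illustrative, and the exact batch endpoint path is covered in the API documentation.

```python
import json

# Each entry in the list becomes a separate job with its own job id;
# up to 1,000 entries are allowed per batch.
batch_payload = {
    "url": [
        "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html",
    ],
    "source": "universal_ecommerce",
    "geo_location": "United States",
    "parse": True,
    "parser_type": "ecommerce_product",
}

# POST this as the JSON body to the batch endpoint with your API credentials;
# the response contains one job id per URL.
print(json.dumps(batch_payload, indent=2))
```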

In this quick start guide, we’ll provide an example of how to interact with E-Commerce Scraper API using the Push-Pull integration method and cURL to make requests. We’ll be extracting already parsed product data from a dummy e-commerce website called books.toscrape.com, using a United States geo-location. If you wish to get HTML page content instead of parsed data, simply remove the 'parse' and 'parser_type' parameters.

Example of a single query request:

curl --user "USERNAME:PASSWORD" 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "universal_ecommerce", "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html", "geo_location": "United States", "parse": true, "parser_type": "ecommerce_product"}'

Sample of the initial response output:

{
  "created_at": "2019-10-01 00:00:01",
  "client_id": 1,
  "domain": "com",
  "geo_location": "United States",
  "id": "6849255332305179649",
  "limit": 10,
  "locale": "",
  "pages": 1,
  "parse": true,
  "parser_type": "ecommerce_product",
  "render": "mhtml",
  "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
  "query": "",
  "source": "universal",
  "start_page": 1,
  "status": "pending",
  "storage_type": null,
  "storage_url": null,
  "subdomain": "books",
  "content_encoding": "utf-8",
  "updated_at": "2019-10-01 00:00:15",
  "user_agent_type": "desktop",
  "session_info": null,
  "statuses": [],
  "_links": [
    {
      "rel": "self",
      "href": "http://data.oxylabs.io/v1/queries/6849255332305179649",
      "method": "GET"
    },
    {
      "rel": "results",
      "href": "http://data.oxylabs.io/v1/queries/6849255332305179649/results",
      "method": "GET"
    }
  ]
}

The initial response indicates that the job to scrape the specified website has been created in our system. It also displays all the job parameters, along with links where you can check whether the job is complete and from where you can download the contents.

To check whether the job has "status": "done", you can use the link from ["_links"][0]["href"], which is: http://data.oxylabs.io/v1/queries/6849255332305179649.

Example of how to check job status:

curl --user "USERNAME:PASSWORD" 'http://data.oxylabs.io/v1/queries/6849255332305179649'

The response will contain the same data as the initial response. If the job is "status": "done", you can retrieve the contents using the link from ["_links"][1]["href"], which is http://data.oxylabs.io/v1/queries/6849255332305179649/results.

Example of how to retrieve data:

curl --user "USERNAME:PASSWORD" 'http://data.oxylabs.io/v1/queries/6849255332305179649/results'
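The check-then-fetch flow above can be automated. Below is a minimal sketch: the `job_links` and `poll_until_done` helpers are our own illustrations, and the polling function assumes an `opener` already configured with Basic auth (e.g. via `urllib.request.build_opener`):

```python
import json
import time

def job_links(job: dict) -> dict:
    """Map each entry of the job's "_links" list to its URL (rel -> href)."""
    return {link["rel"]: link["href"] for link in job["_links"]}

def poll_until_done(job: dict, opener, interval: float = 5.0) -> dict:
    """Re-check the job's 'self' link until it is done, then fetch 'results'."""
    links = job_links(job)
    while True:
        with opener.open(links["self"]) as resp:       # check job status
            status = json.load(resp)["status"]
        if status == "done":
            break
        if status == "faulted":
            raise RuntimeError("job faulted")
        time.sleep(interval)                           # wait before re-checking
    with opener.open(links["results"]) as resp:        # retrieve the content
        return json.load(resp)
```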

Sample of the response data output:

{
    "results": [
        {
            "content": {
                "ids": [],
                "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
                "brand": null,
                "image": null,
                "price": 51.77,
                "title": "A Light in the Attic",
                "currency": "£",
                "old_price": null,
                "description": "Product Description. It's hard to imagine a world without A Light in the Attic. This now-classic collection of poetry and drawings from Shel Silverstein celebrates its 20th anniversary with this special edition. Silverstein's humorous and creative verse can amuse the dowdiest of readers. Lemon-faced adults and fidgety kids sit still and read these rhythmic words and laugh and smile and love th It's hard to imagine a world without A Light in the Attic. This now-classic collection of poetry and drawings from Shel Silverstein celebrates its 20th anniversary with this special edition. Silverstein's humorous and creative verse can amuse the dowdiest of readers. Lemon-faced adults and fidgety kids sit still and read these rhythmic words and laugh and smile and love that Silverstein. Need proof of his genius? RockabyeRockabye baby, in the treetopDon't you know a treetopIs no safe place to rock?And who put you up there,And your cradle, too?Baby, I think someone down here'sGot it in for you. Shel, you never sounded so good. ...more. Product Information.",
                "availability": null,
                "parse_status_code": 12000,
                "additional_properties": [
                    {
                        "UPC": "a897fe39b1053632"
                    },
                    {
                        "Product Type": "Books"
                    },
                    {
                        "Price (excl. tax)": "£51.77"
                    },
                    {
                        "Price (incl. tax)": "£51.77"
                    },
                    {
                        "Tax": "£0.00"
                    },
                    {
                        "Availability": "In stock (22 available)"
                    },
                    {
                        "Number of reviews": "0"
                    }
                ]
            },
            "created_at": "2021-10-04 08:49:29",
            "updated_at": "2021-10-04 08:49:41",
            "page": 1,
            "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
            "job_id": "6850713463170271233",
            "status_code": 200
        }
    ]
}

If you wish our system to automatically ping your server when the job is done so you can retrieve the data right away, use the additional "callback_url": "YOUR_CALLBACK_LISTENER_IP" parameter and refer to our documentation to set up your callback listener. If you wish to get the data delivered directly to your cloud storage, you’ll need to use the additional "storage_type" and "storage_url" parameters. To fully set up delivery to cloud storage, please refer to the upload-to-storage documentation.
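These delivery parameters slot into the same job body; in the sketch below, the listener URL and bucket name are placeholders you would replace with your own:

```python
payload = {
    "source": "universal_ecommerce",
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    "parse": True,
    "parser_type": "ecommerce_product",
    # Ping this endpoint (your listener must accept POST requests) when done:
    "callback_url": "https://your.callback.listener/job-done",
    # ...or deliver the results straight to your cloud storage bucket:
    "storage_type": "s3",               # "s3" for Amazon S3, "gcs" for Google Cloud Storage
    "storage_url": "YOUR_BUCKET_NAME",  # placeholder bucket name
}
```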

Realtime

The Realtime delivery method is similar to the previously mentioned callback method. The main difference is that you get your data back on the same open HTTPS connection in real time.

Example of a realtime request:

curl --user "USERNAME:PASSWORD" 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "universal_ecommerce", "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html", "geo_location": "United States", "parse": true, "parser_type": "ecommerce_product"}'

Example response body that will be returned on open connection:

{
    "results": [
        {
            "content": {
                "ids": [],
                "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
                "brand": null,
                "image": null,
                "price": 51.77,
                "title": "A Light in the Attic",
                "currency": "£",
                "old_price": null,
                "description": "Product Description. It's hard to imagine a world without A Light in the Attic. This now-classic collection of poetry and drawings from Shel Silverstein celebrates its 20th anniversary with this special edition. Silverstein's humorous and creative verse can amuse the dowdiest of readers. Lemon-faced adults and fidgety kids sit still and read these rhythmic words and laugh and smile and love th It's hard to imagine a world without A Light in the Attic. This now-classic collection of poetry and drawings from Shel Silverstein celebrates its 20th anniversary with this special edition. Silverstein's humorous and creative verse can amuse the dowdiest of readers. Lemon-faced adults and fidgety kids sit still and read these rhythmic words and laugh and smile and love that Silverstein. Need proof of his genius? RockabyeRockabye baby, in the treetopDon't you know a treetopIs no safe place to rock?And who put you up there,And your cradle, too?Baby, I think someone down here'sGot it in for you. Shel, you never sounded so good. ...more. Product Information.",
                "availability": null,
                "parse_status_code": 12000,
                "additional_properties": [
                    {
                        "UPC": "a897fe39b1053632"
                    },
                    {
                        "Product Type": "Books"
                    },
                    {
                        "Price (excl. tax)": "£51.77"
                    },
                    {
                        "Price (incl. tax)": "£51.77"
                    },
                    {
                        "Tax": "£0.00"
                    },
                    {
                        "Availability": "In stock (22 available)"
                    },
                    {
                        "Number of reviews": "0"
                    }
                ]
            },
            "created_at": "2021-10-04 08:49:29",
            "updated_at": "2021-10-04 08:49:41",
            "page": 1,
            "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
            "job_id": "6850713463170271233",
            "status_code": 200
        }
    ]
}

Proxy Endpoint

When using the Proxy Endpoint integration method, you can only provide fully-formed URLs instead of domain or search query parameters. You can also pass additional information, such as the location and whether you want the data parsed, in the request headers.

In this case, you should use our entry node as a proxy, authorize with your E-Commerce Scraper API credentials, and ignore certificates. The required public data will reach you on the same open connection.

Proxy Endpoint request sample using cURL library:

curl -k -x realtime.oxylabs.io:60000 -U USERNAME:PASSWORD -H "X-Oxylabs-Geo-Location: United States" -H "X-Oxylabs-Parse: 1" -H "X-Oxylabs-Parser-Type: ecommerce_product" "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
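The same request sketched in Python with the popular `requests` library (the actual call is commented out, and `verify=False` mirrors curl’s `-k` flag; USERNAME and PASSWORD are placeholders):

```python
# Scraping options travel in X-Oxylabs-* request headers when using the
# Proxy Endpoint method; the entry node itself acts as the proxy.
proxies = {
    "http": "http://USERNAME:PASSWORD@realtime.oxylabs.io:60000",
    "https": "http://USERNAME:PASSWORD@realtime.oxylabs.io:60000",
}
headers = {
    "X-Oxylabs-Geo-Location": "United States",
    "X-Oxylabs-Parse": "1",
    "X-Oxylabs-Parser-Type": "ecommerce_product",
}

# import requests
# response = requests.get(
#     "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
#     proxies=proxies,
#     headers=headers,
#     verify=False,  # ignore certificates, like curl -k
# )
# print(response.json())
```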

GitHub

Oxylabs GitHub is the place to go for tutorials on how to scrape websites, use our tools, and implement or integrate our products using the most popular programming languages (e.g., C#, Java, Node.js, PHP, Python). Click here and check out a repository on GitHub to find the complete code used in this article.

Parameters*

Parameter | Description | Default value
source | Data source |
url / query | Direct URL (link) |
user_agent_type | Device type and browser. The full list can be found here. | desktop
geo_location | Geo-location of the proxy used to retrieve the data. The full list of supported locations can be found here. With some sources, the geo_location param sets the delivery location – contact our sales team to get access to our full documentation. |
locale | Locale, as expected in the Accept-Language header. |
render | Enables JavaScript rendering. Use it when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). |
parse | true will return parsed data. If using the universal_ecommerce source, it is required to specify parser_type. |
parser_type | Setting this value to ecommerce_product will give you access to our AI-powered Adaptive Parser, which automatically adapts to nearly any e-commerce product page. |
context: content | Base64-encoded POST request body. It is only useful if http_method is set to post. |
context: cookies | Pass your own cookies. |
context: follow_redirects | Indicate whether you would like the scraper to follow redirects (3xx responses with a destination URL) to get the contents of the URL at the end of the redirect chain. | true
context: headers | Pass your own headers. |
context: http_method | Set it to post if you would like to make a POST request to your target URL via E-Commerce Universal Scraper. | GET
context: session_id | If you want to use the same proxy with multiple requests, set your session to any string you like, and we’ll assign a proxy to this ID and keep it for up to 10 minutes. After that, if you make another request with the same session ID, a new proxy will be assigned to that particular session ID. |
context: successful_status_codes | Define a custom HTTP response code (or a few of them) upon which we should consider the scrape successful and return the content to you. May be useful if you want us to return the 503 error page or in some other non-standard cases. |
callback_url | URL to your callback endpoint. |
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. Only works via the Push-Pull (Callback) method. |
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. |
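To illustrate how the context parameters nest inside a job body, here is a sketch; the session ID string and the accepted status code are illustrative values, and the full documentation covers the exact accepted forms:

```python
import json

payload = {
    "source": "universal_ecommerce",
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    "geo_location": "United States",
    "user_agent_type": "desktop",
    "render": "html",                # JavaScript rendering; Push-Pull only
    "parse": True,
    "parser_type": "ecommerce_product",
    # Context parameters are passed as a list of key/value objects:
    "context": [
        {"key": "follow_redirects", "value": True},
        {"key": "session_id", "value": "my-session-1"},      # reuse the same proxy
        {"key": "successful_status_codes", "value": [503]},  # also accept 503 pages
    ],
}
print(json.dumps(payload, indent=2))
```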

*All parameters will be provided after purchasing the product.

Response codes

Response | Error message | Description
204 | No content | You are trying to retrieve a job that has not been completed yet.
400 | Multiple error messages | Bad request structure; could be a misspelled parameter or invalid value. The response body will have a more specific error message.
401 | ‘Authorization header not provided’ / ‘Invalid authorization header’ / ‘Client not found’ | Missing authorization header or incorrect login credentials.
403 | Forbidden | Your account does not have access to this resource.
404 | Not found | The job ID you are looking for is no longer available.
429 | Too many requests | Exceeded rate limit. Please contact your account manager to increase limits.
500 | Unknown error | Service unavailable.
524 | Service unavailable | Service unavailable.
612 | Undefined internal error | Something went wrong and we failed the job you submitted. You can try again at no extra cost, as we do not charge you for faulted jobs. If that does not work, please get in touch with us.
613 | Faulted after too many retries | We tried scraping the job you submitted, but gave up after reaching our retry limit.
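As a small illustration of acting on these codes, a client might classify responses like this (`classify_response` is our own helper, not part of the API):

```python
RETRYABLE = {500, 524, 612}   # service errors / faulted jobs: retry at no extra cost
NOT_READY = {204}             # job not finished yet: poll again later

def classify_response(code: int) -> str:
    """Map an API response code from the table above to a client action."""
    if code == 200:
        return "ok"
    if code in NOT_READY:
        return "wait"
    if code in RETRYABLE:
        return "retry"
    if code == 429:
        return "slow-down"    # rate limit exceeded: back off
    return "fail"             # 400/401/403/404/613: fix the request or contact support
```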

Conclusion

E-Commerce Scraper API is an advanced tool that enables you to collect real-time localized data and search information from most e-commerce websites at scale. To simplify integration, we offer several integration and data delivery methods, all ensuring a seamless data flow. As with any other Oxylabs product, E-Commerce Scraper API also comes with additional benefits, including a convenient dashboard and 24/7 customer support.

We hope this guide has made E-Commerce Scraper API’s features easier to understand and answered your questions about using this product. If you’re still unsure about any aspect of this public data gathering tool, get in touch with us via support@oxylabs.io.

About the author

Iveta Vistorskyte

Lead Content Manager

Iveta Vistorskyte is a Lead Content Manager at Oxylabs. Growing up as a writer and a challenge seeker, she decided to welcome herself to the tech-side, and instantly became interested in this field. When she is not at work, you'll probably find her just chillin' while listening to her favorite music or playing board games with friends.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
