
Web Crawler

  • Discover and collect only relevant data from target websites

  • Control the crawling approach and scope; define the end result

  • Get results as parsed data, a set of HTMLs, or a list of URLs

*Web Crawler is a feature of Web Scraper API.


Rapidly gather relevant data from websites

Gather only the data you need by crawling a website in seconds. Web Crawler efficiently spiders any website based on your selected criteria and seamlessly returns the complete data to you.

Easily control the scope and tailor your end result

With Web Crawler, you have full control over how a crawl job starts and proceeds. You can also specify how the website should be crawled using filters and scraping parameters such as regular expressions, proxy geo-location, results storage, and more.

Retrieve your results in a specified format

Receive results according to your data needs. There are three output formats: a list of URLs (sitemap), a set of HTML files, and parsed data. Optionally, Web Crawler can upload the result files to your cloud storage.
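As a rough sketch, the format choice is typically a single parameter in the job request. The field names below are illustrative assumptions, not the exact Oxylabs schema; check the Web Scraper API documentation for the real contract.

    # Hypothetical output selectors; exact field names may differ from
    # the real Oxylabs schema (see the Web Scraper API documentation).
    output_sitemap = {"output": {"type": "sitemap"}}  # a list of URLs
    output_html    = {"output": {"type": "html"}}     # a set of HTML files
    output_parsed  = {"output": {"type": "parsed"}}   # structured, parsed data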

How does Web Crawler work?

Web Crawler is an add-on to Oxylabs Web Scraper API that allows you to leverage the API’s scraping and parsing functions to crawl websites at scale in real time. Select a starting URL, specify crawling patterns, let Web Crawler traverse the site, and receive results in your chosen cloud storage bucket.


User input

You define the crawling scope, specify scraping parameters, and submit a request to the job initiation endpoint.
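For illustration, a job submission might look like the sketch below. The endpoint URL, payload field names, and Basic-auth credentials are placeholder assumptions modeled on typical job-based scraping APIs, not the exact Oxylabs interface.

    import requests

    # Hypothetical job-initiation request. Endpoint, field names, and
    # credentials are illustrative placeholders, not the exact Oxylabs schema.
    payload = {
        "url": "https://www.example.com/",       # starting URL
        "filters": {
            "crawl": [".*"],                     # regex: which links to follow
            "process": [".*/products/.*"],       # regex: which pages to keep in results
            "max_depth": 2,                      # how far from the start URL to traverse
        },
        "scrape_params": {
            "geo_location": "United States",     # proxy geo-location
        },
        "output": {"type": "sitemap"},           # or "html" / "parsed"
    }

    response = requests.post(
        "https://crawler.example.com/v1/jobs",   # placeholder endpoint
        auth=("YOUR_USERNAME", "YOUR_PASSWORD"),
        json=payload,
        timeout=30,
    )
    response.raise_for_status()
    print("Job ID:", response.json().get("id"))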


Web crawling

Web Crawler traverses a website by following links between pages until it finds no new URLs that match the user-specified patterns.
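Conceptually, the traversal behaves like this generic breadth-first sketch (an illustration of pattern-based crawling in general, not Oxylabs' internal implementation):

    import re
    from collections import deque
    from urllib.parse import urljoin

    import requests

    def crawl(start_url, pattern, max_pages=100):
        """Follow links whose URLs match `pattern`, breadth-first,
        until no new matching URLs remain or the page budget runs out."""
        allowed = re.compile(pattern)
        seen, queue, visited = {start_url}, deque([start_url]), []
        while queue and len(visited) < max_pages:
            url = queue.popleft()
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue
            visited.append(url)
            # Naive href extraction; a production crawler would use an HTML parser.
            for href in re.findall(r'href="([^"]+)"', html):
                link = urljoin(url, href)
                if link not in seen and allowed.search(link):
                    seen.add(link)
                    queue.append(link)
        return visited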


Job result

Web Crawler aggregates the crawled content (sitemaps, parsed data, or HTML documents) into one or more result files, ready to use as the final output.
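In practice that usually means polling a job until it finishes, then downloading the aggregated output. A minimal sketch, assuming hypothetical /jobs/{id} and /jobs/{id}/aggregate endpoint paths and a status field that the real API may name differently:

    import time

    import requests

    BASE = "https://crawler.example.com/v1"      # placeholder endpoint
    AUTH = ("YOUR_USERNAME", "YOUR_PASSWORD")

    def fetch_results(job_id, poll_seconds=10):
        """Poll a hypothetical job-info endpoint until the crawl is done,
        then download the aggregated result."""
        while True:
            info = requests.get(f"{BASE}/jobs/{job_id}", auth=AUTH, timeout=30).json()
            if info.get("status") == "done":     # assumed status value
                break
            time.sleep(poll_seconds)
        result = requests.get(f"{BASE}/jobs/{job_id}/aggregate", auth=AUTH, timeout=30)
        result.raise_for_status()
        return result.content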


Transfer to cloud

Optionally, Web Crawler can upload the files to the client-specified cloud storage location on AWS S3.
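Reading the delivered files back from your own bucket is then standard S3 tooling. A minimal sketch with boto3, assuming a bucket name and key prefix of your choosing (AWS credentials are picked up from the environment):

    import boto3

    BUCKET = "my-crawl-results"   # assumed bucket that Web Crawler uploads to
    PREFIX = "jobs/12345/"        # assumed key prefix for one job's files

    s3 = boto3.client("s3")
    listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
    for obj in listing.get("Contents", []):
        key = obj["Key"]
        # Save each result file next to the script, using its base name.
        s3.download_file(BUCKET, key, key.rsplit("/", 1)[-1])
        print("Downloaded", key)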

Web Scraper API with Web Crawler

Web Crawler is an additional feature that Web Scraper API users get for free.

Web Scraper API delivers real-time data from most websites, including search engines, e-commerce marketplaces, and much more.

  • Customizable request parameters

  • JavaScript rendering

  • Convenient delivery


Aivaras Steponavicius

Senior Account Manager @ Oxylabs

Before you can extract data from a website, you often need to do some web crawling first to find the specific URLs you're interested in. Web Crawler can take care of that for you automatically.

Ruta Petronyte

Senior Account Manager @ Oxylabs

Web Crawler is a great addition to our Scraper APIs that lets you efficiently explore and collect data using Oxylabs’ maintenance-free infrastructure.

A word from your Dedicated Account Manager

With Oxylabs Corporate and Enterprise plans, you get your very own dedicated Account Manager.

Frequently Asked Questions

What is Web Crawler?

Web Crawler is a feature of Oxylabs Web Scraper API that lets you spider any website, select useful content, and have it delivered in bulk.

If you want to learn more about web crawlers, check this blog post for a detailed explanation.

Is there a free trial available for the Web Crawler feature?

Yes. You can try out the Web Crawler feature by claiming a free trial of Web Scraper API directly from our dashboard.

What does Web Crawler do?

Web Crawler can discover all pages on a website and fetch data at scale and in real time.

The tool follows the links from the initial web page to other pages until it has visited and indexed all the pages it can find on a particular website.

Is it legal to crawl a website?

The answer depends on the specific task at hand. Before crawling, make sure you comply with the laws that apply to accessing a particular website's publicly available data. Our team suggests seeking professional legal guidance. Check our blog post on the legalities of public web data gathering.

Who uses web crawlers?

Web crawlers are used by individuals and organizations of all kinds, including but not limited to:

  • Search engines, which index and organize web pages so users can easily find relevant information.

  • E-commerce companies, which gather information about competitors' products, prices, and promotions.

  • Marketing professionals, who collect data on their target audience, monitor social media mentions, and track their brand's online reputation.

  • Government agencies, which monitor websites for illegal or harmful content and gather intelligence for security purposes.

  • Website owners, who check their site's search engine rankings, identify broken links, and track their brand's online reputation.
