Create a Crawler - Extract data from an entire website

Extract data from an entire website

Tutorial length - 7 mins

To extract data from an entire website, you will need a web crawler. A web crawler systematically extracts data from multiple webpages for you. It does so by crawling, or accessing, webpages and returning the data you need from each one.

This tutorial covers building a Crawler for cases where you don’t know the URLs of the pages that hold your data.

If you want to extract data from multiple pages that follow a simple URL structure, a multi-page Extractor may be a better fit.
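To illustrate what a "simple URL structure" means, here is a minimal Python sketch that builds the page URLs directly from a pattern, which is why no crawling is needed in that case. The site address, the `page` query parameter, and the `page_urls` helper are hypothetical and chosen only for illustration; this is not how the multi-page Extractor is implemented.

```python
def page_urls(template, first, last):
    """Build one URL per page number from a template containing '{page}'."""
    return [template.format(page=n) for n in range(first, last + 1)]


# Hypothetical listing whose pages differ only by a page number.
urls = page_urls("https://shop.example/jeans?page={page}", 1, 3)
for url in urls:
    print(url)
```

When every URL can be generated like this, you already know where your data lives. A Crawler is for the opposite case, where the URLs have to be discovered.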

Introduction to the Crawler

The best way to extract data that is spread out across many pages of a site is by building a Crawler. Based on your training, a Crawler travels to every page of that site looking for other pages that match. Crawlers are best used when you want lots of data but don’t know all the URLs for that site.
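The general idea of travelling from page to page and keeping the pages that match can be sketched in a few lines of Python. This is only an illustration of the concept, not the tool's implementation. The in-memory `SITE` dictionary stands in for real HTTP requests, and the `crawl` function, the `shop.example` URLs, and the `/product/<number>` pattern are all assumptions made for the example.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, is_data_page, max_pages=100):
    """Breadth-first crawl of a single site.

    fetch(url) returns the page HTML, or None if the page is unavailable.
    is_data_page(url) decides whether a page matches what we are looking for.
    Returns the matching URLs in the order they were visited.
    """
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    matches = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited += 1
        if is_data_page(url):
            matches.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urljoin(url, href).split("#")[0]
            # Stay on the same site and never queue a page twice.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return matches


# Hypothetical site held in memory so the sketch runs without network access.
SITE = {
    "https://shop.example/": '<a href="/jeans">Jeans</a> <a href="/about">About</a>',
    "https://shop.example/jeans": '<a href="/product/1">Slim</a> <a href="/product/2">Straight</a>',
    "https://shop.example/about": '<a href="https://other.example/">Partner</a>',
    "https://shop.example/product/1": '<a href="/product/2">Next</a>',
    "https://shop.example/product/2": '<a href="/">Home</a>',
}

# Assumed rule for "pages that match": URLs ending in /product/<number>.
product_pattern = re.compile(r"/product/\d+$")
found = crawl(
    "https://shop.example/",
    SITE.get,
    lambda url: bool(product_pattern.search(url)),
)
print(found)
```

Only the start URL is given. The two product pages are discovered by following links, and the link to the other domain is ignored. In the Crawler itself, the role of the hand-written pattern is played by your training.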

This tutorial will show you how to build and run a Crawler in just 7 simple steps.

Example source: Jean webpage

Ready? Let’s get started!

Next - Step 1: Navigate to the web page

All steps

Step 2: Detect optimal settings

Step 3: Single or multiple rows?

Step 4: Train rows

Step 5: Add columns

Step 6: Add 5 more pages

Step 7: Run Crawler
