Extract data from an entire website - Create a Crawler

Tutorial length - 7 mins

Introduction to the Crawler

The best way to extract data that is spread across many pages of a site is to build a Crawler. Based on your training, a Crawler travels to every page of the site looking for other pages that match. Crawlers are best suited to situations where you want lots of data but don't know all of the site's URLs.

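The import.io Crawler is a point-and-click tool, so you never write code yourself, but the idea behind it can be pictured as a simple breadth-first crawl: start from one page, follow links within the same site, and keep the pages that look like the ones you trained it on. The sketch below illustrates that idea only; the start URL and the "/products/" pattern are hypothetical placeholders, not part of import.io.

# Illustrative sketch only, using the Python standard library. This is not
# import.io's implementation; it just shows the general crawling idea.
from collections import deque
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen
from html.parser import HTMLParser
import re


class LinkCollector(HTMLParser):
    """Collects href targets from anchor tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, url_pattern, max_pages=50):
    """Visit pages on the start URL's site; return URLs matching the pattern."""
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    matches = []

    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip pages that fail to load

        if re.search(url_pattern, url):
            matches.append(url)  # this page looks like one the crawl was "trained" on

        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Stay on the same site and avoid revisiting pages.
            if urlparse(absolute).netloc == site and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)

    return matches


if __name__ == "__main__":
    # Hypothetical example: find product detail pages on an example store.
    print(crawl("https://example.com/", r"/products/"))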

This tutorial will show you how to build and run a Crawler in just 7 simple steps. To complete it, you'll need the Desktop App installed; you can download it here: https://import.io/download


Example source: Jean webpage

Ready? Let’s get started!


All steps

Step 1: Navigate to the web page

Step 2: Detect optimal settings

Step 3: Single or multiple rows?

Step 4: Train rows

Step 5: Add columns

Step 6: Add 5 more pages

Step 7: Run Crawler
