Create a Crawler - Extract data from an entire website
Extract data from an entire website
Tutorial length - 7 mins
To extract data from an entire website, you will need a web crawler. A web crawler systematically extracts data from multiple webpages for you. It does so by crawling, or accessing, webpages and returning the data you need from each one of them.
This tutorial covers the import.io Crawler, which is meant for cases when you don’t know the URLs of the pages that hold your data.
If you want to extract data from multiple pages that follow a simple URL structure, a multi-page Extractor may be a better fit.
Introduction to the Crawler
The best way to extract data that is spread out across many pages of a site is by building a Crawler. Based on your training, a Crawler travels to every page of that site, looking for other pages that match. Crawlers are best used when you want lots of data but don’t know all the URLs for that site.
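To make the idea of "travelling to every page and looking for pages that match" concrete, here is a minimal sketch in Python. It is not how the import.io Crawler is implemented; it only illustrates the general technique of following links breadth-first and keeping the pages whose URL fits a pattern. The SITE dictionary, the crawl function, and the URL pattern are all hypothetical, and the "site" is held in memory so the example runs without network access.

```python
import re
from collections import deque

# Hypothetical in-memory "site": each URL maps to the URLs it links to.
# A real crawler would fetch each page and parse its links instead.
SITE = {
    "/": ["/jeans", "/about"],
    "/jeans": ["/jeans/slim-fit", "/jeans/boot-cut", "/"],
    "/about": ["/"],
    "/jeans/slim-fit": ["/jeans"],
    "/jeans/boot-cut": ["/jeans", "/jeans/slim-fit"],
}


def crawl(start, links_of, pattern):
    """Walk the site breadth-first from `start` and return, in visit
    order, every reachable URL that fully matches `pattern`."""
    seen = {start}          # URLs already queued, so no page is visited twice
    queue = deque([start])  # URLs waiting to be visited
    matches = []
    while queue:
        url = queue.popleft()
        if re.fullmatch(pattern, url):
            matches.append(url)
        for nxt in links_of.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return matches


# Assumed pattern: product pages live one level below /jeans.
product_pages = crawl("/", SITE, r"/jeans/[a-z-]+")
print(product_pages)
```

The starting page is the only URL supplied; the matching pages are discovered by following links, which is why this approach suits cases where the URLs are not known in advance.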
This tutorial will show you how to build and run a Crawler in just 7 simple steps. To complete this tutorial, you’ll need to have the Desktop App installed. You can download it here: https://import.io/download
Example source: Jean webpage
Ready? Let’s get started!
Next - Step 1: Navigate to the web page
All steps
Step 2: Detect optimal settings
Step 3: Single or multiple rows?