Extract data from an entire website

To extract data from an entire website, you will need a web crawler. A web crawler systematically extracts data from multiple webpages for you. It does so by crawling, or accessing, webpages and returning the data you need from each one of them.

This tutorial covers building a crawler for cases where you don’t know the URLs of the pages that hold your data.

If you want to extract data from multiple pages that follow a simple URL structure, a multi-page Extractor may be a better fit.

Introduction to the Crawler

The best way to extract data that is spread across many pages of a site is to build a Crawler. Based on your training, a Crawler travels to every page of that site, looking for other pages that match the ones you trained it on. Crawlers are best used when you want lots of data but don’t know all the URLs for that site.
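To make the idea concrete, here is a minimal sketch of how a crawler of this kind might work in general. It is not the Crawler tool's actual implementation. It assumes a breadth-first walk over same-site links and stands in a URL pattern for the "training" that decides which pages match. The names `crawl`, `LinkParser`, `FAKE_SITE` and the `shop.example` URLs are hypothetical, and the site is held in memory so the example runs without network access.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, match_pattern, max_pages=100):
    """Breadth-first crawl of one site; returns visited URLs matching match_pattern.

    fetch is any callable that takes a URL and returns HTML text, or None
    if the page is unavailable (an assumption made for this sketch).
    """
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    matches = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        html = fetch(url)
        if html is None:
            continue
        # The pattern plays the role of "pages that match your training".
        if re.search(match_pattern, url):
            matches.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            # Stay on the same site and never queue a page twice.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return matches


# Hypothetical in-memory site standing in for real HTTP requests.
FAKE_SITE = {
    "https://shop.example/": '<a href="/jeans">Jeans</a> <a href="https://other.example/">Away</a>',
    "https://shop.example/jeans": '<a href="/product/1">A</a> <a href="/product/2">B</a>',
    "https://shop.example/product/1": '<a href="/jeans">Back</a>',
    "https://shop.example/product/2": '<a href="/product/1">Related</a>',
}

found = crawl("https://shop.example/", FAKE_SITE.get, r"/product/\d+$")
print(found)
```

Only the start URL is supplied here. The two product pages are discovered by following links, which is the situation a crawler is meant for: you want the data but don’t know the URLs in advance. The `max_pages` limit is a safety cap chosen for this sketch so that a crawl of a large site still terminates.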

This tutorial will show you how to build and run a Crawler in just 7 simple steps.

Example source: Jean webpage

Ready? Let’s get started!

