How to Use GrabzIt's Structured Data Scraping Template

GrabzIt's Structured Data Scraping Template takes an automated approach to data extraction. Instead of requiring you to select individual elements by hand, the template scans the entire target website and generates a separate spreadsheet for each common, repeating HTML structure it detects.

Here is a breakdown of how to read and navigate your generated scrape results:

1. Understanding the File Names

When you download and extract your zipped scrape data, you will see multiple .csv files. The names of these spreadsheets might look complex, such as:

a_div_div_div_div_div_div_div_div_div_div_div_body_html.csv

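This file name is the HTML DOM path at which this repeating structure was found on the webpage. In the example above the name ends in body_html, so the path appears to be written from the innermost element (a) outward to the root of the document (html).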
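As a rough illustration, a file name like this can be turned back into a readable DOM path with a few lines of Python. This is only a sketch: it assumes that tags are separated by underscores and listed innermost first, which is inferred from the example rather than taken from GrabzIt documentation, and the function name dom_path_from_filename is hypothetical.

```python
from pathlib import Path


def dom_path_from_filename(filename: str) -> list:
    """Split a scrape file name into tag names, ordered root first.

    Assumption: tags are joined with "_" and listed innermost first,
    as the example file name suggests.
    """
    stem = Path(filename).stem  # drop the ".csv" extension
    tags = stem.split("_")
    return list(reversed(tags))


if __name__ == "__main__":
    name = "a_div_div_body_html.csv"  # shortened, made-up file name
    print(" > ".join(dom_path_from_filename(name)))
    # prints: html > body > div > div > a
```

Reading the path root first makes it easier to compare against the element tree shown in a browser's developer tools.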

2. Understanding the Column Headers

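If you open one of these spreadsheets, the column headers also represent HTML paths. Each header gives the location of the data relative to the repeating structure named in the file name. Elements are separated by a | character and, where an attribute is extracted instead of text, the attribute name comes last.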

Examples of column headers you might see:

img|src (Extracting an image source link)
div|div (Extracting text inside a nested div)
div|div|a|href (Extracting a hyperlink URL)
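
The header examples above can be split into an element path and an optional attribute. The sketch below assumes that a trailing segment such as src or href is an attribute name; the KNOWN_ATTRIBUTES set and the parse_header function are hypothetical helpers, not part of GrabzIt, and the set would need extending for other attributes.

```python
from typing import Optional

# Assumption: these trailing segments are attribute names, not tags.
KNOWN_ATTRIBUTES = {"src", "href", "alt", "title"}


def parse_header(header: str) -> tuple:
    """Return (list of tag names, attribute name or None) for a column header."""
    parts = header.split("|")
    attribute: Optional[str] = None
    if parts[-1] in KNOWN_ATTRIBUTES:
        attribute = parts[-1]
        parts = parts[:-1]
    return parts, attribute


if __name__ == "__main__":
    for header in ("img|src", "div|div", "div|div|a|href"):
        tags, attribute = parse_header(header)
        kind = f"attribute '{attribute}'" if attribute else "text"
        print(f"{header}: {kind} of {' > '.join(tags)}")
```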

3. Locating Your Data

Directly beneath these structural headers is your actual scraped content. Depending on the website, these rows will contain the valuable information you are looking for, such as product names, pricing, image links, text descriptions, and URLs.
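
If you prefer to work with the results in code, each row can be read as a dictionary keyed by the column headers. The example below is a minimal sketch using Python's standard csv module; the sample content is made up to match the header examples in section 2, and it assumes a comma-delimited file with a single header row.

```python
import csv
import io

# Made-up sample in the shape described above; real files will differ.
SAMPLE_CSV = (
    "img|src,div|div,div|div|a|href\n"
    "https://example.com/a.png,Blue Widget,https://example.com/widgets/blue\n"
    "https://example.com/b.png,Red Widget,https://example.com/widgets/red\n"
)


def read_rows(handle):
    """Return each data row as a dict keyed by its column header."""
    return list(csv.DictReader(handle))


rows = read_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    # Look up values by the structural header they appear under.
    print(row["div|div"], "->", row["div|div|a|href"])
```

To read a downloaded file instead of the sample, pass read_rows a file opened with open(path, newline=""). Note that csv.DictReader keeps only the last value when two columns share the same header.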

Pro Tip: Where to start?

After extracting your zipped scrape data, start by reviewing the largest spreadsheets (by file size). Because the template groups data by repeating structures, the largest files almost always contain the primary data lists (like product grids, search results, or directories) that you are actually trying to capture.
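
When a scrape produces many files, sorting them by size can be scripted. The following is a small sketch using only the Python standard library; the folder name scrape_results is a placeholder for wherever you extracted the zip, and csv_files_by_size is a hypothetical helper name.

```python
from pathlib import Path


def csv_files_by_size(folder):
    """Return the .csv files in folder, largest first."""
    files = Path(folder).glob("*.csv")
    return sorted(files, key=lambda p: p.stat().st_size, reverse=True)


if __name__ == "__main__":
    # "scrape_results" is a placeholder for your extracted folder.
    for path in csv_files_by_size("scrape_results"):
        print(f"{path.stat().st_size:>10} bytes  {path.name}")
```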
