Tools to Capture and Convert the Web
GrabzIt's Online Community

numbers incorrect

Ask questions relating to GrabzIt's Web Scraper Tool, such as how to use the web scraper and API to extract data from web pages, images or PDF documents.

I performed a scrape and it said it captured 388 pages, but the download only gave me 189 items, including at least 6 duplicates. Does anyone know how to fix this, or how to rerun it and get a complete result?

The problem I'm having is that it's not grabbing or scraping everything. It's missing a lot of the pages that are on the site.

Asked by Nathan Shashoua on the 16th of June 2024

Hi Nathan,

This is normal: the scrape count is just the number of pages that were visited, not the number of items in the download. A page may have been visited only because another web page sent the scraper there, and it may not even be on your website if an automatic redirect forced the scraper to visit it.

Duplicates can be removed by setting the ignore duplicate pages option, which will try to ignore pages based on how similar their content is to previously visited pages.

It has to be done this way because of the complexity of some website structures. For instance, some web pages might have URLs like https://www.example.com/index?a=1 and https://www.example.com/index?a=2 that return the same content.
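To illustrate why duplicates are judged by content rather than by URL, here is a minimal Python sketch that could be run on downloaded results. It is not GrabzIt's actual implementation: the function names, the 0.9 threshold and the sample page text are all assumptions made for the example, and the similarity measure is Python's standard difflib.

```python
from difflib import SequenceMatcher


def is_similar(a: str, b: str, threshold: float = 0.9) -> bool:
    """Return True when two page texts are at least `threshold` similar."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def drop_near_duplicates(pages, threshold=0.9):
    """Keep the first page of each group of near-identical pages.

    `pages` is a list of (url, text) tuples. The URL is never compared,
    only the text, so index?a=1 and index?a=2 collapse into one entry
    when their content matches.
    """
    kept = []
    for url, text in pages:
        if any(is_similar(text, kept_text, threshold) for _, kept_text in kept):
            continue  # content already seen under another URL
        kept.append((url, text))
    return kept


# Hypothetical sample data: two URLs with the same content, one distinct page.
pages = [
    ("https://www.example.com/index?a=1", "Welcome to our shop. Browse the latest offers."),
    ("https://www.example.com/index?a=2", "Welcome to our shop. Browse the latest offers."),
    ("https://www.example.com/contact", "Contact us by phone or email for support."),
]

unique_pages = drop_near_duplicates(pages)
for url, _ in unique_pages:
    print(url)
```

Comparing every page with every kept page is slow for large scrapes, so this is only suitable for checking a small download by hand.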

Answered by GrabzIt Support on the 16th of June 2024