
How to Scrape Product List and Detail Pages

Websites often have a search page that contains a list of items, where each item has a summary description and a link to a detail page with in-depth information on the item.

Because this structure is so common, you will often need to scrape some information about each item from the search page and the rest from the detail page. This article explains how to do that.

First, enter the URL of the product list page you want to scrape. Then select the information you want to extract from the product list page, making sure that every example of the data is selected.

Then on the scrape instructions page, click Add Scrape Instruction.

The first thing to be aware of is that our scraper works in exactly the same way as a browser. If a cookie security notification or other inline popup stops you clicking on the page, you must instruct the scraper to close the popup before the rest of the scrape can be done. Most of these popups only need to be clicked once, so you can tell GrabzIt to do the same. To do this, use the Click Element action and click on the HTML element required to close the popup. Then click the Once Only option, followed by Save and Next.

Next, choose the Extract Data action, then select the data you want to extract. For example, if you want the title of each item in the list of search results, make sure that every title in that list is selected.

Our wizard tries to identify sets of data automatically and may select more information than you want. If this happens, click the unwanted items again and they will no longer be included. This teaches our web scraper what to extract.

Now choose the attribute of the data item you want to extract, such as "Text", and then click Next. On the next screen, give it a title. Note that all of this data should use the Default Template, so that it is extracted whenever the scraper is not on a special template.

Once you have selected all of the item data you want to extract from the product search page, select all of the links that lead to the product detail page. This could be, for instance, the product image. Then choose the Click Element action. Set the template to "detail", give it a delay of five seconds and click Next. When asked whether you would like to extract data from the new page, choose yes. Now select the data you want to extract as before, but this time specify that it must be executed under the "detail" template.
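
If it helps to picture what these instructions achieve, the list and detail pattern can be sketched in a few lines of Python. This is only an illustration of the general technique, not how our scraper is implemented. The SITE dictionary, the "item" and "price" class names, and the function scrape_list_and_details are all hypothetical and stand in for a real website and its markup. The list parser plays the role of the instructions on the Default Template, and the detail parser plays the role of the instructions on the "detail" template.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "site" (URL -> HTML). A real scrape would fetch
# these pages over HTTP; they are hard-coded so the sketch is self-contained.
SITE = {
    "/list": '<ul>'
             '<li><a class="item" href="/p/1">Red Mug</a></li>'
             '<li><a class="item" href="/p/2">Blue Mug</a></li>'
             '</ul>',
    "/p/1": '<h1>Red Mug</h1><span class="price">4.50</span>',
    "/p/2": '<h1>Blue Mug</h1><span class="price">5.25</span>',
}


class ListParser(HTMLParser):
    """Collects the title and link of every <a class="item"> on a list page."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._href = None  # set while inside a matching link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "item":
            self._href = attrs.get("href")

    def handle_data(self, data):
        if self._href is not None:
            self.items.append({"title": data.strip(), "url": self._href})
            self._href = None


class DetailParser(HTMLParser):
    """Collects the text of <span class="price"> on a detail page."""

    def __init__(self):
        super().__init__()
        self.price = None
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.price = data.strip()
            self._in_price = False


def scrape_list_and_details(list_url, pages):
    """Extract summary data from the list page, then follow each link
    and add the data found on the matching detail page."""
    lister = ListParser()
    lister.feed(pages[list_url])
    rows = []
    for item in lister.items:
        detail = DetailParser()
        detail.feed(pages[item["url"]])
        rows.append({**item, "price": detail.price})
    return rows


for row in scrape_list_and_details("/list", SITE):
    print(row)
```

Each row combines what was found on the list page (the title and link) with what was found on the detail page (the price), which is the same shape of result you are building with the two templates.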

Add another scrape instruction and go back to the main page. This time, select the next button from the pagination links. When the Click Action option box appears, select the next page button option. This tells the scraper that the button is a pagination button, so it will paginate through all the results. Make sure this scrape instruction comes last; if it isn't the last scrape instruction, drag it to the end.
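
The reason the pagination instruction belongs at the end can be seen in a small Python sketch of a generic pagination loop: everything is extracted from the current page first, and only then is the next button followed. This is an assumption-laden illustration rather than our scraper's actual code. RESULT_PAGES, the "t" class, the rel="next" link and scrape_all_pages are hypothetical names chosen for the example, and the regular expressions only suit this tiny fixed markup.

```python
import re

# Hypothetical paginated search results (URL -> HTML). The last page has
# no "next" link, which is what ends the loop.
RESULT_PAGES = {
    "/search?page=1": '<p class="t">A</p><p class="t">B</p>'
                      '<a rel="next" href="/search?page=2">Next</a>',
    "/search?page=2": '<p class="t">C</p>'
                      '<a rel="next" href="/search?page=3">Next</a>',
    "/search?page=3": '<p class="t">D</p>',
}

# Simple patterns that match the fixed markup above. Real pages would
# need a proper HTML parser instead.
TITLE_RE = re.compile(r'<p class="t">(.*?)</p>')
NEXT_RE = re.compile(r'<a rel="next" href="([^"]+)">')


def scrape_all_pages(start_url, pages, max_pages=100):
    """Extract titles from every page, following the next link until
    there is none, a page repeats, or max_pages is reached."""
    titles = []
    seen = set()
    url = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = pages[url]
        # Step 1: extract everything wanted from the current page.
        titles.extend(TITLE_RE.findall(html))
        # Step 2 (always last): move on to the next page, if any.
        match = NEXT_RE.search(html)
        url = match.group(1) if match else None
    return titles


print(scrape_all_pages("/search?page=1", RESULT_PAGES))
```

If the next page step ran before the extraction step, the data on the first page would be skipped, which is why the pagination instruction should be the final one in the list.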

Then go to the schedule tab and click Create to start the scrape. You can watch the progress of the scrape in real time on the Manage Scrapes page by clicking on the row icon and then the viewer icon of the scrape.

One template that may help with this is the extract all structured data from a website template, which tries to find basic patterns in the data and extract them. However, be warned that it may not extract all the data you want.

To get started load this template.