
Scrape email addresses from a website

The following two examples are part of the same template.

GrabzIt's Web Scraper provides several utility methods that make it easy to extract email addresses from a website. The example below gets the HTML content of a web page and passes it to the Utility.Text.extractAddresses method, which finds all valid email addresses. It then saves those addresses into a dataset, which is sent to the user.

Alternatively, just the first matching email address can be extracted with the Utility.Text.extractAddress method.
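The sketch below illustrates the general technique outside GrabzIt: pulling email addresses out of an HTML or plain-text string in JavaScript. It is not GrabzIt's implementation of Utility.Text.extractAddresses or Utility.Text.extractAddress. The function names extractAllAddresses and extractFirstAddress are hypothetical, and so is the simplified regular expression, which does not accept every address the email standards allow.

```javascript
// Illustrative sketch only: not GrabzIt's Utility.Text methods.
// Simplified pattern: a local part, "@", then one or more domain labels
// followed by a top-level domain of at least two letters. It does not
// accept every address permitted by the email standards (e.g. quoted local parts).
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}/g;

// Hypothetical helper: returns every distinct address in the text,
// in order of first appearance, ignoring case when removing duplicates.
function extractAllAddresses(text) {
  const seen = new Set();
  const addresses = [];
  for (const address of text.match(EMAIL_PATTERN) || []) {
    const key = address.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      addresses.push(address);
    }
  }
  return addresses;
}

// Hypothetical helper: returns the first address in the text, or null.
function extractFirstAddress(text) {
  // A non-global copy of the pattern makes match() return only the first hit.
  const match = text.match(new RegExp(EMAIL_PATTERN.source));
  return match ? match[0] : null;
}

// Usage with a small HTML fragment standing in for a scraped page.
const samplePage =
  '<a href="mailto:info@example.com">info@example.com</a>' +
  "<p>Sales: SALES@Example.org.</p>";
console.log(extractAllAddresses(samplePage)); // returns ["info@example.com", "SALES@Example.org"]
console.log(extractFirstAddress(samplePage)); // returns "info@example.com"
```

Duplicates are removed case-insensitively. As a result, an address that appears both in a mailto: link and in the link text is reported only once.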

Scrape email addresses from PDF documents

PDF documents can also be scraped for email addresses in the same way as web pages. As the example below shows, the process is identical, except that the PDF.getText() method is used instead of the Page.getHtml() method.

Scrape email addresses from images

GrabzIt can extract text from images, so this ability can also be used to extract email addresses from images. The example below extracts any email addresses from all images on a web page.
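The first stage of that process is finding every image on the page, and it can be illustrated in plain JavaScript. The sketch below uses a hypothetical helper, collectImageSources, that lists the src attribute of each <img> tag in an HTML string. It is not part of GrabzIt's API. It relies on regular expressions only to stay dependency-free, although a real HTML parser is more robust. It also leaves out the text recognition itself, which is the part GrabzIt provides.

```javascript
// Hypothetical helper, not a GrabzIt API: collects the src attribute of every
// <img> tag in an HTML string. Regular expressions keep the sketch
// dependency-free but do not handle every HTML edge case (e.g. comments or
// ">" inside attribute values). Text recognition (OCR) is not shown.
function collectImageSources(html) {
  const imgTag = /<img\b[^>]*>/gi;
  // "src" must be preceded by whitespace so that "data-src" is not matched,
  // and must be followed by "=" so that "srcset" is not matched.
  // The value may be double-quoted, single-quoted or unquoted.
  const srcAttr = /\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/i;
  const sources = [];
  for (const tag of html.match(imgTag) || []) {
    const match = tag.match(srcAttr);
    if (match) {
      // Exactly one of the three capture groups is defined.
      sources.push(match[1] ?? match[2] ?? match[3]);
    }
  }
  return sources;
}

// Usage: each returned URL would then be passed to an OCR step, and the
// recognized text searched for email addresses.
const imagePage = '<img src="logo.png" alt="Logo"><img data-src="lazy.png" src=contact.gif>';
console.log(collectImageSources(imagePage)); // returns ["logo.png", "contact.gif"]
```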

Similarly, the scrape instructions below extract any email addresses from images found in PDF documents.