GrabzIt's Web Scraper provides several utility methods that make it easy to extract email addresses from a website. The example below gets all the HTML content from a web page and passes it through the Utility.Text.extractAddresses method to find all valid email addresses. The addresses are then saved to a dataset, which is sent to the user.
To get all email addresses from a website you can also use this template.
Alternatively, just the first matching email address can be extracted by using the Utility.Text.extractAddress method.
Data.save(Utility.Text.extractAddresses(Page.getHtml()));
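To illustrate the difference between extracting all addresses and extracting only the first one, here is a minimal sketch in plain JavaScript that can be run outside the scraper. It is not GrabzIt's implementation: the function names extractAddressesSketch and extractAddressSketch are hypothetical stand-ins, and the regular expression is a deliberately simplified assumption that will not cover every valid email address.

```javascript
// Simplified email pattern, assumed for illustration only.
// It is NOT the pattern GrabzIt uses internally.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical stand-in for Utility.Text.extractAddresses:
// returns every match in the text, or an empty array if there are none.
function extractAddressesSketch(text) {
  return text.match(EMAIL_PATTERN) || [];
}

// Hypothetical stand-in for Utility.Text.extractAddress:
// returns only the first match, or null if there are none.
function extractAddressSketch(text) {
  const matches = extractAddressesSketch(text);
  return matches.length > 0 ? matches[0] : null;
}

const html = '<p>Contact sales@example.com or support@example.org.</p>';
console.log(extractAddressesSketch(html)); // both addresses
console.log(extractAddressSketch(html));   // only the first address
```

In a real scrape the HTML would come from Page.getHtml() and the result would be passed to Data.save, as in the scrape instruction above.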
PDF documents can be scraped for email addresses in the same way as web pages. As the example below shows, the process is identical except that the PDF.getText() method is used instead of the Page.getHtml() method.
Data.save(Utility.Text.extractAddresses(PDF.getText()));
GrabzIt can extract text from images, so this ability can also be used to extract email addresses from images. The example below extracts any email addresses from all the images on a web page.
Data.save(Utility.Text.extractAddresses(Utility.Image.extractText(Page.getTagAttributes('src', {"tag":{"equals":"img"}}))));
The scrape instructions below instead extract any email addresses from images found in PDF documents.
Data.save(Utility.Text.extractAddresses(Utility.Image.extractText(PDF.getValue({"type":"image"}))));