Extract text from images

Important textual information is often stored in images. GrabzIt's Web Scraper can automatically extract this text using Optical Character Recognition (OCR). However, because OCR is a form of artificial intelligence, the results are not always perfect.

To extract text from images, use the Utility.Image.extractText method, as shown below.

var textArray = Utility.Image.extractText(Page.getTagAttributes('src', {"tag":{"equals":"img"}}));

This example gets the URLs of all images on the web page and passes them to the extractText method. The method attempts to extract text from each image and returns any matches as an array of strings.
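Because OCR results are not always perfect, it can help to tidy the returned array before using it. The following is a minimal sketch under stated assumptions: cleanOcrResults is a hypothetical helper, not part of GrabzIt's API, and it assumes extractText returns an array of strings that may contain empty entries or stray whitespace. It collapses runs of whitespace and drops empty results. The sample array stands in for the output of a real scrape.

```javascript
// Hypothetical helper (not part of GrabzIt's API). Assumes the input is the
// array of strings returned by Utility.Image.extractText.
function cleanOcrResults(textArray) {
    var cleaned = [];
    if (!textArray) {
        return cleaned; // nothing was extracted
    }
    for (var i = 0; i < textArray.length; i++) {
        // Collapse newlines/tabs/multiple spaces into single spaces, then trim
        var text = String(textArray[i]).replace(/\s+/g, ' ').trim();
        if (text.length > 0) {
            cleaned.push(text); // keep only non-empty results
        }
    }
    return cleaned;
}

// Stand-in data; in a scrape this would come from
// Utility.Image.extractText(Page.getTagAttributes('src', {"tag":{"equals":"img"}}))
var sample = ['  Opening hours:\n9am - 5pm ', '', '   ', 'SALE 50% OFF'];
var result = cleanOcrResults(sample);
// result is ['Opening hours: 9am - 5pm', 'SALE 50% OFF']
```

In a scrape, the call would simply be wrapped: cleanOcrResults(Utility.Image.extractText(...)). Collapsing whitespace discards line breaks, so skip that step if the layout of the extracted text matters.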

If the text in the images is in a different language, specify that language using its two-letter ISO 639-1 code as the second argument, as shown below.

var textArray = Utility.Image.extractText(Page.getTagAttributes('src', {"tag":{"equals":"img"}}), 'fr');
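Since the language is passed as a plain string, a simple format check can catch mistakes such as a three-letter code before a scrape runs. This is a minimal sketch: isTwoLetterLanguageCode is a hypothetical helper, not part of GrabzIt's API, and it assumes lowercase codes like the 'fr' example above. It checks only the shape of the code, not whether GrabzIt supports that language.

```javascript
// Hypothetical helper (not part of GrabzIt's API). Checks only that the value
// looks like a lowercase two-letter ISO 639-1 code, e.g. 'fr' or 'de'.
function isTwoLetterLanguageCode(code) {
    return typeof code === 'string' && /^[a-z]{2}$/.test(code);
}

var language = 'fr';
if (!isTwoLetterLanguageCode(language)) {
    throw new Error('Expected a two-letter ISO 639-1 code, got: ' + language);
}
// In a scrape, the checked code would then be passed as the second argument:
// var textArray = Utility.Image.extractText(Page.getTagAttributes('src', {"tag":{"equals":"img"}}), language);
```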