Tools to Capture and Convert the Web

How to Automatically Extract Structured Information from Unstructured Text?

Ordinary written text can contain a lot of information that is not easy to extract. For instance, a sentence may be a review of a company, but how do you know whether it is a good or a bad review?

A normal web scraper would not be able to extract this information. However, GrabzIt can, by using its built-in natural language processing abilities. In the example below, the page text is analysed and its sentiment is returned as one of the following values: Very Negative, Negative, Neutral, Positive or Very Positive.

//Sentiment Analysis
Data.save(Utility.Text.extractSentiment(Page.getText()), 'Dataset', 'Sentiment');
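To give a feel for what a five-level sentiment result means, here is a toy JavaScript sketch that scores text against a small hand-made word list and buckets the total into the same five labels. It is not how GrabzIt's natural language processing works: the word list, the score thresholds and the functions `scoreText`, `toSentimentLabel` and `classifySentiment` are all hypothetical, and the sketch runs in plain Node.js rather than inside the scraper.

```javascript
// Hypothetical toy word list: NOT GrabzIt's model, just illustrative weights.
const LEXICON = new Map([
  ['excellent', 2], ['great', 2], ['good', 1], ['helpful', 1],
  ['poor', -1], ['slow', -1], ['bad', -1], ['terrible', -2], ['awful', -2],
]);

// Sum the weights of every known word in the text; unknown words count as 0.
function scoreText(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  return words.reduce((sum, word) => sum + (LEXICON.get(word) ?? 0), 0);
}

// Map a raw score onto the five labels (hypothetical thresholds).
function toSentimentLabel(score) {
  if (score <= -3) return 'Very Negative';
  if (score < 0) return 'Negative';
  if (score === 0) return 'Neutral';
  if (score < 3) return 'Positive';
  return 'Very Positive';
}

function classifySentiment(text) {
  return toSentimentLabel(scoreText(text));
}

console.log(classifySentiment('Great service and excellent support!')); // prints "Very Positive"
```

A real sentiment model also has to handle negation ("not good"), context and sarcasm, which a plain word count cannot; the sketch only shows how a numeric score might be mapped onto the five labels.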

GrabzIt's Web Scraper can extract much more from text, including the language it is written in and the names of locations, people and organizations. Examples of each are shown below.

//Language Detection
Data.save(Utility.Text.extractLanguageName(Page.getText()), 'Dataset', 'Language');
//Identify Geographic Locations
Data.save(Utility.Text.extractLocations(Page.getText()), 'Dataset', 'Locations');
//Identify People's Names
Data.save(Utility.Text.extractNames(Page.getText()), 'Dataset', 'Names');
//Identify Organization Names
Data.save(Utility.Text.extractOrganizations(Page.getText()), 'Dataset', 'Organizations');

You don't have to write any of these scrape instructions yourself, as they will automatically appear when you select an applicable HTML element in our scraper wizard.