GrabzIt's Web Scraper API, Data as a Service!

Through GrabzIt's Web Scraper API we can provide your application with scraped data as a web service enabling you to integrate scraped information back into your application. The integration of data into your application is achieved through a callback handler, which is a script or application on a publicly accessible URL that processess the data sent from GrabzIt's Web Scraper. Complete files are posted to this callback handler sequentially, so for instance it could start with a series of images before ending with a JSON file, the data in the JSON file could then be easily parsed using the helper methods in the client library. This API also allows your application to automatically control when scrapes start and stop, as well as requesting results to be re-sent.

To get started first create a scrape then choose Callback URL option from the Export Options tab and enter the URL to your callback handler e.g http://www.example.com/handler/

If you are having any issues with your callback handler choose Debug mode from the Scrape Options tab. This will output the response returned by the callback handler into the logs.

Callback Handler

To process scraped data inside your callback handler choose the JSON or XML options on the Export Tab as this returns the data in a format that can easily be read by any object oriented language.

For data that is not JSON or XML data your processing options are limited as the data is not very machine readable so the best option may be to save the file to disk or in a database.

To help the integration process GrabzIt provides the following scraper API's for the below languages. However as our code is open source and available on GitHub there is no reason you can not make one for a programming language not listed here or you can ask us to create a library for you. If you do why not share it with the world?

While the callback handler is the best way of closely integrating the GrabzIt's web scraper with your applicaion you can also integrate via Amazon S3, Dropbox, FTP and WebDav.