Web Scraper API: Python, REST, PHP & ASP.NET

Through GrabzIt's Web Scraper API, you can provide your application with scraped data as a web service, enabling you to integrate extracted information directly into your database or software logic. This allows you to automatically control when scrapes start and stop, as well as request results to be re-sent programmatically.

We provide client libraries for Python, PHP, and ASP.NET, as well as a REST API, to make integration straightforward. Because our code is open source and available on GitHub, you can also write a library for a programming language not listed here, or ask us to create one for you. If you do write one, why not share it with the world?

How It Works: The Callback Handler

The integration of data into your application is achieved through a callback handler you specify on the Export tab when you create your scrape. This is a script or controller method on a publicly accessible URL on your server. When GrabzIt finishes scraping, it posts the data files to this URL sequentially.

While you can export data in various formats (such as CSV or Excel), we recommend using JSON or XML for the Callback Handler, as these structured formats are easy to parse into objects in most programming languages.

Step 1: Start the Scrape

First, tell GrabzIt to start the scrape by calling the SetScrapeStatus method of the client library, or the equivalent REST endpoint. You must pass the `id` of the scrape you wish to control and the new status (e.g., "Start").

In Python, use the GrabzItScrapeClient class to start and control the scrape.

from GrabzIt import GrabzItScrapeClient

client = GrabzItScrapeClient.GrabzItScrapeClient("Sign in to view your Application Key", "Sign in to view your Application Secret")
# Get all of our scrapes
myScrapes = client.GetScrapes()
if len(myScrapes) == 0:
    raise Exception('You have not created any scrapes yet! Create one here: https://grabz.it/scraper/scrape/')
# Start the first scrape
client.SetScrapeStatus(myScrapes[0].ID, "Start")
if len(myScrapes[0].Results) > 0:
    # Re-send the first scrape result if it exists
    client.SendResult(myScrapes[0].ID, myScrapes[0].Results[0].ID)
            

Get your Current Scrapes

To retrieve a list of your scrapes, make a GET request to the scraper endpoint:

curl -X GET -H "Authorization: Bearer Sign in to view your Application Key" "https://api.grabz.it/scrape/"

This will return a 200 OK response with your results in the following format:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "Scrapes": [
    {
      "Identifier": "65f69ede923679902d12e103",
      "Name": "Screenshot every page on a website",
      "Status": "Disabled",
      "NextRun": null,
      "Results": [
        {
          "Identifier": "ac6ea402-07c0-4561-b6b9-03b66dcfd756",
          "Finished": "Sun, 24 May 2026 20:53:23 GMT"
        },
        {
          "Identifier": "34e29b2f-fc87-4478-aac8-bd85976cbb9a",
          "Finished": "Mon, 25 May 2026 06:53:52 GMT"
        }]
    }
...
]}
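
As a rough illustration of consuming this response, the following Python sketch parses a payload shaped like the one above and summarizes each scrape together with the identifiers of its results. The sample body is a trimmed copy of the response shown, and the function name summarize_scrapes is hypothetical; it is not part of any GrabzIt library.

```python
import json

# Trimmed copy of the response body shown above (one scrape, two results).
SAMPLE_RESPONSE = """
{
  "Scrapes": [
    {
      "Identifier": "65f69ede923679902d12e103",
      "Name": "Screenshot every page on a website",
      "Status": "Disabled",
      "NextRun": null,
      "Results": [
        {"Identifier": "ac6ea402-07c0-4561-b6b9-03b66dcfd756",
         "Finished": "Sun, 24 May 2026 20:53:23 GMT"},
        {"Identifier": "34e29b2f-fc87-4478-aac8-bd85976cbb9a",
         "Finished": "Mon, 25 May 2026 06:53:52 GMT"}
      ]
    }
  ]
}
"""


def summarize_scrapes(body: str) -> list[dict]:
    """Return one summary dict per scrape in a Get Scrapes response body."""
    payload = json.loads(body)
    summaries = []
    for scrape in payload.get("Scrapes", []):
        # A scrape that has never run may have no results at all.
        results = scrape.get("Results") or []
        summaries.append({
            "id": scrape["Identifier"],
            "name": scrape["Name"],
            "status": scrape["Status"],
            "result_ids": [result["Identifier"] for result in results],
        })
    return summaries


for summary in summarize_scrapes(SAMPLE_RESPONSE):
    print(summary["id"], summary["status"], len(summary["result_ids"]))
```

The scrape and result identifiers collected here are the values that the status and send endpoints expect in their URLs.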

Change Scrape Status

To change the status of a scrape (e.g., to start it), make a GET request. Replace the ID in the URL with the scrape Identifier found in the result above. Valid status values at the end of the URL are start, stop, enable, and disable.

curl -X GET -H "Authorization: Bearer Sign in to view your Application Key" "https://api.grabz.it/scrape/65f69ede923679902d12e103/status/start"
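
If you build these URLs in code, it can help to validate the status value before any request is made. The sketch below does only that: it assembles the status URL from the pattern shown above and rejects anything other than the four documented values. The helper name status_url is hypothetical, and no request is actually sent.

```python
from urllib.parse import quote

BASE_URL = "https://api.grabz.it/scrape/"
# The four status values listed in the documentation above.
VALID_STATUSES = {"start", "stop", "enable", "disable"}


def status_url(scrape_id: str, status: str) -> str:
    """Build the URL that changes the status of a scrape."""
    normalized = status.lower()
    if normalized not in VALID_STATUSES:
        raise ValueError(f"Unsupported status: {status!r}")
    # Percent-encode the identifier so it cannot alter the URL path.
    return f"{BASE_URL}{quote(scrape_id, safe='')}/status/{normalized}"


print(status_url("65f69ede923679902d12e103", "Start"))
```

The resulting URL can then be requested with the same Authorization header used in the curl example.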

Send Results to a Webhook

To manually send specific results to your configured webhook, make a GET request. The URL requires both the scrape Identifier and the specific result Identifier you want to send:

curl -X GET -H "Authorization: Bearer Sign in to view your Application Key" "https://api.grabz.it/scrape/65f69ede923679902d12e103/send/ac6ea402-07c0-4561-b6b9-03b66dcfd756"

Set a Scrape Property

To update specific properties of a scrape, make a POST request to the property endpoint. Replace {property} in the URL with either target or variable. Note: If you submit an invalid JSON payload, the API will return a 400 status code along with a JSON schema defining the expected structure.

Updating the Target:

This is the website you wish to scrape. When updating the target, you can specify the primary URL and an array of Seed URLs.

curl -X POST -H "Authorization: Bearer Sign in to view your Application Key" \
     -H "Content-Type: application/json" \
     -d '{"URL":"https://example.com","SeedURLs":["https://example.com/1.html","https://example.com/2.html"]}' \
     "https://api.grabz.it/scrape/65f69ede923679902d23f214/property/target"

Updating a Variable:

Variables are used in the scrape instructions, allowing scrape actions to be altered dynamically. When updating a variable, your JSON payload must include the variable Name. For the data itself, you must provide either a single string Value or an Array of key-value pairs (if one is populated, the other must be null or omitted).

Example A: Setting a string value

curl -X POST -H "Authorization: Bearer Sign in to view your Application Key" \
     -H "Content-Type: application/json" \
     -d '{"Name": "MyStringVariable", "Value": "Some text value"}' \
     "https://api.grabz.it/scrape/65f69ede923679902d23f214/property/variable"

Example B: Setting a key-value array

curl -X POST -H "Authorization: Bearer Sign in to view your Application Key" \
     -H "Content-Type: application/json" \
     -d '{"Name": "MyArrayVariable", "Array": [{"Key": "Category", "Value": "News"}, {"Key": "Limit", "Value": "50"}]}' \
     "https://api.grabz.it/scrape/65f69ede923679902d23f214/property/variable"
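
The "either Value or Array, never both" rule is easy to get wrong when payloads are assembled by hand. The following Python sketch builds the JSON body for a variable update and enforces that rule before anything is sent. The function name variable_payload and its use of a plain dictionary for the key-value pairs are illustrative choices, not part of the GrabzIt API.

```python
import json
from typing import Optional


def variable_payload(name: str,
                     value: Optional[str] = None,
                     array: Optional[dict] = None) -> str:
    """Return the JSON body for a variable update.

    Exactly one of `value` (a single string) or `array` (key-value pairs)
    must be supplied; the other is omitted from the payload.
    """
    if (value is None) == (array is None):
        raise ValueError("Provide exactly one of value or array")
    body = {"Name": name}
    if value is not None:
        body["Value"] = value
    else:
        # Convert {"Category": "News"} into the Key/Value pair list
        # used in Example B above.
        body["Array"] = [{"Key": k, "Value": v} for k, v in array.items()]
    return json.dumps(body)


print(variable_payload("MyStringVariable", value="Some text value"))
print(variable_payload("MyArrayVariable", array={"Category": "News", "Limit": "50"}))
```

The returned string can be passed as the request body (the -d argument in the curl examples) with the Content-Type header set to application/json.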

In PHP, use the GrabzItScrapeClient class to start and control the scrape.

include("GrabzItScrapeClient.class.php");

$client = new \GrabzIt\Scraper\GrabzItScrapeClient("Sign in to view your Application Key", "Sign in to view your Application Secret");
//Get all of our scrapes
$myScrapes = $client->GetScrapes();
if (empty($myScrapes))
{
    throw new Exception("You haven't created any scrapes yet! Create one here: https://grabz.it/scraper/scrape/");
}
//Start the first scrape
$client->SetScrapeStatus($myScrapes[0]->ID, "Start");
if (count($myScrapes[0]->Results) > 0)
{
    //re-send first scrape result if it exists
    $client->SendResult($myScrapes[0]->ID, $myScrapes[0]->Results[0]->ID);
}
            

In ASP.NET, use the GrabzItScrapeClient class to start and control the scrape.

GrabzItScrapeClient client = new GrabzItScrapeClient("Sign in to view your Application Key", "Sign in to view your Application Secret");
//Get all of our scrapes
GrabzItScrape[] myScrapes = client.GetScrapes();
if (myScrapes.Length == 0)
{
    throw new Exception("You haven't created any scrapes yet! Create one here: https://grabz.it/scraper/scrape/");
}
//Start the first scrape
client.SetScrapeStatus(myScrapes[0].ID, ScrapeStatus.Start);
if (myScrapes[0].Results.Length > 0)
{
    //re-send first scrape result if it exists
    client.SendResult(myScrapes[0].ID, myScrapes[0].Results[0].ID);
}
            

Step 2: Handle the Result

Once the scrape is complete, GrabzIt sends the data to your callback URL. Within your handler, either use the client library (the ScrapeResult class in the examples below) or handle the REST webhook request yourself to capture the file and send a confirmation back to GrabzIt.

Process Scraped Data

The easiest way to process scraped data is to access it as a JSON or XML object, which makes the data easy to manipulate and query. The JSON follows the general format below: each dataset name is an attribute of the root object and holds an array of objects, one per row, with each column name as an attribute. In this example the dataset is named Items.

{
  "Items": [
    {
      "Column_One": "https://grabz.it/",
      "Column_Two": "Found"
    },
    {
      "Column_One": "http://dfadsdsa.com/",
      "Column_Two": "Missing"
    }]
}

Remember that the handler is sent all scraped data, which may include files that cannot be converted to JSON or XML objects. Therefore, check the type of data you have received before processing it.
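
To make the structure concrete, here is a library-independent Python sketch that checks the file type first and then walks every dataset and row in a JSON result, tallying the values of one column. The sample data is the JSON shown above; the function name count_column_values is hypothetical, and the extension is assumed to be supplied by whatever received the file.

```python
import json

# The example result shown above, as it might arrive in a posted file.
SAMPLE_RESULT = b"""
{
  "Items": [
    {"Column_One": "https://grabz.it/", "Column_Two": "Found"},
    {"Column_One": "http://dfadsdsa.com/", "Column_Two": "Missing"}
  ]
}
"""


def count_column_values(extension: str, content: bytes,
                        column: str = "Column_Two") -> dict:
    """Tally the values of one column across every dataset in a JSON result."""
    # Check the type before parsing: binary results are not JSON.
    if extension.lower() != "json":
        raise ValueError(f"Cannot parse a {extension!r} file as JSON")
    datasets = json.loads(content)
    counts = {}
    for dataset_name, rows in datasets.items():
        for row in rows:
            value = row.get(column)
            counts[value] = counts.get(value, 0) + 1
    return counts


print(count_column_values("json", SAMPLE_RESULT))
```

Iterating over the root object's attributes rather than hard-coding a dataset name keeps the sketch usable when a scrape produces more than one dataset.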

This code goes in your callback handler (e.g., handler.py).

from GrabzIt import GrabzItScrapeClient
from GrabzIt import ScrapeResult

# The library automatically reads the posted file
client = GrabzItScrapeClient.GrabzItScrapeClient("Sign in to view your Application Key", "Sign in to view your Application Secret")
scrapeResult = ScrapeResult.ScrapeResult()

if scrapeResult.getExtension() == 'json':
    data = scrapeResult.toJSON()
    # Loop through every row of the dataset named Dataset_Name
    for obj in data["Dataset_Name"]:
        if obj["Column_Two"] == "Found":
            pass  # do something
        else:
            pass  # do something else
else:
    # Probably a binary file etc., so save it
    scrapeResult.save("results/" + scrapeResult.getFilename())
            

Handle the Result Files (Webhook)

Once your task is complete, our system will automatically send an HTTP POST request to your callback URL for each file generated. Because a task might yield multiple files, these requests are sent sequentially to your endpoint.

The request is sent as multipart/form-data. It includes a standard form field called Format (indicating the file type, such as json, xml, csv, or jpg) and the actual file itself.

Here is an example of what the raw incoming request looks like for a JSON file:

POST /your-callback-endpoint HTTP/1.1
Host: yourdomain.com
Content-Type: multipart/form-data; boundary=------------------------BoundaryString

--------------------------BoundaryString
Content-Disposition: form-data; name="Format"

json
--------------------------BoundaryString
Content-Disposition: form-data; name="file"; filename="dataset_name.json"
Content-Type: application/json

{
  "Items": [
    {
      "Column_One": "https://example.com",
      "Column_Two": "Found"
    }
  ]
}
--------------------------BoundaryString--

If the result is a binary file (like an image or PDF document), the Format field will reflect that, and the file portion will contain the raw binary stream:

POST /your-callback-endpoint HTTP/1.1
Host: yourdomain.com
Content-Type: multipart/form-data; boundary=------------------------BoundaryString

--------------------------BoundaryString
Content-Disposition: form-data; name="Format"

jpg
--------------------------BoundaryString
Content-Disposition: form-data; name="file"; filename="screenshot.jpg"
Content-Type: image/jpeg

[... raw binary image data ...]
--------------------------BoundaryString--

Processing the Files

Using your web framework's form-parsing tools, always read the Format form field first. If it indicates structured data like JSON or XML, you can parse the attached file directly into objects to query or manipulate the data. If it is a binary format, save the attached file directly to your server's disk or cloud storage.
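
In most cases your web framework's form parser is the right tool for this. For readers who want to see the mechanics, the sketch below parses a request body like the JSON example above using only Python's standard library email parser, then reads the Format field and the attached file. The function name parse_callback and the shortened boundary string are assumptions made for illustration, and the approach has only been considered here for text payloads.

```python
import json
from email.parser import BytesParser

# A body shaped like the JSON example above, with a shortened boundary.
SAMPLE_CONTENT_TYPE = "multipart/form-data; boundary=BoundaryString"
SAMPLE_BODY = (
    b"--BoundaryString\r\n"
    b'Content-Disposition: form-data; name="Format"\r\n'
    b"\r\n"
    b"json\r\n"
    b"--BoundaryString\r\n"
    b'Content-Disposition: form-data; name="file"; filename="dataset_name.json"\r\n'
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"Items": [{"Column_One": "https://example.com", "Column_Two": "Found"}]}\r\n'
    b"--BoundaryString--\r\n"
)


def parse_callback(content_type: str, body: bytes) -> dict:
    """Split a multipart/form-data body into its Format field and file part."""
    # Prepend the Content-Type header so the MIME parser knows the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser().parsebytes(raw)
    if not message.is_multipart():
        raise ValueError("Expected a multipart/form-data request body")
    result = {"format": None, "filename": None, "content": None}
    for part in message.get_payload():
        field = part.get_param("name", header="content-disposition")
        data = part.get_payload(decode=True)
        if field == "Format":
            result["format"] = data.decode("utf-8").strip().lower()
        elif field == "file":
            result["filename"] = part.get_filename()
            result["content"] = data
    return result


parsed = parse_callback(SAMPLE_CONTENT_TYPE, SAMPLE_BODY)
if parsed["format"] == "json":
    print(json.loads(parsed["content"])["Items"][0]["Column_Two"])
```

Reading the Format field before touching the file means the handler can decide whether to parse or simply store the content without inspecting the bytes themselves.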

This code goes in your callback handler (e.g., handler.php).

include("GrabzItScrapeClient.class.php");

// The library automatically reads the posted file
$client = new \GrabzIt\Scraper\GrabzItScrapeClient("Sign in to view your Application Key", "Sign in to view your Application Secret");
$scrapeResult = new \GrabzIt\Scraper\ScrapeResult();

if ($scrapeResult->getExtension() == 'json')
{
    $json = $scrapeResult->toJSON();
    foreach ($json->Dataset_Name as $obj)
    {
        if ($obj->Column_Two == "Found")
        {
            //do something
        }
        else
        {
            //do something else
        }
    }
}
else
{
    //probably a binary file etc save it
    $scrapeResult->save("results/".$scrapeResult->getFilename());
}
            

This code goes in your callback handler (e.g., /handler/). With the ASP.NET API, an extra step is required to read JSON or XML files: you must first create classes that match the expected data structure. Alternatively, you can use the ToString() method to get the raw JSON or XML.

using GrabzIt.Scraper;

// The library automatically reads the posted file
GrabzItScrapeClient client = new GrabzItScrapeClient("APPLICATION KEY", "APPLICATION SECRET");
ScrapeResult scrapeResult = new ScrapeResult(context.Request);

if (scrapeResult.Extension == "json")
{
    DataSet dataSet = scrapeResult.FromJSON<DataSet>();
    foreach (Item item in dataSet.Items)
    {
        if (item.Column_Two == "Found")
        {
            //do something
        }
        else
        {
            //do something else
        }
    }
}
else
{
    //probably a binary file etc save it
    scrapeResult.Save(context.Server.MapPath("~/results/" + scrapeResult.Filename));
}
            

The examples above show how to loop through all the results of the dataset Dataset_Name and take specific actions depending on the value of the Column_Two attribute. If the file received by the handler is not a JSON file, it is simply saved to the results directory. While the ScrapeResult class does attempt to ensure that all posted files originate from GrabzIt's servers, you should also check the extension of each file before saving it.
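
One way to perform that extension check is with an allowlist, sketched below in Python. The set of permitted extensions and the helper name safe_result_path are assumptions; adjust the list to the formats your scrape actually exports. The helper also strips any directory components from the filename so a result cannot be written outside the results directory.

```python
from pathlib import PurePosixPath

# Assumed allowlist: replace with the formats your scrape actually exports.
ALLOWED_EXTENSIONS = {"json", "xml", "csv", "jpg", "png", "pdf"}


def safe_result_path(filename: str, directory: str = "results") -> str:
    """Return a path inside `directory`, or raise if the file type is not allowed."""
    # Keep only the final path component, whichever slash style was used.
    name = PurePosixPath(filename.replace("\\", "/")).name
    extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    if not name or extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Refusing to save file with extension {extension!r}")
    return f"{directory}/{name}"


print(safe_result_path("dataset_name.json"))
```

A handler would call this with the filename reported for the posted file and pass the returned path to its save routine.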

API Reference

Below is a detailed reference of the methods and properties available in the Scrape Client for controlling and processing scrapes.

GrabzItScrapeClient (Python)

Method | Description
GrabzItScrapeClient(key, secret) | Constructor. Requires your Application Key and Secret.
GrabzItScrape[] GetScrapes() | Returns all of the user's scrapes as an array of GrabzItScrape objects.
GrabzItScrape GetScrapes(id) | Returns a GrabzItScrape object representing the desired scrape.
SetScrapeProperty(id, property) | Sets the property of a scrape and returns true if successful.
SetScrapeStatus(id, status) | Sets the status ("Start", "Stop", "Enable", "Disable") of a scrape and returns true if successful.
SendResult(id, resultId) | Resends the result of a scrape and returns true if successful. Note: The scrape id and result id can be found using the GetScrapes method.
SetLocalProxy(proxyUrl) | Sets the local proxy server to be used for all requests.

ScrapeResult (Python)

Method | Description
string getExtension() | Gets the extension of any file resulting from the scrape.
string getFilename() | Gets the filename of any file resulting from the scrape.
object toJSON() | Converts any JSON file resulting from the scrape into an object.
string toString() | Converts any file resulting from the scrape to a string.
xml.etree.ElementTree toXML() | Converts any XML file resulting from the scrape to an XML Element.
boolean save(path) | Saves any file resulting from the scrape. Returns true if it succeeds.

API Endpoint Reference

Below is a quick reference guide to all available REST API endpoints for controlling and processing your scrapes.

Action | HTTP Method | Endpoint | Description
Get Scrapes | GET | https://api.grabz.it/scrape/ | Returns all of your scrapes and their associated results as a JSON object.
Get Scrape | GET | https://api.grabz.it/scrape/{id} | Returns a single scrape and its associated results as a JSON object.
Set Scrape Status | GET | https://api.grabz.it/scrape/{id}/status/{status} | Changes the state of a scrape. Valid {status} values are start, stop, enable, or disable.
Send Result | GET | https://api.grabz.it/scrape/{id}/send/{result_id} | Resends a specific scrape result to the webhook configured in the scrape's export settings.
Set Property | POST | https://api.grabz.it/scrape/{id}/property/{property} | Updates the configuration of a scrape. Valid {property} values are target or variable. Expects a JSON payload.

GrabzItScrapeClient (PHP)

Method | Description
__construct($key, $secret) | Constructor. Requires your Application Key and Secret.
GrabzItScrape[] GetScrapes() | Returns all of the user's scrapes as an array of GrabzItScrape objects.
GrabzItScrape GetScrapes($id) | Returns a GrabzItScrape object representing the desired scrape.
SetScrapeProperty($id, $property) | Sets the property of a scrape and returns true if successful.
SetScrapeStatus($id, $status) | Sets the status ("Start", "Stop", "Enable", "Disable") of a scrape and returns true if successful.
SendResult($id, $resultId) | Resends the result of a scrape and returns true if successful. Note: The scrape id and result id can be found using the GetScrapes method.
SetLocalProxy($proxyUrl) | Sets the local proxy server to be used for all requests.

ScrapeResult (PHP)

Method | Description
string getExtension() | Gets the extension of any file resulting from the scrape.
string getFilename() | Gets the filename of any file resulting from the scrape.
object toJSON() | Converts any JSON file resulting from the scrape into an object.
string toString() | Converts any file resulting from the scrape to a string.
SimpleXMLElement toXML() | Converts any XML file resulting from the scrape to a SimpleXMLElement.
boolean save($path) | Saves any file resulting from the scrape. Returns true if it succeeds.

GrabzItScrapeClient (ASP.NET)

Method | Description
GrabzItScrapeClient(key, secret) | Constructor. Requires your Application Key and Secret.
GrabzItScrape[] GetScrapes() | Returns all of the user's scrapes, including their scrape results, as an array of GrabzItScrape objects.
GrabzItScrape GetScrapes(string id) | Returns a GrabzItScrape object representing the desired scrape.
bool SetScrapeProperty(string id, IProperty property) | Sets the property of a scrape and returns true if successful.
bool SetScrapeStatus(string id, ScrapeStatus status) | Sets the status of a scrape and returns true if successful.
bool SendResult(string id, string resultId) | Resends the result of a scrape and returns true if successful. The scrape id and result id can be found using the GetScrapes method.
SetLocalProxy(string proxyUrl) | Sets the local proxy server to be used for all requests.

ScrapeResult (ASP.NET)

Property | Description
string Extension | Gets the extension of any file resulting from the scrape.
string Filename | Gets the filename of any file resulting from the scrape.

Method | Description
T FromJSON<T>() | Converts any JSON file resulting from the scrape to the specified type.
string ToString() | Converts any file resulting from the scrape to a string.
T FromXML<T>() | Converts any XML file resulting from the scrape to the specified type.
bool Save(string path) | Saves any file resulting from the scrape. Returns true if it succeeds.

Debugging & Testing

If you are having issues with your callback handler, you can enable Debug Mode in the Scrape Options tab. This will output the response returned by your callback handler into the logs, allowing you to see any errors your script might be generating.