PHP Scraper API with GrabzIt

Our PHP Scraper API lets you integrate GrabzIt's Web Scraper into your app. This is a more capable solution than the simple HTML DOM parsers that PHP scraping apps usually implement.

To get started, first create a scrape. Then, to parse the web in your app, download the PHP Library and look at the example handler included in the download.

Process Scraped Data

The easiest way to process scraped data is to access it as a JSON or XML object, which makes the data easy to query. The JSON will have the structure below, with the dataset name as a property. This property contains an array of objects, and each object holds every column name and its value as an attribute.

{
  "Dataset_Name": [
    {
      "Column_One": "https://grabz.it/",
      "Column_Two": "Found"
    },
    {
      "Column_One": "http://dfadsdsa.com/",
      "Column_Two": "Missing"
    }]
}
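
As a rough sketch of how this structure maps onto PHP, the snippet below decodes the sample JSON with PHP's built-in json_decode and collects the first column of every row whose second column is "Found". The $sample string and the $found array are illustrative names only, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later. In a real handler the data would come from the scrape result rather than a hard-coded string.

```php
<?php
// Sample payload copied from the structure above (illustrative only).
$sample = <<<'JSON'
{
  "Dataset_Name": [
    { "Column_One": "https://grabz.it/", "Column_Two": "Found" },
    { "Column_One": "http://dfadsdsa.com/", "Column_Two": "Missing" }
  ]
}
JSON;

// Decode into stdClass objects; JSON_THROW_ON_ERROR surfaces malformed input
// as a JsonException instead of silently returning null.
$data = json_decode($sample, false, 512, JSON_THROW_ON_ERROR);

// The dataset name is a property holding an array of row objects.
$found = [];
foreach ($data->Dataset_Name as $row) {
    // Each column name is an attribute on the row object.
    if ($row->Column_Two === 'Found') {
        $found[] = $row->Column_One;
    }
}
```

After the loop, $found holds only the URLs from rows marked "Found".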

Bear in mind that the handler is sent all extracted data, which may include data that cannot be converted to JSON or XML objects. The type of data you receive must therefore be checked before it is processed.

$scrapeResult = new \GrabzIt\Scraper\ScrapeResult();

if ($scrapeResult->getExtension() == 'json')
{
    $json = $scrapeResult->toJSON();
    foreach ($json->Dataset_Name as $obj)
    {
        if ($obj->Column_Two == "Found")
        {
            //do something
        }
        else
        {
            //do something else
        }
    }
}
else
{
    //not JSON, so probably a binary file or similar: save it to disk
    $scrapeResult->save("results/".$scrapeResult->getFilename());
}

The above example loops through all the results of the dataset Dataset_Name and, for each result, performs a specific action depending on the value of the Column_Two attribute.

If the handler does not receive a JSON file, it simply saves the file to the results directory. While the ScrapeResult class does attempt to ensure that all posted files originate from GrabzIt's servers, you should also check the extension of the files before saving them.
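
One possible way to perform the extension check is sketched below. The safeResultFilename function is a hypothetical helper, not part of the GrabzIt library, and the list of allowed extensions is an assumption you would adapt to the data your scrape actually produces. It strips any directory components from the file name and returns null when the extension is not on the allow list.

```php
<?php
/**
 * Hypothetical helper (not part of the GrabzIt library).
 * Returns a bare file name if its extension is allowed, or null otherwise.
 */
function safeResultFilename(string $filename, array $allowedExtensions): ?string
{
    // Drop directory components so the file cannot be written outside the target folder.
    $name = basename($filename);

    // Compare extensions case-insensitively; files without an extension yield ''.
    $extension = strtolower(pathinfo($name, PATHINFO_EXTENSION));

    if ($name === '' || !in_array($extension, $allowedExtensions, true)) {
        return null;
    }

    return $name;
}

// Possible use inside the handler (the allow list is an assumed example):
// $name = safeResultFilename($scrapeResult->getFilename(), ['json', 'csv', 'png']);
// if ($name !== null) {
//     $scrapeResult->save("results/" . $name);
// }
```

Using an allow list rather than a block list means an unexpected file type, such as a script, is rejected by default.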

ScrapeResult Methods

The ScrapeResult class has all the methods listed below that you can use to process scrape results.

Debugging

The best way to debug your PHP handler is to download the results for a scrape from the web scrapes page. Then save the file you are having an issue with to an accessible location. You can then pass the path of this file to the constructor of the ScrapeResult class. This allows you to debug your handler without having to do a new scrape each time, as shown below.

$scrapeResult = new \GrabzIt\Scraper\ScrapeResult("data.json");

//the rest of your handler code remains the same

Controlling a Scrape

With GrabzIt's Web Scraper API you can change the status of a scrape by remotely starting, stopping, enabling or disabling it as needed. As the example below shows, this is done by passing the ID of the scrape along with the desired scrape status to the SetScrapeStatus method.

//replace the placeholders with your own application key and secret
$client = new \GrabzIt\Scraper\GrabzItScrapeClient("YOUR APPLICATION KEY", "YOUR APPLICATION SECRET");
//Get all of our scrapes
$myScrapes = $client->GetScrapes();
if (empty($myScrapes))
{
    throw new Exception("You haven't created any scrapes yet! Create one here: https://grabz.it/scraper/scrape/");
}
//Start the first scrape
$client->SetScrapeStatus($myScrapes[0]->ID, "Start");
if (count($myScrapes[0]->Results) > 0)
{
    //re-send first scrape result if it exists
    $client->SendResult($myScrapes[0]->ID, $myScrapes[0]->Results[0]->ID);
}
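
Because the status is passed as a plain string, a typo would only be noticed once the request is made. The sketch below shows one way to guard against that with a hypothetical changeScrapeStatus wrapper, which is not part of the GrabzIt library. Only "Start" appears in the example above; the values "Stop", "Enable" and "Disable" are assumptions based on the four actions described and should be checked against the client's documentation.

```php
<?php
/**
 * Hypothetical wrapper (not part of the GrabzIt library).
 * Validates the status string before forwarding it to SetScrapeStatus.
 */
function changeScrapeStatus(object $client, string $scrapeId, string $status)
{
    // "Start" comes from the example above; the other three values are assumed.
    $allowed = ['Start', 'Stop', 'Enable', 'Disable'];

    if (!in_array($status, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported scrape status: $status");
    }

    // $client is expected to be a GrabzItScrapeClient, or any object
    // that exposes a compatible SetScrapeStatus method.
    return $client->SetScrapeStatus($scrapeId, $status);
}

// Possible use with the client created earlier:
// changeScrapeStatus($client, $myScrapes[0]->ID, "Stop");
```

Accepting any object with a SetScrapeStatus method keeps the wrapper easy to exercise with a stand-in client while debugging.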

GrabzItScrapeClient Methods and Properties

The GrabzItScrapeClient class contains all the methods and properties that you can use to control web scrapes.