Tools to Capture and Convert the Web

Capture HTML Tables from Websites with PHPPHP API

There are multiple ways of converting HTML tables into JSON, CSV or Excel spreadsheets using GrabzIt's PHP API, detailed here are some of the most useful techniques. However before you start remember that after calling the URLToTable, HTMLToTable or FileToTable methods the Save or SaveTo method must be called to capture the table. If you want to quickly see if this service is right for you, you can try a live demo of capturing HTML tables from a URL.

Basic Options

The code example found below automatically converts the first HTML table discovered in a specified webpage into a CSV document.

$grabzIt->URLToTable("http://www.google.com");
//Then call the Save or SaveTo method
$grabzIt->HTMLToTable("<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>");
//Then call the Save or SaveTo method
$grabzIt->FileToTable("tables.html");
//Then call the Save or SaveTo method

By default this will convert the first table it identifies into a table. However the the second table in a web page could be converted by passing a 2 to the setTableNumberToInclude method.

$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setTableNumberToInclude(2);

$grabzIt->URLToTable("http://www.google.com", $options);
//Then call the Save or SaveTo method
$grabzIt->SaveTo("result.csv");
$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setTableNumberToInclude(2);

$grabzIt->HTMLToTable("<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>", $options);
//Then call the Save or SaveTo method
$grabzIt->SaveTo("result.csv");
$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setTableNumberToInclude(2);

$grabzIt->FileToTable("tables.html", $options);
//Then call the Save or SaveTo method
$grabzIt->SaveTo("result.csv");

You can also use the setTargetElement method to ensure that only tables within the specified element id will be converted.

$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setTargetElement("stocks_table");

$grabzIt->URLToTable("http://www.google.com", $options);
//Then call the Save or SaveTo method
$grabzIt->SaveTo("result.csv");
$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setTargetElement("stocks_table");

$grabzIt->HTMLToTable("<html><body><table id='stocks_table'><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>", $options);
//Then call the Save or SaveTo method
$grabzIt->SaveTo("result.csv");
$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setTargetElement("stocks_table");

$grabzIt->FileToTable("tables.html", $options);
//Then call the Save or SaveTo method
$grabzIt->SaveTo("result.csv");

Alternatively you can capture all tables on a web page by passing true to the setIncludeAllTables method, however this will only work with the XLSX and JSON formats. This option will put each table in a new sheet within the generated spreadsheet workbook.

$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setFormat('xlsx');
$options->setIncludeAllTables(true);

$grabzIt->URLToTable("http://www.google.com", $options);
//Then call the Save or SaveTo method
$grabzIt->SaveTo("result.xlsx");
$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setFormat('xlsx');
$options->setIncludeAllTables(true);

$grabzIt->HTMLToTable("<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>", $options);
//Then call the Save or SaveTo method
$grabzIt->SaveTo("result.xlsx");
$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setFormat('xlsx');
$options->setIncludeAllTables(true);

$grabzIt->FileToTable("tables.html", $options);
//Then call the Save or SaveTo method
$grabzIt->SaveTo("result.xlsx");

Convert HTML Tables to JSON

There is sometimes the need to read HTML tables programmatically, GrabzIt enables you to do this using PHP by converting online HTML tables into JSON. To do this specify json as the format parameter. For instance in the example below we are converting a HTML Table synchronously then using the inbuilt json_decode PHP method to parse the JSON string into an object we can easily work with.

$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setFormat("json");
$options->setTableNumberToInclude(1);

$grabzIt->URLToTable("http://www.google.com", $options);

$json = $grabzIt->SaveTo();
if ($json != null)
{
    $tableObj = json_decode($json);
}
$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setFormat("json");
$options->setTableNumberToInclude(1);

$grabzIt->HTMLToTable("<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>", $options);

$json = $grabzIt->SaveTo();
if ($json != null)
{
    $tableObj = json_decode($json);
}
$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");
    
$options = new \GrabzIt\GrabzItTableOptions();
$options->setFormat("json");
$options->setTableNumberToInclude(1);

$grabzIt->FileToTable("tables.html", $options);

$json = $grabzIt->SaveTo();
if ($json != null)
{
    $tableObj = json_decode($json);
}

Custom Identifier

You can pass a custom identifier to the table methods as shown below, this value is then returned to your GrabzIt PHP handler. For instance this custom identifier could be a database identifier, allowing a extracted table to be associated with a particular database record.

$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setCustomId(123456);

$grabzIt->URLToTable("http://www.google.com", $options);
//Then call the Save method
$grabzIt->Save("http://www.example.com/handler.php");
$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setCustomId(123456);

$grabzIt->HTMLToTable("<html><body><h1>Hello World!</h1></body></html>", $options);
//Then call the Save method
$grabzIt->Save("http://www.example.com/handler.php");
$grabzIt = new \GrabzIt\GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret");

$options = new \GrabzIt\GrabzItTableOptions();
$options->setCustomId(123456);

$grabzIt->FileToTable("example.html", $options);
//Then call the Save method
$grabzIt->Save("http://www.example.com/handler.php");