GrabzIt
Tools to Capture and Convert the Web

Capture HTML Tables from Websites with Node.js

Warning: At least an Entry package is required to use HTML table capture. You can try it for free with a 7 day free trial, then from $5.99 a month unless cancelled.

There are multiple ways to convert HTML tables into JSON, CSV and Excel spreadsheets using GrabzIt's Node.js API; some of the most useful techniques are detailed here. Before you start, remember that after calling the url_to_table, html_to_table or file_to_table method, the save or save_to method must be called to capture the table. If you want to quickly see whether this service is right for you, you can try a live demo of capturing HTML tables from a URL.
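
The ordering requirement above can be captured in a small wrapper. The following is a minimal sketch, not part of the GrabzIt API: captureUrlTable is a hypothetical helper name, and client is assumed to be an already constructed GrabzIt client exposing the url_to_table and save_to methods named above, with save_to taking a file path and an (error, result) callback as in the JSON example further down this page.

```javascript
// Hypothetical helper, not part of the GrabzIt API: it pairs the capture
// request with the save_to call that actually performs the capture.
function captureUrlTable(client, url, options, path, done) {
    // Describe what to capture; per the text above, nothing is captured
    // until save or save_to is called.
    client.url_to_table(url, options);

    // Perform the capture; the callback receives (error, result).
    client.save_to(path, function (error, result) {
        if (error != null) {
            done(error, null);
            return;
        }
        done(null, result);
    });
}
```

Wrapping the two calls in one function makes it harder to forget the save_to step. The same pattern would apply to html_to_table and file_to_table.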

Basic Options

This code snippet converts the first HTML table found in the specified webpage, HTML input or HTML file into a CSV document.

// Choose the method that matches your input.

// From a URL
client.url_to_table("http://www.google.com");

// From an HTML string
client.html_to_table("<html><body><table><tr><th>Name</th><th>Age</th></tr>"
    + "<tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>"
    + "</table></body></html>");

// From a local HTML file
client.file_to_table("tables.html");

By default, the first table identified in the page is converted. To convert a different table, such as the second table in a web page, pass 2 to the tableNumberToInclude property.

var options = {"tableNumberToInclude":2};

// From a URL
client.url_to_table("http://www.google.com", options);

// From an HTML string
client.html_to_table("<html><body><table><tr><th>Name</th><th>Age</th></tr>"
    + "<tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>"
    + "</table></body></html>", options);

// From a local HTML file
client.file_to_table("tables.html", options);

You can also set the targetElement property so that only tables within the element with the specified id are converted.

var options = {"targetElement":"stocks_table"};

// From a URL
client.url_to_table("http://www.google.com", options);

// From an HTML string
client.html_to_table("<html><body><table><tr><th>Name</th><th>Age</th></tr>"
    + "<tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>"
    + "</table></body></html>", options);

// From a local HTML file
client.file_to_table("tables.html", options);

Alternatively, you can capture all tables on a web page by passing true to the includeAllTables property. This only works with the JSON and XLSX formats. With the XLSX format, each table is placed in a new sheet within the generated spreadsheet workbook.

var options = {"format":"xlsx","includeHeaderNames":true,"includeAllTables":true};

// From a URL
client.url_to_table("http://www.google.com", options);

// From an HTML string
client.html_to_table("<html><body><table><tr><th>Name</th><th>Age</th></tr>"
    + "<tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>"
    + "</table></body></html>", options);

// From a local HTML file
client.file_to_table("tables.html", options);
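
Because includeAllTables is described above as working only with the JSON and XLSX formats, it can help to check the options before sending a request. The sketch below is an illustration only: buildTableOptions is a hypothetical helper, not part of the GrabzIt API, and the lowercase format strings json and xlsx are taken from the examples on this page.

```javascript
// Formats that the text above says support includeAllTables.
var MULTI_TABLE_FORMATS = ["json", "xlsx"];

// Hypothetical helper, not part of the GrabzIt API: builds an options
// object and rejects the combination the documentation says will not work.
function buildTableOptions(format, includeAllTables) {
    if (includeAllTables && MULTI_TABLE_FORMATS.indexOf(format) === -1) {
        throw new Error("includeAllTables requires the json or xlsx format, got: " + format);
    }
    return {
        "format": format,
        "includeHeaderNames": true,
        "includeAllTables": Boolean(includeAllTables)
    };
}
```

The returned object could then be passed as the options argument to url_to_table, html_to_table or file_to_table.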

Convert HTML Tables to JSON

With Node.js and GrabzIt you can convert HTML tables into JSON by specifying json in the format parameter. As the example below shows, once the save_to method finishes, the oncomplete function is called with the JSON in the result variable. This is then parsed by the built-in JSON.parse function to create an object that represents the HTML table.

var options = {"format":"json","includeHeaderNames":true,"includeAllTables":true};
client.url_to_table("http://www.google.com", options);

client.save_to(null, function(error, result){
    if (result != null)
    {
        var tableObj = JSON.parse(result);
    }
});
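
The callback above ignores the error argument and assumes the result is valid JSON. A slightly more defensive version might look like the sketch below. parseTableResult is a hypothetical helper, and the sample string in the usage lines is made up for illustration; the actual structure of the JSON returned by GrabzIt is not shown on this page and is not assumed here.

```javascript
// Hypothetical helper: turns the (error, result) pair passed to the
// save_to callback into a parsed object, or null if anything went wrong.
function parseTableResult(error, result) {
    if (error != null || result == null) {
        return null;
    }
    try {
        // String() covers the case where result is a Buffer rather
        // than a string (an assumption, not confirmed by this page).
        return JSON.parse(String(result));
    } catch (parseError) {
        // The response was not valid JSON.
        return null;
    }
}

// Usage with a made-up JSON string (not real GrabzIt output):
var sample = parseTableResult(null, '[{"Name":"Tom","Age":"23"}]');
console.log(sample[0].Name); // prints "Tom"
```

Inside the save_to callback this would be called as parseTableResult(error, result), with a null return value signalling that the capture failed or returned something unexpected.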

Custom Identifier

You can pass a custom identifier to the table methods as shown below; this value is then returned to your GrabzIt Node.js handler. For instance, this custom identifier could be a database identifier, allowing a captured table to be associated with a particular database record.

var options = {"customId":123456};

// From a URL
client.url_to_table("http://www.google.com", options);

// From an HTML string
client.html_to_table("<html><body><h1>Hello World!</h1></body></html>", options);

// From a local HTML file
client.file_to_table("example.html", options);
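
On the handler side, the custom identifier has to be read back from the callback request. The following sketch assumes, as labeled in the code, that GrabzIt calls your handler URL with query string parameters named customid and id; check those names against the GrabzIt handler documentation before relying on them. readCallbackParams is a hypothetical helper and the /handler path is a placeholder.

```javascript
// Hypothetical helper: extracts the custom identifier from a handler
// request URL. ASSUMPTION: the parameters are named "customid" and "id".
function readCallbackParams(requestUrl) {
    // The base is only needed so that relative paths such as
    // "/handler?customid=1" (as found in req.url) can be parsed.
    var parsed = new URL(requestUrl, "http://localhost");
    return {
        customId: parsed.searchParams.get("customid"),
        id: parsed.searchParams.get("id")
    };
}

var params = readCallbackParams("/handler?customid=123456&id=abc");
console.log(params.customId); // prints "123456"
```

Query string values arrive as strings, so a numeric identifier such as 123456 would need converting, for example with Number(), before being used as a database key.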