GrabzIt
Tools to Capture and Convert the Web

Capture HTML Tables from Websites with Python

Warning: At least an Entry Package is required to use HTML table capture. You can try it free with a 7-day trial; after that, plans start from $5.99 a month unless cancelled.

There are several ways to convert HTML tables into CSV files and Excel spreadsheets with GrabzIt's Python API, and some of the most useful techniques are described below. Before you start, remember that after calling the URLToTable, HTMLToTable or FileToTable method you must call the Save or SaveTo method to actually capture the table.

Basic Options

The snippet below converts the first HTML table found in a web page, an HTML string or a local HTML file into a CSV document, which can then be downloaded or parsed.

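from GrabzIt import GrabzItClient
grabzIt = GrabzItClient.GrabzItClient("APPLICATION_KEY", "APPLICATION_SECRET")  # your own key and secret
# Use the one call that matches your source: a URL, an HTML string or a file
grabzIt.URLToTable("http://www.google.com")
grabzIt.HTMLToTable("""<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>""")
grabzIt.FileToTable("tables.html")
grabzIt.SaveTo("result.csv")  # capture the table and save it as a CSV file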
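Because the result is plain CSV text, it can be parsed with Python's built-in csv module. The following is a minimal sketch, assuming SaveTo() was called without a file path and returned the CSV data as bytes or a string; the sample data is hypothetical and simply mirrors the Name/Age table above, and parse_csv_table is an illustrative helper, not part of the GrabzIt API.

```python
import csv
import io


def parse_csv_table(raw):
    """Parse CSV table data (bytes or str) into a list of rows."""
    # Decode bytes, dropping a UTF-8 byte order mark if one is present
    if isinstance(raw, bytes):
        text = raw.decode("utf-8-sig")
    else:
        text = raw.lstrip("\ufeff")
    return list(csv.reader(io.StringIO(text)))


# Hypothetical CSV content, shaped like the Name/Age table above
sample = b"Name,Age\nTom,23\nNicola,26\n"
header, *records = parse_csv_table(sample)
for name, age in records:
    print(f"{name} is {age}")
```

Every value comes back as a string, so convert columns such as Age with int() if you need numbers.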

By default, the first table found is converted. To convert the second table in a web page instead, set the tableNumberToInclude attribute to 2.

from GrabzIt import GrabzItTableOptions

options = GrabzItTableOptions.GrabzItTableOptions()
options.tableNumberToInclude = 2  # convert the second table instead of the first

# Pass the options to whichever call matches your source
grabzIt.URLToTable("http://www.google.com", options)
grabzIt.HTMLToTable("""<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>""", options)
grabzIt.FileToTable("tables.html", options)
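Since the first table is the default and 2 selects the second, table numbers count from 1. Before choosing a value it can help to check how many tables a page actually contains. This is a minimal local sketch using the standard library's html.parser, assuming GrabzIt counts tables in document order the way this parser encounters them; nested tables are counted here too, which may differ from GrabzIt's behaviour. count_tables is a hypothetical helper, not part of the GrabzIt API.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in document order (nested tables included)."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case
        if tag == "table":
            self.count += 1


def count_tables(html):
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.count


page = "<html><body><table></table><table></table></body></html>"
print(count_tables(page))  # 2
```

If the count is lower than the number you intend to pass to tableNumberToInclude, the page does not contain that table.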

You can also set the targetElement attribute so that only tables inside the element with the specified id are converted.

from GrabzIt import GrabzItTableOptions

options = GrabzItTableOptions.GrabzItTableOptions()
options.targetElement = "stocks_table"  # id of the element that contains the table

# Pass the options to whichever call matches your source
grabzIt.URLToTable("http://www.google.com", options)
grabzIt.HTMLToTable("""<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>""", options)
grabzIt.FileToTable("tables.html", options)
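To check locally that the id you pass to targetElement exists and actually wraps a table, the page can be scanned with html.parser. This is a minimal sketch, assuming reasonably well-formed HTML: elements whose end tag is optional (such as an unclosed p or li) will throw off the depth tracking. TargetTableFinder and tables_in_element are hypothetical helpers, not part of the GrabzIt API.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not change the nesting depth
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TargetTableFinder(HTMLParser):
    """Counts <table> elements inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False  # True once the target element has been seen
        self.depth = 0      # > 0 while parsing inside the target element
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.depth == 0 and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1
        elif self.depth > 0:
            self.depth += 1
        else:
            return  # outside the target element: ignore
        if tag == "table":
            self.tables += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_ELEMENTS:
            self.depth -= 1


def tables_in_element(html, element_id):
    """Return (element found?, number of tables inside it)."""
    finder = TargetTableFinder(element_id)
    finder.feed(html)
    finder.close()
    return finder.found, finder.tables


page = ('<div id="stocks_table"><p>Prices</p><table><tr><td>1</td></tr></table></div>'
        '<table><tr><td>2</td></tr></table>')
print(tables_in_element(page, "stocks_table"))  # (True, 1)
```

The table outside the div is ignored, which matches the behaviour the targetElement option describes.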

Alternatively, you can capture every table on a web page by setting the includeAllTables attribute to True; however, this only works with the XLSX format. Each table is placed on its own sheet within the generated workbook.

from GrabzIt import GrabzItTableOptions

options = GrabzItTableOptions.GrabzItTableOptions()
options.format = 'xlsx'  # includeAllTables only works with XLSX
options.includeAllTables = True

# Pass the options to whichever call matches your source
grabzIt.URLToTable("http://www.google.com", options)
grabzIt.HTMLToTable("""<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>""", options)
grabzIt.FileToTable("tables.html", options)
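Because includeAllTables puts each table on its own sheet, counting the sheets in the result is a quick way to see how many tables were captured. An XLSX file is a ZIP archive whose xl/workbook.xml part lists the sheets, so the standard library is enough. This is a minimal sketch, assuming you already have the workbook's bytes (for example, read from a saved file); the sheet names GrabzIt chooses are not documented here, so none are assumed. sheet_names is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by the SpreadsheetML <workbook> and <sheet> elements
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def sheet_names(xlsx_bytes):
    """Return the sheet names listed in an XLSX workbook, in order."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        workbook = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in workbook.iter(f"{{{MAIN_NS}}}sheet")]
```

For a workbook saved to disk, open it in binary mode and pass f.read() to sheet_names; the length of the returned list is the number of sheets.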

Convert HTML Tables to JSON

GrabzIt's HTML table conversion service can also convert HTML tables into JSON from Python. First, set the format option to json, as shown below. Then retrieve the JSON string synchronously with the SaveTo method, and use your preferred Python JSON parser to convert the string into an object.

from GrabzIt import GrabzItTableOptions

options = GrabzItTableOptions.GrabzItTableOptions()
options.format = "json"
options.tableNumberToInclude = 1

# Pass the options to whichever call matches your source
grabzIt.URLToTable("http://www.google.com", options)
grabzIt.HTMLToTable("""<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>""", options)
grabzIt.FileToTable("tables.html", options)

# SaveTo with no file path returns the captured JSON instead of writing a file;
# avoid the name json so the standard json module is not shadowed
table_json = grabzIt.SaveTo()
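The JSON returned by SaveTo can be turned into Python objects with the standard json module. This is a minimal sketch, assuming SaveTo() returns the JSON as bytes or a string; the payload below is invented for illustration and does not reflect GrabzIt's actual JSON layout. parse_table_json is a hypothetical helper. Storing the raw result in a variable named json would shadow the module, so use another name.

```python
import json


def parse_table_json(raw):
    """Decode JSON table data returned as bytes or str into Python objects."""
    if isinstance(raw, bytes):
        # Drop a UTF-8 byte order mark if one is present
        raw = raw.decode("utf-8-sig")
    return json.loads(raw)


# Hypothetical payload; the real structure of GrabzIt's output may differ
data = parse_table_json(b'{"rows": [["Name", "Age"], ["Tom", "23"]]}')
print(data["rows"][1])
```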

Custom Identifier

You can pass a custom identifier to any of the table methods, as shown below; this value is then passed back to your GrabzIt Python handler. For instance, the custom identifier could be a database key, allowing a captured table to be associated with a particular database record.

from GrabzIt import GrabzItTableOptions

options = GrabzItTableOptions.GrabzItTableOptions()
options.customId = "123456"  # e.g. the id of the related database record

# Pass the options to whichever call matches your source
grabzIt.URLToTable("http://www.google.com", options)
grabzIt.HTMLToTable("<html><body><table><tr><th>Name</th><th>Age</th></tr><tr><td>Tom</td><td>23</td></tr></table></body></html>", options)
grabzIt.FileToTable("example.html", options)
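On the handler side, the identifier has to be read back from the callback request. This is a minimal sketch, assuming the handler is called with the identifier in a query-string parameter named customid (check GrabzIt's handler documentation for the exact name); custom_id_from_callback and the RECORDS lookup table are hypothetical stand-ins for your own handler and database.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical stand-in for a database table keyed by record id
RECORDS = {"123456": "Quarterly stock prices"}


def custom_id_from_callback(url):
    """Return the customid query parameter from a callback URL, or None."""
    # Assumption: the identifier arrives as ?customid=... on the handler URL
    values = parse_qs(urlparse(url).query).get("customid")
    return values[0] if values else None


callback = "https://example.com/handler?id=abc123&customid=123456"
record_id = custom_id_from_callback(callback)
print(RECORDS.get(record_id))  # Quarterly stock prices
```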