Capture HTML Tables from Websites with Ruby

Converting HTML tables into JSON, CSV files and Excel spreadsheets using GrabzIt's Ruby API is easy; just follow the examples shown here. Before you start, remember that after calling the url_to_table, html_to_table or file_to_table method, the save or save_to method must be called to capture the table. If you want to quickly see whether this service is right for you, you can try a live demo of capturing HTML tables from a URL.

Basic Options

The examples below convert the first HTML table found in a web page, an HTML string or an HTML file into a CSV document.

grabzItClient.url_to_table("https://www.tesla.com")
# Then call the save or save_to method
grabzItClient.html_to_table("<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>")
# Then call the save or save_to method
grabzItClient.file_to_table("tables.html")
# Then call the save or save_to method
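
The two-step pattern described above (a table method first, then save or save_to) can be wrapped in a small helper so the second step is never forgotten. This is only a sketch: capture_first_table and RecordingClient are hypothetical names, and the stand-in client exists solely so the snippet runs without the GrabzIt gem or network access. In real use you would pass a GrabzIt::Client created with your Application Key and Secret.

```ruby
# Hypothetical helper: runs the two required steps in order.
# `client` is anything that responds to url_to_table and save_to,
# for example a GrabzIt::Client created with your key and secret.
def capture_first_table(client, url, path)
  client.url_to_table(url) # step 1: describe what to capture
  client.save_to(path)     # step 2: perform the capture and write the file
end

# Stand-in client (an assumption for illustration only) that records calls
# instead of contacting the service.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def url_to_table(url, _options = nil)
    @calls << [:url_to_table, url]
  end

  def save_to(path = nil)
    @calls << [:save_to, path]
    true
  end
end

client = RecordingClient.new
capture_first_table(client, "https://www.tesla.com", "result.csv")
puts client.calls.inspect
```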

If you don't want to automatically convert the first table in a web page, you can set the tableNumberToInclude attribute. For instance, specifying 2 would convert the second table found in the web page.

grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.tableNumberToInclude = 2

grabzItClient.url_to_table("https://www.tesla.com", options)
# Then call the save or save_to method
grabzItClient.save_to("result.csv")
grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.tableNumberToInclude = 2

grabzItClient.html_to_table("<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>", options)
# Then call the save or save_to method
grabzItClient.save_to("result.csv")
grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.tableNumberToInclude = 2

grabzItClient.file_to_table("tables.html", options)
# Then call the save or save_to method
grabzItClient.save_to("result.csv")
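
Once a CSV file has been saved, Ruby's standard csv library can read it back into rows. The sketch below parses an inline string rather than a file so it runs on its own; the CSV text is an assumption about what the Name/Age sample table would look like, and the service's exact output may differ.

```ruby
require "csv"

# Assumed contents of result.csv for the sample Name/Age table;
# the exact output of the service may differ.
csv_text = <<~CSV
  Name,Age
  Tom,23
  Nicola,26
CSV

# headers: true treats the first row as column names,
# so each row can be turned into a hash keyed by header.
rows = CSV.parse(csv_text, headers: true).map(&:to_h)

rows.each do |row|
  puts "#{row["Name"]} is #{row["Age"]}"
end
```

To read the saved file instead of a string, CSV.read("result.csv", headers: true) returns the same kind of table object.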

You can also set the targetElement attribute, which ensures that only tables within the element with the specified id are converted.

grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.targetElement = "stocks_table"

grabzItClient.url_to_table("https://www.tesla.com", options)
# Then call the save or save_to method
grabzItClient.save_to("result.csv")
grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.targetElement = "stocks_table"

grabzItClient.html_to_table("<html><body><table id='stocks_table'><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>", options)
# Then call the save or save_to method
grabzItClient.save_to("result.csv")
grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.targetElement = "stocks_table"

grabzItClient.file_to_table("tables.html", options)
# Then call the save or save_to method
grabzItClient.save_to("result.csv")

If you use the XLSX format, you can capture all the tables on a web page by setting the includeAllTables attribute to true. Each table is then placed in its own sheet within the spreadsheet workbook.

grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.format = "xlsx"
options.includeAllTables = true

grabzItClient.url_to_table("https://www.tesla.com", options)
# Then call the save or save_to method
grabzItClient.save_to("result.xlsx")
grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.format = "xlsx"
options.includeAllTables = true

grabzItClient.html_to_table("<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>", options)
# Then call the save or save_to method
grabzItClient.save_to("result.xlsx")
grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.format = "xlsx"
options.includeAllTables = true

grabzItClient.file_to_table("tables.html", options)
# Then call the save or save_to method
grabzItClient.save_to("result.xlsx")

Convert HTML Tables to JSON

With GrabzIt, Ruby can easily convert HTML tables into JSON. To do this, specify json in the format parameter. In the examples below the data is read synchronously by calling the save_to method without a file path, which returns the JSON as a string. This string can then be parsed by a library such as the json gem.

grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.format = "json"
options.tableNumberToInclude = 1

grabzItClient.url_to_table("https://www.tesla.com", options)

json = grabzItClient.save_to()
grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.format = "json"
options.tableNumberToInclude = 1

grabzItClient.html_to_table("<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>", options)

json = grabzItClient.save_to()
grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.format = "json"
options.tableNumberToInclude = 1

grabzItClient.file_to_table("tables.html", options)

json = grabzItClient.save_to()
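
The parsing step mentioned above might look like the following sketch. The sample string is hypothetical and only stands in for the value returned by save_to(); the real structure produced by the service may differ, so inspect the parsed result before relying on particular keys. parse_table_json is an illustrative helper name, not part of the GrabzIt API.

```ruby
require "json"

# Hypothetical sample standing in for the string returned by save_to();
# the real structure produced by the service may differ.
json = '[{"Name":"Tom","Age":"23"},{"Name":"Nicola","Age":"26"}]'

# Returns the parsed Ruby structure, or nil if the text is not valid JSON.
def parse_table_json(text)
  JSON.parse(text)
rescue JSON::ParserError => e
  warn "Could not parse table JSON: #{e.message}"
  nil
end

data = parse_table_json(json)
puts data.inspect
```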

Custom Identifier

You can pass a custom identifier to the table methods as shown below; this value is then returned to your GrabzIt Ruby handler. For instance, this custom identifier could be a database identifier, allowing a table capture to be associated with a particular database record.

grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.customId = "123456"

grabzItClient.url_to_table("https://www.tesla.com", options)
# Then call the save method
grabzItClient.save("http://www.example.com/handler/index")
grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.customId = "123456"

grabzItClient.html_to_table("<html><body><h1>Hello World!</h1></body></html>", options)
# Then call the save method
grabzItClient.save("http://www.example.com/handler/index")
grabzItClient = GrabzIt::Client.new("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzIt::TableOptions.new()
options.customId = "123456"

grabzItClient.file_to_table("example.html", options)
# Then call the save method
grabzItClient.save("http://www.example.com/handler/index")
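
On the receiving side, the handler needs to read the custom identifier back out of the callback request. The sketch below does this with plain Ruby on a request URL. The query parameter names id and customid are assumptions about what the service appends to the handler URL, and the sample id value is made up; confirm both against the handler documentation. callback_params is a hypothetical helper name.

```ruby
require "uri"

# Extracts the callback values from a handler request URL.
# The parameter names "id" and "customid" are assumptions about what the
# service appends to the handler URL.
def callback_params(request_url)
  query = URI.parse(request_url).query.to_s
  params = URI.decode_www_form(query).to_h
  { id: params["id"], custom_id: params["customid"] }
end

# Made-up callback URL for illustration.
result = callback_params("http://www.example.com/handler/index?id=abc&customid=123456")
puts result[:custom_id]
```

In a web framework the same values would normally be available from the request's parameter hash, so this parsing is only needed when working with the raw URL.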