Tools to Capture and Convert the Web

How to pad a dataset

Sometimes, when constructing a dataset in the Web Scraper, more values are added to one column than to another. In the example below, after the first page is scraped the name John is added to the Name column along with three colors, and on the next page the name David is added along with another two colors. This gives the following dataset.

Name   Color
John   Yellow
David  Red
       Green
       Blue
       Purple

However, this table is misleading because it doesn't show which name was found with which colors. Instead, the pad method can be used to automatically append empty cells to the end of the dataset's columns until all columns are the same length. An example of the pad method in use is shown below.

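// Save the single name found on the page.
Data.save(Page.getTagValue({"class":{"equals":"Name"}}), 'Name', 'Color');
// Save every color found on the page.
Data.save(Page.getTagValues({"class":{"equals":"Color"}}), 'Name', 'Color');
// Append empty cells so that all columns end up the same length.
Data.pad();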

These scrape instructions produce a dataset which looks like this.

Name   Color
John   Yellow
       Red
       Green
David  Blue
       Purple

We could improve this further by setting the padValue parameter of the pad method to the name found by the scraper. Because there is only ever one name per page in this example, the scrape instructions become:

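// There is only one name per page, so keep it for reuse as the pad value.
var name = Page.getTagValue({"class":{"equals":"Name"}});
Data.save(name, 'Name', 'Color');
Data.save(Page.getTagValues({"class":{"equals":"Color"}}), 'Name', 'Color');
// Pad with the name instead of with empty cells.
Data.pad(name);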

This puts a name in every empty cell of the Name column, as shown below.

Name   Color
John   Yellow
John   Red
John   Green
David  Blue
David  Purple
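
To make the padding behaviour easier to picture, here is a minimal sketch in plain JavaScript that models it outside the Web Scraper. It is not the scraper's own implementation: the padColumns and scrape functions, the dataset object and the pages array are all hypothetical names invented for this illustration. It assumes that padding runs once after each page is scraped, as in the scrape instructions above.

```javascript
// Hypothetical model of a dataset: column name -> array of cells.
// This only imitates the behaviour described above; it is not the Web Scraper's code.
function padColumns(dataset, padValue = '') {
  const columns = Object.values(dataset);
  // Length of the longest column (0 if there are no columns).
  const longest = Math.max(0, ...columns.map((cells) => cells.length));
  for (const cells of columns) {
    // Append padValue to every shorter column until it matches the longest one.
    while (cells.length < longest) {
      cells.push(padValue);
    }
  }
  return dataset;
}

// Simulate scraping each page in turn, padding after every page.
function scrape(pages, useNameAsPadValue) {
  const dataset = { Name: [], Color: [] };
  for (const page of pages) {
    dataset.Name.push(page.name);
    dataset.Color.push(...page.colors);
    padColumns(dataset, useNameAsPadValue ? page.name : '');
  }
  return dataset;
}

// Assumed sample data matching the example in this article.
const pages = [
  { name: 'John', colors: ['Yellow', 'Red', 'Green'] },
  { name: 'David', colors: ['Blue', 'Purple'] },
];

const blankPadded = scrape(pages, false);
const namePadded = scrape(pages, true);

console.log(blankPadded.Name); // [ 'John', '', '', 'David', '' ]
console.log(namePadded.Name);  // [ 'John', 'John', 'John', 'David', 'David' ]
```

Because the padding happens after every page, each name stays lined up with the first color from its own page. Padding only once at the very end would instead leave all the empty cells at the bottom of the Name column.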