Web Scraper Documentation

While our Web Scraper tool is very straightforward, the most complicated part is writing the JavaScript-based scrape instructions, which describe what data to extract and how to store it. To help with this we have created a series of examples and help articles. However, to give you an overview, here is a list of every non-standard JavaScript method we make available through our scrape instructions.

Criteria.apply(array)

Removes from the supplied array any items at the same positions as the items removed by previous operations in this criteria.

  • array - required, the array to apply the changes to.

Criteria.ascending(values)

Returns the values in ascending order.

  • values - required, pass an array you wish to sort in ascending order.

Criteria.descending(values)

Returns the values in descending order.

  • values - required, pass an array you wish to sort in descending order.

Criteria.create()

Creates a new criteria, ready to apply operations to a new array.


Criteria.equals(needles, value)

Returns only the items in the needles array that equal the passed-in value.

  • needles - required, the array to filter.
  • value - required, the value items must be equal to.

Criteria.notEquals(needles, value)

Returns only the items in the needles array that do NOT equal the passed-in value.

  • needles - required, the array to filter.
  • value - required, the value items must NOT be equal to.

Criteria.remove(needles, haystack)

Returns the needles array after removing any matches found in the haystack array.

  • needles - required, the array to filter.
  • haystack - required, the array to use to remove the needles.

Criteria.keep(needles, haystack)

Returns the needles array after keeping any matches found in the haystack array.

  • needles - required, the array to filter.
  • haystack - required, the array to use to keep the needles.

Criteria.greaterThan(needles, value)

Returns only the items in the needles array that are greater than the passed-in value.

  • needles - required, the array to filter.
  • value - required, the value items must be greater than.
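
To show how these methods work together, here is a minimal sketch that keeps only the prices greater than 100 and removes the corresponding entries from a parallel array of names. The filters, dataset and column names are illustrative only, and the Page and Data methods used here are described later on this page.

    // Two parallel arrays scraped from the page (illustrative filters)
    var names = Page.getTagValues({"class":{"equals":"product-name"}});
    var prices = Page.getTagValues({"class":{"equals":"product-price"}});

    // Start a new criteria, then filter the prices
    Criteria.create();
    prices = Criteria.greaterThan(prices, 100);

    // Remove the same positions from the names array so both stay aligned
    Criteria.apply(names);

    Data.save(names, "Products", "Name");
    Data.save(prices, "Products", "Price");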

Criteria.lessThan(needles, value)

Returns only the items in the needles array that are less than the passed-in value.

  • needles - required, the array to filter.
  • value - required, the value items must be less than.

Criteria.limit(values, limit)

Returns the first n values, where n is the limit parameter.

  • values - required, pass an array you wish to limit.
  • limit - required, the number of values you want to return from the array.

Criteria.unique(needles)

Returns only the unique values from the needles array.

  • needles - required, pass an array you wish to remove all duplicate values from.

Data.countFilesDownloaded()

Counts the total number of files downloaded.


Data.log(message)

Writes a message to the scrape log.

  • message - required, the message to write to the log.

Data.pad(padValue, dataSet)

Pads the columns in a dataset by appending cells to the end of shorter columns until all columns in the dataset have the same number of cells.

  • padValue - optional, the value to pad the cells with. If none is specified an empty value is used.
  • dataSet - optional, the dataset to pad.

Data.readColumn(dataSet, column)

Reads the specified column from the specified dataset.

  • dataSet - optional, the dataset to read the value from.
  • column - optional, the column in the dataset to read the value from.

Data.save(values, dataSet, column)

Saves any value or values to the dataset and column specified.

  • values - required, pass any value or array of values you wish to save.
  • dataSet - optional, the dataset to save the value into.
  • column - optional, the column in the dataset to save the value into.
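
For example, this sketch logs progress and then saves a list of headings; the filter, dataset and column names are illustrative, and the Page methods used are described later on this page.

    // Collect every level-two heading on the page (illustrative filter)
    var headings = Page.getTagValues({"tag":{"equals":"h2"}});
    Data.log("Found " + headings.length + " headings on " + Page.getUrl());
    Data.save(headings, "Headings", "Text");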

Data.saveDOCXScreenshot(urls, options, dataSet, column)

Takes a DOCX screenshot of any URL or URLs and optionally puts a link to the file in the dataset and column specified.

  • urls - required, pass any URL or array of URLs you wish to take a DOCX screenshot of.
  • options - optional, screenshot options.
  • dataSet - optional, the dataset to save the DOCX screenshot link into.
  • column - optional, the column in the dataset to save the DOCX screenshot link into.

Data.saveImageScreenshot(urls, options, dataSet, column)

Takes an image screenshot of any URL or URLs and optionally puts a link to the file in the dataset and column specified.

  • urls - required, pass any URL or array of URLs you wish to take an image screenshot of.
  • options - optional, screenshot options.
  • dataSet - optional, the dataset to save the image screenshot link into.
  • column - optional, the column in the dataset to save the image screenshot link into.

Data.savePDFScreenshot(urls, options, dataSet, column)

Takes a PDF screenshot of any URL or URLs and optionally puts a link to the file in the dataset and column specified.

  • urls - required, pass any URL or array of URLs you wish to take a PDF screenshot of.
  • options - optional, screenshot options.
  • dataSet - optional, the dataset to save the PDF screenshot link into.
  • column - optional, the column in the dataset to save the PDF screenshot link into.

Data.saveTableScreenshot(urls, options, dataSet, column)

Takes a table screenshot of any URL or URLs and optionally puts a link to the file in the dataset and column specified.

  • urls - required, pass any URL or array of URLs you wish to take a table screenshot of.
  • options - optional, screenshot options.
  • dataSet - optional, the dataset to save the table screenshot link into.
  • column - optional, the column in the dataset to save the table screenshot link into.
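
All four save*Screenshot methods follow the same pattern, so one sketch covers them all; the filter, dataset and column names are illustrative, the empty options object simply accepts the defaults, and Page.getTagAttributes is described later on this page.

    // Capture a PDF screenshot of every link found on the page
    var links = Page.getTagAttributes("href", {"tag":{"equals":"a"}});
    Data.savePDFScreenshot(links, {}, "Captures", "PDF");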

Data.saveFile(urls, filename, dataSet, column)

Saves any URL or URLs as a file and optionally puts a link to the file in the dataset and column specified.

  • urls - required, pass any URL or array of URLs you wish to turn into file(s).
  • filename - optional, pass any filename you wish to use instead of the generated one.
  • dataSet - optional, the dataset to save the file link into.
  • column - optional, the column in the dataset to save the file link into.

Data.saveToFile(data, filename, dataSet, column)

Saves any data or data items as a file and optionally puts a link to the file in the dataset and column specified.

  • data - required, pass any data or array of data you wish to save as file(s).
  • filename - optional, pass any filename you wish to use instead of the generated one.
  • dataSet - optional, the dataset to save the file link into.
  • column - optional, the column in the dataset to save the file link into.
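
For instance, this sketch stores the visible text of the current page as a file; the filename, dataset and column names are illustrative.

    // Write the page text to a file and save a link to it
    Data.saveToFile(Page.getText(), "page-text.txt", "Pages", "TextFile");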

Data.saveUnique(values, dataSet, column)

Saves any unique value or values to the dataset and column specified. Duplicate values in the same dataset and column are ignored.

  • values - required, pass any value or array of values you wish to save.
  • dataSet - optional, the dataset to save the value into.
  • column - optional, the column in the dataset to save the value into.

Data.saveUniqueFile(urls, filename, dataSet, column)

Saves any URL or URLs as a file and optionally puts a link to the file in the dataset and column specified. This method only saves unique values to the specified dataset and column, or, if no dataset and column are given, only URLs that are unique across the entire scrape.

  • urls - required, pass any URL or array of URLs you wish to turn into file(s).
  • filename - optional, pass any filename you wish to use instead of the generated one.
  • dataSet - optional, the dataset to save the file link into.
  • column - optional, the column in the dataset to save the file link into.

Data.saveVideoAnimation(videoUrls, options, dataSet, column)

Converts an online video or videos into animated GIF(s) and optionally puts a link to the file in the dataset and column specified.

  • videoUrls - required, pass any video URL or array of URLs you wish to convert into animated GIF(s).
  • options - optional, animation options.
  • dataSet - optional, the dataset to save the animation link into.
  • column - optional, the column in the dataset to save the animation link into.

Global.get(name)

Gets a saved variable value.

  • name - required, the name of the variable to return.

Global.set(name, values, persist)

Saves any value or values between scraped pages.

  • name - required, the name of the variable to save.
  • values - required, the variable value or values to save.
  • persist - optional, if true the variable will be kept between scrapes.
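
As a sketch, these two methods can carry a running total across the pages of a scrape; the variable name is illustrative, and it is assumed here that Global.get returns an empty value before the variable is first set.

    // Read the running total, defaulting to 0 on the first page
    var seen = Global.get("pagesSeen") || 0;
    // Save it back so the next scraped page can read it
    Global.set("pagesSeen", seen + 1);
    Data.log("Pages seen so far: " + (seen + 1));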

Navigation.addUrlRestriction(urls, allow)

Restricts the scraper to scraping, or not scraping, one or more URLs.

  • urls - required, pass any URL or array of URLs you wish to restrict.
  • allow - optional, if true the scraper will only scrape the specified URL, otherwise it will skip the URL. Defaults to true.
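
For example, this sketch tells the scraper to scrape a products page but never a login page; the URLs are illustrative.

    // Allow this URL to be scraped (illustrative URLs)
    Navigation.addUrlRestriction("http://www.example.com/products.html", true);
    // Never scrape this URL
    Navigation.addUrlRestriction("http://www.example.com/login.html", false);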

Navigation.removeUrlRestriction(urls)

Removes the URL restrictions for the specified URL or URLs.

  • urls - required, pass any URL or array of URLs you wish to stop restricting.

Navigation.clearCookies()

Removes all the cookies for the current scrape.


Navigation.clearUrlRestrictions()

Removes all the URL restrictions on the scraper.


Navigation.click(filter)

Clicks on an HTML element.

  • filter - required, the filter used to identify which HTML element to click.

Navigation.goTo(url)

Goes immediately to the specified URL.

  • url - required, the URL to navigate to.

Navigation.select(value, filter)

Selects a value in a select element.

  • value - required, the value to set.
  • filter - required, the filter used to identify which select element to select.

Navigation.stopScraping(abort)

Stops scraping immediately.

  • abort - optional, if true stop any more processing and do not export or transmit any results.

Navigation.type(text, filter)

Types text into an HTML element.

  • text - required, the text to type.
  • filter - required, the filter used to identify which element to type into.

Navigation.wait(seconds)

Waits a number of seconds before continuing. This is most useful when using the click, select and type commands.

  • seconds - required, the number of seconds to wait.
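
Together these methods can drive simple page interactions, as in this sketch that fills in and submits a search form; the filters and values are illustrative.

    // Type a query into the search box (illustrative filters)
    Navigation.type("web scraping", {"id":{"equals":"search"}});
    // Choose a category from a drop-down
    Navigation.select("books", {"id":{"equals":"category"}});
    // Click the submit button and give the results time to load
    Navigation.click({"id":{"equals":"submit"}});
    Navigation.wait(2);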

Page.contains(find, attribute, filter)

Returns true if the Page contains the text to find.

  • find - required, the text to find.
  • attribute - optional, the attribute to search in.
  • filter - optional, the filter used to identify which element to search in.

Page.exists(filter)

Returns true if the Page contains an element that matches the search filter.

  • filter - required, the filter used to identify which element to search for.
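
These two checks are useful for guarding extraction logic, as in this sketch; the filter, dataset and column names are illustrative.

    // Only read prices when the page actually has them
    if (Page.exists({"class":{"equals":"price"}})) {
        Data.save(Page.getTagValues({"class":{"equals":"price"}}), "Products", "Price");
    }
    // Record pages that mention 'Out of stock' anywhere in their text
    if (Page.contains("Out of stock")) {
        Data.save(Page.getUrl(), "Products", "OutOfStockUrl");
    }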

Page.getAuthor()

Gets the page author if one is specified.


Page.getDescription()

Gets the page description if one is specified.


Page.getFavIconUrl()

Gets the FavIcon URL of the page.


Page.getHtml()

Gets the raw page HTML.


Page.getKeywords()

Gets the keywords of the page being scraped.


Page.getLastModified()

Gets the time the webpage was last modified either from the page metadata or the response headers.


Page.getPageNumber()

Gets the page number of the current URL that is being scraped.


Page.getPreviousUrl(index)

Gets a previously scraped URL: -1 indicates the last URL, while lower numbers indicate earlier URLs.

  • index - optional, the index of the previous page to return. Defaults to -1.

Page.getTagAttribute(attribute, filter, pattern)

Returns the matching attribute value.

  • attribute - required, the attribute to search for.
  • filter - optional, the filter used to identify which element to search for.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by {{VALUE}} in the pattern.
    For example, to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.

Page.getTagAttributes(attribute, filter, pattern)

Returns the matching attribute values.

  • attribute - required, the attribute to search for.
  • filter - optional, the filter used to identify which element to search for.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by {{VALUE}} in the pattern.
    For example, to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.

Page.getTagValue(filter, pattern)

Returns the matching element value.

  • filter - optional, the filter used to identify which element(s) to search for.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by {{VALUE}} in the pattern.
    For example, to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.

Page.getTagValues(filter, pattern)

Returns the matching element values.

  • filter - optional, the filter used to identify which element(s) to search for.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by {{VALUE}} in the pattern.
    For example, to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.
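
As a combined sketch of the four methods above; the filters and pattern are illustrative.

    // Every product name on the page
    var names = Page.getTagValues({"class":{"equals":"product-name"}});
    // The link attached to each product
    var links = Page.getTagAttributes("href", {"class":{"equals":"product-link"}});
    // Pull just the number out of a sentence like 'My age is 33.'
    var age = Page.getTagValue({"id":{"equals":"age"}}, "My age is {{VALUE}}.");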

Page.getText()

Gets the visible text from the page.


Page.getTitle()

Gets the title of the page.


Page.getUrl()

Gets the URL of the page.


Page.getValueXPath(xpath, pattern)

Returns the value which matches the supplied XPATH.

  • xpath - required, the XPATH to match the element value or attribute.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by {{VALUE}} in the pattern.
    For example, to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.

Page.getValuesXPath(xpath, pattern)

Returns the values which match the supplied XPATH.

  • xpath - required, the XPATH to match the element values or attributes.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by {{VALUE}} in the pattern.
    For example, to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.
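
For example, using an illustrative XPath expression and dataset/column names:

    // Read the third cell of every row in a products table
    var prices = Page.getValuesXPath("//table[@id='products']//tr/td[3]");
    Data.save(prices, "Products", "Price");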

Page.valid()

Returns true if the URL currently being scraped is a valid web page.


PDF.contains(find, filter)

Returns true if the PDF document contains the text to find.

  • find - required, the text to find.
  • filter - optional, the filter used to identify which part of the document to search in.

PDF.exists(filter)

Returns true if the PDF document contains the document part defined in the filter.

  • filter - optional, the filter used to identify which element to search for.

PDF.getAuthor()

Gets the PDF author if one is specified.


PDF.getKeywords()

Gets the keywords of the PDF document being scraped.


PDF.getLastModified()

Gets the time the PDF document was last modified from the document metadata.


PDF.getPageNumber()

Gets the page number of the current URL that is being scraped.


PDF.getPreviousUrl(index)

Gets a previously scraped URL: -1 indicates the last URL, while lower numbers indicate earlier URLs.

  • index - optional, the index of the previous page to return. Defaults to -1.

PDF.getText()

Gets the text from the PDF document.


PDF.getTitle()

Gets the title of the PDF document.


PDF.getUrl()

Gets the URL of the PDF document.


PDF.getValue(filter, pattern)

Returns the matching value.

  • filter - optional, the filter used to identify which part(s) of the document to search for.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by {{VALUE}} in the pattern.
    For example, to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.

PDF.getValues(filter, pattern)

Returns the matching values.

  • filter - optional, the filter used to identify which part(s) of the document to search for.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by {{VALUE}} in the pattern.
    For example, to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.

PDF.valid()

Returns true if the URL currently being scraped is a valid PDF document.
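
Putting these methods together, here is a sketch that only extracts data from URLs that really are PDF documents; the dataset and column names are illustrative.

    // Ignore anything that is not a valid PDF document
    if (PDF.valid()) {
        Data.save(PDF.getTitle(), "Documents", "Title");
        Data.save(PDF.getText(), "Documents", "Text");
    }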


Utility.Array.contains(needles, haystack)

Returns true if the needle or needles are found in the haystack array.

  • needles - required, pass any value or array of values to find.
  • haystack - required, the array to search for the needle or needles.

Utility.Array.unique(values)

Returns the unique values from the values array.

  • values - required, pass any array of values to make unique.

Utility.Email.extractAddress(text)

Extracts the first email address from the specified text parameter.

  • text - required, the text to extract an email address from.

Utility.Email.extractAddresses(text)

Extracts all of the email addresses from the specified text parameter.

  • text - required, the text to extract all of the email addresses from.
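
For example, this sketch harvests every email address visible on the page; the dataset and column names are illustrative.

    // Pull all email addresses out of the visible page text
    var emails = Utility.Email.extractAddresses(Page.getText());
    // saveUnique avoids storing the same address twice
    Data.saveUnique(emails, "Contacts", "Email");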

Utility.Image.extractText(urls, language)

Attempts to use Optical Character Recognition to extract text from any specified images.

  • urls - required, pass any URL or array of URLs of images you wish to extract text from.
  • language - optional, the language of the text to extract in the two letter ISO 639-1 format. Defaults to 'en'.
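
As a sketch, OCR can be run over every image on the page; the filter, dataset and column names are illustrative.

    // Collect the source URL of every image on the page
    var images = Page.getTagAttributes("src", {"tag":{"equals":"img"}});
    // Attempt OCR in English and save whatever text comes back
    Data.save(Utility.Image.extractText(images, "en"), "Images", "Text");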

Utility.URL.addQueryStringParameter(urls, key, value)

Adds a querystring parameter to any URL or URLs.

  • urls - required, pass any URL or array of URLs you wish to add a querystring parameter to.
  • key - required, the key of the parameter to add.
  • value - required, the value of the parameter to add.

Utility.URL.getQueryStringParameter(urls, key)

Gets the value of a querystring parameter from any URL or URLs.

  • urls - required, pass any URL or array of URLs you wish to read the querystring parameter from.
  • key - required, the key of the parameter to read.

Utility.URL.removeQueryStringParameter(urls, key)

Removes a querystring parameter from any URL or URLs.

  • urls - required, pass any URL or array of URLs you wish to remove a querystring parameter from.
  • key - required, the key of the parameter to remove.
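
Together these helpers can build the next page of a paginated listing, as in this sketch; the parameter name is illustrative.

    // Read the current page number, defaulting to 1
    var url = Page.getUrl();
    var page = parseInt(Utility.URL.getQueryStringParameter(url, "page"), 10) || 1;
    // Replace the old parameter with the incremented value and go there
    url = Utility.URL.removeQueryStringParameter(url, "page");
    Navigation.goTo(Utility.URL.addQueryStringParameter(url, "page", page + 1));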

Utility.URL.exists(urls)

Checks if the URL or URLs actually exist by requesting each URL.

  • urls - required, pass any URL or array of URLs you wish to check exist.
