
Web Scraper Documentation

This is an overview of the scrape instruction methods available through our web scraper.

Criteria.apply(array)

Removes from the supplied array any items at the same positions as the items removed by previous operations in this criteria.

  • array - required, the array to apply the changes to.

Criteria.ascending(values)

Returns the values in ascending order.

  • values - required, pass an array you wish to sort in ascending order.

Criteria.descending(values)

Returns the values in descending order.

  • values - required, pass an array you wish to sort in descending order.

Criteria.contains(needles, value)

Returns only the items in the needles array that contain the specified value.

  • needles - required, the array to filter.
  • value - required, the value items must contain.

Criteria.create(array)

Creates a new criteria ready to perform operations on a new array.

  • array - required, the array of columns to apply the changes to.

Criteria.equals(needles, value)

Returns only the items in the needles array that equal the specified value.

  • needles - required, the array to filter.
  • value - required, the value items must be equal to.

Criteria.notEquals(needles, value)

Returns only the items in the needles array that do NOT equal the specified value.

  • needles - required, the array to filter.
  • value - required, the value items must not be equal to.

Criteria.remove(needles, haystack)

Returns the needles array after removing any matches found in the haystack array.

  • needles - required, the array to filter.
  • haystack - required, the array used to remove the needles.

Criteria.repeat(array)

Repeats the items in the array until its length matches the length of the longest column.

  • array - required, the array to repeat.

Criteria.keep(needles, haystack)

Returns the needles array after keeping only the matches found in the haystack array.

  • needles - required, the array to filter.
  • haystack - required, the array used to keep the needles.

Criteria.greaterThan(needles, value)

Returns only the items in the needles array that are greater than the specified value.

  • needles - required, the array to filter.
  • value - required, the value items must be greater than.

Criteria.lessThan(needles, value)

Returns only the items in the needles array that are less than the specified value.

  • needles - required, the array to filter.
  • value - required, the value items must be less than.

Criteria.limit(values, limit)

Returns the first n values, where n is the limit parameter.

  • values - required, pass an array you wish to limit.
  • limit - required, the number of values you want to return from the array.

Criteria.unique(needles)

Returns only the unique values from the needles array.

  • needles - required, pass an array you wish to remove all duplicate values from.
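The Criteria methods are designed to chain: a filter such as greaterThan records which positions survived, and Criteria.apply replays the same removals on a parallel column. The sketch below illustrates this positional idea in plain JavaScript; it is a hypothetical re-implementation for clarity, not the scraper's own code.

```javascript
// Sketch of positional filtering (hypothetical re-implementation):
// filter one column, then apply the same removals to a parallel column.
function filterWithPositions(needles, predicate) {
  const kept = [];
  const keptIndexes = [];
  needles.forEach((item, i) => {
    if (predicate(item)) {
      kept.push(item);       // the surviving values
      keptIndexes.push(i);   // remember which positions survived
    }
  });
  return { kept, keptIndexes };
}

function applyPositions(array, keptIndexes) {
  // Keep only the items at the positions that survived earlier filtering,
  // mirroring what Criteria.apply does to a parallel array.
  return keptIndexes.map((i) => array[i]);
}

const prices = [10, 25, 5, 40];
const names = ["a", "b", "c", "d"];
const result = filterWithPositions(prices, (p) => p > 9); // like Criteria.greaterThan
const matchingNames = applyPositions(names, result.keptIndexes); // like Criteria.apply
```

Here filtering prices above 9 keeps positions 0, 1 and 3, so applying those positions to the names column keeps the names belonging to the matching prices.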

Data.countFilesDownloaded()

Counts the total number of files downloaded.

Data.log(message)

Writes a message to the scrape log.

  • message - required, the message to write to the log.

Data.pad(padValue, dataSet)

Pads the columns in a dataset by appending empty cells to the end of each column until all columns in the dataset have the same number of cells.

  • padValue - optional, the value to pad the cells with. If none is specified an empty value is used.
  • dataSet - optional, the dataset to pad.
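To illustrate the padding behaviour, here is a minimal plain-JavaScript sketch of a dataset as an object of column arrays; this is a hypothetical re-implementation of the semantics, not the scraper's own code.

```javascript
// Sketch of Data.pad semantics (hypothetical re-implementation):
// extend every column with the pad value until all columns in the
// dataset have the same number of cells.
function padColumns(dataSet, padValue = "") {
  const longest = Math.max(...Object.values(dataSet).map((c) => c.length));
  const padded = {};
  for (const [name, column] of Object.entries(dataSet)) {
    // Append (longest - column.length) pad cells to the end of the column.
    padded[name] = column.concat(Array(longest - column.length).fill(padValue));
  }
  return padded;
}

const padded = padColumns({ name: ["a", "b", "c"], price: [10] });
```

After padding, the shorter price column has been extended with empty cells to match the three-cell name column.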

Data.readColumn(dataSet, column)

Reads the specified column from the specified dataset.

  • dataSet - optional, the dataset to read the values from.
  • column - optional, the column in the dataset to read the values from.

Data.save(values, dataSet, column)

Saves any value or values to the specified dataset and column.

  • values - required, pass any value or array of values you wish to save.
  • dataSet - optional, the dataset to save the values into.
  • column - optional, the column in the dataset to save the values into.

Data.saveDOCXScreenshot(htmlOrUrls, options, dataSet, column)

Takes a DOCX screenshot of HTML, a URL or URLs and optionally puts a link to the file in the specified dataset and column.

  • htmlOrUrls - required, pass any HTML, URL or array of URLs you wish to take a DOCX screenshot of.
  • options - optional, screenshot options.
  • dataSet - optional, the dataset to save the DOCX screenshot link into.
  • column - optional, the column in the dataset to save the DOCX screenshot link into.

Data.saveImageScreenshot(htmlOrUrls, options, dataSet, column)

Takes an image screenshot of HTML, a URL or URLs and optionally puts a link to the file in the specified dataset and column.

  • htmlOrUrls - required, pass any HTML, URL or array of URLs you wish to take an image screenshot of.
  • options - optional, screenshot options.
  • dataSet - optional, the dataset to save the image screenshot link into.
  • column - optional, the column in the dataset to save the image screenshot link into.

Data.savePDFScreenshot(htmlOrUrls, options, dataSet, column)

Takes a PDF screenshot of HTML, a URL or URLs and optionally puts a link to the file in the specified dataset and column.

  • htmlOrUrls - required, pass any HTML, URL or array of URLs you wish to take a PDF screenshot of.
  • options - optional, screenshot options.
  • dataSet - optional, the dataset to save the PDF screenshot link into.
  • column - optional, the column in the dataset to save the PDF screenshot link into.

Data.saveTableScreenshot(htmlOrUrls, options, dataSet, column)

Takes a table screenshot of HTML, a URL or URLs and optionally puts a link to the file in the specified dataset and column.

  • htmlOrUrls - required, pass any HTML, URL or array of URLs you wish to take a table screenshot of.
  • options - optional, screenshot options.
  • dataSet - optional, the dataset to save the table screenshot link into.
  • column - optional, the column in the dataset to save the table screenshot link into.

Data.saveFile(urls, filename, dataSet, column)

Saves any URL or URLs as files and optionally puts a link to each file in the specified dataset and column.

  • urls - required, pass any URL or array of URLs you wish to turn into files.
  • filename - optional, pass any filename you wish to use instead of the generated one.
  • dataSet - optional, the dataset to save the file link into.
  • column - optional, the column in the dataset to save the file link into.

Data.saveToFile(data, filename, dataSet, column)

Saves any data or data items as files and optionally puts a link to each file in the specified dataset and column.

  • data - required, pass any data or array of data you wish to save in files.
  • filename - optional, pass any filename you wish to use instead of the generated one.
  • dataSet - optional, the dataset to save the file link into.
  • column - optional, the column in the dataset to save the file link into.

Data.saveUnique(values, dataSet, column)

Saves any unique value or values to the specified dataset and column. Duplicate values in the same dataset and column are ignored.

  • values - required, pass any value or array of values you wish to save.
  • dataSet - optional, the dataset to save the values into.
  • column - optional, the column in the dataset to save the values into.

Data.saveUniqueFile(urls, filename, dataSet, column)

Saves any URL or URLs as files and optionally puts a link to each file in the specified dataset and column. This method only saves unique values to the specified dataset and column, or, if no dataset and column are specified, only unique URLs for the entire scrape.

  • urls - required, pass any URL or array of URLs you wish to turn into files.
  • filename - optional, pass any filename you wish to use instead of the generated one.
  • dataSet - optional, the dataset to save the file link into.
  • column - optional, the column in the dataset to save the file link into.

Data.saveVideoAnimation(videoUrls, options, dataSet, column)

Converts an online video or videos into animated GIFs and optionally puts a link to each file in the specified dataset and column.

  • videoUrls - required, pass any video URL or array of URLs you wish to convert into animated GIFs.
  • options - optional, animation options.
  • dataSet - optional, the dataset to save the animation link into.
  • column - optional, the column in the dataset to save the animation link into.

Global.get(name)

Gets a saved variable value.

  • name - required, the name of the variable to return.

Global.set(name, values, persist)

Saves any value or values between scraped pages.

  • name - required, the name of the variable to save.
  • values - required, the variable value or values to save.
  • persist - optional, if true the variable will be kept between scrapes.
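The Global pair behaves like a key-value store with two lifetimes: per-scrape and persistent. A minimal plain-JavaScript sketch of that behaviour (a hypothetical re-implementation, not the scraper's own storage):

```javascript
// Sketch of Global.get / Global.set semantics (hypothetical re-implementation):
// values live either for the current scrape or persist across scrapes.
const store = { scrape: {}, persistent: {} };

function globalSet(name, value, persist = false) {
  // The persist flag decides which lifetime bucket the variable goes into.
  (persist ? store.persistent : store.scrape)[name] = value;
}

function globalGet(name) {
  // Prefer the current scrape's value, fall back to persisted values.
  return name in store.scrape ? store.scrape[name] : store.persistent[name];
}

globalSet("pageCount", 3);            // kept for this scrape only
globalSet("lastRunId", "run-7", true); // kept between scrapes
```

The fallback order (scrape values before persisted values) is an assumption made for illustration.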

Navigation.addTemplate(urls, template)

Defines the URL or URLs as belonging to the specified template. This allows scrape instructions to be restricted to executing only on certain URLs.

  • urls - required, pass any URL or array of URLs you wish to define a template for.
  • template - required, the template the URL or URLs belong to.

Navigation.addUrlRestriction(urls, allow)

Restricts the scraper to scraping, or skipping, one or more URLs.

  • urls - required, pass any URL or array of URLs you wish to restrict.
  • allow - optional, if true the scraper will only scrape the specified URL, otherwise it will skip the URL. Defaults to true.

Navigation.removeUrlRestriction(urls)

Removes the URL restriction for the specified URL or URLs.

  • urls - required, pass any URL or array of URLs you wish to stop restricting.
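One way to picture URL restrictions is as an allow/deny table consulted before each page is scraped. The sketch below is a hypothetical re-implementation of that idea in plain JavaScript; the exact precedence rules the scraper uses are an assumption here.

```javascript
// Sketch of URL restriction semantics (hypothetical re-implementation):
// each restriction either allows or skips a URL; when any allow-restriction
// exists, only allowed URLs are scraped.
const restrictions = new Map();

function addUrlRestriction(url, allow = true) {
  restrictions.set(url, allow);
}

function removeUrlRestriction(url) {
  restrictions.delete(url);
}

function shouldScrape(url) {
  if (restrictions.get(url) === false) return false; // explicitly skipped
  const hasAllowList = [...restrictions.values()].some((a) => a);
  // With an allow-list in place, only listed URLs pass; otherwise all do.
  return hasAllowList ? restrictions.get(url) === true : true;
}
```

Adding an allow-restriction for one URL therefore implicitly skips every other URL until the restriction is removed or cleared.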

Navigation.clearCookies()

Removes all the cookies for the current scrape.

Navigation.clearUrlRestrictions()

Removes all the URL restrictions on the scraper.

Navigation.click(filter)

Clicks on an HTML element.

  • filter - required, the filter used to identify which HTML element to click.

Navigation.goTo(url)

Goes immediately to the specified URL.

  • url - required, the URL to navigate to.

Navigation.isTemplate(template)

Returns true if the current page belongs to the specified template.

  • template - required, the template to check the page against.

Navigation.select(values, filter)

Selects one or more valid values in a select element.

  • values - required, the value or values to select.
  • filter - required, the filter used to identify which select element to use.

Navigation.stopScraping(abort)

Stops scraping immediately.

  • abort - optional, if true stops any further processing and does not export or transmit any results.

Navigation.type(texts, filter)

Types text into an element.

  • texts - required, the item or items of text to type.
  • filter - required, the filter used to identify which element to type into.

Navigation.wait(seconds)

Waits a number of seconds before continuing. This is most useful when using the click, select and type commands.

  • seconds - required, the number of seconds to wait.

Page.contains(find, attribute, filter)

Returns true if the page contains the text to find.

  • find - required, the text to find.
  • attribute - optional, the attribute to search in.
  • filter - optional, the filter used to identify which element to search in.

Page.exists(filter)

Returns true if the page contains an element that matches the search filter.

  • filter - required, the filter used to identify which element to search for.

Page.getAuthor()

Gets the page author if one is specified.

Page.getDescription()

Gets the page description if one is specified.

Page.getFavIconUrl()

Gets the FavIcon URL of the page.

Page.getHtml()

Gets the raw page HTML.

Page.getKeywords()

Gets the keywords of the page being scraped.

Page.getLastModified()

Gets the time the webpage was last modified, either from the page metadata or the response headers.

Page.getPageNumber()

Gets the page number of the current URL that is being scraped.

Page.getPreviousUrl(index)

Gets a previously scraped URL: an index of -1 indicates the last URL, while lower numbers indicate earlier URLs.

  • index - optional, the index of the previous page to return. Defaults to -1.

Page.getTagAttribute(attribute, filter, pattern, patternBehaviour)

Returns the matching attribute value.

  • attribute - required, the attribute to search for.
  • filter - optional, the filter used to identify which element to search for.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by the {{VALUE}} in the pattern.
    For example to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.
  • patternBehaviour - optional, by default only the matched value is returned, however if you specify 'trim' trimmed values will be returned.

Page.getTagAttributes(attribute, filter, pattern, patternBehaviour)

Returns the matching attribute values.

  • attribute - required, the attribute to search for.
  • filter - optional, the filter used to identify which element to search for.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by the {{VALUE}} in the pattern.
    For example to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.
  • patternBehaviour - optional, by default only the matched value is returned, however if you specify 'trim' trimmed values will be returned.

Page.getTagCSSAttribute(attribute, filter, pattern, patternBehaviour)

Returns the matching CSS value.

  • attribute - required, the CSS attribute to search for.
  • filter - optional, the filter used to identify which element to search for.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by the {{VALUE}} in the pattern.
    For example to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.
  • patternBehaviour - optional, by default only the matched value is returned, however if you specify 'trim' trimmed values will be returned.

Page.getTagCSSAttributes(attribute, filter, pattern, patternBehaviour)

Returns the matching CSS values.

  • attribute - required, the CSS attribute to search for.
  • filter - optional, the filter used to identify which element to search for.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by the {{VALUE}} in the pattern.
    For example to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.
  • patternBehaviour - optional, by default only the matched value is returned, however if you specify 'trim' trimmed values will be returned.

Page.getTagValue(filter, pattern, patternBehaviour)

Returns the matching element value.

  • filter - optional, the filter used to identify which element(s) to search for.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by the {{VALUE}} in the pattern.
    For example to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.
  • patternBehaviour - optional, by default only the matched value is returned, however if you specify 'trim' trimmed values will be returned.

Page.getTagValues(filter, pattern, patternBehaviour)

Returns the matching element values.

  • filter - optional, the filter used to identify which element(s) to search for.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by the {{VALUE}} in the pattern.
    For example to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.
  • patternBehaviour - optional, by default only the matched value is returned, however if you specify 'trim' trimmed values will be returned.

Page.getText()

Gets the visible text from the page.

Page.getTitle()

Gets the title of the page.

Page.getUrl()

Gets the URL of the page.

Page.getValueXPath(xpath, pattern)

Returns the value that matches the supplied XPath.

  • xpath - required, the XPath to match the element value or attribute.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by the {{VALUE}} in the pattern.
    For example to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.

Page.getValuesXPath(xpath, pattern)

Returns the values that match the supplied XPath.

  • xpath - required, the XPath to match the element values or attributes.
  • pattern - optional, the pattern defines how to capture the desired part of the returned text. The value to capture is indicated by the {{VALUE}} in the pattern.
    For example to capture the age from 'My age is 33.' the pattern 'My age is {{VALUE}}.' would be used.

Page.valid()

Returns true if the URL currently being scraped is a valid web page.

Utility.Array.clean(values)

Returns all non-null and non-empty values from the values array.

  • values - required, pass any array of values to clean.

Utility.Array.contains(needle, haystack)

Returns true if the needle is in the haystack array.

  • needle - required, pass any value or array of values to find.
  • haystack - required, the array to search for the needle or needles.

Utility.Array.merge(array1, array2)

Merges two arrays into one, replacing any empty or null value in the first array with the value at the same position in the second array. Both arrays must be of equal size.

  • array1 - required, pass an array of values to merge.
  • array2 - required, pass an array of values to merge.
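Utility.Array.merge fills gaps positionally. A minimal plain-JavaScript sketch of that behaviour (a hypothetical re-implementation, not the scraper's own code):

```javascript
// Sketch of Utility.Array.merge semantics (hypothetical re-implementation):
// keep values from the first array, but fill any null or empty slot with
// the value at the same index in the second array.
function mergeArrays(array1, array2) {
  if (array1.length !== array2.length) {
    throw new Error("Both arrays must be of equal size");
  }
  return array1.map((v, i) => (v === null || v === "" ? array2[i] : v));
}

const merged = mergeArrays(["a", "", null], ["x", "y", "z"]);
```

This is useful when two scraped columns cover the same rows but each has gaps the other can fill.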

Utility.Array.unique(values)

Returns the unique values from the values array.

  • values - required, pass any array of values to make unique.

Utility.Text.extractAddress(text)

Extracts the first email address from within the specified text parameter.

  • text - required, the text to extract an email address from.

Utility.Text.extractAddresses(text)

Extracts all of the email addresses from within the specified text parameter.

  • text - required, the text to extract all of the email addresses from.
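As a rough illustration of email extraction, the sketch below pulls addresses out of free text with a simple regular expression; this is an approximation for clarity, and the scraper's own extraction may be more thorough.

```javascript
// Sketch of email extraction with a simple regular expression
// (an approximation, not the scraper's actual implementation).
function extractAddresses(text) {
  // Match word characters, dots, plus and hyphen before the @,
  // then a domain of one or more dot-separated labels.
  return text.match(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g) || [];
}

const found = extractAddresses("Contact bob@example.com or ann@test.org.");
```

extractAddress would correspond to taking the first element of this result.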

Utility.Text.extractLocation(text, language)

Automatically extracts the first location from within the specified text parameter.

  • text - required, the text to extract the location from.
  • language - optional, the language of the text to extract in the two letter ISO 639-1 format. Defaults to 'en'. Use 'auto' to attempt to automatically detect the text language.

Utility.Text.extractLocations(text, language)

Automatically extracts locations from within the specified text parameter.

  • text - required, the text to extract locations from.
  • language - optional, the language of the text to extract in the two letter ISO 639-1 format. Defaults to 'en'. Use 'auto' to attempt to automatically detect the text language.

Utility.Text.extractLanguageName(text)

Automatically detects the language of the specified text parameter and returns the language name.

  • text - required, the text to detect the language of.

Utility.Text.extractLanguageCode(text)

Automatically detects the language of the specified text parameter and returns the language code.

  • text - required, the text to detect the language of.

Utility.Text.extractName(text, language)

Automatically extracts the first name from within the specified text parameter.

  • text - required, the text to extract the name from.
  • language - optional, the language of the text to extract in the two letter ISO 639-1 format. Defaults to 'en'. Use 'auto' to attempt to automatically detect the text language.

Utility.Text.extractNames(text, language)

Automatically extracts names from within the specified text parameter.

  • text - required, the text to extract the names from.
  • language - optional, the language of the text to extract in the two letter ISO 639-1 format. Defaults to 'en'. Use 'auto' to attempt to automatically detect the text language.

Utility.Text.extractOrganization(text, language)

Automatically extracts the first organization from within the specified text parameter.

  • text - required, the text to extract the organization from.
  • language - optional, the language of the text to extract in the two letter ISO 639-1 format. Defaults to 'en'. Use 'auto' to attempt to automatically detect the text language.

Utility.Text.extractOrganizations(text, language)

Automatically extracts organizations from within the specified text parameter.

  • text - required, the text to extract organizations from.
  • language - optional, the language of the text to extract in the two letter ISO 639-1 format. Defaults to 'en'. Use 'auto' to attempt to automatically detect the text language.

Utility.Text.extractSentiment(text)

Automatically extracts the sentiment from within the specified text parameter.

  • text - required, the text to extract the sentiment from.

Utility.Image.extractText(urls, language)

Attempts to use Optical Character Recognition to extract text from any specified images.

  • urls - required, pass any URL or array of URLs of images you wish to extract text from.
  • language - optional, the language of the text to extract in the two letter ISO 639-1 format. Defaults to 'en'.

Utility.URL.addQueryStringParameter(urls, key, value)

Adds a querystring parameter to any URL or URLs.

  • urls - required, pass any URL or array of URLs you wish to add a querystring parameter to.
  • key - required, the key of the parameter to add.
  • value - required, the value of the parameter to add.

Utility.URL.getQueryStringParameter(urls, key)

Gets the value of a querystring parameter from any URL or URLs.

  • urls - required, pass any URL or array of URLs you wish to read the querystring parameter from.
  • key - required, the key of the parameter to read.

Utility.URL.removeQueryStringParameter(urls, key)

Removes a querystring parameter from any URL or URLs.

  • urls - required, pass any URL or array of URLs you wish to remove a querystring parameter from.
  • key - required, the key of the parameter to remove.

Utility.URL.exists(urls)

Checks whether the URL or URLs actually exist by requesting each URL.

  • urls - required, pass any URL or array of URLs you wish to check exist.
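The querystring helpers above can be illustrated with the standard URL API, which handles encoding and the leading '?' automatically. This sketch shows the equivalent semantics for a single URL, not the scraper's own implementation.

```javascript
// Sketch of the Utility.URL querystring semantics using the standard
// URL / URLSearchParams API (hypothetical re-implementation).
function addQueryStringParameter(url, key, value) {
  const u = new URL(url);
  u.searchParams.set(key, value); // adds or overwrites the parameter
  return u.toString();
}

function getQueryStringParameter(url, key) {
  return new URL(url).searchParams.get(key); // null if absent
}

function removeQueryStringParameter(url, key) {
  const u = new URL(url);
  u.searchParams.delete(key);
  return u.toString();
}

const withPage = addQueryStringParameter("https://example.com/list", "page", "2");
```

Passing an array of URLs, as the scraper methods allow, would simply map these helpers over each element.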
