Hi community,
Please forgive my ignorance. I am a business guy, not a technical guy. I am sure it will shine through in my questions.
Is it possible to use the more robust selection and control parameters of the Web Scrape tool, like URL Pattern (most critical) and the Follow Links control settings, with the screenshot tool? All I want is to crawl a base URL and the extended domains reached from that URL, with a limit on how many layers deep the crawl will go. My objective is just to grab news articles from a certain set of domains. I would prefer to take only the text as output, but I can live with the images being delivered in the docx. The output from the screenshot tool is great for my base requirement, but its selection and control functionality is too limited. I would have to know all the subdomains and article names to use that tool's standard input, and that is not realistic.
I thought the scraper would be great and I started using it, but then I quickly found out that the pricing and restrictions on that tool are substantial! It is not feasible for me to use it with that cost structure.
Note, I have a coder writing Python who is playing around with the API, but she is not knowledgeable about this tool and we are having challenges communicating clearly with one another about my expectations, so I am concerned her work is going in a direction that will not be optimal for me. Specifically, I told her I want the same input controls as the Web Scrape tool, but for the screenshot tool, and I don't think she understands or appreciates my issue. Is it possible, using the API, to leverage the robust selection criteria without paying the very high pricing of the web scraper?
Any guidance would be greatly appreciated.
Thank you in advance!!!
Asked by anonymous on the 1st of February 2024
Hi,
No problem, I will do my best to answer your questions.
Generally a web scrape is limited either to a website or to a URL pattern. Otherwise, due to the nature of the internet, following links would cause the web scrape never to end.
The screenshot tool takes screenshots of specified URLs once, on a schedule, or when the web page changes. If you have a list of URLs you want to capture, it can be imported into the screenshot tool.
With the API you can use your own logic to trigger a screenshot. So for what you are doing, I think you would need some kind of custom web scraper that would trigger the API.
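To give your coder a starting point, here is a rough sketch in Python of what such a custom scraper could look like. It is only an illustration under assumptions, not the tool's own method: the names `crawl`, `fetch_html`, `LinkParser` and the `capture` callback are all made up for this example, and the actual screenshot API call is deliberately left out because it depends on the API client you use. The sketch starts at a base URL, follows only links that match a URL pattern, and stops at a chosen depth.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page and return its HTML as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, url_pattern, max_depth, capture, fetch=fetch_html):
    """Breadth-first crawl from start_url.

    Only links matching url_pattern (a regular expression) are followed,
    and links found on pages at max_depth are not followed further.
    capture(url) is called once per page that downloads successfully;
    it is a hypothetical hook where the screenshot API call would go.
    """
    pattern = re.compile(url_pattern)
    seen = {start_url}
    queue = deque([(start_url, 0)])
    captured = []
    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        capture(url)
        captured.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not follow links from here
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen and pattern.search(link):
                seen.add(link)
                queue.append((link, depth + 1))
    return captured
```

For example, `crawl("https://news.example.com/", r"^https://news\.example\.com/", 2, capture)` would visit the start page plus matching pages up to two links away, calling `capture` for each. The start page is always captured, even if it does not match the pattern. A real crawler should also respect robots.txt and pause between requests.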
Hope this helps.