How the intraproxy enables intranet screenshots

Capturing or Scraping Intranet and localhost websites

An intranet website can be just as important as any other website and may need screenshotting, scraping or converting into an offline version. Unfortunately, capturing an intranet or localhost website is more complicated than taking a screenshot of a normal website on the web.

The simplest way to do this is to use GrabzIt's IntraProxy, which opens up your internal websites to GrabzIt's servers only. The IntraProxy then handles the routing of requests to and from your internal websites for you, as shown in the diagram.

To assure users of the security of the IntraProxy, we have made the code open source, both so that people can see what it is doing and to encourage bug fixes and further enhancements.

First, download the proxy from GitHub. You will need to have Java 1.6+ installed. Then, using the command line, navigate to the directory containing intraproxy.jar and run the following command:

java -jar "intraproxy.jar" 

Next, check that the IntraProxy is running by visiting it in a browser on the machine it is installed on, where it listens on port 10000. Then, on your router, forward port 10000 to the IP address of the machine the GrabzIt IntraProxy is installed on. Please do not ask us how to do this; information on configuring your router should be available on the Internet.
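As a quick way to confirm that something is listening on the IntraProxy port, a short script can attempt a TCP connection. This is only a minimal sketch: the function name port_is_open is hypothetical, and it only shows that the port accepts connections, not that the IntraProxy itself is answering.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    # Local check: is anything listening on the IntraProxy port (10000)?
    print("IntraProxy port open locally:", port_is_open("localhost", 10000))
```

To check the port forwarding rule, the same function could be called with your router's IP address from a machine outside your network. Testing the router address from inside the network may fail on routers that do not support NAT loopback.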

Visit http://localhost:10000/grabzit://dashboard.html for further information on how to configure and use the IntraProxy.

Once this is configured, it can be used by all of our tools, including our API, Screenshot Tool and Web Scraper, because all requests to the router IP address and port will now resolve to the correct internal website. For instance, if your website is located at http://localhost/mywebsite/index.html and your router IP address is 123.123.123.123, then to resolve your website externally you can pass http://123.123.123.123:10000/http://localhost/mywebsite/index.html to GrabzIt's API or tools.

Similarly, if you have the GrabzItDemo installed locally and want to call its callback handler at http://localhost/GrabzItDemo/handler.php, you could pass http://123.123.123.123:10000/http://localhost/GrabzItDemo/handler.php as a callback handler URL.

Remember to remove this URL prefix if you make your website publicly available on the Internet!
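The prefixing described above can be sketched as a small helper that adds the router address and port in front of an internal URL, and leaves the URL untouched once the site is public. The function name external_url and its parameters are hypothetical, chosen only for illustration; this is not part of GrabzIt's API.

```python
from typing import Optional


def external_url(internal_url: str,
                 router_ip: Optional[str] = None,
                 port: int = 10000) -> str:
    """Prefix internal_url with the forwarded IntraProxy address.

    When router_ip is None the URL is returned unchanged, which is the
    behaviour you want once the website is publicly available.
    """
    if router_ip is None:
        return internal_url
    return f"http://{router_ip}:{port}/{internal_url}"


# Page to capture and callback handler, both routed through the IntraProxy.
print(external_url("http://localhost/mywebsite/index.html", "123.123.123.123"))
print(external_url("http://localhost/GrabzItDemo/handler.php", "123.123.123.123"))
```

Keeping the router address in one configuration value, rather than hard-coding the prefix in every URL, makes it easier to drop the prefix later.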

An Alternative Method

For intranet or localhost websites that don't have absolute URLs pointing to resources, such as CSS, image and JavaScript files, that are not accessible on the Internet, the simplest option would be to set up port forwarding to your internal website. However, you should only do this for websites that you don't mind opening up to the Internet. Furthermore, it probably wouldn't be suitable if you have a large number of internal websites to capture.

You would need to log in to your router and add a port forwarding rule to forward all requests that arrive at the router's IP address and port to the computer that hosts your website. You then need to configure your web server to accept calls on the port you are forwarding over.

For instance, if your router IP address is 222.222.222.222, you could add a port forwarding rule for port 12345 to the computer that hosts the website and add this port to your web server configuration as one of the ports it listens on.

Further information on how to configure your web server and router should be available on the Internet. Once this is done, calling an address like http://222.222.222.222:12345/mypage.html should load your website.
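One way to confirm the forwarding rule works is to request the page and check the response status. The following is a minimal sketch using only the Python standard library; the function name page_loads is hypothetical, and the check is best run from a machine outside your network.

```python
import urllib.error
import urllib.request


def page_loads(url: str, timeout: float = 10.0) -> bool:
    """Return True if an HTTP GET of url completes with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # HTTP errors (4xx/5xx), refused connections and timeouts land here.
        return False


# Example call, using the address from the text above:
# page_loads("http://222.222.222.222:12345/mypage.html")
```

If this returns False, the port forwarding rule and the web server's listening port are the first things to check.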