One would be surprised how much of a “deep web” these kinds of sites still manage to maintain these days. Since Archie, Jughead, and Veronica are all but retired, and Shodan, useful as it is, never releases its datasets, I needed a way to track which sites are running Gopher, Telnet BBSes, and FTP.
To achieve this I spun up an instance of the Elasticsearch, Logstash, Kibana (ELK) Docker image on the community grid, copied in a few config files, and pointed nmap at the instance.
Files
docker-compose.yml
https://hastebin.com/ucalukuhuy.bash
Dockerfile
https://hastebin.com/nuvatuvizi.nginx
elasticsearch_nmap_template.json
https://hastebin.com/lefoyuwogi.json
05-nmap.conf
https://hastebin.com/itudakepah.php
15-nmap.conf
https://hastebin.com/sovarumeqa.php
35-nmap.conf
https://hastebin.com/afehofikus.php
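With those files in place, bringing the stack up is roughly the following. This is only a sketch: the real image name, base image, and port mappings live in the docker-compose.yml and Dockerfile linked above, and the assumption that port 8088 is a Logstash HTTP input comes from the curl command later in this post.

# Build the custom ELK image (the Dockerfile copies in the nmap template
# and the 05/15/35-nmap.conf Logstash pipeline files), then start it.
docker-compose build
docker-compose up -d

# Kibana listens on 5601, Elasticsearch on 9200, and the Logstash HTTP
# input that scan results get POSTed to on 8088.
curl -s http://localhost:9200/_cluster/health?pretty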
Agent
Then there is the scanning part. This was simple: all I needed was to traceroute and do a TCP handshake on ports 21, 23, and 70. What else was one going to choose? Nmap was the only answer ever needed.
Setting up a cron job to execute the following was fairly easy (I'll save the cron specifics for another post).
sudo nmap --traceroute -sP example.net -oX - | curl -H "x-nmap-target: remote-check" http://elasticsearch:8088 --data-binary @-
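The -sP sweep above is host discovery only; to actually complete the TCP handshake on ports 21, 23, and 70, a connect-scan variant wrapped in a cron entry might look like the following. This is a sketch: the target list, schedule, and script path are placeholders, not the job I actually run.

# Crontab entry (e.g. nightly at 02:00) pointing at a wrapper script:
# 0 2 * * * /usr/local/bin/scan-and-ship.sh
# The script itself: a TCP connect scan (-sT) of the FTP, Telnet, and Gopher
# ports with traceroute, XML piped straight into the Logstash HTTP input.
sudo nmap --traceroute -sT -p 21,23,70 -iL targets.txt -oX - \
  | curl -H "x-nmap-target: remote-check" http://elasticsearch:8088 --data-binary @-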
Summary
The end results have not come in yet, and overall I'll have to address the issue of having to manually create an index after spinning up the ELK container. Plus, none of this is set up for clustering yet; that's the next step. I'll post the details as I go.
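As a stopgap for that manual step, the template linked above can be pushed to Elasticsearch by hand, along these lines (assuming a 5.x/6.x cluster and the hypothetical template name nmap):

# Load the index template so the nmap fields map correctly; Logstash then
# creates the index itself when it ships the first document.
curl -XPUT -H "Content-Type: application/json" \
  http://elasticsearch:9200/_template/nmap \
  --data-binary @elasticsearch_nmap_template.json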
After this is set up, I'll turn my attention to indexing the files on those sites, plus capturing Wi-Fi metadata for statistical analysis.
Phases three and four would include setting up the Kong API gateway for key-based access to the Elasticsearch API and the Kibana dashboard, storing the results in Hadoop HDFS and Mongo/HBase, and then looking at how the indexed files could be processed with TensorFlow.