[Project Log] Search-Core: Scanning the web with ELK, Docker, + Nmap

You would be surprised how much of a “deep web” these kinds of sites still make up these days. Archie, Jughead, and Veronica are all but retired, and Shodan is good at what it does but never releases its datasets, so I needed a way to track which sites are running Gopher, Telnet BBS, and FTP.

To do this I spun up an instance of the Elasticsearch, Logstash, Kibana (ELK) Docker image on the community grid, copied in a few config files, and pointed Nmap at the instance.

Files

docker-compose.yml

https://hastebin.com/ucalukuhuy.bash

Dockerfile

https://hastebin.com/nuvatuvizi.nginx

elasticsearch_nmap_template.json

https://hastebin.com/lefoyuwogi.json

05-nmap.conf

https://hastebin.com/itudakepah.php

15-nmap.conf

https://hastebin.com/sovarumeqa.php

35-nmap.conf

https://hastebin.com/afehofikus.php

Agent

Then there is the scanning part. This was simple: all I needed was a traceroute and a TCP handshake on ports 23, 70, and 21. What else was I going to choose? Nmap was the only answer I ever needed.

Setting up a cron job (which I’ll save for another post) to execute the following was fairly easy.

sudo nmap --traceroute -sP example.net -oX - | curl -H "x-nmap-target: remote-check" http://elasticsearch:8088 --data-binary @-
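
For reference, the crontab entry would look something like the line below. The schedule is just an example, and dropping it into root’s crontab avoids needing sudo:

# example: scan every six hours and ship the XML straight to the collector
0 */6 * * * nmap --traceroute -sP example.net -oX - | curl -sS -H "x-nmap-target: remote-check" http://elasticsearch:8088 --data-binary @-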

Summary

The end results have not come in yet, and overall I’ll still have to address the issue of manually creating an index after spinning up the ELK container. It also isn’t set up for clustering at all yet; that’s the next step. I’ll post the details as I go.

After this is set up I’ll turn my attention to indexing files on those sites, plus capturing Wi-Fi metadata for statistical analysis.

Phases three and four would include setting up Kong for key-based access to the Elasticsearch API and Kibana dashboard, storing the results in Hadoop HDFS and Mongo/HBase, and then looking at how the resulting indexed files could be processed with TensorFlow.

Random Concept…

kermit(1) can log to a file. Kermit is a serial, SSH, Telnet, and modem client.

Could one use Kermit to index serial connections, modem dial-ups, and oscilloscope data into Elasticsearch?

Looks like Scrapy can talk to Logstash: http://scrapy-cluster.readthedocs.io/en/latest/topics/advanced/integration.html

Also… suckless tools has ii(1), which does FIFO-based IRC connections: https://tools.suckless.org/ii/

One can wonder, could that be turned into a pipeline for logstash?!
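
A rough guess at how that could look, assuming a Logstash UDP input is listening on port 5000 (the host, server, and channel names below are made up): ii writes each channel’s messages to an out file, so tailing that into netcat might be all it takes.

tail -F ~/irc/irc.example.net/'#somechannel'/out | nc -u logstash-host 5000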

GraphQL for ES:

Frontend code:

http://docs.searchkit.co/stable/setup/elasticsearch.html

Test public index: https://dashboard.searchly.com/65148/overview

yes, diehard hackers can figure out where I’m going with this :wink:

After hours of digging through logs and the web and reconfiguring things, it looks like the elasticsearch_nmap_template.json file only works with ES 5.x. So I rebuilt the image and deployed based on 5.6.10, and now it has an index and template.

There are a few steps after deploying the container that have to be done by hand (or, in my case, with Elasticsearch Head).

Creating the base index and aliases

Since we eventually want to implement warm/cold indexing, there needs to be a top-level alias for access and a (say, weekly) index that the alias points at.

curl -XPUT -H 'Content-Type: application/json' http://localhost:9200/nmap-logstash-$(date -I)
curl -XPOST -H 'Content-Type: application/json' 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        { "add" : { "index" : "nmap-logstash-*", "alias" : "logstash-nmap" } }
    ]
}'
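
To confirm the index and alias actually exist, the _cat endpoints do the job (nothing special about these, just a quick sanity check):

curl 'http://localhost:9200/_cat/indices/nmap-logstash-*?v'
curl 'http://localhost:9200/_cat/aliases/logstash-nmap?v'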

Adding the template

Elasticsearch 6 and up will not accept this template, but version 6 still honors version-5 templates that were created ahead of time. Thus we can create the template and index under 5, then redeploy onto 6 later using the same volumes.

The following command loads the template for the nmap-logstash indices from the prebuilt JSON file.

curl -XPUT -H 'Content-Type: application/json' http://localhost:9200/_template/nmap-logstash-\* -d@elasticsearch_nmap_template.json
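
A quick way to double-check that the template landed (GET on _template accepts wildcards):

curl 'http://localhost:9200/_template/nmap-logstash-*?pretty'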

Adding data

The last step is to populate it with data, since this is, well, a datastore of sorts.

export TARGET_STORE=motherboard:8088
scanner() {
  # scan, write the XML report to /tmp, then ship it to the collector
  sudo nmap --traceroute -sP "$1" -oX "/tmp/${2}.xml"
  curl -H "x-nmap-target: ${2}" "http://${TARGET_STORE}/" --data-binary "@/tmp/${2}.xml"
}; declare -x -f scanner

scanner 192.168.0.0/24 local-subnet

Next Steps

Nmap is quite a bandwidth and time-slice hog. After all, we’re designing this to be distributed anyway, so that’s what needs to be addressed next. DNmap does offer a solution for distributed scanning, but it’s a very hub-and-spoke style of architecture. Nice for its time, but we’re living in the serverless world now, so any scanner or agent should follow that model. Luckily we have OpenFaaS running on the community grid, so building a Docker image that runs python-nmap and deploying it to OpenFaaS on the grid would help out a lot.
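
Nothing is built yet, but the rough faas-cli workflow I have in mind looks something like this; the function name, gateway address, and python3 template are placeholders, and the handler that actually wraps python-nmap still has to be written:

faas-cli new nmap-scanner --lang python3
# edit nmap-scanner/handler.py to run python-nmap and return/ship the XML
faas-cli build -f nmap-scanner.yml
faas-cli push -f nmap-scanner.yml
faas-cli deploy -f nmap-scanner.yml --gateway http://openfaas-gateway:8080
echo "192.168.0.0/24" | faas-cli invoke nmap-scanner --gateway http://openfaas-gateway:8080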

I’m also seeing a lot of memory-intensive usage going on (no shite, Sherlock, it’s Java), so adding a few extra nodes to the grid and making sure HTTP traffic is reachable over the Swarm overlay networks is a high priority.
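
On the overlay side, the idea is a single attachable network that the ELK services and the scanner containers all join; the network name here is made up:

docker network create --driver overlay --attachable search-core-net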

Have you looked into using Masscan? It’s a lot more efficient than Nmap, although you need to strip one tag out of the parser if you use the Nmap XML format as an intermediary, due to an ongoing dispute between the Nmap and Masscan maintainers (weirdly, Nmap doesn’t even follow its own standard, but the Masscan maintainer insists on sticking to the documentation).

Cheers,
-Jim

Masscan looks interesting but might not be 100% doable.

Just a quick glance showed that it uses PF_RING, which requires some special configuration on the host side of Docker.

Now if it can be done with a minimal set of parts and dependencies, great!

Project Update:

Currently working out a way to use JWT with Logstash. Most likely I’ll put this on the back burner for now while I get auto-deployment stood up.

This week I’ll be indexing several RSS feeds, and hopefully I can tie that back into the system using logstash-input-elasticsearch to crawl any newly discovered links. If nothing else, I’ll at least be able to dump the feeds into something for further processing.
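
One gotcha: as far as I can tell the RSS input is not bundled with Logstash by default, so the plugin has to be installed into the image, roughly like this (the path assumes the official Logstash image layout; the ELK bundle keeps it elsewhere):

# run as a RUN step in the Dockerfile, or inside the running container
/usr/share/logstash/bin/logstash-plugin install logstash-input-rss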


[2018-07-30T16:51:45,528][INFO ][logstash.inputs.rss      ] Polling RSS {:url=>"https://www.investing.com/rss/286.rss"}
[2018-07-30T16:51:45,530][INFO ][logstash.inputs.udp      ] Starting UDP listener {:address=>"0.0.0.0:5000"}
[2018-07-30T16:51:45,538][INFO ][logstash.inputs.rss      ] Polling RSS {:url=>"http://feeds.marketwatch.com/marketwatch/bulletins"}
[2018-07-30T16:51:45,598][INFO ][logstash.inputs.udp      ] UDP listener started {:address=>"0.0.0.0:5000", :receive_buffer_bytes=>"106496", :queue_size=>"2000"}
[2018-07-30T16:51:45,625][INFO ][logstash.inputs.udp      ] Starting UDP listener {:address=>"0.0.0.0:5514"}
[2018-07-30T16:51:45,690][INFO ][logstash.inputs.udp      ] UDP listener started {:address=>"0.0.0.0:5514", :receive_buffer_bytes=>"106496", :queue_size=>"2000"}
[2018-07-30T16:51:46,150][INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}
[2018-07-30T16:51:51,215][ERROR][logstash.inputs.rss      ] Uknown error while parsing the feed {:url=>"https://www.investing.com/rss/286.rss", :exception=>#<RSS::NotAvailableValueError: value <Jul 30, 2018 10:30 GMT> of tag <pubDate> is not available.>}
[2018-07-30T16:51:51,216][INFO ][logstash.inputs.rss      ] Command completed {:command=>nil, :duration=>5.6883099999999995}
[2018-07-30T16:51:52,312][INFO ][logstash.inputs.rss      ] Command completed {:command=>nil, :duration=>6.772647}
[2018-07-30T16:52:07,968][ERROR][logstash.outputs.email   ] Something happen while delivering an email {:exception=>#<EOFError: end of file reached>}
[2018-07-30T16:52:19,088][ERROR][logstash.outputs.email   ] Something happen while delivering an email {:exception=>#<EOFError: end of file reached>}
[2018-07-30T16:52:29,630][ERROR][logstash.outputs.email   ] Something happen while delivering an email {:exception=>#<EOFError: end of file reached>}
[2018-07-30T16:52:40,072][ERROR][logstash.outputs.email   ] Something happen while delivering an email {:exception=>#<EOFError: end of file reached>}
[2018-07-30T16:52:50,315][ERROR][logstash.outputs.email   ] Something happen while delivering an email {:exception=>#<EOFError: end of file reached>}
[2018-07-30T16:53:00,548][ERROR][logstash.outputs.email   ] Something happen while delivering an email {:exception=>#<EOFError: end of file reached>}
[2018-07-30T16:53:11,094][ERROR][logstash.outputs.email   ] Something happen while delivering an email {:exception=>#<EOFError: end of file reached>}
[2018-07-30T16:53:21,334][ERROR][logstash.outputs.email   ] Something happen while delivering an email {:exception=>#<EOFError: end of file reached>}
[2018-07-30T16:53:31,675][ERROR][logstash.outputs.email   ] Something happen while delivering an email {:exception=>#<EOFError: end of file reached>}
[2018-07-30T16:53:41,982][ERROR][logstash.outputs.email   ] Something happen while delivering an email {:exception=>#<EOFError: end of file reached>}

Email alerts are broken and one of the feeds does not parse correctly. Looks like they “RSS gud.” Need to look into the edge cases, but so far so good.

It’s on…