Dark Web OSINT with Python Part Two: SSH Keys and Shodan


This entry was originally posted on the AutomatingOSINT.com blog.

Welcome back, good Python soldiers. In Part One of this series we created a wrapper around OnionScan, a fantastic tool created by Sarah Jamie Lewis (@sarajamielewis). If you haven’t read Part One, go do so now. Now that you have a bunch of data (or have downloaded it from here), we want to do some analysis and further intelligence gathering with it. Here are the objectives we are going to cover in the rest of the series.

  1. Attempt to discover clearnet servers that share SSH fingerprints with hidden services, using Shodan. As part of this we will also analyze whether the same SSH key is shared amongst hidden services.
  2. Map out connections between hidden services, clearnet sites and any IP address leaks.
  3. Discover clusters of sites that are similar based on their index pages; this can help find knockoffs or clones of “legitimate” sites. We’ll use a machine learning library called scikit-learn to achieve this.

The scripts created for this series are quick little one-offs, so some code is repeated across them. Feel free to tighten this up into a function or a module you can import. The goal is to give you small chunks of code that teach you the basics of analyzing this data and, more importantly, to give you some ideas on how you can use it for your own purposes.

In this post we are going to look at how to connect hidden services by their SSH public key fingerprints, as well as how to expand our intelligence gathering using Shodan. Let’s get started!

A Common Construct

All three posts share a common chunk of code, so I will briefly cover it now rather than going through it in each post:

[Screenshot: common code for loading the OnionScan JSON results]

  • Lines 1-2: we import the glob module, which lets us load a list of file paths that match a wildcard pattern (super useful), and the json module, which handles converting the stored JSON into a Python dictionary.
  • Line 4: we call the glob.glob() function, passing in the directory where our onionscan_results are stored along with a wildcard that matches all of the JSON files in that directory.
  • Lines 6-10: we loop through each file path (5), open the JSON file for reading (7) and then pass the file handle to the json.load() function to convert it to a dictionary stored in the scan_result variable.

This little snippet is the workhorse that prepares us for all of the subsequent data analysis, and you will see it pop up in each part of this series. For homework, you can convert it into a function to include in each of your scripts. That is not required, however, and if you are still developing your Python muscle memory, feel free to type it out each time.
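Taking up that homework, here is one way the loader might look as a reusable function. It is a sketch rather than the code from the screenshot: the default directory name `onionscan_results` is an assumption based on the description above, and `load_scan_results` is a hypothetical helper name.

```python
import glob
import json
import os


def load_scan_results(results_dir="onionscan_results"):
    """Yield (file_path, scan_result) for every JSON file in results_dir.

    The default directory name is an assumption; point it at wherever
    your OnionScan JSON output lives.
    """
    # glob expands the *.json wildcard into a list of matching file paths;
    # sorting keeps the processing order stable between runs.
    for json_file in sorted(glob.glob(os.path.join(results_dir, "*.json"))):
        with open(json_file, "r") as fd:
            # json.load parses the open file handle into a Python dictionary
            scan_result = json.load(fd)
        # The file is already closed here, before control returns to the caller
        yield json_file, scan_result
```

Each script can then replace the loop with `for json_file, scan_result in load_scan_results():`. Because it is a generator, only one scan result is held in memory at a time.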

Expanding Dark Web OSINT With Shodan

One of the awesome things that OnionScan does for us is grab the SSH fingerprint of any hidden service that has SSH enabled. The SSH fingerprint is a short sequence of characters (a hash) that represents the larger public key of the server you are connecting to. It can be used, with a reasonable degree of statistical certainty (I am bad at math, so let’s leave it at “reasonable”), to uniquely identify servers and devices. John Matherly of Shodan even wrote a post on this that you can read here.
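To make “short sequence of characters” concrete: the colon-separated values OnionScan reports (see the sample output later in this post) have the shape of OpenSSH’s legacy MD5 fingerprint. That fingerprint is the MD5 hash of the base64-decoded key blob from a public key line, printed as 16 hex bytes. The sketch below computes that format. Treat the assumption that OnionScan uses exactly this scheme as something to verify against its source. `md5_fingerprint` is a hypothetical helper.

```python
import base64
import hashlib


def md5_fingerprint(public_key_line):
    """Return the legacy colon-separated MD5 fingerprint of an OpenSSH public key.

    public_key_line looks like 'ssh-rsa AAAAB3Nza... optional-comment'.
    """
    parts = public_key_line.split()
    if len(parts) < 2:
        raise ValueError("expected '<key type> <base64 blob> [comment]'")
    # The fingerprint covers the decoded binary key blob, not the base64 text
    blob = base64.b64decode(parts[1])
    digest = hashlib.md5(blob).hexdigest()
    # Split the 32 hex characters into 16 colon-separated byte pairs
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

You can cross-check the result against `ssh-keygen -l -E md5 -f somekey.pub`, which prints the same hex bytes prefixed with `MD5:` on OpenSSH versions that support the `-E` option.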

What we can do with this is take any discovered SSH fingerprints in the dark web and then run them against Shodan to see if the same fingerprint is identifiable in the clearnet. While this may not be 100% proof that a hidden service is hosted on the same machine as the clearnet site, it can definitely be a starting point for an investigation or for an administrator to realize that they may be deanonymizing themselves.

We are in luck because Shodan has a Python module that we can leverage to take care of the low-level interaction with their API.

Get it installed using pip:

Windows: pip install shodan

Mac/Linux: sudo pip install shodan

You will also want to head to Shodan to get set up with an account and an API key. I spent $45 on an account to get some credits and to support a site we all use and love.

Now let’s get some coding done: open up a new Python file, name it sshkeys.py and pound out the following (or download it here):

[Screenshot: sshkeys.py imports and variables]

No surprises here: we set up our imports (note that we import the shodan module) and some variables to hold the data we are going to process. Let’s add some more logic:

[Screenshot: sshkeys.py mapping SSH keys to hidden services]

  • Lines 19-20: if the scan result contains an SSH key (19) we simply print out a nice friendly message (20).
  • Lines 22-25: if our key_to_hosts dictionary already has the SSH key (22), we add the current hidden service to the list stored under that key (23). We use a list here because the same SSH key can map to multiple hidden services, which of course is interesting. If the SSH key isn’t in our dictionary yet, we add it with a fresh list containing the current hidden service (25).
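Here is a rough equivalent of that grouping logic, written as a standalone function. It assumes the OnionScan JSON uses the field names `hiddenService` and `sshKey`, so check those against your own result files. `map_keys_to_hosts` is a hypothetical name, and `dict.setdefault` stands in for the explicit if/else described above.

```python
def map_keys_to_hosts(scan_results):
    """Group hidden services by SSH fingerprint.

    scan_results is an iterable of OnionScan result dicts; the 'sshKey' and
    'hiddenService' field names are assumed from OnionScan's JSON report.
    """
    key_to_hosts = {}
    for scan_result in scan_results:
        ssh_key = scan_result.get("sshKey")
        if not ssh_key:
            continue  # no SSH key was recorded for this hidden service
        hidden_service = scan_result["hiddenService"]
        print(f"{hidden_service} => {ssh_key}")
        # setdefault returns the existing list, or inserts and returns a new one
        key_to_hosts.setdefault(ssh_key, []).append(hidden_service)
    return key_to_hosts
```

Because `setdefault` hands back the same list object that is stored in the dictionary, the `append` works whether or not the key was already present.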

Once that loop has finished going through each of the JSON files, we need to do some quick analysis on each key to see whether it is shared across multiple hidden services, and then we are going to check Shodan. Let’s code up the first part now:

[Screenshot: sshkeys.py checking for keys shared across hidden services]

  • Lines 27-29: we begin iterating over each of the SSH keys (27) stored in our dictionary, and if the list of hidden services for that key has more than one entry (29), we know that the SSH key is present on more than one hidden service.
  • Lines 31-35: We print out a message (31) and then iterate over the list of hidden services that have the same key (33) and print them out (35).
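The shared-key check can be sketched the same way. This version takes the `key_to_hosts` dictionary described above and also returns the shared entries so they can be reused later, for example when graphing. The function name is hypothetical.

```python
def report_shared_keys(key_to_hosts):
    """Print and return the fingerprints that appear on more than one hidden service."""
    # Keep only fingerprints whose list holds two or more hidden services
    shared = {key: hosts for key, hosts in key_to_hosts.items() if len(hosts) > 1}
    for ssh_key, hosts in shared.items():
        print(f"[!] SSH Key {ssh_key} is used on multiple hidden services:")
        for hidden_service in hosts:
            print(f"\t{hidden_service}")
    return shared
```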

This is useful for drawing connections between hidden services based solely on their public key fingerprints. Next we want to expand the search to the clearnet by tapping into the Shodan API and querying it for each fingerprint. Let’s add this code now:

[Screenshot: sshkeys.py querying Shodan for each SSH key]

  • Lines 37-45: we create an infinite loop (37) that keeps trying to run the search() function (41) until it succeeds. We do this because the Shodan API can time out, which throws an exception. Our try/except block catches the exception, sleeps for 5 seconds (44) and then tries again. It is ugly, but it is the only approach I found that works reliably.
  • Lines 47-50: if Shodan returns any hits (47), we iterate over each hit (49) and print out the clearnet IP address that was found for the SSH key (50).
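Here is a sketch of the Shodan lookup with an optionally bounded version of the retry loop. It takes the API client as a parameter so it can be exercised without network access; in a real script you would pass `shodan.Shodan(YOUR_API_KEY)`. The `matches` list and the per-hit `ip_str` field come from the results that the Shodan search API returns. `find_clearnet_ips` and the `max_attempts` cap are additions for illustration, not part of the original script.

```python
import time


def find_clearnet_ips(client, ssh_key, delay=5, max_attempts=None):
    """Search Shodan for an SSH fingerprint and return the matching IPs.

    client is expected to be a shodan.Shodan instance (anything with a
    compatible search() method works). max_attempts=None retries forever,
    like the original infinite loop.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            results = client.search(ssh_key)
            break
        except Exception as err:  # broad, as in the original; shodan.APIError is the library's own type
            if max_attempts is not None and attempt >= max_attempts:
                raise
            print(f"[*] Shodan search failed ({err}), retrying in {delay}s")
            time.sleep(delay)
    # Each hit in the "matches" list carries the host IP address in "ip_str"
    return [match["ip_str"] for match in results.get("matches", [])]
```

Capping the attempts is worth considering because some failures, such as an invalid API key or exhausted query credits, will never succeed on retry.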

There we go! A nice little script that is incredibly useful for expanding our intelligence gathering and can help us determine potential links between hosts on the open Internet and hidden services. Now let’s see it in action.


Let It Rip!

Now you just need to run the script and you should see output like the following:

# python sshkeys.py


zsgyq6ndex3xceeo.onion => 81:cb:d2:d4:f7:e4:8c:6b:1a:92:07:42:cd:e2:42:23
zti56emoqbwtiu2y.onion => 81:cb:d2:d4:f7:e4:8c:6b:1a:92:07:42:cd:e2:42:23
ztvcyfnj6fvvuklj.onion => 81:cb:d2:d4:f7:e4:8c:6b:1a:92:07:42:cd:e2:42:23
zuag53c4qgqiw2qr.onion => 54:70:2c:8a:93:b5:d1:7e:0c:92:51:76:ed:26:e5:73
zwnvycqmjlvjiwb7.onion => 81:cb:d2:d4:f7:e4:8c:6b:1a:92:07:42:cd:e2:42:23
zyxououig4nz7n4t.onion => 81:cb:d2:d4:f7:e4:8c:6b:1a:92:07:42:cd:e2:42:23

[!] SSH Key 67:ce:9a:30:85:c3:53:db:a3:93:58:d1:c2:dc:f0:b3 is used on multiple hidden services:

[!] Hit for e0:1e:a3:26:a6:c5:8e:0b:e9:34:e9:8f:7d:6e:c6:24 on for hidden services apkx44pmf7fyd63e.onion
[!] Hit for 8e:67:85:f5:13:f2:dc:dc:74:f3:aa:b3:fb:ca:04:80 on for hidden services uf2fjijpodfsv4fb.onion


Very cool stuff. Now you can go do some sleuthing on the IP addresses discovered, as well as on the potential links between hidden services. Of course, you can use Hunchly to help capture all of that manual research as well (cough cough). As homework, you can take our previous work on creating graphs and use it to plot the relationships between keys and multiple hidden services, or between keys and external IP addresses.
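For that homework, one low-effort starting point is to dump the relationships as an edge list that a graph tool such as Gephi can import (its spreadsheet import recognizes `Source` and `Target` columns). This is a sketch: `key_to_ips` is a hypothetical dictionary you would build from the Shodan hits, and `write_edge_list` is not part of the original script.

```python
import csv


def write_edge_list(key_to_hosts, key_to_ips, out_file):
    """Write Source,Target rows linking each SSH fingerprint to hosts and IPs.

    key_to_hosts maps fingerprint -> [hidden services]; key_to_ips is a
    hypothetical fingerprint -> [clearnet IPs] dict built from Shodan hits.
    out_file is any writable text file object. Returns the number of edges.
    """
    writer = csv.writer(out_file)
    writer.writerow(["Source", "Target"])
    edges = 0
    # Edges from each fingerprint to the hidden services that use it
    for ssh_key, hosts in key_to_hosts.items():
        for host in hosts:
            writer.writerow([ssh_key, host])
            edges += 1
    # Edges from each fingerprint to the clearnet IPs Shodan reported
    for ssh_key, ips in key_to_ips.items():
        for ip in ips:
            writer.writerow([ssh_key, ip])
            edges += 1
    return edges
```

When writing to disk, open the file with `open("ssh_edges.csv", "w", newline="")` (the filename is just an example), as the csv module documentation recommends.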


There is a lot of interesting work we can do with this data to begin seeing relationships and patterns. This is only the beginning, and hopefully it will drive some more ideas about how to look at this data, both from a privacy protection perspective and from an investigative standpoint.

If you have some ideas for future blog posts or would like me to expand on this series, hit me up with an email: justin@automatingosint.com.