Help Bellingcat Build Tools For Open Source Investigators!

Open source investigations rely on the skills and efforts of human researchers. This will never change. However, tools can help make the research process significantly easier.

We at Bellingcat, like many other research organisations and open source enthusiasts, rely primarily on tools that are available for free. On the one hand, this is because we do not have the budget required to pay for access to many paid tools — we are a nonprofit organisation, after all. But it is also because we believe in the idea of open source software. 

We want digital researchers from all backgrounds to have the methods and tools at their disposal to be able to conduct open source investigations. This is why we publish free research guides and resource lists on our website. It is also why we started a Bellingcat GitHub, where many of the tools developed and used by Bellingcat’s Investigative Tech team will be made available as open source projects.

In addition to publishing these tools, we are also opening up to contributions from, and collaborations with, those with open source software experience as well as open source investigation enthusiasts.

How You Can Help

  • Have a look at the tools on our Bellingcat GitHub and help us make them better.

  • Read through our list of open questions and start working on some of the more complex technical challenges that are relevant for open source investigators.

  • If, for example, you lead a computer science department at a university or work at a tech-focused research institute and have a concrete idea for a tool development collaboration (the software must be open source), please feel free to get in touch with us here.

  • The same applies to every open source investigator who is in need of a specific tool for their work. You can submit your ideas here and if we think the development of the tool may be feasible, we will add it to our list of tech challenges. You can also submit a “pull request” directly. Maybe a helpful software developer will pick it up.

Use, Test, and Improve our GitHub Tools

The list of tools on our GitHub will grow over time. Currently, our small collection includes, amongst others, a tool for finding Instagram “locations”, an automatic video archiver for Google Sheets and a Google Earth Engine application for finding small cloud-free subregions within cloudy satellite imagery.

We hope that with your help, we will be able to improve these tools and add more over time. Many of these projects already have improvement suggestions on their GitHub Issues pages, including highlighted issues that would be excellent for community contributions.

Here is more about the tools on our GitHub that you can already contribute to:

Instagram is somewhat unique among modern social media websites in that both public sharing and geotagging are still social norms. However, photos are not searchable by specific coordinates — instead they are associated with an Instagram “place”. Instagram does not provide a way of searching or discovering these places by latitude and longitude coordinates, only by city and location name. For example, while it is straightforward to find Instagram posts from Amsterdam Centraal Station, there is no easy way to find photos taken at other locations nearby unless you know exactly what you are looking for (and the relevant Instagram place name).

However, the mobile Instagram app accesses a private API, or Application Programming Interface, which finds places near a particular Instagram location and is used when selecting a location for a new photo. By using this API, Bellingcat developed a tool for finding Instagram places.

Several tools exist already for downloading images from a particular Instagram place or user. A useful enhancement to instagram-location-search would be to integrate these tools to enable a researcher to scrape all images from around a particular latitude and longitude in a single step.

The Telegram phone number checker is useful if a researcher wants to find out whether a specific phone number, or list of numbers, has been used to create accounts on Telegram.

To check a phone number, the tool uses the Telegram API and requires an ‘API_ID’ and ‘API_HASH’. If a phone number is associated with a Telegram username, the tool provides the connected username. Since usernames are not required on Telegram, it can also simply detect whether a Telegram account has been created at all with a specific phone number.
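The three possible outcomes described above (no account, an account without a username, an account with a username) can be sketched as follows. This is only an illustration of the reporting logic: the `LookupResult` type and the `describe` function are hypothetical names, and the sketch deliberately does not call the Telegram API, so it says nothing about how the actual tool performs the lookup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LookupResult:
    """Hypothetical container for the outcome of checking one phone number."""
    phone: str
    has_account: bool
    username: Optional[str] = None  # usernames are optional on Telegram


def describe(result: LookupResult) -> str:
    """Turn a lookup result into a one-line, human-readable summary."""
    if not result.has_account:
        return f"{result.phone}: no Telegram account found"
    if result.username:
        return f"{result.phone}: account found, username @{result.username}"
    # An account exists, but its owner never set a username.
    return f"{result.phone}: account found, no username set"
```

Keeping the "account exists" flag separate from the username makes the middle case explicit, which matters because a missing username does not mean a missing account.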

The auto-archiver combines youtube-dl with Google Sheets to make a collaborative interface for archiving videos from social media. It is a Python script that, when run regularly (as with cron), will:

  • Download recognised videos from URLs in a Google Sheet.

  • Upload these videos to a cloud storage location that is compatible with Amazon Web Services’ S3 protocol, such as S3 itself or DigitalOcean Spaces.

  • Update the sheet with metadata that might be useful for organisation and retrieval.
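The three steps above can be sketched as a single loop over the rows of a sheet. This is a minimal sketch under stated assumptions, not the auto-archiver's actual code: the row layout (`url`, `status`, `archive_location`, `title`) is hypothetical, and the `download` and `upload` callables stand in for youtube-dl and an S3-compatible client so that the example stays self-contained.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def archive_rows(
    rows: List[Row],
    download: Callable[[str], Tuple[str, Dict[str, str]]],
    upload: Callable[[str], str],
) -> List[Row]:
    """Process every row that has not been handled yet and record the outcome.

    `download` takes a URL and returns (local_path, metadata);
    `upload` takes a local path and returns the storage location.
    Both are placeholders supplied by the caller.
    """
    for row in rows:
        if row.get("status"):
            # Already processed on an earlier run, so skip it.
            continue
        try:
            local_path, metadata = download(row["url"])
        except Exception as exc:
            # Record the failure in the sheet instead of stopping the whole run.
            row["status"] = f"failed: {exc}"
            continue
        row["archive_location"] = upload(local_path)
        row["title"] = metadata.get("title", "")
        row["status"] = "archived"
    return rows
```

Writing a status back to each row is what makes repeated runs (for example from cron) safe: rows that already have a status are skipped, so only newly added URLs are processed.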

 

The tool was developed by Bellingcat for handling social media imagery from the 2021 US Presidential Inauguration, but we believe it could be useful in a variety of other situations.

While most of the metadata is provided directly by youtube-dl, the script can also transform the video format and extract metadata using ffmpeg, an open source tool for video processing. This is used to generate a thumbnail, which can be inserted directly into the Google Sheet as an image. It can also generate periodic thumbnails for a video “contact sheet” that makes it easier to see a video’s content at a glance.
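As an illustration of the thumbnail and contact sheet steps, the sketch below builds two ffmpeg command lines in Python. The function names and default values are assumptions chosen for the example, and the exact options used by the auto-archiver may differ; the ffmpeg options themselves (`-ss`, `-frames:v`, and the `fps`, `scale` and `tile` filters) are standard.

```python
from typing import List


def thumbnail_command(video: str, output: str, at_seconds: float = 1.0) -> List[str]:
    """Build an ffmpeg command that saves a single frame as an image."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-ss", str(at_seconds),  # seek to this position before decoding
        "-i", video,
        "-frames:v", "1",        # write exactly one video frame
        output,
    ]


def contact_sheet_command(
    video: str,
    output: str,
    every_seconds: int = 10,
    columns: int = 4,
    rows: int = 3,
    width: int = 320,
) -> List[str]:
    """Build an ffmpeg command that tiles periodic frames into one image."""
    filters = (
        f"fps=1/{every_seconds},"  # keep one frame every N seconds
        f"scale={width}:-1,"       # shrink each frame, keeping aspect ratio
        f"tile={columns}x{rows}"   # lay the frames out in a grid
    )
    return ["ffmpeg", "-y", "-i", video, "-vf", filters, "-frames:v", "1", output]
```

Either list can be passed to `subprocess.run(command, check=True)` on a machine where ffmpeg is installed. With the defaults shown, the contact sheet covers the first two minutes of a video (12 tiles, one every 10 seconds).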

The auto-archiver tool. Image credit: Bellingcat

There are several opportunities for expanding the functionality of auto-archiver and making it easier for new or non-technical users. Additional guides and examples in the GitHub documentation would make the application significantly easier to use. Alternatively, there may be other application formats, such as a Google Sheets App, that would make some parts of setup easier. However, as it requires a remote computer to run youtube-dl and a place to upload archived videos to, it would likely still require some server setup.

The TikTok timestamp tool is an extremely simple helper for extracting the exact upload date from a TikTok video. The source code demonstrates how to perform a request to a website that doesn’t provide CORS headers, and how a simple client-side tool can be hosted on GitHub Pages.
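One commonly described way to recover an upload time, sketched below in Python, relies on the assumption that the upper 32 bits of a TikTok video ID hold a Unix timestamp. The text above does not state which method the tool uses, so treat this as an illustration of the idea and not as a description of its source code. The function names and the URL pattern are likewise assumptions.

```python
import re
from datetime import datetime, timezone


def video_id_from_url(url: str) -> int:
    """Extract the numeric video ID from a URL of the form .../video/<digits>."""
    match = re.search(r"/video/(\d+)", url)
    if match is None:
        raise ValueError(f"no video ID found in {url!r}")
    return int(match.group(1))


def upload_time_from_id(video_id: int) -> datetime:
    """Interpret the upper 32 bits of the ID as seconds since the Unix epoch.

    Assumption: the ID embeds its creation time in this way.
    """
    return datetime.fromtimestamp(video_id >> 32, tz=timezone.utc)
```

Shortened share links do not contain the numeric ID, so they would first have to be resolved to the full video URL before a function like `video_id_from_url` could be applied.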

An example of the TikTok timestamp tool. Credit: Bellingcat

This is a Google Earth Engine application that attempts to find Sentinel-2 satellite images that are cloud-free in a particular area. Sentinel-2 is an earth observation satellite operated by the European Space Agency that captures medium resolution (10 meters per pixel) visible and infrared images of most locations on the Earth’s surface approximately twice per week. This imagery is tremendously useful to open source investigators, and in recent Bellingcat articles has been used to understand environmental pollution in Iraq and analyse claims of drone warfare in Tigray, Ethiopia.

This tool is especially helpful in equatorial regions, where Sentinel-2 images regularly show significant cloud cover when it is measured across the entire image, which is how cloud cover is reported by default in Google Earth Engine and Sentinel Hub. However, some of these images might have clear skies over particular small areas of interest. This application helps find those images.

An example of finding a Sentinel-2 image that is cloud free over a particular subregion. Credit: Bellingcat

Additionally, it filters image collections to contain only one image per month, which simplifies the process of sorting through large time ranges that might contain many images. It also offers contrast adjustment and near-infrared false colour options that are helpful for visualising vegetation.

It can be accessed directly as a Google Earth Engine application.
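The one-image-per-month filtering described above can be illustrated in plain Python. The sketch assumes each image is represented by an ISO date string and a cloud fraction measured over the subregion of interest, and it assumes the clearest image of each month is the one to keep. Both are assumptions made for the example: the actual application runs inside Google Earth Engine and may select its monthly image differently.

```python
from typing import Dict, List, Tuple

# (ISO date such as "2021-01-18", cloud fraction over the subregion, 0.0 to 1.0)
Image = Tuple[str, float]


def clearest_per_month(images: List[Image]) -> List[Image]:
    """Keep only the least cloudy image for each calendar month."""
    best: Dict[str, Image] = {}
    for date, cloud in images:
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        if month not in best or cloud < best[month][1]:
            best[month] = (date, cloud)
    # Return the survivors in chronological order.
    return [best[month] for month in sorted(best)]
```

For example, given two January images with cloud fractions of 0.8 and 0.1 and one February image, the function keeps the 0.1 January image and the February image, which shortens a long time range to at most one candidate per month.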

Contribute to our More Complex Challenges

We are also publishing a handful of “open questions” for open source investigations. These are somewhat difficult, long-term technical challenges that could be of value to open source investigators.

While possible solutions to these challenges have been explored within the academic community, there are not, to our knowledge, any accessible tools for open source investigators that implement these methods or explore new approaches. These challenges are complex and open-ended, and they might not be solvable in a way that is practical for open source investigators. But we are interested in trying.

Our list of current challenges is published on GitHub, which allows us to be transparent and collaborative about how we solve them. If you are interested in contributing, we welcome your additions to our repository, which you can propose by way of a GitHub pull request or by dropping us a line via our contact form.

Contributions that would be most welcomed include additional ideas for technical challenges, notes from literature exploration of related work and prototypes of potential approaches to the problem. If there is sufficient interest in a particular challenge, it could be spun off into its own project, potentially with support of the Bellingcat Investigative Tech Team.

Thank you in advance for contributing! Such work will help make the life of the worldwide open source investigator community easier.