Investigating India’s Army, Instagram Location Searches and Handling Traumatic Imagery: Here are the Results of Bellingcat’s Third Hackathon
In April, Bellingcat hosted its third hackathon, with the event focussing on increasing the accessibility of tools used for digital investigations.
This sought to address an issue that Bellingcat has previously identified based on our survey of open source researchers: there are many tools that can be helpful in investigations, but they’re often difficult for researchers to configure and use. Many of them require installing one or more software packages, or have a command line interface, which can be intimidating for researchers who aren’t familiar with programming.
In the latest hackathon, different projects creatively addressed many aspects of “accessibility”. Initially we anticipated the hackathon would lead to new browser interfaces for existing tools. Instead, projects simplified complicated parts of workflows or organised information from documents, addressing other important dimensions of accessibility.
We were happy to see several developers have participated in all three of Bellingcat’s hackathons and that a community is coalescing around the intersection of open-source software and open-source research. Tech Fellowships are being offered to the developers of the winning projects to further develop their tools, and if you’d like to learn more about the hackathons, don’t forget to check out the results of our first and second events.
We’re excited to present the final projects from this hackathon below with a brief explanation about what they can do. We hope open source researchers will be able to use these new tools in the near future.
First Place: Project Sentinel was developed by Myth and Laurence and is an interface for searching documents from India’s Armed Forces Tribunal website. These documents detail complaints made against the Indian military by personnel or their family members. While they are publicly available online, they are not centrally organised and are difficult to access, especially in a programmatic way. In short, it’s easy to download a document about a single case but not to understand whether or how it is linked to others.
To address this, the developers scraped tens of thousands of documents from several different regions across India and indexed them using large language model (LLM)-based text embedding. The resulting interface allows users to upload a PDF document detailing one specific case; the tool then analyses keywords and suggests any possibly related documents.
Project Sentinel can be used by researchers, journalists, or people affected by these tribunal decisions to find similar cases and better understand systemic patterns. The hackathon judges liked how it applied useful LLM features to make a dataset about an important rights issue more accessible to researchers. They also appreciated how the methodology could likely be applied to analogous cases. An instance of the tool that indexes documents from the Union Territory of Chandigarh can be accessed here.
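The idea of suggesting related documents by comparing text representations can be illustrated with a minimal sketch. It is not Project Sentinel’s code: it swaps the LLM-based embeddings described above for simple word-count vectors so that it runs with the standard library alone, and the case summaries and function names are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for an LLM embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(query: str, corpus: dict, top_k: int = 2) -> list:
    """Rank corpus documents by similarity to the query text, best first."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical case summaries, invented purely for illustration.
corpus = {
    "case_a": "pension claim denied after disability discharge",
    "case_b": "promotion dispute over seniority list",
    "case_c": "disability pension arrears for discharged soldier",
}
results = related("disability pension denied", corpus)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

A real embedding model would also match documents that use different words for the same idea, which word counts cannot do.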
Second Place: SketchMapper, developed by Alexandra Leibova and Michael Malkin, is a tool that eases the geolocation process. It allows users to produce a basic sketch of where key features in the target imagery, such as streets, buildings and vegetation, lie in relation to one another. Inspired by OSM Finder, the winning project of a previous Bellingcat hackathon, SketchMapper can be used by researchers who are intimidated or confused by OpenStreetMap’s Overpass Turbo API and want an easy-to-use interface to the data. Judges liked how it smartly combined two existing open-source projects (OpenStreetMap and a neo4j graph database) with an intuitive user interface. An instance of the tool that can search for locations within the Ukrainian city of Bakhmut can be accessed here.
As you can see in the image below, SketchMapper has provided nine possible locations for the combination of features seen in the submitted sketch.
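To give a feel for how a sketch of nearby features can be turned into candidate locations, here is a rough brute-force illustration. It is not how SketchMapper itself works (the tool uses OpenStreetMap data and a neo4j graph database); the feature list, the local metre grid and the `candidates` function are all assumptions made for this example.

```python
import math
from itertools import product

# Hypothetical map features as (name, kind, x_metres, y_metres) on a local grid.
FEATURES = [
    ("school", "building", 0, 0),
    ("park", "vegetation", 30, 10),
    ("main_st", "street", 10, 40),
    ("depot", "building", 500, 500),
    ("grove", "vegetation", 900, 100),
]


def candidates(sketch_kinds, features, max_gap):
    """Return every combination of features whose kinds match the sketch
    and whose members all lie within max_gap metres of one another."""
    pools = [[f for f in features if f[1] == kind] for kind in sketch_kinds]
    matches = []
    for combo in product(*pools):
        if len({f[0] for f in combo}) < len(combo):
            continue  # the same feature cannot fill two slots of the sketch
        close = all(
            math.dist(a[2:], b[2:]) <= max_gap
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )
        if close:
            matches.append(tuple(f[0] for f in combo))
    return matches


found = candidates(["building", "vegetation", "street"], FEATURES, max_gap=60)
print(found)
```

Checking every combination becomes slow on a city-sized dataset, which is one reason a graph database is a more natural fit for this kind of query.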
Third Place: Improving accessibility for instagram-location-search, developed by Misha, is an enhancement to Bellingcat’s Instagram Location Search tool. The latter tool finds Instagram location IDs near a specified latitude and longitude. This then allows for an easier search for Instagram posts and stories tagged in each location.
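Finding locations “near a specified latitude and longitude” comes down to a distance filter. The sketch below shows only that step, using the standard haversine formula on a hand-written list; it does not contact Instagram, and the location IDs, names and the `nearby` function are hypothetical rather than taken from Bellingcat’s tool.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearby(locations, lat, lon, radius_km):
    """Return locations within radius_km of (lat, lon), nearest first."""
    with_dist = [
        (haversine_km(lat, lon, loc["lat"], loc["lon"]), loc) for loc in locations
    ]
    in_range = [pair for pair in with_dist if pair[0] <= radius_km]
    return [loc for _, loc in sorted(in_range, key=lambda pair: pair[0])]


# Hypothetical location records; the IDs are invented for illustration.
LOCATIONS = [
    {"id": "1002", "name": "Place B", "lat": 48.8530, "lon": 2.3499},
    {"id": "1001", "name": "Place A", "lat": 48.8606, "lon": 2.3376},
    {"id": "1003", "name": "Place C", "lat": 51.5007, "lon": -0.1246},
]
hits = nearby(LOCATIONS, lat=48.8584, lon=2.2945, radius_km=5)
print([loc["id"] for loc in hits])
```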
One criticism of Bellingcat’s tool was that it required users to manually import Instagram cookies. They had to open their browser’s development tools, find the value of a specific cookie, then copy and paste that value into a text configuration file. Users complained that this was one of the most challenging parts of installing and configuring Bellingcat’s tool.
Misha’s addition makes it much easier for users to import their Instagram cookies into Bellingcat’s tool, which could make the tool easier for less technically proficient researchers to use.
Researchers can now simply open a browser window and log into their Instagram account; the enhancement handles the rest of the configuration automatically. Judges liked how the project effectively and simply fixed an existing accessibility issue.
Most Impactful: painlessPeek, developed by Ada Cable, is a browser extension for working more safely with traumatic imagery. Working with traumatic content, such as images of potential war crimes or civilian harm, presents consistent challenges for the open source research community. Exposure to such content frequently results in researchers experiencing mental health issues like burnout, depression and vicarious trauma, as discussed in this Bellingcat article on the subject.
This extension could be used by researchers who want to investigate such graphic content while reducing its potential to negatively impact their mental health. There are already several extensions for this purpose, such as Blurry and Blur. As the names suggest, they automatically blur potentially disturbing imagery.
However, painlessPeek offers several features which do not exist in other extensions. The tool allows users to selectively blur imagery, leaving important areas fully visible. It can also change imagery to a colour of the user’s choice. Judges liked the tool’s well-thought-out capabilities and how the extension could potentially contribute to reducing traumatic exposure for researchers examining graphic content.
The following tools were recognised as particularly useful or innovative by our judges.
Stratosphere, developed by Michele Dallachiesa, Eric Brichetto, Perry Borst, and Duplorers, is a platform for automatically collecting and structuring information from websites you visit. It uses a man-in-the-middle proxy in combination with custom web scrapers to extract structured data from a set of websites (including LinkedIn contacts, VK contacts and Google search results) and makes the data available through a dashboard.
Therefore, unlike the history function in an internet browser or an add-on which records your steps such as Hunchly, Stratosphere doesn’t simply record the fact that you visited a certain page but pulls all the content from said page where possible. This allows you to easily access and analyse the most important data fields from the websites you visit. Judges liked how it utilised common researcher workflows (navigating the web), so that after it’s been installed and configured, it can extract information automatically without disrupting the research process.
Instagram Story Visualiser, developed by Lim Zhen Guang and Lam Wei Ern, is a tool for monitoring and visualising the locations of a specified set of users’ Instagram Stories. It periodically scrapes these users’ Instagram accounts, extracts location data if it exists, and displays the location data in an interactive map. This could be used by researchers who want to analyse the pattern of life for a particular group of Instagram users, or to help determine if two different accounts actually belong to the same person. Judges liked how the tool addresses a real need they encountered in previous investigations.
From AI Skim Reading to the OSINT Compass
Other projects to come out of the hackathon are listed below, alongside links so that anyone interested can try the tools out for themselves. These tools are listed in alphabetical order.
ArchivEye, developed by Runtime Terror, ATroubledMaker, and deltagear, is a tool that provides a graphical user interface for indexing and searching through text in PDF documents. It was developed by and for the Industry of Programs and Practices for Youth Treatment (IPYT Index). This project organises and makes accessible documents and information about abuses in what media have called the “troubled teen industry” — a scandal-plagued system of primarily US-based facilities for reforming at-risk youth.
This tool could be used by a researcher who wants to easily organise and search through a collection containing many PDF documents, without having to install and use command line interface programs.
Crucially, it allows users to search such documents offline, without first uploading them to cloud services which provide OCR (optical character recognition). This is especially important when dealing with sensitive personal information about vulnerable individuals such as that held by the IPYT.
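Offline search over a document collection typically relies on a local index. The sketch below builds a simple inverted index in memory; it is not ArchivEye’s code, it assumes the text has already been extracted from the PDFs (extraction and OCR are not shown), and the file names and contents are invented.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query (AND search)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


# Hypothetical extracted text, invented for illustration.
docs = {
    "report_2019.pdf": "Inspection report for the facility, staff interviews attached",
    "letter_2020.pdf": "Letter to the licensing board about the facility",
    "memo_2021.pdf": "Internal memo on staff training",
}
index = build_index(docs)
print(sorted(search(index, "facility staff")))
```

Because both the index and the documents stay on the researcher’s own machine, nothing has to be uploaded to a third-party service.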
BiliBili Scraper, developed by Xiang, is a tool for scraping video information from BiliBili, a public video sharing platform popular in China. It could be used by researchers who want to monitor trends in the Chinese social media ecosystem, which, as Bellingcat has recently written, is often opaque to the open-source research community.
Bombardier Watch, developed by Jack Kerr, is a tool for identifying aeroplanes that arrive at or depart from the same location on the same day. It provides a web interface to use data from ADS-B Exchange to track a set of popular business jet models (limited to Bombardier, Gulfstream and Dassault). The tool could be used by researchers who want to track the comings and goings of the rich and powerful, and better understand the structure of their network.
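Spotting aircraft that share a location on the same day is essentially a grouping problem. The sketch below shows one way to do it on a small list of sightings; it does not use ADS-B Exchange data, and the aircraft labels, airport placeholders and the `co_located` function are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sightings: (aircraft label, airport placeholder, date seen there).
SIGHTINGS = [
    ("JET-A", "airport_x", date(2023, 4, 1)),
    ("JET-B", "airport_x", date(2023, 4, 1)),
    ("JET-C", "airport_y", date(2023, 4, 1)),
    ("JET-A", "airport_y", date(2023, 4, 2)),
    ("JET-C", "airport_y", date(2023, 4, 2)),
]


def co_located(sightings):
    """Group sightings by (airport, day) and keep groups with 2+ distinct aircraft."""
    groups = defaultdict(set)
    for aircraft, airport, day in sightings:
        groups[(airport, day)].add(aircraft)
    return {key: sorted(jets) for key, jets in groups.items() if len(jets) > 1}


meetings = co_located(SIGHTINGS)
for (airport, day), jets in meetings.items():
    print(airport, day.isoformat(), jets)
```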
Chat with Documents, developed by Radu Ciocan and Ana State, is a tool that provides an interface for asking questions about an uploaded document, using ChatGPT. The tool supports several file types, including PDF, DOCX, and TXT, and could be used by researchers who want to better understand a text-based file without having to read it in full.
DorkLab, developed by Ryan Willet and Ally Petitt, is a tool that helps users combine multiple advanced search operators into a single query, for several different search engines including Google, Bing, and Yandex. It also has a “quicksearch” feature that converts free-form text input into a query containing advanced search operators. This tool could be used by researchers who want to learn the syntax for advanced search operators across different search engines.
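Combining advanced search operators is mostly string assembly, as the following sketch suggests. It is not DorkLab’s implementation: the `build_query` and `google_url` helpers are hypothetical, and the operators shown (quoted phrases, `site:`, `filetype:`, `intitle:` and `-` for exclusion) are ones commonly used with Google; other search engines may support a different set or different syntax.

```python
from urllib.parse import urlencode


def build_query(phrase=None, site=None, filetype=None, intitle=None, exclude=()):
    """Join the supplied pieces into a single advanced-search query string."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')  # exact-phrase match
    if site:
        parts.append(f"site:{site}")  # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to a file type
    if intitle:
        parts.append(f"intitle:{intitle}")  # word must appear in the page title
    parts.extend(f"-{term}" for term in exclude)  # exclude these terms
    return " ".join(parts)


def google_url(query):
    """Return a Google search URL with the query safely URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(
    phrase="annual report", site="example.org", filetype="pdf", exclude=["draft"]
)
print(query)
print(google_url(query))
```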
Geotagging Telegram Bot, developed by Alina and Arina, is a Telegram bot that converts text to geospatial coordinates. For example, it could read an address given in the text of a Telegram post and offer approximate coordinates. It uses Yachay AI’s text-based geolocation detection model and could be used by researchers who want a starting point for geolocating the text content of a social media post.
GPT-OSINT, developed by gigz and wasphax, is a tool that provides an interface for asking questions about an uploaded document, using ChatGPT. In comparison to Chat with Documents, this tool supports a different set of file types, in particular Telegram exports from snscrape and the Telegram Desktop app. It could be used by a researcher who wants to quickly summarise all messages from a single Telegram channel, even if the messages are in a language the researcher doesn’t speak.
GRASSR00T, developed by Zlatko, is a tool that provides a messaging platform using Nostr, a decentralised social networking protocol. The tool connects to a specific Nostr relay and allows users to send and receive messages. It could be used by researchers who want to access Nostr relays in countries with internet censorship and communicate with people who live there. A live instance of the tool can be found here.
octo-web, developed by Richard Thewma, is a browser interface to the octosuite Python package, a tool for investigating GitHub users, repositories and organisations. You can read more about octosuite in this Bellingcat guide. octo-web could be used by researchers who want to conduct investigations into GitHub users and repositories without having to install and configure any Python packages. A live instance of the tool can be found here.
open-webcam, developed by Peter Kompasz, is a browser tool for collecting and monitoring camera streams. Inspired by a 2021 scientific paper about automatically discovering networked cameras, the tool allows users to add specific camera streams to the website. It could be used by researchers who want to easily monitor several camera streams in real time from a single website.
OSINT-compass, developed by Tom Wierick, her0marodeur, and Elias Herrmann, is a framework that provides a web interface to simultaneously use several tools commonly used in open-source research. The framework is designed to be modular and extensible, so new tools and services can be quickly integrated, making it easier to discover relevant information. The tools it currently supports include Wayback Machine, SpyCloud, NewsAPI, and snscrape. It could be used by researchers who want to scrape and analyse Twitter posts without having to install and configure any Python packages. A live instance of the tool can be found here.
PyScanner, developed by Neela Madheswari, is a command-line tool for scanning network ports. It could be used by a researcher who wants to detect whether any ports in their network are open.
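For readers curious what a basic port check involves, here is a minimal TCP connect scan using Python’s standard `socket` module. It is a generic sketch rather than PyScanner’s code, and the `is_open` and `scan` names are assumptions made for this example.

```python
import socket


def is_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


def scan(host, ports, timeout=0.5):
    """Return the subset of ports on host that accept TCP connections."""
    return [port for port in ports if is_open(host, port, timeout)]


if __name__ == "__main__":
    # Only scan machines you own or are authorised to test.
    print(scan("127.0.0.1", range(20, 26)))
```

A connect scan like this only reports whether a port accepts TCP connections; it says nothing about which service is listening behind it.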
telegramtrac, developed by Clarissa Mendes, is a tool that provides a browser interface to Telegram’s API. It allows users to use their Telegram API credentials to view posts from a specified Telegram channel. It could be used by researchers who want to scrape Telegram data without having to install and configure any software packages.