Identifying Suspicious Businesses, Reddit Analysis and Tracking Russian Propaganda: Here are the Results of Bellingcat’s First Ever Hackathon
Bellingcat hosted its first ever hackathon earlier this month with the event focussing on developing network analysis tools. We were impressed with the quality of the projects and had a great time getting to know the developers, many of whom work at the intersection between open-source research and open-source software.
We’re excited to present their final projects here with a brief explanation of what they can do and we hope to see them used by open-source researchers in the future. In addition, Bellingcat is offering Tech Fellowships to the developers of the winning projects to further develop their tools.
Network analysis tools are useful to open source investigators as they allow connections to be drawn between groups, individuals or items that may be related to each other in ways that are not always entirely obvious to the naked eye. A network refers to a collection of digital objects, known as “nodes”, and the connections between those objects, referred to as “edges”.
For example, Bellingcat contributor Benjamin Strick made use of the Gephi network analysis tool to identify a network of pro-Indonesian bot accounts spreading propaganda under specific hashtags back in 2019. In this instance, the nodes were Twitter accounts and the edges between them indicated that one Twitter account had mentioned, liked, retweeted, or quote-tweeted another Twitter account. As the entrants to our hackathon skilfully demonstrated, the same principle can be applied to other social media platforms (such as Facebook, VK, Reddit) or large datasets (like those held in public company registers) where edges can show the connection and behaviour between various entities (or nodes).
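To make the node-and-edge idea concrete, here is a minimal Python sketch. It is not the method used in the Gephi investigation mentioned above. The account names and the `interactions` list are invented for illustration, and each record simply says that one account mentioned, liked, or retweeted another. Accounts become nodes, and repeated interactions between the same pair are counted on a single edge.

```python
from collections import defaultdict

# Hypothetical interaction records: (source account, target account, type).
interactions = [
    ("acct_a", "acct_b", "retweet"),
    ("acct_a", "acct_c", "mention"),
    ("acct_b", "acct_c", "like"),
    ("acct_d", "acct_c", "retweet"),
    ("acct_a", "acct_b", "mention"),
]


def build_graph(records):
    """Return (nodes, edges), where edges maps (source, target) to a count."""
    nodes = set()
    edges = defaultdict(int)
    for source, target, _kind in records:
        nodes.update((source, target))
        edges[(source, target)] += 1
    return nodes, dict(edges)


def in_degree(edges):
    """Count how many distinct accounts point at each account."""
    counts = defaultdict(int)
    for _source, target in edges:
        counts[target] += 1
    return dict(counts)


nodes, edges = build_graph(interactions)
print(sorted(nodes))
print(edges)
print(in_degree(edges))
```

An account that many others point at, such as `acct_c` in this toy data, is the kind of node a researcher might look at first when searching for the hub of a coordinated network.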
It must be noted that some of the tools listed below have been designed for an audience that has an above average technical understanding. Still, each has its own unique practical use and capability that could help open source researchers in a variety of scenarios.
Bellingcat is hosting another hackathon during the weekend of September 23-25, with a more general focus on digital investigation tools. We still have spots available; see our announcement for more information. Applicants can apply individually or as a team.
First Place: Socnet Dynamic Image Search, developed by Ulysses, is a tool for identifying VK social media accounts of interest. The tool generates a social network by scraping the profile of a specified VK account, that account’s friends, and their friends of friends. It then uses OpenAI’s CLIP model to let users search for accounts within that network that posted pictures matching a given phrase, for example “soldier” or “flag”. Judges liked how well the image search worked, and the clear applicability of the tool to open-source investigations involving the Russian-language internet. This tool could be used if a researcher has the VK account of a Russian soldier and wants to find the accounts of other soldiers associated with them.
Second Place: GraphNAVI, developed by Ilya Boyandin and Peter Kompasz, is a tool that allows users to upload a tabular data file, select nodes and edges (or items of interest and connections between them) from the data using SQL statements, and visualise the network. The tool can be viewed from the GraphNAVI website. Judges liked that it provided an end-to-end solution that covered multiple steps in a typical network analysis workflow, as well as the technical innovation of the backend. This tool could be used if a researcher has a tabular data file (CSV or Parquet) and wants to easily generate a network visualisation without having to write a conversion script or use Gephi’s more complicated interface. The ability to tweak visualisation parameters, including thresholding edge weights (i.e. discarding less important connections), can also be useful for iteratively prototyping a visualisation.
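The idea of selecting edges with SQL and then thresholding their weights can be sketched in a few lines. This is not GraphNAVI’s code. It uses Python’s built-in `sqlite3` module as a stand-in database, and the `payments` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical tabular data: one row per payment between two parties.
rows = [
    ("alice", "bob", 120.0),
    ("alice", "bob", 80.0),
    ("bob", "carol", 15.0),
    ("carol", "dave", 300.0),
    ("alice", "dave", 5.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (sender TEXT, receiver TEXT, amount REAL)")
conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)


def select_edges(connection, min_weight):
    """Aggregate rows into weighted edges and drop those below min_weight."""
    query = """
        SELECT sender, receiver, SUM(amount) AS weight
        FROM payments
        GROUP BY sender, receiver
        HAVING SUM(amount) >= ?
        ORDER BY sender, receiver
    """
    return connection.execute(query, (min_weight,)).fetchall()


all_edges = select_edges(conn, 0)
strong_edges = select_edges(conn, 100)
print(all_edges)
print(strong_edges)
```

Raising `min_weight` step by step and redrawing the graph is one way to prototype a visualisation iteratively: weak connections fall away and the main structure becomes easier to read.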
Third Place: Graphinator, developed by Alex Trefilov, is a tool for creating network graphs. It allows users to add, edit, and remove nodes, export the graph visualisation, and expand the graph using built-in scraping modules, including a Twitter scraper. Judges liked its easy-to-use interface and how well it addressed the previously identified need for a tool to draw simple network graphs. This tool could be used if a researcher wants to make a good-looking, relatively small network visualisation for a news article or publication without having to use image editing software or more complicated network software.
Sugartrail, developed by Sean Greaves, is an interface to the UK business registry, Companies House, that allows users to find companies and individuals that are connected in various ways, for example companies with the same individual on their board or companies registered to the same address. See Sean’s Jupyter Notebook tutorial for an example of the tool’s use in an investigation. Judges liked the detailed tutorial showing how it could be used to investigate a potentially suspicious business owner. This tool could be used if a researcher has the identity of a specific business or board member and wants to find other businesses and corporate officers associated with them.
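The kind of connection described here, companies linked by a shared officer or a shared registered address, can be illustrated with a short sketch. This is not Sugartrail’s implementation and it does not call the Companies House registry. The company names, officers, and addresses below are invented.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; real data would come from the Companies House registry.
companies = [
    {"name": "Alpha Ltd", "officers": {"J. Doe"}, "address": "1 High St"},
    {"name": "Beta Ltd", "officers": {"J. Doe", "R. Roe"}, "address": "9 Mill Ln"},
    {"name": "Gamma Ltd", "officers": {"P. Poe"}, "address": "1 High St"},
    {"name": "Delta Ltd", "officers": {"Q. Quo"}, "address": "4 Park Rd"},
]


def link_companies(records):
    """Return edges between companies that share an officer or an address."""
    by_key = defaultdict(set)
    for record in records:
        for officer in record["officers"]:
            by_key[("officer", officer)].add(record["name"])
        by_key[("address", record["address"])].add(record["name"])

    edges = defaultdict(set)
    for (kind, value), names in by_key.items():
        # Every pair of companies under the same key gets an edge.
        for a, b in combinations(sorted(names), 2):
            edges[(a, b)].add(f"{kind}: {value}")
    return dict(edges)


links = link_companies(companies)
for pair, reasons in sorted(links.items()):
    print(pair, sorted(reasons))
```

Each edge carries the reason for the link, so a researcher can see whether two companies are tied together by a person, a place, or both.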
From Reddit Analysis to Tracking Russian Propaganda
Other projects to come out of the hackathon are listed below, alongside links so that anyone interested can try the tools out for themselves. These tools are listed in alphabetical order.
Gordian, developed by John Rodley, Eric Brichetto, and Ryan Willett, is a tool that provides utilities for extracting graph data from tabular data files and converting between several graph file formats. This tool could be used if a researcher has a data file in one format (for example Graph Modeling Language) and wants to convert it into a format supported by a specific network visualisation software.
Graph Modeller, developed by Henry Legge and John Smith, is a tool that allows users to compare Google and Bing search results and visualise shared keywords of result pages in a network graph. This tool could be used if a researcher wants to find the most relevant result from a search engine or understand how result pages relate to each other.
Graphwork, developed by Terence Denning and Richard Theuma, is a tool that allows users to upload a GEXF graph file and easily visualise the network, as well as view node properties and neighbour information. This tool could be used if a researcher has a GEXF graph file and wants to generate a simple visualisation without having to install or configure the parameters of other visualisation tools like Gephi.
Reddit Network Analysis Tool, developed by cbot, is a tool that scrapes a subreddit’s posts and generates network graph files of Reddit users who post in the same subreddit, and subreddits that have posts by the same users. This tool could be used if a researcher is interested in a particular subreddit and wants to find similar subreddits.
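As a rough illustration of how subreddits can be linked through shared users, the sketch below weights each pair of subreddits by the number of users who posted in both. It is not the tool’s own code and does no scraping. The usernames and subreddit names in `posts` are invented.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical scraped posts: (username, subreddit).
posts = [
    ("user1", "r/maps"),
    ("user1", "r/osint"),
    ("user2", "r/osint"),
    ("user2", "r/maps"),
    ("user3", "r/osint"),
    ("user3", "r/python"),
    ("user1", "r/maps"),
]


def subreddit_overlap(records):
    """Weight each subreddit pair by the number of users posting in both."""
    subs_by_user = defaultdict(set)
    for user, subreddit in records:
        subs_by_user[user].add(subreddit)

    weights = defaultdict(int)
    for subs in subs_by_user.values():
        for a, b in combinations(sorted(subs), 2):
            weights[(a, b)] += 1
    return dict(weights)


overlap = subreddit_overlap(posts)
print(overlap)
```

Pairs with the highest weights are candidates for “similar” subreddits. Swapping the roles of users and subreddits in the same function would give the other graph described above, in which users are linked by the subreddits they share.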
RTAA-72, developed by Dimitris Papaevagelos from CVCIO, is a real-time dashboard for monitoring inauthentic accounts and mis/disinformation narratives on Twitter. This tool could be used if a researcher wants to track a specific propaganda narrative on Twitter in real-time and see the network structure of accounts espousing those narratives.
Scrapegoat, developed by Matticus Hunt and Ryan, is a tool for extracting outlinks from the output of snscrape’s twitter-user scraper (which gathers details such as user profiles, hashtags and searches). This tool could be used if a researcher wants to find the websites that a particular Twitter account has linked to.
Spevktator, developed by MischaU8 & Morsaki, is an interface to a database of VK posts from Russian news channels that can be used to research domestic Russian propaganda narratives. The interface is powered by the Datasette tool and can be viewed from the Spevktator website. This tool could be used if a researcher wants to track specific Russian state propaganda narratives. Alternatively, if a more technically knowledgeable researcher has a set of VK accounts they’re interested in, they can run their own instance of the tool to monitor and analyse the posts of those accounts.
Twitter Analysis and Network Visualization (TANV), developed by Timo Damm and Jakob Hauser, is a tool that scrapes post information from a specified Twitter account and generates a network visualisation of all accounts that have interacted with that account. This could be used if a researcher is interested in a particular Twitter account and wants to find other accounts that are associated with them.
Twitter Trend Analysis, developed by Shivansh Sethi and Rohit, is a dashboard providing analyses of tweets containing specified hashtags. The dashboard also aims to make it easier to detect inauthentic accounts. The tool is designed to help users analyse bot- or propaganda-based behaviour over a given period. This tool could be used if a researcher is interested in a specific Twitter hashtag and wants to determine whether the accounts boosting it are authentic.
If you are interested in helping online researchers learn more about the technical tools and methods that can enhance open source investigations, apply for Bellingcat’s October to December 2022 Tech Team Fellowship. We are looking for candidates with the skills to break down complex technical topics into small steps and easy-to-understand language.