This New Tool Lets You Analyse TikTok Hashtags
In just a few years, TikTok has become one of the world’s most successful social networks. The company claims that its platform is used by over a billion people every month.
TikTok’s role during the invasion of Ukraine has challenged the common perception of it as a website used predominantly by young people to post videos of lip-synced songs or dance challenges. During Russia’s military buildup, users posted videos of military equipment being transported to the Ukrainian border. Since that equipment crossed that border, numerous TikToks have shown missiles, destroyed buildings or the daily life of Ukrainians who have to spend their days in shelters, leading some commenters to even call this a “TikTok war”.
But this isn’t the first time TikTok has played a role for open source investigators. Videos that appeared on TikTok in Myanmar in early 2021 showed men in uniform threatening to kill anti-coup protesters. During search efforts to find the disappeared US travel blogger Gabby Petito, TikTokers reportedly uploaded information that helped solve the case. Videos of mass protests, including Black Lives Matter marches in 2020, were also regularly shared on the platform.
In our previous guide detailing “how to investigate TikTok like a pro”, we introduced and detailed various search options that can help find relevant information on the app.
Nevertheless, it’s still tricky to monitor specific topics on TikTok over a longer period of time. Compared to those of other social media platforms, TikTok’s API (Application Programming Interface) presents more obstacles to developers. It can change frequently, making it harder for developers to use and harder for researchers to draw wider insights from trends as they develop over weeks and months.
Bellingcat’s Investigative Tech Team has therefore created the Bellingcat TikTok Analysis Tool (built on the basis of a TikTok scraper offered by Github user drawrowfly) that allows researchers to collect a dataset of TikToks associated with specific hashtags over lengthier periods. The tool also allows researchers to analyse what other topics appear together with selected hashtags most frequently.
When reviewing large datasets, it can be particularly interesting to discover which hashtags are regularly added to TikToks which already share one specific tag. Many TikToks contain multiple hashtags, as illustrated by the image below of a train transporting Russian military equipment from January 2022.
By analysing patterns in the joint use of hashtags researchers could, for instance, help identify coordinated disinformation or influence campaigns.
In such cases, the same set of hashtags may be used over and over again across several posts. Looking at a larger number of posts can also help researchers learn more about the context in which a specific hashtag is used.
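As a rough illustration of this idea, the Python sketch below counts how often exactly the same combination of hashtags recurs across posts. It is not part of Bellingcat's tool: the function name `repeated_hashtag_sets`, the `min_posts` threshold and the sample data are all invented for this example, and a repeated combination is only a hint worth investigating, not proof of coordination.

```python
from collections import Counter


def repeated_hashtag_sets(posts, min_posts=2):
    """Return hashtag combinations that appear, unchanged, in several posts.

    `posts` is a list in which each item is the list of hashtags of one post
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter(
        frozenset(tag.lower() for tag in tags)  # ignore order and letter case
        for tags in posts
        if tags
    )
    return [
        (sorted(combo), n)
        for combo, n in counts.most_common()
        if n >= min_posts
    ]


# Invented example data: each inner list holds the hashtags of one post.
sample_posts = [
    ["ukraine", "russia", "war"],
    ["war", "Ukraine", "russia"],
    ["ukraine", "eurovision"],
    ["russia", "ukraine", "war"],
]

for combo, n in repeated_hashtag_sets(sample_posts):
    print(n, combo)
```

Treating each combination as an unordered set means that two posts listing the same hashtags in a different order are counted together.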
This is what Bellingcat’s TikTok tool does
With the help of this tool, we were able to compile a list of 732 TikToks which used the hashtag #schützteurekinder. This German-language hashtag translates to “protect your children”. It’s similar to the English #savethechildren hashtag which was widely used by QAnon believers on TikTok before the platform took measures to stop its spread by conspiracy theorists.
We wanted to find out whether this German hashtag has been used in a similar context and, if so, what that might mean.
Our tool then allowed us to view all the hashtags that showed up most frequently alongside #schützteurekinder in the 732 TikToks in our dataset. We found that the 30 most frequently used hashtags included a combination of keywords that are often used in anti-vaxx communities.
The list included terms such as ungeimpft (unvaccinated), stehtauf (stand up), wahrheit (truth) and gegenimpfpflicht (against vaccine mandate).
Analysis with our tool suggests that German anti-vaxx communities are taking over the #schützteurekinder hashtag.
This is noteworthy as this hashtag has previously been used by concerned parents to, among other things, warn of the dangers of publishing pictures of their children on the internet.
It now appears that an individual who has these concerns, or who uses this hashtag for other purposes, could potentially be exposed to or drawn into Covid-19 conspiracy or anti-vaxx content.
How to use Bellingcat’s TikTok tool
Using Bellingcat’s TikTok tool is simple enough (see a condensed set of instructions on GitHub). Nevertheless, it’s worth providing a step-by-step guide for the benefit of researchers of all levels.
Importantly, it’s necessary to familiarise yourself with the command line. But if you have never done this before, don’t worry, we will show you how.
In order to collect the required amount of data you will have to activate the Bellingcat TikTok Analysis Tool regularly, perhaps daily.
Each time the tool is activated, it will start to identify new TikToks that have been labelled with your chosen hashtag and add them to a dataset that will be stored locally on your computer.
It is of course also possible to automate data collection by setting up a server but this is beyond the scope of this article.
The TikTok Analysis Tool will go to the page for the hashtag(s) you selected on the site — for example, here’s the page for #ukraine — and collect any text which accompanies the trending TikToks listed there.
However, there are two limitations to keep in mind when using our tool to analyse the data collected in this way.
Firstly, the tool does not at present usually collect more than approximately 1,000 TikToks at a time. This limitation seems to originate in the way the scraper, which the tool relies on to obtain TikToks, uses TikTok’s API. This means that, unless a hashtag has appeared on only very few occasions, our tool will not collect all the TikToks which used it. Instead, it will provide a snapshot of what is happening on the platform. This must be taken into account when analysing the collected data.
Secondly, we are not currently aware of a method to only collect the newest TikToks due to the way the platform’s algorithm works. If you’ve conducted research on TikTok, you’ll know that some videos trend over and over again, even though they may be several months or even years old. The platform’s hashtag pages work in a similar way. They display currently trending videos for each hashtag irrespective of when those videos were initially posted. Exactly how the platform’s algorithms work remains TikTok’s secret. However, with a few exceptions, it appeared to the Bellingcat Tech Team that the number of likes a video received played some role in determining which showed up first on a page detailing specific hashtags.
Bearing these important caveats in mind, the Bellingcat TikTok tool works as follows: The first time you activate the tool, it looks at roughly the first 1,000 trending TikToks that are listed on the selected hashtag page, adding any text associated with them to the database stored on your computer.
The next time you use the tool, it again searches through the list of trending TikToks for your selected hashtag. But this time it only adds those that made it into the top TikToks since you last checked and which you don’t yet have in your database.
Since TikToks often trend for several weeks or even months, in most cases you should expect your database to grow slowly. This being the case, we recommend that you activate the tool on a regular basis over a longer period of time in order to collect a wider variety of data.
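The incremental behaviour described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual code: the function name `merge_new_posts` and the `"id"` and `"text"` fields are hypothetical, and the sketch simply assumes that every post carries a unique identifier.

```python
def merge_new_posts(stored_ids, scraped_posts):
    """Return the posts whose id is not yet stored, and remember their ids.

    `stored_ids` is a set of ids already in the local database;
    `scraped_posts` is a list of dicts with a hypothetical "id" field.
    """
    new_posts = []
    for post in scraped_posts:
        post_id = post["id"]
        if post_id not in stored_ids:
            stored_ids.add(post_id)  # record the id so later runs skip it
            new_posts.append(post)
    return new_posts


# Invented example: two runs whose trending lists partly overlap.
stored = set()
first_run = [{"id": "1", "text": "a"}, {"id": "2", "text": "b"}]
second_run = [{"id": "2", "text": "b"}, {"id": "3", "text": "c"}]

print(len(merge_new_posts(stored, first_run)), "new posts in the first run")
print(len(merge_new_posts(stored, second_run)), "new post in the second run")
```

Because the second run mostly returns TikToks that are still trending, only the post that was not seen before is added, which is why the database tends to grow slowly.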
Download the TikTok tool on your computer
Now you can download drawrowfly’s scraper by typing:
npm i -g tiktok-scraper
And with that, the scraper should be installed.
Continue to use the terminal and go to the Desktop by using the command cd (on a Mac, for example, cd ~/Desktop). This stands for “change directory” and brings you from your current location to the Desktop.
You’ll now need to download Bellingcat’s TikTok tool. To do this you simply go to the following Github link that opens the page for our tiktok-hashtag-analysis tool. Click on the “Code” button (detailed in the image below) and choose the option Download ZIP.
The downloaded file should already be unzipped but, if not, go to your Downloads folder and unzip it there.
Now you can move the TikTok tool from the Downloads folder to the Desktop. You can do it the usual way or via the command line. If you use the command line, you first must type mv, which stands for “move”. Following mv, you add the path of the current location of the tool (the Downloads folder) followed by the path to the Desktop folder. It should look something like this:
mv ~/Downloads/tiktok-hashtag-analysis-main ~/Desktop
Now let us move to the tool folder using the command below:
cd ~/Desktop/tiktok-hashtag-analysis-main
You can of course also move the tool folder to a location other than the Desktop. However, we chose this location to make it as easy as possible to see where the tool actually ends up on your computer.
Start a virtual environment
We recommend using our tool within a virtual environment. A virtual environment is a guarded space within your operating system. Here, you can work on projects without breaking the current configuration of your computer’s operating system, which might sometimes happen if you install external software.
Create a virtual environment by typing:
python3 -m venv env
Start your virtual environment by typing:
source env/bin/activate
On Windows, the command is:
env\Scripts\activate
You should now start to see the prefix (env) at the beginning of the terminal prompt. Now, install all the needed dependencies by typing:
pip install -r requirements.txt
Your first steps with the TikTok tool
You are finally ready to use the tool. The first step is to run the tool to collect TikToks for your chosen hashtag.
We are already in the tool folder tiktok-hashtag-analysis-main, and now let us go to the subfolder tiktok_hashtag_analysis in order to access and run our tool’s python scripts.
Starting the actual scraping process is easy. Let’s say we want to gather information about events that may be unfolding in Ukraine utilising the #ukraine hashtag. To do so, input the following command:
python3 run_downloader.py -t ukraine -p
Let us break down this command to understand what is happening here. The word python3 tells the computer that you will be running a Python script. run_downloader.py is the name of the file that you are running. The “-t” flag tells the tool that the words that follow are the hashtags for which you want to download post text. In this example we chose the hashtag #ukraine but you can replace it with any other, depending on the subject you are researching. Just take care to leave the # symbol out of the command (type ukraine instead of #ukraine).
The “-p” flag stands for posts and tells the tool to download only the text and not the actual TikTok videos.
You can also download the actual TikTok videos (in mp4 format) along with the post text but this feature is currently not optimised. In general, it is worth keeping in mind that downloading videos requires much more storage, and much more data transfer, than downloading only the JSON-formatted post data.
We recommend skipping this task for now. Still, if you want to experiment with downloading TikTok videos, you can use the following command:
python3 run_downloader.py -t ukraine -v
You just replaced the “-p” with “-v” at the end of the command which stands for videos.
If you want to download both the videos and the text of TikToks for your chosen hashtag, just add both “-p” and “-v” flags at the end:
python3 run_downloader.py -t ukraine -p -v
Note that drawrowfly’s scraper is currently unable to download videos and their corresponding text in a unified format, meaning that you’ll need to run multiple TikTok scraper requests.
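If you prefer to assemble these commands in a script rather than typing them, a small Python helper along the following lines may be useful. It is only a sketch: the function name `build_download_command` is invented here, and it uses nothing beyond the script name and the “-t”, “-p” and “-v” flags described above. It builds the command but does not run it.

```python
def build_download_command(hashtags, posts=True, videos=False):
    """Assemble the argument list for run_downloader.py (hypothetical helper).

    Leading '#' characters and surrounding spaces are removed, since the
    hashtags must be passed without the # symbol.
    """
    cleaned = [h.strip().lstrip("#") for h in hashtags]
    cleaned = [h for h in cleaned if h]  # drop empty entries
    if not cleaned:
        raise ValueError("at least one hashtag is required")
    command = ["python3", "run_downloader.py", "-t", *cleaned]
    if posts:
        command.append("-p")  # download post text
    if videos:
        command.append("-v")  # download videos
    return command


print(" ".join(build_download_command(["#ukraine"])))
print(" ".join(build_download_command(["ukraine"], posts=True, videos=True)))
```

The returned list could then be passed to Python's `subprocess.run` from inside the folder that contains run_downloader.py, although that step is left out here.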
Download the data for several hashtags
You can also download the results for several hashtags by simply adding multiple search terms to the command line. This is helpful if you want to quickly download the results for several hashtags that might be relevant for your research (but this method cannot specifically search for posts that include several of these hashtags at once). In this case, the command would look something like this:
python3 run_downloader.py -t ukraine russia us germany -p
However, if you want to collect the post text for a longer list of hashtags (for instance 50 or 100 hashtags) over a longer period of time, we suggest turning those hashtags into a list (which we will explain below) and running them all together with only one command instead of typing each hashtag in the command line each time.
The format of the list document has already been created for you and is waiting to be edited for your own purposes in the tool’s folder.
Taking a short break from the command line, let’s go and edit the list. Using the Mac Finder, go to the tiktok_hashtag_analysis subfolder (in our case Desktop → tiktok-hashtag-analysis-main → tiktok_hashtag_analysis).
Within tiktok_hashtag_analysis you will find a file called hashtag_list.txt. Open this file by right clicking and choosing for instance the editor IDLE (which is standard for Macs) and you should see something like this:
Four hashtags (london, paris, newyork, tokyo) are provided as an example. You can change the list according to your preferences. If you want to add more hashtags, please add the new hashtag in a new line. Then save the file.
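Before saving a long list, it can help to tidy it up. The Python sketch below takes the text of such a list, with one hashtag per line as described above, and removes blank lines, leading # symbols and duplicates. The function name `clean_hashtag_list` is invented for this example, and lower-casing every entry is our own assumption for spotting duplicates, not something the tool requires.

```python
def clean_hashtag_list(text):
    """Return the hashtags found in `text`, one per line, tidied up.

    Blank lines, leading '#' symbols and duplicates are removed.
    Lower-casing is an assumption made here to detect duplicates.
    """
    seen = set()
    cleaned = []
    for line in text.splitlines():
        tag = line.strip().lstrip("#").lower()
        if tag and tag not in seen:
            seen.add(tag)
            cleaned.append(tag)
    return cleaned


# Invented example text in the same one-hashtag-per-line layout.
example = "london\nparis\n\n#NewYork\ntokyo\nparis\n"
print(clean_hashtag_list(example))
```

The cleaned entries can be joined with newline characters and written back to the list file.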
Each time you run the TikTok Analysis Tool, you can start the downloading process for all the hashtags in your list by entering the following instruction into the command line:
python3 run_downloader.py -f hashtag_list -p
How long the scraping process takes will depend on how many hashtags you have added to your list.
Find your downloaded data
You have now downloaded the text files for one or several hashtags on your computer. You can find the downloaded data in the data folder which is within the tool folder tiktok-hashtag-analysis-main.
The data folder contains subfolders. These concern each of the hashtags that you downloaded (for instance, if you have searched for the hashtag ukraine, this will have its own folder).
Let’s use our ukraine hashtag downloads as an example to explore how the downloaded data is organised. First, we click on the folder labelled after our search term (in this case, and as detailed in the image below, ukraine). All the text from the posts we captured that include this hashtag is stored in the data.json file in the “posts” folder. JSON is a file format that is commonly used to send data from a web application to a server and vice versa. It contains structured text and has the advantage that it can be read by most programming languages.
If you also downloaded videos, you will also see a videos folder here.
The ids folder contains the unique identifiers (provided by TikTok) for all posts and videos that you downloaded.
In addition, the tool folder also contains a file (logfile.log) that provides an overview of how many new posts were downloaded for each hashtag each time the tool was activated.
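For a quick overview of what has been collected so far, a short Python script can walk through the data folder and count the entries stored for each hashtag. The sketch below follows the folder layout described above (one subfolder per hashtag containing posts/data.json) but makes one assumption we have not verified: that the top level of data.json is a JSON list with one item per post. The function name `count_posts` is invented, and the demonstration uses a temporary folder with made-up data rather than real downloads.

```python
import json
import tempfile
from pathlib import Path


def count_posts(data_dir):
    """Map each hashtag folder to the number of entries in posts/data.json.

    Assumes (unverified) that data.json holds a JSON list of posts.
    """
    counts = {}
    for hashtag_dir in sorted(Path(data_dir).iterdir()):
        posts_file = hashtag_dir / "posts" / "data.json"
        if posts_file.is_file():
            with posts_file.open(encoding="utf-8") as handle:
                counts[hashtag_dir.name] = len(json.load(handle))
    return counts


# Demonstration with a temporary folder and invented data.
with tempfile.TemporaryDirectory() as tmp:
    posts_dir = Path(tmp) / "ukraine" / "posts"
    posts_dir.mkdir(parents=True)
    (posts_dir / "data.json").write_text(
        json.dumps([{"id": "1"}, {"id": "2"}]), encoding="utf-8"
    )
    demo_counts = count_posts(tmp)

print(demo_counts)
```

To try it on real downloads, you would pass the path of the tool's data folder to `count_posts` and adjust the function if the file turns out to be structured differently.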
Analysing the data
With that data downloaded, you are now ready to do some analysis. Bellingcat’s TikTok tool is able to identify which other hashtags show up most frequently alongside your chosen hashtag.
Let’s find out which hashtags can be found most frequently in the TikTok posts in our #ukraine dataset. The next few steps will show you how to create a top hashtags list.
To begin, go back to the command line and punch in the following command:
python3 hashtag_frequencies.py ukraine 31 -d
You can change the word ukraine to any of your previously downloaded hashtags you are interested in by simply changing that term and maintaining the rest of the instruction.
Let’s look at what this command does. You already know the first part (python3 hashtag_frequencies.py) which tells your computer to run a Python script called hashtag_frequencies.py. The “-d” argument specifies that the output should be displayed in the terminal.
At the end of the command we added the number 31. This means that we will get a list of the top 30 hashtags that were found in our downloaded posts for the hashtag ukraine (the first listed hashtag is always the original input hashtag). You can also change this number to bring back a larger or smaller list.
Here’s what our hashtag analysis looked like in early May 2022:
Occurrences is the number of times this specific hashtag was found in the downloaded dataset for the hashtag ukraine. You can see that ukraine is number 0 in the list because we only collected TikTok posts that contain this hashtag. #Ukraine can be found 3908 times in the database, which consists of 3920 posts in total. If the Total posts and Occurrences numbers differ, this is because a few posts do not contain the requested hashtag, which seems to be due to an anomaly within the TikTok API.
All the other listed hashtags were found alongside the baseline hashtag ukraine in a number of TikTok posts. The hashtag russia, for instance, was found 2014 times. #prayforukraine showed up 108 times.
Frequency is the number of posts in the dataset that contain the listed hashtag divided by the total number of posts. It tells you how common this specific hashtag is. In our dataset #russia is by far the most common accompanying hashtag, followed by #fyp, which stands for “for your page” and is used by TikTok users hoping their content will be shown more often to individual users. #War is the third most frequent hashtag in our dataset.
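To make the two columns concrete: with the figures above, the frequency of #ukraine is 3908 / 3920 ≈ 0.997, and that of #russia is 2014 / 3920 ≈ 0.514. The Python sketch below computes occurrences and frequency in the same way for a small invented dataset. It illustrates the calculation only and is not the code of hashtag_frequencies.py; the function name and the input format (one list of hashtags per post) are assumptions made for this example.

```python
from collections import Counter


def hashtag_frequencies(posts, top_n):
    """Return (hashtag, occurrences, frequency) for the top_n hashtags.

    `posts` is a list in which each item is the list of hashtags of one post.
    Frequency = posts containing the hashtag / total number of posts.
    """
    total = len(posts)
    counts = Counter()
    for tags in posts:
        counts.update(set(tags))  # count each hashtag at most once per post
    return [(tag, n, n / total) for tag, n in counts.most_common(top_n)]


# Invented example data: the hashtags of four posts.
sample = [
    ["ukraine", "russia"],
    ["ukraine", "russia", "war"],
    ["ukraine", "fyp"],
    ["ukraine", "russia"],
]

for tag, occurrences, frequency in hashtag_frequencies(sample, 2):
    print(tag, occurrences, round(frequency, 3))
```

Each hashtag is counted once per post, so a post that repeats the same hashtag does not inflate its frequency.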
Now let’s compare this with the top hashtags in a dataset for the hashtag ukraine that we had scraped in early February 2022, before the start of the war in Ukraine:
Hashtags related to the war could already be found in our previous dataset but they were still mixed with topics like #eurovision or #comedy. More Russian words were visible (which were mainly general terms like “recommendation” or “top”) as well as the word “Ukraine” in Russian and in Ukrainian.
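A comparison like this can also be done programmatically once you have two top-hashtag lists. The following Python sketch reports which hashtags dropped out of the top list, which are new and which stayed. The function name `compare_top_hashtags` and both example lists are invented for illustration and do not reproduce our actual results.

```python
def compare_top_hashtags(before, after):
    """Compare two ranked hashtag lists and keep the original ordering."""
    before_set, after_set = set(before), set(after)
    return {
        "dropped": [tag for tag in before if tag not in after_set],
        "new": [tag for tag in after if tag not in before_set],
        "kept": [tag for tag in before if tag in after_set],
    }


# Invented example lists, not our real datasets.
earlier = ["ukraine", "fyp", "eurovision", "comedy", "russia"]
later = ["ukraine", "russia", "fyp", "war", "prayforukraine"]

changes = compare_top_hashtags(earlier, later)
for label, tags in changes.items():
    print(label, tags)
```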
You can also visualise the most common hashtags in your dataset using the following command:
python3 hashtag_frequencies.py -p ukraine 31
“-p” stands for plotting the result on a chart or graph and saving it in the following folder:
We recommend trying out this hashtag analysis with a few TikTok hashtags that are relevant to your own research topics.
You can also view a video of this process on our YouTube channel.
We plan to add further features to our TikTok tool over time. Anyone who would like to contribute to our tool and help perfect it is invited to do so. See the corresponding Github repository’s issues page for potential contribution ideas.
Editorial note (18/05/22): We have amended the installation instructions for node since the previous instructions caused installation difficulties on some computers.
Tristan Lee contributed to reviewing and improving the code of this TikTok tool.
The Bellingcat Investigative Tech Team develops tools for open source investigations and explores tech-focused research techniques. It consists of Aiganysh Aidarbekova, Tristan Lee, Miguel Ramalho, Johanna Wild and Logan Williams. Do you have a question about applying these methods or tools to your own research, or an interest in collaborating? Contact us here.