How to Collect Sources from Syria If You Don’t Read Arabic

July 10, 2018

By Noor Nahas

Information from groups involved in international conflicts has never been as accessible as it is now. Rebel groups run social media accounts, videos of missile launches are posted to Twitter, and the production value of propaganda has skyrocketed.

For researchers and journalists who have not mastered Arabic, however, an old problem remains: how does one, with no knowledge of a region’s language, collect accurate and timely information?

While traditional resources like translators and sources remain important to any non-native speaker, a number of tools and strategies exist on the internet for doing some if not most of the work yourself.

This guide will introduce you to some free and easy-to-use tools that should get you started on researching groups and verifying content you find online.

Establishing a Basic Understanding

The Syrian civil war is a complex, convoluted mess of alliances, backers, and enemies. In a matter of days, two groups can go from allies to enemies and back to working closely together.

As you can see from this detailed chart of groups and factions by Cody Roche, which may already be partly outdated, keeping track of groups is extremely difficult at best.

You will need to familiarize yourself with the basic actors and overall factions, and learn to recognize major areas of interest. This involves looking at mainstream sources like the Associated Press or any other major news organization for a broad overview, or simply using a site like Wikipedia.

This will give you some initial insight for researching and identifying a group, as well as important information like city names, province names, and other geographical identifiers.

Once you have established a basic understanding, you’ll be ready to jump into what’s available in English-language resources.

Basic Tools for Arabic Research

At this point you have some idea of the conflict. You know the major factions, and you’ve gathered some solid English-language media sources.

Before going over any Arabic-language content, it’s important to have some basic tools to help you with your research.

Google Translate is an obvious one, giving you the ability to translate phrases and names into Arabic when doing searches online.

Optical Character Recognition (OCR) will help in identifying text in images that you would otherwise be unable to read, or that is not included in the text of the post.

Neither of these tools is guaranteed to be perfect. When the output doesn’t match the original text, you’ll need to correct it yourself.

To do so, you’ll need to install an Arabic keyboard layout on your computer or use a physical Arabic keyboard. Since this is a guide for people who don’t speak Arabic, I’ve put together the cheat sheet below to help you identify and match any letters that Google Translate or OCR software were not able to pick up.

Arabic Cheat Sheet
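When you need to check a word letter by letter against the cheat sheet, it can help to have each character spelled out by name. The following is a minimal Python sketch, not part of the original workflow, that uses the standard library’s Unicode database to list every letter in a word. The function name describe_letters is made up for illustration, and the sample word is the Arabic spelling of Daraa.

```python
import unicodedata


def describe_letters(text):
    """Return (character, code point, Unicode name) for each non-space character."""
    rows = []
    for ch in text:
        if ch.isspace():
            continue
        # Fall back to "UNKNOWN" for characters without a Unicode name.
        name = unicodedata.name(ch, "UNKNOWN")
        rows.append((ch, f"U+{ord(ch):04X}", name))
    return rows


# Illustrative input: the Arabic spelling of Daraa.
for ch, code_point, name in describe_letters("درعا"):
    print(ch, code_point, name)
```

Running the same function on an OCR result and on the text you expected makes it easier to see which individual letter was misread, even if you cannot read either word.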

The Process

So you’ve come across a post on Liveuamap about a group named al-Binyan al-Marsus in Manshia, Daraa.

Bunyan al Marsous live map update

Our goal will be to find their Telegram or other social media accounts.

Without knowing much about the group, we can run the English name through Google Translate and get a pretty rough translation: “عمليات البنيان المرسي”. We’ll take this and plug it into a Google search.

This will bring up the Facebook page of a similarly named Libyan rebel group. This isn’t really useful for our search, but you’ll notice that their name is written as “عمليات البنيان المرصوص”.

Because we can’t expect Google Translate to properly translate every phrase, checking its output against other search results is a great way of finding alternative or correct spellings of group names.

We can plug the name into Facebook, Twitter, or YouTube and look for anything that matches content from the initial post we found.

To save time, we can take the translation of Daraa, “درعا”, and search for the two together. Searching for “درعا عمليات البنيان المرصوص” on Twitter leads us to this page:
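If you cannot read the script, spotting where two spellings differ is hard to do by eye. As a rough sketch, and assuming only Python’s standard difflib module, the two spellings can be compared word by word; the helper name differing_words is hypothetical.

```python
from difflib import SequenceMatcher


def differing_words(candidate, reference):
    """Return (candidate_words, reference_words) pairs where the phrases disagree."""
    a, b = candidate.split(), reference.split()
    diffs = []
    # Compare the phrases as sequences of whole words.
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":
            diffs.append((a[i1:i2], b[j1:j2]))
    return diffs


translated = "عمليات البنيان المرسي"   # Google Translate output
found = "عمليات البنيان المرصوص"       # spelling seen in search results
for ours, theirs in differing_words(translated, found):
    print("ours:", " ".join(ours), "| theirs:", " ".join(theirs))
```

Any word reported here is a candidate for replacement in your next search. The comparison says nothing about which spelling is right, so the version that appears in the group’s own material should win.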

Twitter search results for “درعا عمليات البنيان المرصوص”
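Pasting right-to-left text into several search boxes by hand is error-prone, so the combined query can also be turned into ready-made links. The sketch below is only an illustration: the URL patterns for Google, Twitter, and YouTube are assumptions based on how those sites have commonly exposed search and may change, and build_search_urls is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumed search endpoints and query parameter names; verify before relying on them.
SEARCH_ENDPOINTS = {
    "google": ("https://www.google.com/search", "q"),
    "twitter": ("https://twitter.com/search", "q"),
    "youtube": ("https://www.youtube.com/results", "search_query"),
}


def build_search_urls(*terms):
    """Join the terms into one query and return a percent-encoded URL per site."""
    query = " ".join(t.strip() for t in terms if t.strip())
    return {
        site: f"{base}?{urlencode({param: query})}"
        for site, (base, param) in SEARCH_ENDPOINTS.items()
    }


# Location first, then the group name, as in the search above.
for site, url in build_search_urls("درعا", "عمليات البنيان المرصوص").items():
    print(site, url)
```

urlencode converts the Arabic characters to UTF-8 percent escapes, so the links survive being pasted into notes or chat tools that mangle right-to-left text.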

Combing through their tweets, or their media releases on the other social media channels you’ve searched, you will find images and videos that include their social media handles.

A typical tweet from a rebel group Twitter account

For the most part, these groups want to be found. If anything, you’ll be struggling against YouTube and Facebook as they take down pages and content.

By finding original sources you’ll often find hashtags and names of organizations and conflicts. You can use these to narrow down your search and find groups linked to each other.

Most importantly, in any images you find you should look for listings of social media sites. Telegram, an encrypted messaging service, is one of the most popular and consistent platforms for groups. If you find a group’s Telegram page, you’ll have a much more stable source of information from that group.

Because some of these platforms censor or delete pages with graphic content, it’s recommended that you find at least two or three pages where you can follow a group. Groups, especially larger ones that are consistently targeted on social media, are always sharing their most recent pages.
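Once you have post text or OCR output in hand, pulling out the handles can be partly automated. This is a minimal sketch under stated assumptions: it treats t.me and telegram.me links as Telegram channels and any @name as a possible handle, the sample caption is invented, and extract_handles is a hypothetical helper rather than an established tool.

```python
import re

# Telegram links such as https://t.me/name or telegram.me/name.
TELEGRAM_LINK = re.compile(
    r"(?:https?://)?(?:t\.me|telegram\.me)/([A-Za-z0-9_]+)", re.IGNORECASE
)
# @handles that are not part of an email address or another word.
AT_HANDLE = re.compile(r"(?<!\w)@([A-Za-z0-9_]{2,})")


def extract_handles(text):
    """Return de-duplicated Telegram channel names and @handles found in text."""
    return {
        "telegram": sorted(set(TELEGRAM_LINK.findall(text))),
        "at_handles": sorted(set(AT_HANDLE.findall(text))),
    }


# Invented example caption, not taken from a real account.
caption = "Telegram: https://t.me/example_channel  Twitter: @example_group"
print(extract_handles(caption))
```

Keeping the results per group in a simple list makes it easier to notice when a channel disappears and a backup address starts circulating.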

Using OCR

Optical Character Recognition (OCR) is a powerful tool when you do not have access to anything but an image of the text you are trying to translate.

Because there is no set standard for media releases and the level of professionalism can vary from group to group, not all releases include editable text or even written summaries.

To get around this and save time, you can use an OCR tool. There are many available, so try them out and see which one works best for you.

Using the previous Tweet from Bunyan Marsous, which has matching text in the tweet, we can compare the results.

First, to make it easier for the software to recognize the letters, we’ll crop out all the unnecessary imagery.

Cropped bunyan al marsous tweet

Using New OCR, we’ll upload the image and select the language to detect.

Language and image upload new on OCR

Pressing upload and OCR will give us the following result.

Cropping Language and image upload new on OCR

Without knowing any Arabic, we can see that the first and last sentences were good matches. There are some errors, but it’s close.

Now we can start tuning our input. By changing the bounding box around the text, the OCR site gives us this:

There are still some issues, but we now have three of the five lines written down, with some fixable errors.

There could be a variety of reasons why the site is not picking up the third or fourth lines. In this case, the background includes a lot of mixed graphical elements. By editing the image to fill that space with black and feeding it back into the site, we then get:

Cropping Language and image upload new on OCR 3

We now have four lines of something workable. Isolating the sentence that was not being identified at all gives us one more rough transcription:

Cropping Language and image upload new on OCR 4

 

Comparing the text…

أَم حَسِبتُم أَن تَدخُلُوا الجَنَّةَ وَلَمّا يَأتِكُم مَثَلُ الَّذينَ خَلَوا مِن قَبلِكُم مَسَّتهُمُ البَأساءُ وَالضَّرّاءُ وَزُلزِلوا حَتّى يَقولَ الرَّسولُ وَالَّذينَ آمَنوا مَعَهُ مَتى نَصرُ اللَّهِ أَلا إِنَّ نَصرَ اللَّهِ قَريبٌ

أم خسـبتم أن تدخنـور الجنة ولمـ بأتكـم مثل الذين خنـوا مـن قبيخـم هشـتهم التأسـاغ والضـزاغ وزلزنوا حتـى تقول السول والذين آقنـوا فغــة فتى نصز الله أال إن نص الله قريت

…will show a flawed but ultimately workable piece of text. In combination with the cheat sheet provided, we can make the appropriate corrections and then translate the text or search for its origins. This example text is verse 214 of al-Baqarah, the second surah of the Qur’an.
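The comparison above can also be done programmatically when, as here, the post includes matching text. The sketch below is an assumption-laden illustration rather than the method used in this guide: it strips vowel marks (Unicode category Mn) and the tatweel elongation character, which the two versions use differently, and then reports which OCR words still differ from the reference. The names strip_marks and compare_ocr are made up for the example.

```python
import unicodedata
from difflib import SequenceMatcher

TATWEEL = "\u0640"  # elongation character; it carries no meaning


def strip_marks(text):
    """Remove combining vowel marks (category Mn) and tatweel."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Mn" and ch != TATWEEL
    )


def compare_ocr(reference, ocr_output):
    """Return (word-level similarity, list of (ocr_words, reference_words) to fix)."""
    ref_words = strip_marks(reference).split()
    ocr_words = strip_marks(ocr_output).split()
    matcher = SequenceMatcher(None, ref_words, ocr_words)
    to_fix = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            to_fix.append((" ".join(ocr_words[j1:j2]), " ".join(ref_words[i1:i2])))
    return matcher.ratio(), to_fix


# First five words of the tweet text and of the OCR result shown above.
reference = "أَم حَسِبتُم أَن تَدخُلُوا الجَنَّةَ"
ocr_output = "أم خسـبتم أن تدخنـور الجنة"
ratio, to_fix = compare_ocr(reference, ocr_output)
print(round(ratio, 2))
for ocr_word, ref_word in to_fix:
    print("OCR:", ocr_word, "| reference:", ref_word)
```

Words that match after normalization can be searched as-is, so only the reported pairs need to be checked against the cheat sheet. When no reference text exists, strip_marks is still useful for cleaning OCR output before pasting it into a search engine.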

Most of the time you won’t be working on complicated texts like this, but it shows how powerful OCR can be when you’re working on translation.

Cropping Language and image upload new on OCR 5

Typical use of OCR software where the name of the Southern Front in Arabic was copied correctly and can easily be searched for.

Keywords and Key Images

The most important thing to remember when searching for groups is to always look for unique keywords or hashtags.

These channels are run by human beings who are doing their best to make their releases easy to find with the least amount of effort.

So finding common hashtags, keywords, or phrases you see being repeated throughout releases will often lead you to other groups.

Search showing National Union Twitter presence of multiple accounts, including back ups
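Repeated hashtags are easier to spot when they are counted rather than eyeballed. As a small sketch, assuming you have copied the text of several releases into a list, the following counts how many posts each hashtag appears in. The sample posts are invented and recurring_hashtags is a hypothetical helper name.

```python
import re
from collections import Counter

# In Python 3, \w matches Arabic letters and underscores as well as Latin ones.
HASHTAG = re.compile(r"#\w+")


def recurring_hashtags(posts, min_posts=2):
    """Return (hashtag, post_count) for tags seen in at least min_posts posts."""
    counts = Counter()
    for post in posts:
        # Use a set so a tag repeated within one post is counted once.
        counts.update(set(HASHTAG.findall(post)))
    return [(tag, n) for tag, n in counts.most_common() if n >= min_posts]


# Invented sample posts for illustration only.
posts = [
    "#درعا #البنيان_المرصوص update one",
    "update two #درعا",
    "#درعا #البنيان_المرصوص update three",
]
for tag, n in recurring_hashtags(posts):
    print(tag, n)
```

Tags that recur across accounts are good search terms for finding allied groups and backup channels, while tags that appear only once are usually less useful.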

The same idea helps when it comes to images and logos. Keeping note of common imagery and media styles helps to identify allied groups, backup channels, and ideologically related groups.

As you go through these channels, make sure you’re always asking whether what you’re reading is in line with the other channels you’ve seen. Disinformation and fake channels exist and can be very well made.

Limitations and Recommendations

What you’ve gained and learned from the guide will serve as a stepping stone into a larger world of open source Arabic research.

This guide is not a replacement for Arabic interpreters or for learning the language. A basic knowledge of Arabic, even from a short introductory course, can give you a much greater understanding of the content, and I can’t recommend it enough.

But even without full mastery of the language, this guide should serve as a foundation for researching groups in Syria and the region.

Noor Nahas

Independent researcher following the Syrian Civil War using open source research.

4 Comments

  1. Ramon Hernandez

    Dear Mr. Nahas, impressive work. Many hours of work, I hope with a minimum of risk. Question: in any of your investigations, have you found information about the activity of these organizations in Latin America, specifically Venezuela or Colombia?

    • Noor Nahas

      Most of my focus has been on rebel groups and international Jihadist organizations in Syria primarily. From what I can remember I have never encountered any connections with groups/activity in South America.

      There is a pretty large diaspora of Syrians and Lebanese in the region, but I think the most active group in the region has been (Lebanese) Hezbollah, which will be easier to find online.

      If you’ve heard of any of these groups becoming active in Venezuela or Colombia, that would be really interesting to read about.

  2. Mark Monday

    It is harder than difficult to boil down a complex investigation process into an article of this size. This is a valuable addition to OSINT writings.
