Tuesday, January 15, 2019

Architecture independent Malware Similarity Analysis with Joe Sandbox Class 3.0

Hunting for similar malware is the process of identifying similar samples based on IOCs, behavior, functions or other data. It helps analysts find malware families, understand the evolution of threats and gather indications for attribution.

There are various techniques to perform similarity analysis or classification. Often, the malware is disassembled and a unique identifier is calculated at the function level (e.g. from the instructions, opcodes, control flow graphs, API calls etc.). This process is called feature selection, and it is done on a large volume of malware. To check for similar malware, the feature database is then queried for all samples which share a set of identical features:
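As a rough illustration (not Joe Sandbox's actual implementation), such a function-level feature index can be sketched in Python as follows; the opcode sequences and sample names are made up:

```python
import hashlib
from collections import defaultdict

# Hypothetical in-memory feature index: feature hash -> set of sample names.
feature_index = defaultdict(set)

def function_feature(opcodes):
    """Reduce one disassembled function to a single feature hash.

    Here the feature is simply the opcode sequence; real engines may
    normalize operands or use control flow graphs and API call sequences.
    """
    return hashlib.sha256(" ".join(opcodes).encode()).hexdigest()

def index_sample(name, functions):
    """Feature selection: record every function feature of a sample."""
    for opcodes in functions:
        feature_index[function_feature(opcodes)].add(name)

def similar_samples(functions):
    """Query: count shared function features per known sample."""
    counts = defaultdict(int)
    for opcodes in functions:
        for name in feature_index.get(function_feature(opcodes), ()):
            counts[name] += 1
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage with made-up opcode sequences:
index_sample("deepwindow.exe", [["push ebp", "mov ebp, esp", "ret"],
                                ["xor eax, eax", "ret"]])
matches = similar_samples([["push ebp", "mov ebp, esp", "ret"]])
```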

Joe Sandbox Class 2.0, the similarity engine of Joe Sandbox, is based on this technique. To get a better idea, please have a look at the screenshot below, taken from a recent Emotet analysis. The first section contains the number of features in the database, followed by a list of processes. For the Emotet process 161.exe, all similar samples are listed. On the right side, a bar indicates the number of similar functions for each similar process. For instance, the sample deepwindow.exe has 79 similar functions.

Using disassembly data for similarity analysis has many benefits, such as the possibility to use differential hashing, as well as the analytical value of the matched data itself.

However, malware authors have the freedom to write malware in any programming language, including C#, VB.Net, Java, PowerShell, VBS or JavaScript. Generating meaningful disassembly and functions out of all these frameworks is a very challenging task.

Secondly, malware also targets other operating systems such as Linux, macOS or Android. Again, we have a large variety of new frameworks and programming languages to support. Think of Python, Bash, Golang, Lua etc.

Finally, x86 and x64 code can be well obfuscated, making the disassembly and feature selection extremely difficult.

Isn't there an easier way to perform similarity analysis on all of these architectures?

There is, but let us first have a look at something else: Behavior Signatures. Joe Sandbox executes malware in a controlled environment and during execution, it records dynamic data such as system calls, API calls, memory dumps etc. To identify and rate that dynamic data, we write rules, so-called Behavior Signatures. Here is an example:

Joe Sandbox has one of the largest behavior signature sets in the industry. The set includes nearly 2,000 manually written behavior signatures, detecting malware on Windows, Android, macOS, Linux and iOS. Please note that a behavior signature does not care about the programming language used by the malware; it just detects a fact about the behavior. Behavior signatures are thus abstractions of the code, and therefore the perfect features for similarity analysis.
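To illustrate the idea (the actual Joe Sandbox signatures are far richer), a behavior signature can be thought of as a named predicate over the recorded dynamic data. A minimal Python sketch with two hypothetical rules and a made-up report structure:

```python
# Two toy, hypothetical behavior signatures: named predicates over the
# recorded dynamic data (shell commands and registry writes here),
# independent of the language the malware was written in.

def sig_deletes_shadow_copies(report):
    return any("vssadmin" in cmd and "delete shadows" in cmd
               for cmd in report.get("commands", []))

def sig_writes_to_autostart(report):
    run_key = r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run"
    return any(key.startswith(run_key)
               for key in report.get("registry_writes", []))

SIGNATURES = {
    "Deletes shadow copies": sig_deletes_shadow_copies,
    "Writes to autostart registry key": sig_writes_to_autostart,
}

def matched_signatures(report):
    """The set of matched signature names is the analysis' feature vector."""
    return {name for name, rule in SIGNATURES.items() if rule(report)}

# Toy report for a single analysis:
report = {"commands": ["vssadmin.exe delete shadows /all /quiet"],
          "registry_writes": []}
features = matched_signatures(report)
```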

In Joe Sandbox Class 3.0, which will be part of our upcoming Joe Sandbox v25 Tiger's Eye release, we have successfully implemented similarity analysis based on behavior signatures. The results are really good, so let us have a look at a couple of recent samples.

Windows: LokiBot

The results of the signature similarity have been integrated into the Joe Sandbox main analysis report. However, there is also a separate report which contains just the similar sample information:

From the top navigation, go to Overview and then Signature Overview. What you see there is what we call the signature similarity graph:

Each node represents a malware analysis (not a malware sample!). If two nodes are connected, the analyses are similar. The number, as well as the color, indicates how similar they are. Each node carries the name of the sample submitted to Joe Sandbox as well as a color bar. The color bar represents all the behavior signatures which matched. You can hover over the bar with your mouse to see which signatures were hit:

The color bar helps to see why two analyses are similar. The graph itself is interactive: you can use your mouse wheel to zoom in or out. If a node has a small plus symbol, you can expand the graph. The minus symbol collapses the connected subgraph:
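Conceptually, such a graph can be built by comparing the matched-signature sets of all analyses pairwise, e.g. with a Jaccard similarity, and drawing an edge whenever the score exceeds a threshold. A minimal sketch with made-up analysis names (the scoring used by Joe Sandbox Class 3.0 may well differ):

```python
from itertools import combinations

def jaccard(a, b):
    """Similarity of two analyses given their matched-signature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def similarity_graph(analyses, threshold=0.5):
    """Edges between all pairs of analyses that are similar enough."""
    edges = []
    for (n1, s1), (n2, s2) in combinations(analyses.items(), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            edges.append((n1, n2, round(score, 2)))
    return edges

# Toy feature vectors (signature names are placeholders):
analyses = {
    "sample_a.exe": {"sig1", "sig2", "sig3"},
    "sample_b.exe": {"sig1", "sig2", "sig4"},
    "sample_c.exe": {"sig9"},
}
edges = similarity_graph(analyses)
```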

Let us focus on the graph structure of LokiBot, a very famous and active information stealer. On the left side of the graph, you find many samples with high similarity. We manually verified that they are all LokiBot. The samples on the right are also confirmed LokiBot, but an older version. Right after the graph, you find a list of all similar samples, including a link to each behavior report:

Windows: NanoCore RAT

LokiBot is written in C/C++, so it could also have been detected with function-based similarity analysis. NanoCore RAT is a remote access tool developed in .NET. The corresponding similarity graph looks as follows:

What are some of the most common behaviors of NanoCore RAT? Here is a list:

  • Uses schtasks.exe or at.exe to add and modify task schedules
  • Hides that the sample has been downloaded from the Internet (zone.identifier)
  • Detected unpacking (overwrites its own PE header)
  • .NET source code contains potential unpacker
  • Detected TCP or UDP traffic on non-standard ports
  • Uses dynamic DNS services
  • Injects a PE file into a foreign process
  • Parts of these applications are using the .NET runtime (Probably coded in C#)
  • Initial sample is a PE file and has a suspicious name

Because NanoCore RAT is written in .NET, x86/x64 ASM-based function similarity analysis would fail. The same applies to ADWIND RAT, a remote access tool written in Java:

Android: Anubis

We have seen that behavior signatures work great for classifying analyses on Windows. How about Android? A particularly interesting sample is Anubis, a well-known banking Trojan which has been around for years. Besides the Trojan payload, it also has some ransomware functionality. Joe Sandbox detects Anubis right away:

The behavior similarity graph of Anubis is shown below:

All analyses are confirmed to be Anubis. The right subgraph has some very high similarities. We checked the analysis reports in detail and found out that they all come from a specific campaign where a link to Anubis was likely distributed via MMS. To keep the user from getting worried about their device, all analyses show the same sweet puppy on the screen:

Another interesting observation is that the list of target banks has been continuously extended. The recent sample targets over 300 banks, while the one from the MMS campaign had only 70 targets:

macOS: Retefe

We have looked at malware targeting Windows and Android so far, what else? macOS! Retefe is an e-banking Trojan which infects Windows and macOS systems. Retefe is very active in European countries. A recent sample was detected by one of our customers. The similarity graph looks as shown below:

Only the left branch has high similarities and is Retefe. The right branch shows some similar behavior but contains different programs. From the analysis reports, we extracted all screenshots, which demonstrate that Retefe has changed its installer over time:

Linux: Miners

Finally, let us move to Linux and the IoT world. Crypto miners are a constant threat to Linux server operating systems:

We will use the following crypto miner shell script named lowerv2.sh:

The generated similarity graph reveals some interesting facts:

First, all analyses have crypto mining functionality.

The analysis with the highest match comes from a sample named rootv2_1.sh:

Rootv2_1.sh is a modified version:

What are the differences? First, as you can imagine, it uses different domains to download the crypto config:

Secondly, it changes the install location:

However, in both cases the malware persists itself to /tmp.

Final Words

Using several recent samples, we have demonstrated that behavior signature-based similarity analysis has many benefits. It classifies samples no matter whether they are written in .NET, Java or Visual Basic. Traditional similarity analysis, which depends on x86/x64 functions as features, can easily be foiled by packing and obfuscation. Behavior signatures do not have this limitation. Finally, behavior signatures enable architecture-independent sample comparison.

Joe Sandbox Class 3.0 includes a new similarity analysis which is based on Joe Security's massive behavior signature set. Class 3.0 will be released as a part of our upcoming Tiger's Eye - Joe Sandbox v25 release.

Want to try Joe Sandbox? Register for Free at Joe Sandbox Cloud Basic or contact us for an in-depth technical demo!

Bonus: Pafish

You are looking for a bonus? Below you find the similarity graph of Pafish. Pafish is a well-known tool to check how well a sandbox hides its artifacts from the malware. Malware often tries to detect that it is being analyzed, e.g. by checking whether the computer is a virtual machine.

On the left side, you find a couple of different Pafish variants, mostly old versions. The fourth branch which starts with loader.exe is interesting:

Those samples are not Pafish variants but rather loaders which adopted techniques implemented in Pafish. Loaders are small tools whose purpose is to verify that the coast is clear and then start the main payload. Often they include anti-debugging and anti-virtual-machine checks:

Monday, December 31, 2018

Happy New Year!

The Joe Security team wishes you success, satisfaction and many pleasant moments in 2019!

Wednesday, December 12, 2018

Joe Sandbox Mail Monitor 2.0

As a security professional working in a SOC, CERT or CIRT, you are constantly bombarded with requests from end users asking whether the e-mail attachment they received is safe to open. These kinds of requests have recently increased with the latest Emotet Trojan malspam campaign using Word or PDF attachments as a lure:

In most cases, you would take the e-mail and submit it to Joe Sandbox to check whether it is malicious. If the document analysis shows signs of maliciousness, you would consequently inform the end user.

Wouldn't it be nice if this whole process could be automated so that you can focus on more important tasks?

In this regard, we have good news for you! Joe Sandbox Mail Monitor may be exactly what you are looking for. Joe Sandbox Mail Monitor is integrated into Joe Sandbox Cloud Pro as well as into our on-premise products. We recently added a couple of interesting new features in Joe Sandbox Mail Monitor 2.0 and will present some of them in this blog post.

What exactly is Mail Monitor? Please have a look at the diagram below:

To enable Mail Monitor, you first create a new e-mail account with the name sandbox@yourhost.com. Your end users then forward suspicious e-mails to this account. Mail Monitor periodically fetches new e-mails from the account and submits them to Joe Sandbox. Joe Sandbox then fully dissects each e-mail and analyzes all attachments and URLs it finds in the body (a configurable whitelist prevents the analysis of links in your e-mail signatures). Once the analysis is finished, a notification e-mail is sent to the end user:

With Mail Monitor 2.0, end users can now also be notified as soon as the forwarded e-mail has been received by Joe Sandbox:

Furthermore, we added summary notifications. Let us assume that the forwarded e-mail contains multiple links and/or attachments. With Mail Monitor 2.0, you can choose whether the end user receives a notification for each analyzed link and attachment, or just one summary notification:

The verdict of a summary notification is based on the analysis with the highest score, i.e. the most malicious sample or URL.

On top of this enhancement, we extended the customization of notifications:

For each notification, you can change the subject and body. For better visibility please choose the Joe Security design.

Finally, we also improved:
  • URL extraction from e-mail bodies
  • Notifications for cached analyses
  • More intuitive design
  • Use of {{subject}}, {{to}} and {{from}} in the templates
Does this sound good to you? Would you like to try out Joe Sandbox Mail Monitor 2.0? Contact us today!

Tuesday, November 27, 2018

Generic Unpacking Detection

Malware authors use a wide range of techniques to avoid detection by security tools. One of the most used techniques is packing. This powerful procedure allows attackers to bypass static signature detection, an important defense line of Antivirus products.

Unpacking is the process of restoring the original malware code and is considered a hot topic for academic research due to its complexity.

Joe Sandbox has included a generic unpacking engine since 2014. While unpacking is one problem, generic unpacking detection is another.

In this blog post, we outline how packing works and how the recently added unpacking detection of Joe Sandbox operates.

The Art of Packing and Unpacking

It is hard to describe packing in words, therefore please have a look at the visualization below:

Packing is usually applied to executable files such as the Windows Portable Executable (PE) or the Linux Executable and Linkable Format (ELF). The tool which performs the packing process is called "Packer".

The starting point is a PE file. The workflow of packing, unpacking and execution is as follows:

1) Original File

If you look at your PE file, it contains a header, a code section (.text) and some additional sections (.data, .rsrc etc.). Most importantly, all the code is available for static analysis. It is relatively simple to find unique code patterns in the code segment to detect the file as malicious.

2) Packing Process

The packing process generates a completely new PE file with a new header. Next, the original file is transformed. The transformation is often a compression algorithm, a cryptographic operation (e.g. XOR) or a mix of both. Often, a random key is used for the transformation; as a result, each packed sample is unique. The transformed original file is copied into the new file. Finally, a small stub code is added to the new PE file. Its goal is to reverse the transformation during execution. Since the original file is compressed and encrypted, static analysis and detection are hard.
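A toy example of the transformation step, using a single-byte XOR key (real packers combine compression and much stronger obfuscation; the "PACKED" container format here is invented purely for illustration):

```python
import os

def pack(original: bytes) -> bytes:
    """Toy packer: XOR with a random single-byte key.

    The key is stored after an invented 6-byte marker, standing in for
    the stub that reverses the transformation at runtime. Because the
    key is random, each run yields a different packed file.
    """
    key = os.urandom(1)[0]
    transformed = bytes(b ^ key for b in original)
    return b"PACKED" + bytes([key]) + transformed

def unpack_stub(packed: bytes) -> bytes:
    """What the in-memory stub does: read the key and reverse the XOR."""
    key = packed[6]
    return bytes(b ^ key for b in packed[7:])

original = b"MZ\x90\x00...original code..."
packed = pack(original)
restored = unpack_stub(packed)
```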

3) Loading the Packed File Phase 1

When the packed file is started, it is mapped into virtual memory. Next, the unpacking stub is called to reverse the compression and/or cryptographic operation. As a result, the original file is "restored" in memory. There are two possibilities for where the file is restored: either the complete packed PE file is replaced with the original, or it is restored at a different memory address.

4) Loading the Packed File Phase 2

As soon as the original file is "restored", the stub will transfer execution to the "restored" file. The restored file will then execute as normal.

Packers come in a large variety. You can buy them on the DarkNet or from legitimate software vendors. Below you can see a nice map from Ange Albertini showing some of the most famous packers:

Generic Unpacking Detection

Since most malware is packed, it makes sense not only to do generic unpacking but also to detect the unpacking process itself. This generic unpacking detection has recently been added to Joe Sandbox. To demonstrate its power, we will look at two different samples.

PE Header Overwriting

The first sample is called XgkKQZc74T.exe. During execution, the image is mapped to address 0x400000:

Joe Sandbox's unpacking engine generates several "restored" files:

The first file, named 1.0.XgkKQZc74T.exe.400000.0.unpack, was captured before any code had been executed. The second file, 1.2.XgkKQZc74T.exe.400000.0.unpack, was stored when the analysis finished. Please note that both files have been restored from the same address, 0x400000.

Let us have a look at the import address table of each restored file. The import address table shows which functions are imported by the PE file. The first file (1.0.XgkKQZc74T.exe.400000.0.unpack) has many imports:

In contrast, the second file has fewer imports, and most of them are not present in the previous file. For instance, the sample can connect to the Internet via HTTP; the previous file does not import any such function:

This change of the PE header proves that the sample is packed. The PE header at address 0x400000 has been overwritten with the unpacked file. As a result, the import address table changed, and what we see above is the table of the unpacked/malicious file. With a new behavior signature, Joe Sandbox detects this anomaly:
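The underlying heuristic can be expressed as a simple set comparison of the import tables extracted from the two dumps. A sketch with made-up import names, assuming the imports have already been parsed out of each dump (this is an illustration, not Joe Sandbox's actual signature logic):

```python
def iat_overwrite_suspected(imports_before, imports_after, new_ratio=0.5):
    """Heuristic: if the dump taken at the same base address imports a
    mostly different set of functions after execution, the PE header was
    likely overwritten by an unpacked payload."""
    before, after = set(imports_before), set(imports_after)
    if not after:
        return False
    new = after - before
    return len(new) / len(after) >= new_ratio

# Toy import tables: the post-execution dump suddenly imports
# HTTP-related functions the original file never referenced.
before = {"kernel32.dll!CreateFileW", "kernel32.dll!ExitProcess",
          "user32.dll!MessageBoxW"}
after = {"wininet.dll!InternetOpenA", "wininet.dll!HttpSendRequestA",
         "kernel32.dll!ExitProcess"}
suspicious = iat_overwrite_suspected(before, after)
```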

If we look at the unpacked file we can also find the command and control IP / domain:

Dynamic Code Loading

The second sample is named WBKDqSfWLj.exe. It is loaded at address 0xdb0000:

If we browse through some of the behavior, we see that some calls originate from 0xdb0000:

However, there are also calls coming from 0x400000:

Could this be an unpacked file? If we browse to the memory activities, we indeed see an allocation of memory at address 0x400000:

As for the previous sample, we can compare the import address tables of the corresponding unpacked files. This time, the base addresses of the images differ:

For file 1.0.WBKDqSfWLj.exe.db0000.0.unpack the import address table is:

And for file 1.2.WBKDqSfWLj.exe.400000.1.unpack the import address table is:

Again, we see different tables, which indicates that a new PE file has been loaded. This time, the PE header of the original file is not overwritten. Rather, the original file is unpacked/decompressed into a new memory section which was allocated by the stub. Of course, there is also a behavior signature in Joe Sandbox to detect this:
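This second case can likewise be sketched as a check for call origins that fall into freshly allocated memory rather than into any mapped image (again a simplification with made-up addresses, not Joe Sandbox's actual logic):

```python
def dynamic_code_suspected(call_origins, mapped_images, new_allocations):
    """Heuristic: API calls originating from a freshly allocated region
    instead of a mapped image suggest dynamically loaded (unpacked) code.

    Regions are (base_address, size) tuples; returns the suspicious
    call-origin addresses.
    """
    def inside(addr, regions):
        return any(base <= addr < base + size for base, size in regions)
    return [addr for addr in call_origins
            if not inside(addr, mapped_images) and inside(addr, new_allocations)]

# Toy values mirroring the sample above: the image mapped at 0xdb0000,
# a fresh allocation at 0x400000, and calls coming from both.
mapped = [(0xdb0000, 0x30000)]
allocated = [(0x400000, 0x20000)]
origins = [0xdb1234, 0x401000]
suspicious = dynamic_code_suspected(origins, mapped, allocated)
```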

Final Words

Packing is widely used by many malware samples to bypass static signature detection. Joe Sandbox includes an unpacking engine which restores the original file. The restored files can be downloaded by analysts:

While unpacking itself is helpful, unpacking detection is even more important. With this upgrade, Joe Sandbox detects unpacking via PE header overwriting as well as via dynamic code loading:

Want to try Joe Sandbox? Register for Free at Joe Sandbox Cloud Basic or contact us for an in-depth technical demo!