Tuesday, September 4, 2018

Hunting for similar Samples with Joe Sandbox Class 2.0

The malware landscape is constantly evolving, and currently, we no longer see tens of thousands of different active malware threats, but only a few different malware families that often share common source code.

Similarity analysis aka hunting for similar samples has recently gained a lot of attention in the security community and as a result, we decided to completely renew Joe Sandbox Class and enhance it with great new features.


In this blog post, we will outline some of the new features related to x86 / x64 code hunting while in a second one, we will outline all the major improvements we have done to search samples for similar architectures.

For those who are not yet familiar with this feature, Joe Sandbox Class is Joe Security's code hunting engine. It's built upon a large database of disassembly functions which are compared against the analyzed sample. 

Joe Sandbox Class 2.0 Intro


How does it work? Joe Sandbox Class acquires data from the Hybrid Code Analysis technology that generates disassembly from memory dumps:




Doing disassembly on memory dumps has a couple of benefits which result in richer functions that include more strings and API calls. In addition, results are more constant than what a disassembler would create from an executable on the disk. Finally, Hybrid Code Analysis generates disassembly from any code including hidden or non-executed sections, shell code etc. 

Rich disassembly functions are an excellent source for similarity analysis and hunting. They often stay the same for several malware versions or variants or are just changed slightly. 

All those rich functions are loaded into Joe Sandbox Class also known as feature selection. Next, Class will generalize the functions. For instance, a file path or URL string is replaced with a generic token. This is important because in different variants the code stays the same but a URL or file path may vary. Afterward, Class will select only the most interesting and relevant functions and those which appear too often are classified as not interesting. The same applies to functions which appear in goodware. Finally, the actual similar function search is performed:



Joe Sandbox Class has several comparison algorithms based on:
  • Strings and APIs
  • Instruction bytes
  • Opcodes 
It implements both precise and fuzzy matching. Once the similarity search is done, Class generates an extensive report. 

Hunting for similar DarkComet Samples


That all being said, let us have a look at a couple of interesting class reports. Here is a DarkComet RAT sample:



The sample was analyzed on August 29th and created six processes. If we jump to the Hybrid Code Analysis section, the redrv.exe with PID 3468 has many interesting functions. Below you can see the function which is the core of DarkComet's keylogger:



Let us now move to the Classification Report for that sample:


Strings and APIs were used for similarity analysis with a precise match:


In total, Joe Sandbox Class found 207915 similar functions in 20178 processes. If we browse down to the similar processes we see that the first process does not have many similar functions. The most are 8 functions.



However, if we scroll down to the process with PID 3468 we see some processes with many similar functions:


If we click on the first process named SCAN00GO we can have a look at all similar functions. Those functions appear one to one in our initial sample and SCAN00GO:


Do you remember this function? Yes, this is the keylogging code. 

If you browse further you can also see all similar functions and how often they appear. For instance, the keylogging function is very unique and perfect for matching similar samples since it was found only 18 times:


However, function Function_0004E254 appears very often and thus does not qualify as being relevant:


While we could introduce whitelists for functions and statistical bounds, we decided not to do that and let the analyst have the final decision. 

Hunting for EQNEDT32.EXE Shellcode

Let us have a look at another sample. This time it is a malicious RTF which uses CVE 2017-11882 or CVE-2018-0802 for payload execution:



Joe Sandbox found shell code which was executed in the Microsoft Office Equation Editor:





Let us move on to the Classification report:


There are 8 function matches in 5 processes which all are inside EQNEDT32.EXE:

 For each match we can easily access the initial file name Conti5290.doc as well as the SHA256:


Or here Quotation Request FRQW9087454.doc:



Final Words


Joe Sandbox Class 2.0 has been completely revamped with the cybersecurity analyst focus in mind. The new Classification Report enables security professionals to easily find similar processes based on rich disassembly functions generated by Hybrid Code Analysis. Hunting for individual functions is now easily possible with Class 2.0 that can be configured to use a wide set of different data sources and comparison algorithms.

Interested in trying out Joe Sandbox Class 2.0?  Then hurry up and contact us for an in-depth technical demo!

Full Analysis and Class Reports:

* DarkComet Analysis Report
* DarkComet Classification Report
* CVE 2017-1188 Shellcode Analysis Report