Automatic classification of malware


Last year we posted an article about graphical representations of malware, in which we noted that malware can be automatically identified and classified into a family based on a graph of its internal structure. This graph is built from the relationships between function calls in the executable, and it tends to be very similar among samples of the same family or among samples that share the same source code. Several publications describe this technique (Ero Carrera & Gergely Erdélyi [VB2004]), and most of us have heard of the Sabre Security VxClass project, a system that automatically unpacks a binary and classifies it into a family.
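
To make the idea concrete, here is a minimal sketch of comparing two call graphs by structure alone. It is not the algorithm used by PandaLabs or VxClass; it assumes a deliberately simple stand-in in which each function is reduced to its (in-degree, out-degree) pair and two graphs are scored by the multiset Jaccard similarity of those pairs. The names `degree_signature` and `graph_similarity`, and the sample graphs, are hypothetical.

```python
from collections import Counter

def degree_signature(call_graph):
    """Return a multiset of (in_degree, out_degree) pairs, one per function.

    call_graph maps each function name to the list of functions it calls.
    Function names are ignored, since they differ between samples.
    """
    out_deg = {f: len(set(callees)) for f, callees in call_graph.items()}
    in_deg = Counter()
    for callees in call_graph.values():
        for callee in set(callees):
            in_deg[callee] += 1
    nodes = set(call_graph) | set(in_deg)
    return Counter((in_deg[n], out_deg.get(n, 0)) for n in nodes)

def graph_similarity(graph_a, graph_b):
    """Multiset Jaccard similarity of the two degree signatures, in [0, 1]."""
    sig_a, sig_b = degree_signature(graph_a), degree_signature(graph_b)
    intersection = sum((sig_a & sig_b).values())
    union = sum((sig_a | sig_b).values())
    return intersection / union if union else 1.0

# Hypothetical samples: same structure, different function names.
sample_a = {"start": ["init", "main"], "main": ["decrypt", "send"],
            "init": [], "decrypt": [], "send": []}
sample_b = {"sub_401000": ["sub_401100", "sub_401200"],
            "sub_401200": ["sub_401300", "sub_401400"],
            "sub_401100": [], "sub_401300": [], "sub_401400": []}
# Hypothetical unrelated sample: a simple chain of three functions.
sample_c = {"entry": ["a"], "a": ["b"], "b": []}

print(graph_similarity(sample_a, sample_b))  # identical structure gives 1.0
print(graph_similarity(sample_a, sample_c))  # different structure scores lower
```

A degree-based signature is cheap but coarse: unrelated binaries can share it by chance. A real system would likely combine it with stronger per-function features before declaring a family match.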

PandaLabs is 'two or three steps ahead' too, and we have developed our own system to automatically identify and classify the samples we receive daily. The system works only with unpacked samples, which is why we use it together with our generic unpacker engine. We have made a Flash video [14 MB] to show you how the system works. Basically, the steps are:

  • Unpack the sample (the system only works with unpacked binaries)
  • Drag and drop it into the client application
  • The client application sends it to the graph server
  • The server analyzes it with IDA and uses several Python scripts to extract:
    • Graph of function calls
    • Control Flow Graph (CFG) of functions
    • Entropy
    • CRC32 and custom CRC of functions
  • Preselect samples from the database by applying several filters (entropy, compiler, file size, ...). The remaining candidates are then compared with our sample.
The extracted data is used to compare the sample against our entire graph database (we have already analyzed and stored 185,000 samples in it).
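
As a rough illustration of the feature extraction and preselection steps listed above, the following sketch computes the Shannon entropy of a byte string, a standard CRC32 of a function's bytes, and a simple filter over stored records. It rests on several assumptions: the record fields (`compiler`, `entropy`, `filesize`), the thresholds `entropy_tol` and `size_ratio`, and all function names are hypothetical, and the custom CRC mentioned in the list is not described in this post and is not reproduced here.

```python
import math
import zlib
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())

def function_crc32(function_bytes: bytes) -> int:
    """Standard CRC32 of a function's raw bytes as an unsigned 32-bit value."""
    return zlib.crc32(function_bytes) & 0xFFFFFFFF

def preselect(sample, database, entropy_tol=0.25, size_ratio=0.2):
    """Keep stored records close to the sample in compiler, entropy, and size.

    The tolerances are illustrative defaults, not values from the real system.
    """
    selected = []
    for record in database:
        if record["compiler"] != sample["compiler"]:
            continue
        if abs(record["entropy"] - sample["entropy"]) > entropy_tol:
            continue
        if abs(record["filesize"] - sample["filesize"]) > size_ratio * sample["filesize"]:
            continue
        selected.append(record)
    return selected

# Hypothetical sample and database records.
sample = {"compiler": "msvc", "entropy": 6.1, "filesize": 100_000}
database = [
    {"name": "rec1", "compiler": "msvc", "entropy": 6.0, "filesize": 95_000},
    {"name": "rec2", "compiler": "delphi", "entropy": 6.1, "filesize": 100_000},
    {"name": "rec3", "compiler": "msvc", "entropy": 7.5, "filesize": 100_000},
    {"name": "rec4", "compiler": "msvc", "entropy": 6.2, "filesize": 300_000},
]
candidates = preselect(sample, database)
print([r["name"] for r in candidates])
```

Cheap filters like these shrink the candidate set before the more expensive graph comparison runs, which matters once the database holds a large number of samples.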
 
