Article written by David Sánchez Lavado

This post explains how to analyze the malicious code used in current Exploit Kits.

There are many ways to analyze this type of code, and you can find tools that do most of the job automatically. However, as researchers who like to understand how things work, we are going to analyze it with no other tools than a text editor and a Web browser.

My goal is to lay the groundwork for you to learn how to remove the different obfuscation layers that malicious JavaScript code may employ. I will show you how to remove those layers step by step until you reach the last layer, which contains the logic that exploits the relevant vulnerability.

IMPORTANT: I recommend that you perform this type of analysis on a virtual machine on its own isolated network in a laboratory dedicated exclusively to this type of research to avoid unwanted infection.

BASIC CONCEPTS

Generally speaking, malicious code is used to exploit vulnerabilities in Web browsers and PDF readers like Adobe Reader or Foxit. This code is usually written in JavaScript and has several layers of obfuscation. Obfuscation techniques are generally used to make code difficult for researchers to understand, to avoid detection by signatures, or to bypass automated scanning tools. The way they work is simple: each layer calls functions that deobfuscate the code of the next layer, and so on until the final code is reached.

The final code is normally divided into two parts. The first one aims at detecting the Web browser version and the plug-ins installed on the victim’s computer (like Adobe Reader, Apple Quicktime or the Java virtual machine). The second part selects the vulnerability to exploit according to the information gathered in the first part.

CODE ANALYSIS

The image below is a screenshot of the malicious code to be analyzed in this article.

As you can see, the code is made up of several HTML objects. However, if you look closer you can identify two things in these objects:

  1. The value of the id attribute for each of these objects has the format “<number>+CytobimusubekUda”, where “<number>” is a number from 0 to 1230 in consecutive order.
  2. The value of each object is an apparently meaningless string of characters of approximately the same length, with the word Add repeated several times inside it.

All this seems to indicate that the id attribute is used as an index (note the consecutive numbers) in a loop that walks through all the HTML objects and deobfuscates their contents to create a new code layer. Let’s start analyzing the code.

FORMATTING THE CODE

The first thing I usually do when examining JavaScript code is use the Format Code option in Malzilla. This option formats the code as if it had been written in an editor such as Visual Studio. Although simple, this is a very important step, as the code is often badly formatted and hard to understand.

You could also do this manually, line by line, but you risk making a mistake and it will take you too long. For example, the malicious code that we will analyze here contains almost 600 lines of script code and HTML code.

Malzilla is an excellent utility to analyze malicious code automatically. However, in this article we intend to analyze this malware strain manually.

Unformatted code (before using the “Format Code” option)
Well-formatted code (after using Malzilla’s “Format Code” option)

THE TOOL

The next step is to copy the well-formatted JavaScript code to the text editor to be used in the analysis. Any text editor with the following basic options should be enough:

  1. JavaScript syntax highlighting: It will help you read the code and quickly spot JavaScript functions.
  2. String search-and-replace: This will help you avoid mistakes when replacing the names of functions and variables.
  3. Window tabs: This is optional. Tabs will let you work very quickly when analyzing the code of various files.

FINDING THE ‘START’ FUNCTION

The sample currently has 96 lines of javascript code and more than 500 lines of HTML code. You will reduce the number of lines as you remove the obfuscation layers. The first thing you have to do is determine the javascript code that runs when the browser loads the malicious Web page. Then you have to analyze all the other functions as they are run.

The first steps to take with every function are the following:

  1. Simplify the code to analyze
  2. Rename the functions and variables for the code to be easier to understand.

To do that, first check the HTML code, and if there is no HTML object that calls a javascript function, proceed to analyze the code found between the <script> and </script> tags. There you must find the code that does not belong to a function definition, as that will be the code that runs automatically when the Web page is loaded by the browser.

The screenshot below shows that code between lines 81 and 89 (inclusive). You can also see that the HazakeduhaQurenepenus() function (line 85) is the first one to run (the previous three don’t perform any important actions). Therefore, this is the first function that you must analyze.

Code run on loading the page (red rectangle)
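
As a rough illustration of what to look for, here is a minimal sketch. It uses made-up names (decodeLayer, neverCalled, log), not the ones in the sample. Only the statements that sit outside any function body run when the script is loaded.

```javascript
// Hypothetical, harmless stand-in for an obfuscated script block.
const log = [];

// Function definitions: nothing in here runs until something calls it.
function decodeLayer() {
  log.push("decodeLayer called");
  return "next layer";
}

function neverCalled() {
  log.push("neverCalled called");
}

// Top-level statements: these run as soon as the script is loaded.
var filler = 1 + 1;            // decoy statement with no useful effect
var nextLayer = decodeLayer(); // first meaningful call, so analysis starts here

console.log(log);
```

The function reached from the top-level statements is the one to analyze first. Functions that are defined but never reached can wait.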

SIMPLIFYING THE CODE AND MAKING IT EASIER TO UNDERSTAND

Simplifying the code and making it easy to understand is one of the most difficult yet important tasks. It involves studying almost every instruction in the javascript code, and modifying them to create a code that is easier to understand and analyze.

VERY IMPORTANT: When modifying the code, don’t change the final result that would be returned by the original code.

As previously said, start with the HazakeduhaQurenepenus() function. This function looks like this:

“HazakedubaQurenepenus()” function before the analysis

In the code, pay special attention to the functions that are not part of the javascript API, that is, the functions programmed by the user. You have to resolve the value that these will return in order to analyze the function.

In the code above, the factor to resolve is the PypiwIgo() function that has the following code:

If you take a look at it and you are familiar with the javascript language, you will realize that the function will return the getElementById string every time it is called. With this in mind and knowing that the DeqesedaDakonyqev variable refers to the document object, you can make the first change for the code to be easier to understand. The resulting code will look like this:

“HazakedubaQurenepenus()” function after the analysis
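
The trick resolved in this step, a method name returned as a string and then used with bracket notation, can be sketched as follows. This is only an approximation: the body of PypiwIgo() is invented (the sample's real body is not reproduced here), and pageStub is a made-up stand-in for the document object so that the code also runs outside a browser.

```javascript
// Made-up stand-in for the browser's document object.
const pageStub = {
  elements: { DasuRokyduconiwidy: { innerHTML: "sample data" } },
  getElementById(id) {
    return this.elements[id] || null;
  },
};

// Hypothetical body: only the observable behaviour is taken from the sample
// (the function always returns the string "getElementById").
function PypiwIgo() {
  return "get" + "Element" + "By" + "Id";
}

// In the real page this variable refers to document.
const DeqesedaDakonyqev = pageStub;

// Obfuscated form: method name computed at runtime, called with bracket notation.
const obfuscated = DeqesedaDakonyqev[PypiwIgo()]("DasuRokyduconiwidy").innerHTML;

// Equivalent readable form once PypiwIgo() has been resolved.
const resolved = pageStub.getElementById("DasuRokyduconiwidy").innerHTML;

console.log(obfuscated === resolved); // true
```

Because both forms return the same value, replacing the first with the second does not change the result of the original code.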

You may have noticed that I have changed the name of several variables and of the analyzed function itself to func_decrypt_01. This may seem a little bit bold, but after having analyzed many functions like this you become capable of recognizing certain code structures at a glance.

Your next objective is to resolve the value that the function returns in the buffer variable. To do that, separate the function from the original code and run it independently. Before you do, make sure that the function does not depend on any external value, such as data that another function in the original code calculates and assigns to a global variable. If it does, you must first calculate that value and then hard-code it in the isolated code. This is very important: otherwise you will probably not be able to run the code separately. Either the Web browser will report an error when loading the page and the code will not run, or the code will not behave as it would inside the complete malicious page.

Let’s see this with an example in the code we are analyzing. The following instruction refers to an external value in the DasuRokyduconiwidy HTML object.

string_01 = document.getElementById("DasuRokyduconiwidy").innerHTML;

The resulting value is assigned to the string_01 variable. Since this variable is used inside the code, you must resolve its value. Otherwise, if the variable was only used to confuse the user, you could eliminate it from the code.

The technique of using data in HTML objects and referring to it from the javascript code is frequently used to obfuscate code by splitting it into parts. This serves to bypass the automatic analyses performed by certain tools unable to interpret the connection between the javascript and the HTML code.

This anti-analysis technique is also used by malicious PDF files. The technique involves making calls to the Adobe PDF API’s javascript functions, which cannot be interpreted by many analysis tools.

The first thing you need to do is find the DasuRokyduconiwidy object. Once you find it, assign its value to the string_01 variable in the script code that you have created, and replace the return buffer instruction with a TEXTAREA object that will show the content of the buffer variable once the new code is run in the Web browser.

Value of the DasuRokyduconiwidy object and line of code to replace

The screenshot below shows the simplified code and how the “return buffer” instruction has been replaced with a textarea object created at runtime.

New code created to view the result of the buffer variable
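
If you want to reproduce this step yourself, the following sketch shows the general idea. Everything in it is an assumption made for illustration: showValue is a made-up helper, the value of string_01 is invented, and the decoding step (reversing the string) is a placeholder for the sample's real routine. Unlike the modified code described above, it also keeps a return value so that the result can be checked outside a browser.

```javascript
// Hypothetical helper: shows a value in a TEXTAREA when run in a browser,
// and falls back to the console elsewhere.
function showValue(value) {
  if (typeof document !== "undefined" && document.body) {
    const area = document.createElement("textarea");
    area.rows = 20;
    area.cols = 100;
    area.value = value;
    document.body.appendChild(area);
  } else {
    console.log(value);
  }
  return value;
}

// Isolated copy of the function under analysis.
function func_decrypt_01() {
  // Value copied by hand from the HTML object (invented for this sketch).
  const string_01 = "dIyBtnemelEteg";
  let buffer = "";
  // Placeholder decoding: read the string backwards.
  for (let i = string_01.length - 1; i >= 0; i--) {
    buffer += string_01.charAt(i);
  }
  // The original line was: return buffer;
  return showValue(buffer);
}

const shown = func_decrypt_01();
```

Hard-coding the value taken from the HTML object is what makes the function independent of the rest of the page.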

Once you have the code, open it with the Web browser to see the function result.

Value of the buffer variable

As you can see, the returned result is a string comprising a sequence of names of javascript API functions. Once you have resolved the value obtained when calling the func_decrypt_01 function, rename the GuzoZaq variable. This is the variable that the return value is assigned to. For example, call it concat_func_string, and then assign to it the value obtained in the textarea object. The code will look like this.

concat_func_string variable with the value already resolved

Continue analyzing the code run when loading the Web page. The next function to analyze is NupUr(). This function calls HaynubOguf(), which you must resolve before continuing. HaynubOguf() is a very simple function that returns the string substr, which is the name of a JavaScript function that extracts a substring from a string. Therefore, rename HaynubOguf() to func_substr(). The NupUr() function will look like this.

NupUr() function to analyze

Now that you have “resolved” the different parts of the function code, make the code more readable. This involves resolving the names of all the functions in brackets from inside out.

As you can see, the code uses the concat_func_string variable. If you remember, this variable refers to a string made up of the names of multiple JavaScript API functions. Also, note that the code calls func_substr(), which returns substr. This indicates that part of the string will be extracted to obtain the name of a function that is used later in the code.

Original function                       Resolved function
[func_substr()](63,14)                  .substr(63,14)
[concat_func_string.substr(63,14)]      getElementById
[func_substr()](1736/56,585/65)         [func_substr()](31,9) → .substr(31,9)
[concat_func_string.substr(31,9)]       innerHTML

The result is the following code:

Resolved NupUr() function
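
You can check the substitutions in the table with a few lines of code. In this sketch, the content of concat_func_string is invented: filler characters simply place the two names at offsets 31 and 63. In the sample, the string holds many more API names.

```javascript
// Invented stand-in for concat_func_string: "#" filler puts "innerHTML"
// at offset 31 and "getElementById" at offset 63.
const concat_func_string =
  "#".repeat(31) + "innerHTML" + "#".repeat(23) + "getElementById" + "#".repeat(10);

// Stand-in for the renamed helper, which only returns the string "substr".
function func_substr() {
  return "substr";
}

// Obfuscated style: the method is looked up with bracket notation and the
// arguments are hidden behind arithmetic (1736/56 = 31, 585/65 = 9).
const name1 = concat_func_string[func_substr()](63, 14);
const name2 = concat_func_string[func_substr()](1736 / 56, 585 / 65);

console.log(name1, name2);
```

Evaluating small expressions like these in isolation is usually quicker and safer than working them out by hand.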

As you resolve more and more functions you will be able to discover the actions to be taken by the rest of them simply by taking a glance at their code. This is because you’ll have already resolved many unknown values. This will help you analyze other functions more quickly and eliminate obfuscation layers more easily.

Finally, let’s analyze the MivoJaqugutec() function:

Unresolved NivoJaqugutec function

At first glance, the first thing you can identify in the code is a loop that runs through all of the HTML objects, storing their values and concatenating them in the PofUhicehofudilysuwe variable, which the function returns once the loop ends. Well, with everything you have learnt so far you probably know what to do. Separate the function from the original code, resolve the unknown values and rename its variables for the code to be easier to understand. Your objective should be to determine the value of the PofUhicehofudilysuwe variable in the return instruction.

Code used to get the value of the PofUhicehofudilysuwe variable renamed to buffer
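
The general shape of such a loop is sketched below, under several assumptions. The names objectStub and collectObjects are made up, the three fake objects and their contents are invented, and the ids are assumed to be the number immediately followed by CytobimusubekUda. The loop in the sample may also post-process each value, which this sketch does not attempt.

```javascript
// Invented stand-in for the page: three fake objects instead of the real 1231.
const fakeElements = {
  "0CytobimusubekUda": { innerHTML: "%61%6c" },
  "1CytobimusubekUda": { innerHTML: "%65%72" },
  "2CytobimusubekUda": { innerHTML: "%74" },
};
const objectStub = {
  getElementById: (id) => fakeElements[id] || null,
};

// Walks the ids in consecutive order and concatenates every value found.
function collectObjects(doc, suffix) {
  let buffer = "";
  for (let i = 0; ; i++) {
    const element = doc.getElementById(i + suffix);
    if (element === null) break; // stop at the first missing id
    buffer += element.innerHTML;
  }
  return buffer;
}

const buffer = collectObjects(objectStub, "CytobimusubekUda");
console.log(buffer);
```

In a browser you would pass document instead of objectStub and display the buffer in a textarea, as in the earlier step.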

Once you run the code on the Web browser you’ll get the following result:

Similarly, transform the other functions in the code that’s left to analyze. The final result is quite interesting: you’ve gone from 96 lines of javascript code and some 500 lines of HTML code to just 2 lines of javascript code with the eval() and unescape() functions.

These 2 functions normally indicate the execution of a new obfuscation layer. Have you reached your final objective yet? Is this the final layer responsible for triggering the vulnerability? Well, let’s see what it contains.

ACCESSING THE FINAL CODE

The last 2 lines of code include the payload variable, which refers to an encoded, 55,496-character-long unicode string. After running its content with the eval( unescape(payload) ) instruction you’ll get to the last layer in the malicious code.
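
A safe way to look at the next layer is to decode the string without executing it. The sketch below uses a short, harmless string in place of the real payload. The variable decodedLayer is a name chosen for this example.

```javascript
// Harmless stand-in for the real 55,496-character payload.
const payload = escape('var greeting = "next layer";');

// Original pattern (only run this on a real sample inside an isolated lab):
//   eval(unescape(payload));

// Analysis pattern: decode the string, but print it instead of executing it.
const decodedLayer = unescape(payload);
console.log(decodedLayer);
```

Swapping eval() for a function that only displays its argument gives you the source code of the next layer without running it.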

In this last part of the article we will only analyze the generic parts often found in malicious codes.

The following two screenshots show a series of instructions that are often used both in legitimate and malicious code, although with very different purposes. Whereas they are used in legitimate code for design purposes, in malicious code they are used to obtain information about the victim’s environment and exploit the most appropriate vulnerability.

As you can see in the two screenshots above, the programmer has used the userAgent property of the navigator object to identify the Web browser used by the victim. In the case of Internet Explorer, the code checks whether the version is lower than 6.
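
A check of this kind usually boils down to a pattern match against the user-agent string. The following sketch is not the sample's code: getIeMajorVersion is a made-up function and the two user-agent strings are only examples.

```javascript
// Returns the Internet Explorer major version found in a user-agent string,
// or null when the "MSIE" token is absent.
function getIeMajorVersion(userAgent) {
  const match = /MSIE (\d+)/.exec(userAgent);
  return match ? parseInt(match[1], 10) : null;
}

// In a browser you would pass navigator.userAgent; fixed strings are used here.
const ie6 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
const firefox = "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0";

console.log(getIeMajorVersion(ie6));     // 6
console.log(getIeMajorVersion(firefox)); // null
```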

They also try to identify if there are any plug-ins installed on the browser.

In this code the programmer has decided to create an object identified by the CLSID CA8A9780-280D-11CF-A24D-444553540000 in the Pdf1 variable. Although the name of the variable gives a hint as to what object the programmer wants to create, let’s make sure. Use the regedit.exe tool to find the CLSID key in the Windows registry.

Our assumption was correct: the CLSID key refers to the Adobe Acrobat/Reader ActiveX control. The programmer has created this object to find out if the victim has Adobe Acrobat or Adobe Reader installed (and what version they are using), and select the malicious PDF file that can exploit one of the vulnerabilities in the detected version.

They use the GetVersions() method to find out the version of the Adobe program installed on the victim’s computer, as seen in the first instruction in the code below:

The last part of the code is used to select the most appropriate PDF file to exploit the vulnerability. If the value of the lv variable is greater than or equal to 800 (which possibly identifies version 8), the code will call the fghjdfgxbz function passing the string “d0456d.pdf” as a parameter. Otherwise, it will pass the “07dd5d.pdf” string as a parameter. The fghjdfgxbz function simply creates an IFRAME object at runtime that points to the value passed as the parameter. As a result, the Web browser will open a malicious PDF file designed to exploit an unpatched security vulnerability.
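
Reduced to its logic, the selection described above is a single comparison. In this sketch, selectPdf is a made-up name. How the sample derives lv from the result of GetVersions() is not covered here, and the IFRAME creation is left out.

```javascript
// Mirrors the branch described above: lv >= 800 selects one file,
// any other value selects the other.
function selectPdf(lv) {
  return lv >= 800 ? "d0456d.pdf" : "07dd5d.pdf";
}

console.log(selectPdf(813));
console.log(selectPdf(712));
```

Writing the branch down in this form makes it easy to see which file each installed version would receive.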

To sum up, in this article we have explained how to analyze and deobfuscate the layers of one of the malicious codes currently used in exploit kits, with just a text editor, a Web browser and some knowledge of JavaScript and HTML. We have also analyzed part of the final code to show you some of the methods used to detect the Web browser and the plug-ins installed on victims’ computers. Happy hunting!!