November 27, 2018

The very basics of static malware analysis in a Windows environment

A lot of the information in this article is based on the book "Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software" by Michael Sikorski and Andrew Honig. I highly recommend it if you are serious about learning to analyze malware.


About malware analysis

This article focuses on the theory of basic static malware analysis. In a later article, I will conduct hands-on analysis using the static analysis process described here. Basic malware analysis can tell you whether a file is malicious and how it works, and it may even reveal some indicators of compromise (IoC). Basic analysis is quick and easy to do, but it is not sufficient for more advanced malware.

In addition to static analysis, there is dynamic malware analysis. Dynamic analysis involves running the malware and monitoring its activity in a safe environment. Furthermore, both static and dynamic analysis can be divided into basic and advanced levels. Advanced static analysis involves reverse-engineering the malware with a disassembler and examining the malware's program instructions. Advanced dynamic analysis involves using a debugger to examine the internal state of running malware. I will also discuss these topics later.

The why

The purpose of malware analysis is to find the information needed to properly respond to a cybersecurity incident. Through malware analysis, one should attempt to answer the following questions:
  1. What does the malware do? 
  2. How to detect it in your environment? 
  3. How to measure and contain its damage?

The basic concepts

Here are some basic concepts one should understand before delving into actual malware analysis. These concepts are relevant for Windows environments.

Portable Executable and file headers

As most malware are executable applications, it is important to understand what a Portable Executable (PE) is. In practice, PE is the file format used by Windows executables, object code, and DLLs. It is a data structure that contains the information the Windows PE loader needs to manage the executable code contained inside the PE file.

At the beginning of a PE file are the file headers, which contain useful information about the application. We are particularly interested in the libraries that will be loaded and the functions the executable imports, because imported functions tell us a lot about the purpose of the application. Another relevant piece of information in the file headers is the compile time of the program. An old compile time suggests that the malware is old and therefore more likely to be detected by antivirus software. However, note that the compile time can be faked.
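To make the header layout more concrete, here is a minimal Python sketch (standard library only, not one of the tools listed later) that reads the compile time from a PE file. It assumes a well-formed file: the DOS header's e_lfanew field at offset 0x3C points to the "PE\0\0" signature, and the TimeDateStamp field sits 8 bytes after that signature in the COFF file header. The function name read_compile_time is my own, hypothetical choice. Remember that the value can be faked, and some toolchains may write a value that is not a real date at all.

```python
import struct
from datetime import datetime, timezone


def read_compile_time(data: bytes) -> datetime:
    """Return the TimeDateStamp from the COFF file header of a PE image.

    Minimal sketch: assumes a well-formed file and does no error recovery.
    """
    # The DOS header starts with "MZ"; e_lfanew (offset 0x3C) points to the PE signature.
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # The COFF file header follows the 4-byte signature:
    # Machine (2 bytes), NumberOfSections (2 bytes), TimeDateStamp (4 bytes), ...
    (timestamp,) = struct.unpack_from("<I", data, e_lfanew + 8)
    # TimeDateStamp is seconds since the Unix epoch (1970-01-01 UTC).
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)
```

To use it, pass the raw bytes of a file, for example as read with open(path, "rb") inside a with statement; the result is a timezone-aware datetime in UTC.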

The file headers also contain information about the image sections. Common sections include the following:
  • ".text" - Contains the instructions the CPU executes.
  • ".rdata" - Contains import and export information.
  • ".data" - Contains any global or static variables which have a pre-defined value and can be modified.
  • ".rsrc" - Includes strings and other resources used by the executable. Malware can contain another program or a driver stored as a resource.
Image section headers reveal, for example, how much memory should be allocated for each section during the PE loading process (Virtual Size) and how big the section is on disk (Size of Raw Data). If the virtual size is significantly larger than the size on disk, the file is usually packed. This is especially true for the ".text" section. For the ".data" section, it is normal for the virtual size to be larger than the raw data size. Packed malware needs to be unpacked before it can be analyzed.
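The packing check described above can be sketched in Python as follows. These are simplified, hypothetical helpers: read_sections walks the section table (which starts right after the optional header, whose size is given in the COFF header), and suspicious_sections flags sections whose virtual size is much larger than their raw size. The 2x ratio threshold is an arbitrary assumption for illustration, and skipping ".data" and ".bss" follows the note above that a larger virtual size is normal for data sections.

```python
import struct
from typing import List, NamedTuple


class Section(NamedTuple):
    name: str
    virtual_size: int  # VirtualSize: bytes reserved in memory
    raw_size: int      # SizeOfRawData: bytes stored on disk


def read_sections(data: bytes) -> List[Section]:
    """Parse the section table of a PE image (minimal, no error recovery)."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    coff = e_lfanew + 4  # the COFF file header follows the 4-byte signature
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (optional_header_size,) = struct.unpack_from("<H", data, coff + 16)
    table = coff + 20 + optional_header_size  # the COFF header is 20 bytes
    sections = []
    for i in range(num_sections):
        # Each section header is 40 bytes: Name[8], VirtualSize, VirtualAddress,
        # SizeOfRawData, ... (only the first four fields are read here).
        name, vsize, _vaddr, raw_size = struct.unpack_from("<8sIII", data, table + 40 * i)
        sections.append(Section(name.rstrip(b"\x00").decode("ascii", "replace"), vsize, raw_size))
    return sections


def suspicious_sections(sections: List[Section], ratio: float = 2.0) -> List[Section]:
    """Flag sections whose virtual size greatly exceeds their size on disk.

    The ratio of 2.0 is an arbitrary, illustrative threshold.
    """
    flagged = []
    for s in sections:
        if s.name in (".data", ".bss"):
            continue  # data sections normally have a larger virtual size
        if s.raw_size == 0 and s.virtual_size > 0:
            flagged.append(s)  # memory reserved but nothing stored on disk
        elif s.virtual_size > ratio * s.raw_size:
            flagged.append(s)
    return flagged
```

A section with no raw data but a large virtual size is a typical pattern for packers: UPX, for example, leaves its first section empty on disk and fills it when the program is unpacked in memory.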


Dynamic-link Libraries

Dynamic-link Library (DLL) is Microsoft's implementation of the shared library concept in the Windows operating systems. DLLs can contain code, data, and resources. In fact, DLLs use the same PE format as EXEs. The difference is that a DLL cannot be executed directly. To execute code in a DLL, you can use utilities such as RUNDLL.EXE and RUNDLL32.EXE. In essence, DLLs provide a mechanism for sharing code and data, allowing a developer to upgrade functionality without requiring applications to be re-linked or re-compiled.
Note! It can be hard to understand the difference between svchost.exe and rundll32.exe, as both are associated with DLLs. While rundll.exe and rundll32.exe are used to launch functionality in .dll files, svchost.exe is used as a host process for services that run from DLL files. Let me clarify this with an example. If a malicious DLL exports a function that installs it as a service, you can run that function with rundll32.exe. After installation, you start the service using net start, and the service then runs inside an svchost.exe process.

Common DLLs

It is useful to understand the purpose of common DLLs because that will quickly let you deduce what kind of functionality the malware might have. Below are some common DLLs imported by malware.
  • Kernel32.dll - Contains core functionality, such as access and manipulation of memory, files, and hardware.
  • Advapi32.dll - Provides access to advanced core Windows components such as the Service Manager and Registry. If this is imported by malware, we should search for strings that look like registry keys. For example, the registry key "Software\Microsoft\Windows\CurrentVersion\Run" is commonly used by malware to establish persistence, because it controls which programs run automatically when Windows starts up.
  • User32.dll - Contains all the user-interface components, such as buttons, scroll bars, and components for controlling and responding to user actions.
  • Gdi32.dll - Contains functions for displaying and manipulating graphics.
  • Ntdll.dll - The interface to the Windows kernel. Executables generally do not import this file directly, although it is always imported indirectly by Kernel32.dll. If an executable imports this file, it means that the author intended to use functionality not normally available to Windows programs. Some tasks, such as hiding functionality or manipulating processes, will use this interface.
  • WSock32.dll and Ws2_32.dll - Networking DLLs. A program that accesses either of these most likely connects to a network or performs network-related tasks.
  • Wininet.dll - Contains higher-level networking functions that implement protocols such as FTP and HTTP.
  • Shell32.dll - Includes functionality that allows the program to execute other programs.
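The list above can be turned into a quick triage aid. The following Python sketch is a hypothetical helper, not part of any tool mentioned in this article: capability_hints maps imported DLL names (as shown by, for example, Dependency Walker or PEView) to short hints paraphrased from the list, and run_key_strings searches extracted strings for the Run key mentioned under Advapi32.dll. A match is only a hint about possible functionality, not proof of malicious behavior.

```python
from typing import Dict, Iterable, List

# Capability hints paraphrased from the list of common DLLs above.
DLL_HINTS: Dict[str, str] = {
    "kernel32.dll": "core functionality: memory, files, hardware",
    "advapi32.dll": "Service Manager and registry access; look for registry-key strings",
    "user32.dll": "user-interface components",
    "gdi32.dll": "graphics",
    "ntdll.dll": "direct kernel interface; possible hiding or process manipulation",
    "wsock32.dll": "networking",
    "ws2_32.dll": "networking",
    "wininet.dll": "higher-level networking (HTTP, FTP)",
    "shell32.dll": "can execute other programs",
}

# Persistence key from the Advapi32.dll entry, lowercased for comparison.
RUN_KEY = r"software\microsoft\windows\currentversion\run"


def capability_hints(imported_dlls: Iterable[str]) -> Dict[str, str]:
    """Return a hint for each imported DLL found in DLL_HINTS (case-insensitive)."""
    hints = {}
    for dll in imported_dlls:
        hint = DLL_HINTS.get(dll.lower())
        if hint is not None:
            hints[dll] = hint
    return hints


def run_key_strings(strings: Iterable[str]) -> List[str]:
    """Return strings that reference the Run key (this also matches RunOnce)."""
    return [s for s in strings if RUN_KEY in s.lower()]
```

The lookup is case-insensitive because import tables often spell DLL names in upper or mixed case, such as "KERNEL32.dll".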

Linking and importing functions

Imported functions are executable code stored inside other files. Functions are imported so that code that has already been written does not need to be rewritten. Windows provides so many functions that most features of a program can be implemented with them. For example, the function "URLDownloadToFile" downloads bits from the Internet and saves them to a file. This function is often used by Trojan downloaders, a type of malware that connects to a remote server in order to download additional malware onto the compromised computer.

Executables import functions by linking to the code in another program (usually a DLL). There are three different ways a program can import functions: dynamic, runtime, and static linking. Dynamic linking is the most common method, and the one malware analysts prefer to encounter. When dynamic linking is used, the OS loader fetches the required libraries when the program is loaded. Because the PE file header stores information about every dynamically linked library and function used by the executable, the malware analyst can easily get an overview of the executable's functionality.

Runtime linking is less common in legitimate programs, but malware often uses it as an obfuscation method. An executable that uses runtime linking loads libraries and resolves functions only when it needs them during execution. Functions imported at runtime are not visible in the file headers, so it is harder to deduce the malware's functionality with static analysis. "LoadLibrary" and "GetProcAddress" are common Windows functions that allow a program to import functions at runtime. If you see them in the file headers and not much besides them, the remaining functions are probably imported at runtime.
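This rule of thumb can be expressed roughly in code. The sketch below is a hypothetical heuristic: it takes a list of imported function names (for example, copied from Dependency Walker) and reports whether the import list looks like runtime linking, that is, a LoadLibrary variant plus GetProcAddress with few other imports. The cut-off of 10 other imports is an arbitrary assumption, so treat a positive result as a reason to look closer, not as a verdict.

```python
from typing import Iterable


def likely_runtime_linking(imported_functions: Iterable[str], max_other: int = 10) -> bool:
    """Heuristic: LoadLibrary* and GetProcAddress are imported, and little else.

    max_other is an arbitrary, illustrative threshold.
    """
    names = {f.lower() for f in imported_functions}
    # LoadLibrary comes in several variants: LoadLibraryA/W and LoadLibraryExA/W.
    loaders = {n for n in names if n.startswith("loadlibrary")}
    has_resolver = "getprocaddress" in names
    others = names - loaders - {"getprocaddress"}
    return bool(loaders) and has_resolver and len(others) <= max_other
```

For instance, an import list of only LoadLibraryA, GetProcAddress, and ExitProcess returns True, while an ordinary program that also imports dozens of other functions returns False.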

Static linking is common in Unix and Linux programs, but rare in Windows programs. In practice, static linking copies the needed code from the library into the executable. This introduces redundancy and makes executables larger than with dynamic or runtime linking. For the analyst, statically linked code is also hard to tell apart from the executable's own code, because the PE header records no import information for it.


Static Analysis process

  1. Test the malware against multiple antivirus programs to see whether it is detected and to get information on what type of malware it might be. I recommend using VirusTotal.
    • Note that files you submit to VirusTotal are shared with the security community, and other VirusTotal users can download them. Do not upload a file unless you are sure that it is alright to share it.
  2. Run the malware in a sandbox to see what can easily be found out about it. I recommend using Cuckoo.
    • A good sandbox can automate most of the basic malware analysis process. If you want to learn, you might want to do the work manually first.
  3. Search through the strings of the malware to get hints about its functionality. A program will contain strings if it prints a message, connects to a URL, or copies a file to a specific location.
    • Note that legitimate programs usually contain many text strings. If a suspicious program contains only a few strings, it is probably obfuscated or packed to hide its malicious functionality.
  4. Check the image file headers. When was the file compiled? Look at the ".text" section header to see whether the virtual size is significantly larger than the size on disk.
  5. If a program is packed, you must unpack it before you can perform any further analysis. Check for other signs of packing/obfuscation used on the malware. What obfuscation was used? Can you deobfuscate it?
  6. Check what functions the malware imports.
    • If you do not know what a function does, you have to find out. Search for the function in Microsoft's documentation.
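Step 3 can be approximated without extra tools. The Python sketch below is a simplified stand-in for the Strings utility listed below, not a reimplementation of it: the hypothetical extract_strings function collects runs of printable ASCII characters and runs of UTF-16LE text (the "wide" strings common in Windows programs), both with an assumed minimum length of 4 characters. The length of the returned list gives a rough feel for the "few strings" warning sign mentioned in step 3.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return printable ASCII and UTF-16LE strings of at least min_len characters."""
    # Printable ASCII is 0x20 (space) through 0x7E (~).
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # UTF-16LE text in the ASCII range: each printable byte is followed by 0x00.
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_re.finditer(data)]
    return found
```

ASCII strings are listed first and wide strings after them, so the output does not preserve file order; a real tool would also report offsets, which this sketch leaves out for brevity.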

Useful programs

  • Cuckoo is an open source automated malware analysis system.
  • Dependency Walker can be used to list dynamically linked functions in an executable.
  • Flare VM is a customized Windows virtual machine with numerous tools for malware analysis.
  • INetSim is a software suite for simulating common Internet services in a lab environment.
  • PEiD is used to identify which packer was used on a packed PE file.
  • PEView is used to look into the PE file headers.
  • Resource Hacker can be used to extract and look into a resource in a PE file.
  • Strings can be used to print out the strings in a malware sample.
  • UPX is a common packer, and the UPX program itself can be used to unpack files packed with it (upx -d).


Concluding remarks

This concludes the theory of the basic static malware analysis on Windows. Next time we will do some actual malware analysis.

In my opinion, one of the hardest parts of static analysis is understanding how the functions the malware imports can be used. In other words, how does a malicious program harness a particular function to achieve its goals? Why does the malware use that specific function? That understanding comes with experience and deliberate practice. In malware analysis, as in life, great power is not easy to attain. You need to work hard for it.
