Malware Analysis At A Glance
Written on October 12th, 2022 by Mouse
What is Malware?
Malware, or malicious software, is software designed to cause harm to a system or network. It is common for malware to have mechanisms built in to spread itself throughout a system, remain undetectable, damage data, and render a computer unusable.
One famous example of malware is Stuxnet. In January 2010, a routine inspection of the Natanz Uranium Enrichment Plant in Iran was carried out by the International Atomic Energy Agency. They found that many centrifuges were failing, but no cause was specified. Later in 2010, a number of computer systems in the plant were crashing and rebooting. When technicians analysed the computers for IoCs, or Indicators of Compromise, they found numerous malicious files installed. Those technicians had just discovered one of the most infamous pieces of malware ever cooked up - Stuxnet.
Stuxnet spread itself through Windows PCs and USB sticks until it found its way into the Natanz plant. Once it reached the SCADA (Supervisory Control and Data Acquisition) systems that controlled the plant’s gas centrifuges, the malware caused the centrifuges to malfunction. The Natanz centrifuges would essentially tear themselves apart.
Stuxnet was an extremely sophisticated, targeted attack which severely crippled Iran’s nuclear weapons program. It is speculated that Stuxnet was built as a cyber weapon in a collaboration between US and Israeli intelligence agencies, but neither has since claimed responsibility. Stuxnet was the first demonstration of malware being used to cause significant physical damage, and to bring a country’s nuclear program to its knees.
This article is an introduction to malware, and will discuss some basic malware analysis techniques. Malware analysis is a huge area, so I will only be scratching the surface. This article is loosely based off the TryHackMe malware module.
Types of Malware
Malware campaigns generally come in two flavours:
- Targeted campaigns - created for a specific purpose for a specific target
The Stuxnet malware campaign was targeted - it was targeting Iranian nuclear facilities
- Mass campaigns - these aim to infect as many devices as possible, and are usually run by APT (Advanced Persistent Threat) groups.
In April 2022, Bitdefender detailed a mass malware campaign using RedLine Stealer - a data harvesting piece of malware that can steal passwords, usernames and credit card data on Windows PCs. RedLine Stealer is often sold on underground forums as malware-as-a-service. This is when novices can buy malware off the shelf and execute their own malware campaigns. Bitdefender’s whitepaper can be downloaded from here.
Malware can come in all shapes and sizes, and some common malware types are listed below. Note that some definitions have no consensus - I will be using the definitions coined by Fortinet.
- Viruses - Malware which can propagate through a system, but requires human intervention
- File viruses - These infect files on a system, often executables, and can infect other files when opened
- Macro viruses - Microsoft Office products allow users to write small programs to automate repetitive tasks, called macros. Macro viruses infect Microsoft Office files, and propagate once the file is opened
- Polymorphic viruses - Viruses that can change their form to evade detection
- Trojans - Malware that hides in legitimate files, named after the famous Trojan Horse from Greek mythology
- Worms - Malware that is similar to viruses, but does not require human intervention to propagate - they spread themselves through a system on their own
- Spam - Malware that is stored inside email attachments or links, usually tricking a user into clicking a link or downloading a program
- Ransomware - Malware that encrypts data on a system, and kindly requests users to pay a ransom to get their data back, usually in the form of cryptocurrencies.
- Rootkits - Malware that functions as a Swiss army knife - they embed themselves deep into a system, often onto the kernel itself, and allow threat actors complete control over a system
- Adware - One of the most common types of malware infection, these will cause unwanted advertisements to pop up all over a computer system
- Spyware - Malware that quietly infects a system and sends personal data back to threat actors. This category includes keyloggers, which record all keystrokes typed by a user and send these back to the threat actor.
Malware Attack Chain
When malware infects a system it will usually follow a similar pattern, often leaving a trail of crumbs for malware analysts to follow. The steps most malware will go through are:
- Delivery - The method by which the malware gains initial access to a system
For example, USB sticks (like Stuxnet) and email attachments from phishing emails
- Execution - The main part of malware classification, this is what the malware actually does to a system
For example, ransomware will encrypt files and spyware will record and transmit data such as keystrokes.
- Persistence - Usually malware will want to stay in the target system without being detected for further attack.
Malware can often be found hiding in parts of a system like the registry and startup programs.
- Propagation - The method by which malware will spread to other available devices
The BlackBasta ransomware can spread through local Windows machines by connecting to Active Directory, scanning for other computers on the local network, copying itself onto them, then running the copies remotely with the Component Object Model (COM). Trend Micro have a fascinating analysis of the BlackBasta ransomware.
This whole attack chain will leave signatures behind, both host-based and network-based:
- Host-based - the results of any execution or persistence, such as encrypted files and additional installed software
For example, automated vulnerability scans and enumeration on a system often leave a very obvious trail, and can usually be considered an indicator of compromise
- Network-based - this includes any networking communications made during delivery, execution and propagation, for example connecting to a C2 server, or ransomware contacting victims for cryptocurrency payments
C2 servers, or command and control servers, are servers run by threat actors to send additional instructions to small persistent programs hidden in a system called beacons.
Malware Analysis
Now for some actual malware analysis. There are two types of malware analysis:
- Static analysis - analysis of the state of the malware sample before any execution. This is used to get a high-level understanding of a piece of malware
This uses techniques such as signature analysis - any file can be hashed to generate a checksum, which is then compared against known hashes. Organisations such as VirusTotal maintain a library of known malware hashes. More on this to come…
- Dynamic analysis - analysis of malware whilst it is being executed. This is much more involved and dangerous, and should always be carried out in a sandbox - an isolated environment where any malware can’t affect your real data, such as a virtual machine.
MD5 Checksums
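Taking the MD5 hash of a file is like taking a cryptographic fingerprint. The checksum will be a 32-character hexadecimal number. If two files have the same checksum, there is a high probability that the files are the same. Note that MD5 is no longer collision-resistant: an attacker can deliberately craft two different files with the same MD5 checksum, so analysts commonly record a SHA-256 checksum as well.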
MD5 is a hashing algorithm. Hashing algorithms are ‘one-way functions’. It is easy to compute the hash of an input, but computationally difficult to go backwards. For this reason, hash functions are ubiquitous in cryptography. Hash functions also have the property that a tiny change in an input will result in a completely different output. This makes hashing useful for fingerprinting files.
On Linux, the `md5sum` command will compute the MD5 checksum of a file. In the example below, I have created two text files, `file1.txt` and `file2.txt`, each with different content. Next, the `md5sum` command is used. Observe that the two files have completely different checksums.
┌──(kali㉿kali)-[~/Documents]
└─$ echo 'bees' > file1.txt; echo 'honey' > file2.txt
┌──(kali㉿kali)-[~/Documents]
└─$ md5sum file1.txt
cd4c78001a0a37a39c44154f7b680785 file1.txt
┌──(kali㉿kali)-[~/Documents]
└─$ md5sum file2.txt
069cfc88dd2a624d8963fb9657ecfa74 file2.txt
In the next example, I have created a third file, `file3.txt`, and given it the same content as `file1.txt` - `bees`. When the MD5 checksum is computed, note that the result is the same as for `file1.txt`.
┌──(kali㉿kali)-[~/Documents]
└─$ echo 'bees' > file3.txt
┌──(kali㉿kali)-[~/Documents]
└─$ md5sum file3.txt
cd4c78001a0a37a39c44154f7b680785 file3.txt
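The same fingerprinting can be sketched in Python using the standard library's `hashlib` module. This is a minimal illustration rather than a replacement for `md5sum`; the helper name `md5_of_file` and the chunk size are my own choices.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 checksum of a file as a 32-character hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large samples never have to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        # Same layout as md5sum: checksum, two spaces, file name.
        print(f"{md5_of_file(name)}  {name}")
```

Saved as a script (the file name is up to you), it can be run with one or more file paths as arguments. Swapping `hashlib.md5` for `hashlib.sha256` gives a SHA-256 checksum with no other changes.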
MD5 checksums allow uniform identification of malware across the malware analysis community. Organisations like VirusTotal collect and analyse malware, and publish the checksums for users to compare against. If you are a malware analyst and come across a nasty unfamiliar piece of malware, you can upload it to VirusTotal. If someone else has come across this malware before and uploaded it, you will have their analysis ready for you.
These examples have been provided by TryHackMe’s Malware Analysis module.
These three files are ordinary-looking executables. They are named `aws.exe`, `NetLogo.exe` and `vlc.exe`. For example, VLC is a legitimate media player.
Let’s check out their MD5 checksums to see if they are who they say they are.
HashTab is a Windows application for computing hashes of files. It is an extension to Windows Explorer, and appears as an additional tab in a file’s Properties window. HashTab can be downloaded from here. HashTab will compute checksums with various hashing algorithms, and allows you to compare the hashes of two files.
On each of the above screenshots, we can see the MD5 checksums of each file. We see that the hashes are
aws.exe - D2778164EF643BA8F44CC202EC7EF157
NetLogo.exe - 59CB421172A89E1E16C11A428326952C
vlc.exe - 5416BE1B8B04B1681CB39CF0E2CAAD9F
We can now search for these hashes on VirusTotal to check if they have been previously flagged as malicious.
We see that VirusTotal does not mark this file as malicious. The same result occurs for all three example files. Note that just because a file was not marked as malicious doesn’t mean it is harmless. This just means that no vendors or sandboxes on VirusTotal were able to find anything malicious. There are many ways for threat actors to hide the true purpose of their malware.
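At its core, a hash lookup like this is a comparison of a checksum against a collection of previously seen samples. The sketch below shows that idea locally. It does not call the VirusTotal API, and `KNOWN_BAD_MD5` is a hypothetical, hand-maintained blocklist whose single entry is only a placeholder.

```python
# Hypothetical local blocklist; a real one would come from a threat-intel feed.
KNOWN_BAD_MD5 = {
    "d41d8cd98f00b204e9800998ecf8427e",  # placeholder: MD5 of an empty file
}


def normalise(md5_hex):
    """Strip whitespace and lower-case a checksum so tool outputs compare equal."""
    return md5_hex.strip().lower()


def is_known_bad(md5_hex, blocklist=KNOWN_BAD_MD5):
    """Return True if the checksum appears in the blocklist."""
    return normalise(md5_hex) in blocklist


# The three checksums read off HashTab earlier in the article.
samples = {
    "aws.exe": "D2778164EF643BA8F44CC202EC7EF157",
    "NetLogo.exe": "59CB421172A89E1E16C11A428326952C",
    "vlc.exe": "5416BE1B8B04B1681CB39CF0E2CAAD9F",
}

for name, checksum in samples.items():
    verdict = "known bad" if is_known_bad(checksum) else "not in blocklist"
    print(f"{name}: {verdict}")
```

As with VirusTotal, a miss only means the sample is unknown to the list, not that it is safe.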
Magic Numbers
Just because a file tells you it is an executable doesn’t mean it really is one. Likewise, just because a file tells you it is a cute cat picture in JPG format, doesn’t mean it is not a malicious executable. It is easy for threat actors to spoof a file extension. In most file formats, the first few bytes are a fixed sequence of magic bytes that identify the true file type. The table below shows the magic bytes of some common file types:
Magic Bytes | File Type | File Extension |
---|---|---|
0x4D 0x5A | DOS MZ executables | .exe, .dll |
0x7F 0x45 0x4C 0x46 | Linux executables | .elf |
0x50 0x4B 0x03 0x04 | ZIP archives | .zip |
0x52 0x61 0x72 0x21 0x1A 0x07 0x00 | RAR archives | .rar |
0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A | PNG images | .png |
0x47 0x49 0x46 0x38 0x37 0x61 | GIF images | .gif |
0xFF 0xD8 0xFF 0xEE | JPG images | .jpg |
This page and this Wikipedia entry contain comprehensive lists of magic bytes and their corresponding file signatures.
So how do you check the magic bytes of a file? One of the methods by which the Linux `file` command determines file types is via magic numbers. From the `file` manual page:
Files have a “magic number” stored in a particular place near the beginning of the file that tells the UNIX operating system that the file is a binary executable, and which of several types thereof.
Moreover, one can manually view the hexdump of a file, and check its leading hex values. This can be achieved with the Linux `xxd` command, which generates the hexdump of a given file.
In the example below, we investigate the `uname` binary program on Linux. `uname` is a binary that will output system information - it should be an executable file.
┌──(kali㉿kali)-[/bin]
└─$ file uname
uname: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=f8e1b819dde5bcc14373bbbbf5b8dc3e1d1dad7c, for GNU/Linux 3.2.0, stripped
┌──(kali㉿kali)-[/bin]
└─$ xxd uname | head -1
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............
When we use the `file` command on `uname`, we see that this is an ELF file. When we generate the hexdump of the file, the first few bytes are `7f45 4c46`. Referring back to the table above, we see that these are indeed the ELF magic numbers.
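A magic-number check is simple enough to sketch in a few lines of Python. The example below uses a subset of the signatures from the table above. The function names are my own, and the real `file` command consults a far larger database and applies additional tests.

```python
# A subset of the signatures from the table above, longest first.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"\x7fELF", "ELF executable"),
    (b"PK\x03\x04", "ZIP archive"),
    # Only the first three bytes are checked here; the fourth byte
    # differs between JPEG variants.
    (b"\xff\xd8\xff", "JPEG image"),
    (b"MZ", "DOS MZ executable"),
]


def identify(data):
    """Return a file-type label for the leading bytes, or None if unrecognised."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def identify_file(path):
    """Read only the first few bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


# The leading bytes of uname from the hexdump above.
print(identify(bytes.fromhex("7f454c46020101")))  # ELF executable
```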
Let’s see how threat actors can exploit magic bytes.
The file `php_reverse_shell.php` is exactly what it sounds like - it is a reverse shell payload written in PHP. This payload was written by PentestMonkey and the full source code can be found here.
Websites which allow arbitrary file uploads are vulnerable to this kind of payload. Many websites will filter uploads by checking both file extensions and magic numbers. Changing a file extension is easy - just change the name.
We will now see how to change magic numbers.
└─$ file php_reverse_shell.php
php_reverse_shell.php: PHP script, ASCII text
As we can see, the `php_reverse_shell.php` file is correctly recognised as a PHP file. We will use the Linux program `hexeditor` to add the JPG magic bytes to the header of this file.
Using the command `hexeditor -b php_reverse_shell.php` we can see the current hexdump in the above image. The `-b` flag instructs the program to load the entire hexdump into memory, so that we can edit it. `CTRL+A` will append a new hex byte to the beginning. We will use this to add the bytes `FF D8 FF EE` - the magic bytes for JPG files. The new hexdump can be seen in the image below.
Now that we have added some new magic bytes, we use `file` to check the file type.
┌──(kali㉿kali)-[~/Downloads]
└─$ file php_reverse_shell.php
php_reverse_shell.php: PHP script, ASCII text
┌──(kali㉿kali)-[~/Downloads]
└─$ hexeditor -b php_reverse_shell.php
┌──(kali㉿kali)-[~/Downloads]
└─$ file php_reverse_shell.php
php_reverse_shell.php: JPEG image data
The `file` command thinks that our malicious PHP script is a JPEG file! This technique can be used by threat actors to bypass file restrictions, and obfuscate the true nature of their malware. Malware analysts should always check the magic bytes of any suspicious files to uncover their true nature.
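From the defender's side, this shows why a check on leading bytes alone is not enough. The sketch below uses a harmless stand-in script rather than the real payload. It compares a naive magic-byte filter with a slightly stricter content scan. All function names are illustrative, and the content scan is only one of many possible extra checks.

```python
JPEG_MAGIC = bytes.fromhex("ffd8ffee")  # the JPG bytes used in the example above


def prepend_magic(payload, magic=JPEG_MAGIC):
    """Return a copy of payload with fake magic bytes in front of it."""
    return magic + payload


def looks_like_jpeg(data):
    """Naive upload filter: trusts the first bytes only."""
    return data.startswith(b"\xff\xd8\xff")


def contains_php(data):
    """Stricter check: scan the whole file for a PHP opening tag."""
    return b"<?php" in data.lower()


script = b"<?php echo 'stand-in for a real payload'; ?>"
spoofed = prepend_magic(script)

print(looks_like_jpeg(script))   # False
print(looks_like_jpeg(spoofed))  # True
print(contains_php(spoofed))     # True
```

The naive filter is fooled by the four extra bytes, while the content scan still spots the PHP opening tag.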
Packing
Packing is a method to prevent the decompiling of a program, often used by threat actors when making malware. Packing is a type of obfuscation - attempting to conceal the true purpose of a program. See my previous article for a brief overview of code obfuscation.
Packing is accomplished using a program called a packer. These will modify the format of code by compressing or encrypting it. Packing will change a file’s signature, so can be used to attempt evasion of signature-based detection. Packers will often leave small portions of code, known as stubs, which contain the decryption or decompression agent necessary to decrypt the file (the program must be decompressed/decrypted to actually be executable).
One notable example of packing was during the SolarWinds attack in 2020. APT29, a Russian Foreign Intelligence Service threat group, used packing in their Raindrop loaders. They used a custom packer to obfuscate Cobalt Strike payloads, using the LZMA compression algorithm. For more examples of packing in the wild, see MITRE ATT&CK’s technique page for packing.
Packers come from the olden days of computing, when networks were not as fast as today and programs had to be compressed to be transmitted. These days, packers are not as necessary to transfer large files, so the usage of packers can be considered suspicious.
UPX (Ultimate Packer for eXecutables) is a file packer. It is open-source, and will compress files and obfuscate their content. See this article to learn how to unpack UPX-packed files. UPX uses a stub-payload architecture. This method of packing will compress the file, add a new header, and add a stub at the end for decompression purposes.
When a program needs to be executed, the unpacking algorithm will use the stub to unpack, and write the original unpacked executable in a new segment of the program.
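The stub-payload idea can be illustrated with a toy example. Here Python's `zlib` module stands in for the packer's compression, which is an assumption for illustration only and not UPX's actual algorithm or file layout. The `pack` function plays the packer, and `stub` plays the small unpacking routine that restores the original bytes at run time.

```python
import zlib


def pack(payload):
    """Compress the payload, as a packer would before embedding it."""
    return zlib.compress(payload, 9)


def stub(packed):
    """Play the role of the stub: restore the original bytes at run time."""
    return zlib.decompress(packed)


original = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 20
packed = pack(original)

print(len(original), "bytes before packing")
print(len(packed), "bytes after packing")

# The stub must be able to recover the program exactly.
assert stub(packed) == original
```

A real packer also has to rewrite the executable's headers and entry point so that the stub runs first, which this sketch leaves out entirely.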
So how can we figure out if a file has been packed?
- Lack of imports
Most executables need to import system libraries to interact with the operating system. For example, most Windows executables will import `kernel32.dll` and `user32.dll`. When examining ordinary unpacked programs you will find many system imports, but packed programs will have few. Moreover, the stub of a packed program doesn’t have much functionality other than unpacking and executing the payload, so will have a suspiciously low number of imports.
- Non-standard section names
Most executables will have similar section headers, depending on the exact format. For example, most Windows executables will have sections labeled `.text`, `.rdata` and `.data`. Packers will commonly define their own custom section names. For example, UPX uses the section names `UPX0` and `UPX1`. These non-standard section names can be a telltale sign that a packer has been used.
- High entropy
Entropy is a measure of the ‘randomness’ of a sequence of bytes. Ordinary compiled code and data are highly structured, so will have relatively low entropy. However, encrypted and compressed data is far less predictable, so will have high entropy. This Wikipedia page contains a good description of the entropy measure.
- Lack of strings
Ordinary executables will have many human-readable strings. For example, HTTP `POST` and `GET` requests can easily be understood by us humans, and will show up when examining a file. When a program is packed, the encryption/compression will obfuscate these strings. If there is a low number of human-readable strings in a program, there’s a good chance it has been packed.
- Read-write-execute permissions
The permissions of an executable’s sections can reveal vital information. Ordinary executables rarely need a section that is both writable and executable, as it is rare for a program to dynamically write to its own code. However, packed programs must be unpacked into a section (using write permissions) before being executed. If a section has read, write and execute permissions all at once, the program could have been passed through a packer.
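Of these indicators, entropy is the easiest to compute by hand. Below is a minimal sketch of Shannon entropy over the bytes of a file, measured in bits per byte. The function name is my own. The result ranges from 0 (every byte identical) to 8 (every byte value equally likely), and compressed or encrypted data tends towards the upper end.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy(b"AAAAAAAA"))        # 0.0
print(shannon_entropy(bytes(range(256))))  # 8.0
```

To apply it to a sample, read the file in binary mode and pass the bytes in. Analysis tools often report this figure per section rather than for the whole file, since a packed payload usually sits in one section.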
Alternatively, there exist tools to detect if a given executable has been packed. PEiD is a tool for identifying the packer used on packed PE files (Portable Executable). It can detect over 470 different packer signatures. Beware that if a threat actor uses their own custom packer, like APT29 did as discussed above, PEiD will not be able to identify the packer.
In the below screenshot, we see PEiD used to identify a packer. For the given sample, we see that the FSG packer was used. To learn more about unpacking FSG, see this article.
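The idea behind signature-based packer detection can be sketched with a deliberately naive example: search the raw bytes of a file for markers that UPX-packed files commonly contain. The short marker list and the two-marker threshold are assumptions for illustration, and this is not how PEiD itself is implemented.

```python
# Byte markers that UPX-packed files commonly contain. This short list is an
# assumption for illustration; real tools ship far larger signature sets.
UPX_MARKERS = (b"UPX0", b"UPX1", b"UPX!")


def find_upx_markers(data):
    """Return the UPX markers present anywhere in the raw file bytes."""
    return [marker.decode("ascii") for marker in UPX_MARKERS if marker in data]


def looks_upx_packed(path):
    """Heuristic only: True if at least two UPX markers appear in the file."""
    with open(path, "rb") as handle:
        return len(find_upx_markers(handle.read())) >= 2


print(find_upx_markers(b"MZ\x00\x00UPX0\x00\x00UPX1\x00\x00UPX!"))
print(find_upx_markers(b"MZ\x00\x00.text\x00\x00.rdata"))
```

A threat actor can rename these markers after packing, so a hit is a hint and a miss proves nothing.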
Reverse engineering techniques can also be used to determine if a file has been packed. IDA Freeware (IDA stands for Interactive Disassembler) is a program for disassembling compiled files, generating assembly code from machine code. It can be used to disassemble Windows PE, macOS Mach-O and Linux ELF executables.
This example of a packed file has been taken from the TryHackMe Malware Analysis module. In the below screenshot, we see what IDA greets us with when opening a packed file.
If we check the imports tab, we see very few imports. As discussed above, this is unusual for ordinary executables so should raise our suspicions.
We also observe the flow of the program. The execution flowchart is fairly small. Ordinary executables will have a much more complicated program flow, as seen in the second image below. This is because the unobfuscated part of a packed file does not need to do much - it only needs to unpack the executable and execute.
We have had a very brief overview of IDA Freeware. In the future, I plan on writing more articles on reverse engineering going much more in depth into this powerful tool.
Strings
Analysing the strings of a program can be a powerful tool for static analysis. Compiled programs will usually have human-readable strings in their executables. Some examples can be hard-coded passwords, IP addresses, and cryptocurrency wallet addresses (often found when analysing ransomware). Analysing these strings can help build up a picture of the behaviours of a program.
The infamous WannaCry ransomware attacks of 2017 were stopped by analysing strings. Whilst the attacks were ongoing, Marcus Hutchins, a British hacker, found a suspicious, unregistered domain in the list of strings in the WannaCry program. Hutchins discovered that this domain functioned as a killswitch for the ransomware - as soon as it was registered, the attacks stopped. Hutchins shot to fame and gained the attention of the whole cybersecurity community - as well as the FBI. After a week of partying during the DEFCON conference in August 2017, Hutchins was arrested by the FBI when trying to board his flight back to the UK. The FBI had discovered that years earlier, Hutchins was involved in creating his own malware for stealing people’s passwords and other personal data. To read more on Hutchins’s story, check out this Wired article.
In Linux, the `strings` command can be used for string analysis. On Windows, Microsoft’s Sysinternals has a strings program. `strings` will output a lot of garbage, so sifting through the output can be time consuming. Below is the output of `strings` when run on TryHackMe’s example.
The first thing we can see are the section headers `.text`, `.rdata` and `.reloc`. Compiled executables contain different sections for holding code, read-only data, relocation information, etc.
As we scroll through the output, we find something more interesting. It appears that these are functions used in the executable.
Upon further inspection, we can see that this program is attempting to query some Windows registry entries. Analysing output like this can give a great indication into the internals of a program.
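The core of what `strings` does is easy to reproduce. The sketch below pulls out runs of at least four printable ASCII characters from a blob of bytes. It is a simplification: the real tools also handle other encodings and options. The sample bytes are made up for illustration.

```python
import re

# Runs of at least four printable ASCII characters, the default minimum
# length used by the Linux strings command.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(data):
    """Return the printable ASCII strings found in a blob of bytes."""
    return [match.decode("ascii") for match in PRINTABLE_RUN.findall(data)]


# Made-up bytes imitating fragments of a compiled executable.
blob = b"\x00\x01.text\x00\x90\x90RegOpenKeyExA\x00ab\x00http://example.com\xff"

for text in extract_strings(blob):
    print(text)
```

For the made-up blob above this prints `.text`, `RegOpenKeyExA` and `http://example.com`, and skips `ab` because it is too short. On a real sample, filtering the output for URLs, IP addresses or registry API names is a quick way to cut through the garbage.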
Conclusion
In this article, we have looked at some common types of malware, and some ways of understanding them. We have learnt how to get the MD5 checksums of malware, and how to determine file types. We looked at some basic reverse engineering. Each of these topics could have been an article in their own right. Reverse engineering is a massive topic, with entire books written on it. For a great book on malware analysis, check out Practical Malware Analysis by Michael Sikorski and Andrew Honig.
Ever since the inception of computers, malware has been an everlasting battle. Hackers create malware, and analysts dissect it to try to prevent more damage. As long as there are hackers around, malware will always be a pressing issue, and cybersecurity professionals must always try to stay on top.
List of External Links
TryHackMe Malware Analysis module
Microsoft Component Object Model
Black Basta analysis by Trend Micro
Wikipedia list of file signatures
PentestMonkey’s PHP reverse shell
MITRE page on raindrop loaders