Malware Analysis At A Glance

What is Malware?

Malware, or malicious software, is software designed to cause harm to a system or network. It is common for malware to have mechanisms built in to spread itself throughout a system, remain undetectable, damage data, and render a computer unusable.

One famous example of malware is Stuxnet. In January 2010, a routine inspection of the Natanz Uranium Enrichment Plant in Iran was carried out by the International Atomic Energy Agency. They found that many centrifuges were failing, but no cause was specified. Later in 2010, a number of computer systems in the plant were crashing and rebooting. When technicians analysed the computers for IoCs, or Indicators of Compromise, they found numerous malicious files installed. Those technicians had just discovered one of the most infamous pieces of malware ever cooked up - Stuxnet.

Stuxnet spread itself through Windows PCs and USB sticks until it found its way into the Natanz plant. Once it reached the SCADA (Supervisory Control and Data Acquisition) systems controlling the plant’s gas centrifuges, the malware caused the centrifuges to malfunction. The Natanz centrifuges would essentially tear themselves apart.

Stuxnet was an extremely sophisticated, targeted attack which severely set back Iran’s nuclear program. It is widely speculated that Stuxnet was built as a cyber weapon in a collaboration between US and Israeli intelligence agencies, but neither has claimed responsibility. Stuxnet is widely regarded as the first demonstration of malware being used to cause significant physical damage, and it brought a country’s nuclear program to its knees.

This article is an introduction to malware, and will discuss some basic malware analysis techniques. Malware analysis is a huge area, so I will only be scratching the surface. This article is loosely based on the TryHackMe malware module.

Types of Malware

Malware campaigns generally come in two flavours:

  1. Targeted campaigns - created for a specific purpose for a specific target

    The Stuxnet malware campaign was targeted - it was targeting Iranian nuclear facilities

  2. Mass campaigns - these aim to infect as many devices as possible, and are usually run by financially motivated criminal groups rather than APT (Advanced Persistent Threat) groups, which tend to favour targeted campaigns.

    In April 2022, Bitdefender detailed a mass malware campaign using RedLine Stealer - a data harvesting piece of malware that can steal passwords, usernames and credit card data on Windows PCs. RedLine Stealer is often sold on underground forums as malware-as-a-service, which lets novices buy malware off the shelf and run their own malware campaigns. Bitdefender’s whitepaper can be downloaded from here.

Malware can come in all shapes and sizes, and some common malware types are listed below. Note that some definitions have no consensus - I will be using the definitions coined by Fortinet.

  • Viruses - Malware which can propagate through a system, but requires human intervention
    • File viruses - These infect files on a system, often executables, and can infect other files when opened
    • Macro viruses - Microsoft Office products allow users to write small programs to automate repetitive tasks, called macros. Macro viruses infect Microsoft Office files, and propagate once the file is opened
    • Polymorphic viruses - Viruses that can change their form to evade detection
  • Trojans - Malware that hides in legitimate files, named after the famous Trojan Horse from Greek mythology
  • Worms - Malware that is similar to viruses, but does not require human intervention to propagate - they spread themselves through a system on their own
  • Spam - Malware that is stored inside email attachments or links, usually tricking a user into clicking a link or downloading a program
  • Ransomware - Malware that encrypts data on a system, and kindly requests users to pay a ransom to get their data back, usually in the form of cryptocurrencies.
  • Rootkits - Malware that functions as a Swiss army knife - they embed themselves deep into a system, often onto the kernel itself, and allow threat actors complete control over a system
  • Adware - One of the most common types of malware infection, these will cause unwanted advertisements to pop up all over a computer system
  • Spyware - Malware that quietly infects a system and sends personal data back to threat actors. This category includes keyloggers, which record all keystrokes typed by a user and send these back to the threat actor.

Malware Attack Chain

When malware infects a system it will usually follow a similar pattern, often leaving a trail of crumbs for malware analysts to follow. The steps most malware will go through are:

  • Delivery - The method by which the malware gains initial access to a system

    For example, USB sticks (like Stuxnet) and email attachments from phishing emails

  • Execution - The main part of malware classification, this is what the malware actually does to a system

    For example, ransomware will encrypt files and spyware will record and transmit data such as keystrokes.

  • Persistence - Usually malware will want to stay in the target system without being detected for further attack.

    Malware can often be found hiding in parts of a system like the registry and startup programs.

  • Propagation - The method by which malware will spread to other available devices

    The BlackBasta ransomware can spread through local Windows machines by connecting to Active Directory, scanning for other computers on the local network, copying the malware onto them, then running it remotely with the Component Object Model (COM). Trend Micro have a fascinating analysis of the BlackBasta ransomware.

This whole attack chain will leave signatures behind, both host-based and network-based:

  • Host-based - the results of any execution or persistence, such as encrypted files and additional installed software

    For example, automated vulnerability scans and enumeration on a system often leave a very obvious trail, and can usually be considered an indicator of compromise

  • Network-based - this includes any networking communications made during delivery, execution and propagation, for example connecting to a C2 server, or ransomware contacting victims for cryptocurrency payments

    C2 servers, or command and control servers, are servers run by threat actors to send additional instructions to small persistent programs hidden in a system called beacons.

Malware Analysis

Now for some actual malware analysis. There are two types of malware analysis:

  1. Static analysis - analysis of the state of the malware sample before any execution. This is used to get a high-level understanding of a piece of malware

    This uses techniques such as signature analysis - any file can be hashed to generate a checksum, which is then compared against known hashes. Organisations such as VirusTotal have a library of known malware hashes. More on this to come…

  2. Dynamic analysis - analysis of malware whilst it is being executed. This is much more involved and dangerous, and should always be carried out in a sandbox - an isolated environment where any malware can’t affect your real data, such as a virtual machine.

MD5 Checksums

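Taking the MD5 hash of a file is like taking a cryptographic fingerprint. The checksum is a 32-character hexadecimal number (128 bits). If two files have the same checksum, there is a very high probability that the files are the same. Note that MD5 is no longer collision-resistant: two different files with the same MD5 hash can be deliberately constructed, so stronger hashes such as SHA-256 are often published alongside MD5.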

MD5 is a hashing algorithm. Hashing algorithms are ‘one-way functions’. It is easy to compute the hash of an input, but computationally difficult to go backwards. For this reason, hash functions are ubiquitous in cryptography. Hash functions also have the property that a tiny change in an input will result in a completely different output. This makes hashing useful for fingerprinting files.

On Linux, the md5sum command will compute the MD5 checksum of a file. In the example below, I have created two text files file1.txt and file2.txt, each with different content. Next, the md5sum command is used. Observe that the two files each have completely different checksums.

┌──(kali㉿kali)-[~/Documents]
└─$ echo 'bees' > file1.txt; echo 'honey' > file2.txt

┌──(kali㉿kali)-[~/Documents]
└─$ md5sum file1.txt
cd4c78001a0a37a39c44154f7b680785  file1.txt

┌──(kali㉿kali)-[~/Documents]
└─$ md5sum file2.txt
069cfc88dd2a624d8963fb9657ecfa74  file2.txt

In the next example, I have created a third file, file3.txt, and given it the same content as file1.txt - bees. When the MD5 checksum is computed, note that the result is the same as file1.txt.

┌──(kali㉿kali)-[~/Documents]
└─$ echo 'bees' > file3.txt

┌──(kali㉿kali)-[~/Documents]
└─$ md5sum file3.txt
cd4c78001a0a37a39c44154f7b680785  file3.txt
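The same fingerprinting can be scripted. The following is a minimal sketch using Python's standard hashlib module; the helper names file_digest and bytes_digest are my own, not part of any tool mentioned in this article. Note that echo appends a newline, so the bytes being hashed in the examples above are "bees" followed by a newline character.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bytes_digest(data, algorithm="md5"):
    """Return the hex digest of an in-memory byte string."""
    return hashlib.new(algorithm, data).hexdigest()


if __name__ == "__main__":
    # Identical content gives identical digests; a small change gives a very different one.
    print(bytes_digest(b"bees\n"))
    print(bytes_digest(b"bees\n"))
    print(bytes_digest(b"beet\n"))
    # The same helper works for stronger algorithms such as SHA-256.
    print(bytes_digest(b"bees\n", "sha256"))
```

Passing "sha256" as the algorithm gives a 64-character digest, which is the safer choice when collisions matter.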

MD5 checksums allow uniform identification of malware for the malware analysis community. Organisations like VirusTotal will collect and analyse malware, and publish their checksums for users to observe and compare to. If you are a malware analyst and come across a nasty unfamiliar piece of malware, you can upload it to VirusTotal. If someone else has come across this malware before and uploaded it, you will have their analysis ready for you.

The following examples have been provided by TryHackMe’s Malware Analysis module.

These three files look like ordinary executables. They are named aws.exe, NetLogo.exe and vlc.exe. VLC, for example, is a legitimate media player.

[Screenshot: the three example files aws.exe, NetLogo.exe and vlc.exe]

Let’s check out their MD5 checksums to see if they are who they say they are.

HashTab is a Windows application for computing hashes of files. It is an extension to Windows Explorer, and appears as an additional tab in a file’s properties window. HashTab can be downloaded from here. HashTab will compute checksums with various hashing algorithms, and allows you to compare the hashes of two files.

[Screenshots: HashTab file properties for aws.exe, NetLogo.exe and vlc.exe]

On each of the above screenshots, we can see the MD5 checksum of the file. We see that the hashes are:

aws.exe - D2778164EF643BA8F44CC202EC7EF157
NetLogo.exe - 59CB421172A89E1E16C11A428326952C
vlc.exe - 5416BE1B8B04B1681CB39CF0E2CAAD9F

We can now search for these hashes on VirusTotal to check if they have been previously flagged as malicious.

[Screenshots: VirusTotal results for one of the example hashes]

We see that VirusTotal does not mark this file as malicious. The same result occurs for all three example files. Note that just because a file was not marked as malicious doesn’t mean it is harmless. It just means that no vendors or sandboxes on VirusTotal were able to find anything malicious. There are many ways for threat actors to hide the true purpose of their malware.

Magic Numbers

Just because a file tells you it is an executable doesn’t mean it really is one. Likewise, just because a file tells you it is a cute cat picture in JPG format doesn’t mean it is not a malicious executable. It is easy for threat actors to spoof a file extension. The first few bytes of most files form a signature, known as magic bytes or a magic number, which identifies the true file type. The table below shows the magic bytes of some common file types:

Magic Bytes                              File Type            File Extension
0x4D 0x5A                                DOS MZ executables   .exe, .dll
0x7F 0x45 0x4C 0x46                      Linux executables    .elf
0x50 0x4B 0x03 0x04                      ZIP archives         .zip
0x52 0x61 0x72 0x21 0x1A 0x07 0x00       RAR archives         .rar
0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A  PNG images           .png
0x47 0x49 0x46 0x38 0x37 0x61            GIF images           .gif
0xFF 0xD8 0xFF 0xEE                      JPG images           .jpg

This page and this Wikipedia entry contain comprehensive lists of magic bytes and their corresponding file signatures.

So how do you check the magic bytes of a file? One of the methods by which the Linux file command determines file types is via magic numbers. From the file manual page:

Files have a “magic number” stored in a particular place near the beginning of the file that tells the UNIX operating system that the file is a binary executable, and which of several types thereof.

Moreover, one can manually view the hexdump of a file, and check its leading hex values. This can be achieved with the Linux xxd command, which generates the hexdump of a given file.

In the example below, we investigate the uname binary program on Linux. uname is a binary that will output system information - it should be an executable file.

┌──(kali㉿kali)-[/bin]
└─$ file uname                
uname: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=f8e1b819dde5bcc14373bbbbf5b8dc3e1d1dad7c, for GNU/Linux 3.2.0, stripped

┌──(kali㉿kali)-[/bin]
└─$ xxd uname | head -1
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............

When we use the file command on uname, we see that this is an ELF file. When we generate the hexdump of the file, the first few bytes are 7f45 4c46. Referring back to the table above, we see that these are indeed the ELF magic numbers.
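The lookup that file performs can be approximated in a few lines. This is only a rough sketch, not how file is actually implemented: the SIGNATURES list and the identify helpers are hypothetical names, and the signatures are shortened prefixes of the entries in the table above so that common variants (for example GIF89a, or JPG files ending in a byte other than 0xEE) still match.

```python
# Leading-byte prefixes for the file types in the table above.
# Longer prefixes come first so the most specific match wins.
SIGNATURES = [
    (bytes.fromhex("89504E470D0A1A0A"), "PNG image"),
    (bytes.fromhex("526172211A"), "RAR archive"),
    (bytes.fromhex("7F454C46"), "ELF executable"),
    (bytes.fromhex("504B0304"), "ZIP archive"),
    (bytes.fromhex("47494638"), "GIF image"),
    (bytes.fromhex("FFD8FF"), "JPEG image"),
    (bytes.fromhex("4D5A"), "DOS MZ executable"),
]


def identify(header):
    """Guess a file type from its leading bytes; returns 'unknown' if nothing matches."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path):
    """Read the first 16 bytes of a file and guess its type."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


if __name__ == "__main__":
    # The first 16 bytes of uname, copied from the xxd output above.
    print(identify(bytes.fromhex("7f454c46020101000000000000000000")))
```

Plain-text formats such as PHP scripts have no magic bytes, so a lookup like this reports them as unknown; file falls back on other heuristics for those.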

Let’s see how threat actors can exploit magic bytes.

The file php_reverse_shell.php is exactly what it sounds like - it is a reverse shell payload written in PHP. This payload was written by PentestMonkey and the full source code can be found here.

Websites which allow arbitrary file uploads are vulnerable to this kind of payload. Many websites will filter uploads by checking both file extensions and magic numbers. Changing a file extension is easy - just change the name.

We will now see how to change magic numbers.

└─$ file php_reverse_shell.php
php_reverse_shell.php: PHP script, ASCII text

As we can see, the php_reverse_shell.php file is correctly recognised as a PHP file. We will use the Linux program hexeditor to add the JPG magic bytes to the header of this file.

[Screenshot: hexeditor showing the original hexdump of php_reverse_shell.php]

Using the command hexeditor -b php_reverse_shell.php we can see the current hexdump in the above image. The -b flag opens the file in buffer mode, loading it entirely into memory so that bytes can be inserted as well as overwritten. CTRL+A will insert a new hex byte at the beginning. We will use this to add the bytes FF D8 FF EE - the magic bytes for JPG files. The new hexdump can be seen in the image below.

[Screenshot: hexeditor showing the hexdump with the JPG magic bytes added]

Now that we have added some new magic bytes, we use file to check the file type.

┌──(kali㉿kali)-[~/Downloads]
└─$ file php_reverse_shell.php
php_reverse_shell.php: PHP script, ASCII text

┌──(kali㉿kali)-[~/Downloads]
└─$ hexeditor -b php_reverse_shell.php

┌──(kali㉿kali)-[~/Downloads]
└─$ file php_reverse_shell.php
php_reverse_shell.php: JPEG image data

The file command thinks that our malicious PHP script is a JPEG file! This technique can be used by threat actors to bypass file restrictions, and obfuscate the true nature of their malware. Malware analysts should always check the magic bytes of any suspicious files to uncover their true nature.
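From the defender's side, one cheap check is to look past the magic bytes. The sketch below is an illustrative heuristic of my own, not a technique taken from the TryHackMe module: it flags data that starts with the JPG magic bytes but also contains a PHP opening tag. The function name and the marker are assumptions, and a real scanner would need to handle many more cases, including false positives from random binary data.

```python
JPEG_MAGIC = b"\xff\xd8\xff"
PHP_OPEN_TAG = b"<?php"


def looks_like_disguised_php(data):
    """True if data claims to be a JPEG via its magic bytes but embeds a PHP opening tag."""
    return data.startswith(JPEG_MAGIC) and PHP_OPEN_TAG in data


def file_looks_like_disguised_php(path):
    """Apply the same check to a file on disk."""
    with open(path, "rb") as handle:
        return looks_like_disguised_php(handle.read())


if __name__ == "__main__":
    # A harmless stand-in for the payload, with the four JPG bytes prepended.
    spoofed = bytes.fromhex("FFD8FFEE") + b"<?php echo 'hello'; ?>"
    print(looks_like_disguised_php(spoofed))
```

An upload filter that trusts only the leading bytes would accept the spoofed sample, which is why checking a single property of a file is rarely enough.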

Packing

Packing is a method of hindering the reverse engineering of a program, and is often used by threat actors when making malware. Packing is a type of obfuscation - an attempt to conceal the true purpose of a program. See my previous article for a brief overview of code obfuscation.

Packing is accomplished using a program called a packer. A packer modifies the format of the code by compressing or encrypting it. Packing changes a file’s signature, so it can be used to attempt to evade signature-based detection. Packers leave a small portion of code, known as a stub, which contains the routine needed to decompress or decrypt the file at run time (the program must be decompressed/decrypted before it can actually be executed).

One notable example of packing was during the SolarWinds attack in 2020. APT29, a Russian Foreign Intelligence Service threat group, used packing in their Raindrop loaders. They used a custom packer to obfuscate Cobalt Strike payloads, using the LZMA compression algorithm. For more examples of packing in the wild, see MITRE ATT&CK’s technique page for packing.

Packers come from the olden days of computing, when networks were not as fast as today and programs had to be compressed to be transmitted. These days, packers are not as necessary to transfer large files, so the usage of packers can be considered suspicious.

UPX (Ultimate Packer for eXecutables) is a file packer. It is open-source, and will compress files and obfuscate their content. See this article to learn how to unpack UPX-packed files. UPX uses a stub-payload architecture. This method of packing will compress the file, add a new header, and add a stub at the end for decompression purposes.

[Diagram: layout of a UPX-packed file]

When a program needs to be executed, the unpacking algorithm will use the stub to unpack, and write the original unpacked executable into a new segment of the program.

[Diagram: unpacking a UPX-packed file at run time]

So how can we figure out if a file has been packed?

  • Lack of imports

    Most executables require importing system libraries to interact with the operating system. For example, most Windows executables will import kernel32.dll and user32.dll. When examining ordinary unpacked programs you will find many system imports, but packed programs will have few. Moreover, the stub of a packed program doesn’t have much functionality other than unpacking and executing the payload, so will have a suspiciously low number of imports.

  • Non-standard section names

    Most executables will have similar section headers, depending on the exact format. For example, most Windows executables will have sections labeled .text, .rdata and .data. Packers will commonly define their own custom section names. For example, UPX uses section names UPX0 and UPX1. These non-standard section names can be a telltale sign that a packer has been used.

  • High entropy

    Entropy is a measure of the ‘randomness’ of data. Ordinary compiled code and plain text have relatively low entropy because of their regular structure. However, encrypted and compressed data is far less predictable, so it has high entropy. This Wikipedia page contains a good description of the entropy measure.

  • Lack of strings

    Ordinary executables will have many human-readable strings. For example, HTTP POST and GET requests can easily be understood by us humans, and will show up when examining a file. When a program is packed, the encryption/compression will obfuscate these strings. If there is a low number of human-readable strings in a program, there’s a good chance it has been packed.

  • Read-write-execute permissions

    The permissions of an executable’s sections can reveal vital information. The code section of an ordinary executable is usually readable and executable but not writable, as it is rare for a program to write to its own code. However, a packed program must unpack itself into a section (using write permissions) before executing it. If a section has read, write and execute permissions all at once, the program could have been passed through a packer.
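Several of the indicators above can be read straight from a PE file's section table. The sketch below is a minimal parser written for illustration, using only the Python standard library; the function names are my own, it skips most validation, and it is no substitute for a proper PE parsing library. For each section it reports the name, whether the section is both writable and executable, and the Shannon entropy of its raw bytes in bits per byte.

```python
import math
import struct
from collections import Counter

# Section characteristic flags from the PE format.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def shannon_entropy(data):
    """Shannon entropy in bits per byte: 0.0 for constant data, up to 8.0 for uniform data."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def pe_sections(image):
    """Yield (name, writable_and_executable, entropy) for each section of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ header")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    (section_count,) = struct.unpack_from("<H", image, coff + 2)
    (optional_size,) = struct.unpack_from("<H", image, coff + 16)
    # The section table follows the 20-byte COFF header and the optional header.
    table = coff + 20 + optional_size
    for index in range(section_count):
        entry = table + 40 * index  # each section header is 40 bytes
        name = image[entry:entry + 8].rstrip(b"\x00").decode("ascii", "replace")
        raw_size, raw_offset = struct.unpack_from("<II", image, entry + 16)
        (flags,) = struct.unpack_from("<I", image, entry + 36)
        rwx = bool(flags & IMAGE_SCN_MEM_WRITE) and bool(flags & IMAGE_SCN_MEM_EXECUTE)
        yield name, rwx, shannon_entropy(image[raw_offset:raw_offset + raw_size])
```

To try it, read a sample's bytes with open(path, "rb").read() and loop over pe_sections(data). Section names such as UPX0, a section that is writable and executable, or an entropy approaching 8 bits per byte are each worth a closer look, though none of them is proof of packing on its own.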

Alternatively, there exist tools to detect if a given executable has been packed. PEiD is a tool for identifying the packer used on packed PE (Portable Executable) files. It can detect over 470 different packer signatures. Beware that if a threat actor uses their own custom packer, like APT29 did as discussed above, PEiD will not be able to identify the packer.

In the below screenshot, we see PEiD used to identify a packer. For the given sample, we see that the FSG packer was used. To learn more about unpacking FSG, see this article.

[Screenshot: PEiD identifying the FSG packer]

Reverse engineering techniques can also be used to determine if a file has been packed. IDA Freeware (the free version of IDA, the Interactive Disassembler) is a program for disassembling compiled files, generating assembly code from machine code. It can be used to disassemble Windows PE, macOS Mach-O and Linux ELF executables.

This example of a packed file has been taken from the TryHackMe Malware Analysis module. In the below screenshot, we see what IDA greets us with when opening a packed file.

[Screenshot: IDA Freeware after opening the packed file]

If we check the imports tab, we see very few imports. As discussed above, this is unusual for ordinary executables so should raise our suspicions.

[Screenshot: IDA imports tab for the packed file]

We also observe the flow of the program. The execution flowchart is fairly small. Ordinary executables will have a much more complicated program flow, as seen in the second image below. This is because the unobfuscated part of a packed file does not need to do much - it only needs to unpack the executable and execute.

[Screenshots: IDA execution flowchart of the packed file, followed by that of an ordinary executable]

We have had a very brief overview of IDA Freeware. In the future, I plan on writing more articles on reverse engineering going much more in depth into this powerful tool.

Strings

Analysing the strings of a program can be a powerful tool for static analysis. Compiled programs will usually have human-readable strings in their executables. Some examples can be hard-coded passwords, IP addresses, and cryptocurrency wallet addresses (often found when analysing ransomware). Analysing these strings can help build up a picture of the behaviours of a program.

The infamous WannaCry ransomware attacks of 2017 were stopped by analysing strings. Whilst the attacks were ongoing, Marcus Hutchins, a British hacker, found a suspicious URL in the list of strings in the WannaCry program. Hutchins discovered that this URL functioned as a killswitch for the ransomware - as soon as its domain was registered, the attacks stopped spreading. Hutchins shot to fame and gained the attention of the whole cybersecurity community - as well as the FBI. After a week of partying during the DEFCON conference in August 2017, Hutchins was arrested by the FBI when trying to board his flight back to the UK. The FBI had discovered that years earlier, Hutchins had been involved in creating malware for stealing people’s passwords and other personal data, including the Kronos banking trojan. To read more on Hutchins’ story, check out this Wired article.

In Linux, the strings command can be used for string analysis. On Windows, Microsoft’s Sysinternals has a strings program. strings will output a lot of garbage, so sifting through the output can be time consuming. Below is the output of strings when run on TryHackMe’s example.

[Screenshot: strings output showing section headers]

The first thing we can see are section headers .text, .rdata, .reloc. Compiled executables contain different sections for holding textual data, variable values, memory addresses, etc.

[Screenshot: strings output showing function names]

As we scroll through the output, we find something more interesting. It appears that these are functions used in the executable.

[Screenshot: strings output showing Windows registry queries]

Upon further inspection, we can see that this program is attempting to query some Windows registry entries. Analysing output like this can give a great insight into the internals of a program.
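To cut down on the sifting, the core of what strings does can be sketched in Python and combined with a keyword filter. This is a simplified approximation with hypothetical helper names: it only finds runs of printable ASCII characters (four or more by default, matching the default minimum of the Linux strings command), so it will miss the UTF-16 strings that Windows programs often contain.

```python
import re


def extract_strings(data, min_length=4):
    """Return every run of at least min_length printable ASCII characters in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def interesting(strings, keywords=("http://", "https://", "HKEY_", "Software\\")):
    """Keep only the strings that contain one of the keywords (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [s for s in strings if any(keyword in s.lower() for keyword in lowered)]


if __name__ == "__main__":
    # A made-up byte blob standing in for a compiled program.
    blob = b"\x00\x01.text\x00\x00\x90\x90RegOpenKeyExA\x00ab\x00http://example.test/kill\xff"
    found = extract_strings(blob)
    print(found)
    print(interesting(found))
```

The keyword list here is only a starting point; in practice it would be tuned to whatever is being hunted for, such as IP addresses or wallet addresses.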

Conclusion

In this article, we have looked at some common types of malware, and some ways of understanding them. We have learnt how to get the MD5 checksums of malware, and how to determine file types. We looked at some basic reverse engineering. Each of these topics could have been an article in its own right. Reverse engineering is a massive topic, with entire books written on it. For a great book on malware analysis, check out Practical Malware Analysis by Michael Sikorski and Andrew Honig.

Ever since the inception of computers, the fight against malware has been an everlasting battle. Hackers create malware, and analysts dissect it to try to prevent more damage. As long as there are hackers around, malware will always be a pressing issue, and cybersecurity professionals must always try to stay on top.

TryHackMe Malware Analysis module

RedLine Stealer malware

Microsoft Component Object Model

Black Basta analysis by Trend Micro

VirusTotal upload

HashTab download

File signatures table

Wikipedia list of file signatures

PentestMonkey’s PHP reverse shell

Overview of code obfuscation

MITRE page on APT29

MITRE page on Raindrop loaders

MITRE page on packing

UPX file packer

Unpacking UPX

Wikipedia page on entropy

PEiD packer identifier

Unpacking FSG

IDA Freeware

WannaCry ransomware attack

Marcus Hutchins’ Twitter

Wired article on Marcus Hutchins

Practical Malware Analysis