stkof: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=4872b087443d1e52ce720d0a4007b1920f18e7b0, stripped

  • Partial RELRO
  • Canary found
  • NX enabled
  • No PIE
  • No RPATH

Out of all the HITCON CTF 2014 challenges I worked on this was my favorite one. I wasn't able to submit it in time and I could make up excuses as to why but "late is late" so it doesn't matter. The binary is pretty simple, it can be discribed as an allocator or something like that.
Basically there is a very large array of pointers that I call bag in the .bss and the binary implements functions that operate on this bag:

  • alloc(size): this function malloc()'s size bytes and puts the pointer in bag it then returns a number indicating the index into the bag where that pointer can be found.

  • dealloc(idx): this function calls free() on the pointer found at bag[idx].

  • read_in(idx, data): this function reads the data into the pointer found at bag[idx]. There is no limit to the size of data.

  • wtf(idx): this not-so-useful-function checks the strlen() of bag[idx] if its less than or equal to 3, it prints "//TODO" otherwise it prints "...".

Obviously we see that there is a heap-based buffer overflow. We can alloc(n) and then read_in(idx, bigdata) with the length of bigdata > n.
At this point I recalled some rumors I had heard about the new version of malloc and how it was unexploitable. I thought that this challenge was a way to slow down people since it was "unexploitable" so lazy person that I am, I decided to look at some other challenges (like web and guessing ones, of which I couldn't solve a single one).

I was minding other things when around 5.5 hours before the end of the CTF I noticed that PPP had solved this challenge which was just what I was waiting for.

An arbitrary write primitive

I worked on this challenge with @antoniob who helped me browse through malloc.c (note that this may not be the exact code used on Ubuntu 14.04 but it's pretty close). Before I started to read the source code describing the implementation of malloc I did some simple tests and fuzzing to make the binary crash. For example allocate two 8-byte chunks overflow the first one and the free the first or second one. With these tests I triggered a number of errors like "double free or corruption (out)" etc. These seemed to suggest that the error was happening somehwere in _int_free() so if it wasn't clear before that I was corrupting memory now it was.

Remembering the old school heap-based buffer overflows we decided to look at the source code to see where and how the unlink() was happening:

#define unlink(P, BK, FD) {                                    \
    FD = P->fd;                                                            \
    BK = P->bk;                                                            \
    if (__builtin_expect (FD->bk != P || BK->fd != P, 0))                  \
      malloc_printerr (check_action, "corrupted double-linked list", P);   \
    else {                                                                 \
        FD->bk = BK;                                                       \
        BK->fd = FD; [*]                                                   \
        if (!in_smallbin_range (P->size)                                   \
            && __builtin_expect (P->fd_nextsize != NULL, 0)) {             \
            assert (P->fd_nextsize->bk_nextsize == P);                     \
            assert (P->bk_nextsize->fd_nextsize == P);                     \
            if (FD->fd_nextsize == NULL) {                                 \
                if (P->fd_nextsize == P)                                   \
                  FD->fd_nextsize = FD->bk_nextsize = FD;                  \
                else {                                                     \
                    FD->fd_nextsize = P->fd_nextsize;                      \
                    FD->bk_nextsize = P->bk_nextsize;                      \
                    P->fd_nextsize->bk_nextsize = FD;                      \
                    P->bk_nextsize->fd_nextsize = FD;                      \
                  }                                                        \
              } else {                                                     \
                P->fd_nextsize->bk_nextsize = P->bk_nextsize;              \
                P->bk_nextsize->fd_nextsize = P->fd_nextsize;              \
              }                                                            \
          }                                                                \
      }                                                                    \

The following figure illustrates the above code:

Due to the check (FD->bk != P || BK->fd != P, 0) we can only write to a location that is "pointing to us". In other words, if we can find a pointer X to P and set FD and BK to it, we can put overwrite P with X. This happens with:

FD->bk = BK;  
BK->fd = FD;  

Of course this seems pretty useless but if you consider what the binary is doing, we can actually leverage this to an arbitrary write. Remember the bag ? As stated earlier that bag contains a list of pointers that were returned by malloc() and since the bag is at a static location (in the .bss), we can easily get a pointer to P and using the heap overflow, we will set FD and BK to that. In the end we will end up with the bag containing a pointer to itself (or somewhere in the bag).
Now we can call readin(idxb, ptrto). If bag[idxb] is a pointer to somewhere in the bag this means that ptr_to will be written somewhere in the bag. Finally we can call readin(idxpto, what) which will write what to ptrto given that now bag[idxpto] == ptrto.
Note that everything after [*] we can ignore because we can skip the remaining code by making sure P->fd_nextsize is set to NULL.

In a nutshell to obtain the arbitrary write primitive:

  • Allocate 2 buffers.
  • Overflow the first buffer into the second buffer.
  • Setup the data structures so that free() doesn't error out and calls unlink() with our wanted values.
  • Free the 2nd buffer.
  • Call readin(idxb, ptrto) to put the destination pointer (a GOT address) in the bag and also put our ROP chain (which we will pivot to later) in the bag.
  • Call readin(idpto, what) to write to ptrto.

Out of all these steps the most difficult one is obviously the 3rd one. The free()'ing mechanism contains a good number of checks before arriving to the unlink(). If you want to learn more about these checks, check out the source

Exploiting the service

After doing this funky stuff and obtaining the arbitrary write primitive I had to get code execution in order to read the flag. There were a few things I had to do in order to acheive this. First I had to find a way to pivot the stack to the bag where my ROP chain was stored. I couldn't find any straightforward way to do this so I came up with the following idea after looking at the disassembly.

.text:0000000000400B07                 push    rbp
.text:0000000000400B08                 mov     rbp, rsp
.text:0000000000400B0B                 add     rsp, 0FFFFFFFFFFFFFF80h
.text:0000000000400B0F                 mov     rax, fs:28h
.text:0000000000400B18                 mov     [rbp+var_8], rax
.text:0000000000400B1C                 xor     eax, eax
.text:0000000000400B1E                 mov     rdx, cs:stdin   ; stream
.text:0000000000400B25                 lea     rax, [rbp+tbuf]
.text:0000000000400B29                 mov     esi, 10h        ; n
.text:0000000000400B2E                 mov     rdi, rax        ; s
.text:0000000000400B31                 call    _fgets
.text:0000000000400B36                 lea     rax, [rbp+tbuf]
.text:0000000000400B3A                 mov     rdi, rax        ; nptr
.text:0000000000400B3D                 call    _atol

After the fgets(), rsp is pointing to 24 bytes before the beginning tbuf which contains our data. Therefore, overwriting the GOT entry of atol() with a pointer to a pop; pop; pop; ret will allow us to have a 16 byte temporary ROP chain since tbuf points to data read by the fgets(). We then use this ROP chain to pivot the stack by doing a pop rsp; ret to pivot to the location in bag where our "real" ROP chain is.

After that it is a standard ROP chain that leaks a libc pointer through the GOT and then computes the address of system and finally calls system.
The flag: HITCON{ASZ0_H4eP_0VerFlOw_317H_sTrAn9e}. The full exploit can be found here