
Intro to Intel Pin

Dynamic Binary Instrumentation (DBI) is a technique for analyzing a running program by dynamically injecting analysis code. The added analysis code, or instrumentation code, runs in the context of the instrumented program and has access to real runtime values. DBI is powerful because it does not require a program's source code, unlike analysis methods that work from source. It can also instrument programs that generate code dynamically. To security researchers, DBI frameworks are invaluable tools, as they offer efficient ways to perform fuzzing, control flow analysis, and vulnerability detection with minimal overhead.

In this post, I’ll explore Intel’s Pin and Linux system call hooking. Pin offers a comprehensive framework for creating pin tools that instrument at different levels of granularity. You can find links to the Pin documentation in the references section. Also check out Gal Diskin’s slides from Black Hat for a more hands-on overview of Pin’s functionality.

Identifying Linux System Calls

The main function of our pin tool example will be to intercept and identify the system calls made by a program. For reference, we can view the Linux x86_64 system call table here: https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/.

This table lets us identify each system call by its number.
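
As a rough illustration of that lookup, the sketch below maps a handful of x86_64 system call numbers to their names. The helper name syscallName and the choice of entries are my own for illustration, not part of Pin or of the example tool, and the table is deliberately partial.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical helper (not part of Pin): maps a few x86_64 Linux system call
// numbers to their names. The numbers come from the x86_64 system call table;
// only a small subset is listed here.
std::string syscallName(std::uint64_t number) {
    static const std::unordered_map<std::uint64_t, std::string> kNames = {
        {0, "read"},    {1, "write"},    {3, "close"},
        {41, "socket"}, {42, "connect"},
        {44, "sendto"}, {45, "recvfrom"}, {59, "execve"},
    };
    const auto it = kNames.find(number);
    if (it != kNames.end()) {
        return it->second;
    }
    // Fall back to the raw number for anything not in the partial table.
    return "unknown(" + std::to_string(number) + ")";
}
```

A tool could call syscallName with the number it intercepts to label its output. These numbers are specific to x86_64 and differ on other architectures.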

One of the advantages of DBI is that we do not need the source code for analysis. For the sake of simplicity, though, the Python script below will be our target for instrumentation. We know that it returns the response of a GET request to Google.

import urllib2
page = urllib2.urlopen("https://www.google.com").read()

We can use the strace tool to see the system calls made.

# strace python http.py
execve("/usr/bin/python", ["python", "http.py"], [/* 19 vars */]) = 0
[TRUNCATED]
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
sendto(3, "GET / HTTP/1.1\r\nAccept-Encoding:"..., 117, 0, NULL, 0) = 117
recvfrom(3, "HTTP/1.1 200 OK\r\nDate: Mon, 15 M"..., 8192, 0, NULL, NULL) = 1418
recvfrom(3, "d\"><meta content=\"@GoogleDoodles"..., 7422, 0, NULL, NULL) = 2836
recvfrom(3, "ocation,b=a.href.indexOf(\"#\");if"..., 4586, 0, NULL, NULL) = 4586
recvfrom(3, "b\" value=\"Google Search\" name=\"b"..., 8192, 0, NULL, NULL) = 3154
recvfrom(3, "", 5038, 0, NULL, NULL)    = 0
recvfrom(3, "", 8192, 0, NULL, NULL)    = 0
close(3)                                = 0
[TRUNCATED]

The strace output above gives us an abundance of information to work with, but we will focus on the system calls we want to intercept: sendto and recvfrom. These system calls are used to transmit messages to and from sockets. We can see the arguments provided to both of the system calls and we will try to read those same arguments with our pin tool.
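
To make the roles of those arguments concrete, here is a small standalone sketch that calls sendto and recvfrom on a local socket pair instead of a real network connection. The helper roundTrip and the RoundTrip struct are hypothetical names used only for this illustration. The point to notice is that recvfrom's length argument is the capacity of the buffer, while its return value is the number of bytes actually received. The same pattern appears in the strace output above, where a recvfrom call with a length of 8192 returned 1418.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical result type for this demo.
struct RoundTrip {
    ssize_t sent;       // return value of sendto
    ssize_t received;   // return value of recvfrom
    std::string data;   // the bytes that actually arrived
};

// Hypothetical demo helper: sends `message` over a connected local socket
// pair and reads it back into a buffer of `capacity` bytes.
RoundTrip roundTrip(const std::string& message, std::size_t capacity) {
    RoundTrip result{-1, -1, ""};
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return result;
    }

    // sendto: the second argument points at the data, the third is its length.
    result.sent = sendto(fds[0], message.data(), message.size(), 0, nullptr, 0);

    // recvfrom: the third argument is only the capacity of the buffer.
    std::vector<char> buffer(capacity);
    result.received = recvfrom(fds[1], buffer.data(), buffer.size(), 0,
                               nullptr, nullptr);
    if (result.received > 0) {
        // Only the first `received` bytes of the buffer are meaningful.
        result.data.assign(buffer.data(),
                           static_cast<std::size_t>(result.received));
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Calling roundTrip("GET / HTTP/1.1\r\n", 8192) should report 16 bytes sent and 16 bytes received, even though the receive buffer has room for 8192.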

Hooking sendto and recvfrom

The Pin API for system calls starts with two main functions: PIN_AddSyscallEntryFunction and PIN_AddSyscallExitFunction. They register callbacks that run immediately before and immediately after each system call, respectively, which lets us add instrumentation code on both sides of every system call. The second argument is a user-defined value that Pin passes back to the callback; we do not need one here, so we pass NULL.

PIN_AddSyscallEntryFunction(&syscallEntryCallback, NULL);
PIN_AddSyscallExitFunction(&syscallExitCallback, NULL);

We can get the system call number with the PIN_GetSyscallNumber function, which returns the number of the system call in the current context. Likewise, PIN_GetSyscallArgument returns an argument of the current system call, where ‘i’ is the zero-based index of the argument. Both functions take the context (ctxt) and system call standard (std) values that Pin passes to our callbacks.

//sendto: 44, recvfrom: 45
PIN_GetSyscallNumber(ctxt, std);
PIN_GetSyscallArgument(ctxt, std, i);

By referencing the man pages for our intercepted system calls we know that the second argument holds a pointer to a buffer containing the message contents to be sent or received. The third argument is the length of that buffer. Once we intercept our system call, we can read the value of the buffer with the code below.

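// Argument indices are zero-based: 1 is the buffer pointer, 2 is its length.
ADDRINT buf = PIN_GetSyscallArgument(ctxt, std, 1);
ADDRINT len = PIN_GetSyscallArgument(ctxt, std, 2);
int buflen = (int)len;
char *bufptr = (char *)buf;
// Print the buffer one byte at a time.
for (int i = 0; i < buflen; i++, bufptr++) {
    fprintf(stdout, "%c", *bufptr);
}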

Starting at the buffer pointer, we walk byte by byte, dereferencing the pointer to read each value until we reach the given length. Putting it all together, we can see some of the results below.

# ../../../pin -t obj-intel64/syscalltest.so -- python http.py
call PIN_AddSyscallEntryFunction
call PIN_AddSyscallExitFunction
call PIN_StartProgram()
[TRUNCATED]
systemcall sendto: 44
buffer start: 0x7ff81ef26eb4
length: 117
GET / HTTP/1.1
Accept-Encoding: identity
Host: www.google.com
Connection: close
User-Agent: Python-urllib/2.7
[TRUNCATED]
systemcall recvfrom: 45
buffer start: 0x5644e5db7934
length: 8192
emtype="https://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script>
[TRUNCATED]
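
One caveat about the recvfrom output above: the kernel fills the receive buffer during the call, so at system call entry the buffer still holds whatever was in that memory before, and the length argument (8192) is only the capacity of the buffer. A common way to handle this is to remember the buffer pointer at entry and read it at exit, using the return value as the byte count. The sketch below models that pairing without depending on Pin, so every name in it is hypothetical. In a real pin tool, I would expect onEntry and onExit to be called from the callbacks registered with PIN_AddSyscallEntryFunction and PIN_AddSyscallExitFunction, with the return value coming from PIN_GetSyscallReturn. It also assumes a single-threaded target; a multithreaded one would need this state per thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Pin-independent model of entry/exit pairing. All names are hypothetical.
constexpr std::uint64_t kSendto = 44;    // x86_64 system call numbers
constexpr std::uint64_t kRecvfrom = 45;

class SyscallTracker {
public:
    // Called at system call entry with the number, buffer pointer, and length.
    // Returns the bytes that are already valid at this point: the outgoing
    // data for sendto, and nothing for any other call.
    std::string onEntry(std::uint64_t number, const char* buffer,
                        std::uint64_t length) {
        hasPending_ = true;
        pendingNumber_ = number;
        pendingBuffer_ = buffer;
        if (number == kSendto && buffer != nullptr) {
            return std::string(buffer, static_cast<std::size_t>(length));
        }
        return std::string();
    }

    // Called at system call exit with the call's return value.
    // For recvfrom, a positive return value is the number of bytes received,
    // so only that many bytes of the remembered buffer are read.
    std::string onExit(std::int64_t returnValue) {
        std::string received;
        if (hasPending_ && pendingNumber_ == kRecvfrom &&
            pendingBuffer_ != nullptr && returnValue > 0) {
            received.assign(pendingBuffer_,
                            static_cast<std::size_t>(returnValue));
        }
        hasPending_ = false;
        pendingBuffer_ = nullptr;
        return received;
    }

private:
    bool hasPending_ = false;
    std::uint64_t pendingNumber_ = 0;
    const char* pendingBuffer_ = nullptr;
};
```

With this split, a sendto buffer is reported at entry and a recvfrom buffer at exit, and stale bytes that happen to sit in the receive buffer before the call are never printed.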

The output of the example is far from clean, but it does contain the information we want to intercept: the GET request and response. We can identify the system calls associated with network communications and even see the values of the arguments passed back and forth. Now imagine that our target sent login credentials in a GET request. We could retrieve that information as well.

systemcall sendto: 44
buffer start: 0x7f3b3dcf61c4
length: 146
GET /login?user=admin&pass=badpass HTTP/1.1
Accept-Encoding: identity
Host: www.notarealhost.com
Connection: close
User-Agent: Python-urllib/2.7
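
The raw dumps above print every byte as-is, which gets messy once a buffer contains line breaks or binary data. One possible cleanup, similar in spirit to the quoting strace applies, is to escape anything that is not printable before writing it out. The helper escapeBytes below is a hypothetical sketch of that idea and is not taken from the example tool.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: renders a raw buffer as a single line of text,
// escaping carriage returns, newlines, backslashes, and any byte that is
// not printable.
std::string escapeBytes(const char* data, std::size_t length) {
    std::string out;
    for (std::size_t i = 0; i < length; ++i) {
        const unsigned char c = static_cast<unsigned char>(data[i]);
        if (c == '\r') {
            out += "\\r";
        } else if (c == '\n') {
            out += "\\n";
        } else if (c == '\\') {
            out += "\\\\";
        } else if (std::isprint(c)) {
            out += static_cast<char>(c);
        } else {
            // Everything else becomes a two-digit hex escape such as \x00.
            char hex[5];
            std::snprintf(hex, sizeof(hex), "\\x%02x", static_cast<unsigned>(c));
            out += hex;
        }
    }
    return out;
}
```

Passing the intercepted buffer and its byte count through a helper like this keeps each intercepted message on one line, which makes the tool's output easier to search.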

This example only scratches the surface of the functionality that the Pin framework has to offer. In the future, I hope to create more complex tools for fuzzing.

You can find the example code at https://github.com/NetSPI/Pin.

References