Writing Your own Libc
Note: This code was taken from Linux’s nolibc: https://github.com/torvalds/linux/tree/master/tools/include/nolibc. Check it out to learn more about implementing libc!
What are System Calls?
Let’s write some libc functions.
Libc is C’s standard library, which implements a group of functions
that can be used by all C programs. Libc provides wrappers for OS-level
constructs, like printf, open, puts, and so on.
There are two parts to libc:
- functions like max or islower, which don’t require any system calls.
- functions that do require system calls, like write, read, or open.
Functions in the first category can be implemented without calling into the OS.
For example, max would look like this:
int max(int a, int b) {
    return a > b ? a : b;
}
or islower:
int islower(int c) {
    return (c >= 'a' && c <= 'z') ? 1 : 0;
}
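String comparison is another function in this first category. The sketch below is a minimal version under the hypothetical name my_strcmp (chosen so it does not clash with the real strcmp); it needs nothing from the OS, only pointer arithmetic.

```c
/* Hypothetical name: my_strcmp, to avoid clashing with the real strcmp. */
int my_strcmp(const char *a, const char *b) {
    /* Walk both strings while they match and neither has ended. */
    while (*a && *a == *b) {
        a++;
        b++;
    }
    /* Compare as unsigned char, which is how strcmp orders characters. */
    return (unsigned char)*a - (unsigned char)*b;
}
```

The result is zero for equal strings, negative when the first string sorts earlier, and positive otherwise.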
However, when writing our own write, read, or open, we hit a roadblock:
int open(const char* path, int flags, ...) {
    // how do I open a file???
}
open is a system call that needs to manipulate hardware; we need to ask the OS to do the action for us before being able to read and/or write to the file.
The OS provides an interface for exactly that, called system calls.
These system calls allow us to request the OS to do something on our
behalf. These calls normally manipulate hardware in some fashion, or
have to do with processes.
So open might look like this:
int open(const char* path, int flags, ...) {
    return system_call(SYSCALL_OPEN, path, flags, ...);
}
And we defer to the OS, which takes care of everything for us.
That leaves the question: what should our system_call function look
like? And what is SYSCALL_OPEN?
System Call Numbers
System calls take as their first argument a number, which indicates
which system call the OS should execute. The OS reads that number,
looks up which system call it corresponds to, and executes that system
call with the remaining arguments.
Let’s say that we call open, which is number 2:
#define SYSCALL_OPEN 2
system_call(SYSCALL_OPEN, path, flags, ...);
The OS will then take that number and run the desired code.
// inside the OS (pseudocode)
#define SYSCALL_OPEN 2
int system_call(SYSCALL syscall, ...) {
    switch (syscall) {
    case SYSCALL_OPEN:
        // run code to open up a file in the hardware
        break;
    default:
        break;
    }
}
We now need a correct list of system calls. Imagine if we thought
SYSCALL_OPEN was 3, but the OS thought 3 means close. The OS would run
the wrong operation: the computer could crash, our process could crash,
anything could happen.
// our program
#define SYSCALL_OPEN 3
system_call(SYSCALL_OPEN, path, flags, ...);
// the OS (pseudocode)
#define SYSCALL_OPEN 2
#define SYSCALL_CLOSE 3
int system_call(SYSCALL syscall, ...) {
    switch (syscall) {
    case SYSCALL_OPEN:
        // run code to open up a file in the hardware
        break;
    case SYSCALL_CLOSE:
        // run code to close a file
        // oops, we called the wrong function
        break;
    default:
        break;
    }
}
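To see the mismatch concretely, here is a toy model of the dispatch step that can be compiled as ordinary C. The names toy_syscall and toy_dispatch are made up for illustration, and this is not how the kernel is actually written; it only shows that the number alone decides which handler runs.

```c
/* Toy model only: these names are made up and this is not kernel code. */
enum toy_syscall { TOY_SYS_OPEN = 2, TOY_SYS_CLOSE = 3 };

/* Returns the name of the handler that a given number would select. */
const char *toy_dispatch(long nr) {
    switch (nr) {
    case TOY_SYS_OPEN:  return "open";
    case TOY_SYS_CLOSE: return "close";
    default:            return "unknown";
    }
}
```

A caller that believes open is number 3 gets the close handler, with no indication that anything went wrong.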
So we need to get that list of system calls.
You can find the system calls on your Linux system with ausyscall:
$ ausyscall --dump
Using x86_64 syscall table:
0 read
1 write
2 open
3 close
...
This differs per architecture. For example, on aarch64 (64-bit ARM):
$ ausyscall aarch64 --dump
Using aarch64 syscall table:
0 io_setup
1 io_destroy
2 io_submit
3 io_cancel
We could define these ourselves, or rely on the system to export them
at asm/unistd.h. I’m going to include it instead of rewriting it.
#include <asm/unistd.h>
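Before writing any assembly, it can help to confirm that the numbers from the header really select the call we expect. The sketch below is only a hosted sanity check: it assumes a regular libc (glibc or musl) is available and uses its generic syscall() function, which is exactly what the rest of this article avoids. The helper name getpid_by_number is made up.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* ask libc to declare syscall() */
#endif
#include <asm/unistd.h>    /* __NR_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: asks the kernel for the pid by raw number. */
long getpid_by_number(void) {
    return syscall(__NR_getpid);
}
```

If the header matches the running kernel, getpid_by_number() returns the same value as libc’s getpid().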
Writing system calls (with assembly)
Now that we have the system call number we need, let’s write that syscall!
We need to know a few things:
- What the system call instruction is called, so we can tell the OS we’re running a system call
- Where to put the system call number
- Where to put any other arguments required (open needs a file to open, for example)
- Where our return values are placed.
What better way to learn than through the manual pages: run
man 2 syscall in the shell to learn more:
First, there’s a table that tells us what the system call instruction
is called, in the Instruction column. For x86-64, it is syscall. Next,
the table tells us which register holds the system call number. On
x86-64, it is rax. Finally, it lists the registers to check for return
values, which on x86-64 are rax and rdx, and the register to check for
errors (on x86-64 there is no dedicated error register; a failure shows
up as a negative value in rax).
Arch/ABI Instruction System Ret Ret Error Notes
call # val val2
───────────────────────────────────────────────────────────────────
alpha callsys v0 v0 a4 a3 1, 6
arc trap0 r8 r0 - -
arm/OABI swi NR - r0 - - 2
arm/EABI swi 0x0 r7 r0 r1 -
arm64 svc #0 w8 x0 x1 -
blackfin excpt 0x0 P0 R0 - -
i386 int $0x80 eax eax edx -
ia64 break 0x100000 r15 r8 r9 r10 1, 6
m68k trap #0 d0 d0 - -
microblaze brki r14,8 r12 r3 - -
mips syscall v0 v0 v1 a3 1, 6
nios2 trap r2 r2 - r7
parisc ble 0x100(%sr2, %r0) r20 r28 - -
powerpc sc r0 r3 - r0 1
powerpc64 sc r0 r3 - cr0.SO 1
riscv ecall a7 a0 a1 -
s390 svc 0 r1 r2 r3 - 3
s390x svc 0 r1 r2 r3 - 3
superh trapa #31 r3 r0 r1 - 4, 6
sparc/32 t 0x10 g1 o0 o1 psr/csr 1, 6
sparc/64 t 0x6d g1 o0 o1 psr/csr 1, 6
tile swint1 R10 R00 - R01 1
x86-64 syscall rax rax rdx - 5
x32 syscall rax rax rdx - 5
xtensa syscall a2 a2 - -
Later down the page, there’s another table that shows where arguments go.
For x86-64, rdi, rsi, rdx, r10, r8, and r9 are the registers to put
arguments in, in order, with rax holding the system call number.
An interesting thing to note: mips/o32 here only supports 4 arguments
in registers. That doesn’t necessarily mean it only supports system
calls with 4 or fewer arguments: arguments 5 through 8 are placed on
the stack and read when the system call instruction is executed.
Arch/ABI arg1 arg2 arg3 arg4 arg5 arg6 arg7 Notes
──────────────────────────────────────────────────────────────
alpha a0 a1 a2 a3 a4 a5 -
arc r0 r1 r2 r3 r4 r5 -
arm/OABI r0 r1 r2 r3 r4 r5 r6
arm/EABI r0 r1 r2 r3 r4 r5 r6
arm64 x0 x1 x2 x3 x4 x5 -
blackfin R0 R1 R2 R3 R4 R5 -
i386 ebx ecx edx esi edi ebp -
ia64 out0 out1 out2 out3 out4 out5 -
m68k d1 d2 d3 d4 d5 a0 -
microblaze r5 r6 r7 r8 r9 r10 -
mips/o32 a0 a1 a2 a3 - - - 1
mips/n32,64 a0 a1 a2 a3 a4 a5 -
nios2 r4 r5 r6 r7 r8 r9 -
parisc r26 r25 r24 r23 r22 r21 -
powerpc r3 r4 r5 r6 r7 r8 r9
powerpc64 r3 r4 r5 r6 r7 r8 -
riscv a0 a1 a2 a3 a4 a5 -
s390 r2 r3 r4 r5 r6 r7 -
s390x r2 r3 r4 r5 r6 r7 -
superh r4 r5 r6 r7 r0 r1 r2
sparc/32 o0 o1 o2 o3 o4 o5 -
sparc/64 o0 o1 o2 o3 o4 o5 -
tile R00 R01 R02 R03 R04 R05 -
x86-64 rdi rsi rdx r10 r8 r9 -
x32 rdi rsi rdx r10 r8 r9 -
xtensa a6 a3 a4 a5 a8 a9 -
With all that information out of the way, we want to write a function that does the following:
- Write out the system call instruction in assembly (syscall for x86-64).
- Set the rax register to the number of the system call we want to call.
- Read the register that has the return value and return it.
On x86-64, the syscall instruction overwrites the registers rcx and
r11, so our asm statement must list them in its last line, the clobber
list. The clobber list names the registers that are overwritten, as
well as other directives: cc and memory are not registers, but they
tell the compiler that the flags and memory may change.
Here is why rcx and r11 are clobbered:
rcx is clobbered to store the address of the next instruction to return to.
r11 is clobbered to store the value of the rflags register.
And for cc and memory, from: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Extended-Asm
cc
The cc clobber indicates that the assembler code modifies the flags register. On some machines, GCC represents the condition codes as a specific hardware register; “cc” serves to name this register. On other machines, condition code handling is different, and specifying “cc” has no effect. But it is valid no matter what the target.
memory
The memory clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the “memory” clobber effectively forms a read/write memory barrier for the compiler.
Note that this clobber does not prevent the processor from doing speculative reads past the asm statement. To prevent that, you need processor-specific fence instructions.
So on x86, a system call will look like:
#define syscall0(num) \
({ \
long _ret; \
register long _num asm("rax") = (num); \
\
asm volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
For the Arm64 (aarch64) version:
#define syscall0(num) \
({ \
register long _num asm("x8") = (num); \
register long _arg1 asm("x0"); \
\
asm volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
Writing a C Function that makes a system call
Let’s write our first libc function, getpid.
getpid returns the pid of the current process.
It has a signature of pid_t getpid(void);, where pid_t is int.
All we have to do is make the right system call to the OS and return its result.
typedef int pid_t;
pid_t getpid(void) {
    return syscall0(__NR_getpid);
}
This should return your pid, and we’re done creating a libc function that calls into the kernel.
Putting it all together
To put it all together, we need to write the _start function of our
program ourselves. _start is the real entry point of the binary and is
normally supplied by libc, but we are going to link to our own libc.
For x86-64, that means adding this code to the top of your file:
asm(".section .text\n"
".weak _start\n"
".global _start\n"
"_start:\n"
"pop %rdi\n" // argc (first arg, %rdi)
"mov %rsp, %rsi\n" // argv[] (second arg, %rsi)
"lea 8(%rsi,%rdi,8),%rdx\n" // then a NULL then envp (third arg, %rdx)
"xor %ebp, %ebp\n" // zero the stack frame
"and $-16, %rsp\n" // x86 ABI : esp must be 16-byte aligned before call
"call main\n" // main() returns the status code, we'll exit with it.
"mov %eax, %edi\n" // retrieve exit code (32 bit)
"mov $60, %eax\n" // NR_exit == 60
"syscall\n" // really exit
"hlt\n" // ensure it does not return
"");
This sets up everything main needs to run.
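The lea instruction in that block computes envp from the layout the kernel leaves on the stack: argc, then argc argv pointers, then a NULL, then the environment pointers. In C terms the same arithmetic can be sketched as below; envp_from is a made-up helper name, and the function only illustrates the address calculation rather than replacing _start.

```c
/* Hypothetical helper mirroring what _start computes for envp. */
char **envp_from(long argc, char **argv) {
    /* Skip the argc argument pointers plus the NULL that ends argv. */
    return argv + argc + 1;
}
```

On a 64-bit target each pointer is 8 bytes, so this is argv + 8*argc + 8, which matches 8(%rsi,%rdi,8).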
For ARM64 (aarch64):
asm(".section .text\n"
".weak _start\n"
".global _start\n"
"_start:\n"
"ldr x0, [sp]\n" // argc (x0) was in the stack
"add x1, sp, 8\n" // argv (x1) = sp
"lsl x2, x0, 3\n" // envp (x2) = 8*argc ...
"add x2, x2, 8\n" // + 8 (skip null)
"add x2, x2, x1\n" // + argv
"and sp, x1, -16\n" // sp must be 16-byte aligned in the callee
"bl main\n" // main() returns the status code, we'll exit with it.
"mov x8, 93\n" // NR_exit == 93
"svc #0\n"
"");
And aggregating that together:
For x86:
#include <asm/unistd.h>
#define syscall0(num) \
({ \
long _ret; \
register long _num asm("rax") = (num); \
\
asm volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
asm(".section .text\n"
".weak _start\n"
".global _start\n"
"_start:\n"
"pop %rdi\n" // argc (first arg, %rdi)
"mov %rsp, %rsi\n" // argv[] (second arg, %rsi)
"lea 8(%rsi,%rdi,8),%rdx\n" // then a NULL then envp (third arg, %rdx)
"xor %ebp, %ebp\n" // zero the stack frame
"and $-16, %rsp\n" // x86 ABI : esp must be 16-byte aligned before call
"call main\n" // main() returns the status code, we'll exit with it.
"mov %eax, %edi\n" // retrieve exit code (32 bit)
"mov $60, %eax\n" // NR_exit == 60
"syscall\n" // really exit
"hlt\n" // ensure it does not return
"");
typedef int pid_t;
pid_t getpid(void) {
    return syscall0(__NR_getpid);
}
int main() {
return getpid();
}
For ARM64 (aarch64):
#include <asm/unistd.h>
#define syscall0(num) \
({ \
register long _num asm("x8") = (num); \
register long _arg1 asm("x0"); \
\
asm volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
asm(".section .text\n"
".weak _start\n"
".global _start\n"
"_start:\n"
"ldr x0, [sp]\n" // argc (x0) was in the stack
"add x1, sp, 8\n" // argv (x1) = sp
"lsl x2, x0, 3\n" // envp (x2) = 8*argc ...
"add x2, x2, 8\n" // + 8 (skip null)
"add x2, x2, x1\n" // + argv
"and sp, x1, -16\n" // sp must be 16-byte aligned in the callee
"bl main\n" // main() returns the status code, we'll exit with it.
"mov x8, 93\n" // NR_exit == 93
"svc #0\n"
"");
typedef int pid_t;
pid_t getpid(void) {
    return syscall0(__NR_getpid);
}
int main() {
return getpid();
}
Now to compile this program, we can’t link to libc, so, assuming the
C file is called main.c, run:
$ gcc -static -lgcc -nostdlib -g main.c -o main
Then run it:
$ ./main
Grab the exit status of the binary:
$ echo $? # the low 8 bits of the pid, so it looks like a random number
And we’re done with implementing getpid!
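The status looks random because the shell only sees the low 8 bits of whatever main returns, so it is the pid modulo 256 rather than the pid itself. A small sketch of that truncation, with the made-up helper name visible_exit_status:

```c
/* Hypothetical helper: the status a parent sees when a process exits with `code`. */
int visible_exit_status(int code) {
    return code & 0xff;   /* only the low 8 bits are reported */
}
```

For example, a process with pid 4660 (0x1234) that returns its pid from main exits with status 52 (0x34).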
Reading and writing to files
getpid is a fine starting function, but we want to be able to read and
write to files.
Let’s start by defining the system calls that take 1 to 3 arguments:
In x86-64:
#define syscall1(num, arg1) \
({ \
long _ret; \
register long _num asm("rax") = (num); \
register long _arg1 asm("rdi") = (long)(arg1); \
\
asm volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "r"(_arg1), \
"0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
#define syscall2(num, arg1, arg2) \
({ \
long _ret; \
register long _num asm("rax") = (num); \
register long _arg1 asm("rdi") = (long)(arg1); \
register long _arg2 asm("rsi") = (long)(arg2); \
\
asm volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "r"(_arg1), "r"(_arg2), \
"0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
#define syscall3(num, arg1, arg2, arg3) \
({ \
long _ret; \
register long _num asm("rax") = (num); \
register long _arg1 asm("rdi") = (long)(arg1); \
register long _arg2 asm("rsi") = (long)(arg2); \
register long _arg3 asm("rdx") = (long)(arg3); \
\
asm volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), \
"0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
In ARM64 (aarch64):
#define syscall1(num, arg1) \
({ \
register long _num asm("x8") = (num); \
register long _arg1 asm("x0") = (long)(arg1); \
\
asm volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define syscall2(num, arg1, arg2) \
({ \
register long _num asm("x8") = (num); \
register long _arg1 asm("x0") = (long)(arg1); \
register long _arg2 asm("x1") = (long)(arg2); \
\
asm volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), "r"(_arg2), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define syscall3(num, arg1, arg2, arg3) \
({ \
register long _num asm("x8") = (num); \
register long _arg1 asm("x0") = (long)(arg1); \
register long _arg2 asm("x1") = (long)(arg2); \
register long _arg3 asm("x2") = (long)(arg3); \
\
asm volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
Next, some definitions that the libc functions will use:
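typedef int pid_t;
typedef unsigned int mode_t;
typedef long ssize_t;          // signed and pointer-sized on 64-bit Linux
typedef unsigned long size_t;  // matches the compiler's built-in size_t on 64-bit Linux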
#define STDIN_FILENO 0
#define STDOUT_FILENO 1
#define STDERR_FILENO 2
And flags for calls to open:
For x86-64:
#define O_RDONLY 0
#define O_WRONLY 1
#define O_RDWR 2
#define O_CREAT 0x40
#define O_EXCL 0x80
#define O_NOCTTY 0x100
#define O_TRUNC 0x200
#define O_APPEND 0x400
#define O_NONBLOCK 0x800
#define O_DIRECTORY 0x10000
For Arm64 (aarch64):
#define O_RDONLY 0
#define O_WRONLY 1
#define O_RDWR 2
#define O_CREAT 0x40
#define O_EXCL 0x80
#define O_NOCTTY 0x100
#define O_TRUNC 0x200
#define O_APPEND 0x400
#define O_NONBLOCK 0x800
#define O_DIRECTORY 0x4000
Finally, let’s define the functions we’ll use:
int close(int fd) {
    return syscall1(__NR_close, fd);
}
int fsync(int fd) {
    return syscall1(__NR_fsync, fd);
}
ssize_t read(int fd, void *buf, size_t count) {
    return syscall3(__NR_read, fd, buf, count);
}
// note: aarch64 has no __NR_open (only __NR_openat), so this wrapper is x86-64 only
int open(const char *path, int flags, mode_t mode) {
    return syscall3(__NR_open, path, flags, mode);
}
ssize_t write(int fd, const void *buf, size_t count) {
    return syscall3(__NR_write, fd, buf, count);
}
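One portability catch: aarch64 does not provide an open system call at all, only openat, which takes a directory file descriptor as an extra first argument. nolibc handles this by calling openat with AT_FDCWD. The sketch below follows that idea as a hosted-friendly function instead of a macro; raw_syscall4 and my_open are made-up names, the value -100 for AT_FDCWD is the Linux constant, and only x86-64 and aarch64 are covered.

```c
#include <asm/unistd.h>   /* __NR_openat */

#ifndef AT_FDCWD
#define AT_FDCWD (-100)   /* Linux value: resolve relative paths from the cwd */
#endif

/* Hypothetical four-argument raw system call, following the syscall3 pattern. */
static long raw_syscall4(long num, long a1, long a2, long a3, long a4) {
#if defined(__x86_64__)
    long ret;
    register long r_a4 __asm__("r10") = a4;   /* 4th argument goes in r10 */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"(num), "D"(a1), "S"(a2), "d"(a3), "r"(r_a4)
                      : "rcx", "r11", "memory", "cc");
    return ret;
#elif defined(__aarch64__)
    register long x8 __asm__("x8") = num;     /* system call number */
    register long x0 __asm__("x0") = a1;      /* arg1, also the return value */
    register long x1 __asm__("x1") = a2;
    register long x2 __asm__("x2") = a3;
    register long x3 __asm__("x3") = a4;
    __asm__ volatile ("svc #0"
                      : "+r"(x0)
                      : "r"(x1), "r"(x2), "r"(x3), "r"(x8)
                      : "memory", "cc");
    return x0;
#else
#error "this sketch only covers x86-64 and aarch64"
#endif
}

/* open() expressed through openat(), which exists on both architectures. */
int my_open(const char *path, int flags, int mode) {
    return (int)raw_syscall4(__NR_openat, AT_FDCWD, (long)path, flags, mode);
}
```

With AT_FDCWD as the directory argument, openat resolves relative paths exactly like open, so the same wrapper works on both architectures.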
And a helper function, strlen:
size_t strlen(const char *str) {
    size_t len;
    for (len = 0; str[len]; len++)
        asm(""); // empty asm keeps the compiler from turning this loop into a call to strlen
    return len;
}
Finally, we can start writing a main function that uses this code:
int main() {
    const char* text = "hello world\n"; // text to write
    write(STDOUT_FILENO, text, strlen(text)); // write text to stdout
    fsync(STDOUT_FILENO); // flush stdout
    const char* file_text = "Hello from file"; // text to write to file
    int fd = open("hello.txt", O_CREAT | O_TRUNC | O_RDWR, 0666); // open, truncate, create file hello.txt
    write(fd, file_text, strlen(file_text)); // write the file text to file
    fsync(fd); // flush hello.txt
    close(fd); // close hello.txt
    fd = open("hello.txt", O_RDONLY, 0666); // open the file hello.txt for reading
    char read_from_file[strlen(file_text) + 2]; // the buffer to read into, with room for '\n' and '\0'
    read(fd, (void *)read_from_file, strlen(file_text)); // read from hello.txt to the buffer
    read_from_file[strlen(file_text)] = '\n'; // add a new line to the buffer
    read_from_file[strlen(file_text) + 1] = '\0'; // terminate the string so strlen stays in bounds
    write(STDOUT_FILENO, read_from_file, strlen(read_from_file)); // write the buffer to stdout
    fsync(STDOUT_FILENO); // flush stdout
    close(fd); // close hello.txt
    return 0;
}
After compiling it as above and running it, you should see the following text:
hello world
Hello from file
With hello.txt containing Hello from file.
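One thing these wrappers gloss over is failure. On Linux a raw system call reports an error by returning a value between -4095 and -1, the negated error code, whereas a full libc returns -1 and stores the code in errno. A minimal sketch of that conversion, using the made-up names my_errno and syscall_result:

```c
/* Hypothetical global standing in for libc's errno. */
int my_errno;

/* Linux reports failure as a value in [-4095, -1]: the negated error code. */
long syscall_result(long raw) {
    if (raw < 0 && raw >= -4095) {
        my_errno = (int)-raw;   /* e.g. -2 becomes 2, which is ENOENT */
        return -1;
    }
    return raw;                 /* success: pass the value through unchanged */
}
```

A wrapper could then end with return syscall_result(syscall3(...)); and callers would check for -1 as they do with a regular libc.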
Writing some simple libc code isn’t that hard!