Date: 2023-02-16T16:05:35-05:00
Note: This code was taken from Linux’s nolibc: https://github.com/torvalds/linux/tree/master/tools/include/nolibc. Check it out to learn more about implementing libc!
Let’s write some libc functions.
Libc is C’s standard library, which implements a group of functions that can be used by all C programs. Libc provides wrappers for OS-level constructs, like printf, open, puts, and so on.
There are two parts to libc:
- Functions like max or islower, which don’t require any system calls.
- Functions like write, read, or open, which do.
Functions in the first category can be implemented without calling into the OS.
For example, max would look like this:
int max(int a, int b) {
    return a > b ? a : b;
}
or islower:
int islower(int c) {
    return (c >= 'a' && c <= 'z') ? 1 : 0;
}
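Other pure helpers follow the same pattern. As a minimal sketch (the my_ prefix is a made-up naming choice to avoid clashing with a real libc, and ASCII is assumed), here are two more functions that need no help from the OS:

```c
// Hypothetical names: the my_ prefix avoids colliding with the real libc versions.

// Returns 1 if c is an ASCII decimal digit, 0 otherwise.
int my_isdigit(int c) {
    return (c >= '0' && c <= '9') ? 1 : 0;
}

// Returns the uppercase form of an ASCII lowercase letter, or c unchanged.
int my_toupper(int c) {
    return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}
```

Both are plain arithmetic on the character value, which is why they fall in the first category.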
However, when writing our own write, read, or open, we hit a roadblock:
int open(const char* path, int flags, ...) {
// how do I open a file???
}
Open is a system call that needs to manipulate hardware; we need to ask the OS to do the action for us before being able to read and/or write to the file.
The OS supports an interface to facilitate that, called system calls. These system calls allow us to request the OS to do something on our behalf. These calls normally manipulate hardware in some fashion, or have to do with processes.
So open might look like this:
int open(const char* path, int flags, ...) {
return system_call(SYSCALL_OPEN, path, flags, ...);
}
And we defer to the OS, which takes care of everything for us.
That leaves the question: what should our system_call function look like? And what is SYSCALL_OPEN?
System calls take as their first argument a number, which indicates what system call the OS should execute. The OS reads that number, looks up which system call it corresponds to, and executes that system call with the remaining arguments.
Let’s say that we call open, which is the number 2:
#define SYSCALL_OPEN 2
system_call(SYSCALL_OPEN, path, flags, ...);
The OS will then take that number and run the desired code.
#define SYSCALL_OPEN 2
int system_call(SYSCALL syscall, ...) {
switch (syscall) {
case SYSCALL_OPEN:
// run code to open up a file in the hardware
break;
default:
break;
}
}
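To see the "number first, arguments after" convention from user space, here is a small sketch that assumes a Linux system with glibc (or another libc that provides the generic syscall() wrapper and the SYS_getpid constant). The function name pid_via_syscall_number is made up for this example:

```c
#include <sys/syscall.h>  // SYS_getpid: the system call number for getpid
#include <unistd.h>       // syscall(): libc's generic "call by number" wrapper

// Hypothetical helper: ask the OS for our pid by passing only a number.
long pid_via_syscall_number(void) {
    return syscall(SYS_getpid);
}
```

The only thing the OS receives here is the number, and it picks the code to run based on it.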
We now need a correct list of system calls. Imagine if we thought SYSCALL_OPEN was 3, but the OS thought 3 means close. The computer could crash, our process could crash, anything could happen:
#define SYSCALL_OPEN 3
system_call(SYSCALL_OPEN, path, flags, ...);
#define SYSCALL_OPEN 2
#define SYSCALL_CLOSE 3
int system_call(SYSCALL syscall, ...) {
switch (syscall) {
case SYSCALL_OPEN:
// run code to open up a file in the hardware
break;
case SYSCALL_CLOSE:
// run code to close a file
// oops, we called the wrong function
break;
default:
break;
}
}
So we need to get that list of system calls. You can find the system calls in your Linux system with ausyscall:
$ ausyscall --dump
Using x86_64 syscall table:
0 read
1 write
2 open
3 close
...
This differs per architecture. For example, on aarch64 (ARM 64 bit):
$ ausyscall aarch64 --dump
Using aarch64 syscall table:
0 io_setup
1 io_destroy
2 io_submit
3 io_cancel
We could define these ourselves, or rely on the system to export them at asm/unistd.h. I’m going to include it instead of rewriting it.
#include <asm/unistd.h>
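As a rough illustration of what that header gives us, the sketch below builds a tiny name-to-number table from the __NR_* macros it exports. The struct, table, and function names are invented for this example; only the __NR_* macros come from the header:

```c
#include <asm/unistd.h>  // exports __NR_read, __NR_write, ... for this architecture
#include <string.h>      // strcmp, used only by this demo lookup

// Hypothetical demo types and names, not part of any real libc.
struct demo_syscall_entry {
    const char *name;
    long number;
};

static const struct demo_syscall_entry demo_table[] = {
    {"read", __NR_read},
    {"write", __NR_write},
    {"close", __NR_close},
    {"getpid", __NR_getpid},
};

// Returns the system call number for name on this architecture, or -1 if unknown.
long demo_syscall_number(const char *name) {
    for (unsigned long i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
        if (strcmp(demo_table[i].name, name) == 0)
            return demo_table[i].number;
    }
    return -1;
}
```

Because the numbers come from the header, the same source picks up the right values on x86-64 and on aarch64 without any edits.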
Now that we have the system call number we need, let’s write that syscall!
We need to know a few things:
- The instruction that triggers a system call
- The register that holds the system call number
- The registers that hold the return value and any error
- The registers that hold the arguments (open needs a file to open, for example)
What better way to learn than through the manual pages? Run man 2 syscall in the shell to learn more.
First, there’s a table whose Instruction column tells us which instruction makes a system call. For x86-64, it is syscall. Next, the table tells us which register holds the system call number. In x86-64, it is rax. Finally, it lists the registers to check for return values, which in x86-64 are rax and rdx, and the register to check for errors (in x86-64, no separate register stores errors after a system call).
Arch/ABI    Instruction           System  Ret  Ret   Error    Notes
                                  call #  val  val2
───────────────────────────────────────────────────────────────────
alpha       callsys               v0      v0   a4    a3       1, 6
arc         trap0                 r8      r0   -     -
arm/OABI    swi NR                -       r0   -     -        2
arm/EABI    swi 0x0               r7      r0   r1    -
arm64       svc #0                w8      x0   x1    -
blackfin    excpt 0x0             P0      R0   -     -
i386        int $0x80             eax     eax  edx   -
ia64        break 0x100000        r15     r8   r9    r10      1, 6
m68k        trap #0               d0      d0   -     -
microblaze  brki r14,8            r12     r3   -     -
mips        syscall               v0      v0   v1    a3       1, 6
nios2       trap                  r2      r2   -     r7
parisc      ble 0x100(%sr2, %r0)  r20     r28  -     -
powerpc     sc                    r0      r3   -     r0       1
powerpc64   sc                    r0      r3   -     cr0.SO   1
riscv       ecall                 a7      a0   a1    -
s390        svc 0                 r1      r2   r3    -        3
s390x       svc 0                 r1      r2   r3    -        3
superh      trapa #31             r3      r0   r1    -        4, 6
sparc/32    t 0x10                g1      o0   o1    psr/csr  1, 6
sparc/64    t 0x6d                g1      o0   o1    psr/csr  1, 6
tile        swint1                R10     R00  -     R01      1
x86-64      syscall               rax     rax  rdx   -        5
x32         syscall               rax     rax  rdx   -        5
xtensa      syscall               a2      a2   -     -
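The dash in the Error column for x86-64 does not mean system calls cannot fail there. On Linux, a failing call hands back a negative error number in the return register instead. A minimal sketch of turning that raw value into the familiar "-1 plus error code" shape follows; the function name and the err out-parameter are assumptions made for this example, not something defined earlier:

```c
// Assumption: Linux reports failure by returning a value in [-4095, -1],
// which is the negated error number.
// Hypothetical helper: returns -1 and stores the error number in *err on
// failure, or returns the raw value and stores 0 in *err on success.
long decode_syscall_result(long raw, int *err) {
    if (raw < 0 && raw >= -4095) {
        *err = (int)-raw;
        return -1;
    }
    *err = 0;
    return raw;
}
```

The wrappers in this post skip this step and return the raw value directly, so a negative result from them should be read as a negated error number.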
Later down the page, there’s another table that shows where arguments go.
For x86-64, rdi, rsi, rdx, r10, r8, and r9 are the registers that hold the arguments, in that order, with rax being the system call number.
An interesting thing to note: mips/o32 here only supports 4 arguments in registers. That doesn’t mean it only supports system calls with 4 or fewer arguments: arguments 5 through 8 are placed on the stack and read when the system call instruction is executed.
Arch/ABI      arg1  arg2  arg3  arg4  arg5  arg6  arg7  Notes
──────────────────────────────────────────────────────────────
alpha         a0    a1    a2    a3    a4    a5    -
arc           r0    r1    r2    r3    r4    r5    -
arm/OABI      r0    r1    r2    r3    r4    r5    r6
arm/EABI      r0    r1    r2    r3    r4    r5    r6
arm64         x0    x1    x2    x3    x4    x5    -
blackfin      R0    R1    R2    R3    R4    R5    -
i386          ebx   ecx   edx   esi   edi   ebp   -
ia64          out0  out1  out2  out3  out4  out5  -
m68k          d1    d2    d3    d4    d5    a0    -
microblaze    r5    r6    r7    r8    r9    r10   -
mips/o32      a0    a1    a2    a3    -     -     -     1
mips/n32,64   a0    a1    a2    a3    a4    a5    -
nios2         r4    r5    r6    r7    r8    r9    -
parisc        r26   r25   r24   r23   r22   r21   -
powerpc       r3    r4    r5    r6    r7    r8    r9
powerpc64     r3    r4    r5    r6    r7    r8    -
riscv         a0    a1    a2    a3    a4    a5    -
s390          r2    r3    r4    r5    r6    r7    -
s390x         r2    r3    r4    r5    r6    r7    -
superh        r4    r5    r6    r7    r0    r1    r2
sparc/32      o0    o1    o2    o3    o4    o5    -
sparc/64      o0    o1    o2    o3    o4    o5    -
tile          R00   R01   R02   R03   R04   R05   -
x86-64        rdi   rsi   rdx   r10   r8    r9    -
x32           rdi   rsi   rdx   r10   r8    r9    -
xtensa        a6    a3    a4    a5    a8    a9    -
With all that information out of the way, we want to write a function that does the following:
- Executes the system call instruction (syscall for x86).
- Sets the rax register to the system call we want to call.
On x86, the syscall instruction clobbers the registers rcx and r11, and our inline assembly also declares cc and memory as clobbered, so all four go in the last line of the asm statement. That last line is the clobber list: it tells the compiler which registers the OS may overwrite, along with other side effects.
Here is why rcx and r11 are clobbered:
rcx
rcx is clobbered to store the address of the next instruction to return to.
r11
r11 is clobbered to store the value of the rflags register.
And for cc and memory, from https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Extended-Asm:
cc
The “cc” clobber indicates that the assembler code modifies the flags register. On some machines, GCC represents the condition codes as a specific hardware register; “cc” serves to name this register. On other machines, condition code handling is different, and specifying “cc” has no effect. But it is valid no matter what the target.
memory
The “memory” clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the “memory” clobber effectively forms a read/write memory barrier for the compiler.
Note that this clobber does not prevent the processor from doing speculative reads past the asm statement. To prevent that, you need processor-specific fence instructions.
So on x86, a system call will look like:
#define syscall0(num) \
({ \
long _ret; \
register long _num asm("rax") = (num); \
\
asm volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
For the Arm64 (aarch64) version:
#define syscall0(num) \
({ \
register long _num asm("x8") = (num); \
register long _arg1 asm("x0"); \
\
asm volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
Let’s write our first libc function, getpid.
getpid returns the pid of the current process.
It has a signature of pid_t getpid(void);, where pid_t is int.
All we have to do is to make the right system call to the OS and return it.
typedef int pid_t;
pid_t getpid(void) {
    return syscall0(__NR_getpid);
}
This should return your pid, and we’re done creating a libc function that calls into the kernel.
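One way to sanity-check the idea before dropping libc entirely is to issue the raw instruction from an ordinary program and compare the result with libc's own getpid. The sketch below assumes Linux with GCC or Clang; raw_getpid is a made-up name, and on architectures other than x86-64 and aarch64 it simply falls back to libc's syscall():

```c
#include <sys/syscall.h>  // SYS_getpid
#include <unistd.h>       // libc getpid() and syscall(), for comparison and fallback

// Hypothetical helper: getpid through a raw system call instruction.
long raw_getpid(void) {
#if defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                      // result comes back in rax
                      : "0"((long)SYS_getpid)          // system call number goes in rax
                      : "rcx", "r11", "memory", "cc"); // clobbered by syscall
    return ret;
#elif defined(__aarch64__)
    register long num __asm__("x8") = SYS_getpid;      // system call number in x8
    register long ret __asm__("x0");                   // result comes back in x0
    __asm__ volatile ("svc #0" : "=r"(ret) : "r"(num) : "memory", "cc");
    return ret;
#else
    return syscall(SYS_getpid);                        // let libc issue the call
#endif
}
```

Calling raw_getpid() and getpid() in the same process should give the same value.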
To put it all together, we need to write the _start function of our program, since we are going to link to our own libc.
For x86, that means adding this code to the top of your file:
asm(".section .text\n"
".weak _start\n"
".global _start\n"
"_start:\n"
"pop %rdi\n" // argc (first arg, %rdi)
"mov %rsp, %rsi\n" // argv[] (second arg, %rsi)
"lea 8(%rsi,%rdi,8),%rdx\n" // then a NULL then envp (third arg, %rdx)
"xor %ebp, %ebp\n" // zero the stack frame
"and $-16, %rsp\n" // x86 ABI : esp must be 16-byte aligned before call
"call main\n" // main() returns the status code, we'll exit with it.
"mov %eax, %edi\n" // retrieve exit code (32 bit)
"mov $60, %eax\n" // NR_exit == 60
"syscall\n" // really exit
"hlt\n" // ensure it does not return
"");
This sets up everything main needs to run.
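The assembly relies on the layout of the stack at process entry: argc sits at the top, followed by the argv pointers, a NULL, and then the envp pointers. A rough C rendering of the same pointer arithmetic is sketched below; the struct and function names are invented for illustration, and the fake stack in the usage note stands in for the real one:

```c
#include <stddef.h>  // NULL

// Hypothetical container for what _start hands to main.
struct entry_args {
    long argc;
    char **argv;
    char **envp;
};

// sp points at the initial stack: [argc][argv0]...[argvN-1][NULL][env0]...[NULL]
struct entry_args parse_initial_stack(void **sp) {
    struct entry_args args;
    args.argc = (long)sp[0];               // "pop %rdi": argc is the first slot
    args.argv = (char **)&sp[1];           // "mov %rsp, %rsi": argv starts right after it
    args.envp = args.argv + args.argc + 1; // "lea 8(%rsi,%rdi,8)": skip argv and its NULL
    return args;
}
```

For a fake stack holding `{2, "prog", "x", NULL, "HOME=/", NULL}`, this gives argc 2, argv[1] pointing at "x", and envp[0] pointing at "HOME=/".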
For ARM64 (aarch64):
asm(".section .text\n"
".weak _start\n"
".global _start\n"
"_start:\n"
"ldr x0, [sp]\n" // argc (x0) was in the stack
"add x1, sp, 8\n" // argv (x1) = sp
"lsl x2, x0, 3\n" // envp (x2) = 8*argc ...
"add x2, x2, 8\n" // + 8 (skip null)
"add x2, x2, x1\n" // + argv
"and sp, x1, -16\n" // sp must be 16-byte aligned in the callee
"bl main\n" // main() returns the status code, we'll exit with it.
"mov x8, 93\n" // NR_exit == 93
"svc #0\n"
"");
And aggregating that together:
For x86:
#include <asm/unistd.h>
#define syscall0(num) \
({ \
long _ret; \
register long _num asm("rax") = (num); \
\
asm volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
asm(".section .text\n"
".weak _start\n"
".global _start\n"
"_start:\n"
"pop %rdi\n" // argc (first arg, %rdi)
"mov %rsp, %rsi\n" // argv[] (second arg, %rsi)
"lea 8(%rsi,%rdi,8),%rdx\n" // then a NULL then envp (third arg, %rdx)
"xor %ebp, %ebp\n" // zero the stack frame
"and $-16, %rsp\n" // x86 ABI : esp must be 16-byte aligned before call
"call main\n" // main() returns the status code, we'll exit with it.
"mov %eax, %edi\n" // retrieve exit code (32 bit)
"mov $60, %eax\n" // NR_exit == 60
"syscall\n" // really exit
"hlt\n" // ensure it does not return
"");
typedef int pid_t;
pid_t getpid(void) {
return syscall0(__NR_getpid);
}
int main() {
return getpid();
}
For ARM64 (aarch64):
#include <asm/unistd.h>
#define syscall0(num) \
({ \
register long _num asm("x8") = (num); \
register long _arg1 asm("x0"); \
\
asm volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
asm(".section .text\n"
".weak _start\n"
".global _start\n"
"_start:\n"
"ldr x0, [sp]\n" // argc (x0) was in the stack
"add x1, sp, 8\n" // argv (x1) = sp
"lsl x2, x0, 3\n" // envp (x2) = 8*argc ...
"add x2, x2, 8\n" // + 8 (skip null)
"add x2, x2, x1\n" // + argv
"and sp, x1, -16\n" // sp must be 16-byte aligned in the callee
"bl main\n" // main() returns the status code, we'll exit with it.
"mov x8, 93\n" // NR_exit == 93
"svc #0\n"
"");
typedef int pid_t;
pid_t getpid(void) {
return syscall0(__NR_getpid);
}
int main() {
return getpid();
}
To compile this program without linking to libc, assuming the C file is called main.c, run:
$ gcc -static -lgcc -nostdlib -g main.c -o main
Then run it:
$ ./main
Grab the exit status of the binary:
$ echo $? # some number: the low 8 bits of the pid
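The number the shell prints is not the full pid: an exit status only keeps the low 8 bits of the value returned from main. A tiny sketch of that truncation follows (the function name is made up for this example):

```c
// Hypothetical helper: the exit status a shell reports for a given
// return value from main, which keeps only the low 8 bits.
int exit_status_for(int main_return_value) {
    return main_return_value & 0xff;
}
```

So a process whose pid is 12345 would report 57, since 12345 is 0x3039 and the low byte 0x39 is 57.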
And we’re done with implementing getpid!
getpid is a fine starting function, but we want to be able to read and write to files.
Let’s start by defining the system calls that take 1 argument to 3 arguments:
In x86-64:
#define syscall1(num, arg1) \
({ \
long _ret; \
register long _num asm("rax") = (num); \
register long _arg1 asm("rdi") = (long)(arg1); \
\
asm volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "r"(_arg1), \
"0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
#define syscall2(num, arg1, arg2) \
({ \
long _ret; \
register long _num asm("rax") = (num); \
register long _arg1 asm("rdi") = (long)(arg1); \
register long _arg2 asm("rsi") = (long)(arg2); \
\
asm volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "r"(_arg1), "r"(_arg2), \
"0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
#define syscall3(num, arg1, arg2, arg3) \
({ \
long _ret; \
register long _num asm("rax") = (num); \
register long _arg1 asm("rdi") = (long)(arg1); \
register long _arg2 asm("rsi") = (long)(arg2); \
register long _arg3 asm("rdx") = (long)(arg3); \
\
asm volatile ( \
"syscall\n" \
: "=a"(_ret) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), \
"0"(_num) \
: "rcx", "r11", "memory", "cc" \
); \
_ret; \
})
In ARM64 (aarch64):
#define syscall1(num, arg1) \
({ \
register long _num asm("x8") = (num); \
register long _arg1 asm("x0") = (long)(arg1); \
\
asm volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define syscall2(num, arg1, arg2) \
({ \
register long _num asm("x8") = (num); \
register long _arg1 asm("x0") = (long)(arg1); \
register long _arg2 asm("x1") = (long)(arg2); \
\
asm volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), "r"(_arg2), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
#define syscall3(num, arg1, arg2, arg3) \
({ \
register long _num asm("x8") = (num); \
register long _arg1 asm("x0") = (long)(arg1); \
register long _arg2 asm("x1") = (long)(arg2); \
register long _arg3 asm("x2") = (long)(arg3); \
\
asm volatile ( \
"svc #0\n" \
: "=r"(_arg1) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), \
"r"(_num) \
: "memory", "cc" \
); \
_arg1; \
})
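To check that the three-argument register setup behaves as expected, the same idea can be wrapped in an ordinary function and exercised against a pipe in a normal, libc-linked program. This is a sketch under assumptions: raw_syscall3 is an invented name, Linux with GCC or Clang is assumed, and architectures other than x86-64 and aarch64 fall back to libc's syscall():

```c
#include <sys/syscall.h>  // SYS_write
#include <unistd.h>       // pipe(), read(), syscall(): libc, used for testing and fallback

// Hypothetical helper: a raw system call with three arguments.
long raw_syscall3(long num, long a1, long a2, long a3) {
#if defined(__x86_64__)
    long ret;
    register long r_a1 __asm__("rdi") = a1;
    register long r_a2 __asm__("rsi") = a2;
    register long r_a3 __asm__("rdx") = a3;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"(num), "r"(r_a1), "r"(r_a2), "r"(r_a3)
                      : "rcx", "r11", "memory", "cc");
    return ret;
#elif defined(__aarch64__)
    register long r_num __asm__("x8") = num;
    register long r_a1 __asm__("x0") = a1;   // x0 is both first argument and result
    register long r_a2 __asm__("x1") = a2;
    register long r_a3 __asm__("x2") = a3;
    __asm__ volatile ("svc #0"
                      : "+r"(r_a1)
                      : "r"(r_a2), "r"(r_a3), "r"(r_num)
                      : "memory", "cc");
    return r_a1;
#else
    return syscall(num, a1, a2, a3);         // let libc issue the call
#endif
}
```

Writing two bytes into the write end of a pipe with `raw_syscall3(SYS_write, fd, (long)"hi", 2)` and reading them back with libc's read confirms the arguments landed in the right registers.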
Next, some definitions that the libc functions will use:
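typedef int pid_t;
typedef int mode_t;
typedef long ssize_t; // signed and register-sized, so byte counts and negative error codes are not truncated
typedef unsigned long size_t; // same type as the compiler's own size_t on 64-bit Linux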
#define STDIN_FILENO 0
#define STDOUT_FILENO 1
#define STDERR_FILENO 2
And flags for calls to open:
For x86-64:
#define O_RDONLY 0
#define O_WRONLY 1
#define O_RDWR 2
#define O_CREAT 0x40
#define O_EXCL 0x80
#define O_NOCTTY 0x100
#define O_TRUNC 0x200
#define O_APPEND 0x400
#define O_NONBLOCK 0x800
#define O_DIRECTORY 0x10000
For Arm64 (aarch64):
#define O_RDONLY 0
#define O_WRONLY 1
#define O_RDWR 2
#define O_CREAT 0x40
#define O_EXCL 0x80
#define O_NOCTTY 0x100
#define O_TRUNC 0x200
#define O_APPEND 0x400
#define O_NONBLOCK 0x800
#define O_DIRECTORY 0x4000
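These flags are meant to be combined with bitwise OR into the single flags argument of open. A minimal sketch follows, using DEMO_-prefixed copies of three of the values above so it compiles without clashing with a real fcntl.h; the names and helper functions are made up for this example:

```c
// Hypothetical copies of O_RDWR, O_CREAT and O_TRUNC from the lists above.
enum {
    DEMO_O_RDWR = 2,
    DEMO_O_CREAT = 0x40,
    DEMO_O_TRUNC = 0x200
};

// The combination "create if missing, truncate, open for reading and writing".
int demo_open_flags(void) {
    return DEMO_O_CREAT | DEMO_O_TRUNC | DEMO_O_RDWR;
}

// Returns 1 if every bit of flag is set in flags, 0 otherwise.
int demo_has_flag(int flags, int flag) {
    return (flags & flag) == flag;
}
```

The combined value here is 0x242, and each option can be tested for individually with a bitwise AND.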
Finally, let’s define the functions we’ll use:
int close(int fd) {
    return syscall1(__NR_close, fd);
}
int fsync(int fd) {
    return syscall1(__NR_fsync, fd);
}
ssize_t read(int fd, void *buf, size_t count) {
    return syscall3(__NR_read, fd, buf, count);
}
// Note: arm64 does not define __NR_open; there, openat (which takes four arguments) has to be used instead.
int open(const char *path, int flags, mode_t mode) {
    return syscall3(__NR_open, path, flags, mode);
}
ssize_t write(int fd, const void *buf, size_t count) {
    return syscall3(__NR_write, fd, buf, count);
}
And a helper function, strlen:
size_t strlen(const char *str) {
    size_t len;
    for (len = 0; str[len]; len++)
        asm(""); // empty asm keeps the compiler from replacing this loop with a call to strlen
    return len;
}
Finally, we can start writing a main function that uses this code:
int main() {
    const char* text = "hello world\n"; // text to write
    write(STDOUT_FILENO, text, strlen(text)); // write text to stdout
    fsync(STDOUT_FILENO); // flush stdout
    const char* file_text = "Hello from file"; // text to write to file
    int fd = open("hello.txt", O_CREAT | O_TRUNC | O_RDWR, 0666); // open, truncate, create file hello.txt
    write(fd, file_text, strlen(file_text)); // write the file text to file
    fsync(fd); // flush hello.txt
    close(fd); // close hello.txt
    fd = open("hello.txt", O_RDONLY, 0666); // open the file hello.txt for reading
    char read_from_file[strlen(file_text) + 1]; // the buffer to read into, plus room for a newline
    read(fd, (void *)read_from_file, strlen(file_text)); // read from hello.txt to the buffer
    read_from_file[strlen(file_text)] = '\n'; // add a new line to the buffer
    // the buffer has no NUL terminator, so pass its length explicitly instead of calling strlen on it
    write(STDOUT_FILENO, read_from_file, strlen(file_text) + 1); // write the buffer to stdout
    fsync(STDOUT_FILENO); // flush stdout
    close(fd); // close hello.txt
    return 0;
}
After compiling and running it as above, you should see the following output:
hello world
Hello from file
With hello.txt containing Hello from file.
Writing some simple libc code isn’t that hard!