Writing Portable C
In my free time I’ve been hacking away on unix-utils, a project that implements some common Unix utilities. To test how portable C really is for the average programmer, I decided to lean on open source to see how many platforms I could target with my C. I wanted to write the most portable code I could, which meant using only POSIX functions and a minimalistic libc that tries its best to adhere to the standards (musl). This let me write C code for about 50 targets. Let’s go deeper into the background and how that was all possible:
POSIX and SUS
In the 70s, C was made at AT&T. In the 80s, it escaped, becoming more popular, before eventually becoming standardized by ANSI (C89). But what about standards for operating systems? Enter POSIX (the Portable Operating System Interface), a standard for operating system interfaces. In 1988, the first version of POSIX was released by the IEEE, which detailed common interfaces, like signals, pipes, and the C standard library.
The POSIX standards were fairly minimalistic in the 80s, gaining some extensions (like real-time programming and threads) in the early 90s before being subsumed by the Austin Group, a committee that designed the Single Unix Specification. The Austin Group has steered the POSIX standards since 1997, creating standards like POSIX 1997/SUS v2, POSIX 2001/SUS v3, and POSIX 2008/SUS v4.
Since 2008, there have been two minor corrections to the POSIX standards (one in 2011 and one in 2017), but the two most common POSIX standards in use are POSIX 2001 and 2008, which is where we’ll be directing most of our attention.
POSIX compliance in particular ends up being extremely important, because most operating systems have at least some level of POSIX compliance. Linux, the BSDs, macOS, and Windows all do to some extent. That means that our C code can target all of them by following the standards, which makes our code more flexible.
This is so important that GCC (the GNU Compiler Collection) ended up implementing a flag that checks for strict compliance with the POSIX standard of your choice.
In my Makefile, I have this line, which says to compile my code strictly according to the standard.
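The Makefile line itself isn’t reproduced here, so the following is only a rough sketch of how strictness is usually requested, not the project’s actual setup. It assumes compiler flags along the lines of -std=c99 -pedantic, plus the _POSIX_C_SOURCE feature-test macro defined before any header is included. The two helper functions are hypothetical names used for illustration.

```c
/* Ask the headers for POSIX 2008 and nothing beyond it.
   This must come before the first #include to take effect. */
#define _POSIX_C_SOURCE 200809L

#include <unistd.h>

/* Hypothetical helper: the POSIX revision this file asked for. */
long requested_posix_version(void) {
    return _POSIX_C_SOURCE;
}

/* Hypothetical helper: the POSIX revision the system headers claim
   to implement (_POSIX_VERSION comes from <unistd.h>). */
long system_posix_version(void) {
    return _POSIX_VERSION;
}
```

With the macro in place, a libc that honors feature-test macros hides declarations outside the requested standard, so non-POSIX calls tend to show up as compile errors rather than as surprises on another platform.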
Since I wrote the first draft of my utilities on a Mac OS computer with no POSIX compatibility flags, you can imagine there was a lot of breakage. As to why there was so much breakage, well, that requires another history lesson.
glibc vs musl
In the 80s, the Free Software Foundation (FSF) wanted to create the ideal “Free” programming environment. To do so, they were going to start from the top down, by implementing the user space (a C compiler, a shell, the POSIX shell utilities, etc.), and then build an OS kernel (GNU Hurd). GNU succeeded at one part of their mission, by providing the most common userspace tools to date (GCC, glibc, Bash, and the GNU utils). However, GNU’s kernel lost out to Linux, and the rest is history.
Linux started out supporting only GNU’s tools for its userspace, including GNU’s C standard library (glibc), but now it can use a wide variety of C standard libraries (libc for short). One of those is musl, the standard library of this article.
The choice of standard library would end up being entirely inconsequential if not for one detail: musl fully supports static linking, and glibc does not (a statically linked glibc program can still need the shared libraries at run time for things like name lookups).
Sure, glibc supports a lot of non-standard extensions, and sure, glibc executables are more bloated than their musl counterparts, but static linking lets us run our binaries without a libc installed on the platform.
That means our code can reach even more users!
Much blood has been spilt on static vs dynamic linking, so I will spare you the carnage by simply saying that static linking tends to be more convenient for the end user (they need fewer dependencies on their side to run the code), which is good for us, the application builders.
What Sacrifices were made?
How do you make static binaries?
Going back to building some Unix utilities, I downloaded a musl-gcc compiler, logged into my Linux VM and started compiling.
The first issue I ran into was that musl-gcc didn’t compile static binaries.
I added the flag -static to my build, but file and ldd ended up telling me that my binary was still dynamically linked.
I dug through troves of documentation. Eventually, I discovered that it wasn’t enough just to provide the -static flag, because GCC can ignore it. You have to provide another flag, --static, as well. Oh, and if that wasn’t enough, that still didn’t compile static binaries. You had to disable pie, or position independent executables, with the flag -no-pie as well.
Finally, I had compiled a hello world binary statically. Time to move on!
Don’t name your functions _init
I then tried to compile my utilities. I wanted to decrease duplication so I wrote a header file with a function called _init. This ended up causing a duplicate symbol error (musl defines this function in crti.o first).
Of course, GCC never complained, so I had to rename this function.
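One way to sidestep this kind of clash is to avoid leading underscores altogether, since the C standard reserves those names at file scope for the implementation. Here is a minimal sketch of a shared header helper with a project prefix instead. The uu_ prefix and both function names are made up for illustration and are not the names used in unix-utils.

```c
#include <stddef.h>

/* Hypothetical shared state for the utilities; "uu_" is an
   illustrative prefix, chosen only to stay out of the libc's namespace. */
static const char *uu_progname = NULL;

/* Prefixed replacement for a function that used to be called _init.
   It just records the program name for later error messages. */
static inline void uu_init(const char *progname) {
    uu_progname = progname;
}

/* Returns the recorded program name, or "unknown" before uu_init runs. */
static inline const char *uu_name(void) {
    return uu_progname != NULL ? uu_progname : "unknown";
}
```

Marking the functions static inline keeps each utility’s copy private to its own translation unit, so the linker never sees a second global symbol to complain about.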
getopt_long doesn’t exist
Next up, getopt_long (Get options with long flags) isn’t POSIX standard. Unshocking. POSIX only specifies the normal getopt, which supports short options only. Long options like --file or --color are a GNUism.
I ended up finding a copy of getopt_long online and rewriting my header file includes for my utilities.
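For comparison, here is roughly what the POSIX-only route looks like with plain getopt. This is a sketch, not code from unix-utils: the option set (-a and -f FILE), the struct, and the function name are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical options for illustration: -a is a flag, -f takes a value. */
struct opts {
    bool all;
    const char *file;
};

/* Parses short options only, which is all POSIX getopt offers.
   Returns 0 on success, -1 on an unknown option or a missing argument. */
int parse_opts(int argc, char *argv[], struct opts *out) {
    int c;

    out->all = false;
    out->file = NULL;
    optind = 1; /* restart scanning so this can be called more than once */
    opterr = 0; /* keep getopt from printing its own error messages */

    /* "af:" means: -a takes no argument, -f requires one. */
    while ((c = getopt(argc, argv, "af:")) != -1) {
        switch (c) {
        case 'a':
            out->all = true;
            break;
        case 'f':
            out->file = optarg;
            break;
        default: /* getopt returned '?' */
            return -1;
        }
    }
    return 0;
}
```

Anything spelled with two dashes, such as --file, has no place in the option string above, which is why long options need getopt_long or a bundled replacement.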
Sysctl isn’t standard
Next up, I had a compiler error where my implementation of uptime failed to compile. <sys/sysctl.h> is a Macism, and not part of POSIX. Linux offers it up in <linux/sysctl.h> for convenience, but as its name might indicate, it’s not portable.
Next!
lstat has optional fields
In my implementation of stat, I used the functions major, minor and ctime, none of which are POSIX compliant. They’re useful on macOS, so I can gate them behind an __APPLE__ macro, but that makes the code less succinct. Oh well.
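The gating can be sketched like this. It is a minimal example under my own assumptions: format_device is a hypothetical helper, and printing the raw st_dev value on other platforms is just one possible fallback, not necessarily what the real stat utility does.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: writes a printable form of the file's device
   into buf. Returns what snprintf returns (the length it wanted to write). */
int format_device(const struct stat *st, char *buf, size_t len) {
#ifdef __APPLE__
    /* major() and minor() are not in POSIX, so only use them
       on a platform where we know they exist. */
    return snprintf(buf, len, "%d,%d",
                    (int)major(st->st_dev), (int)minor(st->st_dev));
#else
    /* Portable fallback: st_dev is a POSIX field, print it as a number. */
    return snprintf(buf, len, "%llu", (unsigned long long)st->st_dev);
#endif
}
```

Keeping the #ifdef inside one small function at least confines the loss of succinctness to a single place instead of spreading it through the utility.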
NI_MAXHOST isn’t defined
As an oddity, musl doesn’t define NI_MAXHOST at all. This is useful for dig, which returns the IP address for a given hostname. I ended up defining it if it wasn’t already defined.
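A sketch of that fallback, together with the kind of lookup it gets used for. The value 1025 is an assumption that mirrors what glibc uses, and first_address is a hypothetical helper, not the actual dig implementation.

```c
#define _POSIX_C_SOURCE 200809L

#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Fallback for libcs that don't expose NI_MAXHOST under strict POSIX.
   1025 is an assumed value, chosen to match glibc. */
#ifndef NI_MAXHOST
#define NI_MAXHOST 1025
#endif

/* Hypothetical helper: resolves name and writes its first address,
   in numeric form, into out. Returns 0 on success, otherwise the
   getaddrinfo/getnameinfo error code. */
int first_address(const char *name, char *out, size_t outlen) {
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* avoid one entry per socket type */

    rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc != 0) {
        return rc;
    }

    /* NI_NUMERICHOST asks for the address as text, with no reverse lookup. */
    rc = getnameinfo(res->ai_addr, res->ai_addrlen,
                     out, (socklen_t)outlen, NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(res);
    return rc;
}
```

A caller would declare a buffer as char host[NI_MAXHOST] and pass it in, which is the only reason the constant is needed at all.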
Getting the Toolchains
With all these changes made, our code will now compile for Linux + musl, thankfully. The next problem was actually getting our toolchains.
Luckily, after some googling, I found out about <musl.cc>, a website which releases versions of musl-gcc toolchains.
Now, since I didn’t want to put undue load on this website, I created a mirror of it: https://github.com/Takashiidobe/muslcc.
Next, I had to create a GitHub Action that would fetch the required compiler, set it up properly, compile all of the binaries, strip the debug information, tar them into one directory, and release them on a push to tags. Phew!
This part turned out to be a lot of guesswork and letting it run, so I’ll leave the final results here:
https://github.com/Takashiidobe/unix-utils/blob/master/.github/workflows/release.yml
And the repo here:
In Short
It’s amazing that you can write code that targets so many architectures, and compile for them easily, all for free, with the power of open source (and Microsoft’s wallet, thanks GitHub Actions).
With this, I was able to build for 52 architectures and release code for them (I ended up adding in support for x86_64 Darwin and arm64 Darwin).
Viva portable code.