Tuesday, September 22, 2015

On Go, Portability, and System Interfaces

I've been noticing more and more lately that we have a plethora of libraries and programs written for Go, which don't work on one platform or another.  The root cause of these is often the use of direct system call coding to system calls such as ioctl().  On some platforms (illumos/solaris!) there is no such system call.

The Problems

But this underscores a far far worse problem, that has become common (mal)-practice in the Go community.  That is, the coding of system calls directly into high level libraries and even application programs.  For example, it isn't uncommon to see something like this (taken from termbox-go):
func tcsetattr(fd uintptr, termios *syscall_Termios) error {
        r, _, e := syscall.Syscall(syscall.SYS_IOCTL,
                fd, uintptr(syscall_TCSETS), uintptr(unsafe.Pointer(termios)))
        if r != 0 {
                return os.NewSyscallError("SYS_IOCTL", e)
        return nil

This has quite a few problems with it.
  1. It's not platform portable.  This function depends on a specific implementation of tcsetattr() that is done in terms of specific ioctl()s.  For example, TCSETS may be used on one platform, but on others TIOCSETA might be used.
  2. It's not Go portable, since SYS_IOCTL isn't implemented on platforms like illumos, even though as a POSIX system we do have a working tcsetattr(). 
  3. The code is actually pretty unreadable, and somewhat challenging to write the first time correctly.
  4. The code uses unsafe.Pointer(), which is clearly something we ought to avoid.
  5. On some platforms, the details of the ioctls are subject to change, so that the coding above is actually fragile.  (In illumos & Solaris system call interfaces are "undocumented", and one must use the C library to access system services.  This is our "stable API boundary".  This is somewhat different from Linux practice; the reasons for this difference is both historical and related to the fact that Linux delivers only a kernel, while illumos delivers a system that includes both the kernel and core libraries.)
How did we wind up in this ugly situation?

The problem I believe stems from some misconceptions, and some historical precedents in the Go community.   First the Go community has long touted static linking as one of its significant advantages.  However, I believe this has been taken too far.

Why is static linking beneficial?  The obvious (to me at any rate) reason is to avoid the dependency nightmares and breakage that occurs with other systems where many dynamic libraries are brought together.  For example, if A depends directly on both B and C, and B depends on C, but some future version of B depends on a newer version of C that is incompatible with the version of C that A was using, then we cannot update A to use the new B.  And when the system components are shared across the entire system, the web of dependencies gets to be so challenging that managing these dependencies in real environments can become a full time job, consuming an entire engineering salary.

You can get into surprising results where upgrading one library can cause unexpected failures in some other application.  So the desire to avoid this kind of breakage is to encode the entire binary together, in a single stand-alone executable, so that we need never have a fear as to whether our application will work in the future or not.  As I will show, we've not really achieved this with 100% statically linked executables in Go, though I'll grant that we have greatly reduced the risk.

This is truly necessary because much of the open source ecosystem has no idea about interface stability nor versioning interfaces.  This is gradually changing, such that we now have ideas like semver coming around as if they are somehow new and great ideas.  The reality is that commercial operating system vendors have understood the importance of stable API boundaries for a very very long time.  Some, like Sun, even made legally binding promises around the stability of their interfaces.  However, in each of these cases, the boundary has to a greater or lesser extent been at the discretion of the vendor.

Until we consider standards such as POSIX 1003.1.  Some mistakenly believe that POSIX defines system calls.  It does not.  It defines a C function call interface.  The expectation is that many of these interfaces have 1:1 mappings with system calls, but the details of those system calls are completely unspecified by POSIX.

Basically, the Go folks want to minimize external dependencies and the web of failure that can lead to.  Fixing that is a goal I heartily agree with.  However, we cannot eliminate our dependency on the platform.  And using system calls directly is actually worse, because it moves our dependency from something that is stable and defined by standards bodies, to an interface that is undocumented, not portable, and may change at any time.

If you're not willing to have a dynamic link dependency on the C library, why would you be willing to have a dependency on the operating system kernel?  In fact, the former is far safer than the latter! (And on Solaris, you don't have a choice -- the Go compiler always links against the system libraries.)

Harmful results that occur with static linking

If the application depends on a library that has a critical security update, it becomes necessary to recompile the application.  If you have a low level library such as a TLS or HTTP client, and a security fix for a TLS bug is necessary (and we've never ever ever had any bugs in TLS or SSL implementation, right?), this could mean recompiling a very large body of software to be sure you've closed the gaps.
With statically linked programs, even knowing which applications need to be updated can be difficult or impossible.  They defy the most easy kinds of inspection, using tools like ldd or otool to see what they are built on top of.

What is also tragic, is that static executables wind up encoding the details of the kernel system call interface into the binary.  On some systems this isn't a big deal because they have a stable system call interface.  (Linux mostly has this -- although glibc still has to cope with quite a few differences here by handling ENOSYS, and don't even get me started on systemd related changes.)  But on systems like Solaris and illumos, we've historically considered those details a private implementation detail between libc and kernel.  And to prevent applications from abusing this, we don't even deliver a static libc.  This gives us the freedom to change the kernel/userland interface fairly freely, without affecting applications.

When you consider standards specifications like POSIX or X/OPEN, this approach makes a lot of sense.  They standardize the C function call interface, and leave the kernel implementation up to the implementor.

But statically linked Go programs break this, badly.  If that kernel interface changes, we can wind up breaking all of the Go programs that use it, although "correct" programs that only use libc will continue to work fine.

The elephant in the room (licensing)

The other problem with static linking is that it can create a license condition that is very undesirable.  For example, glibc is LGPL.  That means that per the terms of the LGPL it must be possible to relink against a different glibc, if you link statically.

Go programs avoid this by not including any of the C library statically. Even when cgo is used, the system libraries are linked dynamically.  (This is usually the C library, but can include things like a pthreads library or other base system libraries.)

In terms of the system, the primary practice for Go programmers has been to use licenses like MIT, BSD, or Apache, that are permissive enough that static linking of 3rd party Go libraries is usually not a problem.  I suppose that this is a great benefit in that it will serve to help prevent GPL and LGPL code from infecting the bulk of the corpus of Go software.

The Solutions

The solution here is rather straightforward.

First, we should not eschew use of the C library, or other libraries that are part of the standard system image.  I'm talking about things like libm, libc, and for those that have them, libpthread, libnsl, libsocket.  Basically the standard libraries that every non-trivial program has to include.  On most platforms this is just libc.  If recoded to use the system's tcsetattr (which is defined to exist by POSIX), the above function looks like this:
// include <termios.h>
import "C"
import "os"
func tcsetattr(f *os.File, termios *C.struct_termios) error {
        _, e := C.tcsetattr(C.int(f.Fd(), C.TCSANOW, termios)
        return e
The above implementation will cause your library or program to dynamically link against and use the standard C library on the platform.  And it works on all POSIX systems everywhere and because it uses a stable documented standard API, it is pretty much immune to breakage from changes elsewhere in the system.  (At least any change that broke this implementation would also break so many other things that the platform would be unusable.  Generally we can usually trust people who make the operating system kernel and C library to not screw things up too badly.)

What would be even better, and cleaner, would be to abstract that interface above behind some Go code, converting between a Go struct and the C struct as needed, just as is done in much of the rest of the Go runtime.  The logical place to do this would be in the standard Go system libraries.  I'd argue rather strongly that core services like termio handling ought to be made available to Go developers in the standard system libraries that are part of Go, or perhaps more appropriately, with the golang.org/x/sys/unix repository.

In any event, if you're a Go programmer, please consider NOT directly calling syscall interfaces, but instead using higher level interfaces, and when those aren't already provided in Go, don't be afraid to use cgo to access standard functions in the C library.  Its far far better for everyone that you do this, than that you code to low level system calls.

Sunday, September 20, 2015

Announcing govisor 1.0

I'm happy to announce that I feel I've wrapped up Govisor to a point where its ready for public consumption.

Govisor is a service similar to supervisord, in that it can be used to manage a bunch of processes.  However, it is much richer in that it understands process dependencies, conflicts, and also offers capabilities for self-healing, and consolidated log management.

It runs as an ordinary user process, and while it has some things in common with programs like init, upstart, and Solaris SMF, it is not a replacement for any of those things.  Instead think of this is a portable way to manage a group of processes without requiring root.  In my case I wanted something that could manage a tree of microservices that was deployable by normal users.  Govisor is my answer to that problem.

Govisor is also written entirely in Go, and is embeddable in other projects.  The REST server interface uses a stock http.ServeHTTP interface, so it can be used with various middleware or frameworks like the Gorilla toolkit.

Services can be implemented as processes, or in native Go.

Govisor also comes with a nice terminal oriented user interface (I'd welcome a JavaScript based UI, but I don't write JS myself).  Obligatory screen shots below.

Which actually brings up the topic of "tops'l"  (which is a contraction of top-sail, often used in sailing).  Govisor depends the package tops -- "terminal oriented panels support library"), which provides a mid-level API for interacting with terminals.   I created topsl specifically to help me with creating the govisor client application.  I do consider topsl ready for use by Govisor, but I advise caution if you want to use it your own programs -- its really young and there are probably breaking changes looming in its future.

The documentation is a bit thin on the ground, but if you want to help or have questions, just let me know!  In the meantime, enjoy!