Monday, December 10, 2018

Golang sync.Cond vs. Channel...

The backstory here is that mostly I love the Go programming language.

But I've been very dismayed by certain statements from some of the core Go team members about topics that have significant ramification for my concurrent application design.  Specifically, bold statements to the effect that "channels" are the way to write concurrent programs, and deemphasizing condition variables.  (In one case, there is even a proposal to remove condition variables entirely from Go2!)

The Go Position


Essentially, the Go team believes very strongly in a design principle that can be stated thusly:

"Do not communicate by sharing memory; instead, share memory by communicating."

This design principle underlies the design of channels, which behave very much like UNIX pipes, although there are some very surprising semantics associated with channels, which I have found limiting over the years.  More on that below.

Certainly, if you can avoid having shared memory state, but instead pass your entire state between cooperating parties, this leads to a simpler, lock free (sort of -- channels have their own locks under the hood!) design.  When your work is easily expressed as a pattern of pipelines, this is a better design.

The Real World


The problem is that sometimes (frequently in the real world) your design cannot be expressed this way.   Imagine a game engine, dealing with events from the network,  multiple players, input sources, physics, modeling, etc.  One simple design is to use a single engine model, with a single go routine, and have events come in via many channels.  Then you have to create a giant select loop to consume events.  This is typical of large event driven systems.

There are some problems with this model.


  1.  Adding channels dynamically just isn't really possible, because you have a single hard coded select loop.  Which means you can't always cope with changes in the real world.   (For example, if you have a channel for inputs, what happens when someone plugs in a new controller?)
  2. Any processing that has to be done on your common state needs to be in that giant event loop.  For example, updates to lighting effects because of an in game event like a laser beam needs to know lots of things about the model -- the starting point of the laser beam, the position of any possible objects in the path of the laser, and so forth.  And then this can update the state model with things like whether the beam hit an object, causing a player kill, etc.
  3. Consequently, it is somewhere between difficult and impossible to really engage multiple CPU cores in this model.  (Modern multithreaded games may have an event loop, but they will also make heavy use of locks to access shared state, in order to permit physics calculations and such to be done in parallel with other tasks.)


So in the real world, we sometimes have to share memory still.

Limitations of Channels


There are some other specific problems with channels as well.


  • Closed channels cannot be closed again (panic if you do), and writing to a closed channel panics. 
  • This means that you cannot easily use go channels with multiple writers.  Instead, you have to orchestrate closing the channel with some other outside synchronization primitive, such as a mutex and flag, or a wait group.)  This semantic also means that close() is not idempotent.  That's a really unfortunate design choice.
  • It's not possible to broadcast to multiple readers simultaneously with a channel other than by closing it.  For example, if I am going to want to wake a bunch of readers simultaneously (such as to notify multiple client applications about a significant change in a global status), I have no easy way to do that.  I either need to have separate channels for each waiter, or I need to hack together something else (for example adding a mutex, and allocating a fresh replacement channel each time I need to do a broadcast.  The mutex has to be used so that waiters know to rescan for a changed channel, and to ensure that if there are multiple signalers, I don't wake them all.)
  • Channels are slow.  More correctly, select with multiple channels is slow.  This means that designs where I have multiple potential "wakers" (for different events) require the use of separate channels, with separate cases within a select statement.  For performance sensitive work, the cost of adding a single additional case to a select statement was found to be quite measurable.
There are other things about channels that are unfortunate (for example no way to peek, or to return an object to channel), but not necessarily fatal.

What does concern me is the false belief that I think the Go maintainers are expressing, that channels are a kind of panacea for concurrency problems.

Can you convert any program that uses shared state into one that uses channels instead?  Probably.

Would you want to?  No.  For many kinds of problems, the constructs you have to create to make this work, such as passing around channels of channels, allocating new channels for each operation, etc. are fundamentally harder to understand, less performant, and more fragile than a simpler design making use of a single mutex and a condition variable would be.

Others have written on this as well.

Channels Are Not A Universal Cure


It has been said before that the Go folks are guilty of ignoring the work that has been done in operating systems for the past several decades (or maybe rather of being guilty of NIH). I believe that the attempt to push channels as the solution over all others is another sign of this.  We (in the operating system development community) have ample experience using threads (true concurrency), mutexes, and condition variables to solve large numbers of problems with real concurrency for decades, and doing so scalably

It takes a lot of hubris for the Golang team to say we've all been doing it wrong the entire time.  Indeed, if you look for condition variables in the implementation of the standard Go APIs, you will find them.  Really, this is a tool in the toolbox, and a useful one, and I personally find it a bit insulting that the Go team seems to treat this as a tool with sharp edges with which I can't really be trusted.

I also think there is a recurring disease in our industry to try to find a single approach as a silver bullet for all problems -- and this is definitely a case in point.  Mature software engineers understand that there are many different problems, and different tools to solve them, and should be trusted to understand when a certain tool is or is not appropriate.

Saturday, December 1, 2018

Go modules, so much promise, so much busted

Folks who follow me may know that Go is one of my favorite programming languages.  The ethos of Go has historically been closer to that of C, but seems mostly to try to improve on the things that make C less than ideal for lots of projects.

One of the challenges that Go has always had is it's very weak support for versioning, dependency management, and vendoring.  The Go team's historic promise and premise (called the Go1 Promise) was that the latest version in any repo should always be preferred. This has a few ramifications:

  • No breaking changes permitted in a library, or package, ever.
  • The package should be "bug-free" at master.  (I.e. regression free.)
  • The package should live forever.

For small projects, these are noble goals, but over time it's been well demonstrated that this doesn't work. APIs just too often need to evolve (perhaps to correct earlier mistakes) in incompatible ways. Sometimes its easier to discard an older API than to update it to support new directions.

Various 3rd party solutions, such as gopkg.in, have been offered to deal with this, by providing some form of semantic versioning support.

Recently, go1.11 was released with an opt-in new feature called "modules".  The premise here is to provide a way for packages to manage dependencies, and to break away from the historic pain point of $GOPATH.

Unfortunately, with go modules, they have basically tossed the Go1 promise out the window. 

Packages that have a v2 in their import URL (like my mangos version 2 package) are assumed to have certain layouts, and are required to have a new go.mod module file to be importable in any project using modules.  This is a new, unannounced requirement, and it broke my project from being used with any other code that wants to use modules.  (As of right now this is still broken.)

At the moment, I'm believing that there is no way to correct my repository so that it will be importable by both old code, and new code, using the same import URL.  The "magical" handling of "v2" in the import path seems to preclude this.  (I suspect that I probably need different contradictory lines in my HTML file that I use to pass "go-imports", depending on whether someone is using the new style go modules, or the old style $GOPATH imports.)

The old way of looking at vendored code is no longer used.  (You can opt-in to it if you want still.)

It's entirely unclear how godoc is meant to operate the presence of modules.  I was trying to setup a new repo for a v2 that might be module safe, but I have no idea how to direct godoc at a specific branch.  Google and go help doc were unhelpful in explaining this.

This is all rather frustrating, because getting away from $GOPATH seems to be such a good thing.

At any rate, it seems that go's modules are not yet fully baked.  I hope that they figure out a way for existing packages to automatically be supported without requiring reorganizing repos.  (I realize that this is already true for most packages, but for some -- like my mangos/v2 package -- that doesn't seem to hold true).