Sunday, September 9, 2012

The Case for a new Reference Distro for illumos

With the recent announcement and discussion around OpenIndiana, I think it's time to bring to the light of day an idea I've been toying with for a while now -- and have discussed with a few others in the community.

illumos needs a reference distribution.

Now just hold on there... before you get all wound up, let's discuss what I mean by a reference distribution.


  • A reference distribution should be a base for additional development, and testing. 
  • It should support automated installation for mass deployment in test lab/QA/build farm scenarios.
  • It must support "self-hosting" -- i.e. illumos-gate itself should be buildable on this distribution -- with as few additional steps as possible.
  • It must be valid as a QA platform -- for example, it needs to support the packaging tools we use in order to validate the packaging metadata (manifests).  (Yes, this means it needs to support - and be packaged with - IPS -- in spite of my loathing of IPS itself.)
  • It should be a reasonable (but not the only reasonable possible) base to use for other distributions looking to build on top of it.
  • Ideally it should be pretty darn minimal.
  • It may have graphics support, e.g. an X server, but only in so far as having that would support validation of technologies in illumos-gate.  (Sorry, no big desktops, Gnome, web browsers, or any of that crap.)
  • Community driven - it contains the things that the entire community agrees upon as our common core.
  • It must require only a modicum of effort around release engineering - none of these multi-week release cycles.
  • It should be released biweekly with a numbered build, and probably also have nightlies.  These should be automated.
  • Multi-platform - i.e. it needs to support both SPARC and x86.


As a common reference, it must use illumos-gate verbatim, i.e. no local changes.

Here's what it is not:

  • A general purpose desktop.
  • A general purpose server.
  • A hypervisor.
  • An appliance.
  • Subject to the whims of any corporate entity.
  • An experimental version (it isn't Fedora!)

Why do we need this?  OpenIndiana has been filling the role - but OpenIndiana is insufficient for several reasons:

  • OpenIndiana is too big and unwieldy -- both to build, and to install.
  • OpenIndiana maintains their own fork of illumos-gate.
  • OpenIndiana needs the freedom to follow their own course - without being too strictly tied to illumos-gate.
  • OpenIndiana lacks any kind of formal QA.  (The recent debacles around 151 are good examples of this.)
  • OpenIndiana is not built for SPARC.

(Some of the above items could be addressed with OI, but I think addressing all of them is nigh impossible without totally changing the role of OI.)

So, here's what I propose we do instead.

  • I propose that we start with OpenIndiana or OmniOS, but using core illumos-gate (no fork), and we remove things that are not relevant to the reference.  E.g. Gnome, etc.
  • We will need lots of assistance with release engineering - I want a formal RE plan in place for this thing, with regular scheduled builds and releases following the "train" model rather than the "station wagon" model.
  • We will need to set up some central build and test servers.  (I hope to get some funding for this through the Foundation).
  • We will need to have formalized QA.  (Imagine a Jenkins instance that ran a build and regression test group every night.)
  • The bug database should add fields to reference formal build numbers.  (Introduced-In, Fixed-In, Reported-In, etc.)
  • We do not need "an installer".  So the image can just be a basic ISO or tarball, probably.  Think minimalist.  I don't want any customization in the installer, excepting possibly the identification of the initial disk to install to.

Ultimately, if we're very successful, then this will help the entire illumos community.  This reference may ultimately be useful to become a base for OpenIndiana or OmniOS to build upon.  Put another way, I want to make OI's job easier, but I don't intend that this should replace OI.

So, what do I want from the community?

I want to hear from people who think this is a terrible idea.  A descriptive criticism of concerns is far more useful than just another +1.  (If you want to +1 it, do it using Google or FB to like it -- don't add a +1 comment.)  Post your concerns directly to this thread, so that we can keep them for posterity.

I want to hear from people who want - and are able - to actively contribute to this.  Most of this effort will be volunteer based initially, but I think we'll eventually need to staff some work from the Foundation.  Corporate sponsors may help here as well, either by loaning of resources, or gifting of earmarked funds.

Btw, for folks that want to debate names -- let me kill that now.  I want to name this "IRD" -- illumos Reference Distribution.  It's about as unsexy as possible - and that is the point -- this thing should be viewed as something the community uses internally, and not something that is ever given marketing attention.  (Those spotlights belong to illumos itself, and to the other end-user oriented distributions.)

To be clear, I'm not 100% set that all of this is the way to go -- but I'm heavily leaning in that direction.  While I can't force the rest of the community to follow along, I can allocate resources to it, and I want to make sure that we can address any major concerns before I start investing resources here.

Thanks.

17 comments:

A Hettinger said...

My first thought (I'll admit, before actually reading your post) was "this is a knee-jerk reaction to recent events in OI and a mistake." After reading your post and thinking about it, I'm hard-pressed to come up with a reason to oppose it.

Having a testbed that is minimal for the purposes of testing Illumos is in the best interests of Illumos QC.

Having a testbed that is not encumbered with alterations (as OI is) for the purposes of testing Illumos is in the best interests of Illumos QC.

Not creating a sense of responsibility to the needs of Illumos from the OI side is in the best interests of OI.

Having Illumos release IPS packages of itself may make RE easier for OI.

Now the bad news, this is going to mean more work for you (Illumos). The only way I see this as being practical is for you to simplify RE. Using Jenkins for CI will help, but even once that's up, I doubt this will be free.

Yuri said...

Absolutely love the idea, as I end up deleting a lot of packages from an OI installation anyway (even from the text one). And, of course, I'd like to help see this become reality.

Garrett D'Amore said...

I just removed the duplicate post. Same content as first. ;-)

Theo Schlossnagle said...

+1

OmniOS has a clear corporate interest (OmniTI's), but seems to fit well to the other bits. I'd imagine we broke something related to SPARC when we broadened the dual-architecture library build support, in that we didn't test the changes on SPARC. I likely botched a Makefile here and there (or forgot one). Sounds like an hour of cleanup on a SPARC build system. All in, there are probably only one or two commits that have not been upstreamed that are required for OmniOS to run illumos-gate (all related to dual-architecture libs).

All that said, I took a pretty hard line on OmniOS core being only what is required to remotely log in and build the system. We gutted CUPS, so while illumos-gate allows building without CUPS, you'd no longer be able to test the flip side without reintroducing CUPS.

Regardless of the choice of OI or OmniOS, I think the goals and the motivations are solid and I completely support it.

gea said...

+1
Currently I see:

- different patchlevel of distributions
- different behaviours of system tools (for example, manual IP settings)
- insecurity about the future of distributions
- different packaging tools
- different repositories
- different ZFS versions (feature flags)

It's a huge problem when every distribution must care about core features. And it's a huge problem if I as a user must care about differences in the core of the distribution. Put Illumos inside when something is based on Illumos - with the option to check if a problem is a distribution problem or an Illumos problem. A working Illumos reference installation is the right way - absolutely.

Peter Tribble said...

Do you mean a reference distro, or a basic foundation? Your description is more the latter, yet isn't self-consistent.

A reference distro, to me, is self-contained. Which means that you end up supporting all use cases of the technologies you include. So, if you include IPS (which I believe to be a terrible mistake) then you include *all* of IPS, which includes the graphical parts, which drags in GNOME, and you have to drag in *all* of GNOME. (Note that derived distributions don't have to ship all those bits, but you still have to support them.)

And in that case, the reference IPS distro is OI. So you're either duplicating effort or competing head on. In any case, you can't really produce a minimal distro, which was another aim.

One might imagine building a bootable Illumos image that imports the non-Illumos components from elsewhere. That, to me, isn't a reference distro, as it's dependent for core functionality on at least one other distro.

If you just want to produce a base foundation, then you need to either choose a packaging system that doesn't drag in the whole kitchen sink, or not bother with packaging in that distro at all.

Ivan Nudzik said...

+1 ;-)

And it was said in the beginning that Illumos will never be a distro... if I remember right. ;-)

I suggest you take inspiration from the *BSD world... namely in what is good to keep in the 'core' OS and what belongs in ports. And the *BSDs don't use any fancy packaging system for the core OS. Well-written and revisited scripts are good enough for upgrades of the core. Plus with ZFS snapshots... better to forget IPS for that, I think. I can imagine a 'core' update as a zfs receive for application volumes (diff) and then hg pull && hg update for config files (potentially merging local config changes)... maybe a few patch scripts to run at the end.

And for packages/ports... I'm still comfortable with pkgsrc on some of my T1000s. I don't miss IPS. I get a more coherent system with compiled-in software than packages can offer. For example, on that T1000 I'm compiling T1-optimised with Studio Express... it is worth the time because of the performance I get.

So my hint in summary: Illumos reference = OS + configs + man + compiler + sources + X + X configs + X sources
Very similar to the *BSDs. No IPS for the reference, but IPS can be part of the OS... what has to be part of the OS is up for wider discussion, but in general it has to be as minimal as possible.

Garrett D'Amore said...

First, I hate IPS -- or rather I hate the IPS toolchain. (I *love* the script-less model of expressing the entire packaging state in well-formed metadata though.)

I think we need to have IPS tools -- at least that part of the toolset required to validate our metadata, and ultimately to produce the ISO image itself.

If the core IPS set is not separable from Gnome (a tragic failing IMO, if true), then we would need to fork it -- or provide our own simpler toolchain. (I wouldn't mind a solution that didn't depend on Python. I hate requiring Perl and Python in our core, and I'm working to eliminate them as dependencies -- they properly belong as optional technologies for distros to include.)

To Peter's comment -- I want this to be a reference. I don't agree that means we have to support all use cases -- and I especially don't think we need to drag in Gnome, etc.

Please see my actual post for the *purpose* of this beast -- that should make it far more clear. We need a small beast that we can readily use for build servers, for test farms, etc. OI is all wrong for this role. I don't want to get hung up on terminology of "reference" vs. base.

Ivan Nudzik said...

Ok, Garrett, when the main emphasis is on a beast for build and test farms, then my hint is: don't think of an ISO at all and rather focus on net install. Every x86 board nowadays can boot off the network, and SPARCs can too. The reference distro you want is not for BFU, so it's no problem that it is a bit more complex compared to inserting a burned CD. You can get a farm of test machines with a clean install up and running in minutes, just by letting them boot off the network. Get inspired by SmartOS. Just a minimal system that can instantiate a freshly downloaded build to disk, then reboot.
A lot of work is saved by dropping IPS and its dependencies. Such a unified install can be upgraded by zfs receive of differentials, I think.

That's my brainstorm... basically I'm an admin, not a developer, so maybe I'm missing some crucial points that make my whole comment irrelevant.

Btw, what about a virtual appliance updated a few times a year? I'd be happy to get one for experiments. ISOs are passé... And it could also be prepared as a boot server to instantiate those fresh new installs.

Garrett D'Amore said...

Ivan, I agree that netinstall is the most interesting use case for this. We definitely need to support that for automation.

I do think that the "image" that the netinstall applies should be constructed by IPS. Again, we use IPS to indicate packaging and dependencies, and validate that e.g. files are installed and not missing, etc. So for the purposes of *validation*, both we and our downstreams need to know that we are using at least some of the IPS toolchain (or have our own) to ensure that our manifests are correct.

(Also, onu may be useful to some people using this reference distro, so I don't want to eliminate that possibility.)

SmartOS uses an entirely different -- wholly diskless -- model. I like that model a lot (I'm using it in a different corporate effort), but I think it's not proper for this distro. As part of our validation process, we need to ensure that we are installing and running in a manner that reflects the majority use case -- and despite SmartOS' growing popularity, I think that means installing to a ZFS pool for now.

Ivan Nudzik said...

Garrett, the purpose of IPS is clear to me, but I'm missing its purpose for this particular case - the Reference. Let's call it by that one word.
Once it is decided what belongs to the Reference, what dependencies do you want to check for? Everything must be there - one huge list of files. That's a job for a tar file. Build XYZ is that one tar file at a URL... as a simplified example. Only consistency must be assured. When you run into problems during build or test -> check consistency of the Reference -> inconsistent -> netboot and reinstall from scratch (it takes <5 min.)... no time wasted finding out what got botched in the Reference.
The only dependencies I see are build dependencies, and those are expressed in makefiles. And sources are pulled/pushed from/to Mercurial.
If you want a small, slick Reference install, I don't see any point in checking dependencies inside this Reference... it's a bunch of files and every one must be there. As for the consistency check, maybe it can be done at the ZFS level... a snapshot after a clean install. Or an SQLite DB with absolute file paths and metadata, plus an md5 for each file. SQLite is one huge C file - easy to manage. A successful build can fill the SQLite DB file with the list of Reference files and their metadata. When you want to upgrade the revision of the Reference, just get the new DB file to know what to download. This is very similar to the way FreeBSD core updates work... you choose the revision you want and then a DB file with the list of files gets downloaded, every file on disk gets compared against the data in the DB, and those requiring change are replaced by new ones.
Such a system can even be shell scripted, but a C version wouldn't be much more complicated either...
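
The manifest idea in this comment can be sketched in a few lines. This is only an illustration under stated assumptions, not anything illumos or FreeBSD actually ships: the table name `files`, the helper names, and the choice to store paths relative to the install root (rather than absolute paths) are all hypothetical, and it is written in Python rather than shell or C purely for brevity.

```python
import hashlib
import os
import sqlite3


def file_md5(path):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_manifest(db, root):
    """Store the path (relative to root) and MD5 of every regular file under root."""
    # Hypothetical schema: one row per file in the Reference install.
    db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, md5 TEXT)")
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            if not os.path.isfile(full):
                continue  # skip broken symlinks and other non-regular entries
            rel = os.path.relpath(full, root)
            db.execute("INSERT OR REPLACE INTO files VALUES (?, ?)",
                       (rel, file_md5(full)))
    db.commit()


def check_manifest(db, root):
    """Return (missing, changed): manifest paths absent or altered under root."""
    missing, changed = [], []
    for rel, md5 in db.execute("SELECT path, md5 FROM files ORDER BY path"):
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            missing.append(rel)
        elif file_md5(full) != md5:
            changed.append(rel)
    return missing, changed
```

A build host would call `record_manifest` once after a clean build, and a test machine would call `check_manifest` against the downloaded DB to decide whether to re-fetch files or reinstall. The sketch does not flag extra files that are absent from the manifest, and it ignores ownership and permissions, which a real tool would need to track.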

A packaging system like IPS is ok for a task like keeping Gnome up to date, but a luxury for the Reference, which is in fact one huge package, I think.

SmartOS was meant as inspiration for netboot and for instantiating something to a zpool. Btw, such a netboot image need not be prepared very often; the only task it has to do is netboot -> create a zpool (if 2 identical disks are found, then mirror) -> create a unified list of ZFS volumes -> download the required files and copy them to the zpool -> make it bootable -> reboot. In fact that image has to be updated only when there is a new version of ZFS. Making it can be a manual task.

Matt F-V said...

IPS or game over. apt/deb is not good enough; I see it breaking all the time. I could tolerate yum/rpm, although it is far from perfect.

Gabriele Bulfon said...

It's absolutely time for this.
We have our own distro, not public yet, based on the illumos kernel, but the packages it builds depend a lot on some other packages coming from the OI distro.
illumos-userland still does not have enough packages to substitute for the ones I need from OI to make a bootable/installable ISO. Also, many of them have different pkg names (for example, illumos-kernel references gmake, while userland contains make).

I absolutely agree because we need to reach a point of not depending on OI packages.

I would love to contribute to this idea with our own distro work.
If anyone would like to check, I can provide an ISO.
It's a base distro: kernel, text-installer, no graphics at all.

comay said...

Just a note that the IPS packagemanager(1), the so-called GUI component, is not core to IPS and does not need to be part of the toolchain. Hence, GNOME is not a true dependency here.
