I want those two days of my life back, Puppet

Posted: Fri, 4 September 2009

Pop quiz time, everyone (well, the Puppet-using subset of "everyone", anyway). Where is the dependency loop in the following manifest?

```puppet
class klass {
   file { ['/tmp/a']: ensure => present }
}

define def_file() {
   include klass

   file { $name:
      ensure  => present,
      require => File['/tmp/a']
   }
}

def_file { '/tmp/b': }

def_file { '/tmp/c': require => File['/tmp/b'] }
```

This seems pretty straightforward: /tmp/c can't be created until /tmp/b (via the direct require) and /tmp/a (via the define) have been created, while /tmp/b can't be created until /tmp/a is created. You can draw some pretty graphs and see that everything looks fine.

However, if you run this manifest, roughly half the time you'll get a failure complaining about a dependency loop, like this: "err: Could not apply complete catalog: Found dependency cycles in the following relationships". The other half of the time, it'll work fine.

So, where is the loop? The require on the /tmp/c resource also applies to the /tmp/a resource, but only if the `include klass` happens to be evaluated inside the /tmp/c instance of the define. When that happens, `/tmp/a` requires `/tmp/b`, `/tmp/b` requires `/tmp/a`, and it's LOOP TIME!
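To make that concrete, here is a hand-flattened sketch of the relationships you end up with on a bad run, i.e. when `include klass` is evaluated inside the /tmp/c instance. It's an illustration of the explanation above, not a dump of what Puppet builds internally; written out explicitly like this, the /tmp/a to /tmp/b cycle shows up on every run instead of half of them.

```puppet
# Hypothetical flattening of the bad run: the require from def_file['/tmp/c']
# has leaked onto /tmp/a via the class that was included in its scope.
file { '/tmp/a':
   ensure  => present,
   require => File['/tmp/b'],   # inherited from /tmp/c's require
}

file { '/tmp/b':
   ensure  => present,
   require => File['/tmp/a'],   # from the define body: cycle with /tmp/a
}

file { '/tmp/c':
   ensure  => present,
   require => [ File['/tmp/a'], File['/tmp/b'] ],
}
```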

Whether or not this loop occurs on a particular Puppet run is controlled by fickle Lady Fate (because Ruby hashes, which are used internally at some point in this process, don't have a well-defined or stable order). As to why the include only applies to one of the define instances rather than all of them, well, that's related to the fact that a class only gets evaluated once, and I think someone was having a lazy day when they wrote that part of the system. (I can't imagine it would be the end of the world if the class were linked, in the scope tree, to every defined resource that includes it.)

Now, when you've only got the above manifest, it's relatively straightforward to isolate the problem and fix it; I had it sorted in about an hour. The two days before that were spent isolating the bug from the live manifest. There are about 20 instances of def_file in our live manifest, so every time I had a new hypothesis I'd have to run Puppet (on average) 20 times to make the problem appear. And since I didn't know whether the bug was client-side or server-side, I had to run it in the real, full-blown environment, which slowed things down further. If the class-handling code had been written properly, the error would have been constant rather than intermittent, and I would have been asking for a couple of hours' credit, not a couple of days'.

The fix? Use virtual resources instead of the class, which does $DEITY-only-knows-what to the scope tree. These have their own giant buckets of ugliness, but our options, at this point, are unpleasantly limited.
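A minimal sketch of what the virtual-resource version of the test manifest might look like. It assumes /tmp/a is declared virtually at top scope (outside the define) and that each def_file instance calls `realize()` on it; it's an illustration of the approach, not the exact change made to the live manifest. Because the resource is declared outside the define, its scope shouldn't depend on which def_file instance happens to be evaluated first, and realizing the same virtual resource from several instances is allowed.

```puppet
# Declared once, virtually, at top scope: not managed until realized.
@file { '/tmp/a': ensure => present }

define def_file() {
   # Replaces 'include klass'. Realizing the same resource from several
   # instances is fine; it is still only managed once.
   realize(File['/tmp/a'])

   file { $name:
      ensure  => present,
      require => File['/tmp/a'],
   }
}

def_file { '/tmp/b': }

def_file { '/tmp/c': require => File['/tmp/b'] }
```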

Sometimes I really do wonder why I'm not a raging alcoholic. The reasons not to be are strangely hard to recall right at this moment...
