Keep your gems clean!

Posted: Sat, 11 January 2014 | permalink | No comments

Ruby has a reputation for being slow. But there’s slow, and then there’s “three seconds to show the help for a fairly simple command line program”. That’s not slow, that’s ridiculous. The gem cleanup command is the solution.

Rubygems has a nice feature whereby you can have multiple versions of a gem installed at once. That’s neat, because it allows programs with different gem version requirements to co-exist on the system. Unfortunately, over time, you can end up with many versions of a gem installed, as upgrades pull in ever-newer versions of all your gems.

Couple this accumulation of cruft with a need for Rubygems to look at every one of them on every startup, and you can see how you could very quickly end up with a three second startup time.

By running gem cleanup, you’ll have the opportunity to nuke all the out-of-date versions of your gems. It takes into account dependencies, so it will at least ask you before nuking a gem that another gem absolutely depends on (I’d prefer a command-line switch to say, “Don’t even think about uninstalling something that another gem depends on!”, but you can’t have everything). If you’ve got non-gem things on your system that rely on out-of-date gems (anything using bundler, for instance), those will assplode the next time you run them, but you can always re-run bundle install to get back just the specific versions of gems you need.
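If you’d like to see the size of the pile before nuking anything, here’s a minimal sketch in Ruby. The stale_versions helper and the inventory data are made up for illustration; only Gem::Version comes from Rubygems itself. It picks out every version that isn’t the newest for each gem, which is roughly the set gem cleanup would offer to remove (the real command also considers dependencies, which this sketch does not).

```ruby
require "rubygems" # loaded by default on modern Rubies; explicit for clarity

# Given { gem name => [installed version strings] }, return the versions
# that are older than the newest one installed for each gem.
# (Hypothetical helper, not part of Rubygems.)
def stale_versions(installed)
  installed.each_with_object({}) do |(name, versions), stale|
    sorted = versions.map { |v| Gem::Version.new(v) }.sort
    older = sorted[0...-1] # everything except the newest
    stale[name] = older.map(&:to_s) unless older.empty?
  end
end

# Made-up inventory, purely for illustration.
inventory = {
  "rake" => ["10.1.0", "0.9.6", "10.0.4"],
  "json" => ["1.8.1"],
}

stale_versions(inventory).each do |name, versions|
  puts "#{name}: #{versions.join(', ')}"
end
```

With the made-up inventory above, only rake is reported, since json has a single version installed.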

The end result of my little spring cleaning? Down from three seconds to three quarters of a second. A 75% improvement. Win!

I am officially smarter than the Internet

Posted: Fri, 20 December 2013 | permalink | 3 Comments

Yes, the title is just a scootch self-aggrandising, but I’m rather chuffed with myself at the moment, so please forgive me.

It all started with my phone (a regular Samsung Galaxy S3) suddenly refusing to boot, stuck at the initial splash screen (“Samsung Galaxy SIII GT-I9300”). After turning it off and on again a few times (I know my basic problem-solving strategies) and clearing the cache, I decided to start looking deeper. In contrast to pretty much every other Android debugging experience ever, I almost immediately found a useful error message in the recovery system:

E:Failed to mount /efs (Invalid Argument)

“Excellent!”, thought I. “An error message. Google will tell me how to fix this!”

Nope. The combined wisdom of the Internet, distilled from a great many poorly-spelled forum posts, unhelpful blog posts, and thoroughly pointless articles, was simple: “You’re screwed. Send it back for service.”

I tried that. Suffice it to say that I will never, ever buy anything from Kogan ever again. I have learnt my lesson. Trying to deal with their support people was an exercise in frustration, and ultimately fruitless.

In the end, I decided I’d have some fun trying to fix it myself – after all, it’s a failure at the base Linux level. I know a thing or two about troubleshooting Linux, if I do say so myself. If I really couldn’t fix it, I’d just go buy a new phone.

It turned out to be relatively simple. Here’s the condensed version of my notes, in case someone wants to follow in my footsteps. If you’d like expansion, feel free to e-mail me. Note that these instructions are specifically for my Galaxy S3 (GT-I9300), but should work with some degree of adaptation on pretty much any Android phone, as far as I can determine, within the limits of the phone’s willingness to flash a custom recovery.

  1. Using heimdall, flash the TeamWin recovery onto your phone (drop into “download mode” first – hold VolDown+Home+Power):

    heimdall flash --recovery twrp.img
  2. Boot into recovery (VolUp+Home+Power), select “Advanced -> Terminal”, and take an image of the EFS partition onto the external SD card you should have already in the phone:

    dd if=/dev/block/mmcblk0p3 of=/external_sd/efs.img
  3. Shut down the phone, mount the SD card on your computer, then turn your EFS partition image into a loopback device and fsck it:

    sudo losetup -f .../efs.img
    sudo fsck -f /dev/loop0

    With a bit of luck, the partition won’t be a complete write-off and you’ll be able to salvage the contents of the files, if not the exact filesystem structure.

    Incidentally, if the filesystem was completely stuffed, you could get someone else’s EFS partition and change the IMEI and MAC addresses and you’d probably be golden, but that would quite possibly be illegal or something, so don’t do that.

  4. Now comes the fun part – putting the filesystem back together. After fscking, mount the image somewhere on your computer:

    sudo mount /dev/loop0 /mnt

    In my case, I had about a dozen files living in lost+found, and I figured that wasn’t a positive outcome. I did try, just in case, writing the fsck’d filesystem image back to the phone, in the hope that it just needed to mount the filesystem to boot, but no dice.

    Instead, I had to find out where these lost soul^Wfiles were supposed to live. Luckily, a colleague of mine also has an S3 (the ever-so-slightly-different GT-I9300T), and he was kind enough to let me take a copy of his EFS partition, and use that as a file location template. Using a combination of file sizes, permissions/ownerships, and inode numbers (I knew the -i option to ls would come in handy someday!), I was able to put all the lost files back where they should be.

  5. Unmount all those EFS filesystems, losetup -d /dev/loop0, and put the fixed up EFS partition image back onto your SD card for the return trip to the phone.

  6. Now, with a filesystem image that looks reasonable, it’s time to write it back onto the phone and see what happens. Copy it onto the SD card, boot up into recovery again, get a shell, and a bit more dd:

    dd if=/external_sd/efs.img of=/dev/block/mmcblk0p3
  7. With a bit of luck, your phone may just boot back up now. In my case, I’d done so many other things to my phone trying to get it back up and running (including flashing custom ROMs and what have you) that I needed to flash Cyanogen, boot it, and wait at the boot screen for about 15 minutes (I shit you not, 15 minutes of “Gah is my phone going to work?!?”) before it came up and lo! I had a working phone again. And about 27 SMSes. Sigh, back to work…
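The file-matching in step 4 can be sketched in a few lines of Ruby. This is not the procedure I actually followed (that was ls -i and eyeballs); it’s a rough sketch that assumes you have the fsck’d image and a donor EFS both mounted somewhere. The helper names (fingerprint, candidates) and the script name are hypothetical. It pairs each lost+found orphan with the donor files that share its size, permissions and ownership, and lists any donor file with the same inode number first.

```ruby
require "find"

# Attributes that survive fsck even when the file's name does not.
MATCH_KEYS = [:size, :mode, :uid, :gid].freeze

# Describe every regular file under +root+ (helper name is hypothetical).
def fingerprint(root)
  Find.find(root).each_with_object([]) do |path, files|
    st = File.lstat(path)
    next unless st.file?
    files << { path: path, size: st.size, mode: st.mode & 0o7777,
               uid: st.uid, gid: st.gid, inode: st.ino }
  end
end

# For each orphan, list the template files with identical attributes.
# Candidates sharing the orphan's inode number are listed first.
def candidates(orphans, template)
  orphans.each_with_object({}) do |orphan, result|
    matches = template.select { |t| MATCH_KEYS.all? { |k| t[k] == orphan[k] } }
    matches = matches.sort_by { |t| t[:inode] == orphan[:inode] ? 0 : 1 }
    result[orphan[:path]] = matches.map { |t| t[:path] }
  end
end

# Usage: ruby match_efs.rb <lost+found dir> <mounted donor EFS dir>
if ARGV.length == 2
  lost_dir, donor_dir = ARGV
  candidates(fingerprint(lost_dir), fingerprint(donor_dir)).each do |orphan, paths|
    puts "#{orphan} -> #{paths.empty? ? '(no match)' : paths.join(', ')}"
  end
end
```

Treat the output as a list of suggestions only: an orphan with several candidates (or none) still needs a human to decide where it goes.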

So, yeah, neener-neener to the collected wisdom of the ‘tubes. I fixed my EFS partition, and in the great, grand scheme of things, it wasn’t even all that difficult. For any phone which (a) allows you to flash a custom recovery and (b) you can find another of the same model to play with, EFS corruption doesn’t necessarily mean a fight with tech support.

Incidentally, if you happen to have an S3 exhibiting this problem, but you’re not comfortable fiddling with it, I’m happy to put your EFS back together again if you pay shipping both ways. It’s about a 5 minute job now I know how to do it. E-mail me.

Truly, nothing is safe

Posted: Thu, 19 December 2013 | permalink | 1 Comment

Quoted from a recent Debian Security Advisory:

Genkin, Shamir and Tromer discovered that RSA key material could be extracted by using the sound generated by the computer during the decryption of some chosen ciphertexts.

Side channel attacks are the ones that terrify me the most. You can cryptanalyse the algorithm and audit the implementation as much as you like, and then still disclose key material because your computer makes noise.

So you think your test suite is comprehensive?

Posted: Sun, 15 December 2013 | permalink | No comments

Compare and contrast your practices with those of the SQLite development team, who go so far as to run every test with versions of malloc(3) and I/O syscalls which fail, as well as special VFS layers which reorder and drop writes.

I think this sentence sums it all up:

By comparison, the project has 1084 times as much test code and test scripts – 91452.5 KSLOC.

One thousand times as much test code as production code. As Q3A says, “Impressive”.
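For a taste of what “I/O which fails” testing looks like, here’s a toy sketch in Ruby. It has nothing to do with SQLite’s actual harness: FailingWriter and save_records are names invented for this example. The idea is to run the same code once for every possible point of failure, and check that it never claims success without having finished the job.

```ruby
# An IO stand-in that raises on the Nth write (hypothetical test helper).
class FailingWriter
  attr_reader :written

  def initialize(fail_on)
    @fail_on = fail_on
    @calls = 0
    @written = []
  end

  def write(data)
    @calls += 1
    raise IOError, "injected failure on write number #{@calls}" if @calls == @fail_on
    @written << data
    data.length
  end
end

# Code under test: returns true only if everything was written.
def save_records(io, records)
  io.write("BEGIN\n")
  records.each { |r| io.write("#{r}\n") }
  io.write("END\n")
  true
rescue IOError
  false
end

records = %w[a b c]
total_writes = records.size + 2 # header + records + footer

# Fail at every write in turn, plus one run where nothing fails.
results = (1..total_writes + 1).map do |n|
  writer = FailingWriter.new(n)
  ok = save_records(writer, records)
  complete = writer.written.last == "END\n"
  [n, ok, complete]
end

# Invariant: success is reported if and only if the output is complete.
puts results.all? { |_, ok, complete| ok == complete }
```

Six runs for five writes is cheap here; the impressive part of the SQLite approach is doing the equivalent across an entire test suite.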

The easy bit of software development

Posted: Thu, 5 December 2013 | permalink | No comments

I’m sure this isn’t an original thought of mine, but it just popped into my head and I think it’s something of a “fundamental truth” that all software developers need to keep in mind:

Writing software is easy. The hard part is writing software that works.

All too often, we get so caught up in the rush of building something that we forget that it has to work – and, all too often, we fail in some fundamental fashion, whether it’s “doesn’t satisfy the user’s needs” or “you just broke my $FEATURE!” (which is the context I was thinking of).

The Shoe is on the Other Foot

Posted: Tue, 26 November 2013 | permalink | No comments

I suppose nobody at Microsoft remembers what it was like when everyone was taking potshots at them, so they’ve decided it’s fair game to take a cheap shot at the new Evil Empire. That being said, I wouldn’t say no to one of those “Keep Calm” coffee mugs…

Timezones are not optional information

Posted: Mon, 11 November 2013 | permalink | 9 Comments

In high school, I had a science teacher who would mark your answers wrong if you forgot to include units. “What does that mean?”, he would write, “17.2 elephants?” The point he was trying to get across was that a bare number, without the relevant units, wasn’t precise enough to be useful. Also, carrying the units with you helped to cross-check your work – if you got a numeric answer, but the units were garbage (seconds per kilogram, for instance, when you were trying to find an acceleration), then you could be pretty sure you’d made a mistake somewhere.

Fast forward to today, and I’m currently working with a database containing a pile of timestamps. Without timezones. Where the timestamps being inserted are in local time. Thankfully, so far, this particular system has always been run in one timezone, so I’ve only got one timezone to deal with, but the potential ramifications of systems with different timezones inserting data into this database terrify me.

The naive answer is to just store everything in UTC and be done with it. I’m not particularly averse to that solution, as long as it’s very clear to everyone what’s going on. The correct answer, though, I think, is to always keep timezone information with your timestamps – otherwise, you’ll never know whether it’s 0830 in elephants…
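To put some numbers on the elephants, here’s a minimal Ruby sketch. The two offsets are picked purely for illustration; the point is that the same wall-clock digits name different instants depending on the zone, and that carrying the offset with the value makes it unambiguous.

```ruby
require "time" # adds Time#iso8601 and Time.iso8601

# A bare "2013-11-11 08:30:00" is the "17.2 elephants" of timestamps.
# Here it is interpreted at two illustrative UTC offsets.
at_plus_11 = Time.new(2013, 11, 11, 8, 30, 0, "+11:00")
at_utc     = Time.new(2013, 11, 11, 8, 30, 0, "+00:00")

# Same digits on the clock, but different instants.
gap_hours = (at_utc - at_plus_11) / 3600
puts gap_hours

# Keeping the offset with the value makes it round-trippable.
stamp = at_plus_11.iso8601
puts stamp

restored = Time.iso8601(stamp)
puts restored == at_plus_11

# Converting to UTC for storage is then a lossless, explicit step.
puts restored.getutc.iso8601
```

If you do go the store-everything-in-UTC route, the conversion on the last line is the bit that has to happen before the value goes anywhere near the database, while you still know the offset.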

How to deal with the "package { gcc: }" problem in Puppet

Posted: Wed, 30 October 2013 | permalink | 4 Comments

In a comment to yesterday’s post on why your Puppet module sucks, Warren asks what can be done about the problem of multiple modules needing to include the same package (gcc being the most common example I’ve come across so far).

As I stated previously, there is no sane, standardised way for multiple independent modules to cooperate to ensure that certain packages are installed for their cooperative use. If it wasn’t already obvious, I consider this to be a significant failing in Puppet, and one which essentially renders any attempt at public, wide-scale reuse of Puppet modules impossible. Software packages are such a fundamental part of system management today that it is rare for a module not to want to interact with them in some way.

Without strong leadership from Puppet Labs, or someone core to the project who’s willing to be very noisy and obnoxious on the subject, the issue is fundamentally unsolvable, because it requires either deep (and non-backwards-compatible) changes to how Puppet works at a core level, or it needs cooperation from everyone who writes Puppet modules for public consumption (to use a single, coordinated approach to common packages).

The approaches that I’ve seen people advocate using, or that I’ve considered myself, roughly fall into the following camps.

Put your package into a class in your module

Absofrickinglutely useless, because the elementary package resources you end up creating will still conflict with other modules’ package resources. Anyone who suggests this can be permanently ignored, as they don’t understand the problem being solved.

Use a “globally common” module of classes representing common packages

This could work, if you could get everyone who writes modules to use it (see “someone willing to be very noisy and obnoxious”). Practically speaking, anyone who suggests this doesn’t understand human nature. I don’t see this actually happening any time soon.

Use the defined() function everywhere

By wrapping all your package resources in if !defined(Package["foo"]) blocks, you can at least stop your module from causing explosions. What it doesn’t do is make sure that the various definitions of the package resource are congruent (imagine if one was package { "gcc": ensure => absent }…). In order to safely avoid explosion, everyone would have to follow this approach, which (in the worst case) reduces the problem to the “globally common” module of classes.

However, it’s the least-worst approach I can practically consider. At least your module won’t be the cause of explosions, and realistically no module’s going to ask for gcc to be removed (or, if they are, you’ll quickly find them and kill them).
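To make the congruence problem concrete, here’s a toy model in Ruby. To be very clear, this is not how Puppet is implemented; ToyCatalog is invented for this example and only mimics two behaviours described above: declaring the same resource twice is an error, and a defined()-style guard skips the second declaration without comparing parameters.

```ruby
# Toy stand-in for a catalog (hypothetical, not Puppet's real code).
class ToyCatalog
  class DuplicateResource < StandardError; end

  def initialize
    @resources = {}
  end

  def declared?(type, title)
    @resources.key?([type, title])
  end

  # Declaring the same type/title twice blows up, guard or no guard.
  def declare(type, title, params = {})
    if declared?(type, title)
      raise DuplicateResource, "#{type}[#{title}] is already declared"
    end
    @resources[[type, title]] = params
  end

  def params(type, title)
    @resources.fetch([type, title])
  end
end

catalog = ToyCatalog.new

# Module A guards its declaration, in the spirit of !defined(...).
unless catalog.declared?(:package, "gcc")
  catalog.declare(:package, "gcc", { "ensure" => "absent" })
end

# Module B does the same, but wants the opposite outcome.
unless catalog.declared?(:package, "gcc")
  catalog.declare(:package, "gcc", { "ensure" => "present" })
end

# No explosion, but module B silently lost: first declaration wins.
puts catalog.params(:package, "gcc").inspect
```

The guard buys you a lack of explosions and nothing more; whichever module happens to be evaluated first decides what actually happens to gcc.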

I’ve seen calls from Puppet core dev team members to deprecate and remove the defined() function entirely. Given the complete lack of alternatives, this is a disturbing illustration of just how out of touch they are with the unfortunate realities of practical Puppetry.

Use virtual resources

This is yet another variant of the “common class” technique that would require everyone, everywhere, to dance to the same tune. It has all the same problems, and hence gets a big thumbs-down from me.

Use singleton_packages

This module is a valiant attempt to work around the problem, but practically speaking is no better than using defined() – because again, if a module that isn’t using singleton_packages specifies package { "gcc": ensure => absent }, you’re going to end up with assplosion. The mandatory dependency on hiera doesn’t win me over, either (I’m a hiera-skeptic, for reasons which I might go into some other time).

Use only modules from a single source

This is the solution that I use myself, if that’s any recommendation…

As a result of pretty much every publicly available Puppet module sucking for one reason or another, I do not, currently, have any externally-written Puppet modules in the trees that I use. Every module (158, at current count) has been written by someone in the sysadmin team at $DAYJOB, to our common standards. This means that I can coordinate use of package resources, using our module of common classes where appropriate, or refactor modules to separate concerns appropriately.

If you’re wondering why we don’t have all of these 158 modules available publicly, well, we have a few of them up, but yes, the vast majority of them aren’t publicly available. Some of them suck mightily, while many others are just too intertwined with other modules to be able to use on their own, and we don’t want to release crap code – there’s far too much of that out there already.

Why Your Puppet Module Sucks

Posted: Tue, 29 October 2013 | permalink | 6 Comments

I use Puppet a lot at work, and I use a lot of modules written by other people, as well as writing quite a number of my own. Here’s a brief list of reasons why I might say that your module sucks.

1. You use global variables

This would have to be the most common idiom that just makes my teeth grind. Defined types exist for a bloody reason. Global variables make it incredibly difficult to reason about what is going to happen when I use a particular global variable (see also “You don’t write documentation”), and I get no feedback if I typo a global variable name. Add in a healthy heaping of “lack of namespacing”, and you’ve basically guaranteed that your module will be loudly cursed.

2. You use parameterised classes

I’ve never managed to work out why parameterised classes even exist. They’ve got all the problems of regular classes, as well as all the problems of types. There was a fantastic opportunity with parameterised classes to fix some of the really annoying things about regular resources, such as doing conflict checking on parameters and only barfing if there was a conflict… but no, if you define the same class twice, even with identical parameters, Puppet will smack you on the hand and take away your biscuit. FFFFFFUUUUUUUUUUUU-

3. You fail at using fail()

Modules are supposed to be reusable. That’s their whole reason for existence. One of the benefits of Puppet is that you can provide an OS-independent interface for configuring something. However, unless you’re some absolute God of cross-platform compatibility, you will only be writing your module to support those OSes or distros that you, personally, care about.

That’s cool – if you don’t know anything about OS X, you probably shouldn’t be guessing at how to support something on that platform anyway. However, when you do have some sort of platform-specific code, for the love of pete, have a default or else clause that calls fail() with a useful and descriptive error message, to indicate clearly that you haven’t included support for whatever environment the module’s being used in. Failing to do this can cause some really spectacular explosions, because the rest of your module assumes that certain things have been done in the platform-specific code, and when it hasn’t… hoo boy.
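The shape of the thing, sketched in Ruby rather than Puppet’s own language (in a manifest, the else branch would be the one calling fail()). The function name and the module name in the message are made up for illustration; the osfamily values and package names are the usual Debian and Red Hat ones.

```ruby
# Map a platform family to the package that provides a web server.
# (Hypothetical helper; "mymodule" is a placeholder module name.)
def web_server_package(osfamily)
  case osfamily
  when "Debian" then "apache2"
  when "RedHat" then "httpd"
  else
    # Fail loudly and descriptively instead of carrying on half-configured.
    raise ArgumentError,
          "mymodule does not support osfamily #{osfamily.inspect} " \
          "(supported: Debian, RedHat)"
  end
end

puts web_server_package("Debian")

begin
  web_server_package("Darwin")
rescue ArgumentError => e
  puts e.message
end
```

The error message names the module, the value it choked on, and what it does support, which is about the minimum needed for the poor sod on the receiving end to work out what went wrong.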

4. You don’t write documentation

Yes, it isn’t easy to write good documentation. I know that. But without documentation, your module is going to be practically unusable. So if you don’t have docs, your module is basically useless. Well done, you.

5. You have undeclared dependencies

This also includes declaring your dependencies in a non-machine-parseable manner; I’m not going to grovel through your README for the list of other modules I might need; I have machines to do that kind of thing for me.

If I try to use your module, and it craps out on trying to reference some sort of type or class that doesn’t appear at all related to your module, I will call into question your ancestry, and die a little more inside.

6. You use common packages without even trying to avoid the pitfalls

OK, to be fair this is, ultimately, Puppet’s grand fuckup, not yours, but you at least need to pretend to care…

There is no sane, standardised way in Puppet for multiple modules to install the same package. Let’s say that a module I write needs to have a compiler. So I package { "gcc": }. Then someone else’s module also wants a compiler, so it also package { "gcc": }. FWAKOOM! says Puppet. “What the fuck?” says the poor sysadmin, who just wanted both a virtualenv and an RVM environment on the one machine.

Basically, using packages is going to make your module suck. Does that mean that this makes wide distribution of a large repository of modules written by different people nigh-on impossible? Yes. Fantastic.

RACK_ENV: It's not *for* you

Posted: Sun, 13 October 2013 | permalink | 1 Comment

(With apologies to Penny Arcade)

I’m probably the last person on earth to realise this, but the RACK_ENV environment variable (and the -E option to rackup) isn’t intended for consumption by anything other than Rack itself. If you want to indicate what sort of environment your application should run in, you’re going to have to do it via some other means.

Why is this? Because the interpretation that Rack applies to the value of RACK_ENV that you set makes no sense whatsoever outside of Rack. Valid options are “development”, “deployment”, and “none”. If you follow the usual Rails convention of naming your environments “development”, “test”, and “production” (and maybe “staging” if you’re feeling adventurous), then in any environment other than “development”, you’re not going to be telling Rack anything it understands.

As I said, I may be the last person on earth to have worked this out, but I doubt it. There are plenty of bug reports and “WTF?” blog posts and Stack Overflow questions that appear to stem from people misunderstanding the purpose of RACK_ENV. Sadly, the Rack documentation is very quiet on the whole topic, and the only place that mentions how the environment is interpreted is in the comments for – and that doesn’t tie that environment to the -E option or RACK_ENV environment variable.

At any rate, the takeaway is simple: unless you want Rack to pre-configure a bundle of middleware for you, RACK_ENV or rackup -E is not the configuration variable you’re looking for. Use something else to tell your app how it’s supposed to work.
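One way to do the “something else” is sketched below. MYAPP_ENV is a name I’ve made up for this example, as is the app_environment helper; neither is anything Rack knows about, which is rather the point. The sketch reads an application-specific variable and refuses to start with a value it doesn’t recognise.

```ruby
# Environments this (hypothetical) application understands.
KNOWN_ENVIRONMENTS = %w[development test staging production].freeze

# Read the app's own environment setting, leaving RACK_ENV to Rack.
# Accepts any hash-like object so it can be exercised without touching ENV.
def app_environment(env = ENV)
  name = env.fetch("MYAPP_ENV", "development")
  unless KNOWN_ENVIRONMENTS.include?(name)
    raise ArgumentError,
          "unknown MYAPP_ENV #{name.inspect} " \
          "(expected one of: #{KNOWN_ENVIRONMENTS.join(', ')})"
  end
  name
end

puts app_environment({ "MYAPP_ENV" => "production" })
puts app_environment({}) # falls back to the default
```

Defaulting to “development” when the variable is unset is a design choice for the sketch; you might reasonably prefer to blow up instead, so that production never starts by accident in the wrong mode.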