How I Tripped Over the Debian Weak Keys Vulnerability

Posted: Tue, 9 April 2024 | permalink | 4 Comments

Those of you who haven’t been in IT for far, far too long might not know that next month will be the 16th(!) anniversary of the disclosure of what was, at the time, a fairly earth-shattering revelation: that for about 18 months, the Debian OpenSSL package was generating entirely predictable private keys.

The recent xz-stential threat (thanks to @nixCraft for making me aware of that one), has got me thinking about my own serendipitous interaction with a major vulnerability. Given that the statute of limitations has (probably) run out, I thought I’d share it as a tale of how “huh, that’s weird” can be a powerful threat-hunting tool – but only if you’ve got the time to keep pulling at the thread.

Prelude to an Adventure

Our story begins back in March 2008. I was working at Engine Yard (EY), a now largely-forgotten Rails-focused hosting company, which pioneered several advances in Rails application deployment. Probably EY’s greatest claim to lasting fame is that they helped launch a little code hosting platform you might have heard of, by providing them free infrastructure when they were little more than a glimmer in the Internet’s eye.

I am, of course, talking about everyone’s favourite Microsoft product: GitHub.

Since GitHub was in the right place, at the right time, with a compelling product offering, they quickly started to gain traction, and grow their userbase. With growth comes challenges, amongst them the one we’re focusing on today: SSH login times. Then, as now, GitHub provided SSH access to the git repos they hosted, by SSHing to git@github.com with publickey authentication. They were using the standard way that everyone manages SSH keys: the ~/.ssh/authorized_keys file, and that became a problem as the number of keys started to grow.

The way that SSH uses this file is that, when a user connects and asks for publickey authentication, SSH opens the ~/.ssh/authorized_keys file and scans all of the keys listed in it, looking for a key which matches the key that the user presented. This linear search is normally not a huge problem, because nobody in their right mind puts more than a few keys in their ~/.ssh/authorized_keys, right?

2008-era GitHub giving monkey puppet side-eye to the idea that nobody stores many keys in an authorized_keys file

Of course, as a popular, rapidly-growing service, GitHub was gaining users at a fair clip, to the point that the one big file that stored all the SSH keys was starting to visibly impact SSH login times. This problem was also not going to get any better by itself. Something Had To Be Done.

EY management was keen on making sure GitHub ran well, and so despite it not really being a hosting problem, they were willing to help fix this problem. For some reason, the late, great, Ezra Zygmuntowitz pointed GitHub in my direction, and let me take the time to really get into the problem with the GitHub team. After examining a variety of different possible solutions, we came to the conclusion that the least-worst option was to patch OpenSSH to lookup keys in a MySQL database, indexed on the key fingerprint.

We didn’t take this decision on a whim – it wasn’t a case of “yeah, sure, let’s just hack around with OpenSSH, what could possibly go wrong?”. We knew it was potentially catastrophic if things went sideways, so you can imagine how much worse the other options available were. Ensuring that this wouldn’t compromise security was a lot of the effort that went into the change. In the end, though, we rolled it out in early April, and lo! SSH logins were fast, and we were pretty sure we wouldn’t have to worry about this problem for a long time to come.

Normally, you’d think “patching OpenSSH to make mass SSH logins super fast” would be a good story on its own. But no, this is just the opening scene.

Chekov’s Gun Makes its Appearance

Fast forward a little under a month, to the first few days of May 2008. I get a message from one of the GitHub team, saying that somehow users were able to access other users’ repos over SSH. Naturally, as we’d recently rolled out the OpenSSH patch, which touched this very thing, the code I’d written was suspect number one, so I was called in to help.

The lineup scene from the movie The Usual Suspects — They're called The Usual Suspects for a reason, but sometimes, it really is Keyser Söze

Eventually, after more than a little debugging, we discovered that, somehow, there were two users with keys that had the same key fingerprint. This absolutely shouldn’t happen – it’s a bit like winning the lottery twice in a row¹ – unless the users had somehow shared their keys with each other, of course. Still, it was worth investigating, just in case it was a web application bug, so the GitHub team reached out to the users impacted, to try and figure out what was going on.

The users professed no knowledge of each other, neither admitted to publicising their key, and couldn’t offer any explanation as to how the other person could possibly have gotten their key.

Then things went from “weird” to “what the…?”. Because another pair of users showed up, sharing a key fingerprint – but it was a different shared key fingerprint. The odds now have gone from “winning the lottery multiple times in a row” to as close to “this literally cannot happen” as makes no difference.

Milhouse from The Simpsons says that We're Through The Looking Glass Here, People

Once we were really, really confident that the OpenSSH patch wasn’t the cause of the problem, my involvement in the problem basically ended. I wasn’t a GitHub employee, and EY had plenty of other customers who needed my help, so I wasn’t able to stay deeply involved in the on-going investigation of The Mystery of the Duplicate Keys.

However, the GitHub team did keep talking to the users involved, and managed to determine the only apparent common factor was that all the users claimed to be using Debian or Ubuntu systems, which was where their SSH keys would have been generated.

That was as far as the investigation had really gotten, when along came May 13, 2008.

Chekov’s Gun Goes Off

With the publication of DSA-1571-1, everything suddenly became clear. Through a well-meaning but ultimately disasterous cleanup of OpenSSL’s randomness generation code, the Debian maintainer had inadvertently reduced the number of possible keys that could be generated by a given user from “bazillions” to a little over 32,000. With so many people signing up to GitHub – some of them no doubt following best practice and freshly generating a separate key – it’s unsurprising that some collisions occurred.

You can imagine the sense of “oooooooh, so that’s what’s going on!” that rippled out once the issue was understood. I was mostly glad that we had conclusive evidence that my OpenSSH patch wasn’t at fault, little knowing how much more contact I was to have with Debian weak keys in the future, running a huge store of known-compromised keys and using them to find misbehaving Certificate Authorities, amongst other things.

Lessons Learned

While I’ve not found a description of exactly when and how Luciano Bello discovered the vulnerability that became CVE-2008-0166, I presume he first came across it some time before it was disclosed – likely before GitHub tripped over it. The stable Debian release that included the vulnerable code had been released a year earlier, so there was plenty of time for Luciano to have discovered key collisions and go “hmm, I wonder what’s going on here?”, then keep digging until the solution presented itself.

The thought “hmm, that’s odd”, followed by intense investigation, leading to the discovery of a major flaw is also what ultimately brought down the recent XZ backdoor. The critical part of that sequence is the ability to do that intense investigation, though.

When I reflect on my brush with the Debian weak keys vulnerability, what sticks out to me is the fact that I didn’t do the deep investigation. I wonder if Luciano hadn’t found it, how long it might have been before it was found. The GitHub team would have continued investigating, presumably, and perhaps they (or I) would have eventually dug deep enough to find it. But we were all super busy – myself, working support tickets at EY, and GitHub feverishly building features and fighting the fires in their rapidly-growing service.

As it was, Luciano was able to take the time to dig in and find out what was happening, but just like the XZ backdoor, I feel like we, as an industry, got a bit lucky that someone with the skills, time, and energy was on hand at the right time to make a huge difference.

It’s a luxury to be able to take the time to really dig into a problem, and it’s a luxury that most of us rarely have. Perhaps an understated takeaway is that somehow we all need to wrestle back some time to follow our hunches and really dig into the things that make us go “hmm…”.

Support My Hunches

If you’d like to help me be able to do intense investigations of mysterious software phenomena, you can shout me a refreshing beverage on ko-fi.

the odds are actually probably more like winning the lottery about twenty times in a row. The numbers involved are staggeringly huge, so it’s easiest to just approximate it as “really, really unlikely”. ↩

4 Comments

From: Sniffnoy
2024-04-09 18:16

Hey, the anchor link in this post has a problem – for some reason it’s sending me to “https://www.hezmatt.org/~mpalmer/blog/#fn:1” instead of just linking to the anchor on whatever page I’m on.

From: Matt Palmer
2024-04-09 18:59

Hi Sniffnoy, thanks for letting me know. Looks like I messed up the tag at some point, and that was breaking relative links like the footnote link you noted. Hopefully it’s looking better now.

From: Vitaly Potyarkin
2024-04-10 20:05

Thanks for the writeup!

It has scared me into setting up a regular search for duplicates among ssh-keygen outputs: https://github.com/sio/weak-ssh-keygen (compute resources courtesy of GitHub)

From: david
2024-04-12 19:18

It’s interresting. That’s exactly how I came to issue my very first CVE on spring libarry for a DOS. A small connection leak in one of our software that we just couldn’t understand. Not having time to investigate, we restarted it on a cron basis. One year later, when it became clear restart were more and more frequent, we invested time on analysis. A 2 days deep analysis of logs showed where leak came from and how to reproduce. It quickly became clear that anyone could crash DOS our service in a few seconds without requiring authentication or a distributed attack with a few lines of javascript.

Comments on this post are closed.

Brane Dump