Your Release Process Sucks

Posted: Sat, 23 November 2024 | permalink | No comments

For the past decade-plus, every piece of software I write has had one of two release processes.

Software that gets deployed directly onto servers (websites, mostly, but also the infrastructure that runs Pwnedkeys, for example) is deployed with nothing more than git push prod main. I’ll talk more about that some other day.

Today is about the release process for everything else I maintain – Rust / Ruby libraries, standalone programs, and so forth. To release those, I use the following, extremely intricate process:

Create an annotated git tag, where the name of the tag is the software version I’m releasing, and the annotation is the release notes for that version.
Run git release in the repository.
There is no step 3.

Yes, it absolutely is that simple. And if your release process is any more complicated than that, then you are suffering unnecessarily.

But don’t worry. I’m from the Internet, and I’m here to help.

The annotated tag is one git’s best-kept secrets. They’ve been available in git for practically forever (I’ve been using them since at least 2014, which is “practically forever” in software development), yet almost everyone I mention them to has never heard of them.

A “tag”, in git parlance, is a repository-unique named label that points to a single commit (as identified by the commit’s SHA1 hash). Annotating a tag is simply associating a block of free-form text with that tag.

Creating an annotated tag is simple-sauce: git tag -a tagname will open up an editor window where you can enter your annotation, and git tag -a -m "some annotation" tagname will create the tag with the annotation “some annotation”. Retrieving the annotation for a tag is straightforward, too: git show tagname will display the annotation along with all the other tag-related information.

Now that we know all about annotated tags, let’s talk about how to use them to make software releases freaking awesome.

Step 1: Create the Annotated Git Tag

As I just mentioned, creating an annotated git tag is pretty simple: just add a -a (or --annotate, if you enjoy typing) to your git tag command, and WHAM! annotation achieved.

Releases, though, typically have unique and ever-increasing version numbers, which we want to encode in the tag name. Rather than having to look at the existing tags and figure out the next version number ourselves, we can have software do the hard work for us.

Enter: git-version-bump. This straightforward program takes one mandatory argument: major, minor, or patch, and bumps the corresponding version number component in line with Semantic Versioning principles. If you pass it -n, it opens an editor for you to enter the release notes, and when you save out, the tag is automagically created with the appropriate name.

Because the program is called git-version-bump, you can call it as a git command: git version-bump. Also, because version-bump is long and unwieldy, I have it aliased to vb, with the following entry in my ~/.gitconfig:

[alias]
    vb = version-bump -n

Of course, you don’t have to use git-version-bump if you don’t want to (although why wouldn’t you?). The important thing is that the only step you take to go from “here is our current codebase in main” to “everything as of this commit is version X.Y.Z of this software”, is the creation of an annotated tag that records the version number being released, and the metadata that goes along with that release.

Step 2: Run `git release`

As I said earlier, I’ve been using this release process for over a decade now. So long, in fact, that when I started, GitHub Actions didn’t exist, and so a lot of the things you’d delegate to a CI runner these days had to be done locally, or in a more ad-hoc manner on a server somewhere.

This is why step 2 in the release process is “run git release”. It’s because historically, you can’t do everything in a CI run. Nowadays, most of my repositories have this in the .git/config:

[alias]
    release = push --tags

Older repositories which, for one reason or another, haven’t been updated to the new hawtness, have various other aliases defined, which run more specialised scripts (usually just rake release, for Ruby libraries), but they’re slowly dying out.

The reason why I still have this alias, though, is that it standardises the release process. Whether it’s a Ruby gem, a Rust crate, a bunch of protobuf definitions, or whatever else, I run the same command to trigger a release going out. It means I don’t have to think about how I do it for this project, because every project does it exactly the same way.

The Wiring Behind the Button

It wasn’t the button that was the problem. It was the miles of wiring, the hundreds of miles of cables, the circuits, the relays, the machinery. The engine was a massive, sprawling, complex, mind-bending nightmare of levers and dials and buttons and switches. You couldn’t just slap a button on the wall and expect it to work. But there should be a button. A big, fat button that you could press and everything would be fine again. Just press it, and everything would be back to normal.

Red Dwarf: Better Than Life

Once you’ve accepted that your release process should be as simple as creating an annotated tag and running one command, you do need to consider what happens afterwards. These days, with the near-universal availability of CI runners that can do anything you need in an isolated, reproducible environment, the work required to go from “annotated tag” to “release artifacts” can be scripted up and left to do its thing.

What that looks like, of course, will probably vary greatly depending on what you’re releasing. I can’t really give universally-applicable guidance, since I don’t know your situation. All I can do is provide some of my open source work as inspirational examples.

For starters, let’s look at a simple Rust crate I’ve written, called strong-box. It’s a straightforward crate, that provides ergonomic and secure cryptographic functionality inspired by the likes of NaCl. As it’s just a crate, its release script is very straightforward. Most of the complexity is working around Cargo’s inelegant mandate that crate version numbers are specified in a TOML file. Apart from that, it’s just a matter of building and uploading the crate. Easy!

Slightly more complicated is action-validator. This is a Rust CLI tool which validates GitHub Actions and Workflows (how very meta) against a published JSON schema, to make sure you haven’t got any syntax or structural errors. As not everyone has a Rust toolchain on their local box, the release process helpfully build binaries for several common OSes and CPU architectures that people can download if they choose. The release process in this case is somewhat larger, but not particularly complicated. Almost half of it is actually scaffolding to build an experimental WASM/NPM build of the code, because someone seemed rather keen on that.

Moving away from Rust, and stepping up the meta another notch, we can take a look at the release process for git-version-bump itself, my Ruby library and associated CLI tool which started me down the “Just Tag It Already” rabbit hole many years ago. In this case, since gemspecs are very amenable to programmatic definition, the release process is practically trivial. Remove the boilerplate and workarounds for GitHub Actions bugs, and you’re left with about three lines of actual commands.

These approaches can certainly scale to larger, more complicated processes. I’ve recently implemented annotated-tag-based releases in a proprietary software product, that produces Debian/Ubuntu, RedHat, and Windows packages, as well as Docker images, and it takes all of the information it needs from the annotated tag. I’m confident that this approach will successfully serve them as they expand out to build AMIs, GCP machine images, and whatever else they need in their release processes in the future.

Objection, Your Honour!

I can hear the howl of the “but, actuallys” coming over the horizon even as I type. People have a lot of Big Feelings about why this release process won’t work for them. Rather than overload this article with them, I’ve created a companion article that enumerates the objections I’ve come across, and answers them. I’m also available for consulting if you’d like a personalised, professional opinion on your specific circumstances.

DVD Bonus Feature: Pre-releases

Unless you’re addicted to surprises, it’s good to get early feedback about new features and bugfixes before they make it into an official, general-purpose release. For this, you can’t go past the pre-release.

The major blocker to widespread use of pre-releases is that cutting a release is usually a pain in the behind. If you’ve got to edit changelogs, and modify version numbers in a dozen places, then you’re entirely justified in thinking that cutting a pre-release for a customer to test that bugfix that only occurs in their environment is too much of a hassle.

The thing is, once you’ve got releases building from annotated tags, making pre-releases on every push to main becomes practically trivial. This is mostly due to another fantastic and underused Git command: git describe.

How git describe works is, basically, that it finds the most recent commit that has an associated annotated tag, and then generates a string that contains that tag’s name, plus the number of commits between that tag and the current commit, with the current commit’s hash included, as a bonus. That is, imagine that three commits ago, you created an annotated release tag named v4.2.0. If you run git describe now, it will print out v4.2.0-3-g04f5a6f (assuming that the current commit’s SHA starts with 04f5a6f).

You might be starting to see where this is going. With a bit of light massaging (essentially, removing the leading v and replacing the -s with .s), that string can be converted into a version number which, in most sane environments, is considered “newer” than the official 4.2.0 release, but will be superceded by the next actual release (say, 4.2.1 or 4.3.0). If you’re already injecting version numbers into the release build process, injecting a slightly different version number is no work at all.

Then, you can easily build release artifacts for every commit to main, and make them available somewhere they won’t get in the way of the “official” releases. For example, in the proprietary product I mentioned previously, this involves uploading the Debian packages to a separate component (prerelease instead of main), so that users that want to opt-in to the prerelease channel simply modify their sources.list to change main to prerelease. Management have been extremely pleased with the easy availability of pre-release packages; they’ve been gleefully installing them willy-nilly for testing purposes since I rolled them out.

In fact, even while I’ve been writing this article, I was asked to add some debug logging to help track down a particularly pernicious bug. I added the few lines of code, committed, pushed, and went back to writing. A few minutes later (next week’s job is to cut that in-process time by at least half), the person who asked for the extra logging ran apt update; apt upgrade, which installed the newly-built package, and was able to progress in their debugging adventure.

Continuous Delivery: It’s Not Just For Hipsters.

“+1, Informative”

Hopefully, this has spurred you to commit your immortal soul to the Church of the Annotated Tag. You may tithe by buying me a refreshing beverage. Alternately, if you’re really keen to adopt more streamlined release management processes, I’m available for consulting engagements.

Invalid Excuses for Why Your Release Process Sucks

Posted: Sat, 23 November 2024 | permalink | No comments

In my companion article, I made the bold claim that your release process should consist of no more than two steps:

Create an annotated Git tag;
Run a single command to trigger the release pipeline.

As I have been on the Internet for more than five minutes, I’m aware that a great many people will have a great many objections to this simple and straightforward idea. In the interests of saving them a lot of wear and tear on their keyboards, I present this list of common reasons why these objections are invalid.

If you have an objection I don’t cover here, the comment box is down the bottom of the article. If you think you’ve got a real stumper, I’m available for consulting engagements, and if you turn out to have a release process which cannot feasibly be reduced to the above two steps for legitimate technical reasons, I’ll waive my fees.

“But I automatically generate my release notes from commit messages!”

This one is really easy to solve: have the release note generation tool feed directly into the annotation. Boom! Headshot.

“But all these files need to be edited to make a release!”

No, they absolutely don’t. But I can see why you might think you do, given how inflexible some packaging environments can seem, and since “that’s how we’ve always done it”.

Language Packages

Most languages require you to encode the version of the library or binary in a file that you want to revision control. This is teh suck, but I’m yet to encounter a situation that can’t be worked around some way or another.

In Ruby, for instance, gemspec files are actually executable Ruby code, so I call code (that’s part of git-version-bump, as an aside) to calculate the version number from the git tags. The Rust build tool, Cargo, uses a TOML file, which isn’t as easy, but a small amount of release automation is used to take care of that.

Distribution Packages

If you’re building Linux distribution packages, you can easily apply similar automation faffery. For example, Debian packages take their metadata from the debian/changelog file in the build directory. Don’t keep that file in revision control, though: build it at release time. Everything you need to construct a Debian (or RPM) changelog is in the tag – version numbers, dates, times, authors, release notes. Use it for much good.

The Dreaded Changelog

Finally, there’s the CHANGELOG file. If it’s maintained during the development process, it typically has an archive of all the release notes, under version numbers, with an “Unreleased” heading at the top. It’s one more place to remember to have to edit when making that “preparing release X.Y.Z” commit, and it is a gift to the Demon of Spurious Merge Conflicts if you follow the policy of “every commit must add a changelog entry”.

My solution: just burn it to the ground. Add a line to the top with a link to wherever the contents of annotated tags get published (such as GitHub Releases, if that’s your bag) and never open it ever again.

“But I need to know other things about my release, too!”

For some reason, you might think you need some other metadata about your releases. You’re probably wrong – it’s amazing how much information you can obtain or derive from the humble tag – so think creatively about your situation before you start making unnecessary complexity for yourself.

But, on the off chance you’re in a situation that legitimately needs some extra release-related information, here’s the secret: structured annotation. The annotation on a tag can be literally any sequence of octets you like. How that data is interpreted is up to you.

So, require that annotations on release tags use some sort of structured data format (say YAML or TOML – or even XML if you hate your release manager), and mandate that it contain whatever information you need. You can make sure that the annotation has a valid structure and contains all the information you need with an update hook, which can reject the tag push if it doesn’t meet the requirements, and you’re sorted.

“But I have multiple packages in my repo, with different release cadences and versions!”

This one is common enough that I just refer to it as “the monorepo drama”. Personally, I’m not a huge fan of monorepos, but you do you, boo. Annotated tags can still handle it just fine.

The trick is to include the package name being released in the tag name. So rather than a release tag being named vX.Y.Z, you use foo/vX.Y.Z, bar/vX.Y.Z, and baz/vX.Y.Z. The release automation for each package just triggers on tags that match the pattern for that particular package, and limits itself to those tags when figuring out what the version number is.

“But we don’t semver our releases!”

Oh, that’s easy. The tag pattern that marks a release doesn’t have to be vX.Y.Z. It can be anything you want.

Relatedly, there is a (rare, but existent) need for packages that don’t really have a conception of “releases” in the traditional sense. The example I’ve hit most often is automatically generated “bindings” packages, such as protobuf definitions. The source of truth for these is a bunch of .proto files, but to be useful, they need to be packaged into code for the various language(s) you’re using. But those packages need versions, and while someone could manually make releases, the best option is to build new per-language packages automatically every time any of those definitions change.

The versions of those packages, then, can be datestamps (I like something like YYYY.MM.DD.N, where N starts at 0 each day and increments if there are multiple releases in a single day).

This process allows all the code that needs the definitions to declare the minimum version of the definitions that it relies on, and everything is kept in sync and tracked almost like magic.

Th-th-th-th-that’s all, folks!

I hope you’ve enjoyed this bit of mild debunking. Show your gratitude by buying me a refreshing beverage, or purchase my professional expertise and I’ll answer all of your questions and write all your CI jobs.

Health Industry Company Sues to Prevent Certificate Revocation

Posted: Wed, 31 July 2024 | permalink | 2 Comments

It’s not often that a company is willing to make a sworn statement to a court about how its IT practices are incompatible with the needs of the Internet, but when they do… it’s popcorn time.

The Combatants

In the red corner, weighing in at… nah, I’m not going to do that schtick.

The plaintiff in the case is Alegeus Technologies, LLC, a Delaware Corporation that, according to their filings, “is a leading provider of a business-tobusiness, white-label funding and payment platform for healthcare carriers and third-party administrators to administer consumer-directed employee benefit programs”. Not being subject to the US’ bonkers health care system, I have only a passing familiarity with the sorts of things they do, but presumably it involves moving a lot of money around, which is sometimes important.

The defendant is DigiCert, a CA which, based on analysis I’ve done previously, is the second-largest issuer of WebPKI certificates by volume.

The History

According to a recently opened Mozilla CA bug, DigiCert found an issue in their “domain control validation” workflow, that meant it may have been possible for a miscreant to have certificates issued to them that they weren’t legitimately entitled to. Given that validating domain names is basically the “YOU HAD ONE JOB!” of a CA, this is a big deal.

The CA/Browser Forum Baseline Requirements (BRs) (which all CAs are required to adhere to, by virtue of their being included in various browser and OS trust stores), say that revocation is required within 24 hours when “[t]he CA obtains evidence that the validation of domain authorization or control for any Fully‐Qualified Domain Name or IP address in the Certificate should not be relied upon” (section 4.9.1.1, point 5).

DigiCert appears to have at least tried to do the right thing, by opening the above Mozilla bug giving some details of the problem, and notifying their customers that their certificates were going to be revoked. One may quibble about how fast they’re doing it, but they’re giving it a decent shot, at least.

A complicating factor in all this is that, only a touch over a month ago, Google Chrome announced the removal of another CA, Entrust, from its own trust store program, citing “a pattern of compliance failures, unmet improvement commitments, and the absence of tangible, measurable progress in response to publicly disclosed incident reports”. Many of these compliance failures were failures to revoke certificates in a timely manner. One imagines that DigiCert would not like to gain a reputation for tardy revocation, particularly at the moment.

The Legal Action

Now we come to Alegeus Technologies. They’ve opened a civil case whose first action is to request the issuance of a Temporary Restraining Order (TRO) that prevents DigiCert from revoking certificates issued to Alegeus (which the court has issued). This is a big deal, because TROs are legal instruments that, if not obeyed, constitute contempt of court (or something similar) – and courts do not like people who disregard their instructions. That means that, in the short term, those certificates aren’t getting revoked, despite the requirement imposed by root stores on DigiCert that the certificates must be revoked. DigiCert is in a real “rock / hard place” situation here: revoke and get punished by the courts, or don’t revoke and potentially (though almost certainly not, in the circumstances) face removal from trust stores (which would kill, or at least massively hurt, their business).

The reasons that Alegeus gives for requesting the restraining order is that “[t]o Reissue and Reinstall the Security Certificates, Alegeus must work with and coordinate with its Clients, who are required to take steps to rectify the certificates. Alegeus has hundreds of such Clients. Alegeus is generally required by contract to give its clients much longer than 24 hours’ notice before executing such a change regarding certification.”

In the filing, Alegeus does acknowledge that “DigiCert is a voluntary member of the Certification Authority Browser Forum (CABF), which has bylaws stating that certificates with an issue in their domain validation must be revoked within 24 hours.” This is a misstatement of the facts, though. It is the BRs, not the CABF bylaws, that require revocation, and the BRs apply to all CAs that wish to be included in browser and OS trust stores, not just those that are members of the CABF. In any event, given that Alegeus was aware that DigiCert is required to revoke certificates within 24 hours, one wonders why Alegeus went ahead and signed agreements with their customers that required a lengthy notice period before changing certificates.

What complicates the situation is that there is apparently a Master Services Agreement (MSA) that states that it “constitutes the entire agreement between the parties” – and that MSA doesn’t mention certificate revocation anywhere relevant. That means that it’s not quite so cut-and-dried that DigiCert does, in fact, have the right to revoke those certificates. I’d expect a lot of “update to your Master Services Agreement” emails to be going out from DigiCert (and other CAs) in the near future to clarify this point.

Not being a lawyer, I can’t imagine which way this case might go, but there’s one thing we can be sure of: some lawyers are going to able to afford that trip to a tropical paradise this year.

The Security Issues

The requirement for revocation within 24 hours is an important security control in the WebPKI ecosystem. If a certificate is misissued to a malicious party, or is otherwise compromised, it needs to be marked as untrustworthy as soon as possible. While revocation is far from perfect, it is the best tool we have.

In this court filing, Alegeus has claimed that they are unable to switch certificates with less than 24 hours notice (due to “contractual SLAs”). This is a pretty big problem, because there are lots of reasons why a certificate might need to be switched out Very Quickly. As a practical example, someone with access to the private key for your SSL certificate might decide to use it in a blog post. Letting that sort of problem linger for an extended period of time might end up being a Pretty Big Problem of its own. An organisation that cannot respond within hours to a compromised certificate is playing chicken with their security.

The Takeaways

Contractual obligations that require you to notify anyone else of a certificate (or private key) changing are bonkers, and completely antithetical to the needs of the WebPKI. If you have to have them, you’re going to want to start transitioning to a private PKI, wherein you can do whatever you darn well please with revocation (or not). As these sorts of problems keep happening, trust stores (and hence CAs) are going to crack down on this sort of thing, so you may as well move sooner rather than later.

If you are an organisation that uses WebPKI certificates, you’ve got to be able to deal with any kind of certificate revocation event within hours, not days. This basically boils down to automated issuance and lifecycle management, because having someone manually request and install certificates is terrible on many levels. There isn’t currently a completed standard for notifying subscribers if their certificates need premature renewal (say, due to needing to be revoked), but the ACME Renewal Information Extension is currently being developed to fill that need. Ask your CA if they’re tracking this standards development, and when they intend to have the extension available for use. (Pro-tip: if they say “we’ll start doing development when the RFC is published”, run for the hills; that’s not how responsible organisations work on the Internet).

The Givings

If you’ve found this helpful, consider shouting me a refreshing beverage. Reading through legal filings is thirsty work!

Checking for Compromised Private Keys has Never Been Easier

Posted: Fri, 28 June 2024 | permalink | No comments

As regular readers would know, since I never stop banging on about it, I run Pwnedkeys, a service which finds and collates private keys which have been disclosed or are otherwise compromised. Until now, the only way to check if a key is compromised has been to use the Pwnedkeys API, which is not necessarily trivial for everyone.

Starting today, that’s changing.

The next phase of Pwnedkeys is to start offering more user-friendly tools for checking whether keys being used are compromised. These will typically be web-based or command-line tools intended to answer the question “is the key in this (certificate, CSR, authorized_keys file, TLS connection, email, etc) known to Pwnedkeys to have been compromised?”.

Opening the Toolbox

Available right now are the first web-based key checking tools in this arsenal. These tools allow you to:

Check the key in a PEM-format X509 data structure (such as a CSR or certificate);
Check the keys in an authorized_keys file you upload; and
Check the SSH keys used by a user at any one of a number of widely-used code-hosting sites.

Further planned tools include “live” checking of the certificates presented in TLS connections (for HTTPS, etc), SSH host keys, command-line utilities for checking local authorized_keys files, and many other goodies.

If You Are Intrigued By My Ideas…

… and wish to subscribe to my newsletter, now you can!

I’m not going to be blogging every little update to Pwnedkeys, because that would probably get a bit tedious for readers who aren’t as intrigued by compromised keys as I am. Instead, I’ll be posting every little update in the Pwnedkeys newsletter. So, if you want to keep up-to-date with the latest and greatest news and information, subscribe to the newsletter.

Supporting Pwnedkeys

All this work I’m doing on my own time, and I’m paying for the infrastructure from my own pocket. If you’ve got a few dollars to spare, I’d really appreciate it if you bought me a refreshing beverage. It helps keep the lights on here at Pwnedkeys Global HQ.

Information Security: "We Can Do It, We Just Choose Not To"

Posted: Fri, 14 June 2024 | permalink | 2 Comments

Whenever a large corporation disgorges the personal information of millions of people onto the Internet, there is a standard playbook that is followed.

“Security is our top priority”.

“Passwords were hashed”.

“No credit card numbers were disclosed”.

record scratch

Let’s talk about that last one a bit.

A Case Study

This post could have been written any time in the past… well, decade or so, really. But the trigger for my sitting down and writing this post is the recent breach of wallet-finding and criminal-harassment-enablement platform Tile. As reported by Engadget, a statement attributed to Life360 CEO Chris Hulls says

The potentially impacted data consists of information such as names, addresses, email addresses, phone numbers, and Tile device identification numbers.

But don’t worry though; even though your home address is now public information

It does not include more sensitive information, such as credit card numbers

Aaaaaand here is where I get salty.

Why Credit Card Numbers Don’t Matter

Describing credit card numbers as “more sensitive information” is somewhere between disingenuous and a flat-out lie. It was probably included in the statement because it’s part of the standard playbook. Why is it part of the playbook, though?

Not being a disaster comms specialist, I can’t say for sure, but my hunch is that the post-breach playbook includes this line because (a) credit cards are less commonly breached these days (more on that later), and (b) it’s a way to insinuate that “all your financial data is safe, no need to worry” without having to say that (because that statement would absolutely be a lie).

The thing that not nearly enough people realise about credit card numbers is:

The credit card holder is not usually liable for most fraud done via credit card numbers; and
In terms of actual, long-term damage to individuals, credit card fraud barely rates a mention. Identity fraud, Business Email Compromise, extortion, and all manner of other unpleasantness is far more damaging to individuals.

Why Credit Card Numbers Do Matter

Losing credit card numbers in a data breach is a huge deal – but not for the users of the breached platform. Instead, it’s a problem for the company that got breached.

See, going back some years now, there was a wave of huge credit card data breaches. If you’ve been around a while, names like Target and Heartland will bring back some memories.

Because these breaches cost issuing banks and card brands a lot of money, the Payment Card Industry Security Standards Council (PCI-SSC) and the rest of the ecosystem went full goblin mode. Now, if you lose credit card numbers in bulk, it will cost you big. Massive fines for breaches (typically levied by the card brands via the acquiring bank), increased transaction fees, and even the Credit Card Death Penalty (being banned from charging credit cards), are all very big sticks.

Now Comes the Finding Out

In news that should not be surprising, when there are actual consequences for failing to do something, companies take the problem seriously. Which is why “no credit card numbers were disclosed” is such an interesting statement.

Consider why no credit card numbers were disclosed. It’s not that credit card numbers aren’t valuable to criminals – because they are. Instead, it’s because the company took steps to properly secure the credit card data.

Next, you’ll start to consider why, if the credit card numbers were secured, why wasn’t the personal information that did get disclosed similarly secured? Information that is far more damaging to the individuals to whom that information relates than credit card numbers.

The only logical answer is that it wasn’t deemed financially beneficial to the company to secure that data. The consequences of disclosure for that information isn’t felt by the company which was breached. Instead, it’s felt by the individuals who have to spend weeks of their life cleaning up from identity fraud committed against them. It’s felt by the victim of intimate partner violence whose new address is found in a data dump, letting their ex find them again.

Until there are real, actual consequences for the companies which hemorrhage our personal data (preferably ones that have “percentage of global revenue” at the end), data breaches will continue to happen. Not because they’re inevitable – because as credit card numbers show, data can be secured – but because there’s no incentive for companies to prevent our personal data from being handed over to whoever comes along.

Support my Salt

My salty takes are powered by refreshing beverages. If you’d like to see more of the same, buy me one.

GitHub's Missing Tab

Posted: Thu, 30 May 2024 | permalink | 7 Comments

Visit any GitHub project page, and the first thing you see is something that looks like this:

screenshot of the GitHub repository page, showing the Code, Issues, and Pull Requests tabs

“Code”, that’s fairly innocuous, and it’s what we came here for. The “Issues” and “Pull Requests” tabs, with their count of open issues, might give us some sense of “how active” the project is, or perhaps “how maintained”. Useful information for the casual visitor, undoubtedly.

However, there’s another user community that visits this page on the regular, and these same tabs mean something very different to them.

I’m talking about the maintainers (or, more commonly, maintainer, singular). When they see those tabs, all they see is work. The “Code” tab is irrelevant to them – they already have the code, and know it possibly better than they know their significant other(s) (if any). “Issues” and “Pull Requests” are just things that have to be done.

I know for myself, at least, that it is demoralising to look at a repository page and see nothing but work. I’d be surprised if it didn’t contribute in some small way to maintainers just noping the fudge out.

A Modest Proposal

So, here’s my thought. What if instead of the repo tabs looking like the above, they instead looked like this:

modified screenshot of the GitHub repository page, showing a new Kudos tab, with a smiley face icon, between the Code and Issues tabs

My conception of this is that it would, essentially, be a kind of “yearbook”, that people who used and liked the software could scribble their thoughts on. With some fairly straightforward affordances elsewhere to encourage its use, it could be a powerful way to show maintainers that they are, in fact, valued and appreciated.

There are a number of software packages I’ve used recently, that I’d really like to say a general “thanks, this is awesome!” to. However, I’m not about to make the Issues tab look even scarier by creating an “issue” to say thanks, and digging up an email address is often surprisingly difficult, and wouldn’t be a public show of my gratitude, which I believe is a valuable part of the interaction.

You Can’t Pay Your Rent With Kudos

Absolutely you cannot. A means of expressing appreciation in no way replaces the pressing need to figure out a way to allow open source developers to pay their rent. Conversely, however, the need to pay open source developers doesn’t remove the need to also show those people that their work is appreciated and valued by many people around the world.

Anyway, who knows a senior exec at GitHub? I’ve got an idea I’d like to run past them…

"Is This Project Still Maintained?"

Posted: Tue, 14 May 2024 | permalink | 5 Comments

If you wander around a lot of open source repositories on the likes of GitHub, you’ll invariably stumble over repos that have an issue (or more than one!) with a title like the above. Sometimes sitting open and unloved, often with a comment or two from the maintainer and a bunch of “I’ll help out!” followups that never seemed to pan out. Very rarely, you’ll find one that has been closed, with a happy ending.

These issues always fascinate me, because they say a lot about what it means to “maintain” an open source project, the nature of succession (particularly in a post-Jia Tan world), and the expectations of users and the impedence mismatch between maintainers, contributors, and users. I’ve also recently been thinking about pre-empting this sort of issue, and opening my own issue that answers the question before it’s even asked.

Why These Issues Are Created

As both a producer and consumer of open source software, I completely understand the reasons someone might want to know whether a project is abandoned. It’s comforting to be able to believe that there’s someone “on the other end of the line”, and that if you have a problem, you can ask for help with a non-zero chance of someone answering you. There’s also a better chance that, if the maintainer is still interested in the software, that compatibility issues and at least show-stopper bugs might get fixed for you.

But often there’s more at play. There is a delusion that “maintained” open source software comes with entitlements – an expectation that your questions, bug reports, and feature requests will be attended to in some fashion.

This comes about, I think, in part because there are a lot of open source projects that are energetically supported, where generous volunteers do answer questions, fix reported bugs, and implement things that they don’t personally need, but which random Internet strangers ask for. If you’ve had that kind of user experience, it’s not surprising that you might start to expect it from all open source projects.

Of course, these wonders of cooperative collaboration are the exception, rather than the rule. In many (most?) cases, there is little practical difference between most projects that are “maintained” and those that are formally declared “unmaintained”. The contributors (or, most often, contributor – singular) are unlikely to have the time or inclination to respond to your questions in a timely and effective manner. If you find a problem with the software, you’re going to be paddling your own canoe, even if the maintainer swears that they’re still “maintaining” it.

A Thought Appears

With this in mind, I’ve been considering how to get ahead of the problem and answer the question for the software projects I’ve put out in the world. Nothing I’ve built has anything like what you’d call a “community”; most have never seen an external PR, or even an issue. The last commit date on them might be years ago.

By most measures, almost all of my repos look “unmaintained”. Yet, they don’t feel unmaintained to me. I’m still using the code, sometimes as often as every day, and if something broke for me, I’d fix it. Anyone who needs the functionality I’ve developed can use the code, and be pretty confident that it’ll do what it says in the README.

I’m considering creating an issue in all my repos, titled “Is This Project Still Maintained?”, pinning it to the issues list, and pasting in something I’m starting to think of as “The Open Source Maintainer’s Manifesto”.

It goes something like this:

Is This Project Still Maintained?

Yes. Maybe. Actually, perhaps no. Well, really, it depends on what you mean by “maintained”.

I wrote the software in this repo for my own benefit – to solve the problems I had, when I had them. While I could have kept the software to myself, I instead released it publicly, under the terms of an open licence, with the hope that it might be useful to others, but with no guarantees of any kind. Thanks to the generosity of others, it costs me literally nothing for you to use, modify, and redistribute this project, so have at it!

OK, Whatever. What About Maintenance?

In one sense, this software is “maintained”, and always will be. I fix the bugs that annoy me, I upgrade dependencies when not doing so causes me problems, and I add features that I need. To the degree that any on-going development is happening, it’s because I want that development to happen.

However, if “maintained” to you means responses to questions, bug fixes, upgrades, or new features, you may be somewhat disappointed. That’s not “maintenance”, that’s “support”, and if you expect support, you’ll probably want to have a “support contract”, where we come to an agreement where you pay me money, and I help you with the things you need help with.

That Doesn’t Sound Fair!

If it makes you feel better, there are several things you are entitled to:

The ability to use, study, modify, and redistribute the contents of this repository, under the terms stated in the applicable licence(s).

That any interactions you may have with myself, other contributors, and anyone else in this project’s spaces will be in line with the published Code of Conduct, and any transgressions of the Code of Conduct will be dealt with appropriately.

… actually, that’s it.

Things that you are not entitled to include an answer to your question, a fix for your bug, an implementation of your feature request, or a merge (or even review) of your pull request. Sometimes I may respond, either immediately or at some time long afterwards. You may luck out, and I’ll think “hmm, yeah, that’s an interesting thing” and I’ll work on it, but if I do that in any particular instance, it does not create an entitlement that I will continue to do so, or that I will ever do so again in the future.

But… I’ve Found a Huge and Terrible Bug!

You have my full and complete sympathy. It’s reasonable to assume that I haven’t come across the same bug, or at least that it doesn’t bother me, otherwise I’d have fixed it for myself.

Feel free to report it, if only to warn other people that there is a huge bug they might need to avoid (possibly by not using the software at all). Well-written bug reports are great contributions, and I appreciate the effort you’ve put in, but the work that you’ve done on your bug report still doesn’t create any entitlement on me to fix it.

If you really want that bug fixed, the source is available, and the licence gives you the right to modify it as you see fit. I encourage you to dig in and fix the bug. If you don’t have the necessary skills to do so yourself, you can get someone else to fix it – everyone has the same entitlements to use, study, modify, and redistribute as you do.

You may also decide to pay me for a support contract, and get the bug fixed that way. That gets the bug fixed for everyone, and gives you the bonus warm fuzzies of contributing to the digital commons, which is always nice.

But… My PR is a Gift!

If you take the time and effort to make a PR, you’re doing good work and I commend you for it. However, that doesn’t mean I’ll necessarily merge it into this repository, or even work with you to get it into a state suitable for merging.

A PR is what is often called a “gift of work”. I’ll have to make sure that, at the very least, it doesn’t make anything actively worse. That includes introducing bugs, or causing maintenance headaches in the future (which includes my getting irrationally angry at indenting, because I’m like that). Properly reviewing a PR takes me at least as much time as it would take me to write it from scratch, in almost all cases.

So, if your PR languishes, it might not be that it’s bad, or that the project is (dum dum dummmm!) “unmaintained”, but just that I don’t accept this particular gift of work at this particular time.

Don’t forget that the terms of licence include permission to redistribute modified versions of the code I’ve released. If you think your PR is all that and a bag of potato chips, fork away! I won’t be offended if you decide to release a permanent fork of this software, as long as you comply with the terms of the licence(s) involved.

(Note that I do not undertake support contracts solely to review and merge PRs; that reeks a little too much of “pay to play” for my liking)

Gee, You Sound Like an Asshole

I prefer to think of myself as “forthright” and “plain-speaking”, but that brings to mind that third thing you’re entitled to: your opinion.

I’ve written this out because I feel like clarifying the reality we’re living in, in the hope that it prevents misunderstandings. If what I’ve written makes you not want to use the software I’ve written, that’s fine – you’ve probably avoided future disappointment.

Opinions Sought

What do you think? Too harsh? Too wishy-washy? Comment away!

The Mediocre Programmer's Guide to Rust

Posted: Wed, 1 May 2024 | permalink | 2 Comments

Me: “Hi everyone, my name’s Matt, and I’m a mediocre programmer.”

Everyone: “Hi, Matt.”

Facilitator: “Are you an alcoholic, Matt?”

Me: “No, not since I stopped reading Twitter.”

Facilitator: “Then I think you’re in the wrong room.”

Yep, that’s my little secret – I’m a mediocre programmer. The definition of the word “hacker” I most closely align with is “someone who makes furniture with an axe”. I write simple, straightforward code because trying to understand complexity makes my head hurt.

Which is why I’ve always avoided the more “academic” languages, like OCaml, Haskell, Clojure, and so on. I know they’re good languages – people far smarter than me are building amazing things with them – but the time I hear the word “endofunctor”, I’ve lost all focus (and most of my will to live). My preferred languages are the ones that come with less intellectual overhead, like C, PHP, Python, and Ruby.

So it’s interesting that I’ve embraced Rust with significant vigour. It’s by far the most “complicated” language that I feel at least vaguely comfortable with using “in anger”. Part of that is that I’ve managed to assemble a set of principles that allow me to almost completely avoid arguing with Rust’s dreaded borrow checker, lifetimes, and all the rest of the dark, scary corners of the language. It’s also, I think, that Rust helps me to write better software, and I can feel it helping me (almost) all of the time.

In the spirit of helping my fellow mediocre programmers to embrace Rust, I present the principles I’ve assembled so far.

Neither a Borrower Nor a Lender Be

If you know anything about Rust, you probably know about the dreaded “borrow checker”. It’s the thing that makes sure you don’t have two pieces of code trying to modify the same data at the same time, or using a value when it’s no longer valid.

While Rust’s borrowing semantics allow excellent performance without compromising safety, for us mediocre programmers it gets very complicated, very quickly. So, the moment the compiler wants to start talking about “explicit lifetimes”, I shut it up by just using “owned” values instead.

It’s not that I never borrow anything; I have some situations that I know are “borrow-safe” for the mediocre programmer (I’ll cover those later). But any time I’m not sure how things will pan out, I’ll go straight for an owned value.

For example, if I need to store some text in a struct or enum, it’s going straight into a String. I’m not going to start thinking about lifetimes and &'a str; I’ll leave that for smarter people. Similarly, if I need a list of things, it’s a Vec<T> every time – no &'b [T] in my structs, thank you very much.

Attack of the Clones

Following on from the above, I’ve come to not be afraid of .clone(). I scatter them around my code like seeds in a field. Life’s too short to spend time trying to figure out who’s borrowing what from whom, if I can just give everyone their own thing.

There are warnings in the Rust book (and everywhere else) about how a clone can be “expensive”. While it’s true that, yes, making clones of data structures consumes CPU cycles and memory, it very rarely matters. CPU cycles are (usually) plentiful and RAM (usually) relatively cheap. Mediocre programmer mental effort is expensive, and not to be spent on premature optimisation. Also, if you’re coming from most any other modern language, Rust is already giving you so much more performance that you’re probably ending up ahead of the game, even if you .clone() everything in sight.

If, by some miracle, something I write gets so popular that the “expense” of all those spurious clones becomes a problem, it might make sense to pay someone much smarter than I to figure out how to make the program a zero-copy masterpiece of efficient code. Until then… clone early and clone often, I say!

Derive Macros are Powerful Magicks

If you start .clone()ing everywhere, pretty quickly you’ll be hit with this error:


error[E0599]: no method named `clone` found for struct `Foo` in the current scope

This is because not everything can be cloned, and so if you want your thing to be cloned, you need to implement the method yourself. Well… sort of.

One of the things that I find absolutely outstanding about Rust is the “derive macro”. These allow you to put a little marker on a struct or enum, and the compiler will write a bunch of code for you! Clone is one of the available so-called “derivable traits”, so you add #[derive(Clone)] to your structs, and poof! you can .clone() to your heart’s content.

But there are other things that are commonly useful, and so I’ve got a set of traits that basically all of my data structures derive:


#[derive(Clone, Debug, Default)]
struct Foo {
    // ...
}

Every time I write a struct or enum definition, that line #[derive(Clone, Debug, Default)] goes at the top.

The Debug trait allows you to print a “debug” representation of the data structure, either with the dbg!() macro, or via the {:?} format in the format!() macro (and anywhere else that takes a format string). Being able to say “what exactly is that?” comes in handy so often, not having a Debug implementation is like programming with one arm tied behind your Aeron.

Meanwhile, the Default trait lets you create an “empty” instance of your data structure, with all of the fields set to their own default values. This only works if all the fields themselves implement Default, but a lot of standard types do, so it’s rare that you’ll define a structure that can’t have an auto-derived Default. Enums are easily handled too, you just mark one variant as the default:


#[derive(Clone, Debug, Default)]
enum Bar {
    Something(String),
    SomethingElse(i32),
    #[default]   // <== mischief managed
    Nothing,
}

Borrowing is OK, Sometimes

While I previously said that I like and usually use owned values, there are a few situations where I know I can borrow without angering the borrow checker gods, and so I’m comfortable doing it.

The first is when I need to pass a value into a function that only needs to take a little look at the value to decide what to do. For example, if I want to know whether any values in a Vec<u32> are even, I could pass in a Vec, like this:


fn main() {
    let numbers = vec![0u32, 1, 2, 3, 4, 5];

    if has_evens(numbers) {
        println!("EVENS!");
    }
}

fn has_evens(numbers: Vec<u32>) -> bool {
    numbers.iter().any(|n| n % 2 == 0)
}

Howver, this gets ugly if I’m going to use numbers later, like this:


fn main() {
    let numbers = vec![0u32, 1, 2, 3, 4, 5];

    if has_evens(numbers) {
        println!("EVENS!");
    }

    // Compiler complains about "value borrowed here after move"
    println!("Sum: {}", numbers.iter().sum::<u32>());
}

fn has_evens(numbers: Vec<u32>) -> bool {
    numbers.iter().any(|n| n % 2 == 0)
}

Helpfully, the compiler will suggest I use my old standby, .clone(), to fix this problem. But I know that the borrow checker won’t have a problem with lending that Vec<u32> into has_evens() as a borrowed slice, &[u32], like this:


fn main() {
    let numbers = vec![0u32, 1, 2, 3, 4, 5];

    if has_evens(&numbers) {
        println!("EVENS!");
    }
}

fn has_evens(numbers: &[u32]) -> bool {
    numbers.iter().any(|n| n % 2 == 0)
}

The general rule I’ve got is that if I can take advantage of lifetime elision (a fancy term meaning “the compiler can figure it out”), I’m probably OK. In less fancy terms, as long as the compiler doesn’t tell me to put 'a anywhere, I’m in the green. On the other hand, the moment the compiler starts using the words “explicit lifetime”, I nope the heck out of there and start cloning everything in sight.

Another example of using lifetime elision is when I’m returning the value of a field from a struct or enum. In that case, I can usually get away with returning a borrowed value, knowing that the caller will probably just be taking a peek at that value, and throwing it away before the struct itself goes out of scope. For example:


struct Foo {
    id: u32,
    desc: String,
}

impl Foo {
    fn description(&self) -> &str {
        &self.desc
    }
}

Returning a reference from a function is practically always a mortal sin for mediocre programmers, but returning one from a struct method is often OK. In the rare case that the caller does want the reference I return to live for longer, they can always turn it into an owned value themselves, by calling .to_owned().

Avoid the String Tangle

Rust has a couple of different types for representing strings – String and &str being the ones you see most often. There are good reasons for this, however it complicates method signatures when you just want to take some sort of “bunch of text”, and don’t care so much about the messy details.

For example, let’s say we have a function that wants to see if the length of the string is even. Using the logic that since we’re just taking a peek at the value passed in, our function might take a string reference, &str, like this:


fn is_even_length(s: &str) -> bool {
    s.len() % 2 == 0
}

That seems to work fine, until someone wants to check a formatted string:


fn main() {
    // The compiler complains about "expected `&str`, found `String`"
    if is_even_length(format!("my string is {}", std::env::args().next().unwrap())) {
        println!("Even length string");
    }
}

Since format! returns an owned string, String, rather than a string reference, &str, we’ve got a problem. Of course, it’s straightforward to turn the String from format!() into a &str (just prefix it with an &). But as mediocre programmers, we can’t be expected to remember which sort of string all our functions take and add & wherever it’s needed, and having to fix everything when the compiler complains is tedious.

The converse can also happen: a method that wants an owned String, and we’ve got a &str (say, because we’re passing in a string literal, like "Hello, world!"). In this case, we need to use one of the plethora of available “turn this into a String” mechanisms (.to_string(), .to_owned(), String::from(), and probably a few others I’ve forgotten), on the value before we pass it in, which gets ugly real fast.

For these reasons, I never take a String or an &str as an argument. Instead, I use the Power of Traits to let callers pass in anything that is, or can be turned into, a string. Let us have some examples.

First off, if I would normally use &str as the type, I instead use impl AsRef<str>:


fn is_even_length(s: impl AsRef<str>) -> bool {
    s.as_ref().len() % 2 == 0
}

Note that I had to throw in an extra as_ref() call in there, but now I can call this with either a String or a &str and get an answer.

Now, if I want to be given a String (presumably because I plan on taking ownership of the value, say because I’m creating a new instance of a struct with it), I use impl Into<String> as my type:


struct Foo {
    id: u32,
    desc: String,
}

impl Foo {
    fn new(id: u32, desc: impl Into<String>) -> Self {
        Self { id, desc: desc.into() }
    }
}

We have to call .into() on our desc argument, which makes the struct building a bit uglier, but I’d argue that’s a small price to pay for being able to call both Foo::new(1, "this is a thing") and Foo::new(2, format!("This is a thing named {name}")) without caring what sort of string is involved.

Always Have an `Error` Enum

Rust’s error handing mechanism (Results… everywhere), along with the quality-of-life sugar surrounding it (like the short-circuit operator, ?), is a delightfully ergonomic approach to error handling. To make life easy for mediocre programmers, I recommend starting every project with an Error enum, that derives thiserror::Error, and using that in every method and function that returns a Result.

How you structure your Error type from there is less cut-and-dried, but typically I’ll create a separate enum variant for each type of error I want to have a different description. With thiserror, it’s easy to then attach those descriptions:


#[derive(Clone, Debug, thiserror::Error)]
enum Error {
    #[error("{0} caught fire")]
    Combustion(String),
    #[error("{0} exploded")]
    Explosion(String),
}

I also implement functions to create each error variant, because that allows me to do the Into<String> trick, and can sometimes come in handy when creating errors from other places with .map_err() (more on that later). For example, the impl for the above Error would probably be:


impl Error {
    fn combustion(desc: impl Into<String>) -> Self {
        Self::Combustion(desc.into())
    }

    fn explosion(desc: impl Into<String>) -> Self {
        Self::Explosion(desc.into())
    }
}

It’s a tedious bit of boilerplate, and you can use the thiserror-ext crate’s thiserror_ext::Construct derive macro to do the hard work for you, if you like. It, too, knows all about the Into<String> trick.

Banish `map_err` (well, mostly)

The newer mediocre programmer, who is just dipping their toe in the water of Rust, might write file handling code that looks like this:


fn read_u32_from_file(name: impl AsRef<str>) -> Result<u32, Error> {
    let mut f = File::open(name.as_ref())
        .map_err(|e| Error::FileOpenError(name.as_ref().to_string(), e))?;

    let mut buf = vec![0u8; 30];
    f.read(&mut buf)
        .map_err(|e| Error::ReadError(e))?;

    String::from_utf8(buf)
        .map_err(|e| Error::EncodingError(e))?
        .parse::<u32>()
        .map_err(|e| Error::ParseError(e))
}

This works great (or it probably does, I haven’t actually tested it), but there are a lot of .map_err() calls in there. They take up over half the function, in fact. With the power of the From trait and the magic of the ? operator, we can make this a lot tidier.

First off, assume we’ve written boilerplate error creation functions (or used thiserror_ext::Construct to do it for us)). That allows us to simplify the file handling portion of the function a bit:


fn read_u32_from_file(name: impl AsRef<str>) -> Result<u32, Error> {
    let mut f = File::open(name.as_ref())
        // We've dropped the `.to_string()` out of here...
        .map_err(|e| Error::file_open_error(name.as_ref(), e))?;

    let mut buf = vec![0u8; 30];
    f.read(&mut buf)
        // ... and the explicit parameter passing out of here
        .map_err(Error::read_error)?;

    // ...

If that latter .map_err() call looks weird, without the |e| and such, it’s passing a function-as-closure, which just saves on a few characters typing. Just because we’re mediocre, doesn’t mean we’re not also lazy.

Next, if we implement the From trait for the other two errors, we can make the string-handling lines significantly cleaner. First, the trait impl:


impl From<std::string::FromUtf8Error> for Error {
    fn from(e: std::string::FromUtf8Error) -> Self {
        Self::EncodingError(e)
    }
}

impl From<std::num::ParseIntError> for Error {
    fn from(e: std::num::ParseIntError) -> Self {
        Self::ParseError(e)
    }
}

(Again, this is boilerplate that can be autogenerated, this time by adding a #[from] tag to the variants you want a From impl on, and thiserror will take care of it for you)

In any event, no matter how you get the From impls, once you have them, the string-handling code becomes practically error-handling-free:


    Ok(
        String::from_utf8(buf)?
            .parse::<u32>()?
    )

The ? operator will automatically convert the error from the types returned from each method into the return error type, using From. The only tiny downside to this is that the ? at the end strips the Result, and so we’ve got to wrap the returned value in Ok() to turn it back into a Result for returning. But I think that’s a small price to pay for the removal of those .map_err() calls.

In many cases, my coding process involves just putting a ? after every call that returns a Result, and adding a new Error variant whenever the compiler complains about not being able to convert some new error type. It’s practically zero effort – outstanding outcome for the mediocre programmer.

Just Because You’re Mediocre, Doesn’t Mean You Can’t Get Better

To finish off, I’d like to point out that mediocrity doesn’t imply shoddy work, nor does it mean that you shouldn’t keep learning and improving your craft. One book that I’ve recently found extremely helpful is Effective Rust, by David Drysdale. The author has very kindly put it up to read online, but buying a (paper or ebook) copy would no doubt be appreciated.

The thing about this book, for me, is that it is very readable, even by us mediocre programmers. The sections are written in a way that really “clicked” with me. Some aspects of Rust that I’d had trouble understanding for a long time – such as lifetimes and the borrow checker, and particularly lifetime elision – actually made sense after I’d read the appropriate sections.

Finally, a Quick Beg

I’m currently subsisting on the kindness of strangers, so if you found something useful (or entertaining) in this post, why not buy me a refreshing beverage? It helps to know that people like what I’m doing, and helps keep me from having to sell my soul to a private equity firm.

How I Tripped Over the Debian Weak Keys Vulnerability

Posted: Tue, 9 April 2024 | permalink | 4 Comments

Those of you who haven’t been in IT for far, far too long might not know that next month will be the 16th(!) anniversary of the disclosure of what was, at the time, a fairly earth-shattering revelation: that for about 18 months, the Debian OpenSSL package was generating entirely predictable private keys.

The recent xz-stential threat (thanks to @nixCraft for making me aware of that one), has got me thinking about my own serendipitous interaction with a major vulnerability. Given that the statute of limitations has (probably) run out, I thought I’d share it as a tale of how “huh, that’s weird” can be a powerful threat-hunting tool – but only if you’ve got the time to keep pulling at the thread.

Prelude to an Adventure

Our story begins back in March 2008. I was working at Engine Yard (EY), a now largely-forgotten Rails-focused hosting company, which pioneered several advances in Rails application deployment. Probably EY’s greatest claim to lasting fame is that they helped launch a little code hosting platform you might have heard of, by providing them free infrastructure when they were little more than a glimmer in the Internet’s eye.

I am, of course, talking about everyone’s favourite Microsoft product: GitHub.

Since GitHub was in the right place, at the right time, with a compelling product offering, they quickly started to gain traction, and grow their userbase. With growth comes challenges, amongst them the one we’re focusing on today: SSH login times. Then, as now, GitHub provided SSH access to the git repos they hosted, by SSHing to git@github.com with publickey authentication. They were using the standard way that everyone manages SSH keys: the ~/.ssh/authorized_keys file, and that became a problem as the number of keys started to grow.

The way that SSH uses this file is that, when a user connects and asks for publickey authentication, SSH opens the ~/.ssh/authorized_keys file and scans all of the keys listed in it, looking for a key which matches the key that the user presented. This linear search is normally not a huge problem, because nobody in their right mind puts more than a few keys in their ~/.ssh/authorized_keys, right?

2008-era GitHub giving monkey puppet side-eye to the idea that nobody stores many keys in an authorized_keys file

Of course, as a popular, rapidly-growing service, GitHub was gaining users at a fair clip, to the point that the one big file that stored all the SSH keys was starting to visibly impact SSH login times. This problem was also not going to get any better by itself. Something Had To Be Done.

EY management was keen on making sure GitHub ran well, and so despite it not really being a hosting problem, they were willing to help fix this problem. For some reason, the late, great, Ezra Zygmuntowitz pointed GitHub in my direction, and let me take the time to really get into the problem with the GitHub team. After examining a variety of different possible solutions, we came to the conclusion that the least-worst option was to patch OpenSSH to lookup keys in a MySQL database, indexed on the key fingerprint.

We didn’t take this decision on a whim – it wasn’t a case of “yeah, sure, let’s just hack around with OpenSSH, what could possibly go wrong?”. We knew it was potentially catastrophic if things went sideways, so you can imagine how much worse the other options available were. Ensuring that this wouldn’t compromise security was a lot of the effort that went into the change. In the end, though, we rolled it out in early April, and lo! SSH logins were fast, and we were pretty sure we wouldn’t have to worry about this problem for a long time to come.

Normally, you’d think “patching OpenSSH to make mass SSH logins super fast” would be a good story on its own. But no, this is just the opening scene.

Chekov’s Gun Makes its Appearance

Fast forward a little under a month, to the first few days of May 2008. I get a message from one of the GitHub team, saying that somehow users were able to access other users’ repos over SSH. Naturally, as we’d recently rolled out the OpenSSH patch, which touched this very thing, the code I’d written was suspect number one, so I was called in to help.

The lineup scene from the movie The Usual Suspects — They're called The Usual Suspects for a reason, but sometimes, it really is Keyser Söze

Eventually, after more than a little debugging, we discovered that, somehow, there were two users with keys that had the same key fingerprint. This absolutely shouldn’t happen – it’s a bit like winning the lottery twice in a row¹ – unless the users had somehow shared their keys with each other, of course. Still, it was worth investigating, just in case it was a web application bug, so the GitHub team reached out to the users impacted, to try and figure out what was going on.

The users professed no knowledge of each other, neither admitted to publicising their key, and couldn’t offer any explanation as to how the other person could possibly have gotten their key.

Then things went from “weird” to “what the…?”. Because another pair of users showed up, sharing a key fingerprint – but it was a different shared key fingerprint. The odds now have gone from “winning the lottery multiple times in a row” to as close to “this literally cannot happen” as makes no difference.

Milhouse from The Simpsons says that We're Through The Looking Glass Here, People

Once we were really, really confident that the OpenSSH patch wasn’t the cause of the problem, my involvement in the problem basically ended. I wasn’t a GitHub employee, and EY had plenty of other customers who needed my help, so I wasn’t able to stay deeply involved in the on-going investigation of The Mystery of the Duplicate Keys.

However, the GitHub team did keep talking to the users involved, and managed to determine the only apparent common factor was that all the users claimed to be using Debian or Ubuntu systems, which was where their SSH keys would have been generated.

That was as far as the investigation had really gotten, when along came May 13, 2008.

Chekov’s Gun Goes Off

With the publication of DSA-1571-1, everything suddenly became clear. Through a well-meaning but ultimately disasterous cleanup of OpenSSL’s randomness generation code, the Debian maintainer had inadvertently reduced the number of possible keys that could be generated by a given user from “bazillions” to a little over 32,000. With so many people signing up to GitHub – some of them no doubt following best practice and freshly generating a separate key – it’s unsurprising that some collisions occurred.

You can imagine the sense of “oooooooh, so that’s what’s going on!” that rippled out once the issue was understood. I was mostly glad that we had conclusive evidence that my OpenSSH patch wasn’t at fault, little knowing how much more contact I was to have with Debian weak keys in the future, running a huge store of known-compromised keys and using them to find misbehaving Certificate Authorities, amongst other things.

Lessons Learned

While I’ve not found a description of exactly when and how Luciano Bello discovered the vulnerability that became CVE-2008-0166, I presume he first came across it some time before it was disclosed – likely before GitHub tripped over it. The stable Debian release that included the vulnerable code had been released a year earlier, so there was plenty of time for Luciano to have discovered key collisions and go “hmm, I wonder what’s going on here?”, then keep digging until the solution presented itself.

The thought “hmm, that’s odd”, followed by intense investigation, leading to the discovery of a major flaw is also what ultimately brought down the recent XZ backdoor. The critical part of that sequence is the ability to do that intense investigation, though.

When I reflect on my brush with the Debian weak keys vulnerability, what sticks out to me is the fact that I didn’t do the deep investigation. I wonder if Luciano hadn’t found it, how long it might have been before it was found. The GitHub team would have continued investigating, presumably, and perhaps they (or I) would have eventually dug deep enough to find it. But we were all super busy – myself, working support tickets at EY, and GitHub feverishly building features and fighting the fires in their rapidly-growing service.

As it was, Luciano was able to take the time to dig in and find out what was happening, but just like the XZ backdoor, I feel like we, as an industry, got a bit lucky that someone with the skills, time, and energy was on hand at the right time to make a huge difference.

It’s a luxury to be able to take the time to really dig into a problem, and it’s a luxury that most of us rarely have. Perhaps an understated takeaway is that somehow we all need to wrestle back some time to follow our hunches and really dig into the things that make us go “hmm…”.

Support My Hunches

If you’d like to help me be able to do intense investigations of mysterious software phenomena, you can shout me a refreshing beverage on ko-fi.

the odds are actually probably more like winning the lottery about twenty times in a row. The numbers involved are staggeringly huge, so it’s easiest to just approximate it as “really, really unlikely”. ↩

Not all TLDs are Created Equal

Posted: Tue, 13 February 2024 | permalink | No comments

In light of the recent cancellation of the queer.af domain registration by the Taliban, the fragile and difficult nature of country-code top-level domains (ccTLDs) has once again been comprehensively demonstrated. Since many people may not be aware of the risks, I thought I’d give a solid explainer of the whole situation, and explain why you should, in general, not have anything to do with domains which are registered under ccTLDs.

Top-level What-Now?

A top-level domain (TLD) is the last part of a domain name (the collection of words, separated by periods, after the https:// in your web browser’s location bar). It’s the “com” in example.com, or the “af” in queer.af.

There are two kinds of TLDs: country-code TLDs (ccTLDs) and generic TLDs (gTLDs). Despite all being TLDs, they’re very different beasts under the hood.

What’s the Difference?

Generic TLDs are what most organisations and individuals register their domains under: old-school technobabble like “com”, “net”, or “org”, historical oddities like “gov”, and the new-fangled world of words like “tech”, “social”, and “bank”. These gTLDs are all regulated under a set of rules created and administered by ICANN (the “Internet Corporation for Assigned Names and Numbers”), which try to ensure that things aren’t a complete wild-west, limiting things like price hikes (well, sometimes, anyway), and providing means for disputes over names¹.

Country-code TLDs, in contrast, are all two letters long², and are given out to countries to do with as they please. While ICANN kinda-sorta has something to do with ccTLDs (in the sense that it makes them exist on the Internet), it has no authority to control how a ccTLD is managed. If a country decides to raise prices by 100x, or cancel all registrations that were made on the 12th of the month, there’s nothing anyone can do about it.

If that sounds bad, that’s because it is. Also, it’s not a theoretical problem – the Taliban deciding to asssert its bigotry over the little corner of the Internet namespace it has taken control of is far from the first time that ccTLDs have caused grief.

Shifting Sands

The queer.af cancellation is interesting because, at the time the domain was reportedly registered, 2018, Afghanistan had what one might describe as, at least, a different political climate. Since then, of course, things have changed, and the new bosses have decided to get a bit more active.

Those running queer.af seem to have seen the writing on the wall, and were planning on moving to another, less fraught, domain, but hadn’t completed that move when the Taliban came knocking.

The Curious Case of Brexit

When the United Kingdom decided to leave the European Union, it fell foul of the EU’s rules for the registration of domains under the “eu” ccTLD³. To register (and maintain) a domain name ending in .eu, you have to be a resident of the EU. When the UK ceased to be part of the EU, residents of the UK were no longer EU residents.

Cue much unhappiness, wailing, and gnashing of teeth when this was pointed out to Britons. Some decided to give up their domains, and move to other parts of the Internet, while others managed to hold onto them by various legal sleight-of-hand (like having an EU company maintain the registration on their behalf).

In any event, all very unpleasant for everyone involved.

Geopolitics… on the Internet?!?

After Russia invaded Ukraine in February 2022, the Ukranian Vice Prime Minister asked ICANN to suspend ccTLDs associated with Russia. While ICANN said that it wasn’t going to do that, because it wouldn’t do anything useful, some domain registrars (the companies you pay to register domain names) ceased to deal in Russian ccTLDs, and some websites restricted links to domains with Russian ccTLDs.

Whether or not you agree with the sort of activism implied by these actions, the fact remains that even the actions of a government that aren’t directly related to the Internet can have grave consequences for your domain name if it’s registered under a ccTLD. I don’t think any gTLD operator will be invading a neighbouring country any time soon.

Money, Money, Money, Must Be Funny

When you register a domain name, you pay a registration fee to a registrar, who does administrative gubbins and causes you to be able to control the domain name in the DNS. However, you don’t “own” that domain name⁴ – you’re only renting it. When the registration period comes to an end, you have to renew the domain name, or you’ll cease to be able to control it.

Given that a domain name is typically your “brand” or “identity” online, the chances are you’d prefer to keep it over time, because moving to a new domain name is a massive pain, having to tell all your customers or users that now you’re somewhere else, plus having to accept the risk of someone registering the domain name you used to have and capturing your traffic… it’s all a gigantic hassle.

For gTLDs, ICANN has various rules around price increases and bait-and-switch pricing that tries to keep a lid on the worst excesses of registries. While there are any number of reasonable criticisms of the rules, and the Internet community has to stay on their toes to keep ICANN from totally succumbing to regulatory capture, at least in the gTLD space there’s some degree of control over price gouging.

On the other hand, ccTLDs have no effective controls over their pricing. For example, in 2008 the Seychelles increased the price of .sc domain names from US$25 to US$75. No reason, no warning, just “pay up”.

Who Is Even Getting That Money?

A closely related concern about ccTLDs is that some of the “cool” ones are assigned to countries that are… not great.

The poster child for this is almost certainly Libya, which has the ccTLD “ly”. While Libya was being run by a terrorist-supporting extremist, companies thought it was a great idea to have domain names that ended in .ly. These domain registrations weren’t (and aren’t) cheap, and it’s hard to imagine that at least some of that money wasn’t going to benefit the Gaddafi regime.

Similarly, the British Indian Ocean Territory, which has the “io” ccTLD, was created in a colonialist piece of chicanery that expelled thousands of native Chagossians from Diego Garcia. Money from the registration of .io domains doesn’t go to the (former) residents of the Chagos islands, instead it gets paid to the UK government.

Again, I’m not trying to suggest that all gTLD operators are wonderful people, but it’s not particularly likely that the direct beneficiaries of the operation of a gTLD stole an island chain and evicted the residents.

Are ccTLDs Ever Useful?

The answer to that question is an unqualified “maybe”. I certainly don’t think it’s a good idea to register a domain under a ccTLD for “vanity” purposes: because it makes a word, is the same as a file extension you like, or because it looks cool.

Those ccTLDs that clearly represent and are associated with a particular country are more likely to be OK, because there is less impetus for the registry to try a naked cash grab. Unfortunately, ccTLD registries have a disconcerting habit of changing their minds on whether they serve their geographic locality, such as when auDA decided to declare an open season in the .au namespace some years ago. Essentially, while a ccTLD may have geographic connotations now, there’s not a lot of guarantee that they won’t fall victim to scope creep in the future.

Finally, it might be somewhat safer to register under a ccTLD if you live in the location involved. At least then you might have a better idea of whether your domain is likely to get pulled out from underneath you. Unfortunately, as the .eu example shows, living somewhere today is no guarantee you’ll still be living there tomorrow, even if you don’t move house.

In short, I’d suggest sticking to gTLDs. They’re at least lower risk than ccTLDs.

“+1, Helpful”

If you’ve found this post informative, why not buy me a refreshing beverage? My typing fingers (both of them) thank you in advance for your generosity.

Footnotes

don’t make the mistake of thinking that I approve of ICANN or how it operates; it’s an omnishambles of poor governance and incomprehensible decision-making. ↩
corresponding roughly, though not precisely (because everything has to be complicated, because humans are complicated), to the entries in the ISO standard for “Codes for the representation of names of countries and their subdivisions”, ISO 3166. ↩
yes, the EU is not a country; it’s part of the “roughly, though not precisely” caveat mentioned previously. ↩
despite what domain registrars try very hard to imply, without falling foul of deceptive advertising regulations. ↩

Brane Dump

Sidebar: “annotated what-now?!?”