How I Tripped Over the Debian Weak Keys Vulnerability

Posted: Tue, 9 April 2024 | permalink | 2 Comments

Those of you who haven’t been in IT for far, far too long might not know that next month will be the 16th(!) anniversary of the disclosure of what was, at the time, a fairly earth-shattering revelation: that for about 18 months, the Debian OpenSSL package was generating entirely predictable private keys.

The recent xz-stential threat (thanks to @nixCraft for making me aware of that one), has got me thinking about my own serendipitous interaction with a major vulnerability. Given that the statute of limitations has (probably) run out, I thought I’d share it as a tale of how “huh, that’s weird” can be a powerful threat-hunting tool – but only if you’ve got the time to keep pulling at the thread.

Prelude to an Adventure

Our story begins back in March 2008. I was working at Engine Yard (EY), a now largely-forgotten Rails-focused hosting company, which pioneered several advances in Rails application deployment. Probably EY’s greatest claim to lasting fame is that they helped launch a little code hosting platform you might have heard of, by providing them free infrastructure when they were little more than a glimmer in the Internet’s eye.

I am, of course, talking about everyone’s favourite Microsoft product: GitHub.

Since GitHub was in the right place, at the right time, with a compelling product offering, they quickly started to gain traction, and grow their userbase. With growth comes challenges, amongst them the one we’re focusing on today: SSH login times. Then, as now, GitHub provided SSH access to the git repos they hosted, by SSHing to git@github.com with publickey authentication. They were using the standard way that everyone manages SSH keys: the ~/.ssh/authorized_keys file, and that became a problem as the number of keys started to grow.

The way that SSH uses this file is that, when a user connects and asks for publickey authentication, SSH opens the ~/.ssh/authorized_keys file and scans all of the keys listed in it, looking for a key which matches the key that the user presented. This linear search is normally not a huge problem, because nobody in their right mind puts more than a few keys in their ~/.ssh/authorized_keys, right?

2008-era GitHub giving monkey puppet side-eye to the idea that nobody stores many keys in an authorized_keys file

Of course, as a popular, rapidly-growing service, GitHub was gaining users at a fair clip, to the point that the one big file that stored all the SSH keys was starting to visibly impact SSH login times. This problem was also not going to get any better by itself. Something Had To Be Done.

EY management was keen on making sure GitHub ran well, and so despite it not really being a hosting problem, they were willing to help fix this problem. For some reason, the late, great, Ezra Zygmuntowitz pointed GitHub in my direction, and let me take the time to really get into the problem with the GitHub team. After examining a variety of different possible solutions, we came to the conclusion that the least-worst option was to patch OpenSSH to lookup keys in a MySQL database, indexed on the key fingerprint.

We didn’t take this decision on a whim – it wasn’t a case of “yeah, sure, let’s just hack around with OpenSSH, what could possibly go wrong?”. We knew it was potentially catastrophic if things went sideways, so you can imagine how much worse the other options available were. Ensuring that this wouldn’t compromise security was a lot of the effort that went into the change. In the end, though, we rolled it out in early April, and lo! SSH logins were fast, and we were pretty sure we wouldn’t have to worry about this problem for a long time to come.

Normally, you’d think “patching OpenSSH to make mass SSH logins super fast” would be a good story on its own. But no, this is just the opening scene.

Chekov’s Gun Makes its Appearance

Fast forward a little under a month, to the first few days of May 2008. I get a message from one of the GitHub team, saying that somehow users were able to access other users’ repos over SSH. Naturally, as we’d recently rolled out the OpenSSH patch, which touched this very thing, the code I’d written was suspect number one, so I was called in to help.

The lineup scene from the movie The Usual Suspects
They're called The Usual Suspects for a reason, but sometimes, it really is Keyser Söze

Eventually, after more than a little debugging, we discovered that, somehow, there were two users with keys that had the same key fingerprint. This absolutely shouldn’t happen – it’s a bit like winning the lottery twice in a row1 – unless the users had somehow shared their keys with each other, of course. Still, it was worth investigating, just in case it was a web application bug, so the GitHub team reached out to the users impacted, to try and figure out what was going on.

The users professed no knowledge of each other, neither admitted to publicising their key, and couldn’t offer any explanation as to how the other person could possibly have gotten their key.

Then things went from “weird” to “what the…?”. Because another pair of users showed up, sharing a key fingerprint – but it was a different shared key fingerprint. The odds now have gone from “winning the lottery multiple times in a row” to as close to “this literally cannot happen” as makes no difference.

Milhouse from The Simpsons says that We're Through The Looking Glass Here, People

Once we were really, really confident that the OpenSSH patch wasn’t the cause of the problem, my involvement in the problem basically ended. I wasn’t a GitHub employee, and EY had plenty of other customers who needed my help, so I wasn’t able to stay deeply involved in the on-going investigation of The Mystery of the Duplicate Keys.

However, the GitHub team did keep talking to the users involved, and managed to determine the only apparent common factor was that all the users claimed to be using Debian or Ubuntu systems, which was where their SSH keys would have been generated.

That was as far as the investigation had really gotten, when along came May 13, 2008.

Chekov’s Gun Goes Off

With the publication of DSA-1571-1, everything suddenly became clear. Through a well-meaning but ultimately disasterous cleanup of OpenSSL’s randomness generation code, the Debian maintainer had inadvertently reduced the number of possible keys that could be generated by a given user from “bazillions” to a little over 32,000. With so many people signing up to GitHub – some of them no doubt following best practice and freshly generating a separate key – it’s unsurprising that some collisions occurred.

You can imagine the sense of “oooooooh, so that’s what’s going on!” that rippled out once the issue was understood. I was mostly glad that we had conclusive evidence that my OpenSSH patch wasn’t at fault, little knowing how much more contact I was to have with Debian weak keys in the future, running a huge store of known-compromised keys and using them to find misbehaving Certificate Authorities, amongst other things.

Lessons Learned

While I’ve not found a description of exactly when and how Luciano Bello discovered the vulnerability that became CVE-2008-0166, I presume he first came across it some time before it was disclosed – likely before GitHub tripped over it. The stable Debian release that included the vulnerable code had been released a year earlier, so there was plenty of time for Luciano to have discovered key collisions and go “hmm, I wonder what’s going on here?”, then keep digging until the solution presented itself.

The thought “hmm, that’s odd”, followed by intense investigation, leading to the discovery of a major flaw is also what ultimately brought down the recent XZ backdoor. The critical part of that sequence is the ability to do that intense investigation, though.

When I reflect on my brush with the Debian weak keys vulnerability, what sticks out to me is the fact that I didn’t do the deep investigation. I wonder if Luciano hadn’t found it, how long it might have been before it was found. The GitHub team would have continued investigating, presumably, and perhaps they (or I) would have eventually dug deep enough to find it. But we were all super busy – myself, working support tickets at EY, and GitHub feverishly building features and fighting the fires in their rapidly-growing service.

As it was, Luciano was able to take the time to dig in and find out what was happening, but just like the XZ backdoor, I feel like we, as an industry, got a bit lucky that someone with the skills, time, and energy was on hand at the right time to make a huge difference.

It’s a luxury to be able to take the time to really dig into a problem, and it’s a luxury that most of us rarely have. Perhaps an understated takeaway is that somehow we all need to wrestle back some time to follow our hunches and really dig into the things that make us go “hmm…”.

Support My Hunches

If you’d like to help me be able to do intense investigations of mysterious software phenomena, you can shout me a refreshing beverage on ko-fi.


  1. the odds are actually probably more like winning the lottery about twenty times in a row. The numbers involved are staggeringly huge, so it’s easiest to just approximate it as “really, really unlikely”. 


Not all TLDs are Created Equal

Posted: Tue, 13 February 2024 | permalink | No comments

In light of the recent cancellation of the queer.af domain registration by the Taliban, the fragile and difficult nature of country-code top-level domains (ccTLDs) has once again been comprehensively demonstrated. Since many people may not be aware of the risks, I thought I’d give a solid explainer of the whole situation, and explain why you should, in general, not have anything to do with domains which are registered under ccTLDs.

Top-level What-Now?

A top-level domain (TLD) is the last part of a domain name (the collection of words, separated by periods, after the https:// in your web browser’s location bar). It’s the “com” in example.com, or the “af” in queer.af.

There are two kinds of TLDs: country-code TLDs (ccTLDs) and generic TLDs (gTLDs). Despite all being TLDs, they’re very different beasts under the hood.

What’s the Difference?

Generic TLDs are what most organisations and individuals register their domains under: old-school technobabble like “com”, “net”, or “org”, historical oddities like “gov”, and the new-fangled world of words like “tech”, “social”, and “bank”. These gTLDs are all regulated under a set of rules created and administered by ICANN (the “Internet Corporation for Assigned Names and Numbers”), which try to ensure that things aren’t a complete wild-west, limiting things like price hikes (well, sometimes, anyway), and providing means for disputes over names1.

Country-code TLDs, in contrast, are all two letters long2, and are given out to countries to do with as they please. While ICANN kinda-sorta has something to do with ccTLDs (in the sense that it makes them exist on the Internet), it has no authority to control how a ccTLD is managed. If a country decides to raise prices by 100x, or cancel all registrations that were made on the 12th of the month, there’s nothing anyone can do about it.

If that sounds bad, that’s because it is. Also, it’s not a theoretical problem – the Taliban deciding to asssert its bigotry over the little corner of the Internet namespace it has taken control of is far from the first time that ccTLDs have caused grief.

Shifting Sands

The queer.af cancellation is interesting because, at the time the domain was reportedly registered, 2018, Afghanistan had what one might describe as, at least, a different political climate. Since then, of course, things have changed, and the new bosses have decided to get a bit more active.

Those running queer.af seem to have seen the writing on the wall, and were planning on moving to another, less fraught, domain, but hadn’t completed that move when the Taliban came knocking.

The Curious Case of Brexit

When the United Kingdom decided to leave the European Union, it fell foul of the EU’s rules for the registration of domains under the “eu” ccTLD3. To register (and maintain) a domain name ending in .eu, you have to be a resident of the EU. When the UK ceased to be part of the EU, residents of the UK were no longer EU residents.

Cue much unhappiness, wailing, and gnashing of teeth when this was pointed out to Britons. Some decided to give up their domains, and move to other parts of the Internet, while others managed to hold onto them by various legal sleight-of-hand (like having an EU company maintain the registration on their behalf).

In any event, all very unpleasant for everyone involved.

Geopolitics… on the Internet?!?

After Russia invaded Ukraine in February 2022, the Ukranian Vice Prime Minister asked ICANN to suspend ccTLDs associated with Russia. While ICANN said that it wasn’t going to do that, because it wouldn’t do anything useful, some domain registrars (the companies you pay to register domain names) ceased to deal in Russian ccTLDs, and some websites restricted links to domains with Russian ccTLDs.

Whether or not you agree with the sort of activism implied by these actions, the fact remains that even the actions of a government that aren’t directly related to the Internet can have grave consequences for your domain name if it’s registered under a ccTLD. I don’t think any gTLD operator will be invading a neighbouring country any time soon.

Money, Money, Money, Must Be Funny

When you register a domain name, you pay a registration fee to a registrar, who does administrative gubbins and causes you to be able to control the domain name in the DNS. However, you don’t “own” that domain name4 – you’re only renting it. When the registration period comes to an end, you have to renew the domain name, or you’ll cease to be able to control it.

Given that a domain name is typically your “brand” or “identity” online, the chances are you’d prefer to keep it over time, because moving to a new domain name is a massive pain, having to tell all your customers or users that now you’re somewhere else, plus having to accept the risk of someone registering the domain name you used to have and capturing your traffic… it’s all a gigantic hassle.

For gTLDs, ICANN has various rules around price increases and bait-and-switch pricing that tries to keep a lid on the worst excesses of registries. While there are any number of reasonable criticisms of the rules, and the Internet community has to stay on their toes to keep ICANN from totally succumbing to regulatory capture, at least in the gTLD space there’s some degree of control over price gouging.

On the other hand, ccTLDs have no effective controls over their pricing. For example, in 2008 the Seychelles increased the price of .sc domain names from US$25 to US$75. No reason, no warning, just “pay up”.

Who Is Even Getting That Money?

A closely related concern about ccTLDs is that some of the “cool” ones are assigned to countries that are… not great.

The poster child for this is almost certainly Libya, which has the ccTLD “ly”. While Libya was being run by a terrorist-supporting extremist, companies thought it was a great idea to have domain names that ended in .ly. These domain registrations weren’t (and aren’t) cheap, and it’s hard to imagine that at least some of that money wasn’t going to benefit the Gaddafi regime.

Similarly, the British Indian Ocean Territory, which has the “io” ccTLD, was created in a colonialist piece of chicanery that expelled thousands of native Chagossians from Diego Garcia. Money from the registration of .io domains doesn’t go to the (former) residents of the Chagos islands, instead it gets paid to the UK government.

Again, I’m not trying to suggest that all gTLD operators are wonderful people, but it’s not particularly likely that the direct beneficiaries of the operation of a gTLD stole an island chain and evicted the residents.

Are ccTLDs Ever Useful?

The answer to that question is an unqualified “maybe”. I certainly don’t think it’s a good idea to register a domain under a ccTLD for “vanity” purposes: because it makes a word, is the same as a file extension you like, or because it looks cool.

Those ccTLDs that clearly represent and are associated with a particular country are more likely to be OK, because there is less impetus for the registry to try a naked cash grab. Unfortunately, ccTLD registries have a disconcerting habit of changing their minds on whether they serve their geographic locality, such as when auDA decided to declare an open season in the .au namespace some years ago. Essentially, while a ccTLD may have geographic connotations now, there’s not a lot of guarantee that they won’t fall victim to scope creep in the future.

Finally, it might be somewhat safer to register under a ccTLD if you live in the location involved. At least then you might have a better idea of whether your domain is likely to get pulled out from underneath you. Unfortunately, as the .eu example shows, living somewhere today is no guarantee you’ll still be living there tomorrow, even if you don’t move house.

In short, I’d suggest sticking to gTLDs. They’re at least lower risk than ccTLDs.

“+1, Helpful”

If you’ve found this post informative, why not buy me a refreshing beverage? My typing fingers (both of them) thank you in advance for your generosity.


Footnotes

  1. don’t make the mistake of thinking that I approve of ICANN or how it operates; it’s an omnishambles of poor governance and incomprehensible decision-making. 

  2. corresponding roughly, though not precisely (because everything has to be complicated, because humans are complicated), to the entries in the ISO standard for “Codes for the representation of names of countries and their subdivisions”, ISO 3166. 

  3. yes, the EU is not a country; it’s part of the “roughly, though not precisely” caveat mentioned previously. 

  4. despite what domain registrars try very hard to imply, without falling foul of deceptive advertising regulations. 


Why Certificate Lifecycle Automation Matters

Posted: Tue, 30 January 2024 | permalink | 2 Comments

If you’ve perused the ActivityPub feed of certificates whose keys are known to be compromised, and clicked on the “Show More” button to see the name of the certificate issuer, you may have noticed that some issuers seem to come up again and again. This might make sense – after all, if a CA is issuing a large volume of certificates, they’ll be seen more often in a list of compromised certificates. In an attempt to see if there is anything that we can learn from this data, though, I did a bit of digging, and came up with some illuminating results.

The Procedure

I started off by finding all the unexpired certificates logged in Certificate Transparency (CT) logs that have a key that is in the pwnedkeys database as having been publicly disclosed. From this list of certificates, I removed duplicates by matching up issuer/serial number tuples, and then reduced the set by counting the number of unique certificates by their issuer.

This gave me a list of the issuers of these certificates, which looks a bit like this:

/C=BE/O=GlobalSign nv-sa/CN=AlphaSSL CA - SHA256 - G4
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Domain Validation Secure Server CA
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Organization Validation Secure Server CA
/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2
/C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./OU=http://certs.starfieldtech.com/repository//CN=Starfield Secure Certificate Authority - G2
/C=AT/O=ZeroSSL/CN=ZeroSSL RSA Domain Secure Site CA
/C=BE/O=GlobalSign nv-sa/CN=GlobalSign GCC R3 DV TLS CA 2020

Rather than try to work with raw issuers (because, as Andrew Ayer says, The SSL Certificate Issuer Field is a Lie), I mapped these issuers to the organisations that manage them, and summed the counts for those grouped issuers together.

The Data

Lieutenant Commander Data from Star Trek: The Next Generation
Insert obligatory "not THAT data" comment here

The end result of this work is the following table, sorted by the count of certificates which have been compromised by exposing their private key:

IssuerCompromised Count
Sectigo170
ISRG (Let's Encrypt)161
GoDaddy141
DigiCert81
GlobalSign46
Entrust3
SSL.com1

If you’re familiar with the CA ecosystem, you’ll probably recognise that the organisations with large numbers of compromised certificates are also those who issue a lot of certificates. So far, nothing particularly surprising, then.

Let’s look more closely at the relationships, though, to see if we can get more useful insights.

Volume Control

Using the issuance volume report from crt.sh, we can compare issuance volumes to compromise counts, to come up with a “compromise rate”. I’m using the “Unexpired Precertificates” colume from the issuance volume report, as I feel that’s the number that best matches the certificate population I’m examining to find compromised certificates. To maintain parity with the previous table, this one is still sorted by the count of certificates that have been compromised.

IssuerIssuance VolumeCompromised CountCompromise Rate
Sectigo88,323,0681701 in 519,547
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480
GoDaddy56,121,4291411 in 398,024
DigiCert144,713,475811 in 1,786,586
GlobalSign1,438,485461 in 31,271
Entrust23,16631 in 7,722
SSL.com171,81611 in 171,816

If we now sort this table by compromise rate, we can see which organisations have the most (and least) leakiness going on from their customers:

IssuerIssuance VolumeCompromised CountCompromise Rate
Entrust23,16631 in 7,722
GlobalSign1,438,485461 in 31,271
SSL.com171,81611 in 171,816
GoDaddy56,121,4291411 in 398,024
Sectigo88,323,0681701 in 519,547
DigiCert144,713,475811 in 1,786,586
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480

By grouping by order-of-magnitude in the compromise rate, we can identify three “bands”:

Now we have some useful insights we can think about.

Why Is It So?

Professor Julius Sumner Miller
If you don't know who Professor Julius Sumner Miller is, I highly recommend finding out

All of the organisations on the list, with the exception of Let’s Encrypt, are what one might term “traditional” CAs. To a first approximation, it’s reasonable to assume that the vast majority of the customers of these traditional CAs probably manage their certificates the same way they have for the past two decades or more. That is, they generate a key and CSR, upload the CSR to the CA to get a certificate, then copy the cert and key… somewhere. Since humans are handling the keys, there’s a higher risk of the humans using either risky practices, or making a mistake, and exposing the private key to the world.

Let’s Encrypt, on the other hand, issues all of its certificates using the ACME (Automatic Certificate Management Environment) protocol, and all of the Let’s Encrypt documentation encourages the use of software tools to generate keys, issue certificates, and install them for use. Given that Let’s Encrypt has 161 compromised certificates currently in the wild, it’s clear that the automation in use is far from perfect, but the significantly lower compromise rate suggests to me that lifecycle automation at least reduces the rate of key compromise, even though it doesn’t eliminate it completely.

It is true that all of the organisations in this analysis also provide ACME issuance workflows, should customers desire it. However, the “traditional CA” companies have been around a lot longer than ACME has, and so they probably acquired many of their customers before ACME existed.

Given that it’s incredibly hard to get humans to change the way they do things, once they have a way that “works”, it seems reasonable to assume that most of the certificates issued by these CAs are handled in the same human-centric, error-prone manner they always have been.

If organisations would like to refute this assumption, though, by sharing their data on ACME vs legacy issuance rates, I’m sure we’d all be extremely interested.

Explaining the Outlier

The difference in presumed issuance practices would seem to explain the significant difference in compromise rates between Let’s Encrypt and the other organisations, if it weren’t for one outlier. This is a largely “traditional” CA, with the manual-handling issues that implies, but with a compromise rate close to that of Let’s Encrypt.

We are, of course, talking about DigiCert.

The thing about DigiCert, that doesn’t show up in the raw numbers from crt.sh, is that DigiCert manages the issuance of certificates for several of the biggest “hosted TLS” providers, such as CloudFlare and AWS. When these services obtain a certificate from DigiCert on their customer’s behalf, the private key is kept locked away, and no human can (we hope) get access to the private key. This is supported by the fact that no certificates identifiably issued to either CloudFlare or AWS appear in the set of certificates with compromised keys.

When we ask for “all certificates issued by DigiCert”, we get both the certificates issued to these big providers, which are very good at keeping their keys under control, as well as the certificates issued to everyone else, whose key handling practices may not be quite so stringent.

It’s possible, though not trivial, to account for certificates issued to these “hosted TLS” providers, because the certificates they use are issued from intermediates “branded” to those companies. With the crt.sh psql interface we can run this query to get the total number of unexpired precertificates issued to these managed services:

SELECT SUM(sub.NUM_ISSUED[2] - sub.NUM_EXPIRED[2])
  FROM (
    SELECT ca.name, max(coalesce(coalesce(nullif(trim(cc.SUBORDINATE_CA_OWNER), ''), nullif(trim(cc.CA_OWNER), '')), cc.INCLUDED_CERTIFICATE_OWNER)) as OWNER,
           ca.NUM_ISSUED, ca.NUM_EXPIRED
      FROM ccadb_certificate cc, ca_certificate cac, ca
     WHERE cc.CERTIFICATE_ID = cac.CERTIFICATE_ID
       AND cac.CA_ID = ca.ID
  GROUP BY ca.ID
  ) sub
 WHERE sub.name ILIKE '%Amazon%' OR sub.name ILIKE '%CloudFlare%' AND sub.owner = 'DigiCert';

The number I get from running that query is 104,316,112, which should be subtracted from DigiCert’s total issuance figures to get a more accurate view of what DigiCert’s “regular” customers do with their private keys. When I do this, the compromise rates table, sorted by the compromise rate, looks like this:

IssuerIssuance VolumeCompromised CountCompromise Rate
Entrust23,16631 in 7,722
GlobalSign1,438,485461 in 31,271
SSL.com171,81611 in 171,816
GoDaddy56,121,4291411 in 398,024
"Regular" DigiCert40,397,363811 in 498,732
Sectigo88,323,0681701 in 519,547
All DigiCert144,713,475811 in 1,786,586
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480

In short, it appears that DigiCert’s regular customers are just as likely as GoDaddy or Sectigo customers to expose their private keys.

What Does It All Mean?

The takeaway from all this is fairly straightforward, and not overly surprising, I believe.

The less humans have to do with certificate issuance, the less likely they are to compromise that certificate by exposing the private key.

While it may not be surprising, it is nice to have some empirical evidence to back up the common wisdom.

Fully-managed TLS providers, such as CloudFlare, AWS Certificate Manager, and whatever Azure’s thing is called, is the platonic ideal of this principle: never give humans any opportunity to expose a private key. I’m not saying you should use one of these providers, but the security approach they have adopted appears to be the optimal one, and should be emulated universally.

The ACME protocol is the next best, in that there are a variety of standardised tools widely available that allow humans to take themselves out of the loop, but it’s still possible for humans to handle (and mistakenly expose) key material if they try hard enough.

Legacy issuance methods, which either cannot be automated, or require custom, per-provider automation to be developed, appear to be at least four times less helpful to the goal of avoiding compromise of the private key associated with a certificate.

Humans Are, Of Course, The Problem

Bender, the robot from Futurama, asking if we'd like to kill all humans
No thanks, Bender, I'm busy tonight

This observation – that if you don’t let humans near keys, they don’t get leaked – is further supported by considering the biggest issuers by volume who have not issued any certificates whose keys have been compromised: Google Trust Services (fourth largest issuer overall, with 57,084,529 unexpired precertificates), and Microsoft Corporation (sixth largest issuer overall, with 22,852,468 unexpired precertificates). It appears that somewhere between “most” and “basically all” of the certificates these organisations issue are to customers of their public clouds, and my understanding is that the keys for these certificates are managed in same manner as CloudFlare and AWS – the keys are locked away where humans can’t get to them.

It should, of course, go without saying that if a human can never have access to a private key, it makes it rather difficult for a human to expose it.

More broadly, if you are building something that handles sensitive or secret data, the more you can do to keep humans out of the loop, the better everything will be.

Your Support is Appreciated

If you’d like to see more analysis of how key compromise happens, and the lessons we can learn from examining billions of certificates, please show your support by buying me a refreshing beverage. Trawling CT logs is thirsty work.

Appendix: Methodology Limitations

In the interests of clarity, I feel it’s important to describe ways in which my research might be flawed. Here are the things I know of that may have impacted the accuracy, that I couldn’t feasibly account for.


Pwned Certificates on the Fediverse

Posted: Tue, 16 January 2024 | permalink | No comments

As well as the collection and distribution of compromised keys, the pwnedkeys project also matches those pwned keys against issued SSL certificates. I’m excited to announce that, as of the beginning of 2024, all matched certificates are now being published on the Fediverse, thanks to the botsin.space Mastodon server.

Want to know which sites are susceptible to interception and interference, in (near-)real time? Do you have a burning desire to know who is issuing certificates to people that post their private keys in public? Now you can.

How It Works

The process for publishing pwned certs is, roughly, as follows:

  1. All the certificates in Certificate Transparency (CT) logs are hoovered up (using my scrape-ct-log tool, the fastest log scraper in the west!), and the fingerprint of the public key of each certificate is stored in an LMDB datafile.

  2. As new private keys are identified as having been compromised, the fingerprint of that key is checked against all the LMDB files, which map key fingerprints to certificates (actually to CT log entry IDs, from which the certificates themselves are retrieved).

  3. If one or more matches are found, then the certificates using the compromised key are forwarded to the “tooter”, which publishes them for the world to marvel at.

This makes it sound all very straightforward, and it is… in theory. The trick comes in optimising the pipeline so that the five million or so new certificates every day can get indexed on the one slightly middle-aged server I’ve got, without getting backlogged.

Why Don’t You Just Have the Certificates Revoked?

Funny story about that…

I used to notify CAs of certificates they’d issued using compromised keys, which had the effect of requiring them to revoke the associated certificates. However, several CAs disliked having to revoke all those certificates, because it cost them staff time (and hence money) to do so. They went so far as to change their procedures from the standard way of accepting problem reports (emailing a generic attestation of compromise), and instead required CA-specific hoop-jumping to notify them of compromised keys.

Since the effectiveness of revocation in the WebPKI is, shall we say, “homeopathic” at best, I decided I couldn’t be bothered to play whack-a-mole with CAs that just wanted to be difficult, and I stopped sending compromised key notifications to CAs. Instead, now I’m publishing the details of compromised certificates to everyone, so that users can protect themselves directly should they choose to.

Further Work

The astute amongst you may have noticed, in the above “How It Works” description, a bit of a gap in my scanning coverage. CAs can (and do!) issue certificates for keys that are already compromised, including “weak” keys that have been known about for a decade or more (1, 2, 3). However, as currently implemented, the pwnedkeys certificate checker does not automatically find such certificates.

My plan is to augment the CT scraping / cert processing pipeline to check all incoming certificates against the existing (2M+) set of pwned keys. Though, with over five million new certificates to check every day, it’s not necessarily as simple as “just hit the pwnedkeys API for every new cert”. The poor old API server might not like that very much.

Support My Work

If you’d like to see this extra matching happen a bit quicker, I’ve setup a ko-fi supporters page, where you can support my work on pwnedkeys and the other open source software and projects I work on by buying me a refreshing beverage. I would be very appreciative, and your support lets me know I should do more interesting things with the giant database of compromised keys I’ve accumulated.


PostgreSQL Encryption: The Available Options

Posted: Tue, 7 November 2023 | permalink | 5 Comments

On an episode of Postgres FM, the hosts had a (very brief) discussion of data encryption in PostgreSQL. While Postgres FM is a podcast well worth a subscribe, the hosts aren’t data security experts, and so as someone who builds a queryable database encryption system, I found the coverage to be somewhat… lacking. I figured I’d provide a more complete survey of the available options for PostgreSQL-related data encryption.

The Status Quo

By default, when you install PostgreSQL, there is no data encryption at all. That means that anyone who gets access to any part of the system can read all the data they have access to.

This is, of course, not peculiar to PostgreSQL: basically everything works much the same way.

What’s stopping an attacker from nicking off with all your data is the fact that they can’t access the database at all. The things that are acting as protection are “perimeter” defences, like putting the physical equipment running the server in a secure datacenter, firewalls to prevent internet randos connecting to the database, and strong passwords.

This is referred to as “tortoise” security – it’s tough on the outside, but soft on the inside. Once that outer shell is cracked, the delicious, delicious data is ripe for the picking, and there’s absolutely nothing to stop a miscreant from going to town and making off with everything.

It’s a good idea to plan your defenses on the assumption you’re going to get breached sooner or later. Having good defence-in-depth includes denying the attacker to your data even if they compromise the database. This is where encryption comes in.

Storage-Layer Defences: Disk / Volume Encryption

To protect against the compromise of the storage that your database uses (physical disks, EBS volumes, and the like), it’s common to employ encryption-at-rest, such as full-disk encryption, or volume encryption. These mechanisms protect against “offline” attacks, but provide no protection while the system is actually running. And therein lies the rub: your database is always running, so encryption at rest typically doesn’t provide much value.

If you’re running physical systems, disk encryption is essential, but more to prevent accidental data loss, due to things like failing to wipe drives before disposing of them, rather than physical theft. In systems where volume encryption is only a tickbox away, it’s also worth enabling, if only to prevent inane questions from your security auditors. Relying solely on storage-layer defences, though, is very unlikely to provide any appreciable value in preventing data loss.

Database-Layer Defences: Transparent Database Encryption

If you’ve used proprietary database systems in high-security environments, you might have come across Transparent Database Encryption (TDE). There are also a couple of proprietary extensions for PostgreSQL that provide this functionality.

TDE is essentially encryption-at-rest implemented inside the database server. As such, it has much the same drawbacks as disk encryption: few real-world attacks are thwarted by it. There is a very small amount of additional protection, in that “physical” level backups (as produced by pg_basebackup) are protected, but the vast majority of attacks aren’t stopped by TDE. Any attacker who can access the database while it’s running can just ask for an SQL-level dump of the stored data, and they’ll get the unencrypted data quick as you like.

Application-Layer Defences: Field Encryption

If you want to take the database out of the threat landscape, you really need to encrypt sensitive data before it even gets near the database. This is the realm of field encryption, more commonly known as application-level encryption.

This technique involves encrypting each field of data before it is sent to be stored in the database, and then decrypting it again after it’s retrieved from the database. Anyone who gets the data from the database directly, whether via a backup or a direct connection, is out of luck: they can’t decrypt the data, and therefore it’s worthless.

There are, of course, some limitations of this technique.

For starters, every ORM and data mapper out there has rolled their own encryption format, meaning that there’s basically zero interoperability. This isn’t a problem if you build everything that accesses the database using a single framework, but if you ever feel the need to migrate, or use the database from multiple codebases, you’re likely in for a rough time.

The other big problem of traditional application-level encryption is that, when the database can’t understand what data its storing, it can’t run queries against that data. So if you want to encrypt, say, your users’ dates of birth, but you also need to be able to query on that field, you need to choose between one or the other: you can’t have both at the same time.

You may think to yourself, “but this isn’t any good, an attacker that breaks into my application can still steal all my data!”. That is true, but security is never binary. The name of the game is reducing the attack surface, making it harder for an attacker to succeed. If you leave all the data unencrypted in the database, an attacker can steal all your data by breaking into the database or by breaking into the application. Encrypting the data reduces the attacker’s options, and allows you to focus your resources on hardening the application against attack, safe in the knowledge that an attacker who gets into the database directly isn’t going to get anything valuable.

Sidenote: The Curious Case of pg_crypto

PostgreSQL ships a “contrib” module called pg_crypto, which provides encryption and decryption functions. This sounds ideal to use for encrypting data within our applications, as it’s available no matter what we’re using to write our application. It avoids the problem of framework-specific cryptography, because you call the same PostgreSQL functions no matter what language you’re using, which produces the same output.

However, I don’t recommend ever using pg_crypto’s data encryption functions, and I doubt you will find many other cryptographic engineers who will, either.

First up, and most horrifyingly, it requires you to pass the long-term keys to the database server. If there’s an attacker actively in the database server, they can capture the keys as they come in, which means all the data encrypted using that key is exposed. Sending the keys can also result in the keys ending up in query logs, both on the client and server, which is obviously a terrible result.

Less scary, but still very concerning, is that pg_crypto’s available cryptography is, to put it mildly, antiquated. We have a lot of newer, safer, and faster techniques for data encryption, that aren’t available in pg_crypto. This means that if you do use it, you’re leaving a lot on the table, and need to have skilled cryptographic engineers on hand to avoid the potential pitfalls.

In short: friends don’t let friends use pg_crypto.

The Future: Enquo

All this brings us to the project I run: Enquo. It takes application-layer encryption to a new level, by providing a language- and framework-agnostic cryptosystem that also enables encrypted data to be efficiently queried by the database.

So, you can encrypt your users’ dates of birth, in such a way that anyone with the appropriate keys can query the database to return, say, all users over the age of 18, but an attacker just sees unintelligible gibberish. This should greatly increase the amount of data that can be encrypted, and as the Enquo project expands its available data types and supported languages, the coverage of encrypted data will grow and grow. My eventual goal is to encrypt all data, all the time.

If this appeals to you, visit enquo.org to use or contribute to the open source project, or EnquoDB.com for commercial support and hosted database options.


Private Key Redaction: Redux

Posted: Mon, 12 June 2023 | permalink | No comments

[Note: the original version of this post named the author of the referenced blog post, and the tone of my writing could be construed to be mocking or otherwise belittling them. While that was not my intention, I recognise that was a possible interpretation, and I have revised this post to remove identifying information and try to neutralise the tone. On the other hand, I have kept the identifying details of the domain involved, as there are entirely legitimate security concerns that result from the issues discussed in this post.]

I have spoken before about why it is tricky to redact private keys. Although that post demonstrated a real-world, presumably-used-in-the-wild private key, I’ve been made aware of commentary along the lines of this representative sample:

I find it hard to believe that anyone would take their actual production key and redact it for documentation. Does the author have evidence of this in practice, or did they see example keys and assume they were redacted production keys?

Well, buckle up, because today’s post is another real-world case study, with rather higher stakes than the previous example.

When Helping Hurts

Today’s case study begins with someone who attempted to do a very good thing: they wrote a blog post about using HashiCorp Vault to store certificates and their private keys. In his post, they included some “test” data, a certificate and a private key, which they redacted.

Unfortunately, they did not redact these very well. Each base64 “blob” has had one line replaced with all xs. Based on the steps I explained previously, it is relatively straightforward to retrieve the entire, intact private key.

From Bad to OMFG

Now, if this post author had, say, generated a fresh private key (after all, there’s no shortage of possible keys), that would not be worthy of a blog post. As you may surmise, that is not what happened.

After reconstructing the insufficiently-redacted private key, you end up with a key that has a SHA256 fingerprint (in hex) of:

72bef096997ec59a671d540d75bd1926363b2097eb9fe10220b2654b1f665b54

Searching for certificates which use that key fingerprint, we find one result: a certificate for hiltonhotels.jp (and a bunch of other, related, domains, as subjectAltNames). As of the time of writing, that certificate is not marked as revoked, and appears to be the same certificate that is currently presented to visitors of that site.

This is, shall we say, not great.

Anyone in possession of this private key – which, I should emphasise, has presumably been public information since the post’s publication date of February 2023 – has the ability to completely transparently impersonate the sites listed in that certificate. That would provide an attacker with the ability to capture any data a user entered, such as personal information, passwords, or payment details, and also modify what the user’s browser received, including injecting malware or other unpleasantness.

In short, no good deed goes unpunished, and this attempt to educate the world at large about the benefits of secure key storage has instead published private key material. Remember, kids: friends don’t let friends post redacted private keys to the Internet.


dev-dependencies and Rust's unused_crate_dependencies lint

Posted: Sun, 30 April 2023 | permalink | No comments

I’m in the process of getting super-strict about the code quality of cretrit, the comparison-revealing encryption library that underlies the queryable encryption of the Enquo project. While I’m going to write a whole big thing about Rust linting in the future, I bumped across a rather gnarly problem that I thought was worth sharing separately. The problem, in short, is that the unused_crate_dependencies lint interacts badly with crates that are only needed for benchmarking, such as (in my case) criterion.

Rust has a whole bucketload of “lints” that can help your codebase adhere to certain standards, by warning (or exploding) if the problem is detected. The unused_crate_dependencies lint, as the name suggests, gets snippy when there’s a crate listed in your Cargo.toml that doesn’t appear to be used anywhere.

All well and good so far. However, while Rust has the ability to specify “crates needed for running the testsuite” (the [dev-dependencies] section of Cargo.toml) separately from “crates needed for actually using this thing” ([dependencies]), it doesn’t have a way to specify “crates needed for running the benchmarks”. That is a problem when you’re using something like criterion for benchmarking, because it doesn’t get refered to at all in your project code – it only gets used in the benchmarks.

When building your codebase for running the test suite, the compiler sees that you’ve specified criterion as one of your “testsuite dependencies”, but it never gets used in your testsuite. This causes the unused_crate_dependencies lint to lose its tiny mind, and make your build look ugly (or fail).

Thankfully, the solution is very simple, once you know the trick (this could be the unofficial theme song of the entire Rust ecosystem). You need to refer to the criterion crate somewhere in the code that gets built during the testsuite. The lint tells you most of what you need to do (like most things Rust, it tries hard to be helpful), but since it’s a development dependency, you need a little extra secret sauce.

All you need to do is add these two lines to the bottom of your src/lib.rs (or src/main.rs for a binary crate):

#[cfg(test)]
use criterion as _;

For the less Rust-literate, this means “when the build-time configuration flag test is set, import the criterion crate, but don’t, like, allow it to actually be referred to”. This is enough to make the lint believe that the dependency is being used, and making it only happen when the test build-time config flag is set avoids the ugliness of it trying to refer to the crate during regular builds (which would fail, because criterion is only a dev-dependency).

Simple? Yes. Did it take me a lot of skull-sweat to figure out? You betcha. That’s why I’m writing it down – even if everyone else already knows this, at least future Matt will find this post next time he trips over this problem.


Rutie and Magnus, Two Good Ways to Build Ruby Extensions in Rust

Posted: Tue, 18 April 2023 | permalink | No comments

I wrote the Ruby bindings for the Enquo Project, my attempt to bring queryable encryption to all databases, using the Rutie library. Recently, I’ve rewritten the bindings to use Magnus instead, and I thought I’d put down my thoughts about the whole situation.

The Story So Far

The Enquo Project core cryptography is all written in Rust, as seems to be the vogue these days. Rust is fast, safe, and easily interoperable with most of the rest of the modern software development ecosystem, making it a good choice as a language to implement the cryptographic primitives that Enquo needs, like Order-Revealing Encryption.

Of course, since not everyone writes their applications in Rust, we need to provide the functionality of the Enquo client in the languages that people do use, such as Ruby, Python, and so on. Since re-writing all that cryptographic code in a myriad of languages would be tedious and error-prone, we instead provide bindings to the “core” Rust code. These are just thin shims of code that translate the data types and function calls between Rust and the target language.

Shim in a Can
Wrong sort of shim, but canned language bindings would be handy

As I’m most familiar with Ruby and its development ecosystem (particularly Ruby on Rails), it was natural that I’d make Ruby bindings for Enquo as my first target. Rummaging around, it seemed that Rutie was a good library to use, so off I went.

What are Rutie and Magnus, Anyway?

Both libraries share the same goal: provide the ability to write some Rust code, run that through a compiler, and produce something that can be loaded by the Ruby interpreter and used just like any other Ruby class. They’re both fairly “high level” interfaces, trying to abstract away much of the gory details, and do a lot of the common “heavy lifting” that can make writing bindings fiddly and annoying. Things like mapping data types (like strings and integers) between Rust data types and the closest equivalents in Ruby.

This mapping never goes perfectly smoothly. For example, Ruby integers don’t have a fixed range of values they can represent – you can store a huge number like 2256 more-or-less as easily as you can the number 12. But Rust, being a lower-level language, only has a set of integer types that have fixed boundaries, like the u32 type, which can only store integers between zero and about four billion (232 - 1, to be precise).

There’s also lots of little things that need to be just right, also, like translating the different memory management approaches of the languages, and dealing with a myriad of fiddly little issues like passing arguments and return values in and out of method calls, helpers for defining classes and methods (and pointing to the correct Rust functions), and so on.

A mass of tangled pipes and valves
This is what I imagine it looks like inside these libraries
(Hervé Cozanet / Wikimedia Commons, CC-BY-SA)

All in all, these libraries are fairly significant pieces of work, and I’m mighty glad that someone else has taken on the job of building (and maintaining!) them.

So Why the Change?

Good question.

It’s important to say at the outset that there’s nothing particularly wrong with Rutie. I found using Rutie to be very straightforward, and the Ruby bindings came together very quickly and easily. If someone chose to use Rutie for their project, I’m sure they’d have a good experience.

What made me take the time to rewrite using Magnus was a set of a few tiny things, which together gave me enough of a shove to do the work.

Firstly, I’d had a hiccup with Rutie’s support of newer versions of Ruby, particularly 3.2 (PR). Also, I’d hit a couple of segfault issues, which were ultimately caused by Ruby garbage-collecting data out from underneath me. These were ultimately my fault, of course, but Rutie wasn’t helping me out in avoiding the problems in the first place.

Finally, while Rutie helped translate data types, there was still a bit of boilerplate and ugliness that needed to be included. This wasn’t a showstopper, but I’m appreciating the extra smoothness that Magnus provides here.

As an example, here’s what’s required in Rutie to get “native” Rust data types from Ruby method parameters (and the self reference to the current object):

fn enquo_field_decrypt_text(ciphertext_obj: RString, context_obj: RString) -> RString {
    let ciphertext = ciphertext_obj.to_str_unchecked();
    let context = context_obj.to_vec_u8_unchecked();

    let field = rbself.get_data(&*FIELD_WRAPPER);
    // etc etc etc

The equivalent in Magnus is just the function signature:

fn decrypt_text(&self, ciphertext: String, context: String) -> Result<String, magnus::Error> {

You can also see there that Magnus signals an exception via the Result return value, while Rutie’s approach to raising an exception involves poking the Ruby VM directly, which always struck me as a bit ugly.

There are several other minor things in Magnus (like its cleaner approach to wrapping structs so they can be stored in Ruby objects) that I’m appreciating, too. Never discount the power of ergonomics for making a happy developer.

The End Result

I spent a bit over half of last weekend doing the rewrite – maybe ten hours of so. Since Magnus did more type checking and data validation, and its approach to error handling was smoother, I took the opportunity to rewrite a bunch of Ruby “wrapper” code I’d written (which just existed to check things like ranges of values and string encodings) into Rust, as well.

To make sure that the conversion was accurate, I added a heap more unit tests to the bindings. I also took the opportunity to restructure the codebase to split the code for the different Ruby classes into separate files, which I hadn’t done initially as the code had originally accreted, rather than being purposefully written.

All up, though, my rewrite ended up removing over 60 lines (excluding the extra specs I added):

$ git diff --stat -- lib ext/enquo/src
 ruby/ext/enquo/src/field.rs       | 342 ++++++++++++++++++++++++++++++++++++++
 ruby/ext/enquo/src/lib.rs         | 338 ++++---------------------------------
 ruby/ext/enquo/src/root.rs        |  39 +++++
 ruby/ext/enquo/src/root_key.rs    |  67 ++++++++
 ruby/lib/enquo.rb                 |   6 +-
 ruby/lib/enquo/field.rb           | 173 -------------------
 ruby/lib/enquo/root.rb            |  28 ----
 ruby/lib/enquo/root_key.rb        |   1 -
 ruby/lib/enquo/root_key/static.rb |  27 ---
 9 files changed, 479 insertions(+), 542 deletions(-)

Considering that I was translating from a “higher level” language into a “lower level” one, the removal of so much code is quite remarkable. Magnus was able to automagically replace rather a lot of raise ArgumentError if something.isnt_right code in those .rb files.

So, in conclusion, if you, too, are building Ruby extensions in Rust, while Rutie is a solid choice (and you probably should stick with it if you’re already using it), I highly recommend giving Magnus a look for your next extension.


Database Encryption: If It's So Good, Why Isn't Everyone Doing It?

Posted: Fri, 7 April 2023 | permalink | No comments

a wordcloud of organisations who have been reported to have had data breaches in 2022
Just some of the organisations that leaked data in 2022

It seems like just about every day there’s another report of another company getting “hacked” and having its sensitive data (or, worse, the sensitive data of its customers) stolen. Sometimes, people’s most intimate information gets dumped for the world to see. Other times it’s “just” used for identity theft, extortion, and other crimes. In the least worst case, the attacker gets cold feet, but people suffer stress and inconvenience from having to replace identity documents.

A great way to protect information from being leaked is to encrypt it. We encrypt data while it’s being sent over the Internet (with TLS), and we encrypt it when it’s “at rest” (with disk or volume encryption). Yet, everyone’s data seems to still get stolen on a regular basis. Why?

Because the data is kept online in an unencrypted form, sitting in the database while its being used. This means that attackers can just connect to the database, or trick the application into dumping the database, and all the data is just lying there, waiting to be misused.

It’s Not the Devs’ Fault, Though

You may be thinking that leaving an entire database full of sensitive data unencrypted seems like a terrible idea. And you’re right: it is a terrible idea. But it’s seemingly unavoidable.

The problem is that in order to do what a database does best (query, sort, and aggregate data), it needs to be able to know what the data is. When you encrypt data, however, all the database sees is a locked box.

a locked box
Not very useful for a database

The database can’t tell what’s in the locked box – whether it’s a number equal to 42, or a date that’s less than 2023-01-01, or a string that contains the substring “foo”. Every value is just an opaque blob of “stuff”, and the database is rendered completely useless.

Since modern applications usually rely pretty heavily on their database, it’s essentially impossible to build an application if you’ve turned your database into a glorified flat-file by encrypting everything in it. Thus, it’s hardly surprising that developers have to leave the data laying around unencrypted, for anyone to come along and take.

Introducing Enquo

I said before that having data unencrypted in a database is seemingly unavoidable. That’s because there are some innovative cryptographic techniques that can make it possible to query encrypted data.

Andy Dwyer being amazed
Indeed

The purpose of the Enquo project is to provide a common set of cryptographic primitives that implement ENcrypted QUery Operations (ie “Enquo”), and integrate those operations into databases, ORMs, and anywhere else that could benefit. The end goal is to provide the ability to encrypt all the data stored in any database server, while still allowing the data to be queried and aggregated.

So far, the project consists of these components:

Support for other languages and ORMs is designed to be as straightforward as possible, and integration with other databases is mostly dependent on their own extensibility.

The project’s core tenets emphasise both uncompromising security, and a friendly developer experience.

Naturally, all Enquo code is open source, released under the MIT licence.

Would You Like To Know More?

Desire to know more intensifies
Everyone who uses a database...

If all this sounds relevant to your interests:

  1. If you use Ruby on Rails and PostgreSQL, you’re halfway home already. Follow the ActiveEnquo getting started tutorial and see how much of your data Enquo can already protect. When you find data you want to encrypt but can’t, tell me about it.

    • If you use Ruby and PostgreSQL with another ORM, such as Sequel, writing a plugin to support Enquo shouldn’t be too difficult. The ActiveEnquo code should give you a good start. If you get stuck, get in touch.
  2. If you use PostgreSQL with another programming language, tell me what language you use and we’ll work together to get bindings for that library created.

  3. If you use another database server, support is coming for your database of choice eventually, but at present there’s no timeline on support. On the off chance that you happen to be a hard-core database hacking expert, and would like to work on getting Enquo support in your preferred database server, I’d love to talk to you.


Discovering AWS IAM accounts

Posted: Thu, 7 October 2021 | permalink | No comments

Let’s say you’re someone who happens to discover an AWS account number, and would like to take a stab at guessing what IAM users might be valid in that account. Tricky problem, right? Not with this One Weird Trick!

In your own AWS account, create a KMS key and try to reference an ARN representing an IAM user in the other account as the principal. If the policy is accepted by PutKeyPolicy, then that IAM account exists, and if the error says “Policy contains a statement with one or more invalid principals” then the user doesn’t exist.

As an example, say you want to guess at IAM users in AWS account 111111111111. Then make sure this statement is in your key policy:

{
  "Sid": "Test existence of user",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::111111111111:user/bob"
  },
  "Action": "kms:DescribeKey",
  "Resource": "*"
}

If that policy is accepted, then the account has an IAM user named bob. Otherwise, the user doesn’t exist. Scripting this is left as an exercise for the reader.

Sadly, wildcards aren’t accepted in the username portion of the ARN, otherwise you could do some funky searching with ...:user/a*, ...:user/b*, etc. You can’t have everything; where would you put it all?

I did mention this to AWS as an account enumeration risk. They’re of the opinion that it’s a good thing you can know what users exist in random other AWS accounts. I guess that means this is a technique you can put in your toolbox safe in the knowledge it’ll work forever.

Given this is intended behaviour, I assume you don’t need to use a key policy for this, but that’s where I stumbled over it. Also, you can probably use it to enumerate roles and anything else that can be a principal, but since I don’t see as much use for that, I didn’t bother exploring it.

There you are, then. If you ever need to guess at IAM users in another AWS account, now you can!