On an episode of Postgres FM, the hosts had a (very brief) discussion of data encryption in PostgreSQL. While Postgres FM is a podcast well worth a subscribe, the hosts aren’t data security experts, and so as someone who builds a queryable database encryption system, I found the coverage to be somewhat… lacking. I figured I’d provide a more complete survey of the available options for PostgreSQL-related data encryption.
The Status Quo
By default, when you install PostgreSQL, there is no data encryption at all. That means that anyone who gets access to any part of the system can read all the data they have access to.
This is, of course, not peculiar to PostgreSQL: basically everything works much the same way.
What’s stopping an attacker from nicking off with all your data is the fact that they can’t access the database at all. The things that are acting as protection are “perimeter” defences, like putting the physical equipment running the server in a secure datacenter, firewalls to prevent internet randos connecting to the database, and strong passwords.
This is referred to as “tortoise” security – it’s tough on the outside, but soft on the inside. Once that outer shell is cracked, the delicious, delicious data is ripe for the picking, and there’s absolutely nothing to stop a miscreant from going to town and making off with everything.
It’s a good idea to plan your defenses on the assumption you’re going to get breached sooner or later. Having good defence-in-depth includes denying the attacker to your data even if they compromise the database. This is where encryption comes in.
Storage-Layer Defences: Disk / Volume Encryption
To protect against the compromise of the storage that your database uses (physical disks, EBS volumes, and the like), it’s common to employ encryption-at-rest, such as full-disk encryption, or volume encryption. These mechanisms protect against “offline” attacks, but provide no protection while the system is actually running. And therein lies the rub: your database is always running, so encryption at rest typically doesn’t provide much value.
If you’re running physical systems, disk encryption is essential, but more to prevent accidental data loss, due to things like failing to wipe drives before disposing of them, rather than physical theft. In systems where volume encryption is only a tickbox away, it’s also worth enabling, if only to prevent inane questions from your security auditors. Relying solely on storage-layer defences, though, is very unlikely to provide any appreciable value in preventing data loss.
Database-Layer Defences: Transparent Database Encryption
If you’ve used proprietary database systems in high-security environments, you might have come across Transparent Database Encryption (TDE). There are also a couple of proprietary extensions for PostgreSQL that provide this functionality.
TDE is essentially encryption-at-rest implemented inside the database server.
As such, it has much the same drawbacks as disk encryption: few real-world attacks are thwarted by it.
There is a very small amount of additional protection, in that “physical” level backups (as produced by
pg_basebackup) are protected, but the vast majority of attacks aren’t stopped by TDE.
Any attacker who can access the database while it’s running can just ask for an SQL-level dump of the stored data, and they’ll get the unencrypted data quick as you like.
Application-Layer Defences: Field Encryption
If you want to take the database out of the threat landscape, you really need to encrypt sensitive data before it even gets near the database. This is the realm of field encryption, more commonly known as application-level encryption.
This technique involves encrypting each field of data before it is sent to be stored in the database, and then decrypting it again after it’s retrieved from the database. Anyone who gets the data from the database directly, whether via a backup or a direct connection, is out of luck: they can’t decrypt the data, and therefore it’s worthless.
There are, of course, some limitations of this technique.
For starters, every ORM and data mapper out there has rolled their own encryption format, meaning that there’s basically zero interoperability. This isn’t a problem if you build everything that accesses the database using a single framework, but if you ever feel the need to migrate, or use the database from multiple codebases, you’re likely in for a rough time.
The other big problem of traditional application-level encryption is that, when the database can’t understand what data its storing, it can’t run queries against that data. So if you want to encrypt, say, your users’ dates of birth, but you also need to be able to query on that field, you need to choose between one or the other: you can’t have both at the same time.
You may think to yourself, “but this isn’t any good, an attacker that breaks into my application can still steal all my data!”. That is true, but security is never binary. The name of the game is reducing the attack surface, making it harder for an attacker to succeed. If you leave all the data unencrypted in the database, an attacker can steal all your data by breaking into the database or by breaking into the application. Encrypting the data reduces the attacker’s options, and allows you to focus your resources on hardening the application against attack, safe in the knowledge that an attacker who gets into the database directly isn’t going to get anything valuable.
Sidenote: The Curious Case of
PostgreSQL ships a “contrib” module called
pg_crypto, which provides encryption and decryption functions.
This sounds ideal to use for encrypting data within our applications, as it’s available no matter what we’re using to write our application.
It avoids the problem of framework-specific cryptography, because you call the same PostgreSQL functions no matter what language you’re using, which produces the same output.
However, I don’t recommend ever using
pg_crypto’s data encryption functions, and I doubt you will find many other cryptographic engineers who will, either.
First up, and most horrifyingly, it requires you to pass the long-term keys to the database server. If there’s an attacker actively in the database server, they can capture the keys as they come in, which means all the data encrypted using that key is exposed. Sending the keys can also result in the keys ending up in query logs, both on the client and server, which is obviously a terrible result.
Less scary, but still very concerning, is that
pg_crypto’s available cryptography is, to put it mildly, antiquated.
We have a lot of newer, safer, and faster techniques for data encryption, that aren’t available in
This means that if you do use it, you’re leaving a lot on the table, and need to have skilled cryptographic engineers on hand to avoid the potential pitfalls.
In short: friends don’t let friends use
The Future: Enquo
All this brings us to the project I run: Enquo. It takes application-layer encryption to a new level, by providing a language- and framework-agnostic cryptosystem that also enables encrypted data to be efficiently queried by the database.
So, you can encrypt your users’ dates of birth, in such a way that anyone with the appropriate keys can query the database to return, say, all users over the age of 18, but an attacker just sees unintelligible gibberish. This should greatly increase the amount of data that can be encrypted, and as the Enquo project expands its available data types and supported languages, the coverage of encrypted data will grow and grow. My eventual goal is to encrypt all data, all the time.