PostgreSQL Encryption: The Available Options

Posted: Tue, 7 November 2023 | permalink | 5 Comments

On an episode of Postgres FM, the hosts had a (very brief) discussion of data encryption in PostgreSQL. While Postgres FM is a podcast well worth a subscribe, the hosts aren’t data security experts, and so as someone who builds a queryable database encryption system, I found the coverage to be somewhat… lacking. I figured I’d provide a more complete survey of the available options for PostgreSQL-related data encryption.

The Status Quo

By default, when you install PostgreSQL, there is no data encryption at all. That means that anyone who gets access to any part of the system can read all the data they have access to.

This is, of course, not peculiar to PostgreSQL: basically everything works much the same way.

What’s stopping an attacker from nicking off with all your data is the fact that they can’t access the database at all. The things that are acting as protection are “perimeter” defences, like putting the physical equipment running the server in a secure datacenter, firewalls to prevent internet randos connecting to the database, and strong passwords.

This is referred to as “tortoise” security – it’s tough on the outside, but soft on the inside. Once that outer shell is cracked, the delicious, delicious data is ripe for the picking, and there’s absolutely nothing to stop a miscreant from going to town and making off with everything.

It’s a good idea to plan your defenses on the assumption you’re going to get breached sooner or later. Having good defence-in-depth includes denying the attacker to your data even if they compromise the database. This is where encryption comes in.

Storage-Layer Defences: Disk / Volume Encryption

To protect against the compromise of the storage that your database uses (physical disks, EBS volumes, and the like), it’s common to employ encryption-at-rest, such as full-disk encryption, or volume encryption. These mechanisms protect against “offline” attacks, but provide no protection while the system is actually running. And therein lies the rub: your database is always running, so encryption at rest typically doesn’t provide much value.

If you’re running physical systems, disk encryption is essential, but more to prevent accidental data loss, due to things like failing to wipe drives before disposing of them, rather than physical theft. In systems where volume encryption is only a tickbox away, it’s also worth enabling, if only to prevent inane questions from your security auditors. Relying solely on storage-layer defences, though, is very unlikely to provide any appreciable value in preventing data loss.

Database-Layer Defences: Transparent Database Encryption

If you’ve used proprietary database systems in high-security environments, you might have come across Transparent Database Encryption (TDE). There are also a couple of proprietary extensions for PostgreSQL that provide this functionality.

TDE is essentially encryption-at-rest implemented inside the database server. As such, it has much the same drawbacks as disk encryption: few real-world attacks are thwarted by it. There is a very small amount of additional protection, in that “physical” level backups (as produced by pg_basebackup) are protected, but the vast majority of attacks aren’t stopped by TDE. Any attacker who can access the database while it’s running can just ask for an SQL-level dump of the stored data, and they’ll get the unencrypted data quick as you like.

Application-Layer Defences: Field Encryption

If you want to take the database out of the threat landscape, you really need to encrypt sensitive data before it even gets near the database. This is the realm of field encryption, more commonly known as application-level encryption.

This technique involves encrypting each field of data before it is sent to be stored in the database, and then decrypting it again after it’s retrieved from the database. Anyone who gets the data from the database directly, whether via a backup or a direct connection, is out of luck: they can’t decrypt the data, and therefore it’s worthless.

There are, of course, some limitations of this technique.

For starters, every ORM and data mapper out there has rolled their own encryption format, meaning that there’s basically zero interoperability. This isn’t a problem if you build everything that accesses the database using a single framework, but if you ever feel the need to migrate, or use the database from multiple codebases, you’re likely in for a rough time.

The other big problem of traditional application-level encryption is that, when the database can’t understand what data its storing, it can’t run queries against that data. So if you want to encrypt, say, your users’ dates of birth, but you also need to be able to query on that field, you need to choose between one or the other: you can’t have both at the same time.

You may think to yourself, “but this isn’t any good, an attacker that breaks into my application can still steal all my data!”. That is true, but security is never binary. The name of the game is reducing the attack surface, making it harder for an attacker to succeed. If you leave all the data unencrypted in the database, an attacker can steal all your data by breaking into the database or by breaking into the application. Encrypting the data reduces the attacker’s options, and allows you to focus your resources on hardening the application against attack, safe in the knowledge that an attacker who gets into the database directly isn’t going to get anything valuable.

Sidenote: The Curious Case of pg_crypto

PostgreSQL ships a “contrib” module called pg_crypto, which provides encryption and decryption functions. This sounds ideal to use for encrypting data within our applications, as it’s available no matter what we’re using to write our application. It avoids the problem of framework-specific cryptography, because you call the same PostgreSQL functions no matter what language you’re using, which produces the same output.

However, I don’t recommend ever using pg_crypto’s data encryption functions, and I doubt you will find many other cryptographic engineers who will, either.

First up, and most horrifyingly, it requires you to pass the long-term keys to the database server. If there’s an attacker actively in the database server, they can capture the keys as they come in, which means all the data encrypted using that key is exposed. Sending the keys can also result in the keys ending up in query logs, both on the client and server, which is obviously a terrible result.

Less scary, but still very concerning, is that pg_crypto’s available cryptography is, to put it mildly, antiquated. We have a lot of newer, safer, and faster techniques for data encryption, that aren’t available in pg_crypto. This means that if you do use it, you’re leaving a lot on the table, and need to have skilled cryptographic engineers on hand to avoid the potential pitfalls.

In short: friends don’t let friends use pg_crypto.

The Future: Enquo

All this brings us to the project I run: Enquo. It takes application-layer encryption to a new level, by providing a language- and framework-agnostic cryptosystem that also enables encrypted data to be efficiently queried by the database.

So, you can encrypt your users’ dates of birth, in such a way that anyone with the appropriate keys can query the database to return, say, all users over the age of 18, but an attacker just sees unintelligible gibberish. This should greatly increase the amount of data that can be encrypted, and as the Enquo project expands its available data types and supported languages, the coverage of encrypted data will grow and grow. My eventual goal is to encrypt all data, all the time.

If this appeals to you, visit enquo.org to use or contribute to the open source project, or EnquoDB.com for commercial support and hosted database options.


5 Comments

From: Andy
2023-11-07 13:01

Thanks for the overview. I implemented something similar to pg_crypto a while ago, a set of encryption/decryption procedures in pg_python, that talked to encryption service on another machine (holding the keys in RAM only). So it was field encryption, but on database level. Leaves still some loop holes, but the PCI auditors liked the approach ;-)

From: Marco
2023-11-07 17:02

Enquo it’s like what MongoDB already provides to encrypt/decrypt data.

https://www.mongodb.com/docs/manual/core/queryable-encryption/fundamentals/encrypt-and-query/

From: Matt Palmer
2023-11-07 22:40

Yes, MongoDB does have some support for queryable encryption, but Enquo intends to support all databases, and is also completely open source.

From: Sergio Garcia
2023-11-08 20:33

Thanks for writing the project and the idea, I have concerns regarding safety of data especially in times of GDPR (and also our LGPD in Brazil).

It also concerns me (especially when filing security questionnaire) is to check if the solution implemented is actually safe or just to fill a check-mark, and with data encryption this bother me every time.

Using it for text means giving up all text operations in the database (starts with, ends with, reverse, full text search, unicode handling, etc) and end up with just ordering and equality.

Using ORE is a interesting approach for numeric data, but even for it I don’t see to much values considering the overhead necessary for storing the extra data for it to work.

In your i64 docs you mention 640 bytes for storing a single value, for comparison, libx509pq (the library behind crt.sh) can store a entire x509 certificate (public keys, certificate chains, etc) in the same storage size used by two ORE i64, without considering compression.

A dump of the database is a high security concern, but the same way the connection string can be leaked in the application scope the keys for encryption also can be.

From: Matt Palmer
2023-11-08 22:22

Hi Sergio, thanks for sharing your thoughts.

On the text search front, I will hopefully be able to make an announcement about that in the near future. Watch this space!

Yes, the storage overhead is an annoyance, but disk space is way cheaper than a data breach. If it turns out that storing data securely is not economically feasible, perhaps that’s a sign that the data shouldn’t be stored in the first place.

Keys don’t have to be stored locally; the plan (as soon as I get free time, or someone sponsors the work) is to provide pluggable key providers for secure key management services (AWS KMS, etc) where the data to be encrypted/decrypted is sent to the key service, so there’s no chance to leak the keys. Credentials to the key service could be leaked, but rotate the creds (and one should be using short-term creds anyway) and the attacker is locked out again. It’s also much easier to do (eg) anomaly detection on key accesses to discern indications of compromise.

Post a comment

All comments are held for moderation; markdown formatting accepted.

This is a honeypot form. Do not use this form unless you want to get your IP address blacklisted. Use the second form below for comments.
Name: (required)
E-mail: (required, not published)
Website: (optional)
Name: (required)
E-mail: (required, not published)
Website: (optional)