2020-02-29

MongoDB Agrees: Field Level Encryption is Important

Keep gut healthy. Trust gut.

I often second-guess myself. For the past couple of years, I've been trying to follow my gut more often. When my gut is healthy, I often end up confirming my initial assumptions.

Keep gut healthy. Trust gut.

Field Level Encryption (FLE?) for JSON serialization is one of those instances.

MongoDB Announces Field Level Encryption Feature

MongoDB added support for client-side field level encryption in their version 4.2 release, announced way back in June. Just today, MongoDB's account posted on LinkedIn with a link to a new FAQ and webinar on the subject, which is how I realized that they, too, agree it can be a useful complement to data-at-rest encryption for especially sensitive data, particularly users' personally identifiable information:

Our implementation of FLE...

Great! So I should have trusted my gut on the FLE abbreviation.

...is totally separated from the database, making it transparent to the server, and instead handled exclusively within the MongoDB drivers on the client (hence the name Client-Side Field Level Encryption). All encrypted fields on the server - stored in-memory, in system logs, at-rest, and in backups - are rendered as ciphertext, making them unreadable to any party who does not have client access along with the keys necessary to decrypt the data.

This is a different and more comprehensive approach than the column encryption used in many relational databases. As most handle encryption server-side, data is still accessible to administrators who have access to the database instance itself, even if they have no client access privileges.

Exactly what I figured would be useful for JsonCryption. Good start!

Let's see if they validate anything else...

More Indirect Validations of JsonCryption from MongoDB

Where is FLE most useful for you?

Regulatory Compliance

FLE makes it easier to comply with “right to be forgotten” conditions in new privacy regulations such as the GDPR and the CCPA - simply destroy the customer key and the associated personal data is rendered useless.

Another key motivation for JsonCryption was to comply with GDPR and the CCPA. I like the angle for complying with the "right to be forgotten" mentioned here, though. I hadn't thought of that. To make this work, I'll have to tweak the creation of IDataProtector instances to allow better configuration, so that consumers of JsonCryption will have the ability to create a unique IDataProtector instance per user, if they wish.
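To make the "destroy the key" idea concrete, here is a minimal sketch of one key per user. It is not how JsonCryption or the DataProtection package works today; it uses the framework's AesGcm class directly, and PerUserFieldCipher, Protect, Unprotect, and Forget are hypothetical names invented for illustration. Keys live in an in-memory dictionary purely for the demo, where a real system would keep them in a KMS.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch: one AES-256 key per user, held in memory only.
public sealed class PerUserFieldCipher
{
    private const int NonceSize = 12; // AES-GCM nonce length in bytes
    private const int TagSize = 16;   // AES-GCM authentication tag length in bytes

    private readonly Dictionary<string, byte[]> _keys = new Dictionary<string, byte[]>();

    // Encrypts one field value under the user's key, creating the key on first use.
    public string Protect(string userId, string plaintext)
    {
        if (!_keys.TryGetValue(userId, out var key))
        {
            key = new byte[32];
            RandomNumberGenerator.Fill(key);
            _keys[userId] = key;
        }

        var nonce = new byte[NonceSize];
        RandomNumberGenerator.Fill(nonce); // fresh random nonce for every value
        var plain = Encoding.UTF8.GetBytes(plaintext);
        var cipher = new byte[plain.Length];
        var tag = new byte[TagSize];

        // On .NET 8+ the constructor overload that also takes the tag size is preferred.
        using (var aes = new AesGcm(key))
        {
            aes.Encrypt(nonce, plain, cipher, tag);
        }

        // Layout: nonce || tag || ciphertext, Base64-encoded so it fits in a JSON string.
        var output = new byte[NonceSize + TagSize + cipher.Length];
        Buffer.BlockCopy(nonce, 0, output, 0, NonceSize);
        Buffer.BlockCopy(tag, 0, output, NonceSize, TagSize);
        Buffer.BlockCopy(cipher, 0, output, NonceSize + TagSize, cipher.Length);
        return Convert.ToBase64String(output);
    }

    // Decrypts a value; fails if the user's key has been destroyed.
    public string Unprotect(string userId, string protectedText)
    {
        if (!_keys.TryGetValue(userId, out var key))
            throw new CryptographicException("No key for this user; the data can no longer be read.");

        var input = Convert.FromBase64String(protectedText);
        if (input.Length < NonceSize + TagSize)
            throw new CryptographicException("Protected value is too short.");

        var nonce = new byte[NonceSize];
        var tag = new byte[TagSize];
        var cipher = new byte[input.Length - NonceSize - TagSize];
        Buffer.BlockCopy(input, 0, nonce, 0, NonceSize);
        Buffer.BlockCopy(input, NonceSize, tag, 0, TagSize);
        Buffer.BlockCopy(input, NonceSize + TagSize, cipher, 0, cipher.Length);

        var plain = new byte[cipher.Length];
        using (var aes = new AesGcm(key))
        {
            aes.Decrypt(nonce, cipher, tag, plain);
        }
        return Encoding.UTF8.GetString(plain);
    }

    // "Right to be forgotten": wipe and drop the key so stored ciphertext becomes useless.
    public void Forget(string userId)
    {
        if (_keys.Remove(userId, out var key))
            CryptographicOperations.ZeroMemory(key);
    }
}
```

After Forget("alice"), every value previously protected for alice fails to decrypt, even though the ciphertext itself is still sitting in the database and its backups.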

What else we got?

Key Management Systems

With the addition of FLE, encryption keys are protected in an isolated, customer-managed KMS account. Atlas SREs and Product Engineers have no mechanism to access FLE KMS keys, rendering data unreadable to any MongoDB personnel managing the underlying database or infrastructure for you.

More confirmation of design decisions for JsonCryption! The primary reason I ended up going with the Microsoft.AspNetCore.DataProtection package for the actual encryption layer was to gain industry-standard KMS (Key Management System) functionality. This is essential for any serious consumers of JsonCryption.

Other questions I've been thinking about but haven't been able to dive in quite yet...

Performance

What is the performance impact of FLE?

JsonCryption is designed to be used as a plugin for JSON serialization. So speed matters. But encryption can be slow. That's why, from the beginning, I've been planning a future post (or several) on benchmarking results, though I have yet to run the benchmarks.

Additionally, performance gains could be found by supporting JSON serialization packages other than Newtonsoft.Json, which has a reputation for being slow. To that end, I'm currently in the middle of a pretty sweet implementation that works with the blazing-fast Utf8Json, described on its GitHub as being:

Definitely Fastest and Zero Allocation JSON Serializer for C#(NET, .NET Core, Unity, Xamarin).

They seem to back it up with solid benchmark results.

So far, this has been a lot of fun as I've had the opportunity to explore new techniques, particularly in writing Expressions. To wire JsonCryption into Utf8Json, I need to do a significant amount of runtime generic method resolution and PropertyInfo getting and setting. I would typically do all of this with Reflection. Reflection is slow. Choosing a very fast serializer (Utf8Json) and then chopping off its legs in the encryption layer by relying so heavily on Reflection would completely defeat the purpose.

So instead of using Reflection constantly, I'm using a bit of Reflection to build Expression trees, which I then compile at runtime into generic methods, getters, and setters, which are finally cached per type being serialized. It's not a new technique by any means - Jon Skeet blogged about a flavor (he would say "flavour") of it all the way back in 2008 - but it's new to me.
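A minimal sketch of that technique, limited to property getters and setters (the generic-method part is left out). This is not JsonCryption's actual code: PropertyAccessorCache and its members are hypothetical names, and the sketch assumes writable properties declared on classes rather than structs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical helper: Reflection runs once per property to build an
// Expression tree, which is compiled to a delegate and cached for reuse.
public static class PropertyAccessorCache
{
    private static readonly ConcurrentDictionary<PropertyInfo, Func<object, object>> Getters =
        new ConcurrentDictionary<PropertyInfo, Func<object, object>>();

    private static readonly ConcurrentDictionary<PropertyInfo, Action<object, object>> Setters =
        new ConcurrentDictionary<PropertyInfo, Action<object, object>>();

    public static Func<object, object> GetGetter(PropertyInfo property) =>
        Getters.GetOrAdd(property, p => BuildGetter(p));

    public static Action<object, object> GetSetter(PropertyInfo property) =>
        Setters.GetOrAdd(property, p => BuildSetter(p));

    // Builds: (object instance) => (object)((TDeclaring)instance).Property
    private static Func<object, object> BuildGetter(PropertyInfo property)
    {
        var instance = Expression.Parameter(typeof(object), "instance");
        var typedInstance = Expression.Convert(instance, property.DeclaringType);
        var read = Expression.Property(typedInstance, property);
        var boxed = Expression.Convert(read, typeof(object));
        return Expression.Lambda<Func<object, object>>(boxed, instance).Compile();
    }

    // Builds: (object instance, object value) =>
    //     ((TDeclaring)instance).Property = (TProperty)value
    private static Action<object, object> BuildSetter(PropertyInfo property)
    {
        var instance = Expression.Parameter(typeof(object), "instance");
        var value = Expression.Parameter(typeof(object), "value");
        var typedInstance = Expression.Convert(instance, property.DeclaringType);
        var typedValue = Expression.Convert(value, property.PropertyType);
        var assign = Expression.Assign(Expression.Property(typedInstance, property), typedValue);
        return Expression.Lambda<Action<object, object>>(assign, instance, value).Compile();
    }
}
```

The first call for a given PropertyInfo pays for building and compiling the tree. Every later call is a dictionary lookup plus a delegate invocation, which is the whole point of the trade.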

Anyway, I should have more on that soon.

Back to MongoDB...

FLE and Regular At-Rest Encryption are Complementary

What is the relationship between Client-Side FLE and regular at-rest encryption?

They are independent, but complementary to one another and address different threats. You can encrypt data at-rest as usual. Client-side FLE provides additional securing of data from bad actors on the server side or from unintentional data leaks.

This was a key motivation for JsonCryption from the beginning, as well. You might be able to satisfy the encryption requirements of GDPR with a basic encryption-at-rest policy, but then all a hacker has to do is get past your one layer of encryption and they have access to all of your data. By contrast, with field-level encryption, even if they manage to hack your system and extract all of your data, they still have to break the encryption on individual fields, each of which could theoretically be protected by its own unique key.

Querying Encrypted Fields Poses Challenges

What sort of queries are supported against fields encrypted with Client-Side FLE?

You can perform equality queries on encrypted fields when using deterministic encryption. ...

JsonCryption was primarily designed with Marten in mind. With that, I knew that some sacrifices may need to be made when it comes to querying encrypted values. As of now, I haven't tested or played around with any scenarios involving querying encrypted fields. For my primary project that's using JsonCryption and Marten, my repositories aren't mature enough to know whether or not I'll need such capabilities. I've been lightly mulling it over in my mind, but for now I'm waiting until a concrete need arises before doing anything about it. In the meantime, if anybody is interested in exploring such things in JsonCryption, have at it, and remember that we take Pull Requests.

JsonCryption Wants to Support Multiple KMSs in the Future

Which key management solutions are compatible with Client-Side FLE?

...

We have designed client-side FLE to be agnostic to specific key management solutions, and plan on adding direct native support for Azure and GCP in the future.

As I mentioned earlier, this was a key motivation behind using the Microsoft.AspNetCore.DataProtection package under the covers to handle the encryption and key management duties. It could be even more flexible, of course. While Microsoft's package offers impressive functionality and an inviting API for defining key management policies, other libraries exist that cover similar or additional ground. Adding a configuration API and an Adapter Layer in JsonCryption to support additional Key Management Systems could be a good future extension point.
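One way such an adapter layer might look, as a rough sketch only. None of these types exist in JsonCryption: IFieldProtector, DelegateFieldProtector, and FieldProtectorRegistry are hypothetical names, and the only assumption about a backend is that it can protect and unprotect byte arrays.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical seam: everything the serializer would need from an encryption backend.
public interface IFieldProtector
{
    string Protect(string plaintext);
    string Unprotect(string protectedText);
}

// Adapter that wraps any pair of byte-level functions as an IFieldProtector.
public sealed class DelegateFieldProtector : IFieldProtector
{
    private readonly Func<byte[], byte[]> _protect;
    private readonly Func<byte[], byte[]> _unprotect;

    public DelegateFieldProtector(Func<byte[], byte[]> protect, Func<byte[], byte[]> unprotect)
    {
        _protect = protect ?? throw new ArgumentNullException(nameof(protect));
        _unprotect = unprotect ?? throw new ArgumentNullException(nameof(unprotect));
    }

    // Base64 keeps the protected bytes safe to embed in a JSON string.
    public string Protect(string plaintext) =>
        Convert.ToBase64String(_protect(Encoding.UTF8.GetBytes(plaintext)));

    public string Unprotect(string protectedText) =>
        Encoding.UTF8.GetString(_unprotect(Convert.FromBase64String(protectedText)));
}

// Configuration point: backends are registered by name and resolved at runtime.
public sealed class FieldProtectorRegistry
{
    private readonly Dictionary<string, IFieldProtector> _backends =
        new Dictionary<string, IFieldProtector>(StringComparer.OrdinalIgnoreCase);

    public FieldProtectorRegistry Register(string name, IFieldProtector protector)
    {
        _backends[name] = protector ?? throw new ArgumentNullException(nameof(protector));
        return this; // allows chained registration
    }

    public IFieldProtector Resolve(string name) =>
        _backends.TryGetValue(name, out var protector)
            ? protector
            : throw new KeyNotFoundException("No key management backend registered as '" + name + "'.");
}
```

With this shape, an existing IDataProtector could be plugged in as new DelegateFieldProtector(b => protector.Protect(b), b => protector.Unprotect(b)), and a different KMS client would only need its own pair of delegates.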

Where Can It Be Used?

Where can I use Client-Side FLE?

For them, anywhere MongoDB is used (obviously). This sounds like a fantastic feature. If I were using MongoDB for my other project, I would abandon JsonCryption and use MongoDB's solution. I would also feel really stupid for spending time working on throwaway code.

However, I'm using PostgreSQL because I like Marten for what I'm doing, so I still need another solution. JsonCryption meets this need, and it's technically database-agnostic, as long as the JSON serializer for your project is configurable/customizable.

Off to a Good Start

I'm pretty excited reading this update from MongoDB (can you tell?). Partly because it's clear that FLE is an emerging thing, and partly because many of my early assumptions and motivations at the start of JsonCryption were validated by one of the most important global companies in the JSON serialization and storage space.

There's still a lot of work that needs to be done to make JsonCryption into what it could be, but I see the potential it has and get pretty excited. If anybody wants to help along the way, please reach out. JsonCryption deserves better developers than me working on this.