Faster Reflection in .NET for JsonCryption.Utf8Json

  1. I needed to use Reflection to add support for Utf8Json to JsonCryption
  2. I wanted to support Utf8Json because it’s good and fast…
  3. … but, reflection in .NET is sloooowwww…

Thankfully, through C#’s Expression class, we can cache getters, setters, and methods that we discover via System.Reflection initially, so that we can use them in the future without going through System.Reflection each time thereafter.

I’m late to this game, as Jon Skeet first wrote about the technique back in 2008. And I believe others had written about it before him.

Adding support for Utf8Json

From a high-level view, I needed to provide an alternative implementation of Utf8Json.IJsonFormatterResolver, as well as implementations of Utf8Json.IJsonFormatter<T> in order to offer a similar usage API of JsonCryption:

using Utf8Json;

class Foo
{
    [Encrypt]
    public string LaunchCode { get; }
    ...
}

// setup
IJsonFormatterResolver encryptedResolver = new EncryptedResolver(…);

// serialize/deserialize
var myFoo = new Foo { LaunchCode = "password1" };
string json = JsonSerializer.Serialize(myFoo, encryptedResolver);
Foo deserialized = JsonSerializer.Deserialize<Foo>(json, encryptedResolver);

The implementation of IJsonFormatterResolver is trivial, just getting from a cache or creating an instance of IJsonFormatter<T> for each type T. The fun starts with the implementation of IJsonFormatter<T>.

First, an overview

Stepping back for a moment… I don’t want to write a JSON serializer. Whenever possible, JsonCryption should leverage the serialization logic of the given serializer, and only encrypt/decrypt at the correct point in the serialization chain. Something like this:

Without Encryption
  1. .NET Object (POCO)
  2. (serialize)
  3. JSON
  4. (deserialize)
  5. POCO
With Encryption
  1. POCO
  2. (serialize)
  3. JSON
  4. (encrypt)
  5. Encrypted JSON
  6. (decrypt)
  7. JSON
  8. (deserialize)
  9. POCO

Except, this isn’t exactly accurate since JsonCryption is doing Field Level Encryption (FLE). So as written, the encryption path shown above would produce a single blob of cipher text for the Encrypted JSON. We instead want a nice JSON document with only the encrypted Fields represented in cipher text:

{
  id: 123,
  launchCode: <cipher text here...>
}

So really, the process is something more like this:

  1. POCO
  2. (serialize)
  3. (resolve fields)
  4. (serialize/encrypt fields)
  5. JSON …
(serialize/encrypt fields) for a single field
  1. field
  2. (write JSON property name)
  3. (serialize data)
  4. JSON chunk
  5. (encrypt serialized data)
  6. cipher text
  7. (write cipher text as JSON value)

Like this, I (mostly) don’t have to worry about serializing/encrypting primitive, non-primitive, or user-defined objects. For example, if I have something like this…

class Foo
{
    [Encrypt]
    public Bar MyBar { get; }
}

class Bar
{
    public int Countdown { get; }
    public string Message { get; }
}

… then I will first get something like this during the serialization/encryption of MyBar

{ Countown: 99, Message: "Bottles of beer on the wall" }

Which itself is just a string, and therefore straightforward to encrypt, so that the final serialized form of Foo would be something like:

{
  MyBar: <cipher text here...>
}

Finally, since I only want to encrypt properties/fields on custom C# objects that are decorated with EncryptAttribute, I can safely cache an instance of IJsonFormatter<T> for each type that I serialize via JsonSerializer.Serialize(…). This is good news, and now we can begin the fun stuff…

EncryptedFormatter<T> : IJsonFormatter<T>

As mentioned earlier, for each type T, EncryptedFormatter<T> needs to get all properties and fields that should be serialized, serialize each one, encrypt those that should be encrypted, and write everything to the resulting JSON representation of T.

Getting the properties and fields

Getting a list of properties and fields to be serialized is easy with reflection. I can cache the list of resulting MemberInfo‘s to use each time. So far not bad.

Serialize each MemberInfo, encrypting when necessary

When serializing each one, however, some things I need to do include:

  • Get the value from the MemberInfo
  • Determine if it needs to be encrypted
  • Serialize (and possibly encrypt) the value

Get the value from the MemberInfo

With reflection, this is easy, but slow:

object value = fieldInfo.GetValue(instance);

We could be calling this getter many times in client code, so this should be optimized more for speed. Using .NET’s Expression library to build delegates at run-time has a much larger scope than this post, so I’m only going to show end results and maybe discuss a couple points of interest. For now, this was my resulting code to build a compiled delegate at run-time of the getter for a given MemberInfo (PropertyInfo or FieldInfo), so that I could cache it for reuse:

Func<object, object> BuildGetter(MemberInfo memberInfo, Type parentType)
{
    var parameter = Expression.Parameter(ObjectType, "obj");
    var typedParameter = Expression.Convert(parameter, parentType);
    var body = Expression.MakeMemberAccess(typedParameter, memberInfo);
    var objectifiedBody = Expression.Convert(body, ObjectType);
    var lambda = Expression.Lambda<Func<object, object>>(objectifiedBody, parameter);
    return lambda.Compile();
}

This gives me a delegate to use for this particular MemberInfo instance to get its value, bypassing the need to use reflection’s much slower GetValue(object instance) method:

// using reflection
object value = memberInfo.GetValue(instance);

// using the cached delegate
object value = cachedGetter(instance);

As others on the interwebs have mentioned when using this technique, it’s initially slow since we have to compile code at run-time. But after that, it’s essentially as fast as a direct access of the property or field.

Determine if it needs to be encrypted

This is trivial. Just check if it’s decorated by EncryptAttribute and cache that Boolean.

Serialize (and possibly encrypt) the value

Initially, I thought I could get away with just using Utf8Json’s dynamic support when serializing to avoid having to explicitly call the typed JsonSerializer.Serialize<T>(…) method for each MemberInfo. I got it to work for primitives, but not for more complex types.

Hence, I would need to once again use reflection to get the typed Serialize<T> method to use for each MemberInfo at run-time. Since reflection is slow, I also needed to cache this as a compiled delegate:

// signature: JsonSerializer.Serializer<T>(ref JsonWriter writer, T value, IJsonFormatterResolver resolver)

internal delegate void FallbackSerializer(
    ref JsonWriter writer,
    object value,
    IJsonFormatterResolver fallbackResolver);

FallbackSerializer BuildFallbackSerializer(Type type)
{
    var method = typeof(JsonSerializer)
        .GetMethods()
        .Where(m => m.Name == "Serialize")
        .Select(m => (MethodInfo: m, Params: m.GetParameters(), Args: m.GetGenericArguments()))
        .Where(x => x.Params.Length == 3)
        .Where(x => x.Params[0].ParameterType == typeof(JsonWriter).MakeByRefType())
        .Where(x => x.Params[1].ParameterType == x.Args[0])
        .Where(x => x.Params[2].ParameterType == typeof(IJsonFormatterResolver))
        .Single().MethodInfo;

    var generic = method.MakeGenericMethod(type);

    var writerExpr = Expression.Parameter(typeof(JsonWriter).MakeByRefType(), "writer");
    var valueExpr = Expression.Parameter(ObjectType, "obj");
    var resolverExpr = Expression.Parameter(typeof(IJsonFormatterResolver), "resolver");

    var typedValueExpr = Expression.Convert(valueExpr, type);
    var body = Expression.Call(generic, writerExpr, typedValueExpr, resolverExpr);
    var lambda = Expression.Lambda<FallbackSerializer>(body, writerExpr, valueExpr, resolverExpr);
    return lambda.Compile();
}

For this, I needed to use a custom delegate due to the JsonWriter being passed in by reference, which isn’t allowed with the built-in Func<>. Beyond that, everything else should more or less flow from what we did before with the MemberInfo getter.

Ultimately, this allowed me to do something like:

static void WriteDataMember(
    ref JsonWriter writer,
    T value,
    ExtendedMemberInfo memberInfo,
    IJsonFormatterResolver formatterResolver,
    IJsonFormatterResolver fallbackResolver,
    IDataProtector dataProtector)
{
    writer.WritePropertyName(memberInfo.Name);
    object memberValue = memberInfo.Getter(value);
    var valueToSerialize = memberInfo.ShouldEncrypt
        ? BuildEncryptedValue(memberValue, memberInfo, fallbackResolver, dataProtector)
        : BuildNormalValue(memberValue, memberInfo, memberInfo.HasNestedEncryptedMembers, formatterResolver);
    JsonSerializer.Serialize(ref writer, valueToSerialize, fallbackResolver);
}

static string BuildEncryptedValue(
    dynamic memberValue,
    ExtendedMemberInfo memberInfo,
    IJsonFormatterResolver fallbackResolver,
    IDataProtector dataProtector)
{
    var localWriter = new JsonWriter();
    memberInfo.FallbackSerializer(ref localWriter, memberValue, fallbackResolver);
    return dataProtector.Protect(localWriter.ToString());
}

static object BuildNormalValue(
    dynamic memberValue,
    ExtendedMemberInfo memberInfo,
    bool hasNestedEncryptedMembers,
    IJsonFormatterResolver formatterResolver)
{
    if (!hasNestedEncryptedMembers)
        return memberValue;

    var localWriter = new JsonWriter();
    memberInfo.FallbackSerializer(ref localWriter, memberValue, formatterResolver);
    return localWriter.ToString();
}

There are a couple things going on here…

First, I needed to use the localWriter when leaning on Utf8Json to serialize at the intermediate stage, because otherwise it would restart its internal JsonWriter when calling the JsonSerializer.Serialize(instance, fallbackResolver) overload. Things were very weird before I realized what was happening with this.

Second, you’ll see that I needed to do one additional special stage for properties that aren’t marked to be encrypted themselves. This is to take into account nested classes/structs whose children may themselves have encrypted members:

class FooParent
{
    public FooChild Child { get; }
}

class FooChild
{
    [Encrypt]
    public string LaunchCode { get; }
}

Because of the possibility of nesting, when building the cached EncryptedFormatter<T>, I also needed to traverse every nested property and field of T to determine if any were decorated by EncryptAttribute. If a nested member needs encrypted, then I need to encrypt T itself using the EncryptedResolver, eventually returning a JSON string. Otherwise, I could do the entire thing normally with the default Utf8Json resolver configured by the client, therefore only needing to return the original object directly.

Conclusion: All theory without benchmarking

Is this actually faster than using regular reflection? Did I make the code needlessly complicated?

Theoretically, it should be significantly faster, but until I actually benchmark it, I won’t know for sure.

I’ve been talking about benchmarking JsonCryption for a while now, so it will likely be the next thing I do on this project. Unfortunately, I have other projects going on that are more important, so I’m not sure when I’ll be able to get to it. I’m also not thrilled about slightly rewriting JsonCryption.Utf8Json to use reflection just so that I can benchmark it.

Encryption itself is slow. I expect the encryption part alone to be a very significant piece of the total time spent serializing a given object. But again, I won’t know until I look into it.

Finally, working on this port of JsonCryption taught me some new techniques that I would like to see incorporated into the version for Newtonsoft.Json. I’m guessing/hoping I might find some low hanging fruit to optimize that one a bit more.

MongoDB Agrees: Field Level Encryption is Important

I often second-guess myself. The past couple years, I’ve been trying to follow my gut more often. When my gut is healthy, I find myself often confirming my initial assumptions.

Keep gut healthy. Trust gut.

Field Level Encryption (FLE?) for JSON serialization is one of those instances.

MongoDB Announces Field Level Encryption Feature

MongoDB added support for client-side field level encryption in their version 4.2 release, announced way back in June. Just today there was a post on LinkedIn from MongoDB’s account linking to a new FAQ and webinar on the subject, which is how I realized that they, too, agree it can be a useful complement to data-at-rest encryption of particularly sensitive data, particularly personally-identifiable information of users:

Our implementation of FLE…

Great! So I should have trusted my gut on the FLE abbreviation.

…is totally separated from the database, making it transparent to the server, and instead handled exclusively within the MongoDB drivers on the client (hence the name Client-Side Field Level Encryption). All encrypted fields on the server – stored in-memory, in system logs, at-rest, and in backups – are rendered as ciphertext, making them unreadable to any party who does not have client access along with the keys necessary to decrypt the data.

This is a different and more comprehensive approach than the column encryption used in many relational databases. As most handle encryption server-side, data is still accessible to administrators who have access to the database instance itself, even if they have no client access privileges.

Exactly what I figured would be useful for JsonCryption. Good start!

Let’s see if they validate anything else…

More Indirect Validations of JsonCryption from MongoDB

Regulatory Compliance

Where is FLE most useful for you?

Regulatory Compliance

FLE makes it easier to comply with “right to be forgotten” conditions in new privacy regulations such as the GDPR and the CCPA – simply destroy the customer key and the associated personal data is rendered useless.

Another key motivation for JsonCryption was to comply with GDPR and the CCPA. I like the angle for complying with the “right to be forgotten” mentioned here, though. I hadn’t thought of that. To make this work, I’ll have to tweak the creation of IDataProtector instances to allow better configuration, so that consumers of JsonCryption will have the ability to create a unique IDataProtector instance per user, if they wish.

What else we got?

Key Management Systems

With the addition of FLE, encryption keys are protected in an isolated, customer-managed KMS account. Atlas SREs and Product Engineers have no mechanism to access FLE KMS keys, rendering data unreadable to any MongoDB personnel managing the underlying database or infrastructure for you.

More confirmation of design decisions for JsonCryption! The primary reason I ended up going with the Microsoft.AspNetCore.DataProtection package for the actual encryption layer was to gain industry-standard KMS (Key Management System) functionality. This is essential for any serious consumers of JsonCryption.

Other questions I’ve been thinking about but haven’t been able to dive in quite yet…

Performance

What is the performance impact of FLE?

JsonCryption is designed to be used as a plugin for JSON serialization. So speed matters. But encryption can be slow. That’s why from the beginning, I’ve been planning a future post (or more) discussing benchmarking results (which I have yet to do).

Additionally, performance gains could be found by adding support for any other JSON serialization package than just Newtonsoft.JSON, which is notoriously slow. To that end, I’m currently in the middle of working on a pretty sweet implementation of a version to work with the blazing fast Utf8Json, described on its GitHub as being:

Definitely Fastest and Zero Allocation JSON Serializer for C#(NET, .NET Core, Unity, Xamarin).

They seem to back it up with solid benchmark results.

image
https://github.com/neuecc/Utf8Json/

So far, this has been a lot fun as I’ve had the opportunity to explore new techniques, particularly in writing Expressions. To wire JsonCryption into Utf8Json, I need to do a significant amount of runtime generic method resolution and PropertyInfo getting and setting. I would typically do all of this with Reflection. Reflection is slow. It would completely defeat the purpose of using a very fast serializer (Utf8Json) but then chopping off its legs in the encryption layer by relying so heavily on Reflection.

So instead of using Reflection constantly, I’m using a bit of Reflection to build Expression trees, which I then compile at runtime into generic methods, getters, and setters, which are finally cached per type being serialized. It’s not a new technique by any means – Jon Skeet blogged about a flavor (he would say “flavour”) of it all the way back in 2008 – but it’s new to me.

Anyway, I should have more on that soon.

Back to MongoDB…

FLE and Regular At-Rest Encryption are Complementary

What is the relationship between Client-Side FLE and regular at-rest encryption?

They are independent, but complementary to one another and address different threats. You can encrypt data at-rest as usual. Client-side FLE provides additional securing of data from bad actors on the server side or from unintentional data leaks.

This was a key motivation for JsonCryption from the beginning, as well. You might be able to satisfy the encryption requirements of GDPR with a basic encryption-at-rest policy, but then all a hacker has to do is get past your one layer of encryption and they have access to all of your data. On the contrary, with field-level encryption, even if they manage to hack your system and extract all of your data, they still have to hack multiple fields, which could theoretically each be protected by its own unique key.

Querying Encrypted Fields Poses Challenges

What sort of queries are supported against fields encrypted with Client-Side FLE?

You can perform equality queries on encrypted fields when using deterministic encryption. …

JsonCryption was primarily designed with Marten in mind. With that, I knew that some sacrifices may need to be made when it comes to querying encrypted values. As of now, I haven’t tested or played around with any scenarios involving querying encrypted fields. For my primary project that’s using JsonCryption and Marten, my repositories aren’t mature enough to know whether or not I’ll need such capabilities. I’ve been lightly mulling it over in my mind, but for now I’m waiting until a concrete need arises before doing anything about it. In the meantime, if anybody is interested in exploring such things in JsonCryption, have at it, and remember that we take Pull Requests.

JsonCryption Wants to Support Multiple KMS’s in the Future

Which key management solutions are compatible with Client-Side FLE?

We have designed client-side FLE to be agnostic to specific key management solutions, and plan on adding direct native support for Azure and GCP in the future.

As I mentioned earlier, this was a key motivation behind using the Microsoft.AspNetCore.DataProtection package under the covers to handle the encryption and key management duties. It could be even more flexible, of course. While Microsoft’s package offers impressive functionality and an inviting API for defining key management policies, other libraries exist that perform similar and different functions. Adding a configuration API and an Adapter Layer in JsonCryption to support additional Key Management Systems could be a good future extension point.

Where can it be Used?

Where can I use Client-Side FLE?

For them, anywhere MongoDB is used (obviously). This sounds like a fantastic feature. If I was using MongoDB for my other project, I would abandon JsonCryption and use MongoDB’s solution. I would also feel really stupid for spending time working on throwaway code.

However, I’m using PostgreSQL because I like Marten for what I’m doing, so I still need another solution. JsonCryption meets this need, and it’s technically database-agnostic, as long as the JSON serializer for your project is configurable/customizable.

Off to a Good Start

I’m pretty excited reading this update from MongoDB (can you tell?). Partly because it’s clear that FLE is an emerging thing, and partly because many of my early assumptions and motivations at the start of JsonCryption were validated by one of the most important global companies in the JSON serialization and storage space.

There’s still a lot of work that needs done to make JsonCryption into what it could be, but I see the potential it has and get pretty excited. If anybody wants to help along the way, please reach out. JsonCryption deserves better developers than myself working on this.