Faster Reflection in .NET for JsonCryption.Utf8Json

  1. I needed to use Reflection to add support for Utf8Json to JsonCryption
  2. I wanted to support Utf8Json because it’s good and fast…
  3. … but, reflection in .NET is sloooowwww…

Thankfully, through C#’s Expression class, we can cache getters, setters, and methods that we discover via System.Reflection initially, so that we can use them in the future without going through System.Reflection each time thereafter.

I’m late to this game, as Jon Skeet first wrote about the technique back in 2008. And I believe others had written about it before him.

Adding support for Utf8Json

From a high-level view, I needed to provide an alternative implementation of Utf8Json.IJsonFormatterResolver, as well as implementations of Utf8Json.IJsonFormatter<T> in order to offer a similar usage API of JsonCryption:

using Utf8Json;

class Foo
{
    [Encrypt]
    public string LaunchCode { get; }
    ...
}

// setup
IJsonFormatterResolver encryptedResolver = new EncryptedResolver(…);

// serialize/deserialize
var myFoo = new Foo { LaunchCode = "password1" };
string json = JsonSerializer.Serialize(myFoo, encryptedResolver);
Foo deserialized = JsonSerializer.Deserialize<Foo>(json, encryptedResolver);

The implementation of IJsonFormatterResolver is trivial, just getting from a cache or creating an instance of IJsonFormatter<T> for each type T. The fun starts with the implementation of IJsonFormatter<T>.

First, an overview

Stepping back for a moment… I don’t want to write a JSON serializer. Whenever possible, JsonCryption should leverage the serialization logic of the given serializer, and only encrypt/decrypt at the correct point in the serialization chain. Something like this:

Without Encryption
  1. .NET Object (POCO)
  2. (serialize)
  3. JSON
  4. (deserialize)
  5. POCO
With Encryption
  1. POCO
  2. (serialize)
  3. JSON
  4. (encrypt)
  5. Encrypted JSON
  6. (decrypt)
  7. JSON
  8. (deserialize)
  9. POCO

Except, this isn’t exactly accurate since JsonCryption is doing Field Level Encryption (FLE). So as written, the encryption path shown above would produce a single blob of cipher text for the Encrypted JSON. We instead want a nice JSON document with only the encrypted Fields represented in cipher text:

{
  id: 123,
  launchCode: <cipher text here...>
}

So really, the process is something more like this:

  1. POCO
  2. (serialize)
  3. (resolve fields)
  4. (serialize/encrypt fields)
  5. JSON …
(serialize/encrypt fields) for a single field
  1. field
  2. (write JSON property name)
  3. (serialize data)
  4. JSON chunk
  5. (encrypt serialized data)
  6. cipher text
  7. (write cipher text as JSON value)

Like this, I (mostly) don’t have to worry about serializing/encrypting primitive, non-primitive, or user-defined objects. For example, if I have something like this…

class Foo
{
    [Encrypt]
    public Bar MyBar { get; }
}

class Bar
{
    public int Countdown { get; }
    public string Message { get; }
}

… then I will first get something like this during the serialization/encryption of MyBar

{ Countown: 99, Message: "Bottles of beer on the wall" }

Which itself is just a string, and therefore straightforward to encrypt, so that the final serialized form of Foo would be something like:

{
  MyBar: <cipher text here...>
}

Finally, since I only want to encrypt properties/fields on custom C# objects that are decorated with EncryptAttribute, I can safely cache an instance of IJsonFormatter<T> for each type that I serialize via JsonSerializer.Serialize(…). This is good news, and now we can begin the fun stuff…

EncryptedFormatter<T> : IJsonFormatter<T>

As mentioned earlier, for each type T, EncryptedFormatter<T> needs to get all properties and fields that should be serialized, serialize each one, encrypt those that should be encrypted, and write everything to the resulting JSON representation of T.

Getting the properties and fields

Getting a list of properties and fields to be serialized is easy with reflection. I can cache the list of resulting MemberInfo‘s to use each time. So far not bad.

Serialize each MemberInfo, encrypting when necessary

When serializing each one, however, some things I need to do include:

  • Get the value from the MemberInfo
  • Determine if it needs to be encrypted
  • Serialize (and possibly encrypt) the value

Get the value from the MemberInfo

With reflection, this is easy, but slow:

object value = fieldInfo.GetValue(instance);

We could be calling this getter many times in client code, so this should be optimized more for speed. Using .NET’s Expression library to build delegates at run-time has a much larger scope than this post, so I’m only going to show end results and maybe discuss a couple points of interest. For now, this was my resulting code to build a compiled delegate at run-time of the getter for a given MemberInfo (PropertyInfo or FieldInfo), so that I could cache it for reuse:

Func<object, object> BuildGetter(MemberInfo memberInfo, Type parentType)
{
    var parameter = Expression.Parameter(ObjectType, "obj");
    var typedParameter = Expression.Convert(parameter, parentType);
    var body = Expression.MakeMemberAccess(typedParameter, memberInfo);
    var objectifiedBody = Expression.Convert(body, ObjectType);
    var lambda = Expression.Lambda<Func<object, object>>(objectifiedBody, parameter);
    return lambda.Compile();
}

This gives me a delegate to use for this particular MemberInfo instance to get its value, bypassing the need to use reflection’s much slower GetValue(object instance) method:

// using reflection
object value = memberInfo.GetValue(instance);

// using the cached delegate
object value = cachedGetter(instance);

As others on the interwebs have mentioned when using this technique, it’s initially slow since we have to compile code at run-time. But after that, it’s essentially as fast as a direct access of the property or field.

Determine if it needs to be encrypted

This is trivial. Just check if it’s decorated by EncryptAttribute and cache that Boolean.

Serialize (and possibly encrypt) the value

Initially, I thought I could get away with just using Utf8Json’s dynamic support when serializing to avoid having to explicitly call the typed JsonSerializer.Serialize<T>(…) method for each MemberInfo. I got it to work for primitives, but not for more complex types.

Hence, I would need to once again use reflection to get the typed Serialize<T> method to use for each MemberInfo at run-time. Since reflection is slow, I also needed to cache this as a compiled delegate:

// signature: JsonSerializer.Serializer<T>(ref JsonWriter writer, T value, IJsonFormatterResolver resolver)

internal delegate void FallbackSerializer(
    ref JsonWriter writer,
    object value,
    IJsonFormatterResolver fallbackResolver);

FallbackSerializer BuildFallbackSerializer(Type type)
{
    var method = typeof(JsonSerializer)
        .GetMethods()
        .Where(m => m.Name == "Serialize")
        .Select(m => (MethodInfo: m, Params: m.GetParameters(), Args: m.GetGenericArguments()))
        .Where(x => x.Params.Length == 3)
        .Where(x => x.Params[0].ParameterType == typeof(JsonWriter).MakeByRefType())
        .Where(x => x.Params[1].ParameterType == x.Args[0])
        .Where(x => x.Params[2].ParameterType == typeof(IJsonFormatterResolver))
        .Single().MethodInfo;

    var generic = method.MakeGenericMethod(type);

    var writerExpr = Expression.Parameter(typeof(JsonWriter).MakeByRefType(), "writer");
    var valueExpr = Expression.Parameter(ObjectType, "obj");
    var resolverExpr = Expression.Parameter(typeof(IJsonFormatterResolver), "resolver");

    var typedValueExpr = Expression.Convert(valueExpr, type);
    var body = Expression.Call(generic, writerExpr, typedValueExpr, resolverExpr);
    var lambda = Expression.Lambda<FallbackSerializer>(body, writerExpr, valueExpr, resolverExpr);
    return lambda.Compile();
}

For this, I needed to use a custom delegate due to the JsonWriter being passed in by reference, which isn’t allowed with the built-in Func<>. Beyond that, everything else should more or less flow from what we did before with the MemberInfo getter.

Ultimately, this allowed me to do something like:

static void WriteDataMember(
    ref JsonWriter writer,
    T value,
    ExtendedMemberInfo memberInfo,
    IJsonFormatterResolver formatterResolver,
    IJsonFormatterResolver fallbackResolver,
    IDataProtector dataProtector)
{
    writer.WritePropertyName(memberInfo.Name);
    object memberValue = memberInfo.Getter(value);
    var valueToSerialize = memberInfo.ShouldEncrypt
        ? BuildEncryptedValue(memberValue, memberInfo, fallbackResolver, dataProtector)
        : BuildNormalValue(memberValue, memberInfo, memberInfo.HasNestedEncryptedMembers, formatterResolver);
    JsonSerializer.Serialize(ref writer, valueToSerialize, fallbackResolver);
}

static string BuildEncryptedValue(
    dynamic memberValue,
    ExtendedMemberInfo memberInfo,
    IJsonFormatterResolver fallbackResolver,
    IDataProtector dataProtector)
{
    var localWriter = new JsonWriter();
    memberInfo.FallbackSerializer(ref localWriter, memberValue, fallbackResolver);
    return dataProtector.Protect(localWriter.ToString());
}

static object BuildNormalValue(
    dynamic memberValue,
    ExtendedMemberInfo memberInfo,
    bool hasNestedEncryptedMembers,
    IJsonFormatterResolver formatterResolver)
{
    if (!hasNestedEncryptedMembers)
        return memberValue;

    var localWriter = new JsonWriter();
    memberInfo.FallbackSerializer(ref localWriter, memberValue, formatterResolver);
    return localWriter.ToString();
}

There are a couple things going on here…

First, I needed to use the localWriter when leaning on Utf8Json to serialize at the intermediate stage, because otherwise it would restart its internal JsonWriter when calling the JsonSerializer.Serialize(instance, fallbackResolver) overload. Things were very weird before I realized what was happening with this.

Second, you’ll see that I needed to do one additional special stage for properties that aren’t marked to be encrypted themselves. This is to take into account nested classes/structs whose children may themselves have encrypted members:

class FooParent
{
    public FooChild Child { get; }
}

class FooChild
{
    [Encrypt]
    public string LaunchCode { get; }
}

Because of the possibility of nesting, when building the cached EncryptedFormatter<T>, I also needed to traverse every nested property and field of T to determine if any were decorated by EncryptAttribute. If a nested member needs encrypted, then I need to encrypt T itself using the EncryptedResolver, eventually returning a JSON string. Otherwise, I could do the entire thing normally with the default Utf8Json resolver configured by the client, therefore only needing to return the original object directly.

Conclusion: All theory without benchmarking

Is this actually faster than using regular reflection? Did I make the code needlessly complicated?

Theoretically, it should be significantly faster, but until I actually benchmark it, I won’t know for sure.

I’ve been talking about benchmarking JsonCryption for a while now, so it will likely be the next thing I do on this project. Unfortunately, I have other projects going on that are more important, so I’m not sure when I’ll be able to get to it. I’m also not thrilled about slightly rewriting JsonCryption.Utf8Json to use reflection just so that I can benchmark it.

Encryption itself is slow. I expect the encryption part alone to be a very significant piece of the total time spent serializing a given object. But again, I won’t know until I look into it.

Finally, working on this port of JsonCryption taught me some new techniques that I would like to see incorporated into the version for Newtonsoft.Json. I’m guessing/hoping I might find some low hanging fruit to optimize that one a bit more.

Introducing JsonCryption!

I couldn’t find a useful .NET library for easy and robust JSON property-level encryption/decryption, so I made one.

The GitHub page covers more details, but this is the gist:

Installation:

Install-Package JsonCryption.Newtonsoft
// There's also a version for System.Text.Json, but the implementation
// for Newtonsoft.Json is better, owing to the greater feature surface
// and customizability of the latter at this time.

Configuration:

// pseudo code (assuming using Newtonsoft.Json for serialization)
container.Register<JsonSerializer>(() => new JsonSerializer()
{
    ContractResolver = new JsonCryptionContractResolver(container.Resolve<IDataProtectionProvider>())
});

Usage:

var myFoo = new Foo("some important value", "something very public");
class Foo
{
    [Encrypt]
    public string EncryptedString { get; }
  
    public string UnencryptedString { get; }

    public Foo(string encryptedString, string unencryptedString)
    {
        ...
    }
}
var serializer = // resolve JsonSerializer
using var textWriter = ...
serializer.Serialize(textWriter, myFoo);
// pseudo output: '{ "encryptedString": "akjdfkldjagkldhtlfkjk...", "UnencryptedString": "something very public" }'

Why I need JsonCryption

My main project (not fully operational) is a .NET Core app that handles contact information for users. Being on the OCD spectrum, I wanted this data to have stronger protection than just disk-level and/or database-level encryption.

Property/field-level encryption – in addition to disk-level and database-level encryption – sounded pretty nice. But I needed to be able to easily control which fields/properties were encrypted from each object.

This project is also using Marten, which uses PostgreSQL as a document DB. Marten stores documents (C# objects, essentially) in tables with explicit lookup columns, and one column for the JSON blob. From what I could tell, the best hook offered by Marten’s API to encrypt/decrypt documents automatically is at the point of serialization/deserialization by providing an alternative ISerializer. If I encrypted the entire blob, I wouldn’t be able to query anything very well. So I needed a way to leave certain columns unencrypted when serializing – the ones that would serve as lookups in queries.

Discovery path

First Stop: Newtonsoft.Json.Encryption

This library provided a lot of inspiration. It intends to be very easy to use by requiring a single EncryptAttribute to decorate what is to be encrypted, and it plugs into Newtonsoft.Json via the ContractResolver approach (similar to JsonCryption above).

However, I felt that it had a few fatal flaws that would make using it a more difficult than initially meets the eye.

That it doesn’t store the Init Vector with the generated ciphertext was a non-starter for me. This requires consumers of the library to figure out how and where to store it themselves. I’m not a cryptographic expert (use JsonCryption at your own risk!), but it seems pretty standard practice to include the IV with the ciphertext to enable later decryption with just the symmetric key. In any case, this would be a bigger issue after later discoveries.

Overriding JsonConverter

Next, I came across this blog post by Thomas Freudenberg that used a slightly different approach. Rather than provide a custom ContractResolver, he decorated each property needing encryption with a custom JsonConverter. His approach also offered a normal way to handle the Init Vectors.

public class Settings {
    [JsonConverter(typeof(EncryptingJsonConverter), "#my*S3cr3t")]
    public string Password { get; set; }
}

This was interesting, but would be annoying to have to type all of that for each property needing encryption. Also, I would obviously need a way to inject the secret into the converter, rather than hard-code it here.

Nevertheless, it gave me an idea for an approach to use with .NET Core’s new System.Text.Json library…

Initial Attempt for System.Text.Json

Microsoft recently released System.Text.Json with .NET Core 3.0 as an open-source alternative to the also-open-source Newtonsoft.Json, which had been the default JSON serialization library for .NET Core up to now. Wanting to be cutting edge, and not knowing much about this new library, I started writing my solution around this.

The library has decent documentation, is open-source (as already mentioned), and enables powerful serialization customization via an unsealed public JsonConverterAttribute. By overriding this with my own implementation, I could essentially implement Freudenberg’s approach with much less code:

public sealed class EncryptAttribute : JsonConverterAttribute
{
    public EncryptAttribute() : base(typeof(EncryptedJsonConverterFactory))
    {
    }
}

Then I just needed to write a custom EncryptedJsonConverterFactory to provide the correct converter given the datatype being serialized.

But this approach also carried critical issues…

  • Overriding the JsonConverterAttribute ultimately required using a Singleton pattern rather than clean Dependency Injection
  • System.Text.Json currently offers no ability to serialize non-public properties, nor fields of any visibility. For most DDD scenarios, this was also a non-starter.

Newtonsoft.Json

Newtonsoft.Json offers support for serializing private to public fields and properties. It’s a well-known mature library with a highly extensible API. It’s JsonConverterAttribute is currently sealed, so we can’t override that… but there are better options for configuring it, anyway, in order to take advantage of Dependency Injection and other better patterns than I was forced to use with System.Text.Json.

The good news is that the exercise of implementing a solution for System.Text.Json forced me to develop some core logic for converting different datatypes to and from byte arrays, which would come in handy for encrypting a wide variety of datatypes. Another issue with the other libraries and approaches I mentioned earlier is that they only handled a tiny number of potential datatypes. I wanted a set-and-forget solution that would work widely, so being able to convert all built-in types and any nested combination thereof was essential.

Adding support for Cryptography best practices

I began with a custom implementation and abstraction of the core Encrypter that I was using throughout the library. It was basic and structured largely using inspiration from the two approaches discussed earlier.

It worked.

But then I attended a great session at CodeMash 2020 called Practical Cryptography for Developers. Without getting into the weeds of cryptography, I was exposed for the first time to the concept of key/algorithm rotation and management and cryptographic best practices.

Writing these features into my library would take me far outside its immediate domain, and far outside my expertise. Surely, I thought, there must be some libraries that handle this already…

Switching to Microsoft.AspNetCore.DataProtection underneath

… yes, there is. Obviously.

The open-source package Microsoft.AspNetCore.DataProtection was designed to provide

a simple, easy to use cryptographic API a developer can use to protect data, including key management and rotation

https://docs.microsoft.com/en-us/aspnet/core/security/data-protection/introduction?view=aspnetcore-3.1

It’s highly configurable, easy to bootstrap, built to promote testability, and built for .NET Core. It handles key management and algorithm management, written by dedicated experts in the field.

So I used that instead of my own Encrypter.

Closing

In the end, I kept both the System.Text.Json implementation (JsonCryption.System.Text.Json), and the Newtonsoft.Json implementation (JsonCryption.Newtonsoft).

JsonCryption.Newtonsoft is better for the moment, allowing encryption/serialization of private to public fields and properties, shallow or nested, of (theoretically) any data type that is also serializable by Newtonsoft.Json.

Check it out. Try it out.

And tell me what you think needs changed to make it better.