2020-03-13

Faster Reflection in .NET for JsonCryption.Utf8Json

I needed to use reflection to add support for Utf8Json to JsonCryption. I wanted to support Utf8Json because it's good and fast... but reflection in .NET is sloooowwww…

Thankfully, with the Expression class in System.Linq.Expressions, we can compile the getters, setters, and methods that we first discover via System.Reflection into delegates and cache them, so that later calls no longer go through System.Reflection.

I'm late to this game, as Jon Skeet first wrote about the technique back in 2008. And I believe others had written about it before him.

Adding support for Utf8Json

From a high-level view, I needed to provide an alternative implementation of Utf8Json.IJsonFormatterResolver, as well as implementations of Utf8Json.IJsonFormatter<T>, in order to offer a usage API similar to the rest of JsonCryption:

using Utf8Json;

class Foo
{
    [Encrypt]
    public string LaunchCode { get; set; }
    ...
}

// setup
IJsonFormatterResolver encryptedResolver = new EncryptedResolver(…);

// serialize/deserialize
var myFoo = new Foo { LaunchCode = "password1" };
string json = JsonSerializer.Serialize(myFoo, encryptedResolver);
Foo deserialized = JsonSerializer.Deserialize<Foo>(json, encryptedResolver);

The implementation of IJsonFormatterResolver is trivial, just getting from a cache or creating an instance of IJsonFormatter<T> for each type T. The fun starts with the implementation of IJsonFormatter<T>.
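
The "get from a cache or create" idea can be sketched without Utf8Json at all. The snippet below is only an illustration under assumptions: IFormatter<T>, StubFormatter<T>, and CachingResolver are hypothetical stand-ins (for Utf8Json.IJsonFormatter<T>, the real formatter, and the real resolver), not the actual JsonCryption code.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for Utf8Json.IJsonFormatter<T>.
public interface IFormatter<T> { }

// Hypothetical stand-in for the per-type formatter.
public sealed class StubFormatter<T> : IFormatter<T> { }

// Hypothetical resolver: one formatter instance per type T, created on first use.
public sealed class CachingResolver
{
    private readonly ConcurrentDictionary<Type, object> _formatters =
        new ConcurrentDictionary<Type, object>();

    public IFormatter<T> GetFormatter<T>()
    {
        // GetOrAdd returns the cached instance if one exists for typeof(T),
        // otherwise it creates and stores a new one.
        return (IFormatter<T>)_formatters.GetOrAdd(typeof(T), _ => new StubFormatter<T>());
    }
}
```

Calling GetFormatter<int>() twice on the same resolver should hand back the same instance, which is all the resolver really has to guarantee.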

First, an overview

Stepping back for a moment... I don't want to write a JSON serializer. Whenever possible, JsonCryption should leverage the serialization logic of the given serializer, and only encrypt/decrypt at the correct point in the serialization chain. Something like this:

Without Encryption

  • .NET Object (POCO)
  • (serialize)
  • JSON
  • (deserialize)
  • POCO

With Encryption

  • POCO
  • (serialize)
  • JSON
  • (encrypt)
  • Encrypted JSON
  • (decrypt)
  • JSON
  • (deserialize)
  • POCO

Except, this isn't exactly accurate since JsonCryption is doing Field Level Encryption (FLE). So as written, the encryption path shown above would produce a single blob of cipher text for the Encrypted JSON. We instead want a nice JSON document with only the encrypted Fields represented in cipher text:

{
  id: 123,
  launchCode: <cipher text here...>
}

So really, the process is something more like this:

  • POCO
  • (serialize)
  • (resolve fields)
  • (serialize/encrypt fields) -> S
  • JSON ...

(serialize/encrypt fields) for a single field

  • S. field
  • S. (write JSON property name)
  • S. (serialize data)
  • S. JSON chunk
  • S. (encrypt serialized data)
  • S. cipher text
  • S. (write cipher text as JSON value)

Like this, I (mostly) don't have to worry about serializing/encrypting primitive, non-primitive, or user-defined objects. For example, if I have something like this...

class Foo
{
    [Encrypt]
    public Bar MyBar { get; }
}

class Bar
{
    public int Countdown { get; }
    public string Message { get; }
}

… then I will first get something like this during the serialization/encryption of MyBar...

{ Countdown: 99, Message: "Bottles of beer on the wall" }

Which itself is just a string, and therefore straightforward to encrypt, so that the final serialized form of Foo would be something like:

{
  MyBar: <cipher text here...>
}

Finally, since I only want to encrypt properties/fields on custom C# objects that are decorated with EncryptAttribute, I can safely cache an instance of IJsonFormatter<T> for each type that I serialize via JsonSerializer.Serialize(…). This is good news, and now we can begin the fun stuff...

EncryptedFormatter<T> : IJsonFormatter<T>

As mentioned earlier, for each type T, EncryptedFormatter<T> needs to get all properties and fields that should be serialized, serialize each one, encrypt those that should be encrypted, and write everything to the resulting JSON representation of T.

Getting the properties and fields

Getting a list of properties and fields to be serialized is easy with reflection. I can cache the resulting list of MemberInfo instances and reuse it each time. So far not bad.
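
As a rough sketch of that step: MemberCache below is a hypothetical helper, not JsonCryption's actual code, and it assumes that only public instance properties with a getter and public instance fields are serialized.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Reflection;

// Hypothetical helper: runs reflection once per type and caches the result.
public static class MemberCache
{
    private static readonly ConcurrentDictionary<Type, MemberInfo[]> Cache =
        new ConcurrentDictionary<Type, MemberInfo[]>();

    public static MemberInfo[] GetSerializableMembers(Type type)
    {
        return Cache.GetOrAdd(type, Discover);
    }

    private static MemberInfo[] Discover(Type type)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;

        // Assumption: readable, non-indexer public properties are serialized.
        var properties = type.GetProperties(flags)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .Cast<MemberInfo>();

        // Assumption: public instance fields are serialized as well.
        var fields = type.GetFields(flags).Cast<MemberInfo>();

        return properties.Concat(fields).ToArray();
    }
}
```

The second call for the same type returns the very same array, so the reflection cost is paid only on first use.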

Serialize each MemberInfo, encrypting when necessary

When serializing each one, however, some things I need to do include:

  • Get the value from the MemberInfo
  • Determine if it needs to be encrypted
  • Serialize (and possibly encrypt) the value

Get the value from the MemberInfo

With reflection, this is easy, but slow:

object value = fieldInfo.GetValue(instance);

We could be calling this getter many times in client code, so it should be optimized for speed. Building delegates at run-time with .NET's Expression library is a much bigger topic than this post can cover, so I'm only going to show end results and maybe discuss a couple of points of interest. For now, this was my resulting code to build a compiled delegate at run-time for the getter of a given MemberInfo (PropertyInfo or FieldInfo), so that I could cache it for reuse:

// ObjectType is assumed here to be a cached typeof(object).
Func<object, object> BuildGetter(MemberInfo memberInfo, Type parentType)
{
    // Builds the equivalent of: (object obj) => (object)((TParent)obj).Member
    var parameter = Expression.Parameter(ObjectType, "obj");
    var typedParameter = Expression.Convert(parameter, parentType);
    var body = Expression.MakeMemberAccess(typedParameter, memberInfo);
    // Box value types so the delegate can always return object.
    var objectifiedBody = Expression.Convert(body, ObjectType);
    var lambda = Expression.Lambda<Func<object, object>>(objectifiedBody, parameter);
    return lambda.Compile();
}

This gives me a delegate to use for this particular MemberInfo instance to get its value, bypassing the need to use reflection's much slower GetValue(object instance) method:

// using reflection
object value = fieldInfo.GetValue(instance);

// using the cached delegate
object value = cachedGetter(instance);

As others on the interwebs have mentioned when using this technique, it's initially slow since we have to compile code at run-time. But after that, calling the cached delegate should be close to the speed of accessing the property or field directly, apart from the delegate call and any boxing.
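
To show the build-once, reuse-many pattern end to end, here is a minimal self-contained sketch. GetterCache and Rocket are hypothetical names for illustration, and the sketch assumes instance members only (a static member would need a null instance expression instead).

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical cache: compiles a getter per member on first use, then reuses it.
public static class GetterCache
{
    private static readonly ConcurrentDictionary<MemberInfo, Func<object, object>> Getters =
        new ConcurrentDictionary<MemberInfo, Func<object, object>>();

    public static Func<object, object> GetOrBuild(MemberInfo member)
    {
        return Getters.GetOrAdd(member, Build);
    }

    private static Func<object, object> Build(MemberInfo member)
    {
        // Equivalent of: (object obj) => (object)((TDeclaring)obj).Member
        var parameter = Expression.Parameter(typeof(object), "obj");
        var typed = Expression.Convert(parameter, member.DeclaringType);
        var access = Expression.MakeMemberAccess(typed, member);
        var boxed = Expression.Convert(access, typeof(object));
        return Expression.Lambda<Func<object, object>>(boxed, parameter).Compile();
    }
}

// Hypothetical sample type with one property and one field.
public sealed class Rocket
{
    public string LaunchCode { get; set; }
    public int Countdown;
}
```

With this in place, GetterCache.GetOrBuild(typeof(Rocket).GetProperty("LaunchCode")) pays the compilation cost once, and every later lookup for the same member returns the already compiled delegate.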

Determine if it needs to be encrypted

This is trivial. Just check if it's decorated by EncryptAttribute and cache that Boolean.

Serialize (and possibly encrypt) the value

Initially, I thought I could get away with just using Utf8Json's dynamic support when serializing to avoid having to explicitly call the typed JsonSerializer.Serialize<T>(…) method for each MemberInfo. I got it to work for primitives, but not for more complex types.

Hence, I would need to once again use reflection to get the typed Serialize<T> method to use for each MemberInfo at run-time. Since reflection is slow, I also needed to cache this as a compiled delegate:

// signature: JsonSerializer.Serialize<T>(ref JsonWriter writer, T value, IJsonFormatterResolver resolver)

internal delegate void FallbackSerializer(
    ref JsonWriter writer,
    object value,
    IJsonFormatterResolver fallbackResolver);

FallbackSerializer BuildFallbackSerializer(Type type)
{
    var method = typeof(JsonSerializer)
        .GetMethods()
        .Where(m => m.Name == "Serialize")
        .Select(m => (MethodInfo: m, Params: m.GetParameters(), Args: m.GetGenericArguments()))
        .Where(x => x.Params.Length == 3)
        .Where(x => x.Params[0].ParameterType == typeof(JsonWriter).MakeByRefType())
        .Where(x => x.Params[1].ParameterType == x.Args[0])
        .Where(x => x.Params[2].ParameterType == typeof(IJsonFormatterResolver))
        .Single().MethodInfo;

    var generic = method.MakeGenericMethod(type);

    var writerExpr = Expression.Parameter(typeof(JsonWriter).MakeByRefType(), "writer");
    var valueExpr = Expression.Parameter(ObjectType, "obj");
    var resolverExpr = Expression.Parameter(typeof(IJsonFormatterResolver), "resolver");

    var typedValueExpr = Expression.Convert(valueExpr, type);
    var body = Expression.Call(generic, writerExpr, typedValueExpr, resolverExpr);
    var lambda = Expression.Lambda<FallbackSerializer>(body, writerExpr, valueExpr, resolverExpr);
    return lambda.Compile();
}

For this, I needed a custom delegate because the JsonWriter is passed in by reference, which the built-in Func<> and Action<> delegates don't allow. Beyond that, everything else should more or less flow from what we did before with the MemberInfo getter.
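
The by-reference part can be tried in isolation, without Utf8Json. In this sketch TinyWriter and TinySerializer are hypothetical stand-ins for JsonWriter and JsonSerializer.Serialize<T>, and UntypedSerializer plays the role of the custom delegate; none of these names come from either library.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;
using System.Text;

// Hypothetical stand-in for JsonWriter: a mutable struct that must travel by ref.
public struct TinyWriter
{
    private StringBuilder _buffer;

    public void Write(string text)
    {
        if (_buffer == null) _buffer = new StringBuilder();
        _buffer.Append(text);
    }

    public override string ToString()
    {
        return _buffer == null ? string.Empty : _buffer.ToString();
    }
}

// Hypothetical stand-in for JsonSerializer.Serialize<T>(ref JsonWriter, T, ...).
public static class TinySerializer
{
    public static void Serialize<T>(ref TinyWriter writer, T value)
    {
        writer.Write(typeof(T).Name + ":" + value);
    }
}

// Custom delegate, because Func<> and Action<> cannot declare ref parameters.
public delegate void UntypedSerializer(ref TinyWriter writer, object value);

public static class SerializerBuilder
{
    public static UntypedSerializer Build(Type type)
    {
        // Close the generic method over the run-time type.
        MethodInfo generic = typeof(TinySerializer)
            .GetMethod(nameof(TinySerializer.Serialize))
            .MakeGenericMethod(type);

        // The ref parameter must be declared with the by-ref version of its type.
        var writerExpr = Expression.Parameter(typeof(TinyWriter).MakeByRefType(), "writer");
        var valueExpr = Expression.Parameter(typeof(object), "value");
        var typedValue = Expression.Convert(valueExpr, type);

        var body = Expression.Call(generic, writerExpr, typedValue);
        return Expression.Lambda<UntypedSerializer>(body, writerExpr, valueExpr).Compile();
    }
}
```

Because the writer is passed by reference all the way through the compiled lambda, whatever the typed method writes is visible on the caller's own TinyWriter afterwards.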

Ultimately, this allowed me to do something like:

static void WriteDataMember(
    ref JsonWriter writer,
    T value,
    ExtendedMemberInfo memberInfo,
    IJsonFormatterResolver formatterResolver,
    IJsonFormatterResolver fallbackResolver,
    IDataProtector dataProtector)
{
    writer.WritePropertyName(memberInfo.Name);
    object memberValue = memberInfo.Getter(value);
    var valueToSerialize = memberInfo.ShouldEncrypt
        ? BuildEncryptedValue(memberValue, memberInfo, fallbackResolver, dataProtector)
        : BuildNormalValue(memberValue, memberInfo, memberInfo.HasNestedEncryptedMembers, formatterResolver);
    JsonSerializer.Serialize(ref writer, valueToSerialize, fallbackResolver);
}

static string BuildEncryptedValue(
    dynamic memberValue,
    ExtendedMemberInfo memberInfo,
    IJsonFormatterResolver fallbackResolver,
    IDataProtector dataProtector)
{
    var localWriter = new JsonWriter();
    memberInfo.FallbackSerializer(ref localWriter, memberValue, fallbackResolver);
    return dataProtector.Protect(localWriter.ToString());
}

static object BuildNormalValue(
    dynamic memberValue,
    ExtendedMemberInfo memberInfo,
    bool hasNestedEncryptedMembers,
    IJsonFormatterResolver formatterResolver)
{
    if (!hasNestedEncryptedMembers)
        return memberValue;

    var localWriter = new JsonWriter();
    memberInfo.FallbackSerializer(ref localWriter, memberValue, formatterResolver);
    return localWriter.ToString();
}

There are a couple things going on here...

First, I needed to use the localWriter when leaning on Utf8Json to serialize at the intermediate stage, because otherwise it would restart its internal JsonWriter when calling the JsonSerializer.Serialize(instance, fallbackResolver) overload. Things were very weird before I realized what was happening with this.

Second, you'll see that I needed to do one additional special stage for properties that aren't marked to be encrypted themselves. This is to take into account nested classes/structs whose children may themselves have encrypted members:

class FooParent
{
    public FooChild Child { get; }
}

class FooChild
{
    [Encrypt]
    public string LaunchCode { get; }
}

Because of the possibility of nesting, when building the cached EncryptedFormatter<T>, I also needed to traverse every nested property and field of T to determine if any were decorated with EncryptAttribute. If a nested member needs to be encrypted, then I need to serialize the containing member with the EncryptedResolver, eventually returning a JSON string. Otherwise, I can serialize it normally with the default Utf8Json resolver configured by the client, and only need to return the original object directly.
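
That traversal could look roughly like the following. This is a sketch under assumptions, not the code JsonCryption uses: EncryptAttribute here is a stand-in declaration, NestedEncryptionScanner and the sample types are hypothetical, only public instance members are inspected, and element types of collections are not examined.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Stand-in declaration; the real JsonCryption attribute may differ.
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public sealed class EncryptAttribute : Attribute { }

public static class NestedEncryptionScanner
{
    public static bool HasEncryptedMembers(Type type)
    {
        return HasEncryptedMembers(type, new HashSet<Type>());
    }

    private static bool HasEncryptedMembers(Type type, HashSet<Type> visited)
    {
        // Skip leaf types, and skip types already seen to avoid infinite recursion on cycles.
        if (type.IsPrimitive || type == typeof(string) || !visited.Add(type))
            return false;

        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;
        var members = type.GetProperties(flags).Cast<MemberInfo>()
            .Concat(type.GetFields(flags).Cast<MemberInfo>());

        foreach (var member in members)
        {
            // A directly decorated member settles the question.
            if (member.IsDefined(typeof(EncryptAttribute), true))
                return true;

            // Otherwise look inside the member's own type.
            var property = member as PropertyInfo;
            var memberType = property != null ? property.PropertyType : ((FieldInfo)member).FieldType;
            if (HasEncryptedMembers(memberType, visited))
                return true;
        }

        return false;
    }
}

// Hypothetical sample types mirroring the nesting case.
public sealed class SecretChild { [Encrypt] public string LaunchCode { get; set; } }
public sealed class SecretParent { public SecretChild Child { get; set; } }
public sealed class PlainThing { public int Id { get; set; } public string Name { get; set; } }
```

Since the answer never changes for a given type, the result would be computed once when the formatter for T is built and stored alongside the other cached member data.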

Conclusion: All theory without benchmarking

Is this actually faster than using regular reflection? Did I make the code needlessly complicated?

Theoretically, it should be significantly faster, but until I actually benchmark it, I won't know for sure.

I've been talking about benchmarking JsonCryption for a while now, so it will likely be the next thing I do on this project. Unfortunately, I have other projects going on that are more important, so I'm not sure when I'll be able to get to it. I'm also not thrilled about slightly rewriting JsonCryption.Utf8Json to use reflection just so that I can benchmark it.

Encryption itself is slow. I expect the encryption part alone to be a very significant piece of the total time spent serializing a given object. But again, I won't know until I look into it.

Finally, working on this port of JsonCryption taught me some new techniques that I would like to see incorporated into the version for Newtonsoft.Json. I'm guessing/hoping I might find some low hanging fruit to optimize that one a bit more.