Self-taught .NET Developer: How I Learned

Early stage

Books

Pro C# 8

I actually read the version for C# 5, but this is the latest one I found on Amazon. It’s more than 1,500 pages long. My pace was around 30 pages per evening after work, which worked out to about a chapter per night.

If I started to get sleepy, I let myself sleep. When I’m learning new things, I often get sleepy. My theory is that my brain is working extra hard and needs to rest to crystalize the new knowledge.

I knew that I would neither understand nor remember everything I read. This was simply to provide a broad overview… at least after the initial chapters on the core of the language, which I did need to learn. In the future, I expected that I would either:

  • Remember that such-and-such thing existed, and look for more details to actually use the feature, or
  • Be able to learn the forgotten feature more easily the second time around after priming my mind initially

I think both cases happened in actuality.

Effective C#

I learned a lot about how to write clean C#. Each “Item” is a couple pages long and covers an important lesson/guideline/rule. The chapters are the following:

  1. C# Language Idioms
  2. .NET Resource Management
  3. Expressing Designs in C#
  4. Working with the Framework
  5. Dynamic Programming in C#
  6. Miscellaneous

Some of the Items include:

  • Use Properties Instead of Accessible Data Members
  • Prefer Query Syntax to Loops
  • Distinguish Between Value Types and Reference Types
  • Limit Visibility of Your Types
  • Avoid ICloneable
  • and more…

I feel like it definitely provided some solid foundations at the beginning of my C# learning to keep me from developing bad habits early on.

Videos

While reading those two books, I tried to line up videos as much as possible, as well.

One of my early sources for video content was Bob Tabor’s LearnVisualStudio.net (now just https://bobtabor.com/). The progression in his videos largely lined up with the progression in Pro C# 8, which helped to reinforce the learning from two angles. I also liked that his courses included simple exercises at the end of each, allowing me to punch out some basic code in Visual Studio to start getting familiar with that.

After finishing most of his content and spending a couple aimless weeks wandering around YouTube, I finally discovered the fantastically curated C# learning path at Pluralsight.

The quality of the content is excellent. The user experience is fantastic. The guided path is intelligent and covers many important topics very well. I was hooked and began spending around 1-2 hours per evening going through course after course. They track your progress with very simple metrics, which felt almost like a video game to me and kept me even more motivated to keep going.

I’ve since gotten rid of my TV. If I want something to watch while eating dinner or something, I turn on a Pluralsight video and improve myself a tiny bit more. Since finishing the C# path, I’ve watched a number of other courses in various paths and as individual units (64 courses completed so far). I’m 100% convinced this has helped to dramatically ramp up my learning curve.

I was very nervous before starting officially as a professional developer that all of my theoretical knowledge wouldn’t transfer to on-the-job skills, but it turned out that it actually set me up on a solid foundation to succeed, and I had a very successful first year as a developer.

Anyway, in order, here’s what I watched from Pluralsight the first 12 months (I think the C# path has changed since then):

Beyond the Elementary Basics

Books

I don’t think the list below is in any particular order. I just know that I read these – some partly, others fully – throughout my first year on the job.

Dependency Injection in .NET

During my first year, I led the development of my team’s first Web API. It was a relatively new push throughout the organization, so there weren’t a ton of examples that I could leverage from other teams.

After learning about unit testing, and realizing that I had no idea how to expect my Web API code to actually behave, I wanted to unit-test everything very thoroughly. Motivated by this, I quickly realized how using dependency injection to manage dependencies throughout the application would be essential for promoting testability.

This book looked like the standard-bearer on Amazon. It turned out to be exactly what I needed. Going through it, combined with some pinpointed videos online, I was able to use good DI practices to build the API, helping me to reach my primary goal of straightforward testability.

Of course, testability isn’t the only reason to use DI, and the book does a good job of covering everything else. I won’t repeat it here.

The Art of Unit Testing: with examples in C#

I had come across this book on Amazon about a month before finally purchasing it. Later, I stumbled upon an hour-long video on YouTube of the author, Roy Osherove, giving a talk at a conference on unit testing. The talk was fantastic, so I had to buy the book.

As an intro to unit testing, with a good mix of philosophy and practicality, this book was an enjoyable read. It helped me to shift my mindset away from normal best practices in software development, to the slightly different best practices for writing good unit tests. Namely, repetition can be more acceptable for unit tests if it helps to make the test understandable.

It also discusses how doing TDD can improve code quality by enforcing good patterns, gives an overview of various testing and mocking frameworks, and discusses different testing paradigms like TDD, ATDD, and others.

I learned a lot, and should probably revisit it from time to time. It’s a good intro and a solid resource.

Design Patterns

At the very beginning of my journey to self-taught professional programmer, I interviewed a senior PM and dev at my then-employer. They were able to give me a fantastic breakdown of the industry, providing a bunch of jargon and things to look into and keep in mind. One was TDD and the idea of a failing test. The other was the importance of design patterns.

This is the bible of design patterns, written by the Gang of Four. Unfortunately, I didn’t get as far as I would have liked. Reading UML still doesn’t come easy to me, and it definitely didn’t at that time, either. I think I need a little more interactivity, which is why the 15+ hour course on Pluralsight covering various design patterns was much easier for me to digest.

Nevertheless, it got me off to a good start. Thanks to those two individuals, and thanks to some early education on the subject, I think I was able to develop some good habits when evaluating a problem. I think in patterns in real life, so thinking in design patterns for software problems just feels right to me, as well.

This is definitely a book that I need to take a second look at.

Patterns of Enterprise Application Architecture

During this first year, I discovered and fell in love with Martin Fowler’s work. I like the way he writes, and I like the way he thinks.

This is one of his classic books, and it has helped me to think through and analyze different architectures at a couple different employers so far. It’s best when you can speak with another developer who has spent time in Fowler’s world, because then you have a rich vocabulary that you can use to discuss otherwise complex topics. I think this was one of the primary aims of the book (and any design pattern endeavor).

Refactoring: Improving the Design of Existing Code

Another Fowler book, this one goes into detail introducing various patterns for making systematic refactorings, as well as the reasons why you should want to refactor.

I’m amazed at how the authors could judiciously categorize so many different types of refactoring actions. On top of categorization, they go another step and explain why you’d want to do one action, or why you wouldn’t want to, and how it interacts with other refactoring patterns.

The intro chapters are especially insightful, because he speaks more freely about the way he thinks when looking at legacy code and/or refactoring any code, as well as how he likes to make small changes to code to continually make it more legible. All good ideas to keep in mind, and pretty cool to see a master at his craft transform a piece of mediocre code into something clean and buttoned up.

Pro ASP.NET Web API Security

This book was rated highly on Amazon, and I made it through about 5 chapters. The content looks impressive when scanning the table of contents and the chapters themselves, but there was just something about the way the author writes that never seemed to flow to me. I found myself constantly re-reading different sentences or passages multiple times to understand what he was trying to say.

The content is probably good, but the English is awkward. Maybe I’ll give it another try eventually, but probably not.

Videos

Oddly, I spent a good amount of time watching videos on Angular, but no books. Also during this second year, I was working at a new company. During the interviews with them, they asked more standard computer science questions on things like algorithm efficiency (Big O Notation) and implementations of hash tables (dictionaries).

Despite successfully passing the interviews and getting hired, I felt that I was missing some core pieces of knowledge, so that was also reflected in the videos watched during this time.

Finally, near the end of my time at my initial employer, I spent some time setting up the new API and Angular projects I’d created on our new CI/CD platform of TeamCity and Octopus. For this, I ended up using a decent amount of PowerShell, so I thought I might need to get some more familiarity with it.

Here’s the list:

Discovering Domain Driven Design

My second coding employer had developed an exceptionally rich and supple domain model. It was mostly a joy to work with (would be more so if you could compare against the alternative universe of what would have been without the rich model), but I never knew that the principles behind it were a “thing”. I just thought that everybody that worked there was insanely smart, which they were.

At my next employer, we didn’t have the luxury of a rich and supple (love that word) domain model… but there were aspirations to get there. On my team particularly, we scoped out a chunk of code that we believed could serve as a good starting point for Strangling the Monolith, allowing us to use good Domain Driven Design principles and practices to build something useful and easy to maintain.

To ramp up my knowledge for the task, I read a lot of books and watched the majority of the Pluralsight DDD path.

Unfortunately, that project has thus far been shelved at work. It’s a big strategic mistake, in my opinion, but what do I know… I’m just a lowly developer with no significant business education at all…………

Books

Domain-Driven Design Distilled

Fantastic intro to the topic, written by the same author as Implementing Domain-Driven Design.

When colleagues at work complained that it was 70+ pages, I joked that it’s a very quick read because there are a lot of pictures… which is true. It’s an easy read, has lots of diagrams, and touches on the topics enough for non-developers to understand pretty much everything they need to know about DDD. For developers, it provides a soft intro so that learning the meat comes more easily in the later stages.

Domain-Driven Design

This is the reference for the subject. It’s a bit more philosophically written than its more practical counterpart, Implementing Domain-Driven Design, but I preferred this one over the other two in this list. Maybe it helped that I read it last and so already had a decent grasp of the concepts, making it easier to go through this and appreciate the higher-level philosophy and principles.

It has changed the way I look at software for the better.

To practice, I’ve been working on a private project in my spare time at home, trying to stay as true to the principles of DDD as possible. So far, I love it. As complexity increases, so far my system is still very easy to understand and modify where needed.

The author, Eric Evans, is a big proponent of the word “supple” to describe an ideal Domain Model, and I’m enjoying making refactorings along the way in my own project to continually progress toward more “suppleness”.

Implementing Domain-Driven Design

I read this one after Domain-Driven Design Distilled, but before Domain-Driven Design. It was pretty good, but I didn’t use a number of his specific implementation patterns in my hobby project. Maybe that will come back to bite me, we’ll see. I’m also lucky in that I can refer back to my previous employer’s fantastic Domain Model and the patterns that they used for alternatives to those found in these books.

Others have told me they really preferred this book over the other main one. The style is a bit different, so that might be a factor. Overall, though, the quality felt comparable between the two. If you can only read one, flip a coin and you’ll probably be fine… but why not read both?

Videos

I haven’t watched the videos in the advanced section yet. I’m mostly looking forward to a more in-depth look at Event Sourcing, but that one hadn’t been posted the last time I checked.

I also skipped a course by Esposito or something. I’m sure it was decent, but I couldn’t stand his voice. Nothing personal.

Connecting the Dots

The next step in my development is being able to tie the pieces all together into a cohesive software system – deployed and usable. Shipping software involves more than just writing beautiful code and committing it to the cloud somewhere. Databases need to be provisioned. CI/CD pipelines need to be set up. Authentication needs to be configured. Code needs to be deployed. Etc.

The only way to do that is to work on real projects from start to finish. Work provides some of these opportunities, but it’s difficult to cover all the bases in our siloed teams and on complex projects, most of which are likely legacy and already have a number of infrastructural pieces long established.

Also, I really don’t have any direct mobile UI experience, so some of the projects in my queue are chosen specifically to get that covered.

Finally, another goal of this stage is to use implicit peer pressure to force me to lift my coding standards. With open-source projects, I’m displaying my code for all to see (and criticize). With deployable apps and open-source projects, I’m displaying my finished products for all to see (and criticize). With blogging, I’m displaying my thoughts and dreams for all to see (and criticize).

The criticism will hurt. So the threat of criticism provides additional motivation to go the extra mile, to stretch myself. And of course, the actual criticism will (hopefully) teach me new ways to do/see/think about things.

It’s all a giant intense learning experience!

Projects

Super Secret App

This is my main baby, but I think it might actually have commercial value when it’s finished… so I can’t talk about it too much.

But it has a ton of cool technical stuff in it for me to learn, including:

  • Domain Driven Design
  • Message-based asynchronous architecture
  • Event sourcing
  • Cloud hosting
  • Mobile development
  • End-to-end web development
  • Public API built for integrations with third-parties
  • Modern Authentication
  • Individual user accounts
  • Field level encryption
  • and more…

JsonCryption

This small project spun off from the one above. I needed to modify my JSON serializer to use with Marten so that I could easily encrypt different fields of my C# objects with a straightforward API. MongoDB has a similar feature, but it requires using MongoDB. I’m using Marten specifically because I like how they handle Event Sourcing… but Marten requires using Postgres.

I’ve already learned a ton from this short project, including:

  • Optimizing .NET reflection with Expression trees
  • Using GitHub Actions for CI/CD
  • Publishing to NuGet in a CI/CD pipeline
  • OWASP best practices for encryption
  • Key Management Systems
  • and more…

Todo Bubbles

Every developer needs to do their own todo list at some point, right? I have an idea for a slight variation on the usual todo list, making use of what I think is a key psychological trait in order to fill a specific niche.

More details will come probably later this summer. I want to finish the first project before starting this one. I’m not sure if anybody else will find it useful, but I know I could use the specific feature set that this guy will provide.

And all the while, I expect to learn:

  • More mobile development
  • Mobile app deployment
  • Probably some fun little UI stuff

This Blog

I like to relate things to seemingly unrelated things. It helps me to understand both at a deeper level, I find. I’m not sure if anybody else will agree, or if they’ll agree with the weird relationships I find with programming and business (religion, war, seduction…).

But I like writing about it, if nothing more than to store my thoughts. Also, as many people often say, forcing ourselves to write down our thoughts and ideas helps us to curate and refine them.

Writing is a skill by itself. It’s useful in life to be a good writer. It’s useful as a programmer to be a good writer. We all have complicated things to communicate with people we care about, about things we care about. I’m not a great writer, but I think I can improve, and I think it will improve other areas in my life – including the code I write.

Wartime Metaphors for Software Development

Caesar built fortifications when his armies were camping. His enemies didn’t always place the same emphasis on building defenses. Caesar defeated his enemies.

Strong fortifications helped him to secure his position. Maintaining a secure position enabled him to focus on the most important part of the battle… establishing secure and efficient supply lines. Hungry armies don’t fight well.

Good generals retreat when they’re in disadvantageous positions. From a secure base, they could organize measured attacks. Without a secure base, they were left exposed to annihilation.

Good generals retreat during battle itself. Undisciplined armies full of adrenaline easily charge, charge, and keep charging. A good charge becomes increasingly dangerous as it progresses. If the faction charging separates too far from the security of the main army, they can be easily surrounded and wiped out by the enemy. Faking retreat to entice undisciplined charges was a staple tactic of Genghis Khan.

How does this apply to business?

  • Build competitive moats to protect your market share
  • Protect your supply lines
  • Retreat?

Retreat.

Businesses can grow too quickly. As things go well, leadership may try to expand into new areas before properly securing their base. Expanding too rapidly leads to chaos. Business lines risk being overexposed to competition.

It’s not just the business lines that are at risk, but also the good employees. The first soldiers to flee are the cavalry… noblemen and elites of society. The agile ones, with options.

How does this apply to software?

Build fortifications. Automated tests, obviously. Be more diligent than your competitors, and you’ll live to code another day.

Your supply lines are your lifeblood. Don’t skimp on CI/CD. If your world-changing features can’t eat (get released), they’ll either fight poorly, or not at all.

Strong fortifications are essential to healthy supply lines.

Don’t build too fast. When you get away from your fortifications, you’ll get wiped out when things take a bad turn, rather than being able to retreat to relative safety temporarily to regroup.

Don’t build new features until the existing ones are at least rudimentarily fortified. Go back and refortify older and more central features. Your castle keep is your last defense. It should be rock solid. Too many features without adequate fortifications leave your software exposed to competitors. The general can enact strategy with organized units. He cannot direct a mob, and a disorganized mob is an easy target.

When you’re in a disadvantageous position, retreat (refactor!). Keep refactoring until you’re in the advantageous position again. Fortify in stages… build a wooden wall, fortify that with rock, fortify that with towers. The longer you take to retreat, the more costly it becomes. You may have to leave behind siege equipment, cannons, or soldiers if you must eventually make a hasty retreat.

Don’t keep marching toward the narrow pass with just-so-happen-to-be-perfect-ledges-for-archers on both sides simply because you want to “keep delivering value to customers”. If your code is dead, it can’t deliver value.

Faster Reflection in .NET for JsonCryption.Utf8Json

  1. I needed to use Reflection to add support for Utf8Json to JsonCryption
  2. I wanted to support Utf8Json because it’s good and fast…
  3. … but, reflection in .NET is sloooowwww…

Thankfully, through C#’s Expression class, we can cache getters, setters, and methods that we discover via System.Reflection initially, so that we can use them in the future without going through System.Reflection each time thereafter.

I’m late to this game, as Jon Skeet first wrote about the technique back in 2008. And I believe others had written about it before him.
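The getter version of this appears later in the post. For completeness, here is a minimal sketch of the same idea applied to a property setter. The `Sample` class and the `BuildSetter` helper are names made up for illustration and are not part of JsonCryption; the sketch also assumes the property belongs to a class (not a struct) and has a setter.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical type used only for this illustration.
public class Sample
{
    public string Name { get; set; }
}

public static class SetterSketch
{
    // Builds the equivalent of:
    //   (object target, object value) => ((TParent)target).Property = (TProperty)value
    // Reflection is used once to find the PropertyInfo; the compiled delegate is reused afterwards.
    public static Action<object, object> BuildSetter(PropertyInfo property)
    {
        var target = Expression.Parameter(typeof(object), "target");
        var value = Expression.Parameter(typeof(object), "value");

        var typedTarget = Expression.Convert(target, property.DeclaringType);
        var typedValue = Expression.Convert(value, property.PropertyType);

        // Expression.Assign throws if the property has no setter.
        var assign = Expression.Assign(Expression.Property(typedTarget, property), typedValue);

        return Expression.Lambda<Action<object, object>>(assign, target, value).Compile();
    }
}
```

The returned delegate would typically be stored in a dictionary keyed by the PropertyInfo, so the compile cost is paid only once per property.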

Adding support for Utf8Json

From a high-level view, I needed to provide an alternative implementation of Utf8Json.IJsonFormatterResolver, as well as implementations of Utf8Json.IJsonFormatter<T>, in order to offer a usage API similar to JsonCryption’s existing one:

using Utf8Json;

class Foo
{
    [Encrypt]
    public string LaunchCode { get; set; } // needs a setter for the object initializer below
    ...
}

// setup
IJsonFormatterResolver encryptedResolver = new EncryptedResolver(…);

// serialize/deserialize
var myFoo = new Foo { LaunchCode = "password1" };
string json = JsonSerializer.Serialize(myFoo, encryptedResolver);
Foo deserialized = JsonSerializer.Deserialize<Foo>(json, encryptedResolver);

The implementation of IJsonFormatterResolver is trivial, just getting from a cache or creating an instance of IJsonFormatter<T> for each type T. The fun starts with the implementation of IJsonFormatter<T>.
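To make the "get from a cache or create" idea concrete, here is a rough sketch. It deliberately uses simplified stand-in interfaces (`IFormatterSketch<T>`, `IResolverSketch`) so that it compiles without the Utf8Json package; the real interfaces also carry Serialize/Deserialize members, and the actual JsonCryption resolver may be structured differently.

```csharp
using System;
using System.Collections.Concurrent;

// Simplified stand-ins for the Utf8Json interfaces, for illustration only.
public interface IFormatterSketch<T> { }

public interface IResolverSketch
{
    IFormatterSketch<T> GetFormatter<T>();
}

// Hypothetical formatter; the real one would do the serialization work.
public sealed class EncryptedFormatterSketch<T> : IFormatterSketch<T> { }

public sealed class CachingResolverSketch : IResolverSketch
{
    // One formatter instance per type T, created lazily on first request.
    private readonly ConcurrentDictionary<Type, object> _formatters =
        new ConcurrentDictionary<Type, object>();

    public IFormatterSketch<T> GetFormatter<T>()
    {
        return (IFormatterSketch<T>)_formatters.GetOrAdd(
            typeof(T),
            _ => new EncryptedFormatterSketch<T>());
    }
}
```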

First, an overview

Stepping back for a moment… I don’t want to write a JSON serializer. Whenever possible, JsonCryption should leverage the serialization logic of the given serializer, and only encrypt/decrypt at the correct point in the serialization chain. Something like this:

Without Encryption
  1. .NET Object (POCO)
  2. (serialize)
  3. JSON
  4. (deserialize)
  5. POCO

With Encryption
  1. POCO
  2. (serialize)
  3. JSON
  4. (encrypt)
  5. Encrypted JSON
  6. (decrypt)
  7. JSON
  8. (deserialize)
  9. POCO

Except, this isn’t exactly accurate since JsonCryption is doing Field Level Encryption (FLE). So as written, the encryption path shown above would produce a single blob of cipher text for the Encrypted JSON. We instead want a nice JSON document with only the encrypted Fields represented in cipher text:

{
  id: 123,
  launchCode: <cipher text here...>
}

So really, the process is something more like this:

  1. POCO
  2. (serialize)
  3. (resolve fields)
  4. (serialize/encrypt fields)
  5. JSON …

(serialize/encrypt fields) for a single field:
  1. field
  2. (write JSON property name)
  3. (serialize data)
  4. JSON chunk
  5. (encrypt serialized data)
  6. cipher text
  7. (write cipher text as JSON value)

Like this, I (mostly) don’t have to worry about serializing/encrypting primitive, non-primitive, or user-defined objects. For example, if I have something like this…

class Foo
{
    [Encrypt]
    public Bar MyBar { get; }
}

class Bar
{
    public int Countdown { get; }
    public string Message { get; }
}

… then I will first get something like this during the serialization/encryption of MyBar

{ Countdown: 99, Message: "Bottles of beer on the wall" }

Which itself is just a string, and therefore straightforward to encrypt, so that the final serialized form of Foo would be something like:

{
  MyBar: <cipher text here...>
}
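The single-field flow above (serialize the value, encrypt the serialized chunk, write the cipher text as a JSON value) can be sketched roughly as follows. Two loud assumptions: the sketch uses System.Text.Json instead of Utf8Json so that it runs without extra packages, and `Protect`/`Unprotect` are Base64 placeholders, which is NOT encryption. None of these names come from JsonCryption.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Same shape as the Bar example above, with setters so it can be deserialized here.
public class Bar
{
    public int Countdown { get; set; }
    public string Message { get; set; }
}

public static class FieldLevelSketch
{
    // Placeholder only: Base64 is an encoding, not encryption.
    // A real data protector would be plugged in at this point.
    public static string Protect(string plainText) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(plainText));

    public static string Unprotect(string cipherText) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(cipherText));

    public static string SerializeWithProtectedBar(int id, Bar bar)
    {
        // 1. Serialize the field's value on its own, giving a JSON chunk.
        string barJson = JsonSerializer.Serialize(bar);

        // 2. Protect the serialized chunk.
        string cipherText = Protect(barJson);

        // 3. Write the cipher text as an ordinary JSON string value.
        var document = new Dictionary<string, object>
        {
            ["Id"] = id,
            ["MyBar"] = cipherText
        };
        return JsonSerializer.Serialize(document);
    }
}
```

Reading the value back reverses the steps: parse the outer document, unprotect the MyBar string, then deserialize the resulting JSON into a Bar.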

Finally, since I only want to encrypt properties/fields on custom C# objects that are decorated with EncryptAttribute, I can safely cache an instance of IJsonFormatter<T> for each type that I serialize via JsonSerializer.Serialize(…). This is good news, and now we can begin the fun stuff…

EncryptedFormatter<T> : IJsonFormatter<T>

As mentioned earlier, for each type T, EncryptedFormatter<T> needs to get all properties and fields that should be serialized, serialize each one, encrypt those that should be encrypted, and write everything to the resulting JSON representation of T.

Getting the properties and fields

Getting a list of properties and fields to be serialized is easy with reflection. I can cache the resulting list of MemberInfo objects to use each time. So far not bad.
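As a rough sketch of that discovery-plus-cache step: the rule used here (public instance properties with a getter, plus public instance fields) is an assumption for illustration, and the real selection rules in JsonCryption may differ. `Point` and `MemberCacheSketch` are made-up names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Reflection;

// Hypothetical type with one property and one field.
public class Point
{
    public int X { get; set; }
    public int Y;
}

public static class MemberCacheSketch
{
    private static readonly ConcurrentDictionary<Type, MemberInfo[]> Cache =
        new ConcurrentDictionary<Type, MemberInfo[]>();

    // Reflection runs once per type; later calls return the cached array.
    public static MemberInfo[] GetSerializableMembers(Type type) =>
        Cache.GetOrAdd(type, t =>
        {
            const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;

            // Skip write-only properties and indexers.
            var properties = t.GetProperties(flags)
                .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
                .Cast<MemberInfo>();

            var fields = t.GetFields(flags).Cast<MemberInfo>();

            return properties.Concat(fields).ToArray();
        });
}
```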

Serialize each MemberInfo, encrypting when necessary

When serializing each one, however, some things I need to do include:

  • Get the value from the MemberInfo
  • Determine if it needs to be encrypted
  • Serialize (and possibly encrypt) the value

Get the value from the MemberInfo

With reflection, this is easy, but slow:

object value = fieldInfo.GetValue(instance);

We could be calling this getter many times in client code, so this should be optimized more for speed. Using .NET’s Expression library to build delegates at run-time has a much larger scope than this post, so I’m only going to show end results and maybe discuss a couple points of interest. For now, this was my resulting code to build a compiled delegate at run-time of the getter for a given MemberInfo (PropertyInfo or FieldInfo), so that I could cache it for reuse:

// ObjectType is a cached typeof(object).
Func<object, object> BuildGetter(MemberInfo memberInfo, Type parentType)
{
    // Builds the equivalent of: (object obj) => (object)((TParent)obj).Member
    var parameter = Expression.Parameter(ObjectType, "obj");
    var typedParameter = Expression.Convert(parameter, parentType);
    var body = Expression.MakeMemberAccess(typedParameter, memberInfo);
    // Box value types so the delegate can always return object
    var objectifiedBody = Expression.Convert(body, ObjectType);
    var lambda = Expression.Lambda<Func<object, object>>(objectifiedBody, parameter);
    return lambda.Compile();
}

This gives me a delegate to use for this particular MemberInfo instance to get its value, bypassing the need to use reflection’s much slower GetValue(object instance) method:

// using reflection (GetValue is defined on FieldInfo and PropertyInfo, not on MemberInfo itself)
object viaReflection = fieldInfo.GetValue(instance);

// using the cached delegate
object viaDelegate = cachedGetter(instance);

As others on the interwebs have mentioned when using this technique, it’s initially slow since we have to compile code at run-time. But after that, it’s essentially as fast as a direct access of the property or field.

Determine if it needs to be encrypted

This is trivial. Just check if it’s decorated by EncryptAttribute and cache that Boolean.
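A minimal sketch of that check, assuming a stand-in attribute named `EncryptSketchAttribute` and a hypothetical `MemberMetadataSketch` holder (the real code uses EncryptAttribute and its own member wrapper):

```csharp
using System;
using System.Reflection;

// Stand-in for the real EncryptAttribute, so the sketch compiles on its own.
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public sealed class EncryptSketchAttribute : Attribute { }

// Hypothetical type used only for this illustration.
public class Rocket
{
    public int Id { get; set; }

    [EncryptSketch]
    public string LaunchCode { get; set; }
}

public sealed class MemberMetadataSketch
{
    public MemberMetadataSketch(MemberInfo member)
    {
        Member = member;
        // Evaluated once, when the metadata is built, rather than on every serialization.
        ShouldEncrypt = member.IsDefined(typeof(EncryptSketchAttribute), inherit: true);
    }

    public MemberInfo Member { get; }
    public bool ShouldEncrypt { get; }
}
```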

Serialize (and possibly encrypt) the value

Initially, I thought I could get away with just using Utf8Json’s dynamic support when serializing to avoid having to explicitly call the typed JsonSerializer.Serialize<T>(…) method for each MemberInfo. I got it to work for primitives, but not for more complex types.

Hence, I would need to once again use reflection to get the typed Serialize<T> method to use for each MemberInfo at run-time. Since reflection is slow, I also needed to cache this as a compiled delegate:

// signature: JsonSerializer.Serialize<T>(ref JsonWriter writer, T value, IJsonFormatterResolver resolver)

internal delegate void FallbackSerializer(
    ref JsonWriter writer,
    object value,
    IJsonFormatterResolver fallbackResolver);

FallbackSerializer BuildFallbackSerializer(Type type)
{
    var method = typeof(JsonSerializer)
        .GetMethods()
        .Where(m => m.Name == "Serialize")
        .Select(m => (MethodInfo: m, Params: m.GetParameters(), Args: m.GetGenericArguments()))
        .Where(x => x.Params.Length == 3)
        .Where(x => x.Params[0].ParameterType == typeof(JsonWriter).MakeByRefType())
        .Where(x => x.Params[1].ParameterType == x.Args[0])
        .Where(x => x.Params[2].ParameterType == typeof(IJsonFormatterResolver))
        .Single().MethodInfo;

    var generic = method.MakeGenericMethod(type);

    var writerExpr = Expression.Parameter(typeof(JsonWriter).MakeByRefType(), "writer");
    var valueExpr = Expression.Parameter(ObjectType, "obj");
    var resolverExpr = Expression.Parameter(typeof(IJsonFormatterResolver), "resolver");

    var typedValueExpr = Expression.Convert(valueExpr, type);
    var body = Expression.Call(generic, writerExpr, typedValueExpr, resolverExpr);
    var lambda = Expression.Lambda<FallbackSerializer>(body, writerExpr, valueExpr, resolverExpr);
    return lambda.Compile();
}

For this, I needed to use a custom delegate due to the JsonWriter being passed in by reference, which isn’t allowed with the built-in Func<> and Action<> delegates. Beyond that, everything else should more or less flow from what we did before with the MemberInfo getter.
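The by-reference detail is easier to see in isolation. Here is a small self-contained sketch of the same pattern using an `int` instead of a JsonWriter; `RefAdder` and `RefDelegateSketch` are made-up names for illustration.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Custom delegate type: Func<> and Action<> cannot declare ref parameters.
public delegate void RefAdder(ref int total, object amount);

public static class RefDelegateSketch
{
    // The target method takes its first argument by reference.
    public static void Add(ref int total, int amount) => total += amount;

    public static RefAdder Build()
    {
        MethodInfo add = typeof(RefDelegateSketch).GetMethod(nameof(Add));

        // MakeByRefType marks the lambda parameter as "ref int".
        var totalExpr = Expression.Parameter(typeof(int).MakeByRefType(), "total");
        var amountExpr = Expression.Parameter(typeof(object), "amount");

        // Unbox the loosely typed argument before calling the typed method.
        var typedAmount = Expression.Convert(amountExpr, typeof(int));
        var body = Expression.Call(add, totalExpr, typedAmount);

        return Expression.Lambda<RefAdder>(body, totalExpr, amountExpr).Compile();
    }
}
```

Calling the compiled delegate with `ref` updates the caller's variable, just as a direct call to Add would.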

Ultimately, this allowed me to do something like:

static void WriteDataMember(
    ref JsonWriter writer,
    T value,
    ExtendedMemberInfo memberInfo,
    IJsonFormatterResolver formatterResolver,
    IJsonFormatterResolver fallbackResolver,
    IDataProtector dataProtector)
{
    writer.WritePropertyName(memberInfo.Name);
    object memberValue = memberInfo.Getter(value);
    var valueToSerialize = memberInfo.ShouldEncrypt
        ? BuildEncryptedValue(memberValue, memberInfo, fallbackResolver, dataProtector)
        : BuildNormalValue(memberValue, memberInfo, memberInfo.HasNestedEncryptedMembers, formatterResolver);
    JsonSerializer.Serialize(ref writer, valueToSerialize, fallbackResolver);
}

static string BuildEncryptedValue(
    dynamic memberValue,
    ExtendedMemberInfo memberInfo,
    IJsonFormatterResolver fallbackResolver,
    IDataProtector dataProtector)
{
    var localWriter = new JsonWriter();
    memberInfo.FallbackSerializer(ref localWriter, memberValue, fallbackResolver);
    return dataProtector.Protect(localWriter.ToString());
}

static object BuildNormalValue(
    dynamic memberValue,
    ExtendedMemberInfo memberInfo,
    bool hasNestedEncryptedMembers,
    IJsonFormatterResolver formatterResolver)
{
    if (!hasNestedEncryptedMembers)
        return memberValue;

    var localWriter = new JsonWriter();
    memberInfo.FallbackSerializer(ref localWriter, memberValue, formatterResolver);
    return localWriter.ToString();
}

There are a couple things going on here…

First, I needed to use the localWriter when leaning on Utf8Json to serialize at the intermediate stage, because otherwise it would restart its internal JsonWriter when calling the JsonSerializer.Serialize(instance, fallbackResolver) overload. Things were very weird before I realized what was happening with this.

Second, you’ll see that I needed to do one additional special stage for properties that aren’t marked to be encrypted themselves. This is to take into account nested classes/structs whose children may themselves have encrypted members:

class FooParent
{
    public FooChild Child { get; }
}

class FooChild
{
    [Encrypt]
    public string LaunchCode { get; }
}

Because of the possibility of nesting, when building the cached EncryptedFormatter<T>, I also needed to traverse every nested property and field of T to determine if any were decorated by EncryptAttribute. If a nested member needs to be encrypted, then I need to serialize T itself using the EncryptedResolver, eventually returning a JSON string. Otherwise, I could do the entire thing normally with the default Utf8Json resolver configured by the client, therefore only needing to return the original object directly.

Conclusion: All theory without benchmarking

Is this actually faster than using regular reflection? Did I make the code needlessly complicated?

Theoretically, it should be significantly faster, but until I actually benchmark it, I won’t know for sure.
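Short of a proper benchmark, a quick Stopwatch comparison of just the getter path could look something like the sketch below. It is a rough illustration only: it does not control for JIT warm-up, it says nothing about the full serialization pipeline, and `Widget` and `GetterTimingSketch` are made-up names. A dedicated benchmarking library would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical type used only for this illustration.
public class Widget
{
    public int Size { get; set; }
}

public static class GetterTimingSketch
{
    public static (long reflectionTicks, long delegateTicks) Measure(int iterations)
    {
        PropertyInfo property = typeof(Widget).GetProperty(nameof(Widget.Size));
        var widget = new Widget { Size = 7 };

        // Compile the getter once, outside the timed loops.
        var obj = Expression.Parameter(typeof(object), "obj");
        var body = Expression.Convert(
            Expression.Property(Expression.Convert(obj, typeof(Widget)), property),
            typeof(object));
        Func<object, object> getter =
            Expression.Lambda<Func<object, object>>(body, obj).Compile();

        long checksum = 0;

        var reflectionWatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            checksum += (int)property.GetValue(widget);
        reflectionWatch.Stop();

        var delegateWatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            checksum += (int)getter(widget);
        delegateWatch.Stop();

        // Sanity check: both paths must have read the same value (7) every time.
        if (checksum != 14L * iterations)
            throw new InvalidOperationException("Getter results disagree.");

        return (reflectionWatch.ElapsedTicks, delegateWatch.ElapsedTicks);
    }
}
```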

I’ve been talking about benchmarking JsonCryption for a while now, so it will likely be the next thing I do on this project. Unfortunately, I have other projects going on that are more important, so I’m not sure when I’ll be able to get to it. I’m also not thrilled about slightly rewriting JsonCryption.Utf8Json to use reflection just so that I can benchmark it.

Encryption itself is slow. I expect the encryption part alone to be a very significant piece of the total time spent serializing a given object. But again, I won’t know until I look into it.

Finally, working on this port of JsonCryption taught me some new techniques that I would like to see incorporated into the version for Newtonsoft.Json. I’m guessing/hoping I might find some low hanging fruit to optimize that one a bit more.

MongoDB Agrees: Field Level Encryption is Important

I often second-guess myself. The past couple years, I’ve been trying to follow my gut more often. When my gut is healthy, I find myself often confirming my initial assumptions.

Keep gut healthy. Trust gut.

Field Level Encryption (FLE?) for JSON serialization is one of those instances.

MongoDB Announces Field Level Encryption Feature

MongoDB added support for client-side field level encryption in their version 4.2 release, announced way back in June. Just today there was a post on LinkedIn from MongoDB’s account linking to a new FAQ and webinar on the subject, which is how I realized that they, too, agree it can be a useful complement to data-at-rest encryption for sensitive data, especially personally identifiable information of users:

Our implementation of FLE…

Great! So I should have trusted my gut on the FLE abbreviation.

…is totally separated from the database, making it transparent to the server, and instead handled exclusively within the MongoDB drivers on the client (hence the name Client-Side Field Level Encryption). All encrypted fields on the server – stored in-memory, in system logs, at-rest, and in backups – are rendered as ciphertext, making them unreadable to any party who does not have client access along with the keys necessary to decrypt the data.

This is a different and more comprehensive approach than the column encryption used in many relational databases. As most handle encryption server-side, data is still accessible to administrators who have access to the database instance itself, even if they have no client access privileges.

Exactly what I figured would be useful for JsonCryption. Good start!

Let’s see if they validate anything else…

More Indirect Validations of JsonCryption from MongoDB

Regulatory Compliance

Where is FLE most useful for you?

Regulatory Compliance

FLE makes it easier to comply with “right to be forgotten” conditions in new privacy regulations such as the GDPR and the CCPA – simply destroy the customer key and the associated personal data is rendered useless.

Another key motivation for JsonCryption was to comply with GDPR and the CCPA. I like the angle for complying with the “right to be forgotten” mentioned here, though. I hadn’t thought of that. To make this work, I’ll have to tweak the creation of IDataProtector instances to allow better configuration, so that consumers of JsonCryption will have the ability to create a unique IDataProtector instance per user, if they wish.

What else we got?

Key Management Systems

With the addition of FLE, encryption keys are protected in an isolated, customer-managed KMS account. Atlas SREs and Product Engineers have no mechanism to access FLE KMS keys, rendering data unreadable to any MongoDB personnel managing the underlying database or infrastructure for you.

More confirmation of design decisions for JsonCryption! The primary reason I ended up going with the Microsoft.AspNetCore.DataProtection package for the actual encryption layer was to gain industry-standard KMS (Key Management System) functionality. This is essential for any serious consumers of JsonCryption.

Other questions I’ve been thinking about but haven’t been able to dive in quite yet…

Performance

What is the performance impact of FLE?

JsonCryption is designed to be used as a plugin for JSON serialization. So speed matters. But encryption can be slow. That’s why from the beginning, I’ve been planning a future post (or more) discussing benchmarking results (which I have yet to do).

Additionally, performance gains could be found by adding support for JSON serialization packages other than Newtonsoft.Json, which is notoriously slow. To that end, I’m currently in the middle of working on a pretty sweet implementation of a version to work with the blazing fast Utf8Json, described on its GitHub as being:

Definitely Fastest and Zero Allocation JSON Serializer for C#(NET, .NET Core, Unity, Xamarin).

They seem to back it up with solid benchmark results.

[Image: Utf8Json benchmark results]
https://github.com/neuecc/Utf8Json/

So far, this has been a lot of fun as I’ve had the opportunity to explore new techniques, particularly in writing Expressions. To wire JsonCryption into Utf8Json, I need to do a significant amount of runtime generic method resolution and PropertyInfo getting and setting. I would typically do all of this with Reflection. But Reflection is slow. Choosing a very fast serializer (Utf8Json) and then chopping off its legs in the encryption layer by relying so heavily on Reflection would completely defeat the purpose.

So instead of using Reflection constantly, I’m using a bit of Reflection to build Expression trees, which I then compile at runtime into generic methods, getters, and setters, which are finally cached per type being serialized. It’s not a new technique by any means – Jon Skeet blogged about a flavor (he would say “flavour”) of it all the way back in 2008 – but it’s new to me.
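As a rough illustration of the idea (not the actual JsonCryption code), a property getter can be built once from a PropertyInfo, compiled, and cached. The CompiledGetters class and the Person sample type below are hypothetical names for this sketch.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical helper: compiles a getter delegate per property and caches it.
public static class CompiledGetters
{
    private static readonly ConcurrentDictionary<PropertyInfo, Func<object, object>> Cache =
        new ConcurrentDictionary<PropertyInfo, Func<object, object>>();

    // Reflection is only paid the first time a given property is requested.
    public static Func<object, object> For(PropertyInfo property) =>
        Cache.GetOrAdd(property, Build);

    private static Func<object, object> Build(PropertyInfo property)
    {
        var declaringType = property.DeclaringType
            ?? throw new ArgumentException("Property has no declaring type.", nameof(property));

        // Builds: (object instance) => (object)((TDeclaring)instance).Property
        var instance = Expression.Parameter(typeof(object), "instance");
        var typedInstance = Expression.Convert(instance, declaringType);
        var access = Expression.Property(typedInstance, property);
        var boxed = Expression.Convert(access, typeof(object));

        return Expression.Lambda<Func<object, object>>(boxed, instance).Compile();
    }
}

// Sample type for trying the sketch out.
public sealed class Person
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}
```

Calling CompiledGetters.For(typeof(Person).GetProperty("Name")) returns a delegate that can be invoked repeatedly without further reflection. Setters and generic method calls can be handled along the same lines with Expression.Assign and Expression.Call.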

Anyway, I should have more on that soon.

Back to MongoDB…

FLE and Regular At-Rest Encryption are Complementary

What is the relationship between Client-Side FLE and regular at-rest encryption?

They are independent, but complementary to one another and address different threats. You can encrypt data at-rest as usual. Client-side FLE provides additional securing of data from bad actors on the server side or from unintentional data leaks.

This was a key motivation for JsonCryption from the beginning, as well. You might be able to satisfy the encryption requirements of GDPR with a basic encryption-at-rest policy, but then all a hacker has to do is get past your one layer of encryption and they have access to all of your data. By contrast, with field-level encryption, even if they manage to hack your system and extract all of your data, they still have to hack multiple fields, which could theoretically each be protected by its own unique key.

Querying Encrypted Fields Poses Challenges

What sort of queries are supported against fields encrypted with Client-Side FLE?

You can perform equality queries on encrypted fields when using deterministic encryption. …

JsonCryption was primarily designed with Marten in mind. With that, I knew that some sacrifices may need to be made when it comes to querying encrypted values. As of now, I haven’t tested or played around with any scenarios involving querying encrypted fields. For my primary project that’s using JsonCryption and Marten, my repositories aren’t mature enough to know whether or not I’ll need such capabilities. I’ve been lightly mulling it over in my mind, but for now I’m waiting until a concrete need arises before doing anything about it. In the meantime, if anybody is interested in exploring such things in JsonCryption, have at it, and remember that we take Pull Requests.

JsonCryption Wants to Support Multiple KMS’s in the Future

Which key management solutions are compatible with Client-Side FLE?

We have designed client-side FLE to be agnostic to specific key management solutions, and plan on adding direct native support for Azure and GCP in the future.

As I mentioned earlier, this was a key motivation behind using the Microsoft.AspNetCore.DataProtection package under the covers to handle the encryption and key management duties. It could be even more flexible, of course. While Microsoft’s package offers impressive functionality and an inviting API for defining key management policies, other libraries exist that perform similar and different functions. Adding a configuration API and an Adapter Layer in JsonCryption to support additional Key Management Systems could be a good future extension point.

Where can it be Used?

Where can I use Client-Side FLE?

For them, anywhere MongoDB is used (obviously). This sounds like a fantastic feature. If I were using MongoDB for my other project, I would abandon JsonCryption and use MongoDB’s solution. I would also feel really stupid for spending time working on throwaway code.

However, I’m using PostgreSQL because I like Marten for what I’m doing, so I still need another solution. JsonCryption meets this need, and it’s technically database-agnostic, as long as the JSON serializer for your project is configurable/customizable.

Off to a Good Start

I’m pretty excited reading this update from MongoDB (can you tell?). Partly because it’s clear that FLE is an emerging thing, and partly because many of my early assumptions and motivations at the start of JsonCryption were validated by one of the most important global companies in the JSON serialization and storage space.

There’s still a lot of work that needs to be done to make JsonCryption into what it could be, but I see the potential it has and get pretty excited. If anybody wants to help along the way, please reach out. JsonCryption deserves better developers than me working on this.

Tech Debt Leads to Software Death

When we don’t write the code that we know we should write, we incur tech debt.

When we write code that we know we shouldn’t, we incur tech debt.

When we write code in a way that we know could be better, we incur tech debt.

How it starts…

We make these decisions because they’re quick fixes to our problems right now. Instant gratification satiates our immediate concerns at the expense of our future selves (and others).

When we incur tech debt, we’re figuratively running up the balance on our tech credit card. At first, we get a quick win without any obvious drawbacks. “I’ll pay it off right away!” we promise ourselves.

But before we refactor, we come up against another super urgent problem. “It’s also small… let’s just do this one more time,” we convince ourselves.

“It’s not like I’ll let this get out of control.”

We know how this story turns out…

“But it isn’t our fault! We’re only developers, after all. With constant pressure to deliver working software, we just have to take shortcuts. Besides, employee turnover is so fast in this industry that I’ll be long gone before I need to worry about the effects, anyway.”

Now what?

Like financial debt, tech debt comes on slowly and imperceptibly. As it grows, it gradually reduces our ability to do things that we need and want to do, because ultimately we become slaves to the interest payments that our debt master demands. Soon, we can’t create new features, or any new features we manage to squeak out are riddled with bugs. The longer the debt accumulates, the harder it is to bite the bullet and start cleaning things up again. We can’t innovate or invest for the future, because we’re swamped just trying to keep up with the maintenance payments.

While in maintenance mode, we’re stagnating at best. More likely, our software health continues to degrade.

Eventually, our competitors are beating us so badly that they put us out of our misery business.

We’ve all incurred tech debt

I’ve heard it said that the ancient Israelites used the same word for “sin” and “debt”. It may be true, but I couldn’t verify it after exhaustively searching Google for 3 minutes.

But debt and sin have been closely linked in Christian theology going back to its origins.

The wages of sin is death.

Romans 6:23

And our tech debt doesn’t just impact ourselves. It affects everybody in the company. Really, if you think about it, it impacts everybody in the world – now and going forward – since what we produce directly contributes to the joy, efficiency, health, and more of the aggregate human population.

Is there a verse related to this, too?

Wherefore as by one man sin entered into this world, and by sin death; and so death passed upon all men, in whom all have sinned.

Romans 5:12

Just replace “sin” with “tech debt”… I didn’t because I didn’t want to write blasphemy.

Confession and Rectification

Today is Ash Wednesday, the day that marks the start of Lent, the annual period which Catholics devote to confession of sins, sacrifice, and penance, leading up to the eventual rebirth and fresh start celebrated at Easter. It’s a valuable 40 days in which we can reevaluate and make ourselves aware of our mistakes in the past year so that we can aim to do better going forward. We are to remind ourselves of things we did that we know we shouldn’t have done, and of things that we did not do that we know we should have. Without this rhythmic focus each year, it becomes more likely that our guilt and bad habits continue to take control of us, just as growing financial debt causes us to worry and takes over our economic lives, and just as growing technical debt gives us guilt and worry and takes over our software.

Many of us work under different Agile setups, with similar rituals to analyze and continually improve the Agile process itself. But how many of us also have periods of time explicitly set aside for nothing other than reviewing old code? Code that we know is there, gnawing at our developer conscience. Code that we hope our peers don’t discover, because we know they’d judge us… and rightfully so!!

Maybe it’s better if we’re given the opportunity to confess our technical sins in relative privacy, perhaps to a more talented and experienced developer… somebody who we look up to as a sort of mentor……. a coding father figure, if you will.

“I took a shortcut here, and I’m sorry about it. What can I do to make it better and get a fresh start?”

“Write two integration tests and nine unit tests. It will be okay. Go and code better.”

At what sort of frequency should this happen? No clue… I’m still just a baby figuring things out. I just know that it should happen. It’s important enough to make it an explicitly set-aside ritual in our sprints… as critical as daily stand-ups and sprint retrospectives.

Religious Automated Tests

Automated Tests : Software :: Religion : Culture

Writing good software is difficult.

Writing good software becomes more difficult as the complexity of the software increases. Partly, this is because with greater complexity, it becomes easier to break features that had been working well before.

Automated tests mitigate this by providing a protection against unforeseen regressions in our software.

The greater the complexity, the greater the benefit that tests provide. When the logic of the system is greater than our minds can hold in working memory, we can’t know if our attempts at improving the system will break something else. The thing that we break could even be a core part of the system, potentially leading to catastrophic failures.

It isn’t just increasing system complexity that makes it more difficult for us to introduce improved features without breaking things. If the code was written by somebody else, it’s more likely that we don’t fully understand what they did, both in the production code and in the tests.

“Why does it do this? Surely they meant it to do that, instead…”

So we make a change, and it breaks a test.

“Hmm… actually, the test looks wrong, too!”

So we “fix” the test we broke…

After a couple more iterations and releases to production, we start getting notifications that the software is behaving very strangely. We investigate, and it turns out it’s due (obviously) to our not understanding the original author’s design for the code and the test.

If we could ask them when we have questions, it helps, but doesn’t eliminate the problem. They themselves have to remember why they made certain decisions, perhaps years ago.

If we can’t ask them, we’re really left in the dark. By default, unless we have a VERY strong conviction for why the code and test need to be changed, it’s generally best to trust the tests.

But, at the end of the day, the tests and code are merely translations of the product owner’s ideal vision for the software. Developers can’t read the minds of product owners, so we sometimes translate incorrectly. In these cases, we can change the tests… very carefully.

If we inherit a highly complex codebase with a great set of tests, we can delete the tests and the code will still run just fine. After deleting the tests, we can even introduce some new (non-tested) features with little trouble for a time.

“See?! We don’t need tests! They were just holding us back, making us slow and requiring extra work for no reason at all. We did tests in the past, but that’s outdated.

“Besides, perhaps you could argue that we needed tests back then in the early days, but we’re much better programmers now, so we really don’t need them anymore.”

After a couple months, making more and more significant changes to the existing codebase – which had been thoroughly tested before we liberated our code from the arbitrary shackles of the tests – eventually causes bigger and weirder bugs to crop up.

Soon, we inadvertently introduce a bug that threatens the very core of our system, threatening to crash the whole thing. Panicked, we begin pointing fingers and blaming each other, our managers, our users, and especially those idiot developers who used to work here but are long gone. Besides, it’s the core feature that they developed that’s failing now, so it must be their fault.

A couple people suggest reinstating the tests. They actually saved them locally when we decided to delete them from our code, having suspected they were actually not just important, but essential. They thought we were making a fatal mistake removing them. Some voiced their opinions, but in the end they went along with our plan in order to keep their jobs.

“But we can’t just add the old tests back!” retorts one lead developer. “Our code has changed so much, they wouldn’t even apply to our modern codebase.”

It’s true. Many of the old tests cover functionality that has since been removed, or significantly modified. None of the latest features were ever tested. Some of them work more or less as intended, but many behave much differently than initially hoped. Their gaps were then patched with more new untested code, which caused other strange behaviors to bubble up in different places.

The complexity was increasing exponentially, and nobody could make a confident change anymore, worried they’d pull the final Jenga block and bring the whole thing down for good.

The junior developers were especially paralyzed. Unsure of where to even start, they lost their confidence and entrepreneurial spirit, looking only to their seniors to solve the crisis for them.

All the while, our tiny competitors began creeping up on our market share. We paid them little mind for the longest time, but now they were becoming a real threat. They never removed their tests. After we removed ours, we could pivot much faster than they could initially, since we were unburdened by the need to write new tests and code that passes. We mocked them for being old-fashioned and backward.

But now they were encroaching on our markets. Our children wanted their devices. They had momentum. We had fear and inaction.

Adding tests back to our codebase will be painful. Many features will have to change while we refactor to get the old tests to pass. We’ll have to devote considerable time writing new tests for the new features that we’re able to keep. Many features will have to be removed altogether… perhaps just for a time… perhaps forever. We’ll continue losing ground to our competition, forced to spend time getting our testing suite up and running again, unable to devote resources to new features.

Maybe we can rebound and retain our spot as the market leader, or maybe we’re too late and our competitors replace us. Nonetheless, it’s becoming clear that no matter what… the market leader will have robust tests.

Introducing JsonCryption!

I couldn’t find a useful .NET library for easy and robust JSON property-level encryption/decryption, so I made one.

The GitHub page covers more details, but this is the gist:

Installation:

Install-Package JsonCryption.Newtonsoft
// There's also a version for System.Text.Json, but the implementation
// for Newtonsoft.Json is better, owing to the greater feature surface
// and customizability of the latter at this time.

Configuration:

// pseudo code (assuming using Newtonsoft.Json for serialization)
container.Register<JsonSerializer>(() => new JsonSerializer()
{
    ContractResolver = new JsonCryptionContractResolver(container.Resolve<IDataProtectionProvider>())
});

Usage:

class Foo
{
    [Encrypt]
    public string EncryptedString { get; }

    public string UnencryptedString { get; }

    public Foo(string encryptedString, string unencryptedString)
    {
        ...
    }
}

var myFoo = new Foo("some important value", "something very public");
var serializer = // resolve JsonSerializer
using var textWriter = ...
serializer.Serialize(textWriter, myFoo);
// pseudo output: '{ "EncryptedString": "akjdfkldjagkldhtlfkjk...", "UnencryptedString": "something very public" }'

Why I need JsonCryption

My main project (not fully operational) is a .NET Core app that handles contact information for users. Being on the OCD spectrum, I wanted this data to have stronger protection than just disk-level and/or database-level encryption.

Property/field-level encryption – in addition to disk-level and database-level encryption – sounded pretty nice. But I needed to be able to easily control which fields/properties were encrypted from each object.

This project is also using Marten, which uses PostgreSQL as a document DB. Marten stores documents (C# objects, essentially) in tables with explicit lookup columns, and one column for the JSON blob. From what I could tell, the best hook offered by Marten’s API to encrypt/decrypt documents automatically is at the point of serialization/deserialization by providing an alternative ISerializer. If I encrypted the entire blob, I wouldn’t be able to query anything very well. So I needed a way to leave certain columns unencrypted when serializing – the ones that would serve as lookups in queries.

Discovery path

First Stop: Newtonsoft.Json.Encryption

This library provided a lot of inspiration. It intends to be very easy to use by requiring a single EncryptAttribute to decorate what is to be encrypted, and it plugs into Newtonsoft.Json via the ContractResolver approach (similar to JsonCryption above).

However, I felt that it had a few fatal flaws that would make using it more difficult than initially meets the eye.

That it doesn’t store the Init Vector with the generated ciphertext was a non-starter for me. This requires consumers of the library to figure out how and where to store it themselves. I’m not a cryptographic expert (use JsonCryption at your own risk!), but it seems pretty standard practice to include the IV with the ciphertext to enable later decryption with just the symmetric key. In any case, this would be a bigger issue after later discoveries.
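For illustration only, here is a minimal sketch of what "including the IV with the ciphertext" can look like using the built-in Aes class. The IvPrefixedAes name is hypothetical, this is not how JsonCryption or Newtonsoft.Json.Encryption work internally, and it deliberately leaves out authentication and key management, so treat it as a picture of the layout rather than something to ship.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: output layout is [16-byte IV][ciphertext].
public static class IvPrefixedAes
{
    public static byte[] Encrypt(string plaintext, byte[] key)
    {
        using var aes = Aes.Create();
        aes.Key = key;
        aes.GenerateIV(); // fresh random IV for every message
        var iv = aes.IV;

        using var encryptor = aes.CreateEncryptor();
        var plainBytes = Encoding.UTF8.GetBytes(plaintext);
        var cipher = encryptor.TransformFinalBlock(plainBytes, 0, plainBytes.Length);

        // Prepend the IV so that decryption needs only the key and this payload.
        var payload = new byte[iv.Length + cipher.Length];
        Buffer.BlockCopy(iv, 0, payload, 0, iv.Length);
        Buffer.BlockCopy(cipher, 0, payload, iv.Length, cipher.Length);
        return payload;
    }

    public static string Decrypt(byte[] payload, byte[] key)
    {
        using var aes = Aes.Create();
        aes.Key = key;

        // The IV is one AES block (16 bytes) at the front of the payload.
        var ivLength = aes.BlockSize / 8;
        var iv = new byte[ivLength];
        Buffer.BlockCopy(payload, 0, iv, 0, ivLength);
        aes.IV = iv;

        using var decryptor = aes.CreateDecryptor();
        var plainBytes = decryptor.TransformFinalBlock(payload, ivLength, payload.Length - ivLength);
        return Encoding.UTF8.GetString(plainBytes);
    }
}
```

Because the IV travels with the ciphertext, the consumer only has to keep track of the symmetric key, and encrypting the same value twice produces different payloads.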

Overriding JsonConverter

Next, I came across this blog post by Thomas Freudenberg that used a slightly different approach. Rather than provide a custom ContractResolver, he decorated each property needing encryption with a custom JsonConverter. His approach also offered a normal way to handle the Init Vectors.

public class Settings {
    [JsonConverter(typeof(EncryptingJsonConverter), "#my*S3cr3t")]
    public string Password { get; set; }
}

This was interesting, but would be annoying to have to type all of that for each property needing encryption. Also, I would obviously need a way to inject the secret into the converter, rather than hard-code it here.

Nevertheless, it gave me an idea for an approach to use with .NET Core’s new System.Text.Json library…

Initial Attempt for System.Text.Json

Microsoft recently released System.Text.Json with .NET Core 3.0 as an open-source alternative to the also-open-source Newtonsoft.Json, which had been the default JSON serialization library for .NET Core up to now. Wanting to be cutting edge, and not knowing much about this new library, I started writing my solution around this.

The library has decent documentation, is open-source (as already mentioned), and enables powerful serialization customization via an unsealed public JsonConverterAttribute. By overriding this with my own implementation, I could essentially implement Freudenberg’s approach with much less code:

public sealed class EncryptAttribute : JsonConverterAttribute
{
    public EncryptAttribute() : base(typeof(EncryptedJsonConverterFactory))
    {
    }
}

Then I just needed to write a custom EncryptedJsonConverterFactory to provide the correct converter given the datatype being serialized.

But this approach also carried critical issues…

  • Overriding the JsonConverterAttribute ultimately required using a Singleton pattern rather than clean Dependency Injection
  • System.Text.Json currently offers no ability to serialize non-public properties, nor fields of any visibility. For most DDD scenarios, this was also a non-starter.
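To show where the first issue comes from, here is a simplified sketch of the attribute-plus-factory shape, handling only strings. System.Text.Json instantiates the factory named in the attribute itself, so there is no constructor through which to inject an encryption service, and the converter ends up reaching for static state. EncryptionAmbient, EncryptedStringConverter, and Settings are hypothetical names for this sketch, and the pass-through delegates are placeholders rather than real encryption.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical static holder standing in for the Singleton described above.
public static class EncryptionAmbient
{
    // Placeholder delegates; a real setup would assign actual protect/unprotect functions.
    public static Func<string, string> Protect { get; set; } = s => s;
    public static Func<string, string> Unprotect { get; set; } = s => s;
}

public sealed class EncryptedStringConverter : JsonConverter<string>
{
    public override string Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
        EncryptionAmbient.Unprotect(reader.GetString() ?? "");

    public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options) =>
        writer.WriteStringValue(EncryptionAmbient.Protect(value));
}

public sealed class EncryptedJsonConverterFactory : JsonConverterFactory
{
    // Only strings in this sketch; a fuller version would pick a converter per data type.
    public override bool CanConvert(Type typeToConvert) => typeToConvert == typeof(string);

    public override JsonConverter CreateConverter(Type typeToConvert, JsonSerializerOptions options) =>
        new EncryptedStringConverter();
}

public sealed class EncryptAttribute : JsonConverterAttribute
{
    public EncryptAttribute() : base(typeof(EncryptedJsonConverterFactory)) { }
}

// Sample type for trying the sketch out.
public sealed class Settings
{
    [Encrypt]
    public string Password { get; set; } = "";

    public string UserName { get; set; } = "";
}
```

Serializing a Settings instance with JsonSerializer.Serialize routes Password through EncryptionAmbient.Protect and leaves UserName alone, which is convenient, but the static holder is exactly the kind of hidden dependency that clean Dependency Injection would avoid.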

Newtonsoft.Json

Newtonsoft.Json offers support for serializing private to public fields and properties. It’s a well-known mature library with a highly extensible API. Its JsonConverterAttribute is currently sealed, so we can’t override that… but there are better options for configuring it, anyway, in order to take advantage of Dependency Injection and other better patterns than I was forced to use with System.Text.Json.

The good news is that the exercise of implementing a solution for System.Text.Json forced me to develop some core logic for converting different datatypes to and from byte arrays, which would come in handy for encrypting a wide variety of datatypes. Another issue with the other libraries and approaches I mentioned earlier is that they only handled a tiny number of potential datatypes. I wanted a set-and-forget solution that would work widely, so being able to convert all built-in types and any nested combination thereof was essential.
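As a small sketch of what that kind of conversion logic might look like for a handful of built-in types (the ByteConversions name and the particular set of supported types are assumptions for illustration, not the library's actual code):

```csharp
using System;
using System.Text;

// Hypothetical helper covering a few built-in types; nested types are out of scope here.
public static class ByteConversions
{
    public static byte[] ToBytes(object value) => value switch
    {
        string s => Encoding.UTF8.GetBytes(s),
        int i => BitConverter.GetBytes(i),
        long l => BitConverter.GetBytes(l),
        double d => BitConverter.GetBytes(d),
        bool b => BitConverter.GetBytes(b),
        DateTime dt => BitConverter.GetBytes(dt.ToBinary()), // ToBinary keeps ticks and Kind
        Guid g => g.ToByteArray(),
        _ => throw new NotSupportedException($"No byte conversion for {value.GetType()}.")
    };

    public static object FromBytes(byte[] bytes, Type type)
    {
        if (type == typeof(string)) return Encoding.UTF8.GetString(bytes);
        if (type == typeof(int)) return BitConverter.ToInt32(bytes, 0);
        if (type == typeof(long)) return BitConverter.ToInt64(bytes, 0);
        if (type == typeof(double)) return BitConverter.ToDouble(bytes, 0);
        if (type == typeof(bool)) return BitConverter.ToBoolean(bytes, 0);
        if (type == typeof(DateTime)) return DateTime.FromBinary(BitConverter.ToInt64(bytes, 0));
        if (type == typeof(Guid)) return new Guid(bytes);
        throw new NotSupportedException($"No byte conversion for {type}.");
    }
}
```

Once every supported type can be reduced to a byte array and restored from one, the encryption layer only ever has to deal with bytes, regardless of what the original property held.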

Adding support for Cryptography best practices

I began with a custom implementation and abstraction of the core Encrypter that I was using throughout the library. It was basic and structured largely using inspiration from the two approaches discussed earlier.

It worked.

But then I attended a great session at CodeMash 2020 called Practical Cryptography for Developers. Without getting into the weeds of cryptography, I was exposed for the first time to the concept of key/algorithm rotation and management and cryptographic best practices.

Writing these features into my library would take me far outside its immediate domain, and far outside my expertise. Surely, I thought, there must be some libraries that handle this already…

Switching to Microsoft.AspNetCore.DataProtection underneath

… yes, there is. Obviously.

The open-source package Microsoft.AspNetCore.DataProtection was designed to provide

a simple, easy to use cryptographic API a developer can use to protect data, including key management and rotation

https://docs.microsoft.com/en-us/aspnet/core/security/data-protection/introduction?view=aspnetcore-3.1

It’s highly configurable, easy to bootstrap, built to promote testability, and built for .NET Core. It handles key management and algorithm management, written by dedicated experts in the field.

So I used that instead of my own Encrypter.

Closing

In the end, I kept both the System.Text.Json implementation (JsonCryption.System.Text.Json), and the Newtonsoft.Json implementation (JsonCryption.Newtonsoft).

JsonCryption.Newtonsoft is better for the moment, allowing encryption/serialization of private to public fields and properties, shallow or nested, of (theoretically) any data type that is also serializable by Newtonsoft.Json.

Check it out. Try it out.

And tell me what you think needs to change to make it better.