Remove fields from JSON dynamically using Json.Net

Tags: json, c#, json.net

I have some JSON input whose shape I cannot predict, and I need to transform it so that certain fields are not logged. For instance, if I have this JSON:

{
    "id": 5,
    "name": "Peter",
    "password": "some pwd"
}

then after the transformation it should look like this:

{
    "id": 5,
    "name": "Peter"
}  

The sample above is trivial, but the actual case is not so simple. I will have some regular expressions, and any field in the input JSON that matches one of them should not appear in the result. I will have to recurse in case there are nested objects. I have looked at LINQ to JSON but have found nothing that satisfies my needs.

Is there a way of doing this?

Note: This is part of a logging library. I can use the JSON string if necessary or easier. The thing is that at some point in my logging pipeline I get the object (or string as required) and then I need to strip the sensitive data from it, such as passwords, but also any other client-specified data.

Luiso asked Oct 18 '16 19:10


2 Answers

You can parse your JSON into a JToken, then use a recursive helper method to match property names to your regexes. Wherever there's a match, you can remove the property from its parent object. After all sensitive info has been removed, just use JToken.ToString() to get the redacted JSON.

Here is what the helper method might look like:

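// Requires: using System.Collections.Generic; using System.Linq;
//           using System.Text.RegularExpressions; using Newtonsoft.Json.Linq;

// Parses the JSON, removes every property whose name matches a regex,
// and returns the redacted JSON.
public static string RemoveSensitiveProperties(string json, IEnumerable<Regex> regexes)
{
    JToken token = JToken.Parse(json);
    RemoveSensitiveProperties(token, regexes);
    return token.ToString();
}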

public static void RemoveSensitiveProperties(JToken token, IEnumerable<Regex> regexes)
{
    if (token.Type == JTokenType.Object)
    {
        // ToList() takes a snapshot so properties can be removed while iterating.
        foreach (JProperty prop in token.Children<JProperty>().ToList())
        {
            bool removed = false;
            foreach (Regex regex in regexes)
            {
                if (regex.IsMatch(prop.Name))
                {
                    prop.Remove();
                    removed = true;
                    break;
                }
            }
            if (!removed)
            {
                RemoveSensitiveProperties(prop.Value, regexes);
            }
        }
    }
    else if (token.Type == JTokenType.Array)
    {
        foreach (JToken child in token.Children())
        {
            RemoveSensitiveProperties(child, regexes);
        }
    }
}

And here is a short demo of its use:

public static void Test()
{
    string json = @"
    {
      ""users"": [
        {
          ""id"": 5,
          ""name"": ""Peter Gibbons"",
          ""company"": ""Initech"",
          ""login"": ""pgibbons"",
          ""password"": ""Sup3rS3cr3tP@ssw0rd!"",
          ""financialDetails"": {
            ""creditCards"": [
              {
                ""vendor"": ""Viza"",
                ""cardNumber"": ""1000200030004000"",
                ""expDate"": ""2017-10-18"",
                ""securityCode"": 123,
                ""lastUse"": ""2016-10-15""
              },
              {
                ""vendor"": ""MasterCharge"",
                ""cardNumber"": ""1001200230034004"",
                ""expDate"": ""2018-05-21"",
                ""securityCode"": 789,
                ""lastUse"": ""2016-10-02""
              }
            ],
            ""bankAccounts"": [
              {
                ""accountType"": ""checking"",
                ""accountNumber"": ""12345678901"",
                ""financialInsitution"": ""1st Bank of USA"",
                ""routingNumber"": ""012345670""
              }
            ]
          },
          ""securityAnswers"":
          [
              ""Constantinople"",
              ""Goldfinkle"",
              ""Poppykosh""
          ],
          ""interests"": ""Computer security, numbers and passwords""
        }
      ]
    }";

    Regex[] regexes = new Regex[]
    {
        new Regex("^.*password.*$", RegexOptions.IgnoreCase),
        new Regex("^.*number$", RegexOptions.IgnoreCase),
        new Regex("^expDate$", RegexOptions.IgnoreCase),
        new Regex("^security.*$", RegexOptions.IgnoreCase),
    };

    string redactedJson = RemoveSensitiveProperties(json, regexes);
    Console.WriteLine(redactedJson);
}

Here is the resulting output:

{
  "users": [
    {
      "id": 5,
      "name": "Peter Gibbons",
      "company": "Initech",
      "login": "pgibbons",
      "financialDetails": {
        "creditCards": [
          {
            "vendor": "Viza",
            "lastUse": "2016-10-15"
          },
          {
            "vendor": "MasterCharge",
            "lastUse": "2016-10-02"
          }
        ],
        "bankAccounts": [
          {
            "accountType": "checking",
            "financialInsitution": "1st Bank of USA"
          }
        ]
      },
      "interests": "Computer security, numbers and passwords"
    }
  ]
}

Fiddle: https://dotnetfiddle.net/KcSuDt
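
Since the question mentions that the logging pipeline may hand over either an object or a string, here is a minimal sketch of one possible entry point that accepts both. The class name `LogRedactor` and the method names `Redact` and `Strip` are hypothetical and not part of the answer above; the stripping logic follows the same recursive idea, condensed with `Any()`. It assumes that any string passed in is valid JSON.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq;

// Hypothetical helper; the names here are illustrative only.
public static class LogRedactor
{
    // Accepts either a JSON string or an arbitrary object from the pipeline.
    public static string Redact(object payload, IEnumerable<Regex> regexes)
    {
        if (payload == null)
            return "null";

        var json = payload as string;
        // JToken.Parse throws if the string is not valid JSON;
        // JToken.FromObject converts an object into a JToken tree.
        JToken token = json != null ? JToken.Parse(json) : JToken.FromObject(payload);
        Strip(token, regexes);
        return token.ToString();
    }

    // Same recursion as RemoveSensitiveProperties in the answer.
    private static void Strip(JToken token, IEnumerable<Regex> regexes)
    {
        if (token.Type == JTokenType.Object)
        {
            // Snapshot the properties so they can be removed while iterating.
            foreach (JProperty prop in token.Children<JProperty>().ToList())
            {
                if (regexes.Any(r => r.IsMatch(prop.Name)))
                    prop.Remove();
                else
                    Strip(prop.Value, regexes);
            }
        }
        else if (token.Type == JTokenType.Array)
        {
            foreach (JToken child in token.Children())
                Strip(child, regexes);
        }
    }
}
```

With something like `LogRedactor.Redact(new { id = 5, name = "Peter", password = "some pwd" }, regexes)` and a regex matching "password", the returned JSON should contain only id and name. If plain, non-JSON log messages can also reach this point, the `JToken.Parse` call would need a try/catch or a separate code path.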

Brian Rogers answered Oct 27 '22 01:10


You can parse your JSON to a JContainer (which is either an object or array), then search the JSON hierarchy using DescendantsAndSelf() for properties with names that match some Regex, or string values that match a Regex, and remove those items with JToken.Remove().

For instance, given the following JSON:

{
  "Items": [
    {
      "id": 5,
      "name": "Peter",
      "password": "some pwd"
    },
    {
      "id": 5,
      "name": "Peter",
      "password": "some pwd"
    }
  ],
  "RootPasswrd2": "some pwd",
  "SecretData": "This data is secret",
  "StringArray": [
    "I am public",
    "This is also secret"
  ]
}

You can remove all properties whose name matches the pattern "pass.*w.*r.*d" (case-insensitively) as follows:

var root = (JContainer)JToken.Parse(jsonString);

var nameRegex = new Regex(".*pass.*w.*r.*d.*", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
var query = root.DescendantsAndSelf()
    .OfType<JProperty>()
    .Where(p => nameRegex.IsMatch(p.Name));
query.RemoveFromLowestPossibleParents();

Which results in:

{
  "Items": [
    {
      "id": 5,
      "name": "Peter"
    },
    {
      "id": 5,
      "name": "Peter"
    }
  ],
  "SecretData": "This data is secret",
  "StringArray": [
    "I am public",
    "This is also secret"
  ]
}

And you can remove all string values that contain the substring "secret" (in any casing) by doing:

var valueRegex = new Regex(".*secret.*", RegexOptions.IgnoreCase);
var query2 = root.DescendantsAndSelf()
    .OfType<JValue>()
    .Where(v => v.Type == JTokenType.String && valueRegex.IsMatch((string)v));
query2.RemoveFromLowestPossibleParents();

var finalJsonString = root.ToString();

When applied after the first transform, this results in:

{
  "Items": [
    {
      "id": 5,
      "name": "Peter"
    },
    {
      "id": 5,
      "name": "Peter"
    }
  ],
  "StringArray": [
    "I am public"
  ]
}

For convenience, I am using the following extension methods:

public static partial class JsonExtensions
{
    public static TJToken RemoveFromLowestPossibleParent<TJToken>(this TJToken node) where TJToken : JToken
    {
        if (node == null)
            return null;
        JToken toRemove;
        var property = node.Parent as JProperty;
        if (property != null)
        {
            // Also detach the node from its immediate containing property -- Remove() does not do this even though it seems like it should
            toRemove = property;
            property.Value = null;
        }
        else
        {
            toRemove = node;
        }
        if (toRemove.Parent != null)
            toRemove.Remove();
        return node;
    }

    public static IEnumerable<TJToken> RemoveFromLowestPossibleParents<TJToken>(this IEnumerable<TJToken> nodes) where TJToken : JToken
    {
        var list = nodes.ToList();
        foreach (var node in list)
            node.RemoveFromLowestPossibleParent();
        return list;
    }
}
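
If you would rather not carry the extension methods around, the two passes can be folded into one. The following is only a sketch under stated assumptions: the class name `JsonRedaction` and the methods `Redact` and `IsSensitive` are hypothetical, and unlike `RemoveFromLowestPossibleParent` it does not null out the property's value before removing the property. It uses only `DescendantsAndSelf()` and `JToken.Remove()`, and removes the containing property whenever a matched value belongs to one.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq;

// Hypothetical helper; the names here are illustrative only.
public static class JsonRedaction
{
    public static string Redact(string json, Regex nameRegex, Regex valueRegex)
    {
        var root = JToken.Parse(json) as JContainer;
        if (root == null)
            return json; // a bare primitive has no properties or elements to remove

        // Snapshot the matches first so the tree is not modified while enumerating it.
        var matches = root.DescendantsAndSelf()
            .Where(t => IsSensitive(t, nameRegex, valueRegex))
            .ToList();

        foreach (JToken match in matches)
        {
            // A value owned by a property is removed by removing the property itself.
            JToken target = match;
            if (match.Parent is JProperty)
                target = match.Parent;

            // Skip tokens that no longer have a parent, such as a property
            // that was already removed because its name matched.
            if (target.Parent != null)
                target.Remove();
        }
        return root.ToString();
    }

    // True for a property whose name matches, or a string value whose text matches.
    private static bool IsSensitive(JToken token, Regex nameRegex, Regex valueRegex)
    {
        var property = token as JProperty;
        if (property != null)
            return nameRegex.IsMatch(property.Name);

        var value = token as JValue;
        return value != null
            && value.Type == JTokenType.String
            && valueRegex.IsMatch((string)value);
    }
}
```

Called with the sample JSON and the same two regexes as above, this should give the same final result as running the two transforms in sequence, though I have only reasoned through it rather than run it against every edge case.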

Demo fiddle here.

dbc answered Oct 27 '22 01:10