What's the best way to cache results of a json.net serialization in memory?

Project is MVC WebAPI based.

We are passing the permission context of a client to our API servers as a serialized JSON object in the claims headers of the request. This is not a huge object: 6 properties and one collection of enum-based key-value pairs (up to 6 items).

The vast majority of requests to the API occur every minute (some more frequently) from the same set of clients: probably 700-900 clients (and growing), each sending the same claims over and over, every minute.

For every request, various components of the code deserialize this object probably 5-6 times. This deserialization causes significant CPU drain on the servers.

What would be the best way to cache these deserializations in memory? Would a static Dictionary, keyed by the serialized JSON strings, work well, or would lookups be too slow because those strings are decently large?

EDIT: Every action of every controller gets filtered through this attribute to ensure that calls have the proper permissions:

public class AccountResolveAttribute : ActionFilterAttribute
{
    public override void OnActionExecuting(HttpActionContext context)
    {
        var controller = (ControllerBase) context.ControllerContext.Controller;
        var identity = (ClaimsIdentity) controller.User.Identity;

        var users = identity.Claims
            .Where(c => c.Type == ClaimTypes.UserData) // ClaimTypes.UserData is already a string; no ToString() needed
            .Select(c => JsonConvert.DeserializeObject<UserInformation>(c.Value))
            .ToList();

        var accountId = controller.ReadAccountIdFromHeader();

        if (users.All(u => u.AccountId != accountId))
        {
            throw new ApplicationException(string.Format("You have no rights to view information for account Id={0}", accountId));
        }
    }
}

There are also calls in the base controller that interrogate the claims, but AccountResolve could probably cache the result of the first deserialization on the controller so that those calls don't deserialize again (see the sketch below). However, the claims are the same over and over, and I'm really trying to find a way to avoid deserializing the same string again and again. I've tried caching the results in memory in a global static ConcurrentDictionary, keyed by the serialized JSON string, but it doesn't appear to have helped.
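Roughly what I have in mind for the per-request part (a sketch only; the UserInformationKey name is illustrative, and this assumes Web API 2's HttpActionContext):

using System.Linq;
using System.Security.Claims;
using System.Web.Http.Controllers;
using System.Web.Http.Filters;
using Newtonsoft.Json;

public class AccountResolveAttribute : ActionFilterAttribute
{
    // Hypothetical key; any unique string works for HttpRequestMessage.Properties.
    public const string UserInformationKey = "UserInformationClaims";

    public override void OnActionExecuting(HttpActionContext context)
    {
        var identity = (ClaimsIdentity)context.RequestContext.Principal.Identity;

        // Deserialize the claim exactly once per request...
        var users = identity.Claims
            .Where(c => c.Type == ClaimTypes.UserData)
            .Select(c => JsonConvert.DeserializeObject<UserInformation>(c.Value))
            .ToList();

        // ...and stash the result on the request message so downstream code
        // (e.g. the base controller) can read it back instead of re-parsing.
        context.Request.Properties[UserInformationKey] = users;

        // ... the existing accountId permission check would follow here ...
    }
}

The base controller would then check Request.Properties for UserInformationKey before falling back to deserializing. That removes the 5-6 repeated deserializations within a single request, but not across requests, which is the part I'm asking about.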


2 Answers

There seem to be two aspects to this question:

  1. What the title is asking
  2. Something is eating up CPU cycles; the assumption is that it's due to the deserialization of UserInformation instances

For 1., it seems like a ConcurrentDictionary would fit the bill, assuming there really is a reasonably finite number of UserInformation possibilities (you mention this in the question); otherwise, not only would you keep paying the deserialization cost, you'd also have something that essentially looks like a memory leak.

If you can safely make the assumption, here's an example:

public static class ClaimsIdentityExtensions
{
    private static readonly ConcurrentDictionary<string, UserInformation> CachedUserInformations = new ConcurrentDictionary<string, UserInformation>();
    public static IEnumerable<UserInformation> GetUserInformationClaims(this ClaimsIdentity identity)
    {
        return identity
            .Claims
            .Where(c => c.Type == ClaimTypes.UserData)
            .Select(c => CachedUserInformations.GetOrAdd(
                c.Value,
                JsonConvert.DeserializeObject<UserInformation>));
    }
}

You had mentioned that you tried a ConcurrentDictionary, but it didn't help. I would be shocked if deserializing an object beat out a lookup in a ConcurrentDictionary (again, making the aforementioned assumption), even if the keys are "long" strings. Without seeing the UserInformation class it's hard to know with 100% certainty from our end... however, here's an example showing that, given a UserInformation with an AccountId property, the ConcurrentDictionary approach beats the brute-force deserialization approach by an order of magnitude:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Security.Claims;
using Newtonsoft.Json;

namespace ConsoleApplication2
{
    public class UserInformation
    {
        public int AccountId { get; set; }
    }

    public static class ClaimsIdentityExtensions
    {
        private static readonly ConcurrentDictionary<string, UserInformation> CachedUserInformations = new ConcurrentDictionary<string, UserInformation>();
        public static IEnumerable<UserInformation> GetUserInformationClaims(this ClaimsIdentity identity, bool withConcurrentDictionary)
        {
            if (withConcurrentDictionary)
            {
                return identity
                    .Claims
                    .Where(c => c.Type == ClaimTypes.UserData)
                    .Select(c => CachedUserInformations.GetOrAdd(
                        c.Value,
                        JsonConvert.DeserializeObject<UserInformation>));
            }

            return identity
                .Claims
                .Where(c => c.Type == ClaimTypes.UserData)
                .Select(c => JsonConvert.DeserializeObject<UserInformation>(c.Value));
        }
    }

    class Program
    {
        static void Main()
        {
            var identity = new ClaimsIdentity(new[]
            {
                new Claim(ClaimTypes.UserData, "{AccountId: 1}"),
                new Claim(ClaimTypes.UserData, "{AccountId: 2}"),
                new Claim(ClaimTypes.UserData, "{AccountId: 3}"),
                new Claim(ClaimTypes.UserData, "{AccountId: 4}"),
                new Claim(ClaimTypes.UserData, "{AccountId: 5}"),
            });

            const int iterations = 1000000;
            var stopwatch = Stopwatch.StartNew();
            for (var i = 0; i < iterations; ++i)
            {
                identity.GetUserInformationClaims(withConcurrentDictionary: true).ToList();
            }
            Console.WriteLine($"With ConcurrentDictionary: {stopwatch.Elapsed}");

            stopwatch = Stopwatch.StartNew();
            for (var i = 0; i < iterations; ++i)
            {
                identity.GetUserInformationClaims(withConcurrentDictionary: false).ToList();
            }
            Console.WriteLine($"Without ConcurrentDictionary: {stopwatch.Elapsed}");
        }
    }
}

Output:

With ConcurrentDictionary: 00:00:00.8731377
Without ConcurrentDictionary: 00:00:05.5883120

A quick way to check whether deserializing the UserInformation instances really is what's eating the CPU cycles: comment out or stub out any validation against UserInformation and see if the usage is still high.


Since every GET returns different results, you'll likely need to implement your own caching, which isn't terribly hard. You can use MemoryCache or HttpRuntime.Cache to store whatever data you want. There's a simple example at the bottom of the documentation.

One cache exists for each process, so if you have IIS configured for more than one worker process, each process will hold its own cache.

But this way, you can hold whatever data you want in cache. Then retrieve it and manipulate it however you need to before returning data to the client.

You just need to implement some kind of locking to make sure the same cached item is not written by multiple threads at the same time.
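For example, here's a minimal sketch of that pattern using MemoryCache with double-checked locking (the CacheHelper class and the key naming are illustrative, not from any library):

using System;
using System.Runtime.Caching;

public static class CacheHelper
{
    private static readonly MemoryCache Cache = MemoryCache.Default;
    private static readonly object SyncRoot = new object();

    public static T GetOrAdd<T>(string key, Func<T> factory, TimeSpan expiration)
        where T : class
    {
        // Fast path: no lock needed if the item is already cached.
        var cached = Cache.Get(key) as T;
        if (cached != null)
            return cached;

        lock (SyncRoot)
        {
            // Re-check inside the lock so only one thread builds the value.
            cached = Cache.Get(key) as T;
            if (cached != null)
                return cached;

            var value = factory();
            Cache.Set(key, value, DateTimeOffset.UtcNow.Add(expiration));
            return value;
        }
    }
}

An action could then do something like the following, where GetMyDataFromDb stands in for whatever expensive call you're caching:

var data = CacheHelper.GetOrAdd(
    "mydata:" + accountId,
    () => GetMyDataFromDb(accountId),
    TimeSpan.FromSeconds(100));

A single global lock is the simplest safe option; per-key locking would reduce contention if the factory calls are slow.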


Old answer:

If each user sees the same data, then you can use Strathweb.CacheOutput.WebApi2, which is available in NuGet. It might fit your needs.

It will cache results based on the URL sent. So if data is returned for /api/getmydata, the next call to /api/getmydata will get data from the cache. You set the cache expiration.

You decorate your actions with the CacheOutputAttribute:

[CacheOutput(ServerTimeSpan = 100)]
public List<string> GetMyData() {
    ...
}

But if one action can return different data depending on who the user is, then this won't work so easily.
