
Using the same model to deserialize in parallel (efficiently)

I'm trying to read multiple files that have been serialized with ProtoBuf.NET using .NET Tasks like this:

public static ResultsDump Amalgamate(RuntimeTypeModel model, IEnumerable<string> files)
{      
  var readDumpTasks = 
    files.Select(fn =>
      Task<ResultsDump>.Factory.StartNew(() => {
        try {
          using (var dumpFile = new FileStream(fn, FileMode.Open))
          {
            var miniDump = (ResultsDump)model.Deserialize(dumpFile, null, typeof(ResultsDump));
            if (miniDump == null) {
              throw new Exception(string.Format("Failed to deserialize dump file {0}", fn));
            }
            //readDumps.Add(miniDump);
            return miniDump;
          }
        }
        catch (Exception e) {
          throw new Exception(string.Format("cannot read dump file {0}: {1}", fn, e.Message), e);
        }
      })).ToArray();

  Task.WaitAll(readDumpTasks);

  var allDumps = readDumpTasks.Select(t => t.Result).ToList();

  // Goes on.. irrelevant to the question
}

For some reason, CPU usage doesn't really go above a single core. Is there some inherent lock in ProtoBuf.NET that prevents deserializing multiple files concurrently?

I've tried this with multiple RuntimeTypeModel instances as well as with a single one, and CPU usage always seems to peak at a very "low" level.

Am I even wrong to be blaming ProtoBuf.NET? Is this the .NET memory allocator / TPL?

damageboy asked Apr 15 '26 13:04


1 Answer

There is intentionally very limited locking in protobuf-net; it only really locks while checking the types (first run) to see what is needed. Once the model is understood, it is lock-free, and it is designed to be trivially parallel.

As noted in the comments, it is extremely likely that IO is your bottleneck. Indeed, parallelising access to the same physical disk / spindle will usually greatly reduce throughput, as the buffer is more contended and the disk has to do more seeking rather than contiguous reading.

This should be easy to test / validate: for a test run, instead of reading from disk, load all the files into memory first:

var ms = new MemoryStream(
    File.ReadAllBytes(path));

With all the files loaded, run the same code again, but pass the MemoryStreams in as input. If it still doesn't scale, it might be a bug. I strongly suspect, however, that you will find it parallelises nicely at that point.
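As a rough sketch of that experiment, the two phases can be timed separately: read every file sequentially into a byte array, then deserialize the in-memory payloads in parallel. The IoVsCpuProbe class and its LoadAll / DeserializeAll helpers are hypothetical names, not part of protobuf-net, and the deserializer is passed in as a delegate so the sketch runs without the library (the stand-in below only reads the stream length).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper (not part of protobuf-net): separates disk time from CPU time.
public static class IoVsCpuProbe
{
    // Phase 1: read every file sequentially, so the disk is never accessed concurrently.
    public static List<byte[]> LoadAll(IEnumerable<string> files)
    {
        return files.Select(f => File.ReadAllBytes(f)).ToList();
    }

    // Phase 2: run the supplied deserializer over the in-memory payloads in parallel.
    public static T[] DeserializeAll<T>(IReadOnlyList<byte[]> payloads, Func<Stream, T> deserialize)
    {
        var results = new T[payloads.Count];
        Parallel.For(0, payloads.Count, i =>
        {
            // Each iteration gets its own read-only stream over the shared bytes.
            using (var ms = new MemoryStream(payloads[i], writable: false))
            {
                results[i] = deserialize(ms);
            }
        });
        return results;
    }

    public static void Main(string[] args)
    {
        var sw = Stopwatch.StartNew();
        var payloads = LoadAll(args); // file paths come from the command line
        Console.WriteLine("IO:  {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        // Stand-in work so the sketch compiles without protobuf-net.
        // With the question's model, the delegate would instead be:
        //   s => (ResultsDump)model.Deserialize(s, null, typeof(ResultsDump))
        var sizes = DeserializeAll(payloads, s => s.Length);
        Console.WriteLine("CPU: {0} ms for {1} payloads", sw.ElapsedMilliseconds, sizes.Length);
    }
}
```

If the first timing dominates, the disk is the limiting factor; if the second dominates and the cores are still not saturated, that would point back at the deserialization step.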


Here's a worked example, which for me saturates all my cores with concurrent deserialization:

using System.Collections.Generic;
using ProtoBuf;
using System;
using System.IO;
using System.Threading.Tasks;


internal class Program
{
    private static void Main()
    {
        var foo = new Foo { Bars = new List<Bar>() };
        var rand = new Random(1234);
        for (int i = 0; i < 1000; i++)
        {
            var bar = new Bar
            {
                A = rand.Next(),
                B = rand.Next(),
                C = rand.Next(),
                D = rand.Next(),
                E = rand.Next(),
                F = rand.Next(),
                G = rand.Next(),
                H = rand.Next()
            };
            foo.Bars.Add(bar);
        }
        var ms = new MemoryStream();
        Serializer.Serialize(ms, foo);
        var bytes = ms.ToArray();
        const int count = 100000;
        Parallel.For(0, count, x =>
        {
            Serializer.Deserialize<Foo>(new MemoryStream(bytes));
        });
    }
}
[ProtoContract]
internal class Foo
{
    [ProtoMember(1)]
    public List<Bar> Bars { get; set; }
}
[ProtoContract]
internal class Bar
{
    [ProtoMember(1)]
    public int A { get; set; }
    [ProtoMember(2)]
    public int B { get; set; }
    [ProtoMember(3)]
    public int C { get; set; }
    [ProtoMember(4)]
    public int D { get; set; }
    [ProtoMember(5)]
    public int E { get; set; }
    [ProtoMember(6)]
    public int F { get; set; }
    [ProtoMember(7)]
    public int G { get; set; }
    [ProtoMember(8)]
    public int H { get; set; }
}
Marc Gravell answered Apr 18 '26 03:04


