
Parallel execution with StackExchange.Redis?

I have 1M items stored in a List<Person>, which I'm serializing in order to insert into Redis (2.8).

I divide the work among 10 Tasks, where each takes its own section (List<> is thread-safe for read-only use: it is safe to perform multiple concurrent read operations on a List).

Simplification:

For ITEMS=100 and THREADS=10, each Task will capture its own PAGE and deal with the relevant range. For example:

void Main()
{
    var ITEMS = 100;
    var THREADS = 10;
    var PAGE = 4;

    List<int> lst = Enumerable.Range(0, ITEMS).ToList();

    // print the slice of the list belonging to this PAGE
    // (Dump() is LINQPad's output helper)
    for (int i = 0; i < ITEMS / THREADS; i++)
    {
        lst[PAGE * (ITEMS / THREADS) + i].Dump();
    }
}
  • PAGE=0 will deal with : 0,1,2,3,4,5,6,7,8,9
  • PAGE=4 will deal with : 40,41,42,43,44,45,46,47,48,49

All ok.

Now, back to SE.Redis.

I wanted to implement this pattern, so I did (with ITEMS = 1,000,000):

[screenshot: the parallel insertion code, 10 tasks each calling StringSet over its own page]
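The screenshot didn't survive, but based on the answer below (StringSet, keys of the form urn:user>{id}, 10 threads), the loop was presumably something like this rough, assumed reconstruction; muxer, people and Serialize are stand-ins for the original code's connection, list and serialization step:

    // Assumed reconstruction, not the original screenshot:
    // split the list into THREADS pages and SET each page from its own task.
    int PER_THREAD = ITEMS / THREADS;
    var tasks = new List<Task>();
    for (int t = 0; t < THREADS; t++)
    {
        int page = t; // capture the loop variable for the closure
        tasks.Add(Task.Run(() =>
        {
            IDatabase db = muxer.GetDatabase(); // muxer: a shared ConnectionMultiplexer
            for (int i = 0; i < PER_THREAD; i++)
            {
                Person p = people[page * PER_THREAD + i];
                db.StringSet("urn:user>" + p.Id, Serialize(p)); // one synchronous SET per item
            }
        }));
    }
    Task.WaitAll(tasks.ToArray());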

My testing (checking DBSIZE each second):

[screenshot: DBSIZE sampled each second while the load runs]

As you can see, 1M records were added via 10 threads.

Now, I don't know if that's fast, but when I change ITEMS from 1M to 10M, things get really slow and I get an exception (thrown in the for loop):

    Unhandled Exception: System.AggregateException: One or more errors occurred. --->
    System.TimeoutException: Timeout performing SET urn:user>288257, inst: 1, queue: 11, qu=0, qs=11, qc=0, wr=0/0, in=0/0
       at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in c:\TeamCity\buildAgent\work\58bc9a6df18a3782\StackExchange.Redis\StackExchange\Redis\ConnectionMultiplexer.cs:line 1722
       at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in c:\TeamCity\buildAgent\work\58bc9a6df18a3782\StackExchange.Redis\StackExchange\Redis\RedisBase.cs:line 79
       ...
    Press any key to continue . . .

Questions:

  • Is my way of dividing the work the right (fastest) way?
  • How can I make things faster (sample code would be much appreciated)?
  • How can I resolve this exception?

Related info:

<gcAllowVeryLargeObjects enabled="true" /> is present in App.config (otherwise I get an OutOfMemoryException), and the build targets x64. I have 16GB of RAM, an SSD drive, and an i7 CPU.

asked May 12 '14 by Royi Namir

1 Answer

Currently, your code is using the synchronous API (StringSet), and is being loaded by 10 threads concurrently. This will present no appreciable challenge to SE.Redis - it works just fine here. I suspect that it genuinely is a timeout where the server has taken longer than you would like to process some of the data, most likely also related to the server's allocator. One option, then, is to simply increase the timeout a bit. Not a lot... try 5 seconds instead of the default 1 second. Likely, most of the operations are working very fast anyway.
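For example (a minimal sketch; the endpoint is illustrative), the synchronous timeout can be raised via the connection configuration:

    // Raise the synchronous-operation timeout from the default 1000ms to 5000ms;
    // syncTimeout is a standard SE.Redis configuration option.
    var muxer = ConnectionMultiplexer.Connect("localhost:6379,syncTimeout=5000");

    // or, equivalently, via ConfigurationOptions:
    var config = new ConfigurationOptions { SyncTimeout = 5000 };
    config.EndPoints.Add("localhost:6379");
    var muxer2 = ConnectionMultiplexer.Connect(config);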

With regards to speeding it up: one option here is to not wait - i.e. keep pipelining data. If you are content not to check every single message for an error state, then one simple way to do this is to add , flags: CommandFlags.FireAndForget to the end of your StringSet call. In my local testing, this sped up the 1M example by 25% (and I suspect a lot of the rest of the time is actually spent in string serialization).
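In code, that's just the extra flags argument on the existing call (sketch; db, person and json stand for whatever your loop already uses):

    // Fire-and-forget: the SET is pipelined and the reply is discarded,
    // so the caller never blocks waiting for (or observes errors from) it.
    db.StringSet("urn:user>" + person.Id, json, flags: CommandFlags.FireAndForget);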

The biggest problem I had with the 10M example was simply the overhead of working with the 10M example - especially since this takes huge amounts of memory for both the redis-server and the application, which (to emulate your setup) are on the same machine. This creates competing memory pressure, with GC pauses etc in the managed code. But perhaps more importantly: it simply takes forever to start doing anything. Consequently, I refactored the code to use parallel yield return generators rather than a single list. For example:

    // NB: appRandom is assumed to be a ThreadLocal<Random> defined elsewhere
    // in the test harness; it is not shown in the answer.
    static IEnumerable<Person> InventPeople(int seed, int count)
    {
        for(int i = 0; i < count; i++)
        {
            int f = 1 + seed + i;
            var item = new Person
            {
                Id = f,
                // invent a random two-part name
                Name = Path.GetRandomFileName().Replace(".", "").Substring(0, appRandom.Value.Next(3, 6)) + " " + Path.GetRandomFileName().Replace(".", "").Substring(0, new Random(Guid.NewGuid().GetHashCode()).Next(3, 6)),
                Age = f % 90,
                Friends = ParallelEnumerable.Range(0, 100).Select(n => appRandom.Value.Next(1, f)).ToArray()
            };
            yield return item;
        }
    }

    // Accumulates items into an internal buffer of `count`, then yields the
    // whole buffer in one quick burst; any remainder is yielded at the end.
    static IEnumerable<T> Batchify<T>(this IEnumerable<T> source, int count)
    {
        var list = new List<T>(count);
        foreach(var item in source)
        {
            list.Add(item);
            if(list.Count == count)
            {
                foreach (var x in list) yield return x;
                list.Clear();
            }
        }
        foreach (var item in list) yield return item;
    }

with:

foreach (var element in InventPeople(PER_THREAD * counter1, PER_THREAD).Batchify(1000))

Here, the purpose of Batchify is to ensure that we aren't helping the server too much by taking appreciable time between each operation - the data is invented in batches of 1000 and each batch is made available very quickly.

I was also concerned about JSON performance, so I switched to JIL:

    // serialize via Jil's static JSON.Serialize
    public static string ToJSON<T>(this T obj)
    {
        return Jil.JSON.Serialize<T>(obj);
    }

and then, just for fun, I moved the JSON work into the batching (so that the actual processing loop now looks like this):

 foreach (var element in InventPeople(PER_THREAD * counter1, PER_THREAD)
     .Select(x => new { x.Id, Json = x.ToJSON() }).Batchify(1000))

This got the times down a bit more, so I can load 10M in 3 minutes 57 seconds, a rate of 42,194 rops. Most of this time is actually local processing inside the application. If I change it so that each thread loads the same item ITEMS / THREADS times, then this changes to 1 minute 48 seconds - a rate of 92,592 rops.
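(As a sanity check on those rates: 10,000,000 items / 237 seconds ≈ 42,194 per second, and 10,000,000 / 108 seconds ≈ 92,592 per second.)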

I'm not sure if I've really answered anything, but the short version might simply be: "try a longer timeout; consider using fire-and-forget".

answered Oct 29 '22 by Marc Gravell