I am trying to insert a large(-ish) number of elements in the shortest time possible and I tried these two alternatives: 1) Pipelining: <pre class="prettyprint"><code>List<Task> addTasks = new List<Task>(); for (int i = 0; i < table.Rows.Count; i++) { DataRow row = table.Rows[i]; Task<bool> addAsync = redisDB.SetAddAsync(string.Format(keyFormat, row.Field<int>("Id")), row.Field<int>("Value")); addTasks.Add(addAsync); } Task[] tasks = addTasks.ToArray(); Task.WaitAll(tasks); </code></pre> 2) Batching: <pre class="prettyprint"><code>List<Task> addTasks = new List<Task>(); IBatch batch = redisDB.CreateBatch(); for (int i = 0; i < table.Rows.Count; i++) { DataRow row = table.Rows[i]; Task<bool> addAsync = batch.SetAddAsync(string.Format(keyFormat, row.Field<int>("Id")), row.Field<int>("Value")); addTasks.Add(addAsync); } batch.Execute(); Task[] tasks = addTasks.ToArray(); Task.WaitAll(tasks); </code></pre> I am not noticing any significant time difference (actually I expected the batch method to be faster): for approx 250K inserts I get approx 7 sec for pipelining vs approx 8 sec for batching. Reading from the documentation on pipelining, <blockquote> "Using pipelining allows us to get both requests onto the network immediately, eliminating most of the latency. Additionally, it also helps reduce packet fragmentation: 20 requests sent individually (waiting for each response) will require at least 20 packets, but 20 requests sent in a pipeline could fit into much fewer packets (perhaps even just one)." </blockquote> To me, this sounds a lot like the a batching behaviour. I wonder if behind the scenes there's any big difference between the two because at a simple check with <code>procmon</code> I see almost the same number of <code>TCP Send</code>s on both versions.

Behind the scenes, SE.Redis does quite a bit of work to try to avoid packet fragmentation, so it isn't surprising that it is quite similar in your case. The main difference between batching and flat pipelining are: <ul> <li>a batch will never be interleaved with competing operations on the same multiplexer (although it may be interleaved at the server; to avoid that you need to use a <code>multi</code>/<code>exec</code> transaction or a Lua script)</li> <li>a batch will be always avoid the chance of undersized packets, because it knows about all the data ahead of time</li> <li>but at the same time, the entire batch must be completed before anything can be sent, so this requires more in-memory buffering and may artificially introduce latency</li> </ul> In most cases, you will do better by avoiding batching, since SE.Redis achieves most of what it does automatically when simply adding work. As a final note; if you want to avoid local overhead, one final approach might be: <pre class="prettyprint"><code>redisDB.SetAdd(string.Format(keyFormat, row.Field<int>("Id")), row.Field<int>("Value"), flags: CommandFlags.FireAndForget); </code></pre> This sends everything down the wire, neither waiting for responses nor allocating incomplete <code>Task</code>s to represent future values. You might want to do something like a <code>Ping</code> at the end without fire-and-forget, to check the server is still talking to you. Note that using fire-and-forget does mean that you won't notice any server errors that get reported.

Pipelining vs Batching in Stackexchange.Redis

I am trying to insert a large(-ish) number of elements in the shortest time possible and I tried these two alternatives:

1) Pipelining:

List<Task> addTasks = new List<Task>(); for (int i = 0; i < table.Rows.Count; i++) {     DataRow row = table.Rows[i];     Task<bool> addAsync = redisDB.SetAddAsync(string.Format(keyFormat, row.Field<int>("Id")), row.Field<int>("Value"));     addTasks.Add(addAsync); } Task[] tasks = addTasks.ToArray(); Task.WaitAll(tasks);

2) Batching:

List<Task> addTasks = new List<Task>(); IBatch batch = redisDB.CreateBatch(); for (int i = 0; i < table.Rows.Count; i++) {     DataRow row = table.Rows[i];     Task<bool> addAsync = batch.SetAddAsync(string.Format(keyFormat, row.Field<int>("Id")), row.Field<int>("Value"));     addTasks.Add(addAsync); } batch.Execute(); Task[] tasks = addTasks.ToArray(); Task.WaitAll(tasks);

I am not noticing any significant time difference (actually I expected the batch method to be faster): for approx 250K inserts I get approx 7 sec for pipelining vs approx 8 sec for batching.

Reading from the documentation on pipelining,

"Using pipelining allows us to get both requests onto the network immediately, eliminating most of the latency. Additionally, it also helps reduce packet fragmentation: 20 requests sent individually (waiting for each response) will require at least 20 packets, but 20 requests sent in a pipeline could fit into much fewer packets (perhaps even just one)."

To me, this sounds a lot like the a batching behaviour. I wonder if behind the scenes there's any big difference between the two because at a simple check with procmon I see almost the same number of TCP Sends on both versions.

What is redis pipeline?

Redis pipelining is a technique for improving performance by issuing multiple commands at once without waiting for the response to each individual command. Pipelining is supported by most Redis clients. This document describes the problem that pipelining is designed to solve and how pipelining works in Redis.

Is StackExchange redis thread safe?

redis is fully thread safe; the expected usage is that a single multiplexer is reused between concurrent requests etc - very parallel. Two concurrent callers do not block each other: the two requests are pipelined and the results made available to each when the come back.

What is multiplexer in redis?

Multiplexing allows for a form of implicit pipelining. Pipelining, in the Redis sense, meaning sending commands to the server without regard for the response being received. If you've ever been through a drive-through window and rattled off your entire order into the speaker, this is like pipelining.

What is StackExchange redis?

StackExchange. Redis is a high performance general purpose redis client for . NET languages (C#, etc.). It is the logical successor to BookSleeve, and is the client developed-by (and used-by) Stack Exchange for busy sites like Stack Overflow.

Behind the scenes, SE.Redis does quite a bit of work to try to avoid packet fragmentation, so it isn't surprising that it is quite similar in your case. The main difference between batching and flat pipelining are:

a batch will never be interleaved with competing operations on the same multiplexer (although it may be interleaved at the server; to avoid that you need to use a multi/exec transaction or a Lua script)
a batch will be always avoid the chance of undersized packets, because it knows about all the data ahead of time
but at the same time, the entire batch must be completed before anything can be sent, so this requires more in-memory buffering and may artificially introduce latency

In most cases, you will do better by avoiding batching, since SE.Redis achieves most of what it does automatically when simply adding work.

As a final note; if you want to avoid local overhead, one final approach might be:

redisDB.SetAdd(string.Format(keyFormat, row.Field<int>("Id")),     row.Field<int>("Value"), flags: CommandFlags.FireAndForget);

This sends everything down the wire, neither waiting for responses nor allocating incomplete Tasks to represent future values. You might want to do something like a Ping at the end without fire-and-forget, to check the server is still talking to you. Note that using fire-and-forget does mean that you won't notice any server errors that get reported.

Pipelining vs Batching in Stackexchange.Redis

Tags:

CyberDude

People also ask

1 Answers

Marc Gravell

Recent Activity

Donate For Us

Pipelining vs Batching in Stackexchange.Redis

Tags:

CyberDude

People also ask

1 Answers

Marc Gravell

Related questions

Recent Activity

Donate For Us