I'm moving part of my site from a relational database to Redis and need to insert millions of keys in as short a time as possible.
In my case, the data must first be fetched from MySQL, prepared by PHP, and then added to the corresponding sorted sets (time as the score + ID as the value). Currently I'm taking advantage of the phpredis multi method with the Redis::PIPELINE parameter. Despite noticeable speed improvements, it turned out to block reads and slow down loading times while the import runs.
So here comes the question: is using a pipeline in phpredis equivalent to the mass insertion described at http://redis.io/topics/mass-insert?
Here's an example:
phpredis way:
<?php
// All necessary requires etc.
$client = Redis::getClient();
$client->multi(Redis::PIPELINE); // OR $client->pipeline();
$client->zAdd('key', 1, 2);
...
$client->zAdd('key', 1000, 2000);
$client->exec();
vs. the raw protocol from redis.io:
cat data.txt | redis-cli --pipe
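Here data.txt contains the commands encoded in the raw Redis protocol, as described on that page; for example, a single ZADD key 1 2 is encoded as (each line terminated by \r\n):

*4
$4
ZADD
$3
key
$1
1
$1
2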
I'm one of the contributors to phpredis, so I can answer your question. The short answer is that it is not the same, but I'll provide a bit more detail.
What happens when you put phpredis into Redis::PIPELINE mode is that, instead of sending each command when it is called, it puts it into a list of "to be sent" commands. Then, once you call exec(), one big command buffer is created with all of the commands and sent to Redis.
After the commands are all sent, phpredis reads each reply and packages the results per each command's specification (e.g. HMGET calls come back as associative arrays, etc.).
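To make that concrete, here's a minimal sketch (key names and values are arbitrary) showing that exec() returns one reply per queued command, in order:

<?php
$client = new Redis();
$client->connect('127.0.0.1', 6379);

$client->multi(Redis::PIPELINE);
$client->zAdd('myzset', 1, 'a');   // queued, not sent yet
$client->zAdd('myzset', 2, 'b');   // queued
$client->zRange('myzset', 0, -1);  // queued
$replies = $client->exec();        // one buffer sent, all replies read back

// $replies is something like [1, 1, ['a', 'b']] -- one entry per command
var_dump($replies);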
The performance of pipelining in phpredis is actually quite good and should suffice for almost every use case. That being said, you are still processing every command through PHP, which means you will pay the function call overhead of invoking the phpredis extension itself for every command. In addition, phpredis will spend time processing and formatting each reply.
If your use case requires importing MASSIVE amounts of data into Redis, especially if you don't need to process each reply (but instead just want to know that all commands were processed), then the mass-import method is the way to go.
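For reference, the mass-insert page shows how to generate this protocol by hand; a rough PHP equivalent might look like this (the function name gen_redis_proto and the file handling are just illustrative):

<?php
// Encode one command as a RESP array of bulk strings
function gen_redis_proto(array $args) {
    $proto = '*' . count($args) . "\r\n";
    foreach ($args as $arg) {
        $proto .= '$' . strlen($arg) . "\r\n" . $arg . "\r\n";
    }
    return $proto;
}

$fp = fopen('data.txt', 'wb');
for ($i = 0; $i < 1000; $i++) {
    fwrite($fp, gen_redis_proto(array('ZADD', 'key', (string)$i, "id-$i")));
}
fclose($fp);
// Then: cat data.txt | redis-cli --pipe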
I've actually created a project to do this here: https://github.com/michael-grunder/redismi
The idea behind this extension is that you call it with your commands and then save the buffer to disk, which will be in the raw Redis protocol and compatible with cat buffer.txt | redis-cli --pipe style insertion.
One thing to note: at present you can't simply replace any given phpredis call with a call to the RedisMI object, as commands are processed as variable-argument calls (like hiredis), which works for most, but not all, phpredis commands.
Here is a simple example of how you might use it:
<?php
$obj_mi = new RedisMI();

// Some context we can pass around in RedisMI for whatever we want
$obj_context = new StdClass();
$obj_context->session_id = "some-session-id";

// Attach this context to the RedisMI object
$obj_mi->SetInfo($obj_context);

// Set a callback for when a buffer is saved
$obj_mi->SaveCallback(
    function($obj_mi, $str_filename, $i_cmd_count) {
        // Output the context info we attached
        $obj_context = $obj_mi->GetInfo();
        echo "session id: " . $obj_context->session_id . "\n";

        // Output the filename and how many commands were saved
        echo "buffer file: " . $str_filename . "\n";
        echo "commands : " . $i_cmd_count . "\n";
    }
);

// A thousand SADD commands, adding three members each time
for ($i = 0; $i < 1000; $i++) {
    $obj_mi->sadd('some-set', "$i-one", "$i-two", "$i-three");
}

// A thousand ZADD commands
for ($i = 0; $i < 1000; $i++) {
    $obj_mi->zadd('some-zset', $i, "member-$i");
}

// Save the buffer to disk
$obj_mi->SaveBuffer('test.buf');
?>
Then you can do something like this:
➜ tredismi php mi.php
session id: some-session-id
buffer file: test.buf
commands : 2000
➜ tredismi cat test.buf|redis-cli --pipe
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 2000
Cheers!