Bulk ingest into Redis

Tags:

redis

I'm trying to load a large piece of data into Redis as fast as possible.

My data looks like:

771240491921 SOME;STRING;ABOUT;THIS;LENGTH
345928354912 SOME;STRING;ABOUT;THIS;LENGTH

There is a ~12-digit number on the left and a variable-length string on the right. The key will be the number on the left and the value will be the string on the right.

With an out-of-the-box Redis instance and an uncompressed plain-text file of this data, I can load about a million records per minute. I need to load about 45 million, which would take about 45 minutes. 45 minutes is too long.

Are there standard performance tweaks for this kind of bulk load? Would I get better performance by sharding across separate instances?

asked Sep 21 '11 by Donald Miner

People also ask

How much data is too much for Redis?

Redis can handle up to 2^32 keys, and was tested in practice to handle at least 250 million keys per instance. Every hash, list, set, and sorted set can hold 2^32 elements. In other words, your limit is likely the available memory in your system.

How many QPS can Redis handle?

For small packets, a Redis server can process 80,000 to 100,000 QPS. A larger QPS is beyond the processing capacity of a Redis server. A common solution is to partition the data and adopt multiple servers in distributed architecture.

How many entries can Redis hold?

2^32 keys, which equals 4,294,967,296.


2 Answers

The fastest way to do this is the following: generate the Redis protocol out of this data. The documentation for generating the Redis protocol is on the redis.io site; it is a trivial protocol. Once you have that, just call the file appendonly.log and start Redis in append-only mode.

You can even do a FLUSHALL command and finally push the data into your server with netcat, redirecting the output to /dev/null.

This will be super fast: there is no RTT to wait for, it's just a bulk load of data.
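
For illustration, here is a minimal sketch of the protocol-generation step in Python (the file names data.txt and proto.txt are assumptions, and the key/value split simply follows the sample format in the question):

def resp_command(*args):
    # Encode one command as a RESP array of bulk strings,
    # e.g. SET key value -> *3\r\n$3\r\nSET\r\n$12\r\n<key>\r\n...
    parts = [f"*{len(args)}\r\n"]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(f"${len(data)}\r\n{arg}\r\n")
    return "".join(parts)

# newline="" stops Python from rewriting the \r\n terminators RESP requires
with open("data.txt") as src, open("proto.txt", "w", newline="", encoding="utf-8") as dst:
    for line in src:
        key, value = line.rstrip().split(None, 1)
        dst.write(resp_command("SET", key, value))

The resulting file can then be pushed over a single connection as described above, or, on later Redis versions, fed to redis-cli --pipe, which reads raw protocol from stdin and was added for exactly this kind of mass insertion.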

A less hackish way: just insert things 1,000 at a time using pipelining. It's almost as fast as generating the protocol, but much cleaner :)
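
A sketch of that approach using the redis-py client (the client choice, file name, and batch size of 1,000 are assumptions; any client that supports pipelining works the same way):

import redis

r = redis.Redis(host="localhost", port=6379)
pipe = r.pipeline(transaction=False)  # plain pipelining, no MULTI/EXEC wrapper

with open("data.txt") as src:
    for i, line in enumerate(src, 1):
        key, value = line.rstrip().split(None, 1)
        pipe.set(key, value)
        if i % 1000 == 0:
            pipe.execute()  # one round trip per 1,000 commands
    pipe.execute()  # flush the final partial batch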

answered Sep 22 '22 by antirez

I like what Salvatore proposed, but here is one more very straightforward way: generate a feed for the CLI, e.g.

SET xxx yyy
SET xxx yyy
SET xxx yyy

Pipe it into redis-cli on a server close to you. Then do SAVE and SHUTDOWN, and move the data file to the destination server.
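
As a rough sketch, that feed could be generated with a few lines of Python (file names are assumptions; it also relies on the keys and values containing no spaces or quotes, which holds for the sample data in the question):

with open("data.txt") as src, open("commands.txt", "w") as dst:
    for line in src:
        key, value = line.rstrip().split(None, 1)
        dst.write(f"SET {key} {value}\n")

Something like redis-cli < commands.txt on the nearby server then replays the commands.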

answered Sep 22 '22 by Nick