I'm trying to maximize inserts per second. I currently get around 20k inserts/sec. My performance is actually degrading the more threads and CPU I use (I have 16 cores available). 2 threads currently do more per sec than 16 threads on a 16 core dual processor machine. Any ideas on what the problem is? Is it because I'm using only one mongod? Is it indexing that could be slowing things down? Do I need to use sharding? I wonder if there's a way to shard, but also keep the database capped...
Constraints: must handle around 300k inserts/sec, must be self-limiting (capped), must be query-able relatively quickly
Problem space: must handle call records for a major cellphone company (around 300k inserts/sec) and make those call records query-able for as long as possible (a week, for instance)
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use MongoDB;
use Tie::IxHash;
use boolean;
use Time::HiRes;

my $conn  = MongoDB::Connection->new;
my $db    = $conn->tutorial;
my $users = $db->users;

# Create a capped collection. Capped collections require a "size" in bytes;
# "max" alone (a document-count cap) is not sufficient.
my $cmd = Tie::IxHash->new(
    "create" => "users",
    "capped" => boolean::true,
    "size"   => 10 * 1024 * 1024 * 1024,   # byte limit -- adjust to your needs
    "max"    => 10000000,                  # document limit
);
$db->run_command($cmd);

# ensure_index takes the index keys first; options such as background go second.
$users->ensure_index(
    Tie::IxHash->new( "name" => 1 ),
    { "background" => boolean::true },
);

my $myhash = {
    "name" => "James",
    "age"  => 31,
    # "likes" => [qw/Danielle biking food games/]
};

my $j : shared = 0;
my $numthread = 2;    # how many threads to run

my @array;
for (1..100000) {
    push @array, $myhash;
    $j++;
}

sub thInsert {
    $users->batch_insert(\@array);
}

my @threads;
my $timestart = Time::HiRes::time();
push @threads, threads->new(\&thInsert) for 1..$numthread;
$_->join foreach @threads;    # wait for all threads to finish
print (($j * $numthread) . "\n");
my $timeend = Time::HiRes::time();
print ((($j * $numthread) / ($timeend - $timestart)) . "\n");

$users->drop();
$db->drop();
After running the test cases shown above, I found that for the first 10 million inserts (0 to 10 million), the write speed with an index was 0.4 times the speed without an index. More surprisingly, for the next batch of 10 million records, the write speed with an index dropped to 0.27 times the speed without.
Writes to MongoDB currently acquire a global write lock, although collection-level locking is hopefully coming soon. By using more threads you're likely introducing more concurrency problems, as the threads block each other while they wait for the lock to be released.
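To see why more threads can mean lower throughput, here's a toy sketch (plain Python, not MongoDB's actual locking code) where every writer serializes on one process-wide lock, mimicking the global write lock. The `GlobalLockStore` class and all names are illustrative, not a real API:

```python
import threading

class GlobalLockStore:
    """Toy store where every write takes one process-wide lock,
    loosely mimicking mongod's global write lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._docs = []

    def insert(self, doc):
        with self._lock:          # every writer thread queues here
            self._docs.append(doc)

def writer(store, n):
    for i in range(n):
        store.insert({"i": i})

store = GlobalLockStore()
# 16 threads, but inserts still execute one at a time under the single lock,
# so the extra threads mostly add lock-contention overhead.
threads = [threading.Thread(target=writer, args=(store, 10_000)) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(store._docs))  # 160000
```

All 160,000 inserts land, but the 16 threads spent most of their time waiting on the lock rather than working in parallel, which is the same shape of problem the question describes.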
Indexes will also slow you down. To get the best insert performance, it's ideal to add them after you've loaded your data; however, this isn't always possible, for example if you're using a unique index.
To really maximise write performance, your best bet is sharding. This will give you much better concurrency and higher disk I/O capacity, as you distribute writes across several machines.
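The core idea behind sharding is that a shard key routes each document to one of several machines, so each shard sees only a fraction of the write load. A minimal sketch of hashed routing (the `shard_for` function is hypothetical, not the mongos routing code):

```python
import hashlib

def shard_for(key, num_shards):
    """Hash the shard key and map it to one of num_shards machines."""
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return h % num_shards

# Simulate routing 100k call-record IDs across 4 shards.
counts = [0] * 4
for call_id in range(100_000):
    counts[shard_for(call_id, 4)] += 1
print(counts)
```

With a well-distributed shard key, each shard receives roughly a quarter of the writes, so four machines each need to sustain only ~75k inserts/sec of the 300k/sec target instead of one machine absorbing all of it.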