I am evaluating tokyo cabinet Table engine. The insert rate slows down considerable after hitting 1 million records. Batch size is 100,000 and is done within transaction. I tried setting the xmsiz but still no use. Has any one faced this problem with tokyo cabinet?
Details
Tokyo cabinet - 1.4.3
Perl bindings - 1.23
OS : Ubuntu 7.10 (VMWare Player on top of Windows XP)
I hit a brick wall around 1 million records per shard as well (sharding on the client side, nothing fancy). I tried various ttserver options and they seemed to make no difference, so I looked at the kernel side and found that
echo 80 > /proc/sys/vm/dirty_ratio
(previous value was 10) gave a big improvement - the following is the total size of the data (on 8 shards, each on its own node) printed every minute:
total: 14238792 records, 27.5881 GB size total: 14263546 records, 27.6415 GB size total: 14288997 records, 27.6824 GB size total: 14309739 records, 27.7144 GB size total: 14323563 records, 27.7438 GB size (here I changed the dirty_ratio setting for all shards) total: 14394007 records, 27.8996 GB size total: 14486489 records, 28.0758 GB size total: 14571409 records, 28.2898 GB size total: 14663636 records, 28.4929 GB size total: 14802109 records, 28.7366 GB size
So you can see that the improvement was in the order of 7-8 times. Database size was around 4.5GB per node at that point (including indexes) and the nodes have 8GB RAM (so dirty_ratio of 10 meant that the kernel tried to keep less than ca. 800MB dirty).
Next thing I'll try is ext2 (currently: ext3) and noatime and also keeping everything on a ramdisk (that would probably waste twice the amount of memory, but might be worth it).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With