We are running a Cassandra 1.1.12 cluster with 3 sites and 3 nodes per site, with 8 GB allocated to each node's JVM heap. We have been seeing periodic long GC pauses on nodes, and that is interfering with our application's real-time requirements. The machines we run on are 8-core systems with 24 GB of RAM.
We have been seeing 120+ second pauses where it does a stop-the-world GC.
We are running with these flags on JDK 1.7.0_04:
-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms8G
-Xmx8G
-Xmn1600M
-XX:+HeapDumpOnOutOfMemoryError
-Xss180k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
Here is the detailed GC log leading up to the long pause:
2014-02-23T11:50:19.231-0500: 2119627.980: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 145905916 Max Chunk Size: 3472057 Number of Blocks: 160577 Av. Block Size: 908 Tree Height: 146 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 2119627.981: [ParNew Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 32269664 bytes, 32269664 total : 1356995K->44040K(1474560K), 4.5270760 secs] 5829345K->4546031K(8224768K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 144857324 Max Chunk Size: 2423465 Number of Blocks: 160577 Av. Block Size: 902 Tree Height: 146 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 4.5295190 secs] [Times: user=24.78 sys=0.07, real=4.52 secs] Heap after GC invocations=82068 (full 2561): par new generation total 1474560K, used 44040K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000) from space 163840K, 26% used [0x000000064ae00000, 0x000000064d902028, 0x0000000654e00000) to space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) concurrent mark-sweep generation total 6750208K, used 4501991K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) } Total time for which application threads were stopped: 4.5524690 seconds {Heap before GC invocations=82068 (full 2561): par new generation total 1474560K, used 1354760K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000) from space 163840K, 26% used [0x000000064ae00000, 0x000000064d902028, 0x0000000654e00000) to space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) concurrent mark-sweep generation total 6750208K, used 4501991K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) 2014-02-23T11:51:14.221-0500: 2119682.970: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 144857324 Max Chunk Size: 2423465 Number of Blocks: 160577 Av. Block Size: 902 Tree Height: 146 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 2119682.971: [ParNew Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 41744088 bytes, 41744088 total : 1354760K->52443K(1474560K), 2.1589280 secs] 5856751K->4582809K(8224768K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 143937754 Max Chunk Size: 1505947 Number of Blocks: 160575 Av. 
Block Size: 896 Tree Height: 146 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 2.1613420 secs] [Times: user=12.82 sys=0.04, real=2.16 secs] Heap after GC invocations=82069 (full 2561): par new generation total 1474560K, used 52443K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000) from space 163840K, 32% used [0x0000000654e00000, 0x0000000658136e38, 0x000000065ee00000) to space 163840K, 0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000) concurrent mark-sweep generation total 6750208K, used 4530365K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) } Total time for which application threads were stopped: 2.1719930 seconds {Heap before GC invocations=82069 (full 2561): par new generation total 1474560K, used 1363163K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000) from space 163840K, 32% used [0x0000000654e00000, 0x0000000658136e38, 0x000000065ee00000) to space 163840K, 0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000) concurrent mark-sweep generation total 6750208K, used 4530365K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) 2014-02-23T11:52:33.089-0500: 2119761.837: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 143937754 Max Chunk Size: 1505947 Number of Blocks: 160575 Av. Block Size: 896 Tree Height: 146 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 2119761.839: [ParNew Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 37906760 bytes, 37906760 total : 1363163K->48710K(1474560K), 3.5105890 secs] 5893529K->4611208K(8224768K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 142756036 Max Chunk Size: 326281 Number of Blocks: 160573 Av. 
Block Size: 889 Tree Height: 146 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 3.5130550 secs] [Times: user=18.81 sys=0.08, real=3.52 secs] Heap after GC invocations=82070 (full 2561): par new generation total 1474560K, used 48710K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000) from space 163840K, 29% used [0x000000064ae00000, 0x000000064dd91af8, 0x0000000654e00000) to space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) concurrent mark-sweep generation total 6750208K, used 4562497K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) } Total time for which application threads were stopped: 3.5236060 seconds {Heap before GC invocations=82070 (full 2561): par new generation total 1474560K, used 1359430K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000) from space 163840K, 29% used [0x000000064ae00000, 0x000000064dd91af8, 0x0000000654e00000) to space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) concurrent mark-sweep generation total 6750208K, used 4562497K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) 2014-02-23T11:55:59.448-0500: 2119968.196: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 142756036 Max Chunk Size: 326281 Number of Blocks: 160573 Av. Block Size: 889 Tree Height: 146 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 2119968.198: [ParNew (0: promotion failure size = 131074) (1: promotion failure size = 131074) (2: promotion failure size = 131074) (4: promotion failure size = 131074) (7: promotion failure size = 131074) (promotion failed) Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 34318480 bytes, 34318480 total : 1359430K->1353373K(1474560K), 1.5971880 secs]2119969.795: [CMSCMS: Large block 0x000000073aa59270 : 4586726K->3600612K(6750208K), 149.1735470 secs] 5921928K->3600612K(8224768K), [CMS Perm : 23966K->23925K(40012K)]After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 403131826 Max Chunk Size: 403131826 Number of Blocks: 1 Av. 
Block Size: 403131826 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 150.7724630 secs] [Times: user=28.89 sys=11.57, real=150.75 secs] Heap after GC invocations=82071 (full 2562): par new generation total 1474560K, used 0K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000) from space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) to space 163840K, 0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000) concurrent mark-sweep generation total 6750208K, used 3600612K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23925K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) } Total time for which application threads were stopped: 150.7833360 seconds
I have also set up a nightly job that forces a GC on each instance at 2 AM in the hope that it would alleviate these problems -- it has done some good, but we are still having problems with nodes every few days.
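(In case it helps others reproduce this: the job only needs something that triggers a full collection from outside the JVM. A minimal sketch of such a cron entry, assuming jmap from the running JDK is on the PATH and using jmap -histo:live, which performs a full GC as a side effect -- the actual mechanism of our job is not shown here:)
# /etc/cron.d/cassandra-nightly-gc -- force a full GC at 2 AM by requesting a live-object histogram
0 2 * * * cassandra jmap -histo:live $(pgrep -f CassandraDaemon) > /dev/null 2>&1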
-- Added detail: After changing the new generation to 2 GB (from 1600 MB), we saw a 15-second pause within the first 12 hours of running.
... Class histogram dump:
15772.786: [Class Histogram (before full gc): num #instances #bytes class name ---------------------------------------------- 1: 9743656 526609104 [B 2: 9176097 440452656 java.nio.HeapByteBuffer 3: 8152787 326111480 java.math.BigInteger 4: 8126173 321393760 [I 5: 207997 307212904 [J 6: 8940730 214577520 java.lang.Long 7: 8121743 194921832 org.apache.cassandra.db.DecoratedKey 8: 8121399 129942384 org.apache.cassandra.dht.BigIntegerToken 9: 174374 78049856 [Ljava.lang.Object; 10: 914261 43884528 edu.stanford.ppl.concurrent.SnapTreeMap$Node 11: 1112269 35592608 java.util.concurrent.ConcurrentHashMap$HashEntry 12: 1101827 35258464 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 13: 306955 29467680 edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch 14: 1216257 29190168 org.apache.cassandra.cache.KeyCacheKey 15: 1111387 26673288 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue 16: 427695 20529360 edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder 17: 417278 16691120 org.apache.cassandra.db.ExpiringColumn 18: 306955 9822560 edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch 19: 401296 9631104 org.apache.cassandra.db.ISortedColumns$DeletionInfo 20: 305 8691760 [Ljava.util.concurrent.ConcurrentHashMap$HashEntry; 21: 528460 8455360 java.util.concurrent.atomic.AtomicReference 22: 263965 8446880 edu.stanford.ppl.concurrent.SnapTreeMap 23: 342775 8226600 org.apache.cassandra.db.ColumnFamily 24: 263965 6335160 org.apache.cassandra.db.AtomicSortedColumns$Holder 25: 193133 6180256 org.apache.cassandra.db.DeletedColumn 26: 179519 5744608 java.util.ArrayList$Itr 27: 221515 5316360 java.util.concurrent.ConcurrentSkipListMap$Node 28: 35607 5025400 29: 35607 4852600 30: 61748 4445856 org.apache.cassandra.io.sstable.SSTableIdentityIterator 31: 264409 4230544 edu.stanford.ppl.concurrent.SnapTreeMap$COWMgr 32: 52605 4216504 [Ljava.util.HashMap$Entry; 33: 3389 3868680 34: 221414 3542624 org.apache.cassandra.db.AtomicSortedColumns 35: 107140 3428480 org.apache.cassandra.db.Column 36: 69734 3347232 java.util.TreeMap 37: 131777 3162648 java.util.ArrayList 38: 111752 2682048 java.util.concurrent.ConcurrentSkipListMap$Index 39: 51879 2490192 java.util.HashMap 40: 3389 2438184 41: 2978 2284352 42: 55660 1781120 org.apache.cassandra.io.util.DataOutputBuffer 43: 69664 1671936 org.apache.cassandra.db.TreeMapBackedSortedColumns 44: 51697 1654304 org.apache.cassandra.db.ArrayBackedSortedColumns 45: 61842 1485456 [Ljava.nio.ByteBuffer; 46: 61748 1481952 org.apache.cassandra.utils.BytesReadTracker 47: 3533 1462144 48: 55672 1336128 org.apache.cassandra.io.util.FastByteArrayOutputStream 49: 21743 1318848 [C 50: 33109 1268616 [Lorg.apache.cassandra.db.IColumn; 51: 28607 1144280 java.util.HashMap$KeyIterator 52: 17623 1127616 [Ljava.util.Hashtable$Entry; 53: 35003 1120096 java.util.Vector 54: 17335 1109440 com.sun.jmx.remote.util.OrderClassLoaders 55: 17601 844848 java.util.Hashtable 56: 18528 741120 org.apache.cassandra.io.sstable.IndexHelper$IndexInfo 57: 22634 724288 java.lang.String 58: 17514 700560 java.security.ProtectionDomain 59: 19829 634528 java.util.HashMap$Entry 60: 25327 607848 org.apache.cassandra.utils.IntervalTree.Interval 61: 17513 560416 java.security.CodeSource 62: 22026 528624 java.lang.Double 63: 21740 521760 java.util.concurrent.LinkedBlockingDeque$Node 64: 18782 486728 [[J 65: 18660 447840 org.apache.cassandra.utils.BloomFilter 66: 18660 447840 org.apache.cassandra.utils.obs.OpenBitSet 67: 18529 444696 org.apache.cassandra.db.compaction.PrecompactedRow 
68: 3717 444688 java.lang.Class 69: 18528 444672 org.apache.cassandra.db.ColumnIndexer$RowHeader 70: 18528 444672 org.apache.cassandra.utils.BloomCalculations$BloomSpecification 71: 24572 393152 java.lang.Object 72: 5753 335568 [S 73: 7326 293040 java.util.TreeMap$Entry 74: 17817 285072 java.util.HashSet 75: 5583 282488 [[I 76: 17514 280224 java.security.ProtectionDomain$Key 77: 17514 280224 [Ljava.security.Principal; 78: 6612 264480 org.apache.cassandra.service.WriteResponseHandler 79: 7862 251584 org.apache.cassandra.db.RowMutation 80: 10083 241992 org.apache.cassandra.db.EchoedRow 81: 6110 195520 org.apache.cassandra.utils.ExpiringMap$CacheableObject 82: 311 179136 83: 2927 163912 org.apache.cassandra.thrift.TBinaryProtocol 84: 6563 157512 org.apache.cassandra.net.Message 85: 6289 150936 org.apache.cassandra.net.Header 86: 6110 146640 org.apache.cassandra.net.CallbackInfo 87: 5996 143904 java.util.concurrent.LinkedBlockingQueue$Node 88: 8004 128064 java.util.HashMap$EntrySet 89: 7885 126160 java.util.TreeMap$Values 90: 7790 124640 java.util.concurrent.atomic.AtomicInteger 91: 5132 123168 org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder 92: 6640 106240 org.apache.cassandra.utils.SimpleCondition 93: 6626 106016 java.util.HashMap$Values 94: 830 92960 java.net.SocksSocketImpl 95: 1076 86080 java.lang.reflect.Method 96: 4176 66816 java.lang.Integer 97: 2005 64160 java.lang.ThreadLocal$ThreadLocalMap$Entry 98: 1592 63680 java.lang.ref.SoftReference 99: 1586 63440 com.google.common.collect.SingletonImmutableMap 100: 1743 55776 java.util.TreeMap$ValueIterator 101: 470 48880 java.lang.Thread 102: 1469 47008 java.util.concurrent.locks.AbstractQueuedSynchronizer$Node 103: 1041 41640 java.lang.ref.Finalizer 104: 1002 40080 java.util.LinkedHashMap$Entry 105: 1603 38472 com.google.common.collect.SingletonImmutableSet 106: 1586 38064 com.google.common.collect.ImmutableEntry 107: 435 34800 [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry; 108: 970 31040 java.net.Inet4Address 109: 425 30600 java.lang.reflect.Constructor 110: 396 28512 java.lang.reflect.Field 111: 825 26400 java.net.Socket 112: 1050 25200 java.util.concurrent.atomic.AtomicLong 113: 687 21984 java.util.concurrent.SynchronousQueue$TransferStack$SNode 114: 387 21672 sun.security.provider.MD5 115: 451 21648 java.util.concurrent.ThreadPoolExecutor$Worker 116: 442 21216 java.net.SocketInputStream 117: 525 21000 org.apache.cassandra.thrift.TCustomSocket 118: 514 20560 org.apache.cassandra.db.CounterColumn 119: 639 20448 120: 353 19768 org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT 121: 762 18288 java.io.FileDescriptor 122: 570 18240 org.apache.thrift.transport.TFramedTransport 123: 939 18200 [Ljava.lang.Class; 124: 546 17472 java.util.concurrent.locks.ReentrantLock$NonfairSync 125: 532 17464 [Ljava.lang.String; 126: 527 16864 java.util.Hashtable$Entry 127: 298 16688 org.apache.cassandra.service.ClientState$1 128: 298 16688 org.apache.cassandra.service.ClientState$2 129: 405 16200 org.apache.cassandra.thrift.Column 130: 323 15504 java.net.SocketOutputStream 131: 298 14304 org.apache.cassandra.service.ClientState 132: 585 14040 org.apache.thrift.transport.TMemoryInputTransport 133: 570 13680 org.apache.thrift.TByteArrayOutputStream 134: 284 13632 sun.nio.cs.UTF_8$Encoder 135: 405 12960 org.apache.cassandra.thrift.ColumnOrSuperColumn 136: 324 12960 java.io.BufferedInputStream 137: 530 12720 org.apache.cassandra.utils.EstimatedHistogram 138: 525 12600 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess 139: 388 12416 
java.security.MessageDigest$Delegate 140: 129 12384 org.apache.cassandra.io.sstable.SSTableReader 141: 305 12200 java.util.concurrent.ConcurrentHashMap$Segment 142: 376 12032 org.apache.cassandra.db.columniterator.SSTableSliceIterator 143: 47 10904 [Z 144: 334 10688 org.apache.cassandra.net.OutboundTcpConnection$Entry 145: 330 10560 java.lang.ref.WeakReference 146: 435 10440 java.lang.ThreadLocal$ThreadLocalMap 147: 426 10224 java.util.BitSet 148: 314 10048 org.apache.cassandra.net.MessageDeliveryTask ... [removed anything taking less than 10K] Total 72302201 2937460496 , 3.1156640 secs] 15775.902: [CMSCMS: Large block 0x0000000714aace48 : 2751863K->2568755K(6340608K), 12.0184460 secs]15787.921: [Class Histogram (after full gc): num #instances #bytes class name ---------------------------------------------- 1: 8644126 451299632 [B 2: 8434549 404858352 java.nio.HeapByteBuffer 3: 7759859 310394360 java.math.BigInteger 4: 10686 293655512 [J 5: 7763265 248560200 [I 6: 8630461 207131064 java.lang.Long 7: 7759615 186230760 org.apache.cassandra.db.DecoratedKey 8: 7759483 124151728 org.apache.cassandra.dht.BigIntegerToken 9: 1825 64173960 [Ljava.lang.Object; 10: 1096613 35091616 java.util.concurrent.ConcurrentHashMap$HashEntry 11: 1092266 34952512 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 12: 1092266 26214384 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue 13: 1092266 26214384 org.apache.cassandra.cache.KeyCacheKey 14: 532463 25558224 edu.stanford.ppl.concurrent.SnapTreeMap$Node 15: 221414 21255744 edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch 16: 411588 16463520 org.apache.cassandra.db.ExpiringColumn 17: 341705 16401840 edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder 18: 305 8691760 [Ljava.util.concurrent.ConcurrentHashMap$HashEntry; 19: 442919 7086704 java.util.concurrent.atomic.AtomicReference 20: 221414 7085248 edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch 21: 221414 7085248 edu.stanford.ppl.concurrent.SnapTreeMap 22: 221515 5316360 java.util.concurrent.ConcurrentSkipListMap$Node 23: 221418 5314032 org.apache.cassandra.db.ColumnFamily 24: 221418 5314032 org.apache.cassandra.db.ISortedColumns$DeletionInfo 25: 221414 5313936 org.apache.cassandra.db.AtomicSortedColumns$Holder 26: 35593 5023912 27: 35593 4850696 28: 3382 3862240 29: 116355 3723360 org.apache.cassandra.db.DeletedColumn 30: 221414 3542624 org.apache.cassandra.db.AtomicSortedColumns 31: 221414 3542624 edu.stanford.ppl.concurrent.SnapTreeMap$COWMgr 32: 111752 2682048 java.util.concurrent.ConcurrentSkipListMap$Index 33: 3382 2433816 34: 2971 2279488 35: 3533 1462144 36: 11281 891344 [C 37: 3710 443904 java.lang.Class 38: 12509 400288 java.lang.String 39: 23066 369056 java.lang.Object 40: 5742 334480 [S 41: 5417 275792 [[I 42: 8620 206880 java.lang.Double 43: 8596 206304 java.util.concurrent.LinkedBlockingDeque$Node 44: 311 179136 45: 1631 139152 [Ljava.util.HashMap$Entry; 46: 4006 128192 org.apache.cassandra.db.Column 47: 3630 116160 java.util.HashMap$Entry 48: 1066 85280 java.lang.reflect.Method 49: 1310 62880 java.util.HashMap 50: 509 57008 java.net.SocksSocketImpl 51: 1178 47120 java.lang.ref.SoftReference 52: 228 40752 [[J 53: 411 29592 java.lang.reflect.Constructor 54: 703 28120 java.lang.ref.Finalizer 55: 367 26424 java.lang.reflect.Field 56: 248 25792 java.lang.Thread 57: 804 25728 java.lang.ThreadLocal$ThreadLocalMap$Entry 58: 554 22160 java.util.LinkedHashMap$Entry 59: 431 20688 java.net.SocketInputStream 60: 514 20560 
org.apache.cassandra.db.CounterColumn 61: 835 20040 java.util.concurrent.atomic.AtomicLong 62: 912 17768 [Ljava.lang.Class; 63: 546 17472 java.util.concurrent.locks.ReentrantLock$NonfairSync 64: 524 17272 [Ljava.lang.String; 65: 535 17120 java.net.Inet4Address 66: 1068 17088 java.lang.Integer 67: 213 17040 [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry; 68: 708 16992 java.util.ArrayList 69: 1043 16688 java.util.concurrent.atomic.AtomicInteger 70: 684 16416 java.io.FileDescriptor 71: 506 16192 java.util.Hashtable$Entry 72: 504 16128 java.net.Socket 73: 184 15296 [Ljava.util.Hashtable$Entry; 74: 264 14784 org.apache.cassandra.thrift.TBinaryProtocol 75: 305 12200 java.util.concurrent.ConcurrentHashMap$Segment 76: 506 12144 org.apache.cassandra.utils.EstimatedHistogram 77: 201 11256 org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT 78: 117 11232 org.apache.cassandra.io.sstable.SSTableReader 79: 229 10992 java.util.concurrent.ThreadPoolExecutor$Worker 80: 47 10904 [Z 81: 320 10240 java.lang.ref.WeakReference 82: 319 10208 java.util.Vector 83: 181 10136 sun.security.provider.MD5 ... [removed anything taking up less than 10K] Total 65302653 2582483384 , 2.5709140 secs] 2951753K->2568755K(8183808K), [CMS Perm : 21044K->21000K(35192K)]After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 482780727 Max Chunk Size: 482780727 Number of Blocks: 1 Av. Block Size: 482780727 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 17.7063220 secs] [Times: user=16.57 sys=0.05, real=17.70 secs] Heap after GC invocations=1233 (full 21): par new generation total 1843200K, used 0K [0x00000005fae00000, 0x0000000677e00000, 0x0000000677e00000) eden space 1638400K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000065ee00000) from space 204800K, 0% used [0x000000065ee00000, 0x000000065ee00000, 0x000000066b600000) to space 204800K, 0% used [0x000000066b600000, 0x000000066b600000, 0x0000000677e00000) concurrent mark-sweep generation total 6340608K, used 2568755K [0x0000000677e00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 35192K, used 21000K [0x00000007fae00000, 0x00000007fd05e000, 0x0000000800000000) } Total time for which application threads were stopped: 17.7081220 seconds
First, an extra tip: consider also adding the following to your GC settings. They will dump a class histogram on full GCs, which you can use to figure out what is occupying the heap so badly:
JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramBeforeFullGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramAfterFullGC"
Also look in your system.log for messages from MemoryMeter.java. Those are usually emitted during situations like this and can provide more clues.
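A quick way to scan for those messages, plus the long-GC warnings Cassandra itself logs via GCInspector (this assumes the default package location of system.log; adjust the path to your install):
grep -n "MemoryMeter" /var/log/cassandra/system.log | tail -20
grep -n "GCInspector" /var/log/cassandra/system.log | tail -20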
From the messages in your GC log, it appears you are hitting promotion failures:
2119968.198: [ParNew (0: promotion failure size = 131074) (1: promotion failure size = 131074) (2: promotion failure size = 131074) (4: promotion failure size = 131074) (7: promotion failure size = 131074) (promotion failed) Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 34318480 bytes, 34318480 total
This means that you have objects living in your heap for too long, and your heap gets fragmented, so there is no longer enough contiguous room to promote objects from the new generation into the survivor or old generation. This is not normal for the strong hardware you have, but it did happen to me even on a 16-core machine with 70 GB of RAM, so here are a few things to check, one at a time, to identify the problem. If you had the histogram I mentioned above we could reach conclusions faster, but here is the list to try:
1. If you have enabled multithreaded compaction, disable it (see the sketch after the next point);
2. How is your CPU doing? If utilization is high, reduce concurrent_compactors, which defaults to 8 on your system (one per core), to 4;
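For points 1 and 2, a sketch of how to check compaction activity and CPU load first, and the cassandra.yaml lines involved (the yaml path depends on your install, and changing those settings requires a restart):
nodetool -h localhost compactionstats   # pending and active compactions
top -bn1 | head -15                     # overall CPU utilization
# in cassandra.yaml:
#   multithreaded_compaction: false
#   concurrent_compactors: 4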
3. Cassandra 1.1 has a bug in its key cache size calculation. It uses a hardcoded size of 48 bytes per entry to estimate the heap space used by the key cache, so if your application produces keys longer than that, the key cache's actual heap usage will be significantly higher than what you set in cassandra.yaml. That creates a lot of heap pressure, especially from data that sits on the heap for a long time, causing promotion failures:
References: https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob;f=src/java/org/apache/cassandra/service/CacheService.java;hb=02672936#l102
https://issues.apache.org/jira/browse/CASSANDRA-4315
You can examine your key cache usage with the nodetool info command or the corresponding JMX MBean. If it is usually full, that could be a clue. Unfortunately the fix is not available in 1.1, so if you have this issue you should upgrade to 1.2, or, to work around it without upgrading, significantly lower the key cache size in cassandra.yaml;
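For example (a sketch; the exact output formatting varies between minor versions), compare the reported key cache size against its capacity, and lower the capacity if it is pinned at the limit:
nodetool -h localhost info | grep -i "key cache"
# if it is always at capacity, lower it in cassandra.yaml, e.g.:
#   key_cache_size_in_mb: 50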
4. Are you using the row cache? Try turning it off.
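In Cassandra 1.1 the row cache is a global setting; a sketch of checking its current usage and disabling it (cassandra.yaml change, restart required):
nodetool -h localhost info | grep -i "row cache"
# in cassandra.yaml:
#   row_cache_size_in_mb: 0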
5. A wide row, or what I usually call the "row of death", is being read. If your application produces wide rows (one key with lots of columns) and tries to read a large slice of such a row (>100 columns), then, depending on the size of the data inside those columns, chances are you will bring your node to a halt like this with long GCs. Consult your developers and make sure you are reading small slices at a time from wide rows;
You can use the nodetool cfstats command to see the maximum row size of your column families. If you have a CF whose maximum row size is a few hundred MB or more, that could be a clue;
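A quick way to pull just those numbers out of the cfstats output (in this Cassandra line the relevant label should read roughly "Compacted row maximum size", reported in bytes):
nodetool -h localhost cfstats | grep -E "Column Family:|Compacted row maximum size"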
6. Is your application both read and write heavy? If you have a lot of writes, it could be memtable pressure. Try reducing the total memtable size in cassandra.yaml.
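If writes look like the culprit, the relevant knob in this Cassandra line is the global memtable space, which by default is roughly one third of the heap (a sketch; lowering it trades more frequent flushes for less heap pressure):
# in cassandra.yaml:
#   memtable_total_space_in_mb: 1024
nodetool -h localhost tpstats | grep -i flush   # watch for a growing FlushWriter backlog afterwards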
7. For read/write heavy loads, chances are the default tenuring threshold in your GC settings is too low, causing premature promotions. This could very likely be a symptom of your problem. Try increasing -XX:MaxTenuringThreshold to a larger value such as 32 to prevent premature promotions, so that ParNew collects most of the garbage in the young generation.
You can observe this tenuring problem and the promotion failures by running the jstat command and watching how the survivor spaces in the heap are utilized. I bet they are barely utilized and objects go straight from eden to the old generation. ;)
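A sketch of both pieces -- raising the tenuring threshold in cassandra-env.sh and then watching survivor-space utilization with jstat (replace <pid> with the Cassandra process id):
# cassandra-env.sh: change the existing -XX:MaxTenuringThreshold=1 to something higher
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=32"
# sample GC utilization every second; S0/S1 are the survivor spaces, E is eden, O is old gen
jstat -gcutil <pid> 1000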
8. If all of the above fail, chances are your heap is simply not big enough for your workload and you have to adjust its size. This is less likely; consider it your last resort and do it with caution.
8A: Increase your new generation (HEAP_NEWSIZE). Try increasing it to 2 GB out of your 8 GB heap, and also increase the tenuring threshold as above;
8B: If the above improves the situation but does not eliminate the problem completely, consider increasing MAX_HEAP_SIZE 2 GB at a time. Increase the new generation along with it, making sure it does not exceed 1/4 of the max heap.
Continue up to 16 GB, but no further than that, since you only have 24 GB of total memory. (See the sketch below.)
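In cassandra-env.sh these knobs are MAX_HEAP_SIZE and HEAP_NEWSIZE. A sketch of the 8A/8B progression (the values are examples only; keep HEAP_NEWSIZE at or below one quarter of MAX_HEAP_SIZE):
# Step 8A: keep the 8 GB heap, grow only the new generation
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2G"
# Step 8B (only if still needed): grow both, 2 GB at a time, up to 16 GB total
# MAX_HEAP_SIZE="10G"
# HEAP_NEWSIZE="2500M"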
About myself: I had all of the above problems and eliminated them one at a time. Currently I run my nodes with a 24 GB heap, a 6 GB new generation, and a tenuring threshold of 32. The longest GC pauses I see are <250 ms, 1-2 times a day during heavy peak hours.
Hope this helps you.
Here are a few articles about GC:
http://fedora.fiz-karlsruhe.de/docs/Wiki.jsp?page=Java%20Heap%20%26%20GC%20Tuning
http://blog.ragozin.info/2011/10/java-cg-hotspots-cms-and-heap.html
http://java.dzone.com/articles/how-tame-java-gc-pauses
http://grokbase.com/t/cassandra/user/113qf50x4r/parnew-promotion-failed