I have a 5-node Cassandra cluster with ~650 GB of data on each node and a replication factor of 3. I have recently started seeing the following error in /var/log/cassandra/system.log.
INFO [ReadStage-5] 2017-10-17 17:06:07,887 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB
I have attempted to increase file_cache_size_in_mb, but the same error reappears before long. I have tried going as high as 2 GB for this parameter, but to no avail.
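For reference, this setting lives in cassandra.yaml on each node and only takes effect after a node restart. The snippet below simply illustrates the 2 GB attempt described above (it is not a recommendation, and as noted it did not make the warning go away):

```yaml
# cassandra.yaml (per node) -- illustrative only; mirrors the 2 GB attempt
# mentioned in the question, which did not resolve the issue.
file_cache_size_in_mb: 2048
```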
When the error happens, CPU utilisation soars and read latencies become terribly erratic. I see this surge roughly every half hour. Note the timestamps in the log lines below.
INFO [ReadStage-5] 2017-10-17 17:06:07,887 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-36] 2017-10-17 17:36:09,807 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-15] 2017-10-17 18:05:56,003 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-28] 2017-10-17 18:36:01,177 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB
Two of my tables are partitioned by hour, and the partitions are large. Here is their output from nodetool tablestats:
- Read Count: 4693453
- Read Latency: 0.36752741680805157 ms.
- Write Count: 561026
- Write Latency: 0.03742310516803143 ms.
- Pending Flushes: 0
- Table: raw_data
- SSTable count: 55
- Space used (live): 594395754275
- Space used (total): 594395754275
- Space used by snapshots (total): 0
- Off heap memory used (total): 360753372
- SSTable Compression Ratio: 0.20022598072758296
- Number of keys (estimate): 45163
- Memtable cell count: 90441
- Memtable data size: 685647925
- Memtable off heap memory used: 0
- Memtable switch count: 1
- Local read count: 0
- Local read latency: NaN ms
- Local write count: 126710
- Local write latency: 0.096 ms
- Pending flushes: 0
- Percent repaired: 52.99
- Bloom filter false positives: 167775
- Bloom filter false ratio: 0.16152
- Bloom filter space used: 264448
- Bloom filter off heap memory used: 264008
- Index summary off heap memory used: 31060
- Compression metadata off heap memory used: 360458304
- Compacted partition minimum bytes: 51
- **Compacted partition maximum bytes: 3449259151**
- Compacted partition mean bytes: 16642499
- Average live cells per slice (last five minutes): 1.0005435888450147
- Maximum live cells per slice (last five minutes): 42
- Average tombstones per slice (last five minutes): 1.0
- Maximum tombstones per slice (last five minutes): 1
- Dropped Mutations: 0
- Read Count: 4712814
- Read Latency: 0.3356051004771247 ms.
- Write Count: 643718
- Write Latency: 0.04168356951335834 ms.
- Pending Flushes: 0
- Table: customer_profile_history
- SSTable count: 20
- Space used (live): 9423364484
- Space used (total): 9423364484
- Space used by snapshots (total): 0
- Off heap memory used (total): 6560008
- SSTable Compression Ratio: 0.1744084338623116
- Number of keys (estimate): 69
- Memtable cell count: 35242
- Memtable data size: 789595302
- Memtable off heap memory used: 0
- Memtable switch count: 1
- Local read count: 2307
- Local read latency: NaN ms
- Local write count: 51772
- Local write latency: 0.076 ms
- Pending flushes: 0
- Percent repaired: 0.0
- Bloom filter false positives: 0
- Bloom filter false ratio: 0.00000
- Bloom filter space used: 384
- Bloom filter off heap memory used: 224
- Index summary off heap memory used: 400
- Compression metadata off heap memory used: 6559384
- Compacted partition minimum bytes: 20502
- **Compacted partition maximum bytes: 4139110981**
- Compacted partition mean bytes: 708736810
- Average live cells per slice (last five minutes): NaN
- Maximum live cells per slice (last five minutes): 0
- Average tombstones per slice (last five minutes): NaN
- Maximum tombstones per slice (last five minutes): 0
- Dropped Mutations: 0
Here is the nodetool cfhistograms output for both tables:
cdsdb/raw_data histograms

| Percentile | SSTables | Write Latency (micros) | Read Latency (micros) | Partition Size (bytes) | Cell Count |
|------------|----------|------------------------|-----------------------|------------------------|------------|
| 50%        | 0.00     | 61.21                  | 0.00                  | 1955666                | 642        |
| 75%        | 1.00     | 73.46                  | 0.00                  | 17436917               | 4768       |
| 95%        | 3.00     | 105.78                 | 0.00                  | 107964792              | 24601      |
| 98%        | 8.00     | 219.34                 | 0.00                  | 186563160              | 42510      |
| 99%        | 12.00    | 315.85                 | 0.00                  | 268650950              | 61214      |
| Min        | 0.00     | 6.87                   | 0.00                  | 51                     | 0          |
| Max        | 14.00    | 1358.10                | 0.00                  | 3449259151             | 7007506    |

cdsdb/customer_profile_history histograms

| Percentile | SSTables | Write Latency (micros) | Read Latency (micros) | Partition Size (bytes) | Cell Count |
|------------|----------|------------------------|-----------------------|------------------------|------------|
| 50%        | 0.00     | 73.46                  | 0.00                  | 223875792              | 61214      |
| 75%        | 0.00     | 88.15                  | 0.00                  | 668489532              | 182785     |
| 95%        | 0.00     | 152.32                 | 0.00                  | 1996099046             | 654949     |
| 98%        | 0.00     | 785.94                 | 0.00                  | 3449259151             | 1358102    |
| 99%        | 0.00     | 943.13                 | 0.00                  | 3449259151             | 1358102    |
| Min        | 0.00     | 24.60                  | 0.00                  | 5723                   | 4          |
| Max        | 0.00     | 5839.59                | 0.00                  | 5960319812             | 1955666    |
Could you please suggest a way forward to mitigate this issue?
Based on the cfhistograms output posted, the partitions are enormous.
The 95th percentile partition size of the raw_data table is 107 MB, with a maximum of 3.44 GB. The 95th percentile of customer_profile_history is 1.99 GB, with a maximum of 5.96 GB.
This is directly related to the problem you notice every half hour, as these huge partitions are flushed to SSTables. The data model has to change: given the partition sizes above, it is better to use a partition interval of "minute" instead of "hour", so a 2 GB hourly partition would shrink to roughly a 33 MB per-minute partition (see the sketch below).
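The question does not include the table definitions, so the following CQL is only a sketch of what a minute-bucketed layout could look like; the table name raw_data_by_minute and all column names (source_id, minute_bucket, event_time, payload) are hypothetical placeholders for whatever the real schema uses.

```cql
-- Hypothetical sketch only: the real cdsdb.raw_data schema is not shown in the
-- question, so every name below is a placeholder. The point is to move the time
-- bucket in the partition key from an hour to a minute, so each partition holds
-- roughly 1/60th of the data.
CREATE TABLE cdsdb.raw_data_by_minute (
    source_id     text,        -- whatever currently identifies the data source
    minute_bucket timestamp,   -- event time truncated to the minute
    event_time    timestamp,   -- full event timestamp, used for ordering
    payload       blob,        -- the actual record
    PRIMARY KEY ((source_id, minute_bucket), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- Reads then target one minute bucket (or iterate over a few) instead of a
-- whole hour at once:
-- SELECT * FROM cdsdb.raw_data_by_minute
--   WHERE source_id = 'abc' AND minute_bucket = '2017-10-17 17:06:00';
```

The trade-off is that a query spanning an hour now has to iterate over up to 60 buckets on the client side, but that is exactly what keeps each partition small.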
The general recommendation is to keep partitions to roughly 100 MB at most. Although you can theoretically store much more than 100 MB in a partition, performance will suffer: every read of such a partition pulls over 100 MB of data across the wire. In your case it is over 2 GB, with all the performance implications that come with that.