
NoSpamLogger.java: "Maximum memory usage reached" in Cassandra

Question:

I have a 5-node Cassandra cluster with ~650 GB of data on each node and a replication factor of 3. I have recently started seeing the following error in /var/log/cassandra/system.log:

INFO [ReadStage-5] 2017-10-17 17:06:07,887 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB

I have attempted to increase file_cache_size_in_mb, but sooner or later the same error reappears. I have gone as high as 2 GB for this parameter, to no avail.
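
For reference, the limit quoted in the log line is the off-heap buffer pool sized by file_cache_size_in_mb in cassandra.yaml; the snippet below shows the 2 GB value I tried (shown for illustration, not as a recommendation):

```yaml
# cassandra.yaml -- sizes the off-heap buffer pool (chunk cache).
# Cassandra logs "Maximum memory usage reached" when this pool is exhausted.
file_cache_size_in_mb: 2048
```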

When the error happens, CPU utilisation soars and the read latencies become terribly erratic. The surge shows up approximately every half hour; note the timestamps in the log lines below.

INFO [ReadStage-5] 2017-10-17 17:06:07,887 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-36] 2017-10-17 17:36:09,807 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-15] 2017-10-17 18:05:56,003 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-28] 2017-10-17 18:36:01,177 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB

Two of my tables are partitioned by hour, and the partitions are large. Here is their output from `nodetool tablestats`:

```
Read Count: 4693453
Read Latency: 0.36752741680805157 ms.
Write Count: 561026
Write Latency: 0.03742310516803143 ms.
Pending Flushes: 0
    Table: raw_data
    SSTable count: 55
    Space used (live): 594395754275
    Space used (total): 594395754275
    Space used by snapshots (total): 0
    Off heap memory used (total): 360753372
    SSTable Compression Ratio: 0.20022598072758296
    Number of keys (estimate): 45163
    Memtable cell count: 90441
    Memtable data size: 685647925
    Memtable off heap memory used: 0
    Memtable switch count: 1
    Local read count: 0
    Local read latency: NaN ms
    Local write count: 126710
    Local write latency: 0.096 ms
    Pending flushes: 0
    Percent repaired: 52.99
    Bloom filter false positives: 167775
    Bloom filter false ratio: 0.16152
    Bloom filter space used: 264448
    Bloom filter off heap memory used: 264008
    Index summary off heap memory used: 31060
    Compression metadata off heap memory used: 360458304
    Compacted partition minimum bytes: 51
    **Compacted partition maximum bytes: 3449259151**
    Compacted partition mean bytes: 16642499
    Average live cells per slice (last five minutes): 1.0005435888450147
    Maximum live cells per slice (last five minutes): 42
    Average tombstones per slice (last five minutes): 1.0
    Maximum tombstones per slice (last five minutes): 1
    Dropped Mutations: 0

Read Count: 4712814
Read Latency: 0.3356051004771247 ms.
Write Count: 643718
Write Latency: 0.04168356951335834 ms.
Pending Flushes: 0
    Table: customer_profile_history
    SSTable count: 20
    Space used (live): 9423364484
    Space used (total): 9423364484
    Space used by snapshots (total): 0
    Off heap memory used (total): 6560008
    SSTable Compression Ratio: 0.1744084338623116
    Number of keys (estimate): 69
    Memtable cell count: 35242
    Memtable data size: 789595302
    Memtable off heap memory used: 0
    Memtable switch count: 1
    Local read count: 2307
    Local read latency: NaN ms
    Local write count: 51772
    Local write latency: 0.076 ms
    Pending flushes: 0
    Percent repaired: 0.0
    Bloom filter false positives: 0
    Bloom filter false ratio: 0.00000
    Bloom filter space used: 384
    Bloom filter off heap memory used: 224
    Index summary off heap memory used: 400
    Compression metadata off heap memory used: 6559384
    Compacted partition minimum bytes: 20502
    **Compacted partition maximum bytes: 4139110981**
    Compacted partition mean bytes: 708736810
    Average live cells per slice (last five minutes): NaN
    Maximum live cells per slice (last five minutes): 0
    Average tombstones per slice (last five minutes): NaN
    Maximum tombstones per slice (last five minutes): 0
    Dropped Mutations: 0
```

Here is the `nodetool tablehistograms` output:

```
cdsdb/raw_data histograms
Percentile   SSTables   Write Latency   Read Latency   Partition Size   Cell Count
                              (micros)       (micros)          (bytes)
50%              0.00           61.21           0.00          1955666          642
75%              1.00           73.46           0.00         17436917         4768
95%              3.00          105.78           0.00        107964792        24601
98%              8.00          219.34           0.00        186563160        42510
99%             12.00          315.85           0.00        268650950        61214
Min              0.00            6.87           0.00               51            0
Max             14.00         1358.10           0.00       3449259151      7007506

cdsdb/customer_profile_history histograms
Percentile   SSTables   Write Latency   Read Latency   Partition Size   Cell Count
                              (micros)       (micros)          (bytes)
50%              0.00           73.46           0.00        223875792        61214
75%              0.00           88.15           0.00        668489532       182785
95%              0.00          152.32           0.00       1996099046       654949
98%              0.00          785.94           0.00       3449259151      1358102
99%              0.00          943.13           0.00       3449259151      1358102
Min              0.00           24.60           0.00             5723            4
Max              0.00         5839.59           0.00       5960319812      1955666
```

Could you please suggest a way forward to mitigate this issue?

Answer 1:

Based on the cfhistograms output posted, the partitions are enormous.

The 95th percentile of the raw_data table has a partition size of 107 MB, with a maximum of 3.44 GB. The 95th percentile of customer_profile_history has a partition size of 1.99 GB, with a maximum of 5.96 GB.

This correlates directly with the problem you see every half hour, as these huge partitions are written to the SSTables. The data model has to change: given the partition sizes above, it is better to use a partition interval of "minute" instead of "hour", so that a 2 GB hourly partition shrinks to roughly 33 MB per minute bucket (2 GB / 60 ≈ 33 MB).
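
As a minimal sketch of that change (the actual schema is not shown in the question, so the table and column names below are hypothetical), the time bucket in the partition key moves from hour to minute granularity:

```sql
-- Hypothetical hourly layout: one partition per (source, hour) can grow
-- to multiple GB, as the tablestats output above shows.
--   PRIMARY KEY ((source_id, hour_bucket), event_time)

-- Minute-granularity alternative: ~60x more partitions, each roughly
-- 1/60th the size, so a 2 GB hourly partition becomes about 33 MB.
CREATE TABLE cdsdb.raw_data_by_minute (
    source_id     text,
    minute_bucket timestamp,   -- event_time truncated to the minute
    event_time    timestamp,
    payload       blob,
    PRIMARY KEY ((source_id, minute_bucket), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
```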

The recommendation is to keep partition sizes close to a maximum of 100 MB. Though Cassandra can theoretically store more than 100 MB per partition, performance suffers: remember that every read of such a partition moves over 100 MB of data across the wire. In your case it is over 2 GB, with all the performance implications that brings.
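
The trade-off of the finer bucket is read fan-out: a query that previously hit one hourly partition now has to touch up to 60 minute partitions. With the hypothetical schema sketched above, the client issues one query per bucket, typically in parallel:

```sql
-- One bucket per query; a one-hour range means up to 60 of these.
SELECT event_time, payload
FROM cdsdb.raw_data_by_minute
WHERE source_id = 'sensor-42'
  AND minute_bucket = '2017-10-17 17:06:00+0000';
```

Many small parallel partition reads are generally far cheaper for Cassandra than a single multi-GB partition read, which is why the bucketing change is worth the extra queries.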
