Spark job failure analysis

failed the maximum allowable number of times

A Spark job failed. When it was rerun, the Spark UI showed a large number of failed tasks. After the run finished, the YARN UI showed the following error log:

  User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure:
  ShuffleMapStage 1 (javaRDD at SumDeliveryIndexFactory.java:628) has failed the maximum allowable
  number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException:
  Failed to connect to xxxx/10.136.22.22:34192
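The first thing to pull out of that message is which node the reduce side failed to reach. The helper below is a minimal, hypothetical sketch: it extracts the hostname, IP and port from a "Failed to connect to host/ip:port" fragment. The regular expression is an assumption based only on the message shown above, not on a documented Spark message format.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for the "Failed to connect to <hostname>/<ip>:<port>"
# fragment seen in the FetchFailedException message above.
FETCH_FAIL_RE = re.compile(
    r"Failed to connect to (?P<host>[^/\s]*)/(?P<ip>[\d.]+):(?P<port>\d+)"
)


def find_fetch_failure_target(message: str) -> Optional[Tuple[str, str, int]]:
    """Return (hostname, ip, port) of the unreachable shuffle server, or None."""
    match = FETCH_FAIL_RE.search(message)
    if match is None:
        return None
    return match.group("host"), match.group("ip"), int(match.group("port"))


driver_error = (
    "ShuffleMapStage 1 (javaRDD at SumDeliveryIndexFactory.java:628) has failed "
    "the maximum allowable number of times: 4. Most recent failure reason: "
    "org.apache.spark.shuffle.FetchFailedException: "
    "Failed to connect to xxxx/10.136.22.22:34192"
)
print(find_fetch_failure_target(driver_error))  # ('xxxx', '10.136.22.22', 34192)
```

The IP it returns tells you whose NodeManager log to read next.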

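The error shows that the connection to 10.136.22.22 failed (Failed to connect to xxxx/10.136.22.22:34192). Logging in to 10.136.22.22 and checking the NodeManager log turned up this error: running beyond physical memory limits. Killing container. In other words, the container's physical memory usage exceeded the container's memory limit, so YARN forcibly killed it. The executor in that container was serving shuffle output, so once it was gone the reduce tasks could no longer fetch that data, which surfaces on the driver as FetchFailedException; after the stage had failed 4 times, the job was aborted.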
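Searching a NodeManager log for this condition is easy to script. A minimal sketch, assuming the log is available as lines of text; the function name killed_containers and its regular expression are illustrative, built from the warning message quoted in this post rather than from any Hadoop API.

```python
import re
from typing import Iterable, Iterator

# Hypothetical pattern built from the NodeManager warning quoted in this post.
KILLED_RE = re.compile(
    r"containerID=(?P<cid>container_\w+)\] is running beyond physical memory limits"
)


def killed_containers(log_lines: Iterable[str]) -> Iterator[str]:
    """Yield IDs of containers reported as over their physical-memory limit."""
    for line in log_lines:
        match = KILLED_RE.search(line)
        if match:
            yield match.group("cid")


sample_log = [
    "2017-11-14 11:33:07,273 INFO ContainersMonitorImpl: Memory usage of ProcessTree 236569",
    "2017-11-14 11:33:07,274 WARN ContainersMonitorImpl: Container [pid=236569,"
    "containerID=container_e31_1510205192678_147416_02_000024] is running beyond "
    "physical memory limits. Current usage: 40.7 GB of 40 GB physical memory used; "
    "41.9 GB of 84 GB virtual memory used. Killing container.",
]
print(list(killed_containers(sample_log)))
```

With a real log you would pass an open file object, e.g. killed_containers(open(path)), where path is wherever your cluster writes NodeManager logs.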

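Solution: increase spark.yarn.executor.memoryOverhead by passing it to spark-submit, e.g. --conf spark.yarn.executor.memoryOverhead=4G. This setting is the off-heap headroom that YARN adds on top of the executor heap (spark.executor.memory) when sizing the container, so raising it raises the container's physical memory limit.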
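To see why the overhead matters, it helps to estimate the container size YARN is asked for. A rough sketch, assuming Spark 2.x defaults as we understand them (when the overhead is not set, it is max(384 MiB, 10% of executor memory)) and a hypothetical 1024 MiB scheduler rounding step. The 8 GiB executor size below is made up for illustration; it is not the configuration of the job in this post.

```python
import math
from typing import Optional

# Assumed Spark 2.x defaults: without an explicit memoryOverhead setting,
# overhead = max(384 MiB, 10% of executor memory).
OVERHEAD_FACTOR = 0.10
OVERHEAD_MIN_MIB = 384


def container_request_mib(executor_memory_mib: int,
                          overhead_mib: Optional[int] = None,
                          step_mib: int = 1024) -> int:
    """Estimate the YARN container size (MiB) for one executor.

    step_mib is a hypothetical stand-in for the scheduler's allocation
    increment; requests are rounded up to a multiple of it.
    """
    if overhead_mib is None:
        overhead_mib = max(int(OVERHEAD_FACTOR * executor_memory_mib), OVERHEAD_MIN_MIB)
    requested = executor_memory_mib + overhead_mib
    return math.ceil(requested / step_mib) * step_mib


# Hypothetical 8 GiB executor heap:
print(container_request_mib(8192))        # 9216  (default overhead of 819 MiB)
print(container_request_mib(8192, 4096))  # 12288 (overhead raised to 4 GiB)
```

Raising the overhead enlarges the container, and therefore the limit the NodeManager enforces, without giving the JVM heap any more memory; the extra room goes to off-heap and native allocations.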

NodeManager error log:

  2017-11-14 11:33:07,273 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 236569 for container-id container_e31_1510205192678_147416_02_000024: 40.7 GB of 40 GB physical memory used; 41.9 GB of 84 GB virtual memory used
  2017-11-14 11:33:07,273 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_e31_1510205192678_147416_02_000024 has processes older than 1 iteration running over the configured limit. Limit=42949672960, current usage = 43653300224
  2017-11-14 11:33:07,274 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=236569,containerID=container_e31_1510205192678_147416_02_000024] is running beyond physical memory limits. Current usage: 40.7 GB of 40 GB physical memory used; 41.9 GB of 84 GB virtual memory used. Killing container.
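The raw numbers in the second WARN line confirm the kill. A small sketch that parses them (the regular expression is an assumption based on this one log line) and converts bytes with 1024-based units: Limit=42949672960 is exactly 40 GiB, and the process tree was about 671 MiB over it. The 84 GB virtual-memory ceiling is 40 GB times 2.1, which matches YARN's default yarn.nodemanager.vmem-pmem-ratio of 2.1.

```python
import re
from typing import Tuple

GIB = 1024 ** 3
MIB = 1024 ** 2


def parse_limit_line(line: str) -> Tuple[int, int]:
    """Extract (limit_bytes, usage_bytes) from a 'Limit=..., current usage = ...' message."""
    match = re.search(r"Limit=(\d+), current usage = (\d+)", line)
    if match is None:
        raise ValueError("no Limit/usage pair in line")
    return int(match.group(1)), int(match.group(2))


line = ("Process tree for container: container_e31_1510205192678_147416_02_000024 "
        "has processes older than 1 iteration running over the configured limit. "
        "Limit=42949672960, current usage = 43653300224")
limit, usage = parse_limit_line(line)
print(limit / GIB)            # 40.0
print(round(usage / GIB, 1))  # 40.7
print((usage - limit) / MIB)  # 671.03125 MiB over the limit
# Virtual-memory ceiling from the same log: 84 GB / 40 GB physical.
print(84 / 40)                # 2.1
```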

References:

  1. Memory allocation for Spark executors on YARN
     http://blog.csdn.net/hammertank/article/details/48346285
  2. Fixing "yarn is running beyond physical memory limits"
     http://blog.csdn.net/oaimm/article/details/25298691
  3. A brief introduction to YARN and its memory configuration
     http://blog.chinaunix.net/uid-28311809-id-4383551.html

 

Reposted from: https://my.oschina.net/sniperLi/blog/1574280
