Apache Celeborn Installation and Usage Guide

1. Download the installation package

https://celeborn.apache.org/download/ 

When I tested 0.4.0, I ran into https://github.com/apache/incubator-celeborn/issues/835; the steps below use 0.3.2.

2. Extract the archive

tar -xzvf apache-celeborn-0.3.2-incubating-bin.tgz
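
Later steps refer to $CELEBORN_HOME, which the steps above do not set. The following is a minimal sketch for deriving it from the archive name, assuming the archive unpacks into a directory named after itself minus the .tgz suffix (consistent with the /root/apache-celeborn-0.3.2-incubating-bin path used in step 4). The helper name celeborn_home_for is hypothetical.

```shell
# Derive the install directory from the archive path.
# Assumption: the archive unpacks into a directory named after itself,
# minus the .tgz suffix, next to the archive.
celeborn_home_for() {
  archive=$1
  dir=$(dirname "$archive")
  base=$(basename "$archive" .tgz)
  # Resolve to an absolute path so the result works from any directory.
  (cd "$dir" && printf '%s/%s\n' "$(pwd)" "$base")
}

# Usage, after extracting in the current directory:
#   export CELEBORN_HOME=$(celeborn_home_for apache-celeborn-0.3.2-incubating-bin.tgz)
```

Exporting the variable in the shell profile of each node keeps the later commands copy-pasteable.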

3. Edit the configuration files

In the conf directory of the extracted package, copy the templates:

  cp celeborn-env.sh.template  celeborn-env.sh
  cp log4j2.xml.template  log4j2.xml
  cp celeborn-defaults.conf.template  celeborn-defaults.conf
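
If the steps are re-run on a node that already has edited configuration files, a plain cp would overwrite them. A small sketch of a safer copy, assuming the three templates sit in one conf directory; the function name copy_templates is hypothetical.

```shell
# Copy the three templates to their final names, skipping any target
# that already exists so local edits are not overwritten.
copy_templates() {
  conf_dir=$1
  for name in celeborn-env.sh log4j2.xml celeborn-defaults.conf; do
    target="$conf_dir/$name"
    if [ -e "$target" ]; then
      echo "skip    $target (already exists)"
    elif cp "$target.template" "$target"; then
      echo "created $target"
    else
      echo "failed  $target (template missing?)" >&2
    fi
  done
}

# Usage:
#   copy_templates "$CELEBORN_HOME/conf"
```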

3.1 Edit celeborn-env.sh

  CELEBORN_MASTER_MEMORY=2g
  CELEBORN_WORKER_MEMORY=2g
  CELEBORN_WORKER_OFFHEAP_MEMORY=4g

3.2 Edit celeborn-defaults.conf

  # used by client and worker to connect to master
  celeborn.master.endpoints 10.67.78.xx:9097
  # used by master to bootstrap
  celeborn.master.host 10.67.78.xx
  celeborn.master.port 9097
  celeborn.metrics.enabled true
  celeborn.worker.flusher.buffer.size 256k
  # If Celeborn workers have both local disks and HDFS, add the following configs.
  # If Celeborn workers have local disks, use the following config.
  # Disk type is HDD by default.
  #celeborn.worker.storage.dirs /mnt/disk1:disktype=SSD,/mnt/disk2:disktype=SSD
  # If Celeborn workers don't have local disks, you can use HDFS.
  # Do not set `celeborn.worker.storage.dirs`; use the following configs instead.
  celeborn.storage.activeTypes HDFS
  celeborn.worker.sortPartition.threads 64
  celeborn.worker.commitFiles.timeout 240s
  celeborn.worker.commitFiles.threads 128
  celeborn.master.slot.assign.policy roundrobin
  celeborn.rpc.askTimeout 240s
  celeborn.worker.flusher.hdfs.buffer.size 4m
  celeborn.storage.hdfs.dir hdfs://10.67.78.xx:8020/celeborn
  # If your hosts use disk RAID or LVM, set celeborn.worker.monitor.disk.enabled to false
  celeborn.worker.replicate.fastFail.duration 240s
  celeborn.worker.monitor.disk.enabled false
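
With a single master, as configured above, celeborn.master.endpoints should list the same host and port as celeborn.master.host and celeborn.master.port. The sketch below checks that before the file is copied to other nodes. It assumes whitespace-separated `key value` lines as in the listing, lets the last occurrence of a key win, and does not cover HA setups; conf_get and check_master_endpoint are hypothetical helper names.

```shell
# Print the value of a key from a "key value" style config file.
# Commented-out lines never match because their first field starts with '#'.
conf_get() {
  awk -v key="$2" '$1 == key { print $2 }' "$1" | tail -n 1
}

# Succeed when host:port appears in the comma-separated endpoint list.
check_master_endpoint() {
  conf=$1
  endpoints=$(conf_get "$conf" celeborn.master.endpoints)
  host=$(conf_get "$conf" celeborn.master.host)
  port=$(conf_get "$conf" celeborn.master.port)
  case ",$endpoints," in
    *",$host:$port,"*)
      echo "ok: $host:$port is listed in celeborn.master.endpoints" ;;
    *)
      echo "mismatch: $host:$port not found in '$endpoints'"
      return 1 ;;
  esac
}

# Usage:
#   check_master_endpoint "$CELEBORN_HOME/conf/celeborn-defaults.conf"
```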

4. Copy to the other nodes

  scp -r /root/apache-celeborn-0.3.2-incubating-bin 10.67.78.xx1:/root/
  scp -r /root/apache-celeborn-0.3.2-incubating-bin 10.67.78.xx2:/root/
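
With more than a couple of workers, the copy can be written as a loop. A sketch, assuming passwordless SSH to each host and the same parent directory on every node; the distribute function and the DRY_RUN variable are hypothetical names introduced here for illustration.

```shell
# Copy the configured package directory to each host given as an argument.
# With DRY_RUN=1 the scp commands are printed instead of executed.
distribute() {
  src=$1; shift
  dest_dir=$(dirname "$src")
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "scp -r $src $host:$dest_dir/"
    else
      scp -r "$src" "$host:$dest_dir/" || echo "copy to $host failed" >&2
    fi
  done
}

# Usage (preview first, then run without DRY_RUN):
#   DRY_RUN=1 distribute /root/apache-celeborn-0.3.2-incubating-bin 10.67.78.xx1 10.67.78.xx2
```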

Because the master is already set in the configuration file, all that remains is to start the master and the workers.

5. Start the master and workers

  cd $CELEBORN_HOME
  # on the master node
  ./sbin/start-master.sh
  # on each worker node
  ./sbin/start-worker.sh celeborn://<Master IP>:<Master Port>

Then check the master log to confirm that the workers have registered.
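
The log check can be scripted with grep. This is only a sketch: the default pattern "Registered worker" is an assumption about the master's log wording, and the log file itself (by default somewhere under $CELEBORN_HOME/logs, also an assumption) has to be located by hand. Adjust both to what your installation actually writes. The function name count_registered_workers is hypothetical.

```shell
# Count the lines in a log file that match a registration pattern.
# $1: path to the master log; $2: optional pattern (assumed default below).
count_registered_workers() {
  log_file=$1
  pattern=${2:-"Registered worker"}
  grep -c -- "$pattern" "$log_file"
}

# Usage:
#   count_registered_workers /path/to/master.log
```

A count lower than the number of started workers suggests that some worker could not reach the master endpoint.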

 

6. Use Celeborn from the Spark client

Copy $CELEBORN_HOME/spark/*.jar to $SPARK_HOME/jars/
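
The copy step can be sketched as a small function that fails loudly when no client jar is found, for example when $CELEBORN_HOME points at the wrong directory. The name install_client_jars is hypothetical.

```shell
# Copy the Celeborn Spark client jar(s) into Spark's jars directory.
# $1: Celeborn install directory; $2: Spark install directory.
install_client_jars() {
  celeborn_home=$1
  spark_home=$2
  # Expand the glob into the function's positional parameters.
  set -- "$celeborn_home"/spark/*.jar
  [ -e "$1" ] || { echo "no jars under $celeborn_home/spark" >&2; return 1; }
  cp "$@" "$spark_home/jars/" && echo "copied $# jar(s) to $spark_home/jars/"
}

# Usage:
#   install_client_jars "$CELEBORN_HOME" "$SPARK_HOME"
```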

Edit spark-defaults.conf:

  # Shuffle manager class name changed in 0.3.0:
  # before 0.3.0: org.apache.spark.shuffle.celeborn.RssShuffleManager
  # since 0.3.0: org.apache.spark.shuffle.celeborn.SparkShuffleManager
  spark.shuffle.manager org.apache.spark.shuffle.celeborn.SparkShuffleManager
  # must use kryo serializer because java serializer does not support relocation
  spark.serializer org.apache.spark.serializer.KryoSerializer
  # celeborn master (replace with your own master endpoints)
  spark.celeborn.master.endpoints clb-1:9097,clb-2:9097,clb-3:9097
  # This is not necessary if your Spark external shuffle service is Spark 3.1 or newer
  spark.shuffle.service.enabled false
  # options: hash, sort
  # Hash shuffle writer uses (partition count) * (celeborn.push.buffer.max.size) * (spark.executor.cores) memory.
  # Sort shuffle writer uses less memory than hash shuffle writer; if your shuffle partition count is large, try the sort shuffle writer.
  spark.celeborn.client.spark.shuffle.writer hash
  # We recommend setting spark.celeborn.client.push.replicate.enabled to true to enable server-side data replication
  # If you have only one worker, this setting must be false
  # If your Celeborn is using HDFS, it's recommended to set this setting to false
  # (the worker config in 3.2 stores data on HDFS, so consider false here)
  spark.celeborn.client.push.replicate.enabled true
  # Support for Spark AQE only tested under Spark 3
  # we recommend setting localShuffleReader to false to get better performance of Celeborn
  spark.sql.adaptive.localShuffleReader.enabled false
  # If Celeborn is using HDFS
  spark.celeborn.storage.hdfs.dir hdfs://<namenode>/celeborn
  # we recommend enabling aqe support to gain better performance
  spark.sql.adaptive.enabled true
  spark.sql.adaptive.skewJoin.enabled true
  # Support Spark Dynamic Resource Allocation
  # Required Spark version >= 3.5.0 (check that your Spark version meets this)
  spark.shuffle.sort.io.plugin.class org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO
  # Required Spark version >= 3.4.0, highly recommended to disable (check that your Spark version meets this)
  spark.dynamicAllocation.shuffleTracking.enabled false
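
The last two settings are gated on the Spark version (3.5.0 and 3.4.0 respectively, per the comments above). A sketch of a version check, assuming a `sort` that supports the -V (version sort) option as in GNU coreutils; version_ge and report_gated_settings are hypothetical helper names, and the version string has to be supplied by hand (for example from `spark-submit --version`).

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Sorts the two versions and checks that the smaller one is $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report which version-gated settings are usable for a given Spark version.
report_gated_settings() {
  v=$1
  if version_ge "$v" 3.5.0; then
    echo "spark.shuffle.sort.io.plugin.class: supported"
  else
    echo "spark.shuffle.sort.io.plugin.class: needs Spark >= 3.5.0"
  fi
  if version_ge "$v" 3.4.0; then
    echo "spark.dynamicAllocation.shuffleTracking.enabled: supported"
  else
    echo "spark.dynamicAllocation.shuffleTracking.enabled: needs Spark >= 3.4.0"
  fi
}

# Usage:
#   report_gated_settings 3.4.1
```

Settings reported as unsupported are better left out of spark-defaults.conf than set on an older Spark.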

7. Start spark-shell

  ./bin/spark-shell

Then run a job that includes a shuffle at the scala> prompt:

  spark.sparkContext.parallelize(1 to 1000, 1000).flatMap(_ => (1 to 100).iterator.map(num => num)).repartition(10).count
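
The job creates 1000 elements, expands each into 100, and repartitions the result, so the count should come back as 100000 after a shuffle through Celeborn. The same check can be run non-interactively. A sketch, assuming spark-shell reads the script from standard input and exits at end of input; write_smoke_test and the script path are hypothetical.

```shell
# Write the smoke-test job to a file so it can be fed to spark-shell on stdin.
# The quoted 'EOF' keeps the shell from expanding $n inside the Scala code.
write_smoke_test() {
  cat > "$1" <<'EOF'
val n = spark.sparkContext.parallelize(1 to 1000, 1000).flatMap(_ => (1 to 100).iterator.map(num => num)).repartition(10).count
println(s"count = $n")
EOF
}

# Usage:
#   write_smoke_test /tmp/celeborn_smoke.scala
#   "$SPARK_HOME"/bin/spark-shell < /tmp/celeborn_smoke.scala
```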
