当前位置:   article > 正文

CDH6.3.2之升级spark-3.3.1_cdh6 升级spark版本

cdh6 升级spark版本

一、背景

CDH中Spark默认版本2.4.0,我们对Hive升级到3.1.3版本,由于并未找到对应的 spark-hive 包,于是尝试使用Spark-3.3.1。

spark3.3.1 for CDH6.3.2 包下载链接

二、安装 Spark3-cdh

2.1 备份

cd /opt/cloudera/parcels/CDH/lib
cp -r spark/ spark240.hive211.bak
  • 1
  • 2

2.2 安装

解压

cd /opt/software/spark/
tar -zxvf spark-3.3.1-bin-3.0.0-cdh6.3.2.tgz
  • 1
  • 2

拷贝

cp -r spark-3.3.1-bin-3.0.0-cdh6.3.2 /opt/cloudera/parcels/CDH/lib/spark3
  • 1

2.3 配置文件

拷贝 hive-site.xml

cp /etc/hive/conf/hive-site.xml /opt/cloudera/parcels/CDH/lib/spark3/conf/
  • 1

拷贝spark配置文件

拷贝 spark-env.sh

cp /etc/spark/conf/spark-env.sh  /opt/cloudera/parcels/CDH/lib/spark3/conf/
  • 1

拷贝 classpath.txt

cp /etc/spark/conf/classpath.txt  /opt/cloudera/parcels/CDH/lib/spark3/conf/
  • 1

拷贝 spark-defaults.conf

cp /etc/spark/conf/spark-defaults.conf /opt/cloudera/parcels/CDH/lib/spark3/conf/
  • 1

拷贝 yarn-site.xml

cp -r /etc/spark/conf/yarn-conf/yarn-site.xml /opt/cloudera/parcels/CDH/lib/spark3/conf/
  • 1

2.4 修改配置

vim spark-env.sh

...
SELF="$(cd $(dirname $BASH_SOURCE) && pwd)"
if [ -z "$SPARK_CONF_DIR" ]; then
  export SPARK_CONF_DIR="$SELF"
fi
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7

更改一下 SPARK_HOME

#export SPARK_HOME=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark
export SPARK_HOME=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark3
​
SPARK_PYTHON_PATH=""
if [ -n "$SPARK_PYTHON_PATH" ]; then
  export PYTHONPATH="$PYTHONPATH:$SPARK_PYTHON_PATH"
fi
...

vim spark-defaults.conf
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10

修改spark.yarn.jars路径

注释lineage相关的(暂时)

兼容老的通讯协议

spark.authenticate=false
spark.driver.log.dfsDir=/user/spark/driverLogs
spark.driver.log.persistToDfs.enabled=true
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.enabled=true
spark.io.encryption.enabled=false
spark.network.crypto.enabled=false
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.ui.enabled=true
spark.ui.killEnabled=true


# 
spark.lineage.log.dir=/var/log/spark/lineage
# 
spark.lineage.enabled=true

spark.master=yarn
spark.submit.deployMode=client
spark.eventLog.dir=hdfs://master01:8020/user/spark/applicationHistory
spark.yarn.historyServer.address=http://master02:18088
spark.yarn.jars=local:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark3/jars/*
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.yarn.historyServer.allowTracking=true
spark.yarn.appMasterEnv.MKL_NUM_THREADS=1
spark.executorEnv.MKL_NUM_THREADS=1
spark.yarn.appMasterEnv.OPENBLAS_NUM_THREADS=1
spark.executorEnv.OPENBLAS_NUM_THREADS=1


#spark.extraListeners=com.cloudera.spark.lineage.NavigatorAppListener
#spark.sql.queryExecutionListeners=com.cloudera.spark.lineage.NavigatorQueryListener
spark.shuffle.useOldFetchProtocol=true  #兼容老的通讯协议
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42

三、编辑一个spark-sql

vim /opt/cloudera/parcels/CDH/bin/spark-sql
  • 1
#!/bin/bash  
# Reference: http://stackoverflow.com/questions/59895/can-a-bash-script-tell-what-directory-its-stored-in  
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
SOURCE="${BASH_SOURCE[0]}"  
BIN_DIR="$( dirname "$SOURCE" )"  
while [ -h "$SOURCE" ]  
do  
 SOURCE="$(readlink "$SOURCE")"  
 [[ $SOURCE != /* ]] && SOURCE="$BIN_DIR/$SOURCE"  
 BIN_DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"  
done  
BIN_DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"  
LIB_DIR=$BIN_DIR/../lib  
export HADOOP_HOME=$LIB_DIR/hadoop  
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15

Autodetect JAVA_HOME if not defined

. $LIB_DIR/bigtop-utils/bigtop-detect-javahome  
​
exec $LIB_DIR/spark3/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver "$@"
  • 1
  • 2
  • 3

完成后使用 alternatives 进行环境变量管控

alternatives --install /usr/bin/spark-sql spark-sql /opt/cloudera/parcels/CDH/bin/spark-sql 1
alternatives --config spark-sql
  • 1
  • 2

如果有多个版本,切换为刚刚配置的

参考

spark3.3.1 for CDH6.3.2 打包

CDH6.3.2 升级 Spark3.3.0 版本

Spark错误之 Unknown message type: 10

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/知新_RL/article/detail/625450
推荐阅读
相关标签
  

闽ICP备14008679号