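The default Spark version in CDH is 2.4.0. We upgraded Hive to 3.1.3 and could not find a matching spark-hive package for it, so we tried Spark 3.3.1 instead.
Download link for the Spark 3.3.1 package built for CDH 6.3.2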
cd /opt/cloudera/parcels/CDH/lib
cp -r spark/ spark240.hive211.bak
cd /opt/software/spark/
tar -zxvf spark-3.3.1-bin-3.0.0-cdh6.3.2.tgz
cp -r spark-3.3.1-bin-3.0.0-cdh6.3.2 /opt/cloudera/parcels/CDH/lib/spark3
cp /etc/hive/conf/hive-site.xml /opt/cloudera/parcels/CDH/lib/spark3/conf/
cp /etc/spark/conf/spark-env.sh /opt/cloudera/parcels/CDH/lib/spark3/conf/
cp /etc/spark/conf/classpath.txt /opt/cloudera/parcels/CDH/lib/spark3/conf/
cp /etc/spark/conf/spark-defaults.conf /opt/cloudera/parcels/CDH/lib/spark3/conf/
cp -r /etc/spark/conf/yarn-conf/yarn-site.xml /opt/cloudera/parcels/CDH/lib/spark3/conf/
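The copy steps above can also be wrapped in a small function that reports a missing source file instead of stopping unnoticed partway through. This is only a minimal sketch and not part of the original procedure: the function name `copy_spark3_conf` and its three arguments are hypothetical, and the file list simply mirrors the `cp` commands above.

```shell
# Hypothetical helper: copy the Hive and Spark 2 client configs into the Spark 3 conf dir.
# Usage: copy_spark3_conf <hive_conf_dir> <spark_conf_dir> <spark3_conf_dir>
copy_spark3_conf() {
    local hive_conf="$1"
    local spark_conf="$2"
    local dest="$3"
    local missing=0
    local src
    mkdir -p "$dest" || return 1
    for src in \
        "$hive_conf/hive-site.xml" \
        "$spark_conf/spark-env.sh" \
        "$spark_conf/classpath.txt" \
        "$spark_conf/spark-defaults.conf" \
        "$spark_conf/yarn-conf/yarn-site.xml"
    do
        if [ -f "$src" ]; then
            cp "$src" "$dest/"
        else
            echo "missing: $src" >&2
            missing=$((missing + 1))
        fi
    done
    # Non-zero exit status if any source file was absent.
    [ "$missing" -eq 0 ]
}
```

With the paths used in this post, the call would be `copy_spark3_conf /etc/hive/conf /etc/spark/conf /opt/cloudera/parcels/CDH/lib/spark3/conf`.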
vim /opt/cloudera/parcels/CDH/lib/spark3/conf/spark-env.sh
...
SELF="$(cd $(dirname $BASH_SOURCE) && pwd)"
if [ -z "$SPARK_CONF_DIR" ]; then
export SPARK_CONF_DIR="$SELF"
fi
#export SPARK_HOME=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark
export SPARK_HOME=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark3
SPARK_PYTHON_PATH=""
if [ -n "$SPARK_PYTHON_PATH" ]; then
export PYTHONPATH="$PYTHONPATH:$SPARK_PYTHON_PATH"
fi
...
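Because the old `SPARK_HOME` line is left in the file as a comment, it is easy to misread which value is active. The following is a small sketch for checking that; the function name `active_spark_home` is hypothetical, and it assumes the value is written unquoted on an `export SPARK_HOME=` line as in the excerpt above.

```shell
# Hypothetical helper: print the SPARK_HOME set by the last uncommented export line.
# Usage: active_spark_home <path/to/spark-env.sh>
active_spark_home() {
    # Lines starting with '#' do not match the anchored pattern, so the old value is skipped.
    sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}SPARK_HOME=//p' "$1" | tail -n 1
}
```

Running it against the edited `spark-env.sh` should print the path that ends in `lib/spark3`.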
vim /opt/cloudera/parcels/CDH/lib/spark3/conf/spark-defaults.conf
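Change the spark.yarn.jars path so it points at the spark3 jars directory.
Comment out the lineage-related settings (for now).
Enable the old shuffle fetch protocol so that Spark 3 stays compatible with the existing external shuffle service.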
spark.authenticate=false
spark.driver.log.dfsDir=/user/spark/driverLogs
spark.driver.log.persistToDfs.enabled=true
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.enabled=true
spark.io.encryption.enabled=false
spark.network.crypto.enabled=false
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.ui.enabled=true
spark.ui.killEnabled=true
# spark.lineage.log.dir=/var/log/spark/lineage
# spark.lineage.enabled=true
spark.master=yarn
spark.submit.deployMode=client
spark.eventLog.dir=hdfs://master01:8020/user/spark/applicationHistory
spark.yarn.historyServer.address=http://master02:18088
spark.yarn.jars=local:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark3/jars/*
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.yarn.historyServer.allowTracking=true
spark.yarn.appMasterEnv.MKL_NUM_THREADS=1
spark.executorEnv.MKL_NUM_THREADS=1
spark.yarn.appMasterEnv.OPENBLAS_NUM_THREADS=1
spark.executorEnv.OPENBLAS_NUM_THREADS=1
#spark.extraListeners=com.cloudera.spark.lineage.NavigatorAppListener
#spark.sql.queryExecutionListeners=com.cloudera.spark.lineage.NavigatorQueryListener
# Use the old shuffle fetch protocol for compatibility
spark.shuffle.useOldFetchProtocol=true
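As far as I know, Spark reads spark-defaults.conf as a Java properties file, where `#` starts a comment only at the beginning of a line, so a note placed after a value ends up inside the value. A quick check for such lines might look like the sketch below. The function name `find_inline_comments` is hypothetical, and the check is deliberately crude: it would also flag a legitimate value that contains whitespace followed by `#`.

```shell
# Hypothetical helper: list "key=value #comment" lines in a spark-defaults.conf-style file.
# Prints offending lines with their line numbers; exit status 0 means none were found.
find_inline_comments() {
    # Drop full-line comments, then flag any remaining line with whitespace followed by '#'.
    if grep -n -v '^[[:space:]]*#' "$1" | grep '[[:space:]]#'; then
        return 1
    fi
    return 0
}
```

Any line it prints is a candidate for moving the comment onto its own line above the setting.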
vim /opt/cloudera/parcels/CDH/bin/spark-sql
#!/bin/bash
# Reference: http://stackoverflow.com/questions/59895/can-a-bash-script-tell-what-directory-its-stored-in
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
SOURCE="${BASH_SOURCE[0]}"
BIN_DIR="$( dirname "$SOURCE" )"
while [ -h "$SOURCE" ]
do
SOURCE="$(readlink "$SOURCE")"
[[ $SOURCE != /* ]] && SOURCE="$BIN_DIR/$SOURCE"
BIN_DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
done
BIN_DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
LIB_DIR=$BIN_DIR/../lib
export HADOOP_HOME=$LIB_DIR/hadoop
. $LIB_DIR/bigtop-utils/bigtop-detect-javahome
exec $LIB_DIR/spark3/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver "$@"
…
When that is done, use alternatives to manage which spark-sql command is found on the PATH.
alternatives --install /usr/bin/spark-sql spark-sql /opt/cloudera/parcels/CDH/bin/spark-sql 1
alternatives --config spark-sql
If several versions are registered, switch to the one just configured.
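To confirm the switch took effect, one option is to follow the symlink chain that alternatives creates and see where the command finally points. This is a sketch under assumptions: the function name `resolve_command` is hypothetical, and it relies on `readlink -f` from GNU coreutils.

```shell
# Hypothetical helper: print the final target of a command found on PATH.
# Usage: resolve_command <command-name>
resolve_command() {
    local path
    path="$(command -v "$1")" || { echo "$1: not found on PATH" >&2; return 1; }
    # readlink -f follows the whole symlink chain and prints the real file.
    readlink -f "$path"
}
```

After the steps above, `resolve_command spark-sql` would be expected to print a path ending in `bin/spark-sql` under the CDH parcel directory.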
References
Building Spark 3.3.1 for CDH 6.3.2
Upgrading CDH 6.3.2 to Spark 3.3.0
Spark error: Unknown message type: 10