Reference articles
https://issues.apache.org/jira/browse/SPARK-35758
https://jishuin.proginn.com/p/763bfbd67cf6
Spark 3 no longer builds directly against old Hadoop releases (the oldest Hadoop profile it ships with is hadoop-2.7), while our production environment still runs CDH 5.16.2 with the much older hadoop-2.6.0-cdh5.16.2, so we have to compile Spark 3 ourselves.
The method in this article has been used to successfully compile Spark 3.0.3, 3.1.1, 3.1.2, 3.1.3 and 3.2.1. Since we decided to run the second-newest release in production, the rest of the article uses Spark 3.1.3 as the example.
Prepare the Java, Scala and Maven environments in advance:
- java -version #1.8.0_311
- mvn -v #Apache Maven 3.6.3
- scala -version #2.12.10
Add an environment variable (in /etc/profile) so that Maven has more memory available during compilation:
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
- # create the working directory
- sudo mkdir /bi_bigdata/user_shell/spark
- # download the source tarball into it
- wget https://archive.apache.org/dist/spark/spark-3.1.3/spark-3.1.3.tgz -P /bi_bigdata/user_shell/spark
-
- # extract into the target directory and change into it
- tar -zxvf /bi_bigdata/user_shell/spark/spark-3.1.3.tgz -C /bi_bigdata/user_shell/spark/
- cd /bi_bigdata/user_shell/spark/spark-3.1.3
The following source changes are only needed when building against Hadoop versions older than 2.6.4; they were worked out from the compile errors as they appeared. Start with the YARN client, which calls log-aggregation setters that only exist from Hadoop 2.6.4 onwards:
vim resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
- /* comment out the original block:
- sparkConf.get(ROLLED_LOG_INCLUDE_PATTERN).foreach { includePattern =>
- try {
- val logAggregationContext = Records.newRecord(classOf[LogAggregationContext])
- logAggregationContext.setRolledLogsIncludePattern(includePattern)
- sparkConf.get(ROLLED_LOG_EXCLUDE_PATTERN).foreach { excludePattern =>
- logAggregationContext.setRolledLogsExcludePattern(excludePattern)
- }
- appContext.setLogAggregationContext(logAggregationContext)
- } catch {
- case NonFatal(e) =>
- logWarning(s"Ignoring ${ROLLED_LOG_INCLUDE_PATTERN.key} because the version of YARN " +
- "does not support it", e)
- }
- }
- appContext.setUnmanagedAM(isClientUnmanagedAMEnabled)
- sparkConf.get(APPLICATION_PRIORITY).foreach { appPriority =>
- appContext.setPriority(Priority.newInstance(appPriority))
- }
- appContext
- }
- */
-
- /* replace it with the following */
- sparkConf.get(ROLLED_LOG_INCLUDE_PATTERN).foreach { includePattern =>
- try {
- val logAggregationContext = Records.newRecord(classOf[LogAggregationContext])
-
- // These two methods were added in Hadoop 2.6.4, so we still need to use reflection to
- // avoid compile error when building against Hadoop 2.6.0 ~ 2.6.3.
- val setRolledLogsIncludePatternMethod =
- logAggregationContext.getClass.getMethod("setRolledLogsIncludePattern", classOf[String])
- setRolledLogsIncludePatternMethod.invoke(logAggregationContext, includePattern)
-
- sparkConf.get(ROLLED_LOG_EXCLUDE_PATTERN).foreach { excludePattern =>
- val setRolledLogsExcludePatternMethod =
- logAggregationContext.getClass.getMethod("setRolledLogsExcludePattern", classOf[String])
- setRolledLogsExcludePatternMethod.invoke(logAggregationContext, excludePattern)
- }
-
- appContext.setLogAggregationContext(logAggregationContext)
- } catch {
- case NonFatal(e) =>
- logWarning(s"Ignoring ${ROLLED_LOG_INCLUDE_PATTERN.key} because the version of YARN " +
- "does not support it", e)
- }
-
- }
- appContext
- }
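The replacement compiles against Hadoop 2.6.0 ~ 2.6.3 because reflection defers the method lookup to runtime, so setRolledLogsIncludePattern no longer has to exist at compile time. A minimal, self-contained sketch of the same pattern (the object and method names below are only illustrative, not the actual Spark/YARN ones):
- import scala.util.control.NonFatal
-
- object ReflectiveCallSketch {
-   // Look up a one-String-argument method by name at runtime and invoke it if present.
-   // Returns false when the library on the classpath is too old to have the method.
-   def callIfPresent(target: AnyRef, methodName: String, arg: String): Boolean = {
-     try {
-       val method = target.getClass.getMethod(methodName, classOf[String])
-       method.invoke(target, arg)
-       true
-     } catch {
-       case _: NoSuchMethodException => false // e.g. running against Hadoop < 2.6.4
-       case NonFatal(_) => false
-     }
-   }
- }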
Next, Utils.unpack relies on org.apache.hadoop.util.StringUtils.toLowerCase, which Hadoop 2.6.0 does not provide:
vim core/src/main/scala/org/apache/spark/util/Utils.scala
- // comment out
- //import org.apache.hadoop.util.{RunJar, StringUtils}
- // and replace with
- import org.apache.hadoop.util.RunJar
-
- def unpack(source: File, dest: File): Unit = {
- // StringUtils.toLowerCase cannot be resolved against hadoop 2.6.0, so the import is dropped and its behaviour reproduced inline
- // val lowerSrc = StringUtils.toLowerCase(source.getName)
- if (source.getName == null) {
- throw new NullPointerException
- }
- val lowerSrc = source.getName.toLowerCase()
- if (lowerSrc.endsWith(".jar")) {
- RunJar.unJar(source, dest, RunJar.MATCH_ANY)
- } else if (lowerSrc.endsWith(".zip")) {
- FileUtil.unZip(source, dest)
- } else if (
- lowerSrc.endsWith(".tar.gz") || lowerSrc.endsWith(".tgz") || lowerSrc.endsWith(".tar")) {
- FileUtil.unTar(source, dest)
- } else {
- logWarning(s"Cannot unpack $source, just copying it to $dest.")
- copyRecursive(source, dest)
- }
- }
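A small side note on the substitution above: Hadoop's StringUtils.toLowerCase lower-cases with a fixed English locale, whereas the plain toLowerCase() call picks up the JVM's default locale. If the build or runtime machines might use a locale with unusual casing rules (Turkish, for instance), passing an explicit locale keeps the behaviour closer to the original; a quick sketch:
- import java.util.Locale
-
- // Locale-independent lower-casing, closer to what StringUtils.toLowerCase did.
- val name = "ARCHIVE.TGZ"
- val lowerSrc = name.toLowerCase(Locale.ROOT) // "archive.tgz" under any default locale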
The HttpSecurityFilter then trips over an "unapplied methods are only converted to functions when a function type is expected" message; make the eta-expansion explicit:
vim core/src/main/scala/org/apache/spark/ui/HttpSecurityFilter.scala
- private val parameterMap: Map[String, Array[String]] = {
- super.getParameterMap().asScala.map { case (name, values) =>
- //Unapplied methods are only converted to functions when a function type is expected.
- //You can make this conversion explicit by writing `stripXSS _` or `stripXSS(_)` instead of `stripXSS`.
- // stripXSS(name) -> values.map(stripXSS)
- stripXSS(name) -> values.map(stripXSS(_))
- }.toMap
- }
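The fix works because stripXSS(_) is an explicit eta-expansion: the method is turned into a function value by hand rather than relying on the compiler's implicit conversion, which is exactly what the quoted compiler message suggests. A standalone illustration (the stripXSS body here is a stand-in, not Spark's implementation):
- object EtaExpansionSketch {
-   private def stripXSS(s: String): String =
-     Option(s).map(_.replaceAll("[<>\"'%;)(&+]", "")).orNull
-
-   // Both forms turn the method into a String => String function value explicitly.
-   val f1: String => String = stripXSS _
-   val f2: String => String = stripXSS(_)
-
-   def main(args: Array[String]): Unit =
-     println(Array("<b>hi</b>", "ok").map(f2).mkString(", "))
- }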
Finally, add the Cloudera repositories to pom.xml so that Maven can resolve the 2.6.0-cdh5.16.2 Hadoop artifacts (note that the central repository URL is also swapped here for a mirror that was reachable from our network):
vim pom.xml
- <repository>
- <!--
- This is used as a fallback when the first try fails.
- -->
- <id>central</id>
- <name>Maven Repository</name>
- <url>https://mvnrepository.com/repos/central</url>
- <!--<url>https://repo.maven.apache.org/maven2</url>-->
- <releases>
- <enabled>true</enabled>
- </releases>
- <snapshots>
- <enabled>false</enabled>
- </snapshots>
- </repository>
- <!-- add the CDH repository -->
- <repository>
- <id>cloudera</id>
- <name>cloudera Repository</name>
- <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
- </repository>
- <!-- add the CDH plugin repository -->
- <pluginRepository>
- <id>cloudera</id>
- <name>Cloudera Repositories</name>
- <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
- </pluginRepository>
Now build a distributable binary package from the extracted Spark source directory.
As the official build documentation advises, pass -Phadoop-2.7 when compiling against a Hadoop 2.x version:
./dev/make-distribution.sh --name 2.6.0-cdh5.16.2 --pip --tgz -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes -Phadoop-2.7 -Dhadoop.version=2.6.0-cdh5.16.2 -Dscala.version=2.12.10 -X
When the build finishes, the distributable tgz package is in the current directory and can be deployed to the production environment.
- # check the generated tgz package
- ll -h |grep tgz |grep spark
- #spark-3.1.3-bin-2.6.0-cdh5.16.2.tgz
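As a quick sanity check after deploying the package to the cluster, a couple of lines in spark-shell confirm that the Spark and Hadoop versions are the expected ones (the expected values below assume this article's 3.1.3 / CDH 5.16.2 build):
- // run inside spark-shell on the cluster
- println(spark.version)                                 // expect 3.1.3
- println(org.apache.hadoop.util.VersionInfo.getVersion) // expect 2.6.0-cdh5.16.2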