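This article covers building and installing Impala-cdh5-2.12.0_5.16.1 from source.
The company needs to migrate HDFS to Tencent Cloud's CHDFS. CHDFS implements the HDFS protocol and is billed by the amount of data actually stored, which saves a fair amount of money. During testing, however, we found that Impala is not compatible with CHDFS: it reports that ofs is not supported, so the Impala source has to be changed. After applying the partial source changes provided by colleagues at Tencent Cloud, I had to build and package Impala myself. Below is a rough record of the pitfalls I hit during the build.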
E0312 14:35:09.345242 358875 impala-server.cc:285] Currently configured default filesystem: CHDFSHadoopFileSystemAdapter. fs.defaultFS (ofs://f4mjxuquwlp-p2WP.chdfs.ap-shanghai.myqcloud.com/) is not supported
https://codeload.github.com/cloudera/Impala/zip/cdh5-2.12.0_5.16.1
yum -y install git ant libevent-devel automake libtool flex bison gcc-c++ openssl-devel make cmake
yum -y install doxygen.x86_64 glib-devel python-devel bzip2-devel svn libevent-devel krb5-workstation
yum -y install openldap-devel db4-devel python-setuptools python-pip cyrus-sasl* postgresql postgresql-server ant-nodeps lzo-devel lzop
1. Set the IMPALA_HOME environment variable
2. Set JAVA_HOME in /etc/default/bigtop-utils
3. Build command: ./buildall.sh -notests -so
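Since a failed build costs a lot of time, the three steps above can be checked up front. The following is a minimal sketch, not part of Impala: check_build_env is a hypothetical helper name, and it assumes JAVA_HOME is set with a plain or exported assignment in the bigtop-utils file.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not shipped with Impala): verifies the
# prerequisites listed above before a long build is started.
check_build_env() {
  # $1: bigtop-utils defaults file; defaults to the path named in step 2.
  bigtop_file="${1:-/etc/default/bigtop-utils}"
  if [ -z "${IMPALA_HOME:-}" ]; then
    echo "IMPALA_HOME is not set"
    return 1
  fi
  if [ ! -f "$IMPALA_HOME/buildall.sh" ]; then
    echo "buildall.sh not found under $IMPALA_HOME"
    return 1
  fi
  # Accept an uncommented "JAVA_HOME=..." or "export JAVA_HOME=..." line.
  if ! grep -Eq '^[[:space:]]*(export[[:space:]]+)?JAVA_HOME=' "$bigtop_file" 2>/dev/null; then
    echo "JAVA_HOME is not set in $bigtop_file"
    return 1
  fi
  echo "ok"
}

# Usage (assumed workflow):
#   check_build_env && cd "$IMPALA_HOME" && ./buildall.sh -notests -so
```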
1. Python packages download too slowly
Downloading Python dependencies
~/Impala-cdh5-2.12.0_5.16.1/infra/python/deps ~/wangkai/Impala-cdh5-2.12.0_5.16.1
Getting package info from https://pypi.python.org/simple/allpairs/
File with matching digest already exists, skipping AllPairs-2.0.1.tar.gz
Getting package info from https://pypi.python.org/simple/boto3/
File with matching digest already exists, skipping boto3-1.2.3.tar.gz
Getting package info from https://pypi.python.org/simple/simplejson/
File with matching digest already exists, skipping simplejson-3.3.0.tar.gz
Getting package info from https://pypi.python.org/simple/botocore/
File with matching digest already exists, skipping botocore-1.3.30.tar.gz
Getting package info from https://pypi.python.org/simple/python_dateutil/
File with matching digest already exists, skipping python-dateutil-2.5.2.tar.gz
Getting package info from https://pypi.python.org/simple/six/
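The log shows that the build downloads Python packages and skips any that already exist. Because downloads on the server are slow, some packages can be downloaded manually and dropped into the following directory: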
$IMPALA_HOME/infra/python/deps/
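To see which archives still have to be fetched by hand, a small check like the one below may help. It is only a sketch: missing_archives is a hypothetical helper, it compares file names only, and it does not reproduce the digest check that the build log mentions.

```shell
#!/bin/sh
# Hypothetical helper: prints every expected archive name that is not yet
# present in the given directory. Only file names are checked, not digests.
missing_archives() {
  deps_dir="$1"
  shift
  for name in "$@"; do
    [ -f "$deps_dir/$name" ] || echo "$name"
  done
}

# Usage, with archive names taken from the log above:
#   missing_archives "$IMPALA_HOME/infra/python/deps" \
#     AllPairs-2.0.1.tar.gz boto3-1.2.3.tar.gz simplejson-3.3.0.tar.gz
```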
The download logic is in ./buildall.sh:
bootstrap_dependencies() {
  # Populate necessary thirdparty components unless it's set to be skipped.
  if [[ "${SKIP_TOOLCHAIN_BOOTSTRAP}" = true ]]; then
    echo "SKIP_TOOLCHAIN_BOOTSTRAP is true, skipping download of Python dependencies."
    echo "SKIP_TOOLCHAIN_BOOTSTRAP is true, skipping toolchain bootstrap."
  else
    echo "Downloading Python dependencies"
    # Download all the Python dependencies we need before doing anything
    # of substance. Does not re-download anything that is already present.
    # Note: this is where the Python packages are downloaded.
    if ! "$IMPALA_HOME/infra/python/deps/download_requirements"; then
      echo "Warning: Unable to download Python requirements."
      echo "Warning: bootstrap_virtualenv or other Python-based tooling may fail."
    else
      echo "Finished downloading Python dependencies"
    fi

    echo "Downloading and extracting toolchain dependencies."
    "$IMPALA_HOME/bin/bootstrap_toolchain.py"
    echo "Toolchain bootstrap complete."
  fi
}
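If the download is too slow, download the packages listed above manually and drop them in. For later builds you can also comment out the download section shown above to skip the verification, which is slow as well.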
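As an alternative to commenting code out, the quoted bootstrap_dependencies() function already checks SKIP_TOOLCHAIN_BOOTSTRAP. The sketch below only mirrors that guard to show which branch would be taken; bootstrap_action is a hypothetical name, and it assumes nothing else in the build resets the variable.

```shell
#!/bin/sh
# Mirrors the guard at the top of bootstrap_dependencies() in buildall.sh.
# Hypothetical helper for illustration; it downloads nothing itself.
bootstrap_action() {
  if [ "${SKIP_TOOLCHAIN_BOOTSTRAP:-}" = true ]; then
    echo "skip"
  else
    echo "download"
  fi
}

# Usage (assumed workflow, once all packages are already in place):
#   export SKIP_TOOLCHAIN_BOOTSTRAP=true
#   ./buildall.sh -notests -so
```

Note that, according to the quoted function, this one variable skips both the Python download and the toolchain bootstrap, so both sets of packages must already be present.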
2. C++ packages download too slowly
Downloading and extracting toolchain dependencies.
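After the Python packages are downloaded, the Impala build goes on to download the C++ packages. These downloads are also slow, so you can download them yourself and drop them in.
The directory is: $IMPALA_HOME/toolchain/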
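3. jar download failures
The build failed when compiling the fe project. The main cause was that some jars referenced in the pom file could not be downloaded: some artifacts no longer exist in the configured repositories, probably because this version is fairly old.
The error looks like the following. Quite a few artifacts could not be found, and others downloaded slowly. For the slow ones, download them yourself and put them in the local repository.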
[WARNING] The POM for net.sourceforge.czt.dev:cup-maven-plugin:jar:1.6-cdh is missing, no dependency information available
After repeated attempts I found that the pom.xml of the impala-parent project has to be modified by manually adding some repository addresses:
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.impala</groupId>
  <artifactId>impala-parent</artifactId>
  <version>0.1-SNAPSHOT</version>
  <packaging>pom</packaging>
  <name>Apache Impala Parent POM</name>

  <properties>
    <surefire.reports.dir>${env.IMPALA_LOGS_DIR}/fe_tests</surefire.reports.dir>
    <jacoco.skip>true</jacoco.skip>
    <jacoco.data.file>${env.IMPALA_FE_TEST_COVERAGE_DIR}/jacoco.exec</jacoco.data.file>
    <jacoco.report.dir>${env.IMPALA_FE_TEST_COVERAGE_DIR}</jacoco.report.dir>
    <test.hive.testdata>${project.basedir}/../testdata/target/AllTypes.txt</test.hive.testdata>
    <backend.library.path>${env.IMPALA_HOME}/be/build/debug/service:${env.IMPALA_HOME}/be/build/release/service</backend.library.path>
    <beeswax_port>21000</beeswax_port>
    <impalad>localhost</impalad>
    <testExecutionMode>reduced</testExecutionMode>
    <hadoop.version>${env.IMPALA_HADOOP_VERSION}</hadoop.version>
    <hive.version>${env.IMPALA_HIVE_VERSION}</hive.version>
    <hive.major.version>${env.IMPALA_HIVE_MAJOR_VERSION}</hive.major.version>
    <sentry.version>${env.IMPALA_SENTRY_VERSION}</sentry.version>
    <hbase.version>${env.IMPALA_HBASE_VERSION}</hbase.version>
    <parquet.version>${env.IMPALA_PARQUET_VERSION}</parquet.version>
    <kite.version>${env.IMPALA_KITE_VERSION}</kite.version>
    <thrift.version>${env.IMPALA_THRIFT_JAVA_VERSION}</thrift.version>
    <impala.extdatasrc.api.version>1.0-SNAPSHOT</impala.extdatasrc.api.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <kudu.version>${env.KUDU_JAVA_VERSION}</kudu.version>
    <commons-io.version>2.6</commons-io.version>
    <slf4j.version>1.7.25</slf4j.version>
    <junit.version>4.12</junit.version>
    <!-- Beware compatibility requirements with Thrift and KMS; see IMPALA-4210. -->
    <httpcomponents.core.version>4.2.5</httpcomponents.core.version>
    <yarn-extras.version>${project.version}</yarn-extras.version>
    <eclipse.output.directory>eclipse-classes</eclipse.output.directory>
    <guava.version>11.0.2</guava.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <repositories>
    <repository>
      <id>apache.snapshots</id>
      <name>Apache Development Snapshot Repository</name>
      <url>https://repository.apache.org/content/repositories/snapshots/</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>cdh.rcs.releases.repo</id>
      <url>https://repository.cloudera.com/content/groups/cdh-releases-rcs</url>
      <name>CDH Releases Repository</name>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
    <repository>
      <id>alimavenspring</id>
      <url>https://maven.aliyun.com/repository/spring/</url>
    </repository>
    <repository>
      <id>tengxun</id>
      <url>https://search.maven.org/artifact/</url>
    </repository>
    <repository>
      <id>alimaven</id>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </repository>
    <repository>
      <id>clouderanew</id>
      <url>https://repository.cloudera.com/artifactory/repo/</url>
    </repository>
    <repository>
      <id>cdh.releases.repo</id>
      <url>https://repository.cloudera.com/content/repositories/releases</url>
      <name>CDH Releases Repository</name>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>cdh.snapshots.repo</id>
      <url>https://repository.cloudera.com/content/repositories/snapshots</url>
      <name>CDH Snapshots Repository</name>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>cloudera.thirdparty.repo</id>
      <url>https://repository.cloudera.com/content/repositories/third-party</url>
      <name>Cloudera Third Party Repository</name>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <!-- This is needed for java-cup. TODO add the plugin to our maven repo -->
    <repository>
      <id>sonatype-nexus-snapshots</id>
      <name>Sonatype Nexus Snapshots</name>
      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>clouderanew</id>
      <url>https://repository.cloudera.com/artifactory/repo/</url>
    </pluginRepository>
    <pluginRepository>
      <id>alimaven</id>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </pluginRepository>
    <pluginRepository>
      <id>alimavenspring</id>
      <url>https://maven.aliyun.com/repository/spring/</url>
    </pluginRepository>
    <pluginRepository>
      <id>tengxun</id>
      <url>https://search.maven.org/artifact/</url>
    </pluginRepository>
    <pluginRepository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </pluginRepository>
    <pluginRepository>
      <id>cloudera.thirdparty.repo</id>
      <url>https://repository.cloudera.com/content/repositories/third-party</url>
      <name>Cloudera Third Party Repository</name>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </pluginRepository>
    <pluginRepository>
      <id>cloudera.snapshot.repo</id>
      <url>https://repository.cloudera.com/content/repositories/snapshots</url>
      <name>Cloudera Snapshot Repository</name>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </pluginRepository>
    <!-- This is needed for the cup maven plugin. TODO add the plugin to our maven repo -->
    <pluginRepository>
      <id>sonatype-nexus-snapshots</id>
      <name>Sonatype Nexus Snapshots</name>
      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </pluginRepository>
  </pluginRepositories>
</project>
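For jars that are downloaded by hand, it helps to know where Maven expects them. The sketch below converts groupId:artifactId:version coordinates into the jar path used by the standard Maven repository layout. m2_jar_path is a hypothetical helper, it takes three-part coordinates without the packaging field that appears in the warning, and ~/.m2/repository is Maven's default local repository unless settings.xml overrides it.

```shell
#!/bin/sh
# Hypothetical helper: maps "groupId:artifactId:version" to the jar path
# relative to the local repository root (standard Maven layout).
m2_jar_path() {
  group=$(printf '%s\n' "$1" | cut -d: -f1 | tr '.' '/')
  artifact=$(printf '%s\n' "$1" | cut -d: -f2)
  version=$(printf '%s\n' "$1" | cut -d: -f3)
  printf '%s\n' "$group/$artifact/$version/$artifact-$version.jar"
}

# Usage, checking whether the plugin from the warning is present locally:
#   ls "$HOME/.m2/repository/$(m2_jar_path net.sourceforge.czt.dev:cup-maven-plugin:1.6-cdh)"
```

Maven also looks for the matching .pom file in the same directory, so a manually placed jar usually needs its pom next to it.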
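END: after countless ordeals, the build finally succeeded.
I. Modify the configuration files
1. bin/start-catalogd.sh: the script that starts catalogd, which connects to the metastore
Add the following: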
source $IMPALA_HOME/bin/impala-config.sh
CATALOGD_ARGS=" -log_dir=/var/log/impala "
2. bin/start-statestored.sh: starts statestored, which tracks the health and location of the impalad instances in the cluster
Add the following:
source $IMPALA_HOME/bin/impala-config.sh
STATESTORED_ARGS="-log_dir=/var/log/impala -state_store_port=24000"
3. bin/set-classpath.sh
4. bin/start-impalad.sh
II. Startup
nohup ./start-statestored.sh &
nohup ./start-catalogd.sh &
nohup ./start-impalad.sh &   # compute nodes only need to start this one
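The start order above can be captured in a small lookup so that each node starts only what it needs. This is a sketch under assumptions: daemons_for_role and the role names master and compute are hypothetical labels, with master meaning the node that runs all three scripts.

```shell
#!/bin/sh
# Hypothetical helper: prints the start scripts for a node role, in the
# order used above. Role names are illustrative labels, not Impala terms.
daemons_for_role() {
  case "$1" in
    master)
      printf '%s\n' start-statestored.sh start-catalogd.sh start-impalad.sh
      ;;
    compute)
      printf '%s\n' start-impalad.sh
      ;;
    *)
      echo "unknown role: $1" >&2
      return 1
      ;;
  esac
}

# Usage (assumed workflow):
#   cd "$IMPALA_HOME/bin"
#   for s in $(daemons_for_role compute); do
#     nohup "./$s" &
#   done
```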
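III. Connection test
1. Building Impala takes patience; many packages download very slowly.
2. Not being familiar with Maven configuration cost me many detours. I plan to study Maven configuration management properly.
One more note: the mvn download logs are under $IMPALA_HOME/logs, where you can see the actual download progress.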
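To watch download progress without opening each log by hand, something like the following may be enough. It is a sketch: recent_downloads is a hypothetical helper, the log file names are not given here so it searches the whole directory, and it assumes Maven's download lines contain the word "Downloading".

```shell
#!/bin/sh
# Hypothetical helper: prints the most recent Maven download lines found in
# any file under the given log directory.
recent_downloads() {
  log_dir="$1"
  count="${2:-20}"   # number of lines to show, 20 by default
  grep -rh 'Downloading' "$log_dir" 2>/dev/null | tail -n "$count"
}

# Usage:
#   recent_downloads "$IMPALA_HOME/logs"
```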
https://github.com/TencentEMapReduce/impala/commit/14cf694293a60174fd3c064f76ee7708d98fc2c7
https://blog.csdn.net/qqqq0199181/article/details/98515118