Software environment: CentOS 7 + Hadoop 2.7.6
1. Download Hive: apache-hive-2.1.0-bin.tar.gz
2. Copy apache-hive-2.1.0-bin.tar.gz to the /usr/local directory and extract it
tar -zxvf apache-hive-2.1.0-bin.tar.gz
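Assuming the tarball was downloaded to the hadoop user's home directory (adjust the source path to wherever you saved it), the whole step looks like:

# copy the tarball into place and unpack it under /usr/local
cp ~/apache-hive-2.1.0-bin.tar.gz /usr/local/
cd /usr/local
tar -zxvf apache-hive-2.1.0-bin.tar.gz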
3. Install MySQL
Grant root permission to log in to MySQL remotely:
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '111111' WITH GRANT OPTION;
Query OK, 0 rows affected, 1 warning (0.03 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.01 sec)
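To confirm that remote login actually works, you can connect from any other node; dn1 below is just an example hostname:

# should print 1 if the grant and FLUSH PRIVILEGES took effect
[root@dn1 ~]# mysql -h ns1 -uroot -p111111 -e "SELECT 1;"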
4. Configure Hive
1) hive-env.sh
[hadoop@ns1 conf]$ cd /usr/local/apache-hive-2.1.0-bin/conf/
[hadoop@ns1 conf]$ cp hive-env.sh.template hive-env.sh
[hadoop@ns1 conf]$ vim hive-env.sh

Add the following, adjusted to your own environment:

JAVA_HOME=/usr/local/jdk
HADOOP_HOME=/usr/local/hadoop
HIVE_HOME=/usr/local/apache-hive-2.1.0-bin/
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HIVE_AUX_JARS_PATH=/usr/local/spark-2.2.2-bin-hadoop2.7/jars/spark-hive_2.11-2.2.2.jar
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$HADOOP_HOME/lib:$HIVE_HOME/lib
export HADOOP_OPTS="-Dorg.xerial.snappy.tempdir=/tmp -Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib $HADOOP_OPTS"
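An optional sanity check that the paths referenced in hive-env.sh actually exist on this machine:

# each path should be listed without a "No such file or directory" error
ls -d /usr/local/jdk /usr/local/hadoop /usr/local/apache-hive-2.1.0-bin /usr/local/spark-2.2.2-bin-hadoop2.7/jars/spark-hive_2.11-2.2.2.jar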
2) hive-site.xml
cp hive-default.xml.template hive-site.xml
vim hive-site.xml

Add the following content:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://ns1:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>111111</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>true</value>
  </property>
  <property>
    <name>datanucleus.autoCreateColumns</name>
    <value>true</value>
  </property>
  <!-- Location of the Hive warehouse on HDFS -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <!-- Before Hive 0.9, hive.exec.dynamic.partition had to be enabled explicitly;
       since 0.9 it defaults to true -->
  <property>
    <name>hive.exec.dynamic.partition</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
  </property>
  <!-- Log and temporary file locations -->
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/home/hadoop/hive/tmp/HiveJobsLog</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/home/hadoop/hive/tmp/ResourcesLog</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/home/hadoop/hive/tmp/HiveRunLog</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/home/hadoop/hive/tmp/OpertitionLog</value>
    <description>Top level directory where operation tmp are stored if logging functionality is enabled</description>
  </property>
  <!-- HWI (Hive Web Interface) -->
  <property>
    <name>hive.hwi.war.file</name>
    <value>/usr/local/apache-hive-2.1.0-bin/lib/hive-hwi-2.1.0.jar</value>
    <description>This sets the path to the HWI war file, relative to ${HIVE_HOME}.</description>
  </property>
  <property>
    <name>hive.hwi.listen.host</name>
    <value>ns1</value>
    <description>This is the host address the Hive Web Interface will listen on</description>
  </property>
  <property>
    <name>hive.hwi.listen.port</name>
    <value>9999</value>
    <description>This is the port the Hive Web Interface will listen on</description>
  </property>
  <!-- HiveServer2 no longer needs the hive.metastore.local option: if hive.metastore.uris is
       empty the metastore is local, otherwise it is remote. For a remote metastore, just set
       hive.metastore.uris. -->
  <!-- property>
    <name>hive.metastore.uris</name>
    <value>thrift://m1:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property -->
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>ns1</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.thrift.http.port</name>
    <value>10001</value>
  </property>
  <property>
    <name>hive.server2.thrift.http.path</name>
    <value>cliservice</value>
  </property>
  <!-- HiveServer2 web UI -->
  <property>
    <name>hive.server2.webui.host</name>
    <value>ns1</value>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
  </property>
  <property>
    <name>hive.scratch.dir.permission</name>
    <value>755</value>
  </property>
  <!-- If the jar referenced by hive.aux.jars.path below is a local file, remember to prefix it
       with file:// or it will not be found and you will get an
       org.apache.hadoop.hive.contrib.serde2.RegexSerDe error -->
  <!--property>
    <name>hive.aux.jars.path</name>
    <value>file:///home/centos/soft/spark/lib/spark-assembly-1.6.0-hadoop2.6.0.jar</value>
  </property-->
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
  <!-- property>
    <name>hive.server2.authentication</name>
    <value>NOSASL</value>
  </property -->
  <property>
    <name>hive.auto.convert.join</name>
    <value>false</value>
  </property>
  <property>
    <name>spark.dynamicAllocation.enabled</name>
    <value>true</value>
    <description>Enable dynamic resource allocation</description>
  </property>
  <!-- With Hive on Spark, omitting the following setting can lead to out-of-memory errors -->
  <property>
    <name>spark.driver.extraJavaOptions</name>
    <value>-XX:PermSize=128M -XX:MaxPermSize=512M</value>
  </property>
</configuration>
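The local and HDFS directories referenced above are not all created automatically. A minimal setup sketch, assuming HDFS is already running and you work as the hadoop user:

# local scratch/log directory used by the hive-site.xml entries above
mkdir -p /home/hadoop/hive/tmp
# warehouse and scratch directories on HDFS, group-writable per the Hive getting-started docs
hdfs dfs -mkdir -p /tmp /user/hive/warehouse
hdfs dfs -chmod g+w /tmp /user/hive/warehouse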
3) hive-log4j2.properties
[hadoop@ns1 conf]$ cp hive-log4j2.properties.template hive-log4j2.properties
[hadoop@ns1 conf]$ vim hive-log4j2.properties

Change the following line:

property.hive.log.dir = /home/hadoop/hive/tmp
4) Copy the JDBC driver jar
Copy mysql-connector-java-5.1.46.jar into /usr/local/apache-hive-2.1.0-bin/lib/.
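Assuming the connector jar was downloaded to the current directory (it is not bundled with Hive; grab it from the MySQL site if you don't have it):

cp mysql-connector-java-5.1.46.jar /usr/local/apache-hive-2.1.0-bin/lib/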
5) Copy the jline jar (Hive's CLI depends on a newer jline than the one Hadoop bundles)
cp jline-2.12.jar /usr/local/hadoop/share/hadoop/yarn/lib/
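If the Hive CLI later fails to start with a jline-related error, the older jline bundled with Hadoop is likely still shadowing this one. Two common workarounds (the 0.9.94 version below is what Hadoop 2.7 ships; check your yarn/lib first):

# either remove Hadoop's old jline ...
rm /usr/local/hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar
# ... or make Hadoop prefer user-supplied jars on the classpath
export HADOOP_USER_CLASSPATH_FIRST=true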
6) Copy the JDK's tools.jar into Hive's lib directory
cp /usr/local/jdk/lib/tools.jar /usr/local/apache-hive-2.1.0-bin/lib/
7) Initialize the Hive metastore schema
cd /usr/local/apache-hive-2.1.0-bin/bin
./schematool -dbType mysql -initSchema
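If initialization succeeded, the metastore tables now exist in the MySQL hive database; a quick check (you should see tables such as DBS and TBLS):

mysql -uroot -p111111 -e "USE hive; SHOW TABLES;"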
8) Start the Hive metastore service
./hive --service metastore
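The command above occupies the terminal; to keep the metastore running in the background instead, a common variant (the log path is just an example):

nohup ./hive --service metastore > /home/hadoop/hive/tmp/metastore.log 2>&1 &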
9) Open another terminal and enter the Hive shell
hive> create database hadoop;
OK
Time taken: 0.365 seconds
hive> show databases;
OK
default
hadoop
Time taken: 0.027 seconds, Fetched: 2 row(s)
hive> use hadoop;
OK
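As a further smoke test, you can create a table in the new database and run a query against it; the table name and columns below are arbitrary:

hive> create table test (id int, name string);
hive> show tables;
hive> select count(*) from test;   -- launches a MapReduce job, so it also exercises Hadoop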
5. Hive client installation
Difference between server and client: the server hosts the database that stores the metadata; the client does not.
Copy the Hive installation directory to the client machine:
scp -r apache-hive-2.1.0-bin/ root@dn1:/usr/local
Edit hive-site.xml on the client and add:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://ns1:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
The client is now ready to use.
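To verify, run Hive on the client while the metastore service is still running on ns1; the hadoop database created earlier should be listed:

/usr/local/apache-hive-2.1.0-bin/bin/hive -e "show databases;"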