1. Databases vs. Data Warehouses
Databases: MySQL, Oracle, SQL Server, DB2, SQLite, MDB.
Data warehouse: Hive. Hive is essentially a client of MapReduce, which means it does not need to be installed and deployed on every machine in the cluster.
2. Hive Features
The query interface uses SQL-like syntax (HQL), which avoids the tedious process of hand-writing MapReduce jobs.
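For example, a grouped aggregation that would otherwise require a hand-written MapReduce job is a single HQL statement. A minimal sketch, assuming a hypothetical table emp(name, dept, salary):
select dept, count(*) as cnt, avg(salary) as avg_salary
from emp
group by dept;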
3. Hive Architecture
(1) Client: the command-line terminal; the JDBC interface is less commonly used and rather cumbersome compared to the CLI.
(2) Metastore: holds the mapping between the raw data sets and their field names and other schema metadata; in this setup it is stored in MySQL.
(3) Server-Hadoop: while using Hive, HDFS and YARN must be running and MapReduce must be properly configured (a startup sketch follows below).
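A minimal sketch of bringing up the Hadoop side before using Hive, assuming the standard start scripts under the Hadoop installation directory:
sbin/start-dfs.sh
sbin/start-yarn.sh
A quick "bin/hadoop fs -ls /" afterwards confirms that HDFS is reachable.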
1. Extract Hive into the installation directory
tar -zxf /opt/softwares/hive-0.13.1-cdh5.3.6.tar.gz -C ./
2. Modify the configuration files
(1) Rename the configuration file templates
mv hive-env.sh.template hive-env.sh
mv hive-default.xml.template hive-site.xml
mv hive-log4j.properties.template hive-log4j.properties
(2) Edit hive-env.sh
JAVA_HOME=/opt/modules/jdk1.8.0_121
HADOOP_HOME=/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/
export HIVE_CONF_DIR=/opt/modules/cdh/hive-0.13.1-cdh5.3.6/conf
(3) Edit hive-site.xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop-senior01.itguigu.com:3306/metastore?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
  <description>password to use against metastore database</description>
</property>
<!-- Whether to show the column names of query results in the current client -->
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
  <description>Whether to print the names of the columns in query output.</description>
</property>
<!-- Whether to show the current database name in the current client prompt -->
<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
  <description>Whether to include the current database in the Hive prompt.</description>
</property>
<!-- Whether queries are converted to fetch tasks or run as MapReduce jobs -->
<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
  <description>
    Some select queries can be converted to single FETCH task minimizing latency.
    Currently the query should be single sourced not having any subquery and
    should not have any aggregations or distincts (which incurs RS), lateral
    views and joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (TABLESAMPLE, virtual columns)
  </description>
</property>
(4) Edit hive-log4j.properties
First create a logs folder under the Hive installation directory:
mkdir logs
Then set the log directory in the hive-log4j.properties file:
hive.log.dir=/opt/modules/cdh/hive-0.13.1-cdh5.3.6/logs
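To confirm that logging now goes to the new directory, you can tail the log file while Hive runs (assuming the default log file name hive.log from the log4j template):
tail -f /opt/modules/cdh/hive-0.13.1-cdh5.3.6/logs/hive.log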
3. Copy the JDBC driver
Copy the MySQL JDBC driver jar into the lib folder under the Hive root directory:
cp -a mysql-connector-java-5.1.27-bin.jar /opt/modules/cdh/hive-0.13.1-cdh5.3.6/lib/
4. Install and configure MySQL
The steps below assume CentOS 7; they differ slightly from CentOS 6.
(1) Install MySQL (via YUM)
su - root
On CentOS 7 the mysql-server package is not in the default repositories, so add the MySQL community repository before installing:
wget http://dev.mysql.com/get/mysql-community-release-el7-5.noarch.rpm
rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum -y install mysql mysql-server mysql-devel
yum -y install mysql-community-server
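Optionally (assuming systemd on CentOS 7), enable the service so that MySQL starts automatically on boot:
systemctl enable mysqld.service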
Note: if you use the offline archive (no-install) edition, you must initialize the MySQL database manually.
(2) Configure MySQL
(2-1) Start the MySQL service
systemctl start mysqld.service
(2-2) Set the root user password
mysqladmin -uroot password '123456'
(2-3) Log in to MySQL
mysql -uroot -p123456
(2-4) Grant privileges to the user for other machine nodes, for example:
grant all on *.* to root@'<hostname>' identified by '123456';
(2-5) Flush the privileges
flush privileges;
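For instance, to authorize the metastore host configured in hive-site.xml above (the hostname and password are just this walkthrough's values):
grant all on *.* to root@'hadoop-senior01.itguigu.com' identified by '123456';
flush privileges;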
1. Start Hive
From the Hive installation directory, run:
bin/hive
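Once the hive> prompt appears, a quick smoke test might look like the following (the database and table names are only examples):
show databases;
create database if not exists test_db;
use test_db;
create table if not exists student(id int, name string) row format delimited fields terminated by '\t';
show tables;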
2. Change the permissions of the Hive-related directories in HDFS, such as the tmp directory and the warehouse directory (where table data is stored)
From the Hadoop installation directory, run:
bin/hadoop fs -chmod 777 /tmp/
bin/hadoop fs -chmod 777 /user/hive/warehouse
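To verify that the permissions took effect (the warehouse path is the default hive.metastore.warehouse.dir):
bin/hadoop fs -ls /
bin/hadoop fs -ls /user/hive/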
3. Common startup errors: when starting Hive, the metastore database cannot be initialized, a connection cannot be created, or a session cannot be created.
Possible causes:
(1) The Hive metastore database has been lost, e.g. it was dropped or its files are corrupted;
(2) the metastore version number does not match;
(3) an issue with the remote metastore service.
4. Hive command history is stored in the home directory:
cat ~/.hivehistory
1. Modify the configuration file hive-site.xml
(1) Set hive.server2.thrift.port to 10000
(2) Set hive.server2.thrift.bind.host to the cluster hostname
(3) Set hive.server2.long.polling.timeout to 5000 (remove the trailing L)
The settings are as follows:
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
  <description>Port number of HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
</property>
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>hadoop-senior01.halearn.cn</value>
  <description>Bind host on which to run the HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
</property>
<property>
  <name>hive.server2.long.polling.timeout</name>
  <value>5000</value>
  <description>Time in milliseconds that HiveServer2 will wait, before responding to asynchronous calls that use long polling</description>
</property>
2. Check the port
sudo netstat -antp | grep 10000
3. Start the service
From the Hive installation directory, run:
bin/hive --service hiveserver2
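This command runs HiveServer2 in the foreground; for a long-running service you might start it in the background instead (the output file path is only an example):
nohup bin/hive --service hiveserver2 > logs/hiveserver2.out 2>&1 &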
4. Start Beeline
From the Hive installation directory, run:
bin/beeline
5. Connect to the service
beeline> !connect jdbc:hive2://hadoop-senior01.halearn.cn:10000
Then just press Enter twice (at the username and password prompts) to complete the connection.
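After connecting, queries can be issued directly from the beeline prompt, for example (test_db.student refers to the example table created earlier; with hive.fetch.task.conversion=more as configured above, simple select/limit queries run as fetch tasks and need no MapReduce job):
show databases;
select * from test_db.student limit 10;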
Note: if you need to run queries that launch MapReduce jobs (e.g. aggregation queries), you must also set hive.server2.enable.doAs in hive-site.xml to false, as follows:
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
  <description>
    Setting this property to true will have HiveServer2 execute
    Hive operations as the user making the calls to it.
  </description>
</property>
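With hive.server2.enable.doAs set to false, an aggregation submitted through beeline should now launch a MapReduce job, for example (again using the example table from above):
select name, count(*) as cnt from test_db.student group by name;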