This lesson covers the background of big data, the industries it applies to, what Hadoop is, the Hadoop ecosystem, the Hadoop architecture, the HDFS distributed file system, the HDFS architecture, and common Hadoop commands.
The explosive growth of data
How much data has the world accumulated so far?
In 2006, individual users had only just entered the TB era, and roughly 180 EB of new data was generated worldwide; by 2011 that figure had reached 1.8 ZB.
Market research firms have predicted:
By 2020, the total amount of data in the world will have grown 44-fold, reaching about 35.2 ZB (1 ZB = 1 billion TB)!
It touches every industry and domain:
power, telecommunications, trade, education, healthcare, finance, petroleum, civil aviation
astronomy, meteorology, genomics, medicine, physics, the internet
network data generated by every kind of human social activity
We are all producers of big data.
Hadoop is an open-source foundation framework for distributed systems. It lets you write and run distributed applications that process data at large scale. It is designed for offline, large-scale data analysis and is not suited to online transaction processing workloads that randomly read and write a few records at a time.
hadoop: distributed system framework
hive: data warehouse
mahout: machine-learning algorithm library
storm: distributed real-time computation framework
hbase: distributed, real-time, column-oriented storage database
HDFS: distributed file system
YARN: resource scheduler
MapReduce: distributed computation framework
A file system is the method and the data structures an operating system uses to manage files on a storage device (usually a disk, sometimes a NAND-flash SSD) or a partition; in other words, it is how files are organized on the storage device.
Without one, we would have to keep a ledger by hand, recording each file's name, the sectors it occupies and its size, and consult that ledger for every read or write. If sector M holding file A is full and the next sector M+1 is already occupied by file B, the only way to keep writing file A is to find a free sector N somewhere else and record N under file A's entry in the ledger.
And how would we know which sectors are still free? Scan the whole disk from start to end every time? We would probably need a second ledger for sector usage: when a file is deleted, mark its sectors free; when a file is created, mark its sectors in use.
What does this mean for the operating system? Without a file system there would be no operating system worth the name; it would be little more than a disk driver. Why? Just imagine what creating a file would involve.
namenode:
receives clients' read and write requests
stores the metadata
receives heartbeat reports from the datanodes
performs load balancing
assigns the storage nodes for each data block
datanode:
actually serves the clients' read and write requests
sends heartbeats to the namenode
sends block reports to the namenode; stores the actual data
copies blocks between replicas
secondarynamenode:
keeps a backup of the metadata
helps the namenode merge its metadata (checkpointing), reducing the load on the namenode
client:
physically splits files into data blocks
sends read and write requests to the namenode
receives the read and write responses from the namenode (a way to observe this division of labour on a live cluster is shown right after this list)
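As a sketch of how to see this division of labour, and assuming the cluster built later in this guide is already running, the NameNode can be queried for its metadata:
hdfs dfsadmin -report                    # the NameNode's view of the registered DataNodes and their capacity
hdfs fsck / -files -blocks -locations    # which blocks make up each file and which DataNodes hold the replicas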
How it works
resourcemanager (replaces the old JobTracker)
1. Handles client requests
2. Starts and monitors the MRAppMaster
3. Monitors the health of the NodeManagers, which periodically send heartbeat reports to the ResourceManager
4. Allocates and schedules resources
nodemanager (replaces the old TaskTracker)
1. Manages the resources of a single node
2. Carries out commands from the ResourceManager (e.g. when an MRAppMaster is launched)
3. Carries out commands from the MRAppMaster (e.g. when map and reduce tasks are launched); a quick way to inspect these daemons is shown below
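As a quick check (again assuming the cluster built later in this guide is running), the ResourceManager can be asked which NodeManagers are heartbeating and which applications it is tracking:
yarn node -list           # NodeManagers currently registered with the ResourceManager
yarn application -list    # applications (each with its own MRAppMaster) known to the ResourceManager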
The MapReduce programming model
Run a MapReduce example with the following commands.
First create a file:
vi words
hello world
i like java
i like java too
Upload it to the HDFS file system:
hdfs dfs -put /home/hadoop/words /
hadoop jar /software/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /words /output
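When the job finishes, the counts can be read back from the /output directory given above (a sketch; part-r-00000 is the usual name of a single reducer's output file):
hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000    # one line per word with its count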
hadoop fs works with any supported file system, while
hdfs dfs only works with HDFS (illustrated after the command list below).
hdfs dfs -ls /                 # list the HDFS root directory
hdfs dfs -mkdir /test          # create a directory
hdfs dfs -touchz /kkb.txt      # create an empty file
hdfs dfs -rm /kkb.txt          # delete a file
hdfs dfs -put /kkb1.txt /      # upload a local file to HDFS
hdfs dfs -cat /kkb1.txt        # print a file's contents
hdfs dfs -get /kkb.txt ./      # download a file to the current local directory
hdfs dfs -rmr /test            # delete a directory recursively (deprecated form; hdfs dfs -rm -r also works)
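To illustrate the difference, hadoop fs accepts any filesystem URI, so the same client can list the local filesystem or HDFS (a small sketch; the hdfs:// URI assumes the fs.defaultFS configured later in this guide):
hadoop fs -ls file:///tmp            # the local filesystem through the generic FileSystem client
hadoop fs -ls hdfs://master:9000/    # the same command against HDFS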
https://github.com/apache/hadoop
https://hadoop.apache.org/
# Set the time zone and synchronize the clock
timedatectl set-timezone Asia/Shanghai
yum install -y ntpdate
ntpdate -u ntp1.aliyun.com
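Optionally confirm the result afterwards:
timedatectl    # the time zone should now be Asia/Shanghai
date           # the clock should be in sync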
[root@master ~]# vi /etc/hostname
Set the hostname of the HadoopMaster node to master; if the file already contains it, leave it unchanged:
master
Commands to make the change take effect:
[root@master kkb]# hostname master
[root@master kkb]# hostnamectl set-hostname master
[root@master kkb]# reboot
To check that the hostname was changed, close the current terminal, open a new one, and run:
[root@master ~]# hostname
Edit the hostname with vi:
[root@slave kkb]# vi /etc/hostname
Set the hostname of the first Hadoop slave node to slave01; if the file already contains it, leave it unchanged:
slave01
and on the other slave node set it to:
slave02
Commands to make the change take effect:
[root@slave kkb]# hostnamectl set-hostname slave01
[root@slave kkb]# hostname slave01
## Run on the other slave node
[root@slave kkb]# hostnamectl set-hostname slave02
[root@slave kkb]# hostname slave02
To check that the hostnames were changed, close the current terminal, open a new one, and run:
[root@slave01 ~]# hostname
[root@slave02 ~]# hostname
Configure the three machines so that they are on the same network segment.
The firewall must be disabled on the HadoopSlave nodes as well.
Run the following commands in a terminal:
[root@master ~]# systemctl stop firewalld.service
[root@master ~]# systemctl disable firewalld.service
This step must also be done on the HadoopSlave nodes.
As root (use su if necessary), edit the hosts file:
[root@master ~]# vi /etc/hosts
Add the following lines to /etc/hosts:
192.168.25.208 master
192.168.25.209 slave01
192.168.25.211 slave02
Note: here the master node's IP address is 192.168.25.208 and the slave nodes' IP addresses are 192.168.25.209 and 192.168.25.211; when you do this yourself, replace them with the IP addresses of your own master and slave nodes.
[root@master ~]$ ping slave01
[root@master ~]$ ping slave02
[root@slave01 ~]$ ping master
[root@slave02 ~]$ ping master
If the pings get replies, the hostname configuration is correct.
This step must also be done on the HadoopSlave nodes. First list the JDK packages that ship with the system:
[root@master ~]# rpm -qa | grep java
Then remove the bundled JDK:
[root@master ~]# yum remove java-1.*
Extract the JDK archive into /usr/java (the commands below use jdk-8u131; adjust the version in the paths if yours differs):
[root@master ~]# cd /usr
[root@master usr]# mkdir java
[root@master usr]# cp /root/jdk-8u131-linux-x64.tar.gz /usr/java/
[root@master usr]# cd java
[root@master java]# tar -zxvf /usr/java/jdk-8u131-linux-x64.tar.gz
[root@master java]# chmod +x /usr/java/jdk1.8.0_131/bin/*
Use cd to return to the root user's home directory:
[root@master java]# cd
Configure the environment variables with vi as the root user:
[root@master ~]# vi .bash_profile
Append the following to the file opened above:
PATH=$PATH:$HOME/bin
export PATH
JAVA_HOME=/usr/java/jdk1.8.0_161/
HADOOP_HOME=/software/hadoop/hadoop-2.10.1
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME
export HADOOP_HOME
export PATH
Make the changes take effect:
[root@master ~]# source .bash_profile
Test the configuration:
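For example (Hadoop itself is installed in a later step, so at this point only the JDK settings can be checked; a minimal sketch):
java -version      # should report the JDK version installed above
echo $JAVA_HOME    # should print the path set in .bash_profile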
### 7. Passwordless SSH login (required for both root and the regular user)
#### 7.1 Generate a key pair on each of the three nodes
[root@master ~]# ssh-keygen -t rsa    # press Enter at every prompt to finish generating the key
[root@slave01 ~]# ssh-keygen -t rsa   # press Enter at every prompt to finish generating the key
[root@slave02 ~]# ssh-keygen -t rsa   # press Enter at every prompt to finish generating the key
#### 7.2 Demonstrated here on the master node; run the following on every node
[root@master ~]# ssh-copy-id -i slave01
[root@master ~]# ssh-copy-id -i slave02
After the hadoop user is created later, repeat the same steps for that user.
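To verify passwordless login, run a command on each remote node and make sure no password prompt appears (a minimal check):
ssh slave01 hostname    # should print slave01 without asking for a password
ssh slave02 hostname    # should print slave02 without asking for a password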
Download:
https://downloads.apache.org/hadoop/common
Switch back to the root user first:
[kkb@master ~]$ su - root
Create the /software/hadoop directory:
[root@master ~]# mkdir -p /software/hadoop/
Change into the hadoop directory (the archive can simply be dragged into the virtual machine):
[root@master ~]# cd /software/hadoop/
Move and unpack the Hadoop archive:
[root@master hadoop]# mv hadoop-2.10.1.tar.gz /software/hadoop/
[root@master hadoop]# tar -zxvf hadoop-2.10.1.tar.gz
Configure hadoop-env.sh
In this environment file only the JDK path needs to be set:
[root@master hadoop]# vim /software/hadoop/hadoop-2.10.1/etc/hadoop/hadoop-env.sh
Near the top of the file, find the line:
export JAVA_HOME=${JAVA_HOME}
and change it to:
export JAVA_HOME=/usr/java/jdk1.8.0_161/
Then save the file.
Configure yarn-env.sh
In this environment file only the JDK path needs to be set:
[root@master hadoop]# vim /software/hadoop/hadoop-2.10.1/etc/hadoop/yarn-env.sh
Near the top of the file, find the line:
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
and change it to the following (removing the leading #):
export JAVA_HOME=/usr/java/jdk1.8.0_161/
Then save the file.
Edit core-site.xml with vim:
[root@master hadoop]# vim /software/hadoop/hadoop-2.10.1/etc/hadoop/core-site.xml
Replace the contents of core-site.xml with the following (the hadoop.tmp.dir value matches the data directory created later):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/software/hadoop/hadoopdata</value>
    </property>
</configuration>
Edit hdfs-site.xml with vim:
[root@master hadoop]# vim /software/hadoop/hadoop-2.10.1/etc/hadoop/hdfs-site.xml
Replace the contents of hdfs-site.xml with the following:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
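dfs.replication is set to 3 above; once the cluster is running (a later step), the effective value can be read back as a quick sanity check:
hdfs getconf -confKey dfs.replication    # should print 3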
Edit yarn-site.xml with vim:
[root@master hadoop]# vim /software/hadoop/hadoop-2.10.1/etc/hadoop/yarn-site.xml
Replace the contents of yarn-site.xml with the following:
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:18040</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:18030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:18025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:18141</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:18088</value>
    </property>
</configuration>
Copy the mapred-site.xml.template file:
[root@master hadoop]# cp /software/hadoop/hadoop-2.10.1/etc/hadoop/mapred-site.xml.template /software/hadoop/hadoop-2.10.1/etc/hadoop/mapred-site.xml
Edit it with vim:
[root@master hadoop]# vim /software/hadoop/hadoop-2.10.1/etc/hadoop/mapred-site.xml
Replace the contents of mapred-site.xml with the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Give the hadoop user ownership of the hadoop directory:
[root@master hadoop]# chown hadoop.hadoop /software/hadoop/ -R
Copy the fully configured Hadoop to the HadoopSlave nodes with the following commands:
[root@master hadoop]# scp -r /software/hadoop/hadoop-2.10.1 root@slave01:/software/hadoop/
[root@master hadoop]# scp -r /software/hadoop/hadoop-2.10.1 root@slave02:/software/hadoop/
Then grant ownership on each slave node as well:
[root@slave01 ~]# chown hadoop.hadoop /software/hadoop/ -R
[root@slave02 ~]# chown hadoop.hadoop /software/hadoop/ -R
Edit the slaves file with vim:
[root@master hadoop]# vim /software/hadoop/hadoop-2.10.1/etc/hadoop/slaves
Replace the contents of slaves with:
slave01
slave02
su - hadoop
The following configuration must be done on both the HadoopMaster and HadoopSlave nodes. Run the environment-variable setup below as the hadoop user:
[hadoop@master ~]$ vi .bash_profile
JAVA_HOME=/usr/java/jdk1.8.0_161/
HADOOP_HOME=/software/hadoop/hadoop-2.10.1
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME
export HADOOP_HOME
export PATH
Make the changes take effect:
[hadoop@master ~]$ source ~/.bash_profile
This step must also be done on both the HadoopMaster and HadoopSlave nodes. Create the Hadoop data directory (the hadoop.tmp.dir configured in core-site.xml):
[hadoop@master ~]$ mkdir /software/hadoop/hadoopdata
Format the NameNode with the following command; this is done on the HadoopMaster node only:
[hadoop@master ~]$ hdfs namenode -format
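If the format succeeds, the NameNode metadata directory is created under the hadoop.tmp.dir set in core-site.xml. A quick check, assuming the default dfs.namenode.name.dir of ${hadoop.tmp.dir}/dfs/name:
ls /software/hadoop/hadoopdata/dfs/name/current    # should contain fsimage and VERSION files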
Start the Hadoop cluster with start-all.sh. First change into the Hadoop installation directory, then run the start command:
[hadoop@master ~]$ cd /software/hadoop/hadoop-2.10.1
[hadoop@master ~]$ start-all.sh
When prompted with yes/no, type yes.
Run jps in a terminal on HadoopMaster. The output should list four processes: ResourceManager, Jps, NameNode and SecondaryNameNode,
as shown below. If all four appear, the master-node daemons started successfully.
[hadoop@master ~]$ jps
3797 SecondaryNameNode
3959 ResourceManager
3594 NameNode
4251 Jps
Run jps in a terminal on each HadoopSlave. The output should list three processes: NodeManager, DataNode and Jps, as shown below. If all three appear, the slave-node daemons started successfully.
[hadoop@slave01 hadoop]$ jps
1896 DataNode
2153 Jps
2013 NodeManager
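Besides jps, you can ask the NameNode whether both DataNodes have registered (an optional check):
hdfs dfsadmin -report    # should list slave01 and slave02 as live DataNodes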
You can first set up hostname mapping on your Windows machine
by editing the local file C:\Windows\System32\drivers\etc\hosts:
192.168.25.208 master
192.168.25.209 slave01
192.168.25.211 slave02
Add your own virtual machines' IP addresses and hostnames, after which you can access the web UIs by hostname.
You can of course also access them directly by IP address.
Open Firefox or another browser and go to http://192.168.25.208:50070/ to check that the NameNode and DataNodes are running normally.
Open Firefox or another browser and go to http://192.168.25.208:18088/ to check that YARN is running normally.
Verify MapReduce
[ hadoop@master ~]# touch words
Use vi to put the following content into words:
hello world
i like java
i like java too
[hadoop@master ~]$ hadoop dfs -mkdir /test      # create the test directory
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
[hadoop@master ~]$ hadoop dfs -ls /             # confirm the test directory was created
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2021-12-08 09:12 /test
[hadoop@master ~]$ hadoop dfs -put words /test  # upload the words file to /test in HDFS
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
[hadoop@master ~]$ hadoop dfs -ls /test         # check that the upload succeeded
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 1 items
-rw-r--r--   3 hadoop supergroup         39 2021-12-08 09:15 /test/words
Run the wordcount example:
[hadoop@master ~]$ hadoop jar /software/hadoop/hadoop-2.10.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar wordcount /test/words /test/output
21/12/08 09:18:23 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.25.208:18040
21/12/08 09:18:24 INFO input.FileInputFormat: Total input files to process : 1
21/12/08 09:18:24 INFO mapreduce.JobSubmitter: number of splits:1
21/12/08 09:18:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1638924126207_0001
21/12/08 09:18:24 INFO conf.Configuration: resource-types.xml not found
21/12/08 09:18:24 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/12/08 09:18:24 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
21/12/08 09:18:24 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
21/12/08 09:18:25 INFO impl.YarnClientImpl: Submitted application application_1638924126207_0001
21/12/08 09:18:25 INFO mapreduce.Job: The url to track the job: http://master:18088/proxy/application_1638924126207_0001/
21/12/08 09:18:25 INFO mapreduce.Job: Running job: job_1638924126207_0001
21/12/08 09:18:32 INFO mapreduce.Job: Job job_1638924126207_0001 running in uber mode : false
21/12/08 09:18:32 INFO mapreduce.Job:  map 0% reduce 0%
21/12/08 09:18:37 INFO mapreduce.Job:  map 100% reduce 0%
21/12/08 09:18:42 INFO mapreduce.Job:  map 100% reduce 100%
21/12/08 09:18:43 INFO mapreduce.Job: Job job_1638924126207_0001 completed successfully
21/12/08 09:18:43 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=69
		FILE: Number of bytes written=416859
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=133
		HDFS: Number of bytes written=39
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=2888
		Total time spent by all reduces in occupied slots (ms)=2448
		Total time spent by all map tasks (ms)=2888
		Total time spent by all reduce tasks (ms)=2448
		Total vcore-milliseconds taken by all map tasks=2888
		Total vcore-milliseconds taken by all reduce tasks=2448
		Total megabyte-milliseconds taken by all map tasks=2957312
		Total megabyte-milliseconds taken by all reduce tasks=2506752
	Map-Reduce Framework
		Map input records=3
		Map output records=9
		Map output bytes=75
		Map output materialized bytes=69
		Input split bytes=94
		Combine input records=9
		Combine output records=6
		Reduce input groups=6
		Reduce shuffle bytes=69
		Reduce input records=6
		Reduce output records=6
		Spilled Records=12
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=186
		CPU time spent (ms)=1160
		Physical memory (bytes) snapshot=474054656
		Virtual memory (bytes) snapshot=4210536448
		Total committed heap usage (bytes)=293601280
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=39
	File Output Format Counters
		Bytes Written=39
View the result:
[hadoop@slave01 hadoop]$ hadoop dfs -ls /test/output
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2021-12-08 09:18 /test/output/_SUCCESS
-rw-r--r--   3 hadoop supergroup         39 2021-12-08 09:18 /test/output/part-r-00000
[hadoop@slave01 hadoop]$ hadoop dfs -cat /test/output/part-r-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
hello	1
i	2
java	2
like	2
too	1
world	1
Download: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/
Upload the downloaded tar package to one of the nodes. With the Hadoop cluster running, pick the machine with the most free memory as the Hive installation node; here the master node has the most memory, so we use it.
[hadoop@master hadoop]$ tar -xvf apache-hive-3.1.2-bin.tar.gz
[hadoop@master ~]$ vi .bash_profile
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export PATH
JAVA_HOME=/usr/java/jdk1.8.0_161/
HADOOP_HOME=/software/hadoop/hadoop-2.10.1
HIVE_HOME=/software/hadoop/apache-hive-3.1.2-bin
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
export JAVA_HOME
export HADOOP_HOME
export HIVE_HOME
export PATH
[hadoop@master bin]$ source ~/.bash_profile
[hadoop@master bin]$ hive --version    # print the Hive version to verify the environment variables
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hadoop/hadoop-2.10.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive 3.1.2
Git git://HW13934/Users/gates/tmp/hive-branch-3.1/hive -r 8190d2be7b7165effa62bd21b7d60ef81fb0e4af
Compiled by gates on Thu Aug 22 15:01:18 PDT 2019
From source with checksum 0492c08f784b188c349f6afb1d8d9847
On the master node, log in as the hadoop user (su - hadoop) and run the following commands in order to create the temporary directory and the Hive warehouse path:
hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
[hadoop@master conf]$ vim hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    </property>
    <property>
        <name>hive.metastore.local</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>
</configuration>
[hadoop@master conf]$ vim hive-env.sh
HADOOP_HOME=/software/hadoop/hadoop-2.10.1
HIVE_CONF_DIR=/software/hadoop/apache-hive-3.1.2-bin/conf
From the Hive installation directory, run the following command to initialize Hive's default Derby metastore:
[hadoop@master apache-hive-3.1.2-bin]$ ./bin/schematool -dbType derby -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hadoop/hadoop-2.10.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.derby.sql
...................
Notes:
If HDFS is stuck in safe mode, leave it with:
hadoop dfsadmin -safemode leave
Hive uses Hadoop, so:
you must have Hadoop in your path OR
export HADOOP_HOME=<hadoop-install-dir>
In addition, you must use the HDFS commands above to create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and set them chmod g+w before you can create a table in Hive.
With the Hive environment variables configured, running the hive command starts the Hive shell directly:
[hadoop@master apache-hive-3.1.2-bin]$ hive    # start the Hive CLI
which: no hbase in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/hadoop/.local/bin:/home/hadoop/bin:/home/hadoop/bin:/usr/java/jdk1.8.0_161//bin:/software/hadoop/hadoop-2.10.1/bin:/software/hadoop/hadoop-2.10.1/sbin:/home/hadoop/.local/bin:/home/hadoop/bin:/home/hadoop/bin:/usr/java/jdk1.8.0_161//bin:/software/hadoop/hadoop-2.10.1/bin:/software/hadoop/hadoop-2.10.1/sbin:/software/hadoop/apache-hive-3.1.2-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hadoop/hadoop-2.10.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 45aa3096-75ba-4dd7-826e-bb3479e8b8f4
Logging initialized using configuration in jar:file:/software/hadoop/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = f7d4cc55-5312-4c62-a8f2-ec34b9cbc899
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show tables;    # if the installation succeeded, show tables prints OK
OK
Time taken: 0.575 seconds
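As a further smoke test (a sketch: the table name words_tbl is just an example, and the path assumes the words file created earlier is still in the hadoop user's home directory), load that file into a Hive table and query it:
hive> CREATE TABLE words_tbl (line STRING);
hive> LOAD DATA LOCAL INPATH '/home/hadoop/words' INTO TABLE words_tbl;
hive> SELECT * FROM words_tbl;
The SELECT should print the three lines of the file, confirming that Hive can write to the warehouse directory and read data back.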