1. Install the virtual machine
CentOS 6.5 minimal installation
Screen selection: Install or upgrade an existing system
Screen selection: SKIP
Screen selection: Basic Storage Devices
Screen selection: Yes, discard any data
Screen selection: set the hostname
Screen selection: Configure Network
Screen selection: check "Connect automatically"
Screen selection: IPv4 Settings, then configure the IP address
Screen selection: Manual
Screen selection: uncheck "System clock uses UTC"
Screen selection: select "Use All Space"
Screen selection: "Write changes to disk"
Log in as root, then add a user and set its password
#useradd hadoop
#passwd hadoop
Check the network configuration and make sure ONBOOT=yes
#more /etc/sysconfig/network-scripts/ifcfg-eth0
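A typical static configuration for ifcfg-eth0 looks roughly like the sketch below; the address, netmask, gateway and DNS values are placeholders and must match your own network. After editing, restart the network with "service network restart".
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
# example addresses - adjust to your environment
IPADDR=192.168.1.101
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DNS1=192.168.1.1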
### NOTE: follow the steps below in exactly this order
5.1 Start the ZooKeeper cluster (start zk on hadoop03, hadoop04 and hadoop05 respectively)
cd /mytest/zookeeper-3.4.5/bin/
./zkServer.sh start
#Check the status: there should be one leader and two followers
./zkServer.sh status
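The status of all three nodes can also be checked from a single machine with a small loop like the one below; it assumes passwordless SSH to hadoop03-hadoop05 and uses the ZooKeeper path from the step above.
for host in hadoop03 hadoop04 hadoop05; do
  # each node should report Mode: leader or Mode: follower
  ssh $host "/mytest/zookeeper-3.4.5/bin/zkServer.sh status"
done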
5.2 Start the JournalNodes (run this on hadoop03, hadoop04 and hadoop05 respectively)
cd /mytest/hadoop-2.6.0
sbin/hadoop-daemon.sh start journalnode
#Verify with the jps command: hadoop03, hadoop04 and hadoop05 should now each show an additional JournalNode process
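This check can likewise be scripted from one machine; the loop below simply runs jps over SSH on each node and filters for JournalNode (passwordless SSH and jps on the PATH are assumed).
for host in hadoop03 hadoop04 hadoop05; do
  echo "== $host =="
  # one JournalNode line should appear per node
  ssh $host "jps | grep JournalNode"
done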
7.4. Configure Hive
cp hive-default.xml.template hive-site.xml
Edit hive-site.xml: delete everything between the <configuration> tags, keeping only the empty <configuration></configuration> wrapper, and add the following properties inside it:
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop01:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password to use against metastore database</description>
</property>
7.5. After Hive and MySQL are installed, copy the MySQL connector jar into the $HIVE_HOME/lib directory
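For example (the exact jar name depends on the connector version you downloaded; the one below is only a placeholder):
cp mysql-connector-java-5.1.32-bin.jar $HIVE_HOME/lib/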
If you run into permission problems, grant privileges in MySQL (run this on the machine where MySQL is installed):
mysql -uroot -p
#(run the statement below; *.* means every table in every database, % means any IP address or host may connect)
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123' WITH GRANT OPTION;
FLUSH PRIVILEGES;
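The grant can be verified from one of the other nodes by connecting remotely with the same account; the host name and the password '123' are the ones used above.
mysql -h hadoop01 -uroot -p123 -e "show databases;"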
7.6. Create tables (internal/managed tables by default)
create table trade_detail(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';
Create a partitioned table
create table td_part(id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t';
Create an external table
create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';
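Because td_ext is an external table backed by the HDFS directory /td_ext, data can be loaded by simply placing tab-delimited files into that directory; the file name below is only an example.
hdfs dfs -put trade.txt /td_ext
# the rows are then immediately visible in Hive:
# hive> select * from td_ext limit 10;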
7.7. Create a partitioned table
Difference between an ordinary table and a partitioned table: use a partitioned table when large amounts of data keep being added
create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t';
Load data into the partitioned table
load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');
Another partitioned-table example, using a table named beauties that is partitioned by nation:
load data local inpath '/root/data.am' into table beauties partition (nation="USA");
select nation, avg(size) from beauties group by nation order by avg(size);
8. Sqoop installation
1. Upload sqoop
2. Install and configure it
---- Edit the configuration file sqoop-env.sh
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/app/hadoop-2.4.1
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/app/hadoop-2.4.1
#set the path to where bin/hbase is available
export HBASE_HOME=/home/hadoop/app/hbase-0.96.2-hadoop2
#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/app/hive-0.12.0-bin
#Set the path for where zookeeper config dir is
export ZOOCFGDIR=/home/hadoop/app/zookeeper-3.4.5/conf
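Once sqoop-env.sh is configured (and the MySQL connector jar from the Hive step is also copied into sqoop's lib directory), connectivity can be tested with a simple listing; the host and credentials below are the ones used in the Hive/MySQL section and are assumptions.
bin/sqoop list-databases --connect jdbc:mysql://hadoop01:3306/ --username root --password 123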
mongod startup options:
-h [ --help ]             show this usage information
--version                 show version information
-f [ --config ] arg       configuration file specifying additional options
--port arg                specify port number
--bind_ip arg             local ip address to bind listener - all local ips bound by default
-v [ --verbose ]          be more verbose (include multiple times for more verbosity, e.g. -vvvvv)
--dbpath arg (=/data/db/) directory for data files
--quiet                   quieter output
--logpath arg             file to send all output to instead of stdout
--logappend               append to logpath instead of over-writing
--fork                    fork server process (run in the background as a child process)
--cpu                     periodically show cpu and iowait utilization
--noauth                  run without security (no authentication)
--auth                    run with security (authentication required)
--objcheck                inspect client data for validity on receipt
--quota                   enable db quota management
--quotaFiles arg          number of files allowed per db, requires --quota
--appsrvpath arg          root directory for the babble app server
--nocursors               diagnostic/debugging option
--nohints                 ignore query hints
--nohttpinterface         disable http interface (the default http port is 28017)
--noscripting             disable scripting engine
--noprealloc              disable data file preallocation
--smallfiles              use a smaller default file size
--nssize arg (=16)        .ns file size (in MB) for new databases
--diaglog arg             0=off 1=W 2=R 3=both 7=W+some reads
--sysinfo                 print some diagnostic system information
--upgrade                 upgrade db if needed
--repair                  run repair on all dbs
--notablescan             do not allow table scans
--syncdelay arg (=60)     seconds between disk syncs (0 for never)
Replication options:
--master                  master mode
--slave                   slave mode
--source arg              when slave: specify master as <server:port>
--only arg                when slave: specify a single database to replicate
--pairwith arg            address of server to pair with
--arbiter arg             address of arbiter server (used in master-master/pair setups)
--autoresync              automatically resync if slave data is stale
--oplogSize arg           size limit (in MB) for op log
--opIdMem arg             size limit (in bytes) for in-memory storage of op ids
Sharding options:
--configsvr               declare this is a config db of a cluster
--shardsvr                declare this is a shard db of a cluster
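A minimal sketch of starting mongod with a few of the options above; the data and log paths are placeholders and must exist before startup.
mkdir -p /data/db /data/log
./bin/mongod --dbpath /data/db --logpath /data/log/mongod.log --logappend --fork --port 27017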
6. Enter the database CLI
cd into the bin folder under the mongodb directory and run ./mongo
The session looks like this:
[root@namenode mongodb]# ./bin/mongo
MongoDB shell version: 1.8.2
connecting to: test
> use test;
switched to db test
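From here basic reads and writes can be tried directly in the shell; the collection name things is just an example.
> db.things.insert({name: "mongo", count: 1});
> db.things.find();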
14. Spark
5.1 Extract
root@Master:/tools# tar -zxf spark-1.6.0-bin-hadoop2.6.tgz -C /usr/local/spark/
root@Master:/tools# cd /usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/
root@Master:/tools# tar -xzvf scala-2.10.6.tgz
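After extraction, the new directories are usually added to the environment, e.g. in /etc/profile; the sketch below assumes scala-2.10.6 stays under /tools where it was unpacked.
export SCALA_HOME=/tools/scala-2.10.6
export SPARK_HOME=/usr/local/spark/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin
# reload the profile afterwards:
# source /etc/profile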
============================ Test Spark ============================
scala> val rdd=sc.textFile("hdfs://hadoop-spark.dragon.org:8020/user/hadoop/data/wc.input")
scala> rdd.cache()
scala> val wordcount=rdd.flatMap(_.split(" ")).map(x=>(x,1)).reduceByKey(_+_)
scala> wordcount.take(10)
scala> val wordsort=wordcount.map(x=>(x._2,x._1)).sortByKey(false).map(x=>(x._2,x._1))
scala> wordsort.take(10)
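The sorted result can also be written back to HDFS; saveAsTextFile is a standard RDD action and the output path below is only an example (it must not already exist).
scala> wordsort.saveAsTextFile("hdfs://hadoop-spark.dragon.org:8020/user/hadoop/data/wc.output")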
5.6 Distribute to all servers in one step
root@Master:~# vim fenfa.sh
#!/bin/sh
for i in 1 2 3 4
do
#scp -rq /usr/local/hadoop/ root@Worker$i:/usr/local/
scp -rq /usr/local/spark/ root@Worker$i:/usr/local/
done
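The script assumes the hostnames Worker1-Worker4 resolve and that passwordless SSH for root is set up; run it once from the Master.
chmod +x fenfa.sh
./fenfa.sh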