Two machines are used.
# uname -a
Linux xxx 2.6.32_1-16-0-0_virtio #1 SMP Thu May 14 15:30:56 CST 2015 x86_64 x86_64 x86_64 GNU/Linux
# java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
Software versions:
zookeeper: 3.4.8
hadoop: 2.7.2
Replace the directory paths below with your own.
Tip:
Enable passwordless ssh login (install ssh yourself if it is not already present).
$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Substitute your own public key as appropriate.
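For the second machine, a minimal sketch of copying the key over, assuming a hypothetical slave host named slave1:
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1
$ ssh hadoop@slave1
The second command should log you in without a password prompt.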
The Hadoop website has detailed documentation; on Linux, pick the nearest mirror and download directly.
$ pwd
/home/www/install
$ wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
Hadoop depends on the JDK and ZooKeeper, which need to be downloaded and installed separately.
Installation and day-to-day operations are best done under a dedicated hadoop account.
Hadoop relies on SSH, so passwordless SSH login must be configured.
The steps below are executed as the hadoop user.
This part covers a single-node setup.
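A minimal sketch of creating such an account (run as root; group, home directory, and shell are up to you):
# useradd -m hadoop
# passwd hadoop
# su - hadoop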
Extract Hadoop and prepare to edit the configuration files:
$ pwd
/home/www/install
$ tar -xzvf hadoop-2.7.2.tar.gz
$ cd hadoop-2.7.2/etc/hadoop/
Files to edit:
core-site.xml
The hadoop.tmp.dir value can be replaced with a path of your own.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/www/install/hadoop-2.7.2/tmp</value>
<description>Hadoop Temp Dir</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8900</value>
</property>
</configuration>
mapred-site.xml.template
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8901</value>
</property>
</configuration>
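Hadoop reads mapred-site.xml rather than the .template file, so the template is usually copied first:
$ pwd
/home/www/install/hadoop-2.7.2/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml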
hdfs-site.xml
dfs.namenode.name.dir and dfs.datanode.data.dir can be placed under the hadoop.tmp.dir directory.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/www/install/hadoop-2.7.2/tmp/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/www/install/hadoop-2.7.2/tmp/hdfs/data</value>
</property>
</configuration>

Initialize the HDFS filesystem
$ pwd
/home/www/install/hadoop-2.7.2
$ ./bin/hdfs namenode -format
Start HDFS
$ pwd
/home/www/install/hadoop-2.7.2
$ sh sbin/start-dfs.sh
Possible problems at this point:
① Repeated password prompts
Configure passwordless SSH login.
② localhost: Error: JAVA_HOME is not set and could not be found
Hard-code JAVA_HOME in hadoop-env.sh:
$ pwd
/home/www/install/hadoop-2.7.2/etc/hadoop
$ echo "export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_91" >> hadoop-env.sh
Check the running processes
$ jps
20724 SecondaryNameNode
20041 NameNode
22444 Jps
20429 DataNode
Starting and stopping Hadoop is straightforward; see the scripts in the sbin directory,
e.g. start-dfs.sh, stop-dfs.sh, start-all.sh, stop-all.sh.
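For example, to restart HDFS (the YARN daemons seen in the cluster section later are started the same way via start-yarn.sh or start-all.sh):
$ pwd
/home/www/install/hadoop-2.7.2
$ sh sbin/stop-dfs.sh
$ sh sbin/start-dfs.sh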
Cluster installation is largely the same as single-node installation.
Copy the Hadoop directory from the single-node install, unchanged, to every machine that will join the cluster (a copy sketch follows), and choose one machine as the Master; it must be able to ssh into all Slave machines without a password. Then adjust the configuration on the Master.
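A minimal sketch of distributing the directory, assuming a hypothetical slave host named slave1 and the same install path on every machine:
$ scp -r /home/www/install/hadoop-2.7.2 hadoop@slave1:/home/www/install/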
$ pwd
/home/www/install/hadoop-2.7.2/etc/hadoop
$ echo "<your slave hostnames>" >> slaves
Start the Hadoop cluster with sbin/start-all.sh.
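For example, on the Master:
$ pwd
/home/www/install/hadoop-2.7.2
$ sh sbin/start-all.sh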
On the Master you should see:
$ jps
9792 SecondaryNameNode
10420 NodeManager
9462 DataNode
9031 NameNode
10185 ResourceManager
27806 Jps
On every Slave machine you should see:
$ jps
7283 Jps
16564 DataNode
17192 NodeManager
12687 NameNode
Hadoop ships with many Hello World examples; wordcount is the usual favourite.
Pick any machine in the Hadoop cluster.
$ pwd
/home/www/install/hadoop-2.7.2/share/hadoop/mapreduce
$ echo "hello world NO.1" > h1
$ echo "hello world NO.2" > h2
$ hadoop fs -mkdir -p /input
$ hadoop fs -put h1 /input
$ hadoop fs -put h2 /input
$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output
$ hadoop fs -ls /output/
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2016-07-06 13:51 /output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 30 2016-07-06 13:51 /output/part-r-00000
$ hadoop fs -cat /output/part-r-00000
NO.1 1
NO.2 1
hello 2
world 2
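Note that the output directory must not already exist when the job is submitted; to rerun the example, remove it first:
$ hadoop fs -rm -r /output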

Hadoop ships with many more example programs; consult the list below and substitute the example name in the command above (a sample invocation follows the list).
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
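Any of these can replace wordcount in the jar command above; for example, the pi estimator (the two arguments are the number of maps and the number of samples per map):
$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar pi 10 100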

Done.