
Installing Hadoop on CentOS

Environment

Two machines.

# uname -a
Linux xxx 2.6.32_1-16-0-0_virtio #1 SMP Thu May 14 15:30:56 CST 2015 x86_64 x86_64 x86_64 GNU/Linux

# java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)

Software versions:

zookeeper: 3.4.8
hadoop: 2.7.2

Replace the directory paths shown below with your own.

Tip:
Enable passwordless SSH login (install SSH first if it is not already present).

$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Substitute your own public key as needed.
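For a multi-node setup the key also has to reach every other machine; a minimal sketch, with slave1 as a hypothetical hostname:

$ ssh-copy-id hadoop@slave1   # append ~/.ssh/id_rsa.pub to slave1's authorized_keys
$ ssh hadoop@slave1           # should log in without a password prompt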

Download

The Hadoop official site has detailed documentation. On Linux, just pick the nearest mirror and download directly:

$ pwd
/home/www/install

$ wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
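Before unpacking, a quick integrity check of the archive does no harm; a minimal sketch:

$ tar -tzf hadoop-2.7.2.tar.gz > /dev/null && echo "archive OK"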

Hadoop depends on the JDK and Zookeeper; download and install them yourself.

Installation

Installation and day-to-day operations are best done under a dedicated hadoop account.
Hadoop relies on SSH, so passwordless SSH login must be configured.

The commands below are executed as the hadoop user.
This part covers a single machine.
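A minimal sketch for creating the dedicated account (run as root; the name is just a convention):

# useradd -m hadoop     # create the account with a home directory
# passwd hadoop         # set its password
# su - hadoop           # switch to it for the steps below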

Unpack Hadoop and prepare to edit the configuration files

$ pwd
/home/www/install

$ tar -xzvf hadoop-2.7.2.tar.gz
$ cd hadoop-2.7.2/etc/hadoop/

Files to edit:

  • core-site.xml
  • mapred-site.xml.template
  • hdfs-site.xml

core-site.xml
The hadoop.tmp.dir value can be replaced with a path of your own.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>file:/home/www/install/hadoop-2.7.2/tmp</value>
                <description>Hadoop Temp Dir</description>
        </property>

        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:8900</value>
        </property>
</configuration>
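It is worth creating the temp directory up front so the path in the config actually exists; a minimal sketch matching the value above:

$ mkdir -p /home/www/install/hadoop-2.7.2/tmp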

mapred-site.xml.template

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>

        <property>
                <name>mapred.job.tracker</name>
                <value>localhost:8901</value>
        </property>

</configuration>
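Note that Hadoop only reads mapred-site.xml; the .template file itself is ignored, so copy it into place after editing:

$ cp mapred-site.xml.template mapred-site.xml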

hdfs-site.xml
dfs.namenode.name.dir and dfs.datanode.data.dir can live under the hadoop.tmp.dir directory.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>

        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/home/www/install/hadoop-2.7.2/tmp/hdfs/name</value>
        </property>

        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/home/www/install/hadoop-2.7.2/tmp/hdfs/data</value>
        </property>

</configuration>

Initialize HDFS

$ pwd
/home/www/install/hadoop-2.7.2

$ ./bin/hdfs namenode -format

Start

$ pwd
/home/www/install/hadoop-2.7.2

$ sh sbin/start-dfs.sh

Problems that may show up here:
① Repeated password prompts

Configure passwordless SSH login (see the Tip above).
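A quick check, since the start script logs in over ssh to every node (including localhost):

$ ssh localhost    # should open a shell without asking for a password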

② localhost: Error: JAVA_HOME is not set and could not be found

Hard-code JAVA_HOME in hadoop-env.sh:

$ pwd
/home/www/install/hadoop-2.7.2/etc/hadoop

$ echo "export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_91" >> hadoop-env.sh 

Check the processes

$ jps
20724 SecondaryNameNode
20041 NameNode
22444 Jps
20429 DataNode

Starting and stopping Hadoop is straightforward; see the scripts under the sbin directory,
e.g. start-dfs.sh, stop-dfs.sh, start-all.sh, stop-all.sh.
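Besides jps, the NameNode web UI (port 50070 by default in Hadoop 2.x) is a handy liveness check; a minimal sketch:

$ curl -s http://localhost:50070/ | head -n 5   # should print the beginning of the status page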

Multi-node installation

A cluster installation is largely identical to the single-node one.

  • Make sure the machines can reach one another and can ssh to each other without a password
  • The slaves file needs to be configured (an example follows below)

Copy the hadoop tree from the single-node installation, unchanged, to every machine that will join the cluster.
Pick one machine as the Master; it must be able to ssh to the Slave machines without a password. Then adjust the configuration:

$ pwd
/home/www/install/hadoop-2.7.2/etc/hadoop

$ echo "<your slave hostnames>" >> slaves
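The slaves file simply lists one worker hostname per line; a minimal sketch, with slave1 and slave2 as hypothetical hostnames:

$ cat slaves
slave1
slave2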

Start the hadoop cluster with sbin/start-all.sh.

On the Master you should see:

$ jps
9792 SecondaryNameNode
10420 NodeManager
9462 DataNode
9031 NameNode
10185 ResourceManager
27806 Jps

On all the Slave machines you should see:

$ jps
7283 Jps
16564 DataNode
17192 NodeManager
12687 NameNode

Hadoop Hello World

Hadoop ships with plenty of Hello World examples; wordcount is the usual favorite.

Pick any machine in the hadoop cluster:

$ pwd
/home/www/install/hadoop-2.7.2/share/hadoop/mapreduce

$ echo "hello world NO.1" > h1
$ echo "hello world NO.2" > h2
$ hadoop fs -put h1 /input
$ hadoop fs -put h2 /input

$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output
$ hadoop fs -ls /output/
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2016-07-06 13:51 /output/_SUCCESS
-rw-r--r--   1 hadoop supergroup         30 2016-07-06 13:51 /output/part-r-00000

$ hadoop fs -cat /output/part-r-00000
NO.1    1
NO.2    1
hello   2
world   2
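If /input does not exist yet, the first -put may create it as a plain file and the second will then fail; it is safer to create the directory explicitly before uploading:

$ hadoop fs -mkdir /input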

Hadoop supports many more Hello World examples; substitute an example name from the list below into the command above.

  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
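For instance, pi takes two numeric arguments (map count and samples per map) instead of input/output paths:

$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar pi 10 100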

Done.
