
Hadoop 2.6.3 Cluster Deployment

Preparation

Prepare three Ubuntu 14.04 machines:

master  192.168.12.127

slave1  192.168.12.132

slave2  192.168.12.133

In this example, the three VMs were created with OpenStack.


Configure passwordless SSH login among the three nodes

On slave1:

Generate a key pair: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

On slave2:

Generate a key pair: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

On master:

1. Generate a key pair: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
2. Append the public key to the authorized keys: cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
3. scp supermap@192.168.12.132:/home/supermap/.ssh/id_dsa.pub ~/.ssh/id_dsa_132.pub
4. scp supermap@192.168.12.133:/home/supermap/.ssh/id_dsa.pub ~/.ssh/id_dsa_133.pub
5. cat ~/.ssh/id_dsa_132.pub >> ~/.ssh/authorized_keys
6. cat ~/.ssh/id_dsa_133.pub >> ~/.ssh/authorized_keys
7. scp ~/.ssh/authorized_keys supermap@192.168.12.132:/home/supermap/.ssh/authorized_keys
8. scp ~/.ssh/authorized_keys supermap@192.168.12.133:/home/supermap/.ssh/authorized_keys
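The eight steps above can be sketched as a single script run on master. This is only a sketch: it assumes the `supermap` user and the slave IPs used in this guide, and that each slave has already generated its key pair as described.

```shell
# Sketch of steps 1-8, run on master (slave key pairs must already exist).
setup_ssh_keys() {
    local slaves="192.168.12.132 192.168.12.133" ip suffix
    # Steps 1-2: master's own key pair, appended to its authorized_keys
    [ -f "$HOME/.ssh/id_dsa" ] || ssh-keygen -t dsa -P '' -f "$HOME/.ssh/id_dsa"
    cat "$HOME/.ssh/id_dsa.pub" >> "$HOME/.ssh/authorized_keys"
    for ip in $slaves; do
        suffix=${ip##*.}    # last octet of the IP, e.g. 132
        # Steps 3-6: pull each slave's public key and append it
        scp "supermap@${ip}:/home/supermap/.ssh/id_dsa.pub" "$HOME/.ssh/id_dsa_${suffix}.pub"
        cat "$HOME/.ssh/id_dsa_${suffix}.pub" >> "$HOME/.ssh/authorized_keys"
    done
    for ip in $slaves; do
        # Steps 7-8: push the combined authorized_keys back to each slave
        scp "$HOME/.ssh/authorized_keys" "supermap@${ip}:/home/supermap/.ssh/authorized_keys"
    done
}
```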

Test SSH

supermap@master:~$ ssh slave1
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)
 * Documentation: https://help.ubuntu.com/
System information as of Fri Jan 8 15:58:58 CST 2016
System load: 0.01          Processes: 102
Usage of /: 7.2% of 27.20GB  Users logged in: 1
Memory usage: 2%           IP address for eth0: 192.168.12.132
Swap usage: 0%
Graph this data and manage this system at:
https://landscape.canonical.com/
Last login: Fri Jan 8 15:50:22 2016 from 192.168.13.8
supermap@slave1:~$

Install JDK 1.7

apt-get install openjdk-7-jdk

Check the default JDK install directory:
root@slave1:~# ll /usr/lib/jvm/java-7-openjdk-amd64
total 28
drwxr-xr-x 7 root root 4096 Jan 8 11:02 ./
drwxr-xr-x 3 root root 4096 Jan 8 10:58 ../
lrwxrwxrwx 1 root root 22 Nov 19 18:39 ASSEMBLY_EXCEPTION -> jre/ASSEMBLY_EXCEPTION
lrwxrwxrwx 1 root root 22 Nov 19 18:39 THIRD_PARTY_README -> jre/THIRD_PARTY_README
drwxr-xr-x 2 root root 4096 Jan 8 11:02 bin/
lrwxrwxrwx 1 root root 41 Nov 19 18:39 docs -> ../../../share/doc/openjdk-7-jre-headless/
drwxr-xr-x 3 root root 4096 Jan 8 11:02 include/
drwxr-xr-x 5 root root 4096 Jan 8 10:58 jre/
drwxr-xr-x 3 root root 4096 Jan 8 11:02 lib/
drwxr-xr-x 4 root root 4096 Jan 8 10:58 man/
lrwxrwxrwx 1 root root 20 Nov 19 18:39 src.zip -> ../openjdk-7/src.zip
root@slave1:~#
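Rather than hard-coding the path shown above, the JDK home can be derived from the `java` binary on the PATH. A small helper of my own (not part of the original guide's steps):

```shell
# Resolve the JDK home from the java binary on the PATH; on this
# Ubuntu 14.04 box it should print /usr/lib/jvm/java-7-openjdk-amd64.
find_java_home() {
    local bin
    bin=$(readlink -f "$(command -v java)") || return 1
    # Strip the trailing /jre/bin/java (OpenJDK layout) or /bin/java
    bin=${bin%/jre/bin/java}
    bin=${bin%/bin/java}
    echo "$bin"
}
```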

Install Hadoop

Download Hadoop 2.6.3 from http://hadoop.apache.org/releases.html

Unpack it with tar -xvf hadoop-2.6.3.tar.gz, and create the tmp, dfs/name, and dfs/data directories under the Hadoop home directory.

supermap@slave1:~$ pwd
/home/supermap
supermap@slave1:~$ ll hadoop-2.6.3
total 72
drwxr-xr-x 12 supermap supermap 4096 Jan 8 11:33 ./
drwxr-xr-x 5 supermap supermap 4096 Jan 8 15:53 ../
-rw-r--r-- 1 supermap supermap 15429 Dec 18 09:52 LICENSE.txt
-rw-r--r-- 1 supermap supermap 101 Dec 18 09:52 NOTICE.txt
-rw-r--r-- 1 supermap supermap 1366 Dec 18 09:52 README.txt
drwxr-xr-x 2 supermap supermap 4096 Dec 18 09:52 bin/
drwxrwxr-x 4 supermap supermap 4096 Jan 8 11:34 dfs/
drwxrwxr-x 2 supermap supermap 4096 Jan 8 11:33 dsf/
drwxr-xr-x 3 supermap supermap 4096 Dec 18 09:52 etc/
drwxr-xr-x 2 supermap supermap 4096 Dec 18 09:52 include/
drwxr-xr-x 3 supermap supermap 4096 Dec 18 09:52 lib/
drwxr-xr-x 2 supermap supermap 4096 Dec 18 09:52 libexec/
drwxr-xr-x 2 supermap supermap 4096 Dec 18 09:52 sbin/
drwxr-xr-x 4 supermap supermap 4096 Dec 18 09:52 share/
drwxrwxr-x 2 supermap supermap 4096 Jan 8 11:33 tmp/
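The working directories in the listing above can be created in one step. A small helper, with the Hadoop home assumed to be /home/supermap/hadoop-2.6.3 unless overridden:

```shell
# Create the tmp, dfs/name, and dfs/data working directories that the
# configuration files below refer to.
make_hadoop_dirs() {
    local home="${1:-/home/supermap/hadoop-2.6.3}"
    mkdir -p "$home/tmp" "$home/dfs/name" "$home/dfs/data"
}
# Usage: make_hadoop_dirs /home/supermap/hadoop-2.6.3
```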

Configure Hadoop

Configure the runtime environment of the Hadoop daemons.

Edit etc/hadoop/hadoop-env.sh and set JAVA_HOME to the actual install directory: export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
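The edit can also be made non-interactively. A sketch that rewrites the JAVA_HOME export in place, with the JDK path assumed from the install step above:

```shell
# Replace the "export JAVA_HOME=..." line of a hadoop-env.sh file in place.
set_java_home() {
    local env_file="$1" jdk="${2:-/usr/lib/jvm/java-7-openjdk-amd64}"
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${jdk}|" "$env_file"
}
# Usage: set_java_home etc/hadoop/hadoop-env.sh
```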

Configure the runtime parameters of the Hadoop daemons.

Edit core-site.xml to add the core Hadoop settings (HDFS on port 9000; the file:/home/supermap/hadoop-2.6.3/tmp directory created above):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/supermap/hadoop-2.6.3/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.groups</name>
    <value>*</value>
  </property>
</configuration>

Edit hdfs-site.xml to add the HDFS settings (NameNode and DataNode addresses and directory locations):

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/supermap/hadoop-2.6.3/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/supermap/hadoop-2.6.3/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

Edit mapred-site.xml to add the MapReduce settings (run on the YARN framework; JobHistory service and web addresses):

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>

Edit yarn-site.xml to enable YARN:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>

Sync the configuration to the two slaves (the command below copies to slave1; run it again for slave2):

supermap@master:~/hadoop-2.6.3/etc$ scp -r hadoop supermap@slave1:/home/supermap/hadoop-2.6.3/etc
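The same scp, looped over both slaves (hostnames as configured in this guide), run from ~/hadoop-2.6.3/etc on master; a sketch:

```shell
# Copy the local etc/hadoop configuration directory to every slave.
sync_hadoop_conf() {
    local host
    for host in slave1 slave2; do
        scp -r hadoop "supermap@${host}:/home/supermap/hadoop-2.6.3/etc"
    done
}
```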


Add slave1 and slave2 to the cluster

Edit the slaves file:

supermap@master:~/hadoop-2.6.3/etc/hadoop$ cat slaves
master
slave1
slave2

Start the Hadoop cluster

Format a new distributed filesystem:
$ bin/hadoop namenode -format
(in Hadoop 2.x the preferred, equivalent form is bin/hdfs namenode -format)

Start HDFS:

$ sbin/start-dfs.sh

The sbin/start-dfs.sh script consults the slaves file and starts a DataNode daemon on every host listed there.

supermap@master:~/hadoop-2.6.3$ ./sbin/start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/supermap/hadoop-2.6.3/logs/hadoop-supermap-namenode-master.out
slave1: starting datanode, logging to /home/supermap/hadoop-2.6.3/logs/hadoop-supermap-datanode-slave1.out
slave2: starting datanode, logging to /home/supermap/hadoop-2.6.3/logs/hadoop-supermap-datanode-slave2.out
master: starting datanode, logging to /home/supermap/hadoop-2.6.3/logs/hadoop-supermap-datanode-master.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/supermap/hadoop-2.6.3/logs/hadoop-supermap-secondarynamenode-master.out

Start YARN

supermap@master:~/hadoop-2.6.3$ ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/supermap/hadoop-2.6.3/logs/yarn-supermap-resourcemanager-master.out
master: starting nodemanager, logging to /home/supermap/hadoop-2.6.3/logs/yarn-supermap-nodemanager-master.out
slave2: starting nodemanager, logging to /home/supermap/hadoop-2.6.3/logs/yarn-supermap-nodemanager-slave2.out
slave1: starting nodemanager, logging to /home/supermap/hadoop-2.6.3/logs/yarn-supermap-nodemanager-slave1.out
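With both start scripts done, `jps` on each node shows which Java daemons are up. Per this layout, master should run NameNode, SecondaryNameNode, ResourceManager, DataNode, and NodeManager, and each slave a DataNode and NodeManager. A sketch that checks all three nodes over SSH:

```shell
# List the running Hadoop/YARN Java daemons on every node.
check_daemons() {
    local host
    for host in master slave1 slave2; do
        echo "== $host =="
        ssh "$host" jps
    done
}
```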

Check the cluster status

supermap@master:~/hadoop-2.6.3$ ./bin/hdfs dfsadmin -report
Configured Capacity: 58405412864 (54.39 GB)
Present Capacity: 51191595008 (47.68 GB)
DFS Remaining: 51191545856 (47.68 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (2):

Name: 192.168.12.132:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 29202706432 (27.20 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3606908928 (3.36 GB)
DFS Remaining: 25595772928 (23.84 GB)
DFS Used%: 0.00%
DFS Remaining%: 87.65%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Jan 08 16:51:01 CST 2016

Name: 192.168.12.133:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 29202706432 (27.20 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3606908928 (3.36 GB)
DFS Remaining: 25595772928 (23.84 GB)
DFS Used%: 0.00%
DFS Remaining%: 87.65%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Jan 08 16:51:01 CST 2016

View the cluster

Browse to http://master:8088/cluster/nodes to view the cluster nodes.
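The web UIs can also be probed from the shell. A sketch using curl: 8088 is the ResourceManager web port configured above, and 50070 is the default NameNode HTTP port in Hadoop 2.x:

```shell
# Print the HTTP status of the YARN and HDFS web UIs.
check_web_ui() {
    curl -s -o /dev/null -w 'yarn %{http_code}\n' http://master:8088/cluster/nodes
    curl -s -o /dev/null -w 'hdfs %{http_code}\n' http://master:50070/
}
```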

Test the cluster

Use one of the examples bundled with Hadoop as a smoke test.
First create a directory in HDFS:
bin/hdfs dfs -mkdir /input
Copy a file into HDFS:
bin/hdfs dfs -copyFromLocal LICENSE.txt /input
Then, from the share/hadoop/mapreduce directory, run:
../../../bin/hadoop jar hadoop-mapreduce-examples-2.6.3.jar wordcount /input /output
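The smoke test above, gathered into one function run from the Hadoop home directory; a sketch that assumes HDFS and YARN are already up (`-p`, `-f`, and the `-rm` make it safe to re-run):

```shell
# Run the bundled wordcount example end to end and show the first results.
run_wordcount_test() {
    bin/hdfs dfs -mkdir -p /input
    bin/hdfs dfs -copyFromLocal -f LICENSE.txt /input
    bin/hdfs dfs -rm -r -f /output
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar \
        wordcount /input /output
    bin/hadoop fs -cat /output/part-r-00000 | head
}
```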

supermap@master:~/hadoop-2.6.3/share/hadoop/mapreduce$ ../../../bin/hadoop jar hadoop-mapreduce-examples-2.6.3.jar wordcount /input /output
16/01/08 16:56:52 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.12.127:8032
16/01/08 16:56:53 INFO input.FileInputFormat: Total input paths to process : 0
16/01/08 16:56:53 INFO mapreduce.JobSubmitter: number of splits:0
16/01/08 16:56:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1452243042219_0002
16/01/08 16:56:55 INFO impl.YarnClientImpl: Submitted application application_1452243042219_0002
16/01/08 16:56:55 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1452243042219_0002/
16/01/08 16:56:55 INFO mapreduce.Job: Running job: job_1452243042219_0002
16/01/08 16:57:03 INFO mapreduce.Job: Job job_1452243042219_0002 running in uber mode : false
16/01/08 16:57:03 INFO mapreduce.Job: map 0% reduce 0%
16/01/08 16:57:10 INFO mapreduce.Job: map 0% reduce 100%
16/01/08 16:57:11 INFO mapreduce.Job: Job job_1452243042219_0002 completed successfully
16/01/08 16:57:11 INFO mapreduce.Job: Counters: 38
File System Counters
  FILE: Number of bytes read=0
  FILE: Number of bytes written=106221
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=0
  HDFS: Number of bytes written=0
  HDFS: Number of read operations=3
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=2
Job Counters
  Launched reduce tasks=1
  Total time spent by all maps in occupied slots (ms)=0
  Total time spent by all reduces in occupied slots (ms)=3848
  Total time spent by all reduce tasks (ms)=3848
  Total vcore-milliseconds taken by all reduce tasks=3848
  Total megabyte-milliseconds taken by all reduce tasks=3940352
Map-Reduce Framework
  Combine input records=0
  Combine output records=0
  Reduce input groups=0
  Reduce shuffle bytes=0
  Reduce input records=0
  Reduce output records=0
  Spilled Records=0
  Shuffled Maps =0
  Failed Shuffles=0
  Merged Map outputs=0
  GC time elapsed (ms)=24
  CPU time spent (ms)=400
  Physical memory (bytes) snapshot=167751680
  Virtual memory (bytes) snapshot=843304960
  Total committed heap usage (bytes)=110624768
Shuffle Errors
  BAD_ID=0
  CONNECTION=0
  IO_ERROR=0
  WRONG_LENGTH=0
  WRONG_MAP=0
  WRONG_REDUCE=0
File Output Format Counters
  Bytes Written=0


View the results:
bin/hadoop fs -cat /output/*
After the job finishes, you can also review it at http://master:8088


