Take host “hadoop2” as the VM setup example
Select the ISO image (the installation image stored on the host machine)
Select the guest OS type (Debian 10)
Select Install CentOS 7
Select the startup disk
Select GNOME GUI (install the desktop environment)
Select the timezone
Enable the network and set the host name
Create the user hadoop
Begin the installation
During the installation
When the installation finishes, click Reboot
Accept the license
Complete the CentOS installation
Log in to the GUI as user hadoop
Enable date & time auto-update so the clocks of the nodes stay in sync
Use FinalShell to SSH into the three virtual machines
Log in as the hadoop user
Edit the network configuration file on all 3 machines
# Edit ifcfg-{network interface}
sudo vim /etc/sysconfig/network-scripts/ifcfg-ens160
...
BOOTPROTO=static
...
# append (set IPADDR to this host's own address; the gateway/DNS values are shared)
IPADDR=192.168.57.134
GATEWAY=192.168.57.2
NETMASK=255.255.255.0
DNS1=192.168.57.2
DNS2=114.114.114.114
PREFIX=24
# restart the network service to apply the changes
sudo /etc/init.d/network restart
# verify the new address
ifconfig
sudo vim /etc/hosts
# append
192.168.57.134 hadoop1
192.168.57.135 hadoop2
192.168.57.136 hadoop3
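A quick sanity check (assuming all three VMs are up and have had their hosts files updated) is to ping the other machines by name:
ping -c 1 hadoop2
ping -c 1 hadoop3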
2. Generate a key pair for user “hadoop” on all 3 hosts
su hadoop
ssh-keygen -t rsa
ssh-copy-id hadoop@hadoop1
ssh-copy-id hadoop@hadoop2
ssh-copy-id hadoop@hadoop3
# check the public keys this host has collected from the other machines
cat ~/.ssh/authorized_keys
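If passwordless login still asks for a password in the next step, a common cause is overly open permissions on ~/.ssh; the usual fix (not part of the original steps) is:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys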
4. Test passwordless SSH between the hosts
# from hadoop1
ssh hadoop@hadoop2
# from hadoop3
ssh hadoop@hadoop1
# on hadoop1, also set up passwordless SSH for the root user
su -
ssh-keygen -t rsa
ssh-copy-id root@hadoop1
ssh-copy-id root@hadoop2
ssh-copy-id root@hadoop3
# disable the firewall on all 3 hosts
sudo systemctl stop firewalld
sudo systemctl disable firewalld
# create the install directories (also on all 3 hosts) and give them to the hadoop user
sudo mkdir /opt/modules
sudo mkdir /opt/software
sudo chown hadoop:hadoop /opt/modules
sudo chown hadoop:hadoop /opt/software
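To confirm both directories are now owned by the hadoop user:
ls -ld /opt/modules /opt/software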
3. Upload both packages to /opt/software on hadoop1 via the FinalShell GUI (they will be copied to the other hosts later)
su hadoop
tar -zxvf /opt/software/hadoop-3.4.0-aarch64.tar.gz -C /opt/modules
tar -zxvf /opt/software/jdk-8u411-linux-aarch64.tar.gz -C /opt/modules
cd /opt/modules
mv jdk1.8.0_411 jdk1.8.0
ls -l
su -
# register this JDK with the alternatives system
update-alternatives --install /usr/bin/java java /opt/modules/jdk1.8.0/bin/java 1
update-alternatives --install /usr/bin/javac javac /opt/modules/jdk1.8.0/bin/javac 1
# select this JDK as the default
update-alternatives --config java
update-alternatives --config javac
# check default java
ls -l /etc/alternatives/java
ls -l /etc/alternatives/javac
java -version
javac -version
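If the alternatives switch worked, the first line of java -version should report the installed JDK, roughly as follows (exact build details may differ):
java version "1.8.0_411"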
6. Add the JDK and Hadoop to $PATH (global environment variables)
# root user
su -
vim /etc/profile
# append at the end
export JAVA_HOME=/opt/modules/jdk1.8.0
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/modules/hadoop-3.4.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
# test hadoop
hadoop version
su hadoop
cd /opt/modules/hadoop-3.4.0/etc/hadoop
vi core-site.xml
# or use another editor such as VS Code
# core-site.xml
<configuration>
  <!-- HDFS internal (RPC) address and port -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:9000</value>
  </property>
  <!-- base directory for Hadoop data/metadata -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data</value>
  </property>
</configuration>
# hadoop-env.sh
export JAVA_HOME=/opt/modules/jdk1.8.0
# The language environment in which Hadoop runs. Use the English
# environment to ensure that logs are printed as expected.
export LANG=en_US.UTF-8
# Location of Hadoop. By default, Hadoop will attempt to determine
# this location based upon its execution path.
# export HADOOP_HOME=
export HADOOP_HOME=/opt/modules/hadoop-3.4.0
# hdfs-site.xml (HDFS configuration)
<configuration>
  <!-- NameNode web UI address -->
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop1:9870</value>
  </property>
  <!-- SecondaryNameNode web UI address -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop2:9868</value>
  </property>
</configuration>
# mapred-site.xml (MapReduce configuration)
<configuration>
  <!-- run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
</configuration>
# yarn-site.xml (YARN configuration)
<configuration>
  <!-- run the ResourceManager on hadoop1 -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1</value>
  </property>
  <!-- enable the shuffle service -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- class the NodeManager loads for the shuffle service -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <!-- enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- log aggregation server address -->
  <property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop1:19888/jobhistory/logs</value>
  </property>
  <!-- keep aggregated logs for 7 days -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
# workers file: list every DataNode host, one per line, with no extra whitespace
hadoop1
hadoop2
hadoop3
# on hadoop1
scp -r /opt/modules/* hadoop@hadoop2:/opt/modules
scp -r /opt/modules/* hadoop@hadoop3:/opt/modules
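A quick check, run from hadoop1 as user hadoop, that the copy landed on the other two hosts:
ssh hadoop@hadoop2 'ls /opt/modules'
ssh hadoop@hadoop3 'ls /opt/modules'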
# on hadoop1
scp /etc/profile root@hadoop2:/etc
# on hadoop2
source /etc/profile
# on hadoop1
scp /etc/profile root@hadoop3:/etc
# on hadoop3
source /etc/profile
# change the default Java on hadoop2 and hadoop3 (optional)
su -
# register this JDK with the alternatives system
update-alternatives --install /usr/bin/java java /opt/modules/jdk1.8.0/bin/java 1
update-alternatives --install /usr/bin/javac javac /opt/modules/jdk1.8.0/bin/javac 1
# select this JDK as the default
update-alternatives --config java
update-alternatives --config javac
# on hadoop1
rsync -avz /opt/modules/hadoop-3.4.0/etc/hadoop/ hadoop@hadoop2:/opt/modules/hadoop-3.4.0/etc/hadoop/
rsync -avz /opt/modules/hadoop-3.4.0/etc/hadoop/ hadoop@hadoop3:/opt/modules/hadoop-3.4.0/etc/hadoop/
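To confirm the configuration directories are now identical, an rsync dry run (-n) should list no files left to transfer, for example:
rsync -avzn /opt/modules/hadoop-3.4.0/etc/hadoop/ hadoop@hadoop2:/opt/modules/hadoop-3.4.0/etc/hadoop/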
# on hadoop1: format the NameNode (run this only once)
hdfs namenode -format
You should see that the NameNode metadata directory has been created; check it:
# on hadoop1
cd /home/hadoop/data/dfs/name/current
cat VERSION
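The VERSION file records the NameNode's identity; expect fields such as namespaceID, clusterID, cTime, storageType=NAME_NODE, blockpoolID and layoutVersion (the actual values are generated during formatting and will differ per cluster).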
# on hadoop1
start-all.sh
On hadoop1, check the running processes with jps
On hadoop2, check the running processes with jps
On hadoop3, check the running processes with jps
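Given the configuration above (NameNode and ResourceManager on hadoop1, SecondaryNameNode on hadoop2, DataNode and NodeManager on every worker), the expected jps output is roughly:
# hadoop1: NameNode, DataNode, ResourceManager, NodeManager, Jps
# hadoop2: SecondaryNameNode, DataNode, NodeManager, Jps
# hadoop3: DataNode, NodeManager, Jps
jps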
# on hadoop1 (yarn.log.server.url points to hadoop1:19888)
mapred --daemon start historyserver
4. Check the web UIs of the services on the virtual machines
HDFS: http://hadoop1:9870
YARN: http://hadoop1:8088
Note: there should be 3 active nodes; if only one shows up, check that the firewall is disabled on every host
MapReduce History Server: http://hadoop1:19888
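The number of live DataNodes can also be checked from the command line; the report should show 3 live datanodes:
hdfs dfsadmin -report | grep -i "live datanodes"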
# if browsing from another machine (e.g. the host), add the same host name entries to its hosts file
sudo vi /etc/hosts
192.168.57.134 hadoop1
192.168.57.135 hadoop2
192.168.57.136 hadoop3
hdfs dfs -mkdir /input
vim ~/words.txt
hello hadoop
hello world
hello hadoop
mapreduce
hdfs dfs -put ~/words.txt /input
hdfs dfs -ls /input
3. Run the example program
# /output must not already exist
hadoop jar /opt/modules/hadoop-3.4.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar wordcount /input /output
hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000
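For the words.txt created above the counts work out to hello=3, hadoop=2, world=1, mapreduce=1, so the output should look like this (tab-separated):
hadoop	2
hello	3
mapreduce	1
world	1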
5. Check the YARN web UI and the history server
# on hadoop1: stop the history server and the whole cluster
mapred --daemon stop historyserver && stop-all.sh
# power off each VM before taking snapshots
poweroff
3. Take a snapshot of each virtual machine
For writing Python MapReduce programs with Hadoop Streaming, see: https://blog.csdn.net/Jacob12138/article/details/138908010