In this lesson, we will see how to get started with Apache Hadoop by installing it on an Ubuntu machine. Installing and running Apache Hadoop can be tricky, which is why we'll keep this lesson as simple and informative as possible.
In this installation guide, we will make use of Ubuntu 17.10 (GNU/Linux 4.13.0-37-generic x86_64) machine:
Also, if you just want to quickly explore Hadoop, read Cloudera Hadoop VMWare Single Node Environment Setup.
Before we can start installing Hadoop, we need to update Ubuntu with the latest software patches available:
sudo apt-get update && sudo apt-get -y dist-upgrade
Next, we need to install Java on the machine, as Java is the main prerequisite for running Hadoop. Hadoop 3.x requires Java 8 or later, so let's install Java 8 for this lesson:
sudo apt-get -y install openjdk-8-jdk-headless
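As a sanity check after installation, you can extract the major Java version from the somewhat awkward quoted string that `java -version` prints. The snippet below runs on a hard-coded sample line so it works anywhere; the `ver` value is a hypothetical stand-in for the real command output (capture it on your machine with `java -version 2>&1 | head -n1`):

```shell
# Hypothetical sample of the first line printed by `java -version`
ver='openjdk version "1.8.0_292"'

# Strip the optional "1." prefix and everything after the major number
echo "$ver" | sed -E 's/.*"(1\.)?([0-9]+).*/\2/'
```

For Hadoop 3.x, the number printed should be 8 or higher.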
To install Hadoop, create a directory and move into it:
mkdir jd-hadoop && cd jd-hadoop
Now that we’re ready with the basic setup for Hadoop on our Ubuntu machine, let’s download Hadoop installation files so that we can work on its configuration as well:
wget https://mirror.cc.columbia.edu/pub/software/apache/hadoop/common/hadoop-3.0.1/hadoop-3.0.1.tar.gz
We will use Hadoop version 3.0.1 for this lesson. You can find the latest Hadoop releases here. Once the file is downloaded, run the following command to extract it:
tar xvzf hadoop-3.0.1.tar.gz
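Before extracting, it's good practice to verify the archive against the checksum file that Apache publishes alongside each release on the mirrors (the exact checksum filename can vary by release). The snippet below demonstrates the `sha256sum -c` workflow on a locally created dummy file, so it runs anywhere without a download:

```shell
# Work in a throwaway directory
cd "$(mktemp -d)"

# Create a dummy "archive" and its checksum file; these are stand-ins
# for hadoop-3.0.1.tar.gz and its published checksum file
printf 'dummy archive contents\n' > demo.tar.gz
sha256sum demo.tar.gz > demo.tar.gz.sha256

# Verification prints "demo.tar.gz: OK" when the hashes match
sha256sum -c demo.tar.gz.sha256
```

If the check fails on a real download, re-fetch the archive from another mirror before proceeding.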
This might take a few moments, as the archive is large. At this point, Hadoop should be extracted into your current directory:
We will create a separate Hadoop user on our machine to keep HDFS separate from our original file system. First, we can create a user group on our machine:
addgroup hadoop
You should see something like this:
Now we can add a new user to this group:
useradd -G hadoop jdhadoopuser
Note that I am running all commands as the root user. We now have a user called jdhadoopuser in the hadoop group.
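A quick way to confirm that a user ended up in the expected group is `id -nG`, which lists every group a user belongs to. The check below is written against the current user and their primary group so it runs anywhere; on your machine, substitute jdhadoopuser for the user and hadoop for the group:

```shell
# Stand-ins for "jdhadoopuser" and "hadoop" so the check is self-contained
user=$(id -un)
group=$(id -gn)

# grep -w matches the group name as a whole word in the group list
if id -nG "$user" | grep -qw "$group"; then
    echo "membership confirmed"
fi
```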
Finally, we'll give the jdhadoopuser user root access. To do this, open the /etc/sudoers file with this command:
sudo visudo
Now, enter this as the last line in the file:
jdhadoopuser ALL=(ALL) ALL
The file should now look like this:
Hadoop on a single node means that Hadoop runs as a single Java process. This mode is usually used only in debugging environments, not for production. In this mode, we can run simple MapReduce programs that process smaller amounts of data.
Rename the extracted Hadoop directory to just hadoop:
mv /root/jd-hadoop/hadoop-3.0.1 /root/jd-hadoop/hadoop
Now, give ownership of this directory to jdhadoopuser:
chown -R jdhadoopuser:hadoop /root/jd-hadoop/hadoop
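You can confirm the ownership change with `stat`. The demonstration below uses a throwaway directory owned by the current user, since jdhadoopuser only exists once you have followed the earlier steps; on your machine, point `stat` at /root/jd-hadoop/hadoop instead:

```shell
# Create a throwaway directory and file to inspect
d=$(mktemp -d)
touch "$d/file"

# stat's %U format prints the owning user (%G would print the group)
owner=$(stat -c '%U' "$d/file")
[ "$owner" = "$(id -un)" ] && echo "owner verified"
```

Running `stat -c '%U:%G'` against the real Hadoop directory should print jdhadoopuser:hadoop after the chown above.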
A better location for Hadoop is the /usr/local/ directory, so let's move it there:
mv hadoop /usr/local/
cd /usr/local/
Now, edit the .bashrc file to add Hadoop and Java to the path, using this command:
vi ~/.bashrc
Add these lines to the end of the .bashrc file:
# Configure Hadoop and Java Home
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export PATH=$PATH:$HADOOP_HOME/bin
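To see what the PATH export actually does, here is the same composition in isolation, run in a subshell with a minimal hard-coded starting PATH so the result is predictable:

```shell
(
  # Throwaway values: start from a minimal PATH so the appended
  # Hadoop bin directory is easy to spot in the output
  HADOOP_HOME=/usr/local/hadoop
  PATH=/usr/bin
  PATH=$PATH:$HADOOP_HOME/bin
  echo "$PATH"
)
```

Remember that .bashrc changes only take effect in new shells; run `source ~/.bashrc` to apply them to the current session.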
Now it is time to tell Hadoop where Java lives as well. We can do this by providing the path in the hadoop-env.sh file. The location of this file can differ between Hadoop installations. To find it, run the following command from just outside the hadoop directory:
find hadoop/ -name hadoop-env.sh
When I visit the directory shown, I can see the needed file present there:
vi hadoop-env.sh
On the last line, enter the following and save it:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
We can now test the Hadoop installation by executing a sample application that ships with Hadoop: a word-count example JAR. Just execute the following command:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.1.jar wordcount /usr/local/hadoop/README.txt /root/jd-hadoop/Output
Once you execute the command, you will see the file part-r-00000 in the Output directory:
cat part-r-00000
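The part-r-00000 file contains one word per line, followed by a tab and that word's count. If you want to see the same shape without running Hadoop at all, standard shell tools can reproduce it on a tiny hypothetical input:

```shell
# Count words in a small sample sentence, emitting "word<TAB>count"
# lines in sorted order, like Hadoop's wordcount output
printf 'hadoop is fast hadoop is scalable\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{print $2 "\t" $1}'
```

This prints each distinct word with its frequency (hadoop and is appear twice, the rest once), which is exactly the format you'll find in part-r-00000.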
Since this example ran successfully, Hadoop has been installed on your system!
In this lesson, we saw how to install Apache Hadoop on an Ubuntu server and start executing sample programs on it. Read more Big Data posts to gain deeper knowledge of available Big Data tools and processing frameworks.
Translated from: https://www.journaldev.com/20358/install-hadoop-on-ubuntu