当前位置:   article > 正文

五分钟搭建本地大数据集群

五分钟搭建本地大数据集群

引言

刚接触大数据以及部分接触大数据多年的伙伴可能从来没有自己搭建过一套属于自己的大数据集群,今天就花点时间聊聊怎么快速搭建一套属于自己、且可用于操作、调试的大数据集群

正文

本次搭建的组件都有以下服务以及对应的版本

  • hadoop(3.2.4)
  • zookeeper(3.9.1)
  • kafka(2.13-3.6.1)

组件下载地址

上述组件都是apache旗下的,通过此地址找到对应的版本下载使用即可 https://archive.apache.org/dist/hadoop/common/,但如果下载速度慢的话可以考虑通过这个地址进行加速下载 https://mirrors.tuna.tsinghua.edu.cn/apache/,后面这个地址仅用于学习,请勿用于商用

hadoop

hadoop是大数据最基本的底座,在将安装包解压后修改下 ./etc/hadoop 目录下重要的四个配置内容

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- 指定 namenode 的通信地址 -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <!-- 指定hadoop运行时产生文件的存储路径 -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/Users/lin/dev/bigdata/hadoop-3.2.4/temp</value>
    </property>
</configuration>
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>        
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/Users/lin/dev/bigdata/hadoop-3.2.4/data/namenode</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/Users/lin/dev/bigdata/hadoop-3.2.4/data/datanode</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>localhost:9001</value>
        </property>
        <property>
                <name>dfs.webhdfs.enabled</name>
                <value>true</value>
        </property>
        <property>
                <name>dfs.http.address</name>
                <value>0.0.0.0:50070</value>
        </property>
</configuration>
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47

yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
                <final>true</final>
                <description>The runtime framework for executing MapReduce jobs</description>
        </property>
        <property>
                <name>yarn.app.mapreduce.am.env</name>
                <value>HADOOP_MAPRED_HOME=/Users/lin/dev/bigdata/hadoop-3.2.4</value>
        </property>
        <property>
                <name>mapreduce.map.env</name>
                <value>HADOOP_MAPRED_HOME=/Users/lin/dev/bigdata/hadoop-3.2.4</value>
        </property>
        <property>
                <name>mapreduce.reduce.env</name>
                <value>HADOOP_MAPRED_HOME=/Users/lin/dev/bigdata/hadoop-3.2.4</value>
        </property>
</configuration>
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37

改完上面四个配置后,通过./sbin/start-all.sh指令启动集群,通过访问地址 http://localhost:50070 可看到hdfs服务已经正常启动
在这里插入图片描述

接下来简单验证下服务是否正常工作

展示文件目录
在这里插入图片描述

创建一个自定义目录data
在这里插入图片描述

上传一个本地文件到hadoop集群
在这里插入图片描述

通过上述演示已完整的部署本地的Hadoop服务

zookeeper

解压后通过指令bin/zkServer.sh start启动服务即可,通过指令查询可看到已经启动服务
在这里插入图片描述

接下来简单进行验证下,首先通过指令bin/zkCli.sh进入客户端
在这里插入图片描述

kafka

解压kafka安装包后,通过指令nohup bin/kafka-server-start.sh config/server.properties 2>&1 &进行服务的后台启动。通过linux指令可以看到kafka服务已经正常启动
在这里插入图片描述

接下来进行简单验证下

  1. 创建Topic
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic TestKafkaTopic1
  • 1

在这里插入图片描述

  1. 消费Topic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TestKafkaTopic1 --from-beginning
  • 1
  1. 写Topic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic  TestKafkaTopic1
  • 1

在这里插入图片描述

  1. 查看消费情况
    在这里插入图片描述

通过上述几步操作能看到我们的kafka服务也正常工作了

小结

以上就是搭建一个简单的本地调试环境的流程,最好是都能手动操作一次,对这几个基础服务都有一定的了解

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/木道寻08/article/detail/776646
推荐阅读
相关标签
  

闽ICP备14008679号