Background: a Hadoop cluster is already installed on the three servers hadoop100, hadoop101, and hadoop102.
tar -zxvf spark-3.2.4-bin-hadoop3.2-scala2.13.tgz -C /opt/spark
mv /opt/spark/spark-3.2.4-bin-hadoop3.2-scala2.13 /opt/spark/spark-3.2.4
Append the following to the end of the file (conf/spark-defaults.conf):
spark.master spark://hadoop100:7077
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 1g
spark.executor.memory 1g
Append the following to the end of the file (conf/spark-env.sh):
export JAVA_HOME=/opt/java/jdk8
export HADOOP_HOME=/opt/hadoop/hadoop-3.2.2
export HADOOP_CONF_DIR=/opt/hadoop/hadoop-3.2.2/etc/hadoop
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/hadoop-3.2.2/bin/hadoop classpath)
export SPARK_MASTER_HOST=hadoop100
export SPARK_MASTER_PORT=7077
export SPARK_HOME=/opt/spark/spark-3.2.4
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
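For start-all.sh to bring up the Worker processes on hadoop101 and hadoop102 (the success indicator used in the startup step below), those hosts also need to be listed in conf/workers (named conf/slaves in older Spark releases), one host per line:
hadoop101
hadoop102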
scp -r /opt/spark hadoop@hadoop101:/opt
scp -r /opt/spark hadoop@hadoop102:/opt
Also add the environment variables on hadoop101 and hadoop102, as sketched below.
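A minimal sketch of the entries to append to /etc/profile (or ~/.bashrc) on hadoop101 and hadoop102, assuming Spark sits in the same path as on hadoop100; run source /etc/profile afterwards:
export SPARK_HOME=/opt/spark/spark-3.2.4
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin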
1.4.2 Start the services
Start the Hadoop cluster first.
Then go to Spark's sbin directory:
./start-all.sh
Use jps to check whether the startup succeeded.
Success indicator: a Master process appears on hadoop100, and Worker processes appear on hadoop101 and hadoop102.
Check in a browser: open hadoop100:8080 (the Spark Master web UI).
Then go to Spark's bin directory:
spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi /opt/spark/spark-3.2.4/examples/jars/spark-examples_2.13-3.2.4.jar
Output similar to the following appears (the screenshot is from the web).
(The screenshot is borrowed because the development-environment settings added later prevent jobs from starting on these servers; the JVM only listens for a debugger.)
At this point the Spark cluster setup is complete.
File > Settings > Plugins > Marketplace: search for Scala and install it.
Official site: All Available Versions | The Scala Programming Language
Download a version that matches (or is lower than) the Scala version Spark was built with; 2.13.5 is used here.
Then extract the downloaded archive to a suitable location.
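To double-check that the downloaded version matches, the launcher in the extracted distribution's bin directory can report it (the path below is only an example):
D:\scala-2.13.5\bin\scala.bat -version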
pom.xml configuration
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>test</groupId>
    <artifactId>SparkPi</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <spark.version>3.2.4</spark.version>
        <scala.version>2.13</scala.version>
    </properties>

    <repositories>
        <repository>
            <id>nexus-aliyun</id>
            <name>Nexus aliyun</name>
            <url>http://maven.aliyun.com/nexus/content/groups/public</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>

            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.13.1</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.19</version>
                <configuration>
                    <skip>true</skip>
                </configuration>
            </plugin>

        </plugins>
    </build>

</project>
Create a scala directory and mark it as Sources Root.
File > Project Structure > Libraries > add the Scala directory, i.e. the one extracted earlier.
With that, the basic environment is ready; next is connecting to the Spark cluster.
Add the following to the end of spark-env.sh (on every machine in the cluster):
export SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
# address: the JVM listens for debugger connections on port 5005; any free, non-conflicting port will do.
# server: y means the launched JVM is the one being debugged; n means it acts as the debugger.
# suspend: y means the JVM pauses until a debugger attaches before continuing; n means it does not wait.
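With suspend=y every JVM launched through spark-submit blocks on port 5005 until a debugger attaches, which is why, as noted earlier, jobs on the servers only listen instead of running. To attach, create a remote-debug run configuration in IDEA ("Remote JVM Debug", host = the machine running the JVM, port = 5005). If the debug agent should not block normal runs, suspend=n is an alternative:
export SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"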
2.2.3 Configure the Hadoop environment on Windows
Download address: https://github.com/steveloughran/winutils
Hadoop 3.0.0 is used here.
Configure the environment variables, for example as sketched below.
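A minimal sketch of the Windows variables, assuming the repository's hadoop-3.0.0 folder was copied to D:\hadoop-3.0.0 (the path is only an example); set them in the system environment-variable dialog and restart the terminal/IDEA afterwards:
HADOOP_HOME = D:\hadoop-3.0.0
Path        = <existing entries>;%HADOOP_HOME%\bin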
Configure the DLL: copy hadoop.dll (from the winutils bin directory) to C:\Windows\System32.
Some Scala basics are assumed here.
Create a Test object under the scala directory.
Configure the jar artifact (the jar that setJars points to in the code below).
Test code (if Windows has no hostname-to-IP mapping configured, use the IP address directly):
import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Test").setMaster("spark://hadoop100:7077")
      // jar containing this class, so the executors can load it
      .setJars(Seq("F:\\IntelliJ_IDEA\\Spark\\out\\artifacts\\SparkPi_jar\\SparkPi.jar"))
      // address the executors use to reach the driver; use this machine's LAN IP
      // if "localhost" is not reachable from the cluster
      .set("spark.driver.host", "localhost")
    val sc: SparkContext = new SparkContext(conf)

    // each line of user.txt: id, userId, date, time, device, click source
    val data_0 = sc.textFile("hdfs://hadoop100:9000/spark/user.txt")

    // aggregate by click source (column index 5)
    val dataGroupBy5 = data_0.map(line => {
      val word = line.split(" ")
      (word(5), 1)
    }).reduceByKey(_ + _)
    println("按点击来源聚合:")
    dataGroupBy5.collect().foreach(println)
    println("==================")

    // aggregate by (device, click source) (column indexes 4 and 5)
    val dataGroupBy45 = data_0.map(line => {
      val word = line.split(" ")
      ((word(4), word(5)), 1)
    }).reduceByKey(_ + _)
    println("按设备和点击来源聚合:")
    dataGroupBy45.collect().foreach(println)
    println("==================")

    sc.stop()
  }
}
Contents of user.txt:
1 1001 2023-07-20 12:00:00 手机 首页
2 1002 2023-07-20 13:00:00 电脑 搜索
3 1003 2023-07-20 14:00:00 平板 商品详情
4 1004 2023-07-21 12:00:00 手机 首页
5 1005 2023-07-21 13:00:00 电脑 搜索
6 1006 2023-07-21 14:00:00 平板 商品详情
7 1007 2023-07-22 12:00:00 手机 首页
8 1008 2023-07-22 13:00:00 电脑 搜索
9 1009 2023-07-22 14:00:00 平板 商品详情
10 1010 2023-07-23 12:00:00 手机 首页
11 1011 2023-07-23 13:00:00 电脑 搜索
12 1012 2023-07-23 14:00:00 平板 商品详情
13 1013 2023-07-24 12:00:00 手机 首页
14 1014 2023-07-24 13:00:00 电脑 搜索
15 1015 2023-07-24 14:00:00 平板 商品详情
16 1016 2023-07-25 12:00:00 手机 首页
17 1017 2023-07-25 13:00:00 电脑 搜索
18 1018 2023-07-25 14:00:00 平板 商品详情
19 1019 2023-07-26 12:00:00 手机 首页
20 1020 2023-07-26 13:00:00 电脑 搜索
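The test code above reads hdfs://hadoop100:9000/spark/user.txt, so user.txt has to be uploaded to HDFS beforehand; for example, on one of the cluster nodes with the file in the current directory:
hdfs dfs -mkdir -p /spark
hdfs dfs -put user.txt /spark/user.txt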
Output:
At this point the development and debugging environment is set up.