// A typical Spark-Hive entry point; `warehouseLocation` is assumed to be
// defined elsewhere, e.g. val warehouseLocation = "/user/hive/warehouse"
private val spark: SparkSession = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .appName("spark-hive connection")
  .enableHiveSupport()
  .getOrCreate()
The problem usually surfaces at the
enableHiveSupport()
call, so let's read through the error messages. Since the code we are running is quite simple, the root causes of this family of related errors can almost always be classified as either package (jar) dependency problems or IDE configuration problems.
Below, we work through the fixes step by step, starting from package dependencies and moving down to IDE settings.
ERROR: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.HiveSessionState
If you see this error, check whether the spark-hive connector package has been imported correctly.
Likewise, if you see this error
ERROR: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
a package is also missing, in this case the JDBC connector package, which Hive requires in order to reach the metastore stored in MySQL.
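Both errors boil down to a class missing from the runtime classpath. As a quick sanity check, you can probe for the two classes named in the errors before building the SparkSession. This is a minimal sketch (the object name `DepCheck` is made up for illustration; the class names are the ones from the two errors above):

```scala
object DepCheck {
  // Returns true if the given class can be loaded from the current classpath.
  def onClasspath(className: String): Boolean =
    try { Class.forName(className); true }
    catch { case _: ClassNotFoundException => false }

  def main(args: Array[String]): Unit = {
    // The two classes named in the errors above: spark-hive and the
    // MySQL JDBC connector, respectively.
    println(s"spark-hive on classpath: ${onClasspath("org.apache.spark.sql.hive.HiveSessionState")}")
    println(s"mysql-connector on classpath: ${onClasspath("com.mysql.jdbc.Driver")}")
  }
}
```

If either line prints `false`, the corresponding dependency is the one to fix first.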
First, check whether the relevant packages are actually under Maven's dependency management.
Import the spark-hive_2.11 package with the following coordinates:
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.11</artifactId>
  <version>2.1.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-1.2-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
If the problem is the missing JDBC connector, use these coordinates instead:
<!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
<dependency>
  <groupId>mysql</groupId>
  <artifactId>mysql-connector-java</artifactId>
  <version>5.1.40</version>
</dependency>
Another possibility is that the Spark packages in the project do not match the project's Scala version.
Note that the Spark packages spark-core, spark-sql, and spark-hive must all match the Scala version of the current project. More generally, every Scala-related artifact, i.e. every coordinate whose artifactId carries a Scala version suffix, must match the project's Scala version.
You can check the project's Scala version via
File -> Project Structure -> Global Libraries
This leads to the next issue.
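Besides looking in the IDE, you can also print the Scala version your code actually runs against at runtime, which catches mismatches the IDE display can hide. A quick check using only the standard library:

```scala
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // versionNumberString yields e.g. "2.11.8"; its first two components
    // ("2.11") must match the suffix of spark-hive_2.11 and friends.
    println(scala.util.Properties.versionNumberString)
  }
}
```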
You may have a pre-downloaded Scala distribution saved on your machine. In this project, however, prefer the Scala packages managed by the Maven repository.
First, add the Scala coordinates to the pom file, then let Maven download and link the corresponding jars:
<!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.11.8</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.scala-lang/scala-reflect -->
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-reflect</artifactId>
  <version>2.11.8</version>
</dependency>
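To keep all of these versions consistent in one place, a common approach (a sketch, not from the original post) is to declare them as Maven properties and reference the properties in every Scala-suffixed coordinate:

```xml
<properties>
  <scala.version>2.11.8</scala.version>
  <scala.binary.version>2.11</scala.binary.version>
  <spark.version>2.1.0</spark.version>
</properties>

<!-- then, in <dependencies>: -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
```

This way a Scala upgrade is a one-line change and the suffix can never drift out of sync with scala-library.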
Next, go to File -> Project Structure -> Global Libraries and, when choosing the external Scala dependency, pick the Maven-managed Scala package (i.e. the one whose Location is Maven). This avoids IDEA failing to find individual jars when packaging.
If the problem is still not resolved at this point, check whether Maven actually downloaded the jars listed in the pom file to the expected location. Open
Preferences -> Build, Execution, Deployment -> Build Tools -> Maven
find the path of the Maven local repository, then navigate to the directory of the jar in question based on its coordinates.
You may find that the jar is missing from that folder, which means the Maven download failed. In that case, download the jar manually and place it in that location.
For the errors above, this may mean re-downloading the spark-hive_2.11 jar; its download page is here.
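To confirm which jar a class is actually being loaded from at runtime, and hence whether the copy in the local repository is the one in use, you can ask the classloader. A small sketch (the object name `JarLocator` is made up for illustration):

```scala
object JarLocator {
  // Returns the location (jar file or classes directory) a class was loaded
  // from, or None if the class is absent or comes from the bootstrap loader.
  def sourceOf(className: String): Option[String] =
    try {
      val cls = Class.forName(className)
      Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation.toString)
    } catch { case _: ClassNotFoundException => None }

  def main(args: Array[String]): Unit =
    // With the dependency resolved correctly, this prints a path ending in
    // something like spark-hive_2.11-2.1.0.jar inside the local repository.
    println(sourceOf("org.apache.spark.sql.hive.HiveSessionState")
      .getOrElse("not on classpath"))
}
```

If the printed path points somewhere other than the Maven local repository, the IDE is resolving the class from a stale copy.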
If the problem still persists, try clearing the caches and restarting the IDE via
File -> Invalidate Caches / Restart
Restarting this way forces IDEA to rebuild its indexes and re-resolve all external packages.
Initialization errors generally do not involve the configuration files, but if you are worried that the configuration is at fault, copy the following three configured files into the project's resources folder:
core-site.xml
hdfs-site.xml
hive-site.xml
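Once copied into src/main/resources, these files end up at the root of the classpath, which you can verify at runtime. A minimal sketch (the object name `ConfCheck` is made up for illustration):

```scala
object ConfCheck {
  def main(args: Array[String]): Unit = {
    // getResource returns null when the file is not on the classpath.
    for (f <- Seq("core-site.xml", "hdfs-site.xml", "hive-site.xml")) {
      val url = getClass.getResource("/" + f)
      println(s"$f -> ${if (url == null) "MISSING from classpath" else url}")
    }
  }
}
```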
For further reference, the three files plus spark-env.sh are provided below
(Hadoop in pseudo-distributed mode, Spark deployed standalone on a single machine).
core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/Users/collinsliu/hadoop-2.7.1/temp</value>
    <description>Abase for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/Users/collinsliu/hadoop-2.7.1/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/Users/collinsliu/hadoop-2.7.1/tmp/dfs/data</value>
  </property>
</configuration>
hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- JDBC connection URL -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?useSSL=false</value>
  </property>
  <!-- JDBC driver -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- JDBC username -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <!-- JDBC password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
  <!-- Disable metastore schema version verification -->
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <!-- Metastore event DB notification authorization -->
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>hdfs://localhost:9000/hive/opt</value>
  </property>
  <!-- Hive's default working directory on HDFS -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- Host for hiveserver2 connections -->
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>localhost</value>
  </property>
  <!-- Port for hiveserver2 connections -->
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <!-- Run eligible tasks in local mode for better performance -->
  <property>
    <name>hive.exec.mode.local.auto</name>
    <value>true</value>
  </property>
</configuration>
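Since the second error above ultimately means Hive could not open a connection to the metastore database, it can help to verify that MySQL is reachable with the exact URL and credentials from hive-site.xml. A hedged sketch (the object name `MetastoreProbe` is made up; running it against MySQL requires mysql-connector-java on the classpath):

```scala
import java.sql.{DriverManager, SQLException}

object MetastoreProbe {
  // Attempts a plain JDBC connection; returns Left(message) on failure.
  def probe(url: String, user: String, password: String): Either[String, Unit] =
    try {
      val conn = DriverManager.getConnection(url, user, password)
      try Right(()) finally conn.close()
    } catch { case e: SQLException => Left(e.getMessage) }

  def main(args: Array[String]): Unit =
    // Values taken from the hive-site.xml above.
    println(probe("jdbc:mysql://localhost:3306/hive?useSSL=false", "root", "password"))
}
```

If this fails while the configuration looks right, the problem is on the MySQL side rather than in Spark or the IDE.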
Adjust the paths above to match your own directory layout. Finally, spark-env.sh:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_311.jdk/Contents/Home
export SCALA_HOME=/Users/collinsliu/scala-2.11.8
export HIVE_CONF_DIR=/Users/collinsliu/hive-1.2.2/conf
export HADOOP_CONF_DIR=/Users/collinsliu/hadoop-2.7.1/etc/hadoop
# Let Spark pick up the Hadoop jars from the local Hadoop installation
export SPARK_DIST_CLASSPATH=$(/Users/collinsliu/hadoop-2.7.1/bin/hadoop classpath)
export CLASSPATH=$CLASSPATH:/Users/collinsliu/hive-1.2.2/lib
# Make the MySQL JDBC connector visible to Spark for the Hive metastore
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/Users/collinsliu/hive-1.2.2/lib/mysql-connector-java-5.1.40.jar