Below are the errors I hit, with code for reference. (The full project code is included there.)
1. The path name
ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, qianfeng01, executor 1): java.lang.NoClassDefFoundError: Could not initialize class com.hankcs.hanlp.dictionary.CoreDictionary
See the "Could not initialize class" marker above? The main problem is that the adapter class cannot be found, and that is a path problem!
Below is the hanlp.properties file:
root=/common/nlp/

# Core dictionary path
#CoreDictionaryPath=data/dictionary/CoreNatureDictionary.txt
# Bigram dictionary path
#BiGramDictionaryPath=data/dictionary/CoreNatureDictionary.ngram.txt
# Custom dictionary paths, separated by ';'. A leading space means "same directory as the previous entry"; the form "filename POS" makes that POS the dictionary's default part of speech. Priority decreases down the list.
# All dictionaries are UTF-8, one word per line, in the format: word posA freqA posB freqB ... If the POS is omitted, the dictionary's default POS is used.
CustomDictionaryPath=data/dictionary/custom/CustomDictionary.txt; 现代汉语补充词库.txt; 全国地名大全.txt ns; 人名词典.txt; 机构名词典.txt; 上海地名.txt ns;data/dictionary/person/nrf.txt nrf;
#CustomDictionaryPath=data/dictionary/custom/user-profile-dict.txt;
# Stop-word dictionary path
CoreStopWordDictionaryPath=data/dictionary/stopwords.txt
IOAdapter=com.qf.bigdata.profile.nlp.hanlp.HadoopFileIoAdapter
At first I had written the adapter's class path wrong; copy the adapter's fully qualified class name exactly:
IOAdapter=com.qf.bigdata.profile.nlp.hanlp.HadoopFileIoAdapter
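For reference, here is a minimal sketch of what such an adapter can look like. This is written against HanLP's IIOAdapter interface and the Hadoop FileSystem API, not copied from the project, so treat the details as an assumption; the class name must match the IOAdapter entry above exactly.

import java.io.{InputStream, OutputStream}
import java.net.URI
import com.hankcs.hanlp.corpus.io.IIOAdapter
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Must be a class (not an object) with a public no-arg constructor,
// because HanLP instantiates it reflectively from hanlp.properties.
class HadoopFileIoAdapter extends IIOAdapter {

  // Called by HanLP to read a dictionary file, e.g. from HDFS.
  override def open(path: String): InputStream = {
    val fs = FileSystem.get(URI.create(path), new Configuration())
    fs.open(new Path(path))
  }

  // Called by HanLP to write its cache (.bin) files back.
  override def create(path: String): OutputStream = {
    val fs = FileSystem.get(URI.create(path), new Configuration())
    fs.create(new Path(path))
  }
}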
2. The corpus .bin files were not deleted
Before running the program, check whether the .bin files under the corpus's dictionary directory have been deleted. HanLP compiles the plain-text dictionaries into .bin caches on first load, and stale caches (for example, ones copied over from another environment) can make the dictionaries fail to load. In my case the program had already run successfully, so the caches were simply regenerated; on a first run, remember to delete the .bin files!
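If you would rather clear the caches programmatically than by hand, something like the following works. This is a hypothetical helper (the name CleanHanlpBinCache is mine), and the /common/nlp/data/dictionary path is simply derived from the root configured in hanlp.properties above.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// One-off helper: recursively deletes HanLP's .bin caches under the
// dictionary root so they are rebuilt from the .txt sources next run.
object CleanHanlpBinCache {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val files = fs.listFiles(new Path("/common/nlp/data/dictionary"), true)
    while (files.hasNext) {
      val status = files.next()
      if (status.getPath.getName.endsWith(".bin")) {
        fs.delete(status.getPath, false) // non-recursive: a single file
      }
    }
  }
}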
3. WARN HanLP: 工厂类没有默认构造方法,不符合要求 (the factory class has no default constructor and does not meet the requirements)
A custom adapter without a no-arg constructor is rejected! I have seen this exception several times now, and the cause was always that I had defined the adapter as a Scala object instead of a class. When programming with Spark, you really should get into the habit of writing companion classes (伴生类), otherwise the program cannot run.
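A quick way to see this locally: HanLP's factory essentially does a reflective lookup of the configured name followed by a no-arg instantiation, so you can reproduce the failure outside Spark with a sketch like this (the test object name is mine):

import com.hankcs.hanlp.corpus.io.IIOAdapter

object AdapterSmokeTest {
  def main(args: Array[String]): Unit = {
    // Mirrors what the factory does with the IOAdapter entry. This
    // typically fails (ClassNotFoundException or InstantiationException)
    // when the adapter was declared as an object, because an object has
    // no public no-arg constructor for reflection to call.
    val adapter = Class
      .forName("com.qf.bigdata.profile.nlp.hanlp.HadoopFileIoAdapter")
      .newInstance()
      .asInstanceOf[IIOAdapter]
    println(s"Instantiated adapter: ${adapter.getClass.getName}")
  }
}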
Another exception:
org.apache.hadoop.yarn.exceptions.InvalidApplicationMasterRequestException: Application doesn't exist in cache appattempt_1647388674777_0006_000002
For this YARN exception I simply restarted the server, and then discovered that my program's SparkConf was missing a .setMaster("local[*]") call.
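For reference, a minimal sketch of the SparkConf/SparkSession setup (the object and app names are illustrative). Note that .setMaster("local[*]") is for local debugging; when submitting with spark-submit --master yarn, the master is normally supplied on the command line rather than hard-coded:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object ProfileApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("user-profile")
      .setMaster("local[*]") // for local runs; drop or override for YARN

    val spark = SparkSession.builder()
      .config(conf)
      .enableHiveSupport() // the project pulls in spark-hive
      .getOrCreate()

    // ... HanLP segmentation pipeline goes here ...

    spark.stop()
  }
}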
Below is the HanLP natural language processing pipeline code.
Dependencies:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.qf.bigdata</groupId>
    <artifactId>user-profile</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <scala.version>2.11.12</scala.version>
        <play-json.version>2.3.9</play-json.version>
        <maven-scala-plugin.version>2.10.1</maven-scala-plugin.version>
        <scala-maven-plugin.version>3.2.0</scala-maven-plugin.version>
        <maven-assembly-plugin.version>2.6</maven-assembly-plugin.version>
        <spark.version>2.4.5</spark.version>
        <scope.type>compile</scope.type>
        <json.version>1.2.3</json.version>
        <!-- compile / provided -->
    </properties>

    <dependencies>
        <!-- JSON -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>${json.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>${scope.type}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>${scope.type}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>${scope.type}</scope>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.28</version>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
            <scope>${scope.type}</scope>
        </dependency>
        <dependency>
            <groupId>commons-codec</groupId>
            <artifactId>commons-codec</artifactId>
            <version>1.6</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
    </dependencies>
</project>