
Some problems encountered while running Hive


1. TRANSFORM issues

You cannot write: select usrid, movieid, rating, transform(ts) using 'python stamp2date.py' as date from rating_table; TRANSFORM cannot be mixed with plain columns in the select list. You must write: select transform(usrid, movieid, rating, ts) using 'python stamp2date.py' as usrid, movieid, rating, date from rating_table;

stamp2date.py splits on '\t', because the fields passed into TRANSFORM are tab-separated.

cat stamp2date.py

import sys
from datetime import datetime

# Hive streams each selected row to stdin as tab-separated fields.
for ss in sys.stdin:
    userid, movieid, rating, timest = ss.strip().split('\t')
    # Convert the Unix timestamp to a YYYY-MM-DD date string.
    ymddate = datetime.fromtimestamp(int(timest)).date()
    ymdstr = ymddate.strftime("%Y-%m-%d")
    # Comma-joined output: Hive will see the whole line as ONE column.
    print(','.join([userid, movieid, rating, ymdstr]))

Because the script joins its output with commas, Hive treats the whole line as a single field. If you write as usrid, movieid, rating, date, the output is 1,1029,3.0,2012-10-01\N\N\N (the last three columns come out NULL); with as ss you get 1,1029,3.0,2012-10-01. Hive splits TRANSFORM output on tabs, so to get four real columns the script must emit tab-separated fields.
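A tab-separated variant of the script (a sketch, not from the original post) would let the four-column as usrid, movieid, rating, date form work, since Hive splits TRANSFORM output on tabs:

```python
import sys
from datetime import datetime, timezone

def to_date_row(line):
    """Replace the Unix timestamp in a tab-separated row with a YYYY-MM-DD string."""
    userid, movieid, rating, timest = line.strip().split('\t')
    # Use UTC so the output does not depend on the machine's local timezone.
    ymdstr = datetime.fromtimestamp(int(timest), tz=timezone.utc).strftime("%Y-%m-%d")
    # Emit tab-separated fields so Hive splits them into four output columns.
    return '\t'.join([userid, movieid, rating, ymdstr])

if __name__ == '__main__':
    for ss in sys.stdin:
        print(to_date_row(ss))
```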

2. Hive fails with: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Three approaches: change the execution engine, increase the MapReduce memory, or synchronize the clocks of the three machines.

Change the execution engine:

hive> set hive.execution.engine=tez;

https://www.cnblogs.com/hankedang/p/4210598.html

https://jingyan.baidu.com/article/bad08e1e4e425b49c8512188.html

Increase the MapReduce memory (this was the main cause here):

https://blog.csdn.net/random0815/article/details/84944815

https://blog.csdn.net/qq_26442553/article/details/80143559?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task

https://www.cnblogs.com/ITtangtang/p/7683028.html
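The memory limits below are the knobs those posts adjust. A sketch of a typical session setup; the property names are standard Hadoop/Hive ones, but the values are illustrative only and must be tuned to your cluster:

```sql
-- Illustrative values; pick sizes that fit your cluster's YARN containers.
set mapreduce.map.memory.mb=4096;          -- container size for map tasks
set mapreduce.reduce.memory.mb=8192;       -- container size for reduce tasks
set mapreduce.map.java.opts=-Xmx3276m;     -- JVM heap, roughly 80% of the container
set mapreduce.reduce.java.opts=-Xmx6553m;
```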

Synchronize the clocks of the three machines:

https://blog.csdn.net/wangxizhen123/article/details/79884008?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task

https://blog.csdn.net/Amber_wuha/article/details/82823889?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task

Leave HDFS safe mode (the command is hdfs dfsadmin -safemode leave):

https://www.jianshu.com/p/de308d935d9b

Find the error through the tracking URL (be sure to use Google Chrome):

Start the relevant service first: https://blog.csdn.net/weixin_43481376/article/details/88662831

https://blog.csdn.net/lcm_linux/article/details/103835204

3. A table created with CREATE TABLE ... AS SELECT cannot be an external table

USE practice;

CREATE TABLE behavior_table
LOCATION '/hive-test/behavior'   -- the LOCATION clause must come before AS
AS
SELECT A.movieid, B.userid, A.title, B.rating
FROM
    (SELECT movieid, title FROM movie_table) A
INNER JOIN
    (SELECT userid, movieid, rating FROM rating_table) B
ON A.movieid = B.movieid;

 

4. Creating a partitioned table

https://www.jianshu.com/p/69efe36d068b
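A minimal sketch of a partitioned table, with table and column names borrowed from the rating example above rather than from the linked post (the load path is hypothetical):

```sql
-- The partition column dt is virtual: it is not stored inside the data files.
CREATE TABLE rating_part (
    userid  INT,
    movieid INT,
    rating  FLOAT
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Static-partition load into one partition:
LOAD DATA LOCAL INPATH '/tmp/ratings-2012-10-01.txt'
INTO TABLE rating_part PARTITION (dt='2012-10-01');
```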

 

5. Handling dynamic-partition exceptions

https://blog.csdn.net/helloxiaozhe/article/details/79710707
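The usual fix for the dynamic-partition exception is to enable nonstrict mode and raise the partition limits. A typical session setup (property names are standard Hive ones; the limit values are illustrative):

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;   -- allow all partition columns to be dynamic
set hive.exec.max.dynamic.partitions=1000;        -- raise if you hit the overall limit
set hive.exec.max.dynamic.partitions.pernode=100; -- per-node limit
```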

 

6. Running Hive reports: ls: cannot access /usr/local/src/spark-2.0.2-bin-hadoop2.6/lib/spark-assembly-*.jar: No such file or directory

https://blog.csdn.net/weixin_42496757/article/details/87555292

 

7. [Dynamic partitions] How to reduce the number of map files (applies even outside dynamic partitioning)

https://blog.csdn.net/mhtian2015/article/details/79898169?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task
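The settings that post discusses boil down to merging small output files and enlarging input splits. A sketch with standard Hive property names; all sizes are in bytes and the values are illustrative:

```sql
-- Merge small output files after the job:
set hive.merge.mapfiles=true;               -- merge outputs of map-only jobs
set hive.merge.mapredfiles=true;            -- merge outputs of map-reduce jobs
set hive.merge.size.per.task=256000000;     -- target size of merged files
set hive.merge.smallfiles.avgsize=16000000; -- trigger a merge below this average

-- Fewer map tasks on the input side, by combining small input files:
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.max.split.size=256000000;
```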

 

8. Container killed on request. Exit code is 143

Exit code 143 means the container received SIGTERM (128 + 15), typically because YARN killed it for exceeding its memory limit; the memory settings from problem 2 apply here as well.

https://blog.csdn.net/yijichangkong/article/details/51332432

 

9. A bucketed table cannot be created with LIKE
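Instead of LIKE, declare the bucketing explicitly with CLUSTERED BY. A sketch with assumed column names (based on the rating table used earlier, not from the original note):

```sql
CREATE TABLE rating_bucketed (
    userid  INT,
    movieid INT,
    rating  FLOAT
)
CLUSTERED BY (userid) INTO 4 BUCKETS;
```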

10. Hive logs

https://www.cnblogs.com/kouryoushine/p/7805657.html

https://www.cnblogs.com/hello-wei/p/10645740.html

 
