I. The following error is reported
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"bike"},"value":{"_col0":3,"_col1":10.23}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
1. In the hive shell, add the custom Python script to Hive:
add file hdfs:///user/hive/lib/udf1.py;
2. Check that the using 'udf1.py' clause here is correct:
select transform(id,vtype,price) using 'udf1.py' as (vtype string,mean float,var float) from (select * from test cluster by vtype) as temp_table;
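For context, transform streams the selected columns to the script as tab-separated text on stdin, one line per row, and parses the script's stdout back into the columns named in the as (...) clause. A minimal sketch of that contract (the sample values are invented for illustration):

# each input row of (id, vtype, price) arrives on stdin as one tab-separated line;
# every field is plain text, so numeric columns still need an explicit cast
line = "1\tbike\t10.23"
rowid, vtype, price = line.strip().split('\t')
# each output row must be one tab-separated line matching as (vtype, mean, var)
print('\t'.join([vtype, "10.23", "0.0"]))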
II. The following error is reported
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
1. When the error above is reported, it is almost always a bug in the script itself, so debug the script.
For example:
#!/usr/bin/python
# _*_ coding:utf-8 _*_
import sys
import logging
from itertools import groupby
from operator import itemgetter
import numpy as np
import pandas as pd

sep = '\t'

def read_input(input_data):
    # yield each non-empty stdin line as a list of tab-separated fields
    for line in input_data:
        line = line.strip()
        if line == "":
            continue
        yield line.split(sep)

def main():
    data = read_input(sys.stdin)
    # rows arrive already clustered by vtype (cluster by vtype in the query),
    # so grouping on field 1 sees each vtype as one contiguous run
    for vtype, group in groupby(data, itemgetter(1)):
        # if price is not cast to float here, df['price'].mean() raises an
        # error, which is what surfaces as the exception above
        group = [(int(rowid), vtype, float(price)) for rowid, vtype, price in group]
        df = pd.DataFrame(group, columns=('id', 'vtype', 'price'))
        output = [vtype, df['price'].sum(), df['price'].mean()]
        # print len(group)
        print(sep.join(str(o) for o in output))

if __name__ == '__main__':
    main()
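Before loading the script with add file, it helps to run it locally against a few hand-written rows; any traceback printed there is exactly what Hive surfaces as Error 20003. A minimal local test along those lines (the sample rows and the relative path udf1.py are assumptions for illustration):

import subprocess
# hypothetical tab-separated (id, vtype, price) rows, already sorted by vtype
sample = "1\tbike\t10.23\n2\tbike\t3.0\n3\tcar\t100.0\n"
p = subprocess.run(["python", "udf1.py"], input=sample.encode(), capture_output=True)
print(p.stdout.decode())   # expected: one "vtype<TAB>sum<TAB>mean" line per vtype
print(p.stderr.decode())   # a traceback here is the bug that becomes Error 20003 in Hive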