当前位置:   article > 正文

Hive的表查询操作_hive查询的数据新表

hive查询的数据新表

首先开启Hadoop集群

  1. cd /opt/hadoop-3.1.4/sbin
  2. ./start-dfs.sh
  3. ./start-yarn.sh
  4. ./mr-jobhistory-daemon.sh start historyserver
  5. jps

然后打开hive

  1. cd /usr/local/hive/conf
  2. source /etc/profile
  3. schematool -dbType mysql -initSchema
  4. hive --service metastore &
  5. hive

 下面依旧进行先数据库和表的建立,我先将之前的数据库和表都删了,重新进行建立,这里的删除不只是使用drop,还要在master:50070中删除。

  1. create database taitan;
  2. use taitan;
  3. create table tidanic(
  4. passengerid int,
  5. survived int,
  6. pclass int,
  7. name string,
  8. sex string,
  9. age int,
  10. sibsp int,
  11. parch int,
  12. ticket string,
  13. fare double,
  14. cabin string,
  15. embarked string)
  16. row format delimited fields terminated by ',';

 必须要加上use这一句,否则数据库和表不是前后相关联的,而是并列的:

 然后从本地系统的centos导入train.csv数据,加上local:

load data local inpath '/opt/train.csv' overwrite into table tidanic;

然后就是删除刚才上传的文件:

dfs -rm -r /user/hive/warehouse/taitan.db/tidanic/train.csv;

 从hdfs系统导入数据(不要加local),首先先把数据传到hdfs系统下,再导入数据:

  1. dfs -put /opt/train.csv /user/root;
  2. load data inpath '/user/root/train.csv' overwrite into table tidanic;

 

下面进行多表查询导入数据:

  1. create table tidanic_save like tidanic;
  2. create table tidanic_died like tidanic;
  3. from tidanic
  4. insert overwrite table tidanic_save select * where survived=1
  5. insert overwrite table tidanic_died select * where survived=0;

将数据导出到centos本地上:

insert overwrite local directory '/opt/tidanic_save' row format delimited fields terminated by ',' select * from tidanic_save;

 下面是一些查询任务的例子,分别统计性别与生存率的关系、客舱等级与生存率的关系、登船港口与生存率的关系:

  1. select sex,s_count/all_count as s_percent from(select sex,count(*) as s_count from tidanic where survived=1) a join (select count(*) as all_count from tidanic where survived=1) b on 1=1;
  2. select pclass,s_count/all_count as s_percent from(select pclass,count(*) as s_count from tidanic where survived=1 group by pclass) a join (select count(*) as all_count from tidanic where survived=1) b on 1=1;
  3. select embarked,s_count/all_count as s_percent from(select embarked,count(*) as s_count from tidanic where survived=1 group by embarked) a join (select count(*) as all_count from tidanic where survived=1) b on 1=1;

如果报错的话,如图加入两行代码,再重新运行:

  1. set hive.strict.checks.cartesian.product=false;
  2. set hive.mapred.mode=nonstrict;

 

 

声明:本文内容由网友自发贡献,转载请注明出处:【wpsshop】
推荐阅读
相关标签
  

闽ICP备14008679号