
[Hive] Hive Weibo case study: statements for counting total Weibo users


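Data preparation and description

Data description

Historical user data up to 2013-12-15: 221 MB compressed and 878 MB uncompressed, split across 1,206 small files, all in JSON format.
Data download link

Sample data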

[{
  "beCommentWeiboId":"","beForwardWeiboId":"","catchTime":"1387165034","commentCount":"6","content":"Raresmileyportrait(1977)","createTime":"1387130972","info1":"","info2":"","info3":"","mlevel":"","musicurl":[],"pic_list":["http://ww2.sinaimg.cn/thumbnail/69d3e27djw1ebkxp7rtczj20mo0mogmy.jpg"],"praiseCount":"5","reportCount":"70","source":"","userId":"1775493757","videourl":[],"weiboId":"3655954636173507","weiboUrl":"http://weibo.com/1775493757/AntDppU0H"}]
[{
  "beCommentWeiboId":"","beForwardWeiboId":"3655954636173507","catchTime":"1387165034","commentCount":"29","content":"玲笑容!","createTime":"1387139090","info1":"","info2":"","info3":"","mlevel":"","musicurl":[],"pic_list":[],"praiseCount":"72","reportCount":"61","source":"新浪微博","userId":"1719481457","videourl":[],"weiboId":"3655988685551869","weiboUrl":"http://weibo.com/1719481457/Anuwkniih"}]
[{
  "beCommentWeiboId":"","beForwardWeiboId":"","catchTime":"1387165034","commentCount":"4","content":"lifeisbeautifulandallisaboutconfident&trust&friends&LOVE,thanksto@黄伟文,youmakemefeellikehongkongismagic&happiness.","createTime":"1387053188","info1":"","info2":"","info3":"","mlevel":"","musicurl":[],"pic_list":[],"praiseCount":"8","reportCount":"8","source":"","userId":"1733190683","videourl":[],"weiboId":"3655628385727081","weiboUrl":"http://weibo.com/1733190683/Anl9co1Sh"}]


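Field descriptions

There are 19 fields in total:

beCommentWeiboId	whether the post is a comment
beForwardWeiboId	whether the post is a forward (in the sample data it holds the weiboId of the forwarded post)
catchTime	time the post was crawled
commentCount	number of comments
content	post content
createTime	creation time
info1	info field 1
info2	info field 2
info3	info field 3
mlevel	not sure (meaning unknown)
musicurl	music links
pic_list	list of pictures (may contain several)
praiseCount	number of likes
reportCount	number of forwards
source	data source
userId	user id
videourl	video links
weiboId	Weibo post id
weiboUrl	Weibo post URL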


Data storage

The data is stored at hdfs://hdp01:9000/data/weibo
When creating the table, create it as an external table.

[hdp01@hdp01 weibo]$ hdfs dfs -ls /data/weibo
Found 2 items
-rw-r--r--   2 hdp01 supergroup    1004992 2020-01-11 16:17 /data/weibo/1387159770_1087770692_20100101000000_VCSvoMgPvrSTKhCkkIA7uMV9Hn10877706927159770ouss.json
-rw-r--r--   2 hdp01 supergroup     680641 2020-01-11 16:17 /data/weibo/1387159770_1180721740_20100101000000_tBx94gQvEoOWTiB4n3gORSmS11807217407159771ouss.json


Getting started

hive> set hive.exec.mode.local.auto=true;
--hive> set hive.cli.print.header=true;
hive> create database weibo;
hive> use weibo;

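Functional requirements

1. Data processing: propose a solution for each problem in the data (15 points)

Too many data files: they need to be merged. Propose a solution.
Solution: merge the small files with MapReduce.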
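
Besides a hand-written MapReduce job, the merge can also be driven from inside Hive, which compiles the statement into a MapReduce job itself. The following is only a minimal sketch of that alternative, not the author's solution. It assumes the external table weibo_json created later in this article already exists. The target table name weibo_json_merged and the size thresholds are hypothetical choices for illustration.

```sql
-- Combine many small input files into fewer splits when reading
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- Merge small output files at the end of map-only and map-reduce jobs
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
-- Target size of a merged file, and the average output size below which
-- the extra merge step is triggered (both in bytes; assumed values)
set hive.merge.size.per.task=256000000;
set hive.merge.smallfiles.avgsize=128000000;

-- weibo_json_merged is a hypothetical managed table with the same single column
create table if not exists weibo_json_merged (json string);

-- Rewrite the data; the output should land in far fewer, larger files
insert overwrite table weibo_json_merged
select json from weibo_json;
```

Listing the directory of the new table afterwards should show far fewer files than the 1,206 in the source data.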

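2. Organizing the data (10 points)

(Create a Hive table weibo_json(json string) with a single column, load all the data into it, and verify by querying the first 5 rows.)
(Parse the JSON data in weibo_json into a weibo table with 19 columns, and write the necessary SQL statements.)

Create the weibo_json table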

hive> create external table if not exists weibo_json(
    > json string)
    > location "/data/weibo";
   
-- Because this is an external table whose location points to /data/weibo, the data can be queried as soon as the table is created
hive> select * from weibo_json limit 2;
OK
[{
  "beCommentWeiboId":"","beForwardWeiboId":"","catchTime":"1387159495","commentCount":"1419","content":"分享图片","createTime":"1386981067","info1":"","info2":"","info3":"","mlevel":"","musicurl":[],"pic_list":["http://ww3.sinaimg.cn/thumbnail/40d61044jw1ebixhnsiknj20qo0qognx.jpg"],"praiseCount":"5265","reportCount":"1285","source":"iPad客户端","userId":
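
For the second half of requirement 2, parsing weibo_json into a 19-column weibo table, one possible approach is Hive's built-in get_json_object function. The sketch below is an illustration under stated assumptions, not the author's solution. It assumes that each row of weibo_json holds a complete JSON array, that only the first element (`$[0]`) of each array is wanted, and that the Hive version in use accepts a JSON array at the root of the path. All columns are declared as string for simplicity, so the array-valued fields (musicurl, pic_list, videourl) are stored as their JSON text.

```sql
-- Target table: one string column per JSON field (types are an assumption)
create table if not exists weibo (
  beCommentWeiboId string,
  beForwardWeiboId string,
  catchTime        string,
  commentCount     string,
  content          string,
  createTime       string,
  info1            string,
  info2            string,
  info3            string,
  mlevel           string,
  musicurl         string,
  pic_list         string,
  praiseCount      string,
  reportCount      string,
  source           string,
  userId           string,
  videourl         string,
  weiboId          string,
  weiboUrl         string
);

-- '$[0].field' reads the field from the first object of the root array
insert overwrite table weibo
select
  get_json_object(json, '$[0].beCommentWeiboId'),
  get_json_object(json, '$[0].beForwardWeiboId'),
  get_json_object(json, '$[0].catchTime'),
  get_json_object(json, '$[0].commentCount'),
  get_json_object(json, '$[0].content'),
  get_json_object(json, '$[0].createTime'),
  get_json_object(json, '$[0].info1'),
  get_json_object(json, '$[0].info2'),
  get_json_object(json, '$[0].info3'),
  get_json_object(json, '$[0].mlevel'),
  get_json_object(json, '$[0].musicurl'),
  get_json_object(json, '$[0].pic_list'),
  get_json_object(json, '$[0].praiseCount'),
  get_json_object(json, '$[0].reportCount'),
  get_json_object(json, '$[0].source'),
  get_json_object(json, '$[0].userId'),
  get_json_object(json, '$[0].videourl'),
  get_json_object(json, '$[0].weiboId'),
  get_json_object(json, '$[0].weiboUrl')
from weibo_json;

-- Total number of distinct users once the table is populated
select count(distinct userId) as total_users from weibo;
```

If a row contains more than one object in its array, the remaining elements are ignored by this sketch and would need to be split out first.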