当前位置:   article > 正文

hive/impala:分区表,hbase外表_hbase 外表

hbase 外表

1, hive创建hbase外表

  • a, hive表有数据  —> 给hbase表用:  映射时,自动创建hbae表
  • b, hbase表有数据 —> 给hive表用
CREATE EXTERNAL TABLE  [dbname.]表名
( 
rowkey string, 
name string
) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH 
SERDEPROPERTIES ("hbase.columns.mapping"= 
":key, f:name"
) TBLPROPERTIES ("hbase.table.name" = "hbase表");
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8

2, 创建分区表:插入数据( 动态分区,手动分区)

create table users(
id int,
user_id int,
user_name string
);


create table product(
id int,
product_id int,
product_name string,
price double
);

create table orders (
id int,
user_id int,
product_id int,
product_count int
) partitioned by (year string,month string,day string);

--------------指定分区:插入数据
insert into users values
(1,1,'a'),
(2,2,'b'),
(3,3,'c');


insert into product values
(1,1,'iphone6',9000),
(2,2,'<<old man and the sea>>',23.2),
(3,3,'lenovo-pc',5623.4);


insert into  orders partition(year='2019',month='03',day='02') values
(1,1,1,2),
(2,1,3,1),
(3,2,2,3);
insert into  orders partition(year='2019',month='03',day='03') values
(null,1,2,1),
(null,2,1,4);

------------动态分区:插入数据:
create table orders_t1 (
id int,
user_id int,
product_id int,
product_count int,
time string
) ;
insert into orders_t1 values
(1,1,1,2,'2011-12-02 12:03:32'),
(2,1,3,1,'2014-12-05 16:06:32'),
(3,2,2,3,'2016-10-02 18:03:32');

set hive.exec.dynamic.partition.mode=nonstrict;
create table orders2 like orders;
insert into  orders2 partition(year,month,day)
select id,user_id,product_id,product_count,year(time),month(time),day(time) from orders_t1;

--====================查询语句
--解决错误:Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=1)
SET hive.exec.mode.local.auto=true;
SET hive.auto.convert.join=false;
SET mapreduce.map.memory.mb = 26384; 
SET mapreduce.map.java.opts='-Djava.net.preferIPv4Stack=true -Xmx23107M';
SET mapreduce.reduce.memory.mb = 26384; 
SET mapreduce.reduce.java.opts='-Djava.net.preferIPv4Stack=true -Xmx23107M';
set hive.support.concurrency = false;


select user_name, product_name, product_count 
from orders
left join product on product.product_id=orders.product_id
left join users on users.user_id=orders.user_id
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72
  • 73
  • 74
  • 75

3, impala 加速hive查询:避免mapreduce

  • 推荐hive/impala外表中的字段类型统一使用string: Impala can translate predicates (through operators such as =, <, and BETWEEN) against this column into fast lookups in HBase, but this optimization (“predicate pushdown”) only works when that column is defined as STRING. You should still define the column that corresponds to the HBase row key as a STRING, to allow fast lookups using those columns. If row keys are not mapped to string columns, then ordering is typically incorrect and comparison operations do not work. For example, if row keys are not mapped to string columns, evaluating for greater than (>) or less than (<) cannot be completed.

  • 外表的字段类型使用string可以使用hbase 的SingleColumnValueFilters过滤器达到查询效果的优化:Predicates on non-key columns can be sent to HBase to scan as SingleColumnValueFilters, providing some performance gains. Doing the filtering within HBase is more efficient than transmitting all the data to Impala and doing the filtering on the Impala side. for performance, it is important to declare the hbase-rowkey column as STRING. Other columns, such as BIRTH_YEAR and NEVER_LOGGED_ON, are also declared as STRING, rather than their “natural” types of INT or BOOLEAN, because Impala can optimize those types more effectively in HBase tables.

  • 如果是全表扫描hbase的话,可以设置两个参数来降低hbaes内存消耗:you can set these same caching behaviors through Impala query options, to control the memory pressure on the HBase RegionServer. For example, when doing queries in HBase that result in full-table scans (which by default are inefficient for HBase), you can reduce memory usage and speed up the queries by turning off the HBASE_CACHE_BLOCKS setting and specifying a large number for the HBASE_CACHING setting.
    set hbase_cache_blocks=false;– Same as calling setCacheBlocks(false)
    set hbase_caching=1000; --Same as calling setCaching(rows).

[root@localhost ~]$ impala-shell
Starting Impala Shell without Kerberos authentication
Connected to localhost:21000
Server version: impalad version 2.9.0-cdh5.12.0 RELEASE (build 03c6ddbdcec39238be4f5b14a300d5c4f576097e)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.9.0-cdh5.12.0 (03c6ddb) built on Thu Jun 29 04:17:31 PDT 2017)
The SET command shows the current value of all shell and query options.
***********************************************************************************

[localhost:21000] > select * from test1;
Query: select * from test1
Query submitted at: 2021-08-04 10:15:58 (Coordinator: http://localhost:25000)
ERROR: AnalysisException: Could not resolve table reference: 'test1'
[localhost:21000] > invalidate metadata test1;
Query: invalidate metadata test1
Query submitted at: 2021-08-04 10:16:48 (Coordinator: http://localhost:25000)
Query progress can be monitored at: http://localhost:25000/query_plan?query_id=5b4963fa04264a07:892ef9c900000000
Fetched 0 row(s) in 0.17s
[localhost:21000] > select * from test1;
Query: select * from test1
Query submitted at: 2021-08-04 10:16:51 (Coordinator: http://localhost:25000)
Query progress can be monitored at: http://localhost:25000/query_plan?query_id=c9434c48b3bd1970:1cbd023f00000000
+--------+-----+---------+
| rowkey | age | name    |
+--------+-----+---------+
| r8     | 18  | 888name |
+--------+-----+---------+
Fetched 1 row(s) in 3.58s



[localhost:21000] > show create table test_exter2;
Query: show create table test_exter2
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| result                                                                                                                                                                                                                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| CREATE EXTERNAL TABLE default.test_exter2 (                                                                                                                                                                           |
|   rowkey STRING,                                                                                                                                                                                                      |
|   age STRING,                                                                                                                                                                                                         |
|   name STRING                                                                                                                                                                                                         |
| )                                                                                                                                                                                                                     |
| STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'                                                                                                                                                          |
| WITH SERDEPROPERTIES ('hbase.columns.mapping'=':key,f:name,f:age', 'serialization.format'='1')                                                                                                                        |
| TBLPROPERTIES ('hbase.table.name'='test', 'totalSize'='0', 'numRows'='-1', 'rawDataSize'='-1', 'COLUMN_STATS_ACCURATE'='false', 'numFiles'='0', 'storage_handler'='org.apache.hadoop.hive.hbase.HBaseStorageHandler') |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


 [localhost:21000] > explain select * from test_exter2 where age > '2';
Query: explain select * from test_exter2 where age > '2'
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Per-Host Resource Reservation: Memory=0B                                           |
| Per-Host Resource Estimates: Memory=1.00GB                                         |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.test_exter2                                                                |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 01:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 00:SCAN HBASE [default.test_exter2]                                                |
|    hbase filters: f:age GREATER '2'                                                |
|    predicates: age > '2'                                                           |
+------------------------------------------------------------------------------------+
Fetched 12 row(s) in 0.03s


[localhost:21000] > explain select * from test_exter2 limit 1 ;
Query: explain select * from test_exter2 limit 1
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Per-Host Resource Reservation: Memory=0B                                           |
| Per-Host Resource Estimates: Memory=1.00GB                                         |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.test_exter2                                                                |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 00:SCAN HBASE [default.test_exter2]                                                |
|    limit: 1                                                                        |
+------------------------------------------------------------------------------------+
Fetched 9 row(s) in 0.02s

#仅支持insert (hbase数据), 不支持update, delete 
[localhost:21000] > insert into test_exter2 values('r3','5','c');
Query: insert into test_exter2 values('r3','5','c')
Query submitted at: 2021-08-04 10:06:25 (Coordinator: http://localhost:25000)
Query progress can be monitored at: http://localhost:25000/query_plan?query_id=15427ae54697419d:531d473300000000
Modified 1 row(s) in 0.14s
[localhost:21000] > select* from test_exter2;
Query: select * from test_exter2
Query submitted at: 2021-08-04 10:06:28 (Coordinator: http://localhost:25000)
Query progress can be monitored at: http://localhost:25000/query_plan?query_id=f8439548428c9192:5f77ffac00000000
+--------+-----+------+
| rowkey | age | name |
+--------+-----+------+
| r1     | 2   | a    |
| r2     | 4   | b    |
| r3     | 5   | c    |
+--------+-----+------+
Fetched 3 row(s) in 0.15s

[localhost:21000] > insert into test_exter2 select rowkey,age,name from test1;
Query: insert into test_exter2 select rowkey,age,name from test1
Query submitted at: 2021-08-04 10:18:42 (Coordinator: http://localhost:25000)
Query progress can be monitored at: http://localhost:25000/query_plan?query_id=4b420c30db13d20e:5852301100000000
Modified 1 row(s) in 0.13s
[localhost:21000] > select * from test_exter2;
Query: select * from test_exter2
Query submitted at: 2021-08-04 10:18:45 (Coordinator: http://localhost:25000)
Query progress can be monitored at: http://localhost:25000/query_plan?query_id=2743af4a17e35a02:304a786f00000000
+--------+-----+---------+
| rowkey | age | name    |
+--------+-----+---------+
| r1     | 2   | a       |
| r2     | 4   | b       |
| r3     | 5   | c       |
| r8     | 18  | 888name |
+--------+-----+---------+
Fetched 4 row(s) in 0.15s

[localhost:21000] > update test_exter2 set age='6' where rowkey='r3';
Query: update test_exter2 set age='6' where rowkey='r3'
Query submitted at: 2021-08-04 10:07:11 (Coordinator: http://localhost:25000)
ERROR: AnalysisException: Impala does not support modifying a non-Kudu table: default.test_exter2
[localhost:21000] > delete from test_exter2 where rowkey='r3';
Query: delete from test_exter2 where rowkey='r3'
Query submitted at: 2021-08-04 10:07:31 (Coordinator: http://localhost:25000)
ERROR: AnalysisException: Impala does not support modifying a non-Kudu table: default.test_exter2

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72
  • 73
  • 74
  • 75
  • 76
  • 77
  • 78
  • 79
  • 80
  • 81
  • 82
  • 83
  • 84
  • 85
  • 86
  • 87
  • 88
  • 89
  • 90
  • 91
  • 92
  • 93
  • 94
  • 95
  • 96
  • 97
  • 98
  • 99
  • 100
  • 101
  • 102
  • 103
  • 104
  • 105
  • 106
  • 107
  • 108
  • 109
  • 110
  • 111
  • 112
  • 113
  • 114
  • 115
  • 116
  • 117
  • 118
  • 119
  • 120
  • 121
  • 122
  • 123
  • 124
  • 125
  • 126
  • 127
  • 128
  • 129
  • 130
  • 131
  • 132
  • 133
声明:本文内容由网友自发贡献,转载请注明出处:【wpsshop博客】
推荐阅读
相关标签
  

闽ICP备14008679号