hbase(main):009:0> create 't_hbase1','info'
0 row(s) in 1.5930 seconds
=> Hbase::Table - t_hbase1
hbase(main):010:0> desc 't_hbase1'
Table t_hbase1 is ENABLED
t_hbase1
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1380 seconds
Create a Hive external table mapped to the HBase table:
CREATE EXTERNAL TABLE t_hbase1(
key string,
id string,
salary string,
start_date string,
end_date string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:id,info:salary,info:start_date,info:end_date")
TBLPROPERTIES("hbase.table.name" = "t_hbase1");
Testing shows that a Hive external table mapped to an HBase table works the same regardless of whether the data was inserted through Phoenix or through the HBase API.
3. Insert data into t_hbase1 through the HBase shell
hbase(main):010:0> put 't_hbase1','1','info:id','1'
hbase(main):010:0> put 't_hbase1','1','info:salary','1'
hbase(main):011:0> put 't_hbase1','1','info:start_date','2017-09-18'
hbase(main):012:0> put 't_hbase1','1','info:end_date','2017-09-18'
hbase(main):016:0> scan 't_hbase1'
ROW COLUMN+CELL
1 column=info:end_date, timestamp=1530671481354, value=2017-09-18
1 column=info:id, timestamp=1530671453795, value=1
1 column=info:salary, timestamp=1530671469153, value=1
1 column=info:start_date, timestamp=1530671475444, value=2017-09-18
[root@hdp3 ~]# /usr/hdp/2.5.3.0-37/phoenix/bin/sqlline.py hdp1,hdp2,hdp3:2181
Create a Phoenix table mapped to the existing HBase table:
create table "t_hbase1"(
"ROW" varchar primary key,
"info"."start_date" varchar ,
"info"."end_date" varchar ,
"info"."id" varchar ,
"info"."salary" varchar);
Query the data through Phoenix:
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select * from "t_hbase1";
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 1 | 2017-09-18 | 2017-09-18 | 1 | 1 |
+------+-------------+-------------+-----+---------+
Check the row count:
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select count(*) from "t_hbase1";
+-----------+
| COUNT(1) |
+-----------+
| 1 |
+-----------+
Insert a second row through the HBase shell:
hbase(main):010:0> put 't_hbase1','2','info:id','2'
hbase(main):010:0> put 't_hbase1','2','info:salary','2'
hbase(main):011:0> put 't_hbase1','2','info:start_date','2017-09-18'
hbase(main):012:0> put 't_hbase1','2','info:end_date','2017-09-18'
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select count(*) from "t_hbase1";
+-----------+
| COUNT(1) |
+-----------+
| 2 |
+-----------+
1 row selected (0.029 seconds)
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select * from "t_hbase1";
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 1 | 2017-09-18 | 2017-09-18 | 1 | 1 |
| 2 | 2017-09-18 | 2017-09-18 | 2 | 2 |
+------+-------------+-------------+-----+---------+
2 rows selected (0.034 seconds)
Insert a row through Phoenix:
UPSERT INTO "t_hbase1" VALUES('3','2017-09-18','2017-09-18','3','3');
phoenix shell:
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select * from "t_hbase1";
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 1 | 2017-09-18 | 2017-09-18 | 1 | 1 |
| 2 | 2017-09-18 | 2017-09-18 | 2 | 2 |
| 3 | 2017-09-18 | 2017-09-18 | 3 | 3 |
+------+-------------+-------------+-----+---------+
3 rows selected (0.057 seconds)
hbase shell:
hbase(main):022:0> scan 't_hbase1'
ROW COLUMN+CELL
1 column=info:_0, timestamp=1530671481354, value=
1 column=info:end_date, timestamp=1530671481354, value=2017-09-18
1 column=info:id, timestamp=1530671453795, value=1
1 column=info:salary, timestamp=1530671469153, value=1
1 column=info:start_date, timestamp=1530671475444, value=2017-09-18
2 column=info:end_date, timestamp=1530672268599, value=2017-09-18
2 column=info:id, timestamp=1530672247623, value=2
2 column=info:salary, timestamp=1530672253810, value=2
2 column=info:start_date, timestamp=1530672262302, value=2017-09-18
3 column=info:_0, timestamp=1530672889061, value=x
3 column=info:end_date, timestamp=1530672889061, value=2017-09-18
3 column=info:id, timestamp=1530672889061, value=3
3 column=info:salary, timestamp=1530672889061, value=3
3 column=info:start_date, timestamp=1530672889061, value=2017-09-18
(info:_0 is Phoenix's internal empty key-value column; Phoenix adds it to rows it writes or updates.)
hive shell:
hive> select * from t_hbase1;
OK
1 1 1 2017-09-18 2017-09-18
2 2 2 2017-09-18 2017-09-18
3 3 3 2017-09-18 2017-09-18
Time taken: 0.902 seconds, Fetched: 3 row(s)
Delete a specific row through Phoenix:
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> delete from "t_hbase1" where "id"='3';
1 row affected (0.052 seconds)
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select * from "t_hbase1";
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 1 | 2017-09-18 | 2017-09-18 | 1 | 1 |
| 2 | 2017-09-18 | 2017-09-18 | 2 | 2 |
+------+-------------+-------------+-----+---------+
2 rows selected (0.057 seconds)
Create a secondary index on the id column:
CREATE INDEX THBASE1_INDEX_ID ON "t_hbase1"("info"."id");
Once the index is created, an index table (THBASE1_INDEX_ID) appears in Phoenix, and a corresponding table of the same name appears in HBase.
The index table stores only ROW and id, i.e. the original HBase rowkey paired with the indexed field, as shown below:
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select * from THBASE1_INDEX_ID ;
+----------+-------+
| info:id | :ROW |
+----------+-------+
| 1 | 1 |
| 2 | 2 |
+----------+-------+
So to fetch rows quickly through the secondary index, you can first resolve the matching rowkey through the index, then fetch the actual row by that rowkey; in my tests this was tens to hundreds of times faster. The SQL:
select * from "t_hbase1" t1
INNER JOIN
(select "ROW" FROM "t_hbase1" WHERE "id"='200002') t2
on t1."ROW" = t2."ROW";
or
select * from "t_hbase1" where "ROW" in
(select "ROW" FROM "t_hbase1" WHERE "id"='200002');
Testing shows the second form is slightly faster.
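The speedup from the rowkey-indirection pattern above can be illustrated with a toy in-memory model (pure Python with made-up data; the real Phoenix gain comes from HBase point gets replacing full table scans, but the shape of the trade-off is the same):

```python
# Toy model: a row store keyed by rowkey, plus a secondary index
# mapping an indexed column value back to its rowkeys. This mirrors
# the pattern above: resolve rowkeys via the index, then fetch by key.
rows = {str(i): {"id": str(i), "salary": str(i)} for i in range(1, 20001)}

# Secondary index: indexed column value -> list of rowkeys
index = {}
for rowkey, row in rows.items():
    index.setdefault(row["id"], []).append(rowkey)

def query_full_scan(id_value):
    # Analogue of SELECT * ... WHERE "id" = ... without an index:
    # every row must be examined.
    return [r for r in rows.values() if r["id"] == id_value]

def query_via_index(id_value):
    # Analogue of the two-step pattern: index lookup, then point gets.
    return [rows[k] for k in index.get(id_value, [])]

# Both paths return the same rows; only the amount of work differs.
assert query_full_scan("12345") == query_via_index("12345")
assert query_full_scan("999999") == query_via_index("999999") == []
```

The dict lookup stands in for an HBase point get by rowkey, and the list comprehension over all rows stands in for a region scan; as the table grows, only the scan path gets slower.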
Add data to t_hbase1 through the HBase shell:
hbase(main):010:0> put 't_hbase1','3','info:id','3'
hbase(main):010:0> put 't_hbase1','3','info:salary','3'
hbase(main):011:0> put 't_hbase1','3','info:start_date','2017-09-18'
hbase(main):012:0> put 't_hbase1','3','info:end_date','2017-09-18'
Query results after the insert:
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select * from "t_hbase1";
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 1 | 2017-09-18 | 2017-09-18 | 1 | 1 |
| 2 | 2017-09-18 | 2017-09-18 | 2 | 2 |
| 3 | 2017-09-18 | 2017-09-18 | 3 | 3 |
+------+-------------+-------------+-----+---------+
3 rows selected (0.058 seconds)
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select count(*) from "t_hbase1";
+-----------+
| COUNT(1) |
+-----------+
| 2 |
+-----------+
1 row selected (0.083 seconds)
Note that count(*) returns 2 even though select * returns 3 rows: Phoenix serves the count from the secondary index table, and the HBase-shell puts never updated that index.
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select * from "t_hbase1" where "id"='3';
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 3 | 2017-09-18 | 2017-09-18 | 3 | 3 |
+------+-------------+-------------+-----+---------+
1 row selected (0.059 seconds)
Row 3 is still found here because select * references columns the index does not cover, so Phoenix falls back to scanning the data table.
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select * from THBASE1_INDEX_ID;
+----------+-------+
| info:id | :ROW |
+----------+-------+
| 1 | 1 |
| 2 | 2 |
+----------+-------+
2 rows selected (0.034 seconds)
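The mismatch above — an index-backed count of 2 while the table holds 3 rows — comes down to writes that bypass index maintenance. A minimal toy model (hypothetical, not Phoenix internals) of the two write paths:

```python
# Toy model: an "upsert" path that maintains the index versus a raw
# "put" path that bypasses it (like writing through the HBase shell).
table = {}   # rowkey -> row
index = {}   # indexed id value -> rowkey

def upsert(rowkey, row):
    # Phoenix-style write: base table and index are updated together.
    table[rowkey] = row
    index[row["id"]] = rowkey

def raw_put(rowkey, row):
    # HBase-shell-style write: the index is never told about the row.
    table[rowkey] = row

upsert("1", {"id": "1"})
upsert("2", {"id": "2"})
raw_put("3", {"id": "3"})    # bypasses index maintenance

assert len(table) == 3       # a full scan sees all three rows
assert len(index) == 2       # an index-backed count is stale: 2, not 3

upsert("4", {"id": "4"})     # a later Phoenix-style write
assert len(index) == 3       # row 3 is still missing from the index
```

This matches the transcripts: rows written through Phoenix appear in THBASE1_INDEX_ID, rows written through the HBase shell do not, and later Phoenix writes do not retroactively repair the gap.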
Insert another row through Phoenix:
UPSERT INTO "t_hbase1" VALUES('4','2017-09-18','2017-09-18','4','4');
Query results after the insert:
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select * from THBASE1_INDEX_ID;
+----------+-------+
| info:id | :ROW |
+----------+-------+
| 1 | 1 |
| 2 | 2 |
| 4 | 4 |
+----------+-------+
3 rows selected (0.055 seconds)
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select count(*) from "t_hbase1";
+-----------+
| COUNT(1) |
+-----------+
| 3 |
+-----------+
1 row selected (0.027 seconds)
0: jdbc:phoenix:hdp04,hdp05,hdp06:2181> select * from "t_hbase1";
+------+-------------+-------------+-----+---------+
| ROW | start_date | end_date | id | salary |
+------+-------------+-------------+-----+---------+
| 1 | 2017-09-18 | 2017-09-18 | 1 | 1 |
| 2 | 2017-09-18 | 2017-09-18 | 2 | 2 |
| 3 | 2017-09-18 | 2017-09-18 | 3 | 3 |
| 4 | 2017-09-18 | 2017-09-18 | 4 | 4 |
+------+-------------+-------------+-----+---------+
4 rows selected (0.113 seconds)
Create a Phoenix table:
CREATE TABLE example (
my_pk bigint not null,
m.first_name varchar(50),
m.last_name varchar(50),
CONSTRAINT pk PRIMARY KEY (my_pk));
Create a secondary index:
create index example_first_name_index on example(m.first_name);
Create a data.csv file:
12345,John,Doe
67890,Mary,Poppins
Bulk-load the data:
[root@hdp18 Templates]#
/usr/hdp/2.5.3.0-37/phoenix/bin/psql.py -t EXAMPLE hdp14:2181 /root/Templates/data.csv
Verify that the index table is updated in sync:
0: jdbc:phoenix:hdp14,hdp15> select * from example;
+--------+-------------+------------+
| MY_PK | FIRST_NAME | LAST_NAME |
+--------+-------------+------------+
| 12345 | John | Doe |
| 67890 | Mary | Poppins |
+--------+-------------+------------+
2 rows selected (0.023 seconds)
0: jdbc:phoenix:hdp04,hdp05> select * from example_first_name_index;
+---------------+---------+
| M:FIRST_NAME | :MY_PK |
+---------------+---------+
| John | 12345 |
| Mary | 67890 |
+---------------+---------+
2 rows selected (0.018 seconds)
The results above show that bulk loading automatically updates the index table as well.
Running the same test steps against a table that did not already exist in HBase gives the same results as the pre-existing-table case.