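The big data competition ended two years ago. HBase was not covered back then and I never studied it afterwards, so now that HBase problems come up at work I am completely lost. I am using the weekend to catch up.
The versions I am using are: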
hadoop-2.7.3
zookeeper-3.4.10
hbase-1.2.4
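I still have ready-made scripts from the competition, but they are for a multi-node cluster. This time I only want to learn HBase, so a single node is enough, and I went looking for blog posts by more experienced people.
For configuration, see this post:
However, the startup commands in that post are slightly off; I got everything running with the startup commands from the following post:
Once that is done, run
<your HBase directory>/bin/start-hbase.sh
then run
<your HBase directory>/bin/hbase shell
If you land at the shell prompt, the installation and startup succeeded:
The following commands show the HBase version, the cluster status, and the current user, respectively: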
hbase(main):001:0> version
1.2.4, r67592f3d062743907f8c5ae00dbbe1ae4f69e5af, Tue Oct 25 18:10:20 CDT 2016
hbase(main):002:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
hbase(main):003:0> whoami
root (auth:SIMPLE)
groups: root
hbase(main):003:0>
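The grant command assigns permissions. Security features are disabled on my setup, so the command fails here, but the shell prints its help text: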
hbase(main):005:0> grant 'root', 'RWXCA'
ERROR: DISABLED: Security features are not available
Here is some help for this command:
Grant users specific rights.
Syntax : grant <user>, <permissions> [, <@namespace> [, <table> [, <column family> [, <column qualifier>]]]
permissions is either zero or more letters from the set "RWXCA".
READ('R'), WRITE('W'), EXEC('X'), CREATE('C'), ADMIN('A')
Note: Groups and users are granted access in the same way, but groups are prefixed with an '@' character. In the same way, tables and namespaces are specified, but namespaces are prefixed with an '@' character.
For example:
  hbase> grant 'bobsmith', 'RWXCA'
  hbase> grant '@admins', 'RWXCA'
  hbase> grant 'bobsmith', 'RWXCA', '@ns1'
  hbase> grant 'bobsmith', 'RW', 't1', 'f1', 'col1'
  hbase> grant 'bobsmith', 'RW', 'ns1:t1', 'f1', 'col1'
hbase(main):006:0>
(ADMIN stands for administrative rights.)
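To make the permission letters easier to remember, here is a minimal Python sketch that expands a permission string into the action names listed in the help text. The function name `expand_permissions` is made up for this illustration and is not part of HBase.

```python
# Letter-to-action mapping copied from the grant help text.
PERMISSIONS = {"R": "READ", "W": "WRITE", "X": "EXEC", "C": "CREATE", "A": "ADMIN"}

def expand_permissions(letters):
    """Turn a permission string such as 'RW' into the action names it stands for."""
    unknown = [ch for ch in letters if ch not in PERMISSIONS]
    if unknown:
        raise ValueError(f"unknown permission letters: {unknown}")
    return [PERMISSIONS[ch] for ch in letters]

print(expand_permissions("RWXCA"))
print(expand_permissions("RW"))
```

The help says "zero or more letters", so an empty string is accepted and simply yields no actions.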
Here is some help for this command:
Creates a table. Pass a table name, and a set of column family specifications (at least one), and, optionally, table configuration. Column specification can be a simple string (name), or a dictionary (dictionaries are described below in main help output), necessarily including NAME attribute.
Examples:
Create a table with namespace=ns1 and table qualifier=t1
  hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}
Create a table with namespace=default and table qualifier=t1
  hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
  hbase> # The above in shorthand would be the following:
  hbase> create 't1', 'f1', 'f2', 'f3'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
  hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
Table configuration options can be put at the end.
Examples:
  hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
  hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
  hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
  hbase> # Optionally pre-split the table into NUMREGIONS, using
  hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
  hbase> create 't1', {NAME => 'f1', DFS_REPLICATION => 1}
You can also keep around a reference to the created table:
  hbase> t1 = create 't1', 'f1'
Which gives you a reference to the table named 't1', on which you can then call methods.
hbase(main):002:0>
The following command creates a table named "lol" with two column families, "name" and "technique":
hbase(main):007:0> create 'lol',{NAME=>'name'},{NAME=>'technique'}
0 row(s) in 1.4040 seconds
=> Hbase::Table - lol
hbase(main):008:0>
To delete a table, disable it first and then drop it:
hbase(main):003:0> disable 'lol'
0 row(s) in 3.2680 seconds
hbase(main):004:0> drop 'lol'
0 row(s) in 1.2960 seconds
hbase(main):005:0>
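To copy a table: after creating it, first take a temporary snapshot, then clone the snapshot under the new table name, then delete the temporary snapshot, and finally check the schema of the newly copied table: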
hbase(main):005:0> create 'lol',{NAME=>'name'},{NAME=>'technique'}
0 row(s) in 1.2550 seconds
=> Hbase::Table - lol
hbase(main):006:0> snapshot 'lol','lol_tmp'
0 row(s) in 0.3550 seconds
hbase(main):007:0> clone_snapshot 'lol_tmp','leagueoflengend'
0 row(s) in 0.5940 seconds
hbase(main):008:0> delete
delete delete_all_snapshot delete_snapshot deleteall
hbase(main):008:0> delete_snapshot 'lol_tmp'
0 row(s) in 0.0640 seconds
hbase(main):010:0> desc 'leagueoflengend'
Table leagueoflengend is ENABLED
leagueoflengend
COLUMN FAMILIES DESCRIPTION
{NAME => 'name', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'technique', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0700 seconds
hbase(main):011:0>
list shows all tables:
hbase(main):012:0> list
TABLE
leagueoflengend
lol
2 row(s) in 0.0170 seconds
=> ["leagueoflengend", "lol"]
hbase(main):013:0>
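Now insert the data for "Mechanical Enemy" Rambo into table 'lol'. The scan shows that the row with row key "3" stores Rambo's information (name Rambo, title Mechanical Enemy, Q skill "grilled at high temperature"). The tech column family used here had been added with alter, which is shown further down: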
hbase(main):064:0> put 'lol','3','name:name','Rambo'
0 row(s) in 0.0120 seconds
hbase(main):066:0> put 'lol','3','name:title','Mechanical Enemy'
0 row(s) in 0.0260 seconds
hbase(main):085:0> put 'lol','3','tech:q','grilled at high temperature'
0 row(s) in 0.0410 seconds
hbase(main):067:0> scan 'lol'
ROW COLUMN+CELL
1 column=name:fname, timestamp=1652003547121, value=Yone
1 column=name:tech, timestamp=1652059717967, value=Staggered jade cut
1 column=name:title, timestamp=1652003832088, value=Demon Sword Soul
1 column=tech:q, timestamp=1652061296629, value=Staggered jade cut
2 column=name:name, timestamp=1652059786181, value=Yasuo
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 column=name:title, timestamp=1652059364721, value=Wind Swordsman
3 column=name:name, timestamp=1652059766646, value=Rambo
3 column=name:title, timestamp=1652059808472, value=Mechanical Enemy
3 column=tech:q, timestamp=1652061445488, value=grilled at high temperature
3 row(s) in 0.0510 seconds
hbase(main):068:0>
deleteall: deletes all data in an entire row:
hbase(main):198:0> deleteall 'lol','2'
0 row(s) in 0.0060 seconds
hbase(main):199:0>
delete: deletes the data at the given row, column, and timestamp:
hbase(main):199:0> scan 'lol'
ROW COLUMN+CELL
1 column=name:cn-name, timestamp=1652065791331, value=\xE6\xB0\xB8\xE6\x81\xA9
1 column=name:fname, timestamp=1652003547121, value=Yone
1 column=name:tech, timestamp=1652059717967, value=Staggered jade cut
1 column=name:title, timestamp=1652003832088, value=Demon Sword Soul
1 column=tech:q, timestamp=1652061296629, value=Staggered jade cut
1 column=tech:w, timestamp=1652065118775, value=Spirit Cleave
2 column=name:cn-name, timestamp=1652065856945, value=\xE4\xBA\x9A\xE7\xB4\xA2
2 column=name:name, timestamp=1652059786181, value=Yasuo
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 column=name:title, timestamp=1652059364721, value=Wind Swordsman
3 column=name:cn-name, timestamp=1652065881601, value=\xE5\x85\xB0\xE5\x8D\x9A
3 column=name:name, timestamp=1652059766646, value=Rambo
3 column=name:title, timestamp=1652059808472, value=Mechanical Enemy
3 column=tech:q, timestamp=1652061445488, value=grilled at high temperature
3 row(s) in 0.0410 seconds
hbase(main):200:0> delete 'lol','1','name:tech',1652059717967
0 row(s) in 0.0090 seconds
hbase(main):201:0>
To change a value, simply put to the same cell again:
put 'lol','3','name:name','Rambo_changed'
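Why a second put counts as an update: each cell keeps values keyed by timestamp, and a plain read returns the newest one. Below is a toy Python model of that idea, not HBase code. The `Cell` class and the second timestamp are invented for illustration.

```python
import time

class Cell:
    """Toy model of a single HBase cell: values keyed by timestamp."""

    def __init__(self):
        self.versions = {}

    def put(self, value, timestamp=None):
        # When no timestamp is given, use the current time in milliseconds.
        ts = int(time.time() * 1000) if timestamp is None else timestamp
        self.versions[ts] = value

    def get(self):
        # A plain read returns the value with the newest timestamp.
        if not self.versions:
            return None
        return self.versions[max(self.versions)]

cell = Cell()
cell.put("Rambo", timestamp=1652059766646)
cell.put("Rambo_changed", timestamp=1652059766647)  # hypothetical later write
print(cell.get())  # Rambo_changed
```

With VERSIONS => '1', as in the table descriptions in this post, only the newest value is served, so the old one is effectively overwritten.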
Add a column family:
hbase(main):079:0> alter 'lol',NAME=>'tech'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9890 seconds
hbase(main):080:0> desc 'lol'
Table lol is ENABLED
lol
COLUMN FAMILIES DESCRIPTION
{NAME => 'country', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'name', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'tech', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
3 row(s) in 0.0170 seconds
hbase(main):081:0>
Delete a column family:
hbase(main):088:0> alter 'lol',NAME=>'country',METHOD=>'delete'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.2430 seconds
hbase(main):089:0> desc 'lol'
Table lol is ENABLED
lol
COLUMN FAMILIES DESCRIPTION
{NAME => 'name', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'tech', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0170 seconds
hbase(main):090:0>
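To replace a column family, add the new one first and then delete the old one.
HBase filters can match in two ways, fuzzy or exact, similar to match and term in Elasticsearch:
substring: fuzzy matching. For example (I had already added a tech column to the name column family of the lol table):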
hbase(main):148:0> scan 'lol',FILTER=>"QualifierFilter (=,'substring:te')"
ROW COLUMN+CELL
1 column=name:tech, timestamp=1652059717967, value=Staggered jade cut
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 row(s) in 0.0220 seconds
hbase(main):149:0> scan 'lol',FILTER=>"QualifierFilter (=,'substring:tech')"
ROW COLUMN+CELL
1 column=name:tech, timestamp=1652059717967, value=Staggered jade cut
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 row(s) in 0.0070 seconds
hbase(main):150:0>
binary: exact matching. For example:
hbase(main):146:0> scan 'lol',FILTER=>"QualifierFilter (=,'binary:te')"
ROW COLUMN+CELL
0 row(s) in 0.0280 seconds
hbase(main):147:0> scan 'lol',FILTER=>"QualifierFilter (=,'binary:tech')"
ROW COLUMN+CELL
1 column=name:tech, timestamp=1652059717967, value=Staggered jade cut
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 row(s) in 0.0160 seconds
hbase(main):148:0>
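The difference between the two comparators under the "=" operator can be modelled in a few lines of Python. This is only a sketch of the matching rule, not HBase code; the function names are invented, and it relies on the fact that HBase's substring comparator ignores case.

```python
def binary_equal(candidate: bytes, operand: bytes) -> bool:
    """'binary:' with '=': the whole byte string must be identical."""
    return candidate == operand

def substring_match(candidate: bytes, operand: str) -> bool:
    """'substring:' with '=': the operand occurs anywhere, ignoring case."""
    text = candidate.decode("utf-8", errors="replace")
    return operand.lower() in text.lower()

# Qualifiers from the name column family in the scans of this post.
qualifiers = [b"fname", b"tech", b"title"]

print([q for q in qualifiers if substring_match(q, "te")])   # [b'tech']
print([q for q in qualifiers if binary_equal(q, b"te")])     # []
print([q for q in qualifiers if binary_equal(q, b"tech")])   # [b'tech']
```

This mirrors the shell output above: 'substring:te' finds the tech column, 'binary:te' finds nothing, and 'binary:tech' finds it again.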
get reads the data of a single row. The official help:
Here is some help for this command:
Get row or cell contents; pass table name, row, and optionally a dictionary of column(s), timestamp, timerange and versions. Examples:
  hbase> get 'ns1:t1', 'r1'
  hbase> get 't1', 'r1'
  hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> get 't1', 'r1', {COLUMN => 'c1'}
  hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
  hbase> get 't1', 'r1', 'c1'
  hbase> get 't1', 'r1', 'c1', 'c2'
  hbase> get 't1', 'r1', ['c1', 'c2']
  hbase> get 't1', 'r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
  hbase> get 't1', 'r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
  hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE'}
  hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}
Besides the default 'toStringBinary' format, 'get' also supports custom formatting by column. A user can define a FORMATTER by adding it to the column name in the get specification. The FORMATTER can be stipulated:
 1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
 2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.
Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
  hbase> get 't1', 'r1' {COLUMN => ['cf:qualifier1:toInt', 'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }
Note that you can specify a FORMATTER by column only (cf:qualifier). You cannot specify a FORMATTER for all columns of a column family.
The same commands also can be run on a reference to a table (obtained via get_table or create_table). Suppose you had a reference t to table 't1', the corresponding commands would be:
  hbase> t.get 'r1'
  hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> t.get 'r1', {COLUMN => 'c1'}
  hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> t.get 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
  hbase> t.get 'r1', 'c1'
  hbase> t.get 'r1', 'c1', 'c2'
  hbase> t.get 'r1', ['c1', 'c2']
  hbase> t.get 'r1', {CONSISTENCY => 'TIMELINE'}
  hbase> t.get 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}
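The two commands below query the row with rowkey "1": the first reads only the "tech" column family, the second reads both the "name" and "tech" column families: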
hbase(main):092:0> get 'lol','1',{COLUMN=>'tech'}
COLUMN CELL
tech:q timestamp=1652061296629, value=Staggered jade cut
1 row(s) in 0.0080 seconds
hbase(main):093:0> get 'lol','1',{COLUMN=>['name','tech']}
COLUMN CELL
name:fname timestamp=1652003547121, value=Yone
name:tech timestamp=1652059717967, value=Staggered jade cut
name:title timestamp=1652003832088, value=Demon Sword Soul
tech:q timestamp=1652061296629, value=Staggered jade cut
4 row(s) in 0.0080 seconds
hbase(main):094:0>
To look up a particular value, for example the cell whose value is "Yone":
hbase(main):094:0> get 'lol','1',{FILTER=>"ValueFilter (=,'binary:Yone')"}
COLUMN CELL
name:fname timestamp=1652003547121, value=Yone
1 row(s) in 0.0090 seconds
hbase(main):095:0>
The official help is always thorough. For scan, reading it is basically enough:
Here is some help for this command:
Scan a table; pass table name and optionally a dictionary of scanner specifications. Scanner specifications may include one or more of: TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, ROWPREFIXFILTER, TIMESTAMP, MAXLENGTH or COLUMNS, CACHE or RAW, VERSIONS, ALL_METRICS or METRICS
If no columns are specified, all columns will be scanned. To scan all members of a column family, leave the qualifier empty as in 'col_family'.
The filter can be specified in two ways:
1. Using a filterString - more information on this is available in the Filter Language document attached to the HBASE-4176 JIRA
2. Using the entire package name of the filter.
If you wish to see metrics regarding the execution of the scan, the ALL_METRICS boolean should be set to true. Alternatively, if you would prefer to see only a subset of the metrics, the METRICS array can be defined to include the names of only the metrics you care about.
Some examples:
  hbase> scan 'hbase:meta'
  hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
  hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
  hbase> scan 't1', {REVERSED => true}
  hbase> scan 't1', {ALL_METRICS => true}
  hbase> scan 't1', {METRICS => ['RPC_RETRIES', 'ROWS_FILTERED']}
  hbase> scan 't1', {ROWPREFIXFILTER => 'row2', FILTER => " (QualifierFilter (>=, 'binary:xyz')) AND (TimestampsFilter ( 123, 456))"}
  hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
  hbase> scan 't1', {CONSISTENCY => 'TIMELINE'}
For setting the Operation Attributes
  hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
  hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
For experts, there is an additional option -- CACHE_BLOCKS -- which switches block caching for the scanner on (true) or off (false). By default it is enabled. Examples:
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
Also for experts, there is an advanced option -- RAW -- which instructs the scanner to return all cells (including delete markers and uncollected deleted cells). This option cannot be combined with requesting specific COLUMNS. Disabled by default. Example:
  hbase> scan 't1', {RAW => true, VERSIONS => 10}
Besides the default 'toStringBinary' format, 'scan' supports custom formatting by column. A user can define a FORMATTER by adding it to the column name in the scan specification. The FORMATTER can be stipulated:
 1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
 2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.
Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
  hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt', 'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }
Note that you can specify a FORMATTER by column only (cf:qualifier). You cannot specify a FORMATTER for all columns of a column family.
Scan can also be used directly from a table, by first getting a reference to a table, like such:
  hbase> t = get_table 't'
  hbase> t.scan
Note in the above situation, you can still provide all the filtering, columns, options, etc as described above.
For example, to view the first two rows of table "lol":
Without any restriction, scan returns all the data in the table.
hbase(main):068:0> scan 'lol',{LIMIT=>2}
ROW COLUMN+CELL
1 column=name:fname, timestamp=1652003547121, value=Yone
1 column=name:tech, timestamp=1652059717967, value=Staggered jade cut
1 column=name:title, timestamp=1652003832088, value=Demon Sword Soul
2 column=name:name, timestamp=1652059786181, value=Yasuo
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 column=name:title, timestamp=1652059364721, value=Wind Swordsman
2 row(s) in 0.0330 seconds
hbase(main):069:0>
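To restrict the scan to a range of row keys:
Note that the range is half-open, [STARTROW, ENDROW): the start row is included and the end row is not. (The help above lists the end option as STOPROW.)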
hbase(main):069:0> scan 'lol',{STARTROW=>'2',ENDROW=>'3'}
ROW COLUMN+CELL
2 column=name:name, timestamp=1652059786181, value=Yasuo
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 column=name:title, timestamp=1652059364721, value=Wind Swordsman
1 row(s) in 0.0340 seconds
hbase(main):070:0> scan 'lol',{STARTROW=>'2',ENDROW=>'4'}
ROW COLUMN+CELL
2 column=name:name, timestamp=1652059786181, value=Yasuo
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 column=name:title, timestamp=1652059364721, value=Wind Swordsman
3 column=name:name, timestamp=1652059766646, value=Rambo
3 column=name:title, timestamp=1652059808472, value=Mechanical Enemy
2 row(s) in 0.0090 seconds
hbase(main):071:0>
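The half-open range is easy to get wrong, especially because row keys are compared as bytes rather than as numbers. Here is a small Python sketch of the rule, assuming row keys are kept in byte-wise sorted order as HBase does. The function `scan_range` and the extra row key "10" are made up for this illustration.

```python
from bisect import bisect_left

def scan_range(sorted_keys, start_row, stop_row):
    """Return the keys k with start_row <= k < stop_row, in byte-wise order."""
    lo = bisect_left(sorted_keys, start_row)
    hi = bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]

# "10" is a hypothetical extra row to show that ordering is byte-wise.
keys = sorted([b"1", b"2", b"3", b"10"])

print(keys)                          # [b'1', b'10', b'2', b'3']
print(scan_range(keys, b"2", b"3"))  # [b'2']
print(scan_range(keys, b"2", b"4"))  # [b'2', b'3']
```

Note how "10" sorts before "2". If row keys are meant to be numeric, zero-padding them to a fixed width keeps the byte order and the numeric order the same.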
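To restrict by timestamp, use TimestampsFilter. It keeps only the cells whose timestamp equals one of the listed values; the arguments are exact timestamps, not the two ends of a range: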
hbase(main):074:0> scan 'lol',{FILTER=>"(TimestampsFilter (1652003547121,1652059643106))"}
ROW COLUMN+CELL
1 column=name:fname, timestamp=1652003547121, value=Yone
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 row(s) in 0.0340 seconds
hbase(main):075:0>
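A small Python model makes the "exact values, not a range" behaviour visible. It is only a sketch; `timestamps_filter` is an invented name, and the three cells are taken from the scan output earlier in this post.

```python
# (row key, column, timestamp, value)
cells = [
    ("1", "name:fname", 1652003547121, "Yone"),
    ("1", "name:title", 1652003832088, "Demon Sword Soul"),
    ("2", "name:tech", 1652059643106, "Chopping Steel Flash"),
]

def timestamps_filter(cells, *timestamps):
    """Keep only the cells whose timestamp is one of the given values."""
    wanted = set(timestamps)
    return [cell for cell in cells if cell[2] in wanted]

kept = timestamps_filter(cells, 1652003547121, 1652059643106)
for row, column, ts, value in kept:
    print(row, column, ts, value)
```

The name:title cell has a timestamp that lies between the two arguments, yet it is dropped, which matches the shell output above. For a real time range, the scan help lists the TIMERANGE option.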
hbase(main):047:0> show_filters
DependentColumnFilter
KeyOnlyFilter
ColumnCountGetFilter
SingleColumnValueFilter
PrefixFilter
SingleColumnValueExcludeFilter
FirstKeyOnlyFilter
ColumnRangeFilter
TimestampsFilter
FamilyFilter
QualifierFilter
ColumnPrefixFilter
RowFilter
MultipleColumnPrefixFilter
InclusiveStopFilter
PageFilter
ValueFilter
ColumnPaginationFilter
hbase(main):048:0>
RowFilter: filters on the row key. For example, the following command returns the data whose rowkey starts with "1":
hbase(main):051:0> scan 'lol',FILTER=>"RowFilter(=,'binaryprefix:1')"
ROW COLUMN+CELL
1 column=name:fname, timestamp=1652003547121, value=Yone
1 column=name:title, timestamp=1652003832088, value=Demon Sword Soul
1 row(s) in 0.0240 seconds
hbase(main):052:0>
PrefixFilter: filters on a row key prefix. The command above can also be written like this:
hbase(main):056:0> scan 'lol',FILTER=>"PrefixFilter('1')"
ROW COLUMN+CELL
1 column=name:fname, timestamp=1652003547121, value=Yone
1 column=name:title, timestamp=1652003832088, value=Demon Sword Soul
1 row(s) in 0.0260 seconds
hbase(main):057:0>
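Both RowFilter with 'binaryprefix:' and PrefixFilter boil down to a "starts with" test on the row key bytes. A minimal Python sketch of that test follows; the function name and the extra row keys "10" and "21" are hypothetical and only there to show that a prefix can match more than one row.

```python
def prefix_filter(row_keys, prefix: bytes):
    """Keep the row keys that start with the given byte prefix."""
    return [key for key in row_keys if key.startswith(prefix)]

# Hypothetical row keys, in byte-wise sorted order.
row_keys = [b"1", b"10", b"2", b"21", b"3"]

print(prefix_filter(row_keys, b"1"))  # [b'1', b'10']
print(prefix_filter(row_keys, b"2"))  # [b'2', b'21']
```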
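FirstKeyOnlyFilter: returns only the first cell of each logical row. It is handy for a quick look at what a table contains, and it also makes row counting more efficient: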
hbase(main):098:0> scan 'lol',{FILTER=>"FirstKeyOnlyFilter()"}
ROW COLUMN+CELL
1 column=name:fname, timestamp=1652003547121, value=Yone
2 column=name:name, timestamp=1652059786181, value=Yasuo
3 column=name:name, timestamp=1652059766646, value=Rambo
3 row(s) in 0.0130 seconds
hbase(main):099:0>
We can also use count directly to get the number of rows:
hbase(main):194:0> scan 'lol'
ROW COLUMN+CELL
1 column=name:cn-name, timestamp=1652065791331, value=\xE6\xB0\xB8\xE6\x81\xA9
1 column=name:fname, timestamp=1652003547121, value=Yone
1 column=name:tech, timestamp=1652059717967, value=Staggered jade cut
1 column=name:title, timestamp=1652003832088, value=Demon Sword Soul
1 column=tech:q, timestamp=1652061296629, value=Staggered jade cut
1 column=tech:w, timestamp=1652065118775, value=Spirit Cleave
2 column=name:cn-name, timestamp=1652065856945, value=\xE4\xBA\x9A\xE7\xB4\xA2
2 column=name:name, timestamp=1652059786181, value=Yasuo
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 column=name:title, timestamp=1652059364721, value=Wind Swordsman
3 column=name:cn-name, timestamp=1652065881601, value=\xE5\x85\xB0\xE5\x8D\x9A
3 column=name:name, timestamp=1652059766646, value=Rambo
3 column=name:title, timestamp=1652059808472, value=Mechanical Enemy
3 column=tech:q, timestamp=1652061445488, value=grilled at high temperature
4 column=name:name, timestamp=1652085807213, value=Foyego
4 row(s) in 0.0120 seconds
hbase(main):195:0> count 'lol'
4 row(s) in 0.0070 seconds
=> 4
hbase(main):196:0>
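Why FirstKeyOnlyFilter helps with counting: one cell per row is enough to know the row exists, so the remaining cells never need to be returned. A toy Python version of the idea, with an invented function name and a shortened copy of the data above:

```python
# (row key, column, value) in scan order: sorted by row key, then by column.
cells = [
    ("1", "name:fname", "Yone"),
    ("1", "name:title", "Demon Sword Soul"),
    ("2", "name:name", "Yasuo"),
    ("2", "name:title", "Wind Swordsman"),
    ("3", "name:name", "Rambo"),
]

def first_key_only(cells):
    """Keep only the first cell encountered for each row key."""
    seen = set()
    firsts = []
    for row, column, value in cells:
        if row not in seen:
            seen.add(row)
            firsts.append((row, column, value))
    return firsts

firsts = first_key_only(cells)
print(firsts)
print(len(firsts))  # 3, the number of rows
```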
FamilyFilter: filters on the column family name. For example, to find the data whose column family name contains "te":
hbase(main):101:0> scan 'lol',FILTER=>"FamilyFilter (=,'substring:te')"
ROW COLUMN+CELL
1 column=tech:q, timestamp=1652061296629, value=Staggered jade cut
3 column=tech:q, timestamp=1652061445488, value=grilled at high temperature
2 row(s) in 0.0270 seconds
hbase(main):102:0>
QualifierFilter: filters on the column name (qualifier). For example, to find the data in columns whose name contains "tech":
hbase(main):104:0> scan 'lol',FILTER=>"QualifierFilter (=,'substring:tech')"
ROW COLUMN+CELL
1 column=name:tech, timestamp=1652059717967, value=Staggered jade cut
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 row(s) in 0.0080 seconds
hbase(main):105:0>
ColumnPrefixFilter: filters on a column name prefix. For example, to find the data in columns whose name starts with "f":
hbase(main):106:0> scan 'lol',FILTER=>"ColumnPrefixFilter('f')"
ROW COLUMN+CELL
1 column=name:fname, timestamp=1652003547121, value=Yone
1 row(s) in 0.0060 seconds
hbase(main):107:0>
MultipleColumnPrefixFilter: filters on several column name prefixes at once. For example:
hbase(main):107:0> scan 'lol',FILTER=>"MultipleColumnPrefixFilter('na','f')"
ROW COLUMN+CELL
1 column=name:fname, timestamp=1652003547121, value=Yone
2 column=name:name, timestamp=1652059786181, value=Yasuo
3 column=name:name, timestamp=1652059766646, value=Rambo
3 row(s) in 0.0190 seconds
hbase(main):108:0>
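The rule behind MultipleColumnPrefixFilter is "the qualifier starts with any one of the prefixes". A short Python sketch, with an invented function name and the qualifiers that appear in this post:

```python
def multiple_column_prefix_filter(qualifiers, *prefixes):
    """Keep the qualifiers that start with at least one of the prefixes."""
    # str.startswith accepts a tuple and returns True if any entry matches.
    return [q for q in qualifiers if q.startswith(prefixes)]

qualifiers = ["fname", "name", "q", "tech", "title"]

print(multiple_column_prefix_filter(qualifiers, "na", "f"))  # ['fname', 'name']
```

The result matches the scan above, which returned only name:fname and name:name.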
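ColumnRangeFilter: filters columns by a range of column names. The two booleans say whether the lower and the upper bound are inclusive. With true and false, as used below, the range is closed on the left and open on the right, just like STARTROW and ENDROW: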
hbase(main):110:0> scan 'lol'
ROW COLUMN+CELL
1 column=name:fname, timestamp=1652003547121, value=Yone
1 column=name:tech, timestamp=1652059717967, value=Staggered jade cut
1 column=name:title, timestamp=1652003832088, value=Demon Sword Soul
1 column=tech:q, timestamp=1652061296629, value=Staggered jade cut
1 column=tech:w, timestamp=1652065118775, value=Spirit Cleave
2 column=name:name, timestamp=1652059786181, value=Yasuo
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 column=name:title, timestamp=1652059364721, value=Wind Swordsman
3 column=name:name, timestamp=1652059766646, value=Rambo
3 column=name:title, timestamp=1652059808472, value=Mechanical Enemy
3 column=tech:q, timestamp=1652061445488, value=grilled at high temperature
3 row(s) in 0.0120 seconds
hbase(main):111:0> scan 'lol',FILTER=>"ColumnRangeFilter ('na',true,'te',false)"
ROW COLUMN+CELL
1 column=tech:q, timestamp=1652061296629, value=Staggered jade cut
2 column=name:name, timestamp=1652059786181, value=Yasuo
3 column=name:name, timestamp=1652059766646, value=Rambo
3 column=tech:q, timestamp=1652061445488, value=grilled at high temperature
3 row(s) in 0.0530 seconds
hbase(main):112:0> scan 'lol',FILTER=>"ColumnRangeFilter ('na',true,'wa',false)"
ROW COLUMN+CELL
1 column=name:tech, timestamp=1652059717967, value=Staggered jade cut
1 column=name:title, timestamp=1652003832088, value=Demon Sword Soul
1 column=tech:q, timestamp=1652061296629, value=Staggered jade cut
1 column=tech:w, timestamp=1652065118775, value=Spirit Cleave
2 column=name:name, timestamp=1652059786181, value=Yasuo
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 column=name:title, timestamp=1652059364721, value=Wind Swordsman
3 column=name:name, timestamp=1652059766646, value=Rambo
3 column=name:title, timestamp=1652059808472, value=Mechanical Enemy
3 column=tech:q, timestamp=1652061445488, value=grilled at high temperature
3 row(s) in 0.0330 seconds
hbase(main):113:0>
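To check which qualifiers fall inside a range, the comparison can be replayed in Python. This is a sketch of the range rule only; the function name is invented, and plain strings stand in for the byte-wise comparison HBase performs (for ASCII names the ordering is the same).

```python
def column_range_filter(qualifiers, min_col, min_inclusive, max_col, max_inclusive):
    """Keep the qualifiers between min_col and max_col, honouring the inclusive flags."""
    def in_range(q):
        above = q >= min_col if min_inclusive else q > min_col
        below = q <= max_col if max_inclusive else q < max_col
        return above and below
    return [q for q in qualifiers if in_range(q)]

# Qualifiers that appear in the lol table in this post.
qualifiers = ["fname", "name", "q", "tech", "title", "w"]

print(column_range_filter(qualifiers, "na", True, "te", False))
# ['name', 'q']
print(column_range_filter(qualifiers, "na", True, "wa", False))
# ['name', 'q', 'tech', 'title', 'w']
print(column_range_filter(qualifiers, "na", True, "tech", True))
# ['name', 'q', 'tech']
```

"tech" sorts after "te" because "te" is a prefix of it, so the first query leaves it out. Note also that the filter looks only at the qualifier, not at the column family, which is why tech:q shows up in the first scan.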
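ValueFilter: filters on the cell value. Beforehand I had inserted each hero's Chinese name into the lol table. As the scan shows, HBase stores the text as bytes, and by default the shell prints the UTF-8 bytes of Chinese characters as \x hexadecimal escapes: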
hbase(main):117:0> scan 'lol'
ROW COLUMN+CELL
1 column=name:cn-name, timestamp=1652065791331, value=\xE6\xB0\xB8\xE6\x81\xA9
1 column=name:fname, timestamp=1652003547121, value=Yone
1 column=name:tech, timestamp=1652059717967, value=Staggered jade cut
1 column=name:title, timestamp=1652003832088, value=Demon Sword Soul
1 column=tech:q, timestamp=1652061296629, value=Staggered jade cut
1 column=tech:w, timestamp=1652065118775, value=Spirit Cleave
2 column=name:cn-name, timestamp=1652065856945, value=\xE4\xBA\x9A\xE7\xB4\xA2
2 column=name:name, timestamp=1652059786181, value=Yasuo
2 column=name:tech, timestamp=1652059643106, value=Chopping Steel Flash
2 column=name:title, timestamp=1652059364721, value=Wind Swordsman
3 column=name:cn-name, timestamp=1652065881601, value=\xE5\x85\xB0\xE5\x8D\x9A
3 column=name:name, timestamp=1652059766646, value=Rambo
3 column=name:title, timestamp=1652059808472, value=Mechanical Enemy
3 column=tech:q, timestamp=1652061445488, value=grilled at high temperature
3 row(s) in 0.0230 seconds
hbase(main):118:0> scan 'lol',FILTER=>"ValueFilter (=,'substring:永恩')"
ROW COLUMN+CELL
1 column=name:cn-name, timestamp=1652065791331, value=\xE6\xB0\xB8\xE6\x81\xA9
1 row(s) in 0.0290 seconds
hbase(main):145:0> scan 'lol',FILTER=>"ValueFilter (=,'substring:ne')"
ROW COLUMN+CELL
1 column=name:fname, timestamp=1652003547121, value=Yone
3 column=name:title, timestamp=1652059808472, value=Mechanical Enemy
2 row(s) in 0.0300 seconds
hbase(main):119:0>
Tip: to display Chinese text in the HBase shell, append the toString formatter to the column:
hbase(main):144:0> scan 'lol',{COLUMNS => 'name:cn-name:toString'}
ROW COLUMN+CELL
1 column=name:cn-name, timestamp=1652065791331, value=永恩
2 column=name:cn-name, timestamp=1652065856945, value=亚索
3 column=name:cn-name, timestamp=1652065881601, value=兰博
3 row(s) in 0.0210 seconds
hbase(main):145:0>
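The \x escapes are just the UTF-8 bytes of the text, which is easy to confirm in Python. The helper `to_string_binary` below is my own rough imitation of the shell's default rendering (printable ASCII as is, everything else as \xNN); it is an approximation, not HBase's actual implementation.

```python
def to_string_binary(data: bytes) -> str:
    """Render printable ASCII as is and every other byte as \\xNN."""
    parts = []
    for byte in data:
        if 32 <= byte <= 126 and byte != ord("\\"):
            parts.append(chr(byte))
        else:
            parts.append(f"\\x{byte:02X}")
    return "".join(parts)

raw = "永恩".encode("utf-8")

print(to_string_binary(raw))      # \xE6\xB0\xB8\xE6\x81\xA9
print(to_string_binary(b"Yone"))  # Yone
print(raw.decode("utf-8"))        # 永恩
```

The first line reproduces the value shown for name:cn-name in row 1, and decoding the same bytes as UTF-8 gives the text back, which is what the toString formatter does in the scan above.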
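This post only covers very basic syntax. HBase has many more features that are not shown here, such as importing data, and I look forward to learning them next.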