
Common HBase Filter Operations

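HBase offers many kinds of filters; this article introduces seven commonly used ones. To make the examples concrete, we first use the HBase Shell to create a students table and write several rows into it.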

I. Data Preparation

(1) Create the table students in the default namespace, with two column families, info and score.

    hbase:002:0> create 'students','info','score'
    2024-03-26 00:22:15,810 INFO [main] client.HBaseAdmin (HBaseAdmin.java:postOperationResult(3591)) - Operation: CREATE, Table Name: default:students, procId: 290 completed
    Created table students
    Took 3.1425 seconds
    => Hbase::Table - students

(2) Write five rows into the students table, then check the result with scan 'students'.

    hbase:005:0> put 'students','s001','info:name','Jack'
    Took 30.6978 seconds
    hbase:017:0> put 'students','s001','info:age','18'
    Took 0.0419 seconds
    hbase:019:0> put 'students','s001','score:English','95'
    Took 0.0472 seconds
    hbase:021:0> put 'students','s002','info:name','Tom'
    Took 0.0255 seconds
    hbase:022:0> put 'students','s002','info:age','20'
    Took 0.0160 seconds
    hbase:023:0> put 'students','s002','score:Chinese','85'
    Took 0.0296 seconds
    hbase:024:0> put 'students','s002','score:Math','90'
    Took 0.0155 seconds
    hbase:025:0> put 'students','s003','info:name','Mike'
    Took 0.0188 seconds
    hbase:026:0> put 'students','s003','info:age','19'
    Took 0.0183 seconds
    hbase:027:0> put 'students','s003','score:Chinese','90'
    Took 0.0178 seconds
    hbase:028:0> put 'students','s003','score:Math','95'
    Took 0.0445 seconds
    hbase:029:0> put 'students','s004','info:name','Lucy'
    Took 0.0104 seconds
    hbase:030:0> put 'students','s004','score:English','100'
    Took 0.0170 seconds
    hbase:031:0> put 'students','s005','info:name','Lily'
    Took 0.0249 seconds
    hbase:032:0> put 'students','s005','score:Chinese','99'
    Took 0.0228 seconds
    hbase:033:0> scan 'students'
    ROW COLUMN+CELL
    s001 column=info:age, timestamp=2024-03-26T00:25:17.982, value=18
    s001 column=info:name, timestamp=2024-03-26T00:24:39.510, value=Jack
    s001 column=score:English, timestamp=2024-03-26T00:25:52.207, value=95
    s002 column=info:age, timestamp=2024-03-26T00:26:46.922, value=20
    s002 column=info:name, timestamp=2024-03-26T00:26:26.924, value=Tom
    s002 column=score:Chinese, timestamp=2024-03-26T00:27:13.181, value=85
    s002 column=score:Math, timestamp=2024-03-26T00:27:28.787, value=90
    s003 column=info:age, timestamp=2024-03-26T00:28:08.402, value=19
    s003 column=info:name, timestamp=2024-03-26T00:27:48.629, value=Mike
    s003 column=score:Chinese, timestamp=2024-03-26T00:28:46.714, value=90
    s003 column=score:Math, timestamp=2024-03-26T00:29:01.881, value=95
    s004 column=info:name, timestamp=2024-03-26T00:29:19.868, value=Lucy
    s004 column=score:English, timestamp=2024-03-26T00:29:44.831, value=100
    s005 column=info:name, timestamp=2024-03-26T00:30:04.231, value=Lily
    s005 column=score:Chinese, timestamp=2024-03-26T00:30:25.477, value=99
    5 row(s)
    Took 0.3369 seconds

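II. Using the Filters

1. ValueFilter

ValueFilter filters on the value of each cell. The comparators used here are byte-wise binary comparison (binary) and substring matching (substring). The substring comparator is case-insensitive and is intended for the = and != operators only.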

(1) Binary comparison of values

Use get to fetch, from row s001 of the students table, the cells whose value is Jack.

    # ValueFilter(=,'binary:Jack') is a value filter that uses the binary comparator
    hbase:034:0> get 'students','s001',{FILTER=>"ValueFilter(=,'binary:Jack')"}
    COLUMN CELL
    info:name timestamp=2024-03-26T00:24:39.510, value=Jack
    1 row(s)
    Took 0.6506 seconds

Use scan to find the cells in the students table whose value is 90.

    # The matches span several rows, so scan the whole table; get reads a single row only
    hbase:036:0> scan 'students',{FILTER=>"ValueFilter(=,'binary:90')"}
    ROW COLUMN+CELL
    s002 column=score:Math, timestamp=2024-03-26T00:27:28.787, value=90
    s003 column=score:Chinese, timestamp=2024-03-26T00:28:46.714, value=90
    2 row(s)
    Took 0.2162 seconds

(2) Substring matching

Use get to fetch, from row s001 of the students table, the cells whose value contains the substring ac.

    hbase:037:0> get 'students','s001',{FILTER=>"ValueFilter(=,'substring:ac')"}
    COLUMN CELL
    info:name timestamp=2024-03-26T00:24:39.510, value=Jack
    1 row(s)
    Took 0.1578 seconds

Use scan to find the cells in the students table whose value contains the substring 0.

    # The matches span several rows, so scan the whole table; get reads a single row only
    hbase:038:0> scan 'students',{FILTER=>"ValueFilter(=,'substring:0')"}
    ROW COLUMN+CELL
    s002 column=info:age, timestamp=2024-03-26T00:26:46.922, value=20
    s002 column=score:Math, timestamp=2024-03-26T00:27:28.787, value=90
    s003 column=score:Chinese, timestamp=2024-03-26T00:28:46.714, value=90
    s004 column=score:English, timestamp=2024-03-26T00:29:44.831, value=100
    3 row(s)
    Took 0.0868 seconds

2. QualifierFilter

QualifierFilter filters on the column qualifier only and ignores the column family name. It is most often used with the binary comparator.

Use get to fetch, from row s001 of the students table, the cells whose column qualifier is name.

    hbase:039:0> get 'students','s001',{FILTER=>"QualifierFilter(=,'binary:name')"}
    COLUMN CELL
    info:name timestamp=2024-03-26T00:24:39.510, value=Jack
    1 row(s)
    Took 0.3310 seconds

Use scan to find the cells in the students table whose column qualifier is name.

    hbase:041:0> scan 'students',{FILTER=>"QualifierFilter(=,'binary:name')"}
    ROW COLUMN+CELL
    s001 column=info:name, timestamp=2024-03-26T00:24:39.510, value=Jack
    s002 column=info:name, timestamp=2024-03-26T00:26:26.924, value=Tom
    s003 column=info:name, timestamp=2024-03-26T00:27:48.629, value=Mike
    s004 column=info:name, timestamp=2024-03-26T00:29:19.868, value=Lucy
    s005 column=info:name, timestamp=2024-03-26T00:30:04.231, value=Lily
    5 row(s)
    Took 0.0845 seconds

3. ColumnPrefixFilter

ColumnPrefixFilter filters on a prefix of the column qualifier. A prefix match must start at the first character of the qualifier, whereas a substring match may start at any position. The prefix match is case-sensitive.

Use get to fetch, from row s002 of the students table, the cells whose column qualifier starts with Chi.

    hbase:042:0> get 'students','s002',{FILTER=>"ColumnPrefixFilter('Chi')"}
    COLUMN CELL
    score:Chinese timestamp=2024-03-26T00:27:13.181, value=85
    1 row(s)
    Took 0.1693 seconds

Use scan to find the cells in the students table whose column qualifier starts with Chi.

    hbase:044:0> scan 'students',{FILTER=>"ColumnPrefixFilter('Chi')"}
    ROW COLUMN+CELL
    s002 column=score:Chinese, timestamp=2024-03-26T00:27:13.181, value=85
    s003 column=score:Chinese, timestamp=2024-03-26T00:28:46.714, value=90
    s005 column=score:Chinese, timestamp=2024-03-26T00:30:25.477, value=99
    3 row(s)
    Took 0.0397 seconds
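
The prefix rule described for ColumnPrefixFilter can be illustrated with a minimal Python sketch. It assumes the filter compares the raw bytes of the qualifier against the raw bytes of the prefix; the qualifier list is copied from the students table, and column_prefix_filter is a hypothetical helper, not an HBase API.

```python
# Qualifiers present in the students table. Family names are left out on
# purpose: the prefix is matched against the qualifier only.
QUALIFIERS = [b"age", b"name", b"English", b"Chinese", b"Math"]


def column_prefix_filter(qualifiers, prefix: bytes):
    """Keep qualifiers that start with the given bytes (case-sensitive)."""
    return [q for q in qualifiers if q.startswith(prefix)]


print(column_prefix_filter(QUALIFIERS, b"Chi"))  # prefix at position 0: matches Chinese
print(column_prefix_filter(QUALIFIERS, b"chi"))  # different case: no match
print(column_prefix_filter(QUALIFIERS, b"ine"))  # not at position 0: no match
print([q for q in QUALIFIERS if b"ine" in q])    # a substring test does match Chinese
```

The last two lines contrast the two kinds of match: ine occurs inside Chinese, so a substring test accepts it, while the prefix test rejects it.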

4. RowFilter

RowFilter filters cells by their row key.

Note: RowFilter is rarely useful with get. A get must already name one complete row key, so there is nothing left to filter on.

(1) Binary comparison

Use scan to select all cells in the students table whose row key equals s001.

    hbase:045:0> scan 'students',{FILTER=>"RowFilter(=,'binary:s001')"}
    ROW COLUMN+CELL
    s001 column=info:age, timestamp=2024-03-26T00:25:17.982, value=18
    s001 column=info:name, timestamp=2024-03-26T00:24:39.510, value=Jack
    s001 column=score:English, timestamp=2024-03-26T00:25:52.207, value=95
    1 row(s)
    Took 0.1297 seconds

(2) Substring matching

Use scan to select all cells in the students table whose row key contains the substring 01.

    hbase:046:0> scan 'students',{FILTER=>"RowFilter(=,'substring:01')"}
    ROW COLUMN+CELL
    s001 column=info:age, timestamp=2024-03-26T00:25:17.982, value=18
    s001 column=info:name, timestamp=2024-03-26T00:24:39.510, value=Jack
    s001 column=score:English, timestamp=2024-03-26T00:25:52.207, value=95
    1 row(s)
    Took 0.3426 seconds

5. PrefixFilter

PrefixFilter filters on a prefix of the row key. The match must start at the first character of the row key and is case-sensitive.

Use scan to select the rows in the students table whose row key starts with s00.

    hbase:047:0> scan 'students',{FILTER=>"PrefixFilter('s00')"}
    ROW COLUMN+CELL
    s001 column=info:age, timestamp=2024-03-26T00:25:17.982, value=18
    s001 column=info:name, timestamp=2024-03-26T00:24:39.510, value=Jack
    s001 column=score:English, timestamp=2024-03-26T00:25:52.207, value=95
    s002 column=info:age, timestamp=2024-03-26T00:26:46.922, value=20
    s002 column=info:name, timestamp=2024-03-26T00:26:26.924, value=Tom
    s002 column=score:Chinese, timestamp=2024-03-26T00:27:13.181, value=85
    s002 column=score:Math, timestamp=2024-03-26T00:27:28.787, value=90
    s003 column=info:age, timestamp=2024-03-26T00:28:08.402, value=19
    s003 column=info:name, timestamp=2024-03-26T00:27:48.629, value=Mike
    s003 column=score:Chinese, timestamp=2024-03-26T00:28:46.714, value=90
    s003 column=score:Math, timestamp=2024-03-26T00:29:01.881, value=95
    s004 column=info:name, timestamp=2024-03-26T00:29:19.868, value=Lucy
    s004 column=score:English, timestamp=2024-03-26T00:29:44.831, value=100
    s005 column=info:name, timestamp=2024-03-26T00:30:04.231, value=Lily
    s005 column=score:Chinese, timestamp=2024-03-26T00:30:25.477, value=99
    5 row(s)
    Took 0.4404 seconds
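
Because HBase keeps rows sorted by row key, the rows that share a prefix form one contiguous key range. The plain-Python sketch below computes that range; prefix_stop_row and rows_with_prefix are hypothetical helpers written for this illustration. It shows the idea behind pairing a prefix with a start row and is not a description of how PrefixFilter itself is implemented.

```python
from bisect import bisect_left
from typing import List, Optional

ROW_KEYS = [b"s001", b"s002", b"s003", b"s004", b"s005"]  # sorted, as in HBase


def prefix_stop_row(prefix: bytes) -> Optional[bytes]:
    """Smallest key that is greater than every key starting with prefix.

    Returns None when no such key exists (empty prefix or only 0xff bytes),
    meaning the range runs to the end of the table.
    """
    stripped = prefix.rstrip(b"\xff")  # trailing 0xff bytes cannot be incremented
    if not stripped:
        return None
    return stripped[:-1] + bytes([stripped[-1] + 1])


def rows_with_prefix(sorted_keys: List[bytes], prefix: bytes) -> List[bytes]:
    """Select the keys in [prefix, stop) by binary search on the sorted key list."""
    start = bisect_left(sorted_keys, prefix)
    stop_row = prefix_stop_row(prefix)
    stop = len(sorted_keys) if stop_row is None else bisect_left(sorted_keys, stop_row)
    return sorted_keys[start:stop]


print(prefix_stop_row(b"s00"))
print(rows_with_prefix(ROW_KEYS, b"s00"))
print(rows_with_prefix(ROW_KEYS, b"S00"))  # upper-case S: a different byte
```

For the prefix s00 the range is [s00, s01), which covers all five rows of the students table. This is one reason a prefix scan on a large table is commonly given a STARTROW equal to the prefix, so that the scan need not begin at the first row of the table.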

6. FamilyFilter

FamilyFilter filters on the column family name. Its comparators include binary comparison (binary) and substring matching (substring), among others.

(1) Binary comparison

Use scan to select the cells in the students table whose column family is named info.

    hbase:005:0> scan 'students',FILTER=>"FamilyFilter(=,'binary:info')"
    ROW COLUMN+CELL
    s001 column=info:age, timestamp=2024-03-26T00:25:17.982, value=18
    s001 column=info:name, timestamp=2024-03-26T00:24:39.510, value=Jack
    s002 column=info:age, timestamp=2024-03-26T00:26:46.922, value=20
    s002 column=info:name, timestamp=2024-03-26T00:26:26.924, value=Tom
    s003 column=info:age, timestamp=2024-03-26T00:28:08.402, value=19
    s003 column=info:name, timestamp=2024-03-26T00:27:48.629, value=Mike
    s004 column=info:name, timestamp=2024-03-26T00:29:19.868, value=Lucy
    s005 column=info:name, timestamp=2024-03-26T00:30:04.231, value=Lily
    5 row(s)
    Took 0.0399 seconds

(2) Substring matching

Use scan to select the cells in the students table whose column family name contains the substring s.

    hbase:008:0> scan 'students',FILTER=>"FamilyFilter(=,'substring:s')"
    ROW COLUMN+CELL
    s001 column=score:English, timestamp=2024-03-26T00:25:52.207, value=95
    s002 column=score:Chinese, timestamp=2024-03-26T00:27:13.181, value=85
    s002 column=score:Math, timestamp=2024-03-26T00:27:28.787, value=90
    s003 column=score:Chinese, timestamp=2024-03-26T00:28:46.714, value=90
    s003 column=score:Math, timestamp=2024-03-26T00:29:01.881, value=95
    s004 column=score:English, timestamp=2024-03-26T00:29:44.831, value=100
    s005 column=score:Chinese, timestamp=2024-03-26T00:30:25.477, value=99
    5 row(s)
    Took 0.0915 seconds

7. SingleColumnValueFilter

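SingleColumnValueFilter filters on the cell value of a single column, identified by its column family and column qualifier, much like the SQL statement "SELECT column FROM table WHERE column = value". The filter decides whether each row is kept; the COLUMN option in the examples below limits the returned cells to the tested column.

(1) Binary comparison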

Use scan to select, from column family info and qualifier age of the students table, the cells whose value is 19.

    hbase:006:0> scan 'students',{COLUMN=>'info:age',FILTER=>"SingleColumnValueFilter('info','age',=,'binary:19')"}
    ROW COLUMN+CELL
    s003 column=info:age, timestamp=2024-03-26T00:28:08.402, value=19
    1 row(s)
    Took 0.4166 seconds

(2) Substring matching

Use scan to select, from column family info and qualifier name of the students table, the cells whose value contains the substring y.

    hbase:008:0> scan 'students',{COLUMN=>'info:name',FILTER=>"SingleColumnValueFilter('info','name',=,'substring:y')"}
    ROW COLUMN+CELL
    s004 column=info:name, timestamp=2024-03-26T00:29:19.868, value=Lucy
    s005 column=info:name, timestamp=2024-03-26T00:30:04.231, value=Lily
    2 row(s)
    Took 0.0658 seconds
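
SingleColumnValueFilter differs from ValueFilter in that it accepts or rejects a whole row on the basis of one column. The Python sketch below models that behaviour on a hand-copied subset of the students table. The names single_column_value_filter and select_columns are hypothetical, and filter_if_missing mirrors the filter's documented filterIfMissing setting, which defaults to false, so rows lacking the tested column pass. Under these assumptions the sketch suggests why the scans above also set COLUMN: narrowing the scan to the tested column leaves rows without that column with no cells to return.

```python
# Rows of the students table as {row key: {"family:qualifier": value}} (subset).
ROWS = {
    "s002": {"info:age": b"20", "info:name": b"Tom", "score:Math": b"90"},
    "s003": {"info:age": b"19", "info:name": b"Mike", "score:Math": b"95"},
    "s004": {"info:name": b"Lucy", "score:English": b"100"},  # no info:age
    "s005": {"info:name": b"Lily", "score:Chinese": b"99"},   # no info:age
}


def single_column_value_filter(rows, column, expected, filter_if_missing=False):
    """Keep whole rows whose `column` equals `expected` (byte-wise).

    Rows that lack the column are kept unless filter_if_missing is True.
    """
    kept = {}
    for key, cells in rows.items():
        if column not in cells:
            if not filter_if_missing:
                kept[key] = cells
        elif cells[column] == expected:
            kept[key] = cells
    return kept


def select_columns(rows, column):
    """Model of COLUMN => '...': keep one column and drop rows left with no cells."""
    return {key: {column: cells[column]} for key, cells in rows.items() if column in cells}


default = single_column_value_filter(ROWS, "info:age", b"19")
strict = single_column_value_filter(ROWS, "info:age", b"19", filter_if_missing=True)
narrowed = single_column_value_filter(select_columns(ROWS, "info:age"), "info:age", b"19")

print(sorted(default))  # rows without info:age pass by default
print(sorted(strict))   # only the row whose info:age is 19
print(narrowed)         # COLUMN narrowing plus filter, as in the scan above
```

In this model, omitting the column restriction returns every cell of s003 and also lets s004 and s005 through, because they have no info:age cell to test. Either the column restriction or filter_if_missing=True removes them.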


 
