赞
踩
RANGE 分区根据分区表定义时为每个分区建立的分区键值范围,将数据映射到相应的分区中。它是常见的分区类型,经常跟日期类型一起使用。例如,可以将业务日志表按日/周/月分区。
(1)简单语法
CREATE TABLE table_name (
column_name1 column_type
[, column_nameN column_type]
) PARTITION BY RANGE ( expr(column_name1) | column_name1)
(
PARTITION p0 VALUES LESS THAN ( expr )
[, PARTITION pN VALUES LESS THAN (expr ) ]
[, PARTITION pX VALUES LESS THAN (MAXVALUE) ]
);
说明:range分区需要遵循以下规则:
① PARTITION BY RANGE ( expr ) 里的expr表达式的结果必须为整型。
② 每个分区都有一个VALUES LESS THAN子句,它为分区指定一个非包含的上限值。分区键的任何值等于或大于这个值时将被映射到下一个分区中。
③ 除第一个分区外,所有分区都隐含一个下限值,即上一个分区的上限值。
④ 仅允许最后一个分区的上限定义为MAXVALUE,这个值没有具体的数值,并且比其他所有分区的上限都要大,也包含空值。
⑤ 如果要按时间类型列做 RANGE 分区,则必须使用 Timestamp 类型,并且使用函数 UNIX_TIMESTAMP 将时间类型转换为数值
(2)案例
① mysql模式:
obclient>CREATE TABLE t_log_part_by_range ( log_id bigint NOT NULL , log_value varchar(50) , log_date timestamp NOT NULL ) PARTITION BY RANGE(UNIX_TIMESTAMP(log_date)) ( PARTITION M202001 VALUES LESS THAN(UNIX_TIMESTAMP('2020/02/01')) , PARTITION M202002 VALUES LESS THAN(UNIX_TIMESTAMP('2020/03/01')) , PARTITION M202003 VALUES LESS THAN(UNIX_TIMESTAMP('2020/04/01')) , PARTITION M202004 VALUES LESS THAN(UNIX_TIMESTAMP('2020/05/01')) , PARTITION M202005 VALUES LESS THAN(UNIX_TIMESTAMP('2020/06/01')) , PARTITION M202006 VALUES LESS THAN(UNIX_TIMESTAMP('2020/07/01')) , PARTITION M202007 VALUES LESS THAN(UNIX_TIMESTAMP('2020/08/01')) , PARTITION M202008 VALUES LESS THAN(UNIX_TIMESTAMP('2020/09/01')) , PARTITION M202009 VALUES LESS THAN(UNIX_TIMESTAMP('2020/10/01')) , PARTITION M202010 VALUES LESS THAN(UNIX_TIMESTAMP('2020/11/01')) , PARTITION M202011 VALUES LESS THAN(UNIX_TIMESTAMP('2020/12/01')) , PARTITION M202012 VALUES LESS THAN(UNIX_TIMESTAMP('2021/01/01')) );
② Oracle模式:
obclient> CREATE TABLE t_log_part_by_range ( log_id number NOT NULL , log_value varchar2(50) , log_date date NOT NULL DEFAULT sysdate ) PARTITION BY RANGE(log_date) ( PARTITION M202001 VALUES LESS THAN(TO_DATE('2020/02/01','YYYY/MM/DD')) , PARTITION M202002 VALUES LESS THAN(TO_DATE('2020/03/01','YYYY/MM/DD')) , PARTITION M202003 VALUES LESS THAN(TO_DATE('2020/04/01','YYYY/MM/DD')) , PARTITION M202004 VALUES LESS THAN(TO_DATE('2020/05/01','YYYY/MM/DD')) , PARTITION M202005 VALUES LESS THAN(TO_DATE('2020/06/01','YYYY/MM/DD')) , PARTITION M202006 VALUES LESS THAN(TO_DATE('2020/07/01','YYYY/MM/DD')) , PARTITION M202007 VALUES LESS THAN(TO_DATE('2020/08/01','YYYY/MM/DD')) , PARTITION M202008 VALUES LESS THAN(TO_DATE('2020/09/01','YYYY/MM/DD')) , PARTITION M202009 VALUES LESS THAN(TO_DATE('2020/10/01','YYYY/MM/DD')) , PARTITION M202010 VALUES LESS THAN(TO_DATE('2020/11/01','YYYY/MM/DD')) , PARTITION M202011 VALUES LESS THAN(TO_DATE('2020/12/01','YYYY/MM/DD')) , PARTITION M202012 VALUES LESS THAN(TO_DATE('2021/01/01','YYYY/MM/DD')) , PARTITION MMAX VALUES LESS THAN (MAXVALUE) );
① RANGE COLUMNS 分区的分区键的结果不要求是整型,可以是任意类型。
② RANGE COLUMS 分区的分区键不能使用表达式。
③ RANGE COLUMNS 分区的分区键可以写多个列(即列向量)。
CREATE TABLE table_name (
column_name1 column_type
[, column_nameN column_type]
) PARTITION BY RANGE ( column_name1 [, column_name2] )
(
PARTITION p0 VALUES LESS THAN ( expr )
[, PARTITION pN VALUES LESS THAN (expr ) ]
[, PARTITION pX VALUES LESS THAN (maxvalue) ]
);
obclient>CREATE TABLE t_log_part_by_range_columns ( log_id bigint NOT NULL , log_value varchar(50) , log_date date NOT NULL ) PARTITION BY RANGE COLUMNS(log_date) ( PARTITION M202001 VALUES LESS THAN('2020/02/01') , PARTITION M202002 VALUES LESS THAN('2020/03/01') , PARTITION M202003 VALUES LESS THAN('2020/04/01') , PARTITION M202004 VALUES LESS THAN('2020/05/01') , PARTITION M202005 VALUES LESS THAN('2020/06/01') , PARTITION M202006 VALUES LESS THAN('2020/07/01') , PARTITION M202007 VALUES LESS THAN('2020/08/01') , PARTITION M202008 VALUES LESS THAN('2020/09/01') , PARTITION M202009 VALUES LESS THAN('2020/10/01') , PARTITION M202010 VALUES LESS THAN('2020/11/01') , PARTITION M202011 VALUES LESS THAN('2020/12/01') , PARTITION M202012 VALUES LESS THAN('2021/01/01') , PARTITION MMAX VALUES LESS THAN MAXVALUE );
CREATE TABLE table_name (
column_name1 column_type
[, column_nameN column_type]
) PARTITION BY LIST ( expr(column_name1) | column_name1)
(
PARTITION p0 VALUES IN ( v01 [, v0N] )
[, PARTITION pN VALUES IN ( vN1 [, vNN] ) ]
[, PARTITION pX VALUES IN (default) ]
);
规则:
① 分区表达式的结果必须是整型。
② 分区表达式只能引用一列,不能有多列(即列向量)
obclient>CREATE TABLE t_part_by_list (
c1 BIGINT PRIMARY KEY
, c2 VARCHAR(50)
) PARTITION BY list(c1)
(
PARTITION p0 VALUES IN (1, 2, 3)
, PARTITION p1 VALUES IN (5, 6)
, PARTITION p2 VALUES IN (DEFAULT)
);
(2)Oracle模式
obclient>CREATE TABLE t_part_by_list ( obclient> CREATE TABLE t_log_part_by_list ( log_id number NOT NULL , log_value varchar2(50) , log_date date NOT NULL DEFAULT sysdate , PRIMARY key(log_id, log_value) ) PARTITION BY list(log_value) ( PARTITION P01 VALUES ( '01' ) , PARTITION P02 VALUES ( '02' ) , PARTITION P03 VALUES ( '03' ) , PARTITION P04 VALUES ( '04' ) , PARTITION P05 VALUES ( '05' ) ); Query OK, 0 rows affected (0.10 sec) obclient> INSERT INTO t_log_part_by_list(log_id, log_value) values(1,'01'),(2,'02'),(3,'03'),(4,'04'),(5,'05'),(6,'01'); Query OK, 6 rows affected (0.01 sec) Records: 6 Duplicates: 0 Warnings: 0
CREATE TABLE table_name (
column_name1 column_type
[, column_nameN column_type]
) PARTITION BY LIST COLUMNS ( column_name1 [, column_nameN ] )
(
PARTITION p0 VALUES IN ( v01 [, v0N] )
[, PARTITION pN VALUES IN ( vN1 [, vNN] ) ]
[, PARTITION pX VALUES IN (default) ]
);
obclient>CREATE TABLE t2 (
id varchar(64),
type varchar(16),
info varchar(512),
gmt_create datetime(6),
gmt_modified datetime(6),
partition_id varchar(2) GENERATED ALWAYS AS (substr(`id`,19,20)) VIRTUAL,
PRIMARY KEY (id)
) partition by list columns(partition_id)
(partition p0 values in ('00','01'),
partition p1 values in ('02','03'),
partition p2 values in (default));
① 不能指定数据的分区键的列表特征。
② 不同范围内的数据大小相差非常大,并且很难手动调整均衡。
③ 使用 RANGE 分区后数据聚集严重。
④ 并行 DML、分区剪枝和分区连接等性能非常重要
obclient>CREATE TABLE ware(
w_id int
, w_ytd number(12,2)
, w_tax number(4,4)
, w_name varchar(10)
, w_street_1 varchar(20)
, w_street_2 varchar(20)
, w_city varchar(20)
, w_state char(2)
, w_zip char(9)
, primary key(w_id)
) PARTITION by hash(w_id) partitions 60;
① KEY 分区的分区键不要求为整型,可以为任意类型
② KEY 分区的分区键不能使用表达式
③ KEY 分区的分区键支持向量
④ KEY 分区的分区键中不指定任何列时,表示 KEY 分区的分区键是主键
obclient>CREATE TABLE t_log_part_by_key(
id INT,
gmt_create DATETIME,
info VARCHAR(20))
PARTITION BY KEY(id, gmt_create)
PARTITIONS 3;
CREATE TABLE ....
partition BY [hash|range|list] (column_list)
subpartition BY [hash|range|list] (column_list)
subpartition template
(
subpartition subpart_name subpartition_define
, ...
)
(
partition part_name partition_define
, ...
obclient> CREATE TABLE t_range_range (c1 int, c2 int, c3 int) partition BY range(c1)
subpartition BY range (c2)
subpartition template
(
subpartition rp1 VALUES less than (100),
subpartition rp2 VALUES less than (200),
subpartition rp3 VALUES less than (300)
)
(
partition p0 VALUES less than (100),
partition p1 VALUES less than (200),
partition p2 VALUES less than (300)
);
CREATE TABLE ....
partition BY [hash|range|list] (column_list)
subpartition BY [hash|range|list] (column_list)
(
partition part_name partition_define
(
subpartition subpart_name subpartition_define
, ...
)
, ...
)
obclient> CREATE TABLE t_range_range1 (c1 int, c2 int, c3 int) partition BY range(c1) subpartition BY range (c2) ( partition p0 VALUES less than (100) ( subpartition p0_r1 VALUES less than (100), subpartition p0_r2 VALUES less than (200), subpartition p0_r3 VALUES less than (300) ), partition p1 VALUES less than (200) ( subpartition p1_r1 VALUES less than (100), subpartition p1_r2 VALUES less than (200), subpartition p1_r3 VALUES less than (300) ), partition p2 VALUES less than (300) ( subpartition p2_r1 VALUES less than (100), subpartition p2_r2 VALUES less than (200), subpartition p2_r3 VALUES less than (300) ) );
obclient> CREATE TABLE t1
(
c1 INT,
c2 INT
)
PARTITION BY HASH(c1) partitions 5;
obclient> SELECT * FROM t1 WHERE c1 = 1;
obclient> CREATE TABLE t1 ( c1 INT, c2 INT ) PARTITION BY HASH(c1 + c2) partitions 5; obclient> EXPLAIN SELECT * FROM t1 WHERE c1 + c2 = 1 \G *************************** 1. row *************************** Query Plan: =================================== |ID|OPERATOR |NAME|EST. ROWS|COST| ----------------------------------- |0 |TABLE SCAN|t1 |5 |1303| =================================== Outputs & filters: ------------------------------------- 0 - output([t1.c1], [t1.c2]), filter([t1.c1 + t1.c2 = 1]), access([t1.c1], [t1.c2]), partitions(p1)
(2) RANGE分区
通过 where 子句的分区键的范围跟表定义的分区范围的交集来确定需要访问的分区。对于 Range 分区,因为考虑到函数的单调性,如果分区表达式是一个函数并且查询条件是一个范围,则不支持分区裁剪。
案例:
分区条件为表达式,而查询条件为非等值条件(c1 < 150 and c1 > 100),则无法进行分区裁剪
obclient> CREATE TABLE t1 ( c1 INT, c2 INT ) PARTITION BY RANGE(c1 + 1) ( PARTITION p0 VALUES less than (100), PARTITION p1 VALUES less than (200) ); obclient> EXPLAIN SELECT * FROM t1 WHERE c1 < 150 and c1 > 110 \G *************************** 1. row *************************** Query Plan: ============================================ |ID|OPERATOR |NAME|EST. ROWS|COST| -------------------------------------------- |0 |EXCHANGE IN DISTR | |19 |1410| |1 | EXCHANGE OUT DISTR| |19 |1303| |2 | TABLE SCAN |t1 |19 |1303| ============================================ Outputs & filters: ------------------------------------- 0 - output([t1.c1], [t1.c2]), filter(nil) 1 - output([t1.c1], [t1.c2]), filter(nil) 2 - output([t1.c1], [t1.c2]), filter([t1.c1 < 150], [t1.c1 > 110]), access([t1.c1], [t1.c2]), partitions(p[0-1])
如果查询条件是等值条件,则可以进行分区裁剪。示例如下
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 = 150 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |1 |1303|
===================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2]), filter([t1.c1 = 150]),
access([t1.c1], [t1.c2]), partitions(p1)
obclient> CREATE TABLE t1 ( c1 INT , c2 INT ) PARTITION BY hash(c1) SUBPARTITION BY RANGE(c2) SUBPARTITION template ( SUBPARTITION sp0 VALUES less than(100), SUBPARTITION sp1 VALUES less than(200) ) partitions 5 SELECT * FROM t1 WHERE (c1 = 1 OR c1 = 2) AND (c2 > 101 AND c2 < 150) obclient> EXPLAIN SELECT * FROM t1 WHERE (c1 = 1 or c1 = 2) and (c2 > 101 and c2 < 150) \G *************************** 1. row *************************** Query Plan: ============================================ |ID|OPERATOR |NAME|EST. ROWS|COST| -------------------------------------------- |0 |EXCHANGE IN DISTR | |1 |1403| |1 |EXCHANGE OUT DISTR| |1 |1303| |2 | TABLE SCAN |t1 |1 |1303| ============================================ Outputs & filters: ------------------------------------- 0 - output([t1.c1], [t1.c2]), filter(nil) 1 - output([t1.c1], [t1.c2]), filter(nil) 2 - output([t1.c1], [t1.c2]), filter([t1.c1 = 1 OR t1.c1 = 2], [t1.c2 > 101], [t1.c2 < 150]), access([t1.c1], [t1.c2]), partitions(p1sp1, p2sp1)
obclient> SELECT * FROM t1 partition (p0);
obclient> CREATE TABLE t1 ( c1 INT, c2 INT ) PARTITION BY hash(c1) SUBPARTITION BY RANGE(c2) SUBPARTITION template ( SUBPARTITION sp0 VALUES less than(100), SUBPARTITION sp1 VALUES less than(200) ) partitions 5 obclient> SELECT partition_id FROM __all_meta_table JOIN __all_table using(table_id) WHERE table_name = 't1'; +---------------------+ | partition_id | +---------------------+ | 1152921504875282432 | | 1152921504875282433 | | 1152921509170249728 | | 1152921509170249729 | | 1152921513465217024 | | 1152921513465217025 | | 1152921517760184320 | | 1152921517760184321 | | 1152921522055151616 | | 1152921522055151617 | +---------------------+
obclient> CREATE TABLE t1(a int primary key, b int) PARTITION BY hash(a) partitions 5;
obclient> CREATE INDEX idx ON t1(b) local;
obclient> CREATE TABLE t2(a int primary key, b int) PARTITION BY hash(a) partitions 5;
obclient> CREATE UNIQUE INDEX uk ON t2(b) LOCAL;
ERROR 1503 (HY000): A UNIQUE INDEX must include all columns in the table's partitioning function
obclient> CREATE UNIQUE INDEX uk2 on t2(b, a) local;
Query OK, 0 rows affected (5.32 sec)
obclient> EXPLAIN SELECT /*+index(t1 idx)*/ b FROM t1 WHERE b=1 AND a=1\G
*************************** 1. row ***************************
Query Plan: =====================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
-------------------------------------
|0 |TABLE GET|t1(idx)|1 |52 |
=====================================
Outputs & filters:
-------------------------------------
0 - output([t1.b]), filter(nil),
access([t1.b]), partitions(p1)
1 row in set (0.01 sec)
(2)如果在查询中,没有指定分区键,那么局部索引将无法进行分区裁剪,这时会扫描所有分区,增加额外的扫描代价。下述示例语句为查询条件中不指定分区键:
obclient> EXPLAIN SELECT /*+index(t1 idx)*/ b FROM t1 WHERE b=1\G *************************** 1. row *************************** Query Plan: ==================================================== |ID|OPERATOR |NAME |EST. ROWS|COST| ---------------------------------------------------- |0 |EXCHANGE IN DISTR | |4950 |3551| |1 | EXCHANGE OUT DISTR |:EX10000|4950 |3083| |2 | PX PARTITION ITERATOR| |4950 |3083| |3 | TABLE SCAN |t1(idx) |4950 |3083| ==================================================== Outputs & filters: ------------------------------------- 0 - output([t1.b]), filter(nil) 1 - output([t1.b]), filter(nil), dop=1 2 - output([t1.b]), filter(nil) 3 - output([t1.b]), filter(nil), access([t1.b]), partitions(p[0-4]) 1 row in set (0.01 sec)
全局索引的创建规则是在索引属性中指定 GLOBAL 关键字。与局部索引相比,全局索引最大的特点是全局索引的分区规则跟表分区是相互独立的,全局索引允许指定自己的分区规则和分区个数,不一定需要跟表分区规则保持一致
案例:
obclient> CREATE TABLE t1(a int PRIMARY KEY, b int, c int) PARTITION BY hash(a) partitions 5;
obclient> CREATE INDEX gkey ON t1(b) GLOBAL PARTITION BY range(b) (
partition p0 VALUES less than (1),
partition p1 VALUES less than (2),
partition p2 VALUES less than (3)
);
全局索引的分区键一定是索引键本身,因此在使用全局索引的过程中就会指定索引分区键的查询条件,我们可以针对索引的分区规则进行分区裁剪,在查询到索引键值后可以利用索引表中存储的主键信息计算出主表的分区位置,进而对主表也能进行快速的分区定位,避免扫描主表的所有分区,因此对于无法指定主表分区键的查询而言,全局索引在一定条件下能够加速查询的检索效率。
案例如下:
obclient> EXPLAIN SELECT /*+index(t1 gkey)*/ * FROM t1 WHERE b=1\G
*************************** 1. row ***************************
Query Plan: ==========================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
------------------------------------------
|0 |TABLE LOOKUP|T1 |4950 |38645|
|1 | TABLE SCAN |T1(GKEY)|4950 |1115 |
==========================================
Outputs & filters:
-------------------------------------
0 - output([T1.A], [T1.B], [T1.C]), filter(nil),
partitions(p[0-4])
1 - output([T1.A]), filter(nil),
access([T1.A]), partitions(p1)
obclient> SET GLOBAL ob_timestamp_service=LTS;
Query OK, 0 rows affected (0.06 sec)
obclient> CREATE TABLE t1(a int, b int, PRIMARY KEY(a));
Query OK, 0 rows affected (0.17 sec)
obclient> CREATE INDEX gkey ON t1(b) PARTITION BY range(b) (PARTITION p0 VALUES LESS THAN (1), PARTITION p1 VALUES LESS THAN (2), PARTITION p2 VALUES LESS THAN(3));
ERROR 1235 (0A000): create global index when GTS is off not supported
由于 OceanBase 数据库的表是索引组织表(IOT),对于分区表而言,为了保证指定主键的查询能很快定位到表所在的分区,所以分区键必须是主键的子集。如果需要在分区表上创建局部分区唯一索引(Local Partitioned Unique Index),则该索引键需要包含主表的分区键,而对于全局分区唯一索引(Global Partitioned Unique Index)并没有这个限制
obclient> CREATE TABLE test(pk int,c2 int ,c3 int, PRIMARY KEY(pk)) PARTITION BY hash(pk) partitions 5;
Query OK, 0 rows affected (0.20 sec)
obclient> CREATE UNIQUE INDEX idx ON test(c2) LOCAL;
ERROR 1503 (HY000): A UNIQUE INDEX must include all columns in the table's partitioning function
obclient> CREATE UNIQUE INDEX idx ON test(c2, pk) LOCAL;
Query OK, 0 rows affected (5.34 sec)
obclient> DROP INDEX idx ON test;
Query OK, 0 rows affected (0.02 sec)
obclient> CREATE UNIQUE INDEX idx ON test(c2) GLOBAL;
Query OK, 0 rows affected (17.47 sec)
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。