赞
踩
Paimon支持schema evolution将数据插入到Paimon表中,添加的列将实时同步到Paimon表,并且无需重启同步作业。
目前支持的同步方式如下:
什么是 Schema Evolution (模式演变)
假设有一个名为tableA
的MySQL表,它有三个字段:field_1
、field_2
、field_3
,想将此MySQL表加载到Paimon时,可以在Flink SQL中执行如下操作,或使用MySqlSyncTableAction。
Flink SQL:
在Flink SQL中,如果在插入后更改MySQL表的表模式(表结构),表模式更改将不会同步到Paimon。
MySqlSyncTableAction:
在MySqlSyncTableAction中,如果在摄取后更改MySQL表的表模式,表模式更改将同步到Paimon,新添加的field_4
的数据也将同步到Paimon。
Schema Change Evolution(模式变化进化)
cdc Ingestion支持的模式更改行为有限,该框架无法重命名表、删除列,因此RENAME TABLE
和DROP COLUMN
的行为将被忽略,RENAME COLUMN
将添加新列。目前支持的模式更改包括:
Computed Functions(计算函数)
Function | Description |
---|---|
year(date-column) | Extract year from a DATE, DATETIME or TIMESTAMP (or its corresponding string format). Output is an INT value represent the year. |
month(date-column) | Extract month of year from a DATE, DATETIME or TIMESTAMP (or its corresponding string format). Output is an INT value represent the month of year. |
day(date-column) | Extract day of month from a DATE, DATETIME or TIMESTAMP (or its corresponding string format). Output is an INT value represent the day of month. |
hour(date-column) | Extract hour from a DATE, DATETIME or TIMESTAMP (or its corresponding string format). Output is an INT value represent the hour. |
minute(date-column) | Extract minute from a DATE, DATETIME or TIMESTAMP (or its corresponding string format). Output is an INT value represent the minute. |
second(date-column) | Extract second from a DATE, DATETIME or TIMESTAMP (or its corresponding string format). Output is an INT value represent the second. |
date_format(date-column,format) | Convert date format from a DATE, DATETIME or TIMESTAMP (or its corresponding string format). ‘format’ is compatible with Java’s DateTimeFormatter String (for example, ‘yyyy-MM-dd’). Output is a string value in converted date format. |
substring(column,beginInclusive) | Get column.substring(beginInclusive). Output is a STRING. |
substring(column,beginInclusive,endExclusive) | Get column.substring(beginInclusive,endExclusive). Output is a STRING. |
truncate(column,width) | truncate column by width. Output type is same with column.If the column is a STRING, truncate(column,width) will truncate the string to width characters, namely value.substring(0, width) . If the column is an INT or LONG, truncate(column,width) will truncate the number with the algorithm v - (((v % W) + W) % W) . The redundant compute part is to keep the result always positive. If the column is a DECIMAL, truncate(column,width) will truncate the decimal with the algorithm: let scaled_W = decimal(W, scale(v)) , then return v - (v % scaled_W) . |
tinyint1-not-bool
(使用--type_mapping
),那么该列将映射到Paimon表中的TINYINT。to-nullable
(使用--type_mapping
)来忽略所有NOT NULL约束(主键除外)。to-string
(使用--type_mapping
)将所有MySQL数据类型映射到字符串。char-to-string
(使用--type_mapping
)将MySQL CHAR(长度)/VARCHAR(长度)类型映射到STRING。longtext-to-bytes
(使用--type_mapping
)将MySQL LONGTEXT类型映射到BYTES。BIGINT UNSIGNED
,BIGINT UNSIGNED ZEROFILL
,SERIAL
将默认映射到DECIMAL(20, 0)
可以使用类型映射选项bigint-unsigned-to-bigint
(使用--type_mapping
)将这些类型映射到Paimon BIGINT
,但存在潜在的数据溢出,因为BIGINT UNSIGNED
可以存储多达20位的整数值,而Paimon BIGINT
只能存储多达19位的整数值。因此,应确保使用此选项时不会发生溢出。Custom Job Settings(自定义作业设置)
Checkpointing(检查点)
使用-Dexecution.checkpointing.interval=
启用检查点并设置时间间隔,对于0.7及更高版本,如果尚未启用检查点,Paimon将默认启用检查点,并将检查点间隔设置为180秒。
Job Name
使用-Dpipeline.name=
设置自定义同步作业的名称。
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。