
AWS in Action: Syncing Data from Aurora to Redshift

What is Aurora

  • Amazon Aurora is a fully managed, cloud-based relational database service compatible with MySQL and PostgreSQL. Fully managed means the database is administered automatically, including data backups, hardware provisioning, and software updates.

  • Amazon Aurora delivers enterprise-grade performance.

  • Amazon Aurora offers a range of instance types with different CPU, memory, and bandwidth configurations, so you can pick the one that fits your workload.

  • Amazon Aurora is scalable: it grows or shrinks storage automatically as needed, from a minimum of 10 GB up to 64 TB in 10 GB increments, without affecting database performance.

What is Redshift

  • Amazon Redshift is a fully managed, cloud-based data warehouse service. Developers can use Redshift to build, query, and maintain petabytes of data or more.

  • Every Redshift data warehouse is fully managed, meaning administrative tasks such as backup creation, security, and configuration are automated.

  • Each data warehouse in Redshift consists of a set of nodes. The nodes are organized into clusters, which makes it easy to run analytical queries over large volumes of data.

  • Many existing SQL-based clients, and a wide range of data sources and data analysis tools, work with Redshift. It has a stable architecture and interacts easily with a variety of business intelligence tools.

  • Because Redshift is designed to handle large amounts of data, its modular design lets it scale easily, and its multi-layer structure makes it simple to process many queries at once.

  • Slices can be created from a Redshift cluster, allowing finer-grained inspection of datasets.

  • Redshift exposes performance metrics that can be used to track the health of the databases.

Why sync data from Aurora to Redshift

  • Aurora is a row-based database, which makes it best suited for transactional (OLTP) workloads such as web applications. Want to look up a user name by user ID? That's easy with Aurora.

  • Redshift is a column-based database designed to run complex queries over large volumes of data efficiently, which makes it best suited for analytical (OLAP) workloads, such as analyzing user behavior on an e-commerce site.

  • For example, Aurora can power an e-commerce site's core features (user management, order management, product management), while Redshift powers the query-heavy features that scan huge datasets, such as user behavior analysis and sales analysis.
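The row-store vs. column-store distinction shows up in the shape of the queries themselves. A minimal illustration, using SQLite as a stand-in engine (not Aurora or Redshift) with a made-up `orders` table:

```python
import sqlite3

# SQLite stands in for both engines here; the point is the *shape* of the
# queries, not the storage format.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 100, 9.5), (2, 100, 20.0), (3, 200, 5.0)])

# OLTP-style point lookup: fetch one row by key (what a row store is tuned for).
row = conn.execute("SELECT amount FROM orders WHERE order_id = 2").fetchone()

# OLAP-style aggregation: scan one column across many rows (what a column
# store is tuned for).
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM orders GROUP BY user_id ORDER BY user_id"
).fetchall()
print(row)     # (20.0,)
print(totals)  # [(100, 29.5), (200, 5.0)]
```

A row store answers the first query by reading one record; a column store answers the second by scanning only the `user_id` and `amount` columns, which is why each engine fits its workload.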

AWS Database Migration Service

  • AWS Database Migration Service (AWS DMS) manages and migrates databases in the cloud.

  • With AWS DMS, customers can sync data across many popular databases, including Microsoft SQL Server, Amazon Aurora, and MySQL.

  • Using AWS DMS, you can run a one-off migration or set up continuous synchronization.

  • Before syncing databases, a replication instance must be created in the cloud. This instance reads the source data and loads it into the target.

How to sync data from Aurora to Redshift with AWS DMS

  • In a real-world deployment, the web server sits in a VPC public subnet and accepts connections from any client.

  • The database usually sits in a VPC private subnet and only accepts connections from the web servers in the same VPC, to keep the database secure.

  • The instance DMS uses for syncing also only needs to live in a private subnet; there is no need to expose it publicly.

  • So the first step is to create a VPC, laid out as described below.

The following sections cover the resources to create for each part.

IAM

If resource creation fails with a permission error like the one below, grant the user the corresponding permissions on the IAM page of the AWS web console.

User: arn:aws:iam::****:user/*** is not authorized to perform: dms:CreateReplicationSubnetGroup on resource: arn:aws:dms:ap-southeast-1:****:subgrp:aurora-to-redshift-instance because no identity-based policy allows the dms:CreateReplicationSubnetGroup action (Service: AWSDatabaseMigrationService; Status Code: 400; Error Code: AccessDeniedException; ; Proxy: null)

The permissions used in this article are:

  • AmazonS3FullAccess: AWS managed policy

  • AmazonVPCFullAccess: AWS managed policy

  • AmazonRDSFullAccess: AWS managed policy

  • AmazonRedshiftFullAccess: AWS managed policy

  • There is no AWS managed policy for DMS, so create one yourself with the following content:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "dms:*",
            "Resource": "*"
        }
    ]
}

VPC

  • vpc
    • CIDR block: 10.10.0.0/16

  • internet gateway

  • public subnets
    • three Availability Zones

  • private subnets
    • three Availability Zones

  • public route table: the route table associated with the public subnets
    • destination: 0.0.0.0/0, target: internet-gateway-id (traffic to and from the internet)
    • destination: 10.10.0.0/16, target: local (traffic inside the VPC)

  • private route table: the route table associated with the private subnets
    • destination: 10.10.0.0/16, target: local (traffic inside the VPC)

  • web server security group
    • allow any IP to access port 443
    • allow your own IP to access port 22, so you can SSH into the server and insert data into the database

  • private dms security group
    • only allow access from inside the VPC

  • private aurora security group
    • allow the web server security group to access port 3306
    • allow the dms security group to access port 3306

  • private redshift security group
    • allow the web server security group to access port 5439
    • allow the dms security group to access port 5439
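Before turning the layout above into CloudFormation, the CIDR plan can be sanity-checked with a few lines of Python. The values below are copied from the layout above; `ipaddress` is a standard-library module:

```python
import ipaddress

# VPC and subnet CIDRs from the layout above.
vpc = ipaddress.ip_network("10.10.0.0/16")
subnets = {
    "SubnetAPublic":  "10.10.0.0/24",
    "SubnetBPublic":  "10.10.32.0/24",
    "SubnetCPublic":  "10.10.64.0/24",
    "SubnetAPrivate": "10.10.16.0/24",
    "SubnetBPrivate": "10.10.48.0/24",
    "SubnetCPrivate": "10.10.80.0/24",
}
nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}

# Every subnet must sit inside the VPC CIDR ...
assert all(n.subnet_of(vpc) for n in nets.values())

# ... and no two subnets may overlap.
names = list(nets)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        assert not nets[a].overlaps(nets[b]), (a, b)

print("subnet layout is consistent")
```

Running this before deploying catches the overlapping-CIDR errors that CloudFormation would otherwise only report mid-stack-creation.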

The serverless file for all VPC resources:

  • replace custom:bucketNamePrefix with your own bucket

  • replace resources:Parameters:MyIP with your own public IP

service: aurora-to-redshift-vpc
custom:
  bucketNamePrefix: "bucketName"
provider:
  name: aws
  region: ${opt:region, "ap-southeast-1"}
  stackName: ${self:service}
  deploymentBucket:
    name: com.${self:custom.bucketNamePrefix}.deploy-bucket
    serverSideEncryption: AES256
resources:
  Parameters:
    VpcName:
      Type: String
      Default: "test-vpc"
    MyIP:
      Type: String
      Default: "your public ip"
  Resources:
    VPC:
      Type: "AWS::EC2::VPC"
      Properties:
        CidrBlock: "10.10.0.0/16"
        EnableDnsSupport: true
        EnableDnsHostnames: true
        InstanceTenancy: default
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}"
    # Internet Gateway
    InternetGateway:
      Type: "AWS::EC2::InternetGateway"
      Properties:
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}_InternetGateway"
    VPCGatewayAttachment:
      Type: "AWS::EC2::VPCGatewayAttachment"
      Properties:
        VpcId: !Ref VPC
        InternetGatewayId: !Ref InternetGateway
    # web server security group
    WebServerSecurityGroup:
      Type: AWS::EC2::SecurityGroup
      Properties:
        GroupDescription: Allow access from public
        VpcId: !Ref VPC
        SecurityGroupIngress:
          - IpProtocol: tcp
            FromPort: 443
            ToPort: 443
            CidrIp: "0.0.0.0/0"
          - IpProtocol: tcp
            FromPort: 22
            ToPort: 22
            CidrIp: !Sub "${MyIP}/32"
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}_WebServerSecurityGroup"
    # public route table
    RouteTablePublic:
      Type: "AWS::EC2::RouteTable"
      Properties:
        VpcId: !Ref VPC
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}_RouteTablePublic"
    RouteTablePublicInternetRoute:
      Type: "AWS::EC2::Route"
      DependsOn: VPCGatewayAttachment
      Properties:
        RouteTableId: !Ref RouteTablePublic
        DestinationCidrBlock: "0.0.0.0/0"
        GatewayId: !Ref InternetGateway
    # public subnets
    SubnetAPublic:
      Type: "AWS::EC2::Subnet"
      Properties:
        AvailabilityZone: !Select [0, !GetAZs ""]
        CidrBlock: "10.10.0.0/24"
        MapPublicIpOnLaunch: true
        VpcId: !Ref VPC
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}_SubnetAPublic"
    RouteTableAssociationAPublic:
      Type: "AWS::EC2::SubnetRouteTableAssociation"
      Properties:
        SubnetId: !Ref SubnetAPublic
        RouteTableId: !Ref RouteTablePublic
    SubnetBPublic:
      Type: "AWS::EC2::Subnet"
      Properties:
        AvailabilityZone: !Select [1, !GetAZs ""]
        CidrBlock: "10.10.32.0/24"
        MapPublicIpOnLaunch: true
        VpcId: !Ref VPC
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}_SubnetBPublic"
    RouteTableAssociationBPublic:
      Type: "AWS::EC2::SubnetRouteTableAssociation"
      Properties:
        SubnetId: !Ref SubnetBPublic
        RouteTableId: !Ref RouteTablePublic
    SubnetCPublic:
      Type: "AWS::EC2::Subnet"
      Properties:
        AvailabilityZone: !Select [2, !GetAZs ""]
        CidrBlock: "10.10.64.0/24"
        MapPublicIpOnLaunch: true
        VpcId: !Ref VPC
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}_SubnetCPublic"
    RouteTableAssociationCPublic:
      Type: "AWS::EC2::SubnetRouteTableAssociation"
      Properties:
        SubnetId: !Ref SubnetCPublic
        RouteTableId: !Ref RouteTablePublic
    # DMS instance security group
    PrivateDMSInstanceSecurityGroup:
      Type: AWS::EC2::SecurityGroup
      Properties:
        GroupDescription: Allow https from inside the VPC
        VpcId: !Ref VPC
        SecurityGroupIngress:
          - IpProtocol: tcp
            FromPort: 443
            ToPort: 443
            CidrIp: "10.10.0.0/16"
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}_PrivateDMSInstanceSecurityGroup"
    # aurora security group
    PrivateAuroraSecurityGroup:
      Type: AWS::EC2::SecurityGroup
      Properties:
        GroupDescription: Allow access from the dms instance and web server security groups
        VpcId: !Ref VPC
        SecurityGroupIngress:
          - IpProtocol: tcp
            FromPort: 3306
            ToPort: 3306
            SourceSecurityGroupId: !Ref PrivateDMSInstanceSecurityGroup
          - IpProtocol: tcp
            FromPort: 3306
            ToPort: 3306
            SourceSecurityGroupId: !Ref WebServerSecurityGroup
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}_PrivateAuroraSecurityGroup"
    # redshift security group
    PrivateRedshiftSecurityGroup:
      Type: AWS::EC2::SecurityGroup
      Properties:
        GroupDescription: Allow access from the dms instance and web server security groups
        VpcId: !Ref VPC
        SecurityGroupIngress:
          - IpProtocol: tcp
            FromPort: 5439
            ToPort: 5439
            SourceSecurityGroupId: !Ref PrivateDMSInstanceSecurityGroup
          - IpProtocol: tcp
            FromPort: 5439
            ToPort: 5439
            SourceSecurityGroupId: !Ref WebServerSecurityGroup
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}_PrivateRedshiftSecurityGroup"
    # private route table
    RouteTablePrivate:
      Type: "AWS::EC2::RouteTable"
      Properties:
        VpcId: !Ref VPC
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}_RouteTablePrivate"
    # private subnets
    SubnetAPrivate:
      Type: "AWS::EC2::Subnet"
      Properties:
        AvailabilityZone: !Select [0, !GetAZs ""]
        CidrBlock: "10.10.16.0/24"
        VpcId: !Ref VPC
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}_SubnetAPrivate"
    RouteTableAssociationAPrivate:
      Type: "AWS::EC2::SubnetRouteTableAssociation"
      Properties:
        SubnetId: !Ref SubnetAPrivate
        RouteTableId: !Ref RouteTablePrivate
    SubnetBPrivate:
      Type: "AWS::EC2::Subnet"
      Properties:
        AvailabilityZone: !Select [1, !GetAZs ""]
        CidrBlock: "10.10.48.0/24"
        VpcId: !Ref VPC
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}_SubnetBPrivate"
    RouteTableAssociationBPrivate:
      Type: "AWS::EC2::SubnetRouteTableAssociation"
      Properties:
        SubnetId: !Ref SubnetBPrivate
        RouteTableId: !Ref RouteTablePrivate
    SubnetCPrivate:
      Type: "AWS::EC2::Subnet"
      Properties:
        AvailabilityZone: !Select [2, !GetAZs ""]
        CidrBlock: "10.10.80.0/24"
        VpcId: !Ref VPC
        Tags:
          - Key: Name
            Value: !Sub "VPC_${VpcName}_SubnetCPrivate"
    RouteTableAssociationCPrivate:
      Type: "AWS::EC2::SubnetRouteTableAssociation"
      Properties:
        SubnetId: !Ref SubnetCPrivate
        RouteTableId: !Ref RouteTablePrivate
  Outputs:
    VPC:
      Description: "VPC."
      Value: !Ref VPC
      Export:
        Name: !Sub "${self:provider.stackName}"
    SubnetsPublic:
      Description: "Subnets public."
      Value:
        !Join [
          ",",
          [!Ref SubnetAPublic, !Ref SubnetBPublic, !Ref SubnetCPublic],
        ]
      Export:
        Name: !Sub "${self:provider.stackName}-PublicSubnets"
    SubnetsPrivate:
      Description: "Subnets private."
      Value:
        !Join [
          ",",
          [!Ref SubnetAPrivate, !Ref SubnetBPrivate, !Ref SubnetCPrivate],
        ]
      Export:
        Name: !Sub "${self:provider.stackName}-PrivateSubnets"
    DefaultSecurityGroup:
      Description: "VPC Default Security Group"
      Value: !GetAtt VPC.DefaultSecurityGroup
      Export:
        Name: !Sub "${self:provider.stackName}-DefaultSecurityGroup"
    WebServerSecurityGroup:
      Description: "VPC Web Server Security Group"
      Value: !Ref WebServerSecurityGroup
      Export:
        Name: !Sub "${self:provider.stackName}-WebServerSecurityGroup"
    PrivateDMSInstanceSecurityGroup:
      Description: "The id of the InstanceSecurityGroup"
      Value: !Ref PrivateDMSInstanceSecurityGroup
      Export:
        Name: !Sub "${self:provider.stackName}-PrivateDMSInstanceSecurityGroup"
    PrivateAuroraSecurityGroup:
      Description: "The id of the AuroraSecurityGroup"
      Value: !Ref PrivateAuroraSecurityGroup
      Export:
        Name: !Sub "${self:provider.stackName}-PrivateAuroraSecurityGroup"
    PrivateRedshiftSecurityGroup:
      Description: "The id of the RedshiftSecurityGroup"
      Value: !Ref PrivateRedshiftSecurityGroup
      Export:
        Name: !Sub "${self:provider.stackName}-PrivateRedshiftSecurityGroup"

Aurora

  • DB Cluster Parameter Group

  • 要使用DMS对aurora进行数据同步,需要打开binlog相关的配置,默认的Cluster Parameter Group中binlog的配置是关闭的,并且默认Cluster Parameter Group是不可修改的

  • 因此必须创建一个自己的Cluster Parameter Group ,并设置binlog相关的配置

  • Family: aurora-mysql8.0

  • Parameters:

  • binlog_format: "ROW"

  • binlog_row_image: "Full"

  • binlog_checksum: "NONE"

  • VpcSecurityGroupIds

  • private aurora security group

DB Parameter Group

  • Family: aurora-mysql8.0

  • Subnet group

  • 创建一个包含private subnet的private subnet group就可以

  • DB Cluster

  • subnet group

  • private subnet group

Engine: aurora-mysql

  • EngineVersion: 8.0.mysql_aurora.3.02.0

  • 可以使用如下命令查看支持的EngineVersion:

aws rds describe-db-engine-versions --engine aurora --query "DBEngineVersions[].EngineVersion"
  • DB Cluster Instance

  • Engine: aurora-mysql

  • DBInstanceClass: db.t2.small

  • 可以使用如下命令查看支持的instance类型:

  1. aws rds describe-orderable-db-instance-options --engine aurora-mysql --engine-version "8.0.mysql_aurora.3.02.2" --query "OrderableDBInstanceOptions
  2. [].DBInstanceClass"

The serverless file for all Aurora resources:

  • replace custom:bucketNamePrefix with your own bucket

service: aurora-to-redshift-aurora
custom:
  bucketNamePrefix: "bucketName"
provider:
  name: aws
  region: ${opt:region, "ap-southeast-1"}
  stackName: ${self:service}
  deploymentBucket:
    name: com.${self:custom.bucketNamePrefix}.deploy-bucket
    serverSideEncryption: AES256
resources:
  Parameters:
    ServiceName:
      Type: String
      Default: aurora-to-redshift
  Resources:
    DBClusterParameterGroup:
      Type: "AWS::RDS::DBClusterParameterGroup"
      Properties:
        DBClusterParameterGroupName: cluster-aurora-mysql8-with-binlog
        Description: DB Cluster Parameter Group with aurora-mysql8.0 engine, and binlog enabled
        Family: aurora-mysql8.0
        Parameters:
          binlog_format: "ROW"
          binlog_row_image: "Full"
          binlog_checksum: "NONE"
    DBParameterGroup:
      Type: "AWS::RDS::DBParameterGroup"
      Properties:
        DBParameterGroupName: instance-mysql8
        Description: DB Instance Parameter Group
        Family: aurora-mysql8.0
    PrivateDBSubnetGroup:
      Type: "AWS::RDS::DBSubnetGroup"
      Properties:
        DBSubnetGroupName: private-subnet
        DBSubnetGroupDescription: "private subnet"
        SubnetIds:
          Fn::Split:
            - ","
            - Fn::ImportValue: !Sub "${ServiceName}-vpc-PrivateSubnets"
    RDSCluster:
      Type: "AWS::RDS::DBCluster"
      Properties:
        DBClusterIdentifier: test-data-sync-aurora
        DBClusterParameterGroupName:
          Ref: DBClusterParameterGroup
        DBSubnetGroupName:
          Ref: PrivateDBSubnetGroup
        VpcSecurityGroupIds:
          - Fn::ImportValue: !Sub "${ServiceName}-vpc-PrivateAuroraSecurityGroup"
        Engine: aurora-mysql
        EngineVersion: 8.0.mysql_aurora.3.02.2
        DatabaseName: dev
        MasterUserPassword: aurora_admin_2022
        MasterUsername: admin
    RDSDBInstance:
      Type: "AWS::RDS::DBInstance"
      Properties:
        DBClusterIdentifier:
          Ref: RDSCluster
        DBInstanceIdentifier: test-data-sync-aurora-instance
        DBInstanceClass: db.t3.medium
        DBParameterGroupName:
          Ref: DBParameterGroup
        DBSubnetGroupName:
          Ref: PrivateDBSubnetGroup
        Engine: aurora-mysql
  Outputs:
    AuroraEndpoint:
      Description: "Aurora endpoint"
      Value: !GetAtt RDSCluster.Endpoint.Address
      Export:
        Name: !Sub "${self:provider.stackName}-AuroraEndpoint"
    AuroraPort:
      Description: "Aurora port"
      Value: !GetAtt RDSCluster.Endpoint.Port
      Export:
        Name: !Sub "${self:provider.stackName}-AuroraPort"
  • After the resources are created, connect to the database and run some SQL to create data.

  • Connecting requires a server located in a VPC public subnet; create an EC2 instance with the following settings:
    • vpc: the VPC created in the steps above
    • subnet: any public subnet
    • security group: the web server security group
    • key pair: create a key pair named tutorial-ec2-instance-web-server, in pem (ssh) format

  • Open a terminal or an SSH client.

  • Locate the file downloaded when the key pair was created: tutorial-ec2-instance-web-server.pem

  • Run the following command to restrict the file's permissions:

chmod 400 tutorial-ec2-instance-web-server.pem

  • Connect to the EC2 instance over its public DNS; open the instance page in the AWS web console and click Connect in the top-right corner to see the exact connection command.

ssh -i "tutorial-ec2-instance-web-server.pem" ec2-user@${ec2_public_dns}.ap-southeast-1.compute.amazonaws.com

  • Once connected, install mysql:

sudo yum install mysql

  • Connect to Aurora with the following command, entering the password when prompted:

mysql -h {aurora-cluster}.ap-southeast-1.rds.amazonaws.com -P 3306 -u admin -p

  • Run the following SQL to insert data:

USE dev;
CREATE TABLE person (
    person_id int,
    last_name varchar(255),
    first_name varchar(255),
    city varchar(255)
);
INSERT INTO person VALUES (1, 'win', 'jessica', 'sh');

Redshift

  • Cluster subnet group
    • create a private subnet group containing the private subnets

  • Cluster
    • ClusterSubnetGroupName: the private subnet group
    • VpcSecurityGroupIds: the private redshift security group
    • NodeType: dc2.large
    • ClusterType: single-node

  • The serverless file for all Redshift resources:
    • replace custom:bucketNamePrefix with your own bucket

service: aurora-to-redshift-redshift
custom:
  bucketNamePrefix: "bucketName"
provider:
  name: aws
  region: ${opt:region, "ap-southeast-1"}
  stackName: ${self:service}
  deploymentBucket:
    name: com.${self:custom.bucketNamePrefix}.deploy-bucket
    serverSideEncryption: AES256
resources:
  Parameters:
    ServiceName:
      Type: String
      Default: aurora-to-redshift
  Resources:
    ClusterSubnetGroup:
      Type: "AWS::Redshift::ClusterSubnetGroup"
      Properties:
        Description: Private Cluster Subnet Group
        SubnetIds:
          Fn::Split:
            - ","
            - Fn::ImportValue: !Sub ${ServiceName}-vpc-PrivateSubnets
        Tags:
          - Key: Name
            Value: private-subnet
    Cluster:
      Type: "AWS::Redshift::Cluster"
      Properties:
        ClusterIdentifier: test-data-sync-redshift
        ClusterSubnetGroupName: !Ref ClusterSubnetGroup
        VpcSecurityGroupIds:
          - Fn::ImportValue: !Sub ${ServiceName}-vpc-PrivateRedshiftSecurityGroup
        DBName: dev
        MasterUsername: admin
        MasterUserPassword: Redshift_admin_2022
        NodeType: dc2.large
        ClusterType: single-node
  Outputs:
    RedshiftEndpoint:
      Description: "Redshift endpoint"
      Value: !GetAtt Cluster.Endpoint.Address
      Export:
        Name: !Sub "${self:provider.stackName}-RedshiftEndpoint"
    RedshiftPort:
      Description: "Redshift port"
      Value: !GetAtt Cluster.Endpoint.Port
      Export:
        Name: !Sub "${self:provider.stackName}-RedshiftPort"

DMS

  • Subnet group
    • create a private subnet group containing the private subnets

  • Replication instance
    • ReplicationInstanceClass: dms.t2.small
    • ReplicationSubnetGroupIdentifier: the private subnet group
    • VpcSecurityGroupIds: the dms security group

  • Source endpoint
    • EndpointType: source
    • EngineName: aurora
    • ServerName: the Aurora cluster endpoint
    • Port: 3306

  • Target endpoint
    • EndpointType: target
    • EngineName: redshift
    • ServerName: the Redshift cluster endpoint
    • Port: 5439

  • Database migration task
    • MigrationType: full-load-and-cdc
    • ReplicationInstanceArn: the replication instance
    • TableMappings: "{ \"rules\": [ { \"rule-type\": \"selection\", \"rule-id\": \"1\", \"rule-name\": \"1\", \"object-locator\": { \"schema-name\": \"dev\", \"table-name\": \"test-aurora-to-redshift\" }, \"rule-action\": \"include\" } ] }"
    • ReplicationTaskSettings: "{ \"FullLoadSettings\": {\"TargetTablePrepMode\": \"DO_NOTHING\"}}"

  • The serverless file for all DMS resources:
    • replace custom:bucketNamePrefix with your own bucket

service: aurora-to-redshift-dms
custom:
  bucketNamePrefix: "bucketName"
provider:
  name: aws
  region: ${opt:region, "ap-southeast-1"}
  stackName: ${self:service}
  deploymentBucket:
    name: com.${self:custom.bucketNamePrefix}.deploy-bucket
    serverSideEncryption: AES256
resources:
  Parameters:
    ServiceName:
      Type: String
      Default: aurora-to-redshift
  Resources:
    ReplicationSubnetGroup:
      Type: "AWS::DMS::ReplicationSubnetGroup"
      Properties:
        ReplicationSubnetGroupDescription: instance to sync data from aurora to redshift
        ReplicationSubnetGroupIdentifier: aurora-to-redshift-instance
        SubnetIds:
          Fn::Split:
            - ","
            - Fn::ImportValue: !Sub ${ServiceName}-vpc-PrivateSubnets
    ReplicationInstance:
      Type: "AWS::DMS::ReplicationInstance"
      Properties:
        ReplicationInstanceIdentifier: aurora-to-redshift-instance
        ReplicationInstanceClass: dms.t2.small
        ReplicationSubnetGroupIdentifier: !Ref ReplicationSubnetGroup
        PubliclyAccessible: false
        VpcSecurityGroupIds:
          - Fn::ImportValue: !Sub ${ServiceName}-vpc-PrivateDMSInstanceSecurityGroup
    AuroraSourceEndpoint:
      Type: "AWS::DMS::Endpoint"
      Properties:
        EndpointIdentifier: aurora
        EndpointType: source
        ServerName:
          Fn::ImportValue: !Sub ${ServiceName}-aurora-AuroraEndpoint
        Port:
          Fn::ImportValue: !Sub ${ServiceName}-aurora-AuroraPort
        Username: admin
        Password: aurora_admin_2022
        EngineName: aurora
    RedshiftTargetEndpoint:
      Type: "AWS::DMS::Endpoint"
      Properties:
        EndpointIdentifier: redshift
        EndpointType: target
        ServerName:
          Fn::ImportValue: !Sub ${ServiceName}-redshift-RedshiftEndpoint
        Port:
          Fn::ImportValue: !Sub ${ServiceName}-redshift-RedshiftPort
        DatabaseName: dev
        Username: admin
        Password: Redshift_admin_2022
        EngineName: redshift
    ReplicationTask:
      Type: "AWS::DMS::ReplicationTask"
      Properties:
        ReplicationTaskIdentifier: aurora-to-redshift-task
        MigrationType: full-load-and-cdc
        ReplicationInstanceArn: !Ref ReplicationInstance
        SourceEndpointArn: !Ref AuroraSourceEndpoint
        TableMappings: '{ "rules": [ { "rule-type": "selection", "rule-id": "1", "rule-name": "1", "object-locator": { "schema-name": "dev", "table-name": "%" }, "rule-action": "include" } ] }'
        TargetEndpointArn: !Ref RedshiftTargetEndpoint
        ReplicationTaskSettings: '{ "FullLoadSettings": {"TargetTablePrepMode": "DO_NOTHING"}}'
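The TableMappings and ReplicationTaskSettings properties are JSON documents embedded in strings, which makes quoting mistakes easy. A quick way to catch them before deploying is to parse the strings locally; the values below are the ones used in this post:

```python
import json

# TableMappings rule as used in this post (schema "dev", one selection rule).
table_mappings = (
    '{ "rules": [ { "rule-type": "selection", "rule-id": "1", '
    '"rule-name": "1", "object-locator": { "schema-name": "dev", '
    '"table-name": "test-aurora-to-redshift" }, "rule-action": "include" } ] }'
)
rules = json.loads(table_mappings)["rules"]  # raises ValueError if malformed
assert rules[0]["object-locator"]["schema-name"] == "dev"

# Task settings: leave existing target tables alone during full load.
task_settings = '{ "FullLoadSettings": {"TargetTablePrepMode": "DO_NOTHING"}}'
settings = json.loads(task_settings)
assert settings["FullLoadSettings"]["TargetTablePrepMode"] == "DO_NOTHING"

print("mapping and settings JSON are valid")
```

If either string is broken, `json.loads` fails immediately with the offending position, which is far easier to debug than a failed CloudFormation deployment.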

Data synchronization

  • Once all of the resources above are created, run the DMS database migration task in the AWS web console; after it succeeds, the task status will be "Load complete, replication ongoing".

  • When the sync finishes, use the Redshift query editor to confirm the data arrived.

  • Keep modifying, adding, and deleting MySQL data, and confirm through the Redshift query editor that each change is synced.

Other sync options

Amazon Aurora zero-ETL integration with Amazon Redshift

  • Amazon is rolling out a new Aurora feature: zero-ETL integration with Amazon Redshift. It has not been released generally yet; it is in preview and available in the US East (N. Virginia) Region. The following is from Amazon's announcement.

  • Amazon Aurora now supports zero-ETL integration with Amazon Redshift, enabling near-real-time analytics and machine learning (ML) using Amazon Redshift on petabytes of transactional data from Aurora. Within seconds of transactional data being written into Aurora, the data is available in Amazon Redshift, so you don't have to build and maintain complex data pipelines to perform extract, transform, and load (ETL) operations.

  • This zero-ETL integration also lets you analyze data from multiple Aurora database clusters in the same new or existing Amazon Redshift instance, to derive holistic insights across many applications or partitions. With near-real-time access to transactional data, you can leverage Amazon Redshift's analytics capabilities, such as built-in ML, materialized views, data sharing, and federated access to multiple data stores and data lakes, to derive insights from transactional and other data.

  • Amazon Aurora zero-ETL integration with Amazon Redshift is now available in limited preview in the US East (N. Virginia) Region for Amazon Aurora MySQL 3, which is compatible with MySQL 8.0.

AWS Data Pipeline

  • AWS Data Pipeline also supports continuous sync from Aurora to Redshift.

  • Disadvantages compared to DMS:
    • AWS Data Pipeline is only available in a limited set of regions.
    • one AWS Data Pipeline can only sync a single Aurora table, and the Aurora table name must match exactly.

  • Advantages compared to DMS:
    • the sync supports a custom query statement, so rows from the source table can be filtered before being written to Redshift.

AWS Glue

  • AWS Glue also supports syncing from Aurora to Redshift.

  • Disadvantages compared to DMS:
    • one AWS Glue job can only sync a single Aurora table
    • continuous sync is not supported; only full-table sync
    • the schemas of the source and target tables have to be managed in Glue

  • Advantages compared to DMS:
    • mapping rules can be applied to the columns during the sync: column values can be modified by rule, columns can be dropped (the target table may have fewer columns than the source), and target column names may differ from the source.

How to choose

  • If you need continuous sync and the region you are using does not support AWS Data Pipeline, AWS DMS is the only option.

  • If you need continuous sync and the region does support AWS Data Pipeline, pick based on your other requirements; for example, choose AWS Data Pipeline if you need to filter the source data.

  • If you only need a one-off full-table sync and no data transformation, any of the three will do.

  • If you only need a one-off full-table sync but need to transform the data by some rule along the way, choose AWS Glue.
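The selection criteria above can be condensed into a small helper. This is just a mnemonic for the bullets in this section, not an official AWS decision rule, and the function name is made up:

```python
def pick_sync_service(continuous: bool, region_has_data_pipeline: bool,
                      needs_row_filtering: bool, needs_column_mapping: bool) -> str:
    """Encode the decision criteria from this section as a lookup."""
    if continuous:
        if not region_has_data_pipeline:
            return "AWS DMS"  # only option for continuous sync there
        # Both work; Data Pipeline can filter source rows with a custom query.
        return "AWS Data Pipeline" if needs_row_filtering else "AWS DMS"
    # One-off full-table copy:
    if needs_column_mapping:
        return "AWS Glue"  # column-level transforms during the sync
    return "AWS DMS"  # any of the three would do; DMS is the default here

print(pick_sync_service(True, False, False, False))  # -> AWS DMS
```

Encoding the rules like this also makes it obvious which combinations are undecided (for example, a one-off sync with row filtering but no column mapping), which the prose leaves to judgment.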

GitHub

https://github.com/JessicaWin/aws-in-action/tree/master/aurora-to-redshift
