赞
踩
Amazon Aurora是一种基于云且完全托管关系型数据库服务,与MySQL 和 PostgreSQL 数据库兼容,完全托管意味着自动对数据库进行管理,包括管理数据备份、硬件配置和软件更新等操作
Amazon Aurora提供了企业级性能
Amazon Aurora提供了多种实例类型,不同的实例类型有不同的CPU,内存,带宽等配置,用户可以根据需求选择合适的实例类型来支持自己的业务
Amazon Aurora 提供了可扩展性,可以根据需要自动扩大或缩小存储规模。Amazon Aurora 将自动从最小 10 GB 存储扩展到 64 TB 存储,每次增量为 10 GB,且扩展不会影响数据库的性能
Amazon Redshift 是一个基于云的完全托管的数据仓库服务。开发人员可以使用 redshift 来构建、查询和维护多达 PB 或更多的数据。
每个 Redshift 数据仓库都是完全托管的,这意味着备份创建、安全和配置等管理任务都是自动化的。
Redshift 中的每个数据仓库都由一组节点组成。这些节点被组织成集群,使得对大量数据运行分析查询变得容易。
许多现有的基于 SQL 的客户端,以及范围广泛的数据源和数据分析工具,都可以与 Redshift 一起使用。它具有稳定的架构,可以轻松地与各种商业智能工具进行交互。
由于 Redshift 旨在处理大量数据,因此其模块化设计使其能够轻松扩展。它的多层结构使一次处理多个查询变得简单。
可以从 Redshift 集群创建切片,允许对数据集进行更精细的检查
Redshift 提供的性能指标可用于跟踪数据库的健康状况
Aurora 是一个基于行的数据库,这使得它最适合事务查询(OLTP)业务,比如Web 应用程序。想根据用户 ID 查找用户名?使用 Aurora 很容易做到。
Redshift 是一个基于列的数据库,旨在高效处理对大量数据的复杂查询,这使得它最适合数据分析(OLAP)业务,比如电商网站用户行为分析。
比如我们可以用Aurora来实现电商网站的基本业务功能,用户管理,订单管理,商品管理等基本功能,然后用Redshift来实现用户行为分析,商品销售情况分析等等需要处理海量数据的查询功能。
AWS Database Migration Service (AWS DMS) 用于管理和迁移云上的数据库。
借助 AWS DMS,客户可以跨许多流行的商业数据库同步数据,包括 Microsoft SQL Server、Amazon Aurora 和 MySQL。
使用 AWS DMS,用户可以运行一次性迁移或设置持续同步。
在同步数据库之前,需要在云服务器上创建一个复制实例。此实例读取源数据并将其加载到目标。
在实际开发过程中,我们的web服务器位于VPC public subnet中,允许任何客户端进行访问
用户数据库一般位于VPC的private subnet中,且只允许同一个VPC的web服务器进行访问,以保证数据库的安全
DMS进行数据同步使用的instance也只需要位于private subnet,无需对外开放
因此我们首先需要创建一个VPC,vpc的架构如下:
接下来我们详细介绍每一部分需要创建的资源。
如果在资源创建过程中出现类似如下没有权限的错误,则需要在AWS Web Console IAM页面为用户添加相应的权限。
User: arn:aws:iam::****:user/*** is not authorized to perform: dms:CreateReplicationSubnetGroup on resource: arn:aws:dms:ap-southeast-1:****:subgrp:aurora-to-redshift-instance because no identity-based policy allows the dms:CreateReplicationSubnetGroup action (Service: AWSDatabaseMigrationService; Status Code: 400; Error Code: AccessDeniedException; ; Proxy: null)
本文中用到的权限如下:
AmazonS3FullAccess: AWS Managed Policy
AmazonVPCFullAccess: AWS Managed Policy
AmazonRDSFullAccess: AWS Managed Policy
AmazonRedshiftFullAccess: AWS Managed Policy
DMS没有找到AWS Managed Policy,需要自己手动创建一下,内容如下:
- {
- "Version": "2012-10-17",
- "Statement": [
- {
- "Sid": "VisualEditor0",
- "Effect": "Allow",
- "Action": "dms:*",
- "Resource": "*"
- }
- ]
- }
vpc
cird block: 10.10.0.0/16
internet gateway
public subnet
三个Availability Zone
private subnet
三个Availability Zone
public route table: public subnet关联的 route table
destination: 0.0.0.0/0 target: internet-gateway-id(允许与外界进行通信)
destination:10.10.0.0/16 local(内部通信)
private route table:private subnet关联的 route table
destination:10.10.0.0/16 local(内部通信)
web server security group
允许任意ip对443端口进行访问
允许自己的ipdui22端口进行访问,以便ssh到服务器上向数据库插入数据
private dms security group
只允许vpc内部访问
private aurora security group
允许web server security group对3306端口进行访问
允许dms security group对3306端口进行访问
private redshift aurora security group
允许web server security group对5439端口进行访问
允许dms security group对5439端口进行访问
VPC全部资源的serverless文件:
custom:bucketNamePrefix 替换为自己的创建的bucket
resources:Parameters:MyIP 替换为自己的public ip
service: aurora-to-redshift-vpc custom: bucketNamePrefix: "bucketName" provider: name: aws region: ${opt:region, "ap-southeast-1"} stackName: ${self:service} deploymentBucket: name: com.${self:custom.bucketNamePrefix}.deploy-bucket serverSideEncryption: AES256 resources: Parameters: VpcName: Type: String Default: "test-vpc" MyIP: Type: String Default: "your public ip" Resources: VPC: Type: "AWS::EC2::VPC" Properties: CidrBlock: "10.10.0.0/16" EnableDnsSupport: true EnableDnsHostnames: true InstanceTenancy: default Tags: - Key: Name Value: !Sub "VPC_${VpcName}" # Internet Gateway InternetGateway: Type: "AWS::EC2::InternetGateway" Properties: Tags: - Key: Name Value: !Sub "VPC_${VpcName}_InternetGateway" VPCGatewayAttachment: Type: "AWS::EC2::VPCGatewayAttachment" Properties: VpcId: !Ref VPC InternetGatewayId: !Ref InternetGateway # web server security group WebServerSecurityGroup: Type: AWS::EC2::SecurityGroup Properties: GroupDescription: Allow access from public VpcId: !Ref VPC SecurityGroupIngress: - IpProtocol: tcp FromPort: 443 ToPort: 443 CidrIp: "0.0.0.0/0" - IpProtocol: tcp FromPort: 22 ToPort: 22 CidrIp: !Sub "${MyIP}/32" Tags: - Key: Name Value: !Sub "VPC_${VpcName}_WebServerSecurityGroup" # public route table RouteTablePublic: Type: "AWS::EC2::RouteTable" Properties: VpcId: !Ref VPC Tags: - Key: Name Value: !Sub "VPC_${VpcName}_RouteTablePublic" RouteTablePublicInternetRoute: Type: "AWS::EC2::Route" DependsOn: VPCGatewayAttachment Properties: RouteTableId: !Ref RouteTablePublic DestinationCidrBlock: "0.0.0.0/0" GatewayId: !Ref InternetGateway # public subnet SubnetAPublic: Type: "AWS::EC2::Subnet" Properties: AvailabilityZone: !Select [0, !GetAZs ""] CidrBlock: "10.10.0.0/24" MapPublicIpOnLaunch: true VpcId: !Ref VPC Tags: - Key: Name Value: !Sub "VPC_${VpcName}_SubnetAPublic" RouteTableAssociationAPublic: Type: "AWS::EC2::SubnetRouteTableAssociation" Properties: SubnetId: !Ref SubnetAPublic RouteTableId: !Ref RouteTablePublic SubnetBPublic: Type: "AWS::EC2::Subnet" Properties: AvailabilityZone: !Select [1, !GetAZs ""] CidrBlock: "10.10.32.0/24" MapPublicIpOnLaunch: true VpcId: !Ref VPC Tags: - Key: Name Value: !Sub "VPC_${VpcName}_SubnetBPublic" RouteTableAssociationBPublic: Type: "AWS::EC2::SubnetRouteTableAssociation" Properties: SubnetId: !Ref SubnetBPublic RouteTableId: !Ref RouteTablePublic SubnetCPublic: Type: "AWS::EC2::Subnet" Properties: AvailabilityZone: !Select [2, !GetAZs ""] CidrBlock: "10.10.64.0/24" MapPublicIpOnLaunch: true VpcId: !Ref VPC Tags: - Key: Name Value: !Sub "VPC_${VpcName}_SubnetCPublic" RouteTableAssociationCPublic: Type: "AWS::EC2::SubnetRouteTableAssociation" Properties: SubnetId: !Ref SubnetCPublic RouteTableId: !Ref RouteTablePublic # instance security group PrivateDMSInstanceSecurityGroup: Type: AWS::EC2::SecurityGroup Properties: GroupDescription: Allow http to client host VpcId: !Ref VPC SecurityGroupIngress: - IpProtocol: tcp FromPort: 443 ToPort: 443 CidrIp: "10.10.0.0/16" Tags: - Key: Name Value: !Sub "VPC_${VpcName}_PrivateDMSInstanceSecurityGroup" # aurora security group PrivateAuroraSecurityGroup: Type: AWS::EC2::SecurityGroup Properties: GroupDescription: Allow access from dms instance securty group VpcId: !Ref VPC SecurityGroupIngress: - IpProtocol: tcp FromPort: 3306 ToPort: 3306 SourceSecurityGroupId: !Ref PrivateDMSInstanceSecurityGroup - IpProtocol: tcp FromPort: 3306 ToPort: 3306 SourceSecurityGroupId: !Ref WebServerSecurityGroup Tags: - Key: Name Value: !Sub "VPC_${VpcName}_PrivateAuroraSecurityGroup" # redshift security group PrivateRedshiftSecurityGroup: Type: AWS::EC2::SecurityGroup Properties: GroupDescription: Allow access from dms instance securty group VpcId: !Ref VPC SecurityGroupIngress: - IpProtocol: tcp FromPort: 5439 ToPort: 5439 SourceSecurityGroupId: !Ref PrivateDMSInstanceSecurityGroup - IpProtocol: tcp FromPort: 5439 ToPort: 5439 SourceSecurityGroupId: !Ref WebServerSecurityGroup Tags: - Key: Name Value: !Sub "VPC_${VpcName}_PrivateRedshiftSecurityGroup" # private route table RouteTablePrivate: Type: "AWS::EC2::RouteTable" Properties: VpcId: !Ref VPC Tags: - Key: Name Value: !Sub "VPC_${VpcName}_RouteTablePrivate" # private subnet SubnetAPrivate: Type: "AWS::EC2::Subnet" Properties: AvailabilityZone: !Select [0, !GetAZs ""] CidrBlock: "10.10.16.0/24" VpcId: !Ref VPC Tags: - Key: Name Value: !Sub "VPC_${VpcName}_SubnetAPrivate" RouteTableAssociationAPrivate: Type: "AWS::EC2::SubnetRouteTableAssociation" Properties: SubnetId: !Ref SubnetAPrivate RouteTableId: !Ref RouteTablePrivate SubnetBPrivate: Type: "AWS::EC2::Subnet" Properties: AvailabilityZone: !Select [1, !GetAZs ""] CidrBlock: "10.10.48.0/24" VpcId: !Ref VPC Tags: - Key: Name Value: !Sub "VPC_${VpcName}_SubnetBPrivate" RouteTableAssociationBPrivate: Type: "AWS::EC2::SubnetRouteTableAssociation" Properties: SubnetId: !Ref SubnetBPrivate RouteTableId: !Ref RouteTablePrivate SubnetCPrivate: Type: "AWS::EC2::Subnet" Properties: AvailabilityZone: !Select [2, !GetAZs ""] CidrBlock: "10.10.80.0/24" VpcId: !Ref VPC Tags: - Key: Name Value: !Sub "VPC_${VpcName}_SubnetCPrivate" RouteTableAssociationCPrivate: Type: "AWS::EC2::SubnetRouteTableAssociation" Properties: SubnetId: !Ref SubnetCPrivate RouteTableId: !Ref RouteTablePrivate Outputs: VPC: Description: "VPC." Value: !Ref VPC Export: Name: !Sub "${self:provider.stackName}" SubnetsPublic: Description: "Subnets public." Value: !Join [ ",", [!Ref SubnetAPublic, !Ref SubnetBPublic, !Ref SubnetCPublic], ] Export: Name: !Sub "${self:provider.stackName}-PublicSubnets" SubnetsPrivate: Description: "Subnets private." Value: !Join [ ",", [!Ref SubnetAPrivate, !Ref SubnetBPrivate, !Ref SubnetCPrivate], ] Export: Name: !Sub "${self:provider.stackName}-PrivateSubnets" DefaultSecurityGroup: Description: "VPC Default Security Group" Value: !GetAtt VPC.DefaultSecurityGroup Export: Name: !Sub "${self:provider.stackName}-DefaultSecurityGroup" WebServerSecurityGroup: Description: "VPC Web Server Security Group" Value: !Ref WebServerSecurityGroup Export: Name: !Sub "${self:provider.stackName}-WebServerSecurityGroup" PrivateDMSInstanceSecurityGroup: Description: "The id of the InstanceSecurityGroup" Value: !Ref PrivateDMSInstanceSecurityGroup Export: Name: !Sub "${self:provider.stackName}-PrivateDMSInstanceSecurityGroup" PrivateAuroraSecurityGroup: Description: "The id of the AuroraSecurityGroup" Value: !Ref PrivateAuroraSecurityGroup Export: Name: !Sub "${self:provider.stackName}-PrivateAuroraSecurityGroup" PrivateRedshiftSecurityGroup: Description: "The id of the RedshiftSecurityGroup" Value: !Ref PrivateRedshiftSecurityGroup Export: Name: !Sub "${self:provider.stackName}-PrivateRedshiftSecurityGroup"
DB Cluster Parameter Group
要使用DMS对aurora进行数据同步,需要打开binlog相关的配置,默认的Cluster Parameter Group中binlog的配置是关闭的,并且默认Cluster Parameter Group是不可修改的
因此必须创建一个自己的Cluster Parameter Group ,并设置binlog相关的配置
Family: aurora-mysql8.0
Parameters:
binlog_format: "ROW"
binlog_row_image: "Full"
binlog_checksum: "NONE"
VpcSecurityGroupIds
private aurora security group
DB Parameter Group
Family: aurora-mysql8.0
Subnet group
创建一个包含private subnet的private subnet group就可以
DB Cluster
subnet group
private subnet group
Engine: aurora-mysql
EngineVersion: 8.0.mysql_aurora.3.02.0
可以使用如下命令查看支持的EngineVersion:
aws rds describe-db-engine-versions --engine aurora --query "DBEngineVersions[].EngineVersion"
DB Cluster Instance
Engine: aurora-mysql
DBInstanceClass: db.t2.small
可以使用如下命令查看支持的instance类型:
- aws rds describe-orderable-db-instance-options --engine aurora-mysql --engine-version "8.0.mysql_aurora.3.02.2" --query "OrderableDBInstanceOptions
- [].DBInstanceClass"
aurora全部资源的serverless文件:
custom:bucketNamePrefix 替换为自己的创建的bucket
service: aurora-to-redshift-aurora custom: bucketNamePrefix: "bucketName" provider: name: aws region: ${opt:region, "ap-southeast-1"} stackName: ${self:service} deploymentBucket: name: com.${self:custom.bucketNamePrefix}.deploy-bucket serverSideEncryption: AES256 resources: Parameters: ServiceName: Type: String Default: aurora-to-redshift Resources: DBClusterParameterGroup: Type: "AWS::RDS::DBClusterParameterGroup" Properties: DBClusterParameterGroupName: cluster-aurora-mysql8-with-binlog Description: DB Cluster Parameter Group with aurora-mysql8.0 engine, and binlog enabled Family: aurora-mysql8.0 Parameters: binlog_format: "ROW" binlog_row_image: "Full" binlog_checksum: "NONE" DBParameterGroup: Type: "AWS::RDS::DBParameterGroup" Properties: DBParameterGroupName: instance-mysql8 Description: DB Instance Parameter Group Family: aurora-mysql8.0 PrivateDBSubnetGroup: Type: "AWS::RDS::DBSubnetGroup" Properties: DBSubnetGroupName: private-subnet DBSubnetGroupDescription: "private subnet" SubnetIds: Fn::Split: - "," - Fn::ImportValue: !Sub "${ServiceName}-vpc-PrivateSubnets" RDSCluster: Type: "AWS::RDS::DBCluster" Properties: DBClusterIdentifier: test-data-sync-aurora DBClusterParameterGroupName: Ref: DBClusterParameterGroup DBSubnetGroupName: Ref: PrivateDBSubnetGroup VpcSecurityGroupIds: - Fn::ImportValue: !Sub "${ServiceName}-vpc-PrivateAuroraSecurityGroup" Engine: aurora-mysql EngineVersion: 8.0.mysql_aurora.3.02.2 DatabaseName: dev MasterUserPassword: aurora_admin_2022 MasterUsername: admin RDSDBInstance: Type: "AWS::RDS::DBInstance" Properties: DBClusterIdentifier: Ref: RDSCluster DBInstanceIdentifier: test-data-sync-aurora-instance DBInstanceClass: db.t3.medium DBParameterGroupName: Ref: DBParameterGroup DBSubnetGroupName: Ref: PrivateDBSubnetGroup Engine: aurora-mysql Outputs: AuroraEndpoint: Description: "Aurora endpoint" Value: !GetAtt RDSCluster.Endpoint.Address Export: Name: !Sub "${self:provider.stackName}-AuroraEndpoint" AuroraPort: Description: "Aurora port" Value: !GetAtt RDSCluster.Endpoint.Port Export: Name: !Sub "${self:provider.stackName}-AuroraPort"
在创建完资源后,连接数据库,执行一下sql创建数据
连接数据库需要创建一个位于vpc public subnet的服务器,创建过程参考这里:
vpc: 选择上面步骤中我们自己创建的vpc
subnet:选择任意一个public subnet
security group:web server security group
key pair: 创建一个名为tutorial-ec2-instance-web-server的key pari,选择pem(ssh)格式
打开终端或者SSH Client
找到创建key pair时下载的文件: tutorial-ec2-instance-web-server.pem
执行一下命令修改文件的执行权限
chmod 400 tutorial-ec2-instance-web-server.pem
使用Public DNS连接到ec2实例,可以在aws web console打开ec2实例的页面,点击右上角的Connect来找到如何连接
ssh -i "tutorial-ec2-instance-web-server.pem" ec2-user@${ec2_public_dns}.ap-southeast-1.compute.amazonaws.com
连接完成后,安装mysql
sudo yum install mysql
连接aurora,输入以下命令,然后输入密码继续
mysql -h {aurora-cluster}.ap-southeast-1.rds.amazonaws.com -P 3306 -u admin -p
执行以下sql插入数据
- USE dev;
- CREATE TABLE person (
- person_id int,
- last_name varchar(255),
- first_name varchar(255),
- city varchar(255)
- );
- INSERT INTO person VALUES (1, 'win', 'jessica', 'sh');
Cluster subnet group
创建一个包含private subnet的private subnet group
Cluster
ClusterSubnetGroupName
使用private subnet group
VpcSecurityGroupIds
使用private redshift security group
NodeType: dc2.large
ClusterType: single-node
redshift全部资源的serverless文件:
custom:bucketNamePrefix 替换为自己的创建的bucket
service: aurora-to-redshift-redshift custom: bucketNamePrefix: "bucketName" provider: name: aws region: ${opt:region, "ap-southeast-1"} stackName: ${self:service} deploymentBucket: name: com.${self:custom.bucketNamePrefix}.deploy-bucket serverSideEncryption: AES256 resources: Parameters: ServiceName: Type: String Default: aurora-to-redshift Resources: ClusterSubnetGroup: Type: "AWS::Redshift::ClusterSubnetGroup" Properties: Description: Private Cluster Subnet Group SubnetIds: Fn::Split: - "," - Fn::ImportValue: !Sub ${ServiceName}-vpc-PrivateSubnets Tags: - Key: Name Value: private-subnet Cluster: Type: "AWS::Redshift::Cluster" Properties: ClusterIdentifier: test-data-sync-redshift ClusterSubnetGroupName: !Ref ClusterSubnetGroup VpcSecurityGroupIds: - Fn::ImportValue: !Sub ${ServiceName}-vpc-PrivateRedshiftSecurityGroup DBName: dev MasterUsername: admin MasterUserPassword: Redshift_admin_2022 NodeType: dc2.large ClusterType: single-node Outputs: RedshiftEndpoint: Description: "Redshift endpoint" Value: !GetAtt Cluster.Endpoint.Address Export: Name: !Sub "${self:provider.stackName}-RedshiftEndpoint" RedshiftPort: Description: "Redshift port" Value: !GetAtt Cluster.Endpoint.Port Export: Name: !Sub "${self:provider.stackName}-RedshiftPort"
Subnet group
创建一个包含private subnet的private subnet group
Replication instance
ReplicationInstanceClass: dms.t2.small
ReplicationSubnetGroupIdentifier:
使用private subnet group
VpcSecurityGroupIds
使用dms security group
Source endpoint
EndpointType: source
EngineName: aurora
ServerName: aurora cluster endpoint
Port: 3306
Target endpoint
EndpointType: target
EngineName: ednpoint
ServerName: redshift cluster endpoint
Port: 5439
Database migration task
MigrationType: full-load-and-cdc
ReplicationInstanceArn
TableMappings:"{ \"rules\": [ { \"rule-type\": \"selection\", \"rule-id\": \"1\", \"rule-name\": \"1\", \"object-locator\": { \"schema-name\": \"dev\", \"table-name\": \"test-aurora-to-redshift\" }, \"rule-action\": \"include\" } ] }"
ReplicationTaskSettings: "{ \"FullLoadSettings\": {\"TargetTablePrepMode\": \"DO_NOTHING\"}}"
dms全部资源的serverless文件
custom:bucketNamePrefix 替换为自己的创建的bucket
service: aurora-to-redshift-dms custom: bucketNamePrefix: "bucketName" provider: name: aws region: ${opt:region, "ap-southeast-1"} stackName: ${self:service} deploymentBucket: name: com.${self:custom.bucketNamePrefix}.deploy-bucket serverSideEncryption: AES256 resources: Parameters: ServiceName: Type: String Default: aurora-to-redshift Resources: ReplicationSubnetGroup: Type: "AWS::DMS::ReplicationSubnetGroup" Properties: ReplicationSubnetGroupDescription: instance to sync data from aurora to redshift ReplicationSubnetGroupIdentifier: aurora-to-redshift-instance SubnetIds: Fn::Split: - "," - Fn::ImportValue: !Sub ${ServiceName}-vpc-PrivateSubnets ReplicationInstance: Type: "AWS::DMS::ReplicationInstance" Properties: ReplicationInstanceIdentifier: aurora-to-redshift-instance ReplicationInstanceClass: dms.t2.small ReplicationSubnetGroupIdentifier: !Ref ReplicationSubnetGroup PubliclyAccessible: false VpcSecurityGroupIds: - Fn::ImportValue: !Sub ${ServiceName}-vpc-PrivateDMSInstanceSecurityGroup AuroraSourceEndpoint: Type: "AWS::DMS::Endpoint" Properties: EndpointIdentifier: aurora EndpointType: source ServerName: Fn::ImportValue: !Sub ${ServiceName}-aurora-AuroraEndpoint Port: Fn::ImportValue: !Sub ${ServiceName}-aurora-AuroraPort Username: admin Password: aurora_admin_2022 EngineName: aurora RedshiftTargetEndpoint: Type: "AWS::DMS::Endpoint" Properties: EndpointIdentifier: redshift EndpointType: target ServerName: Fn::ImportValue: !Sub ${ServiceName}-redshift-RedshiftEndpoint Port: Fn::ImportValue: !Sub ${ServiceName}-redshift-RedshiftPort DatabaseName: dev Username: admin Password: Redshift_admin_2022 EngineName: redshift ReplicationTask: Type: "AWS::DMS::ReplicationTask" Properties: ReplicationTaskIdentifier: aurora-to-redshift-task MigrationType: full-load-and-cdc ReplicationInstanceArn: !Ref ReplicationInstance SourceEndpointArn: !Ref AuroraSourceEndpoint TableMappings: '{ "rules": [ { "rule-type": "selection", "rule-id": "1", "rule-name": "1", "object-locator": { "schema-name": "dev", "table-name": "%" }, "rule-action": "include" } ] }' TargetEndpointArn: !Ref RedshiftTargetEndpoint ReplicationTaskSettings: '{ "FullLoadSettings": {"TargetTablePrepMode": "DO_NOTHING"}}'
上述资源创建完毕后,可以在aws web console运行DMS的Database migration task,执行成功后status会是Load complete, replication ongoing
同步完成后,可以通过Redshift query editor来确认数据是否同步成功。
继续修改,添加,删除mysql数据,然后通过Redshift query editor来确认数据是否同步成功。
Amazon即将推出一个Aurora的新功能,实现与 Amazon Redshift 的零 ETL 集成,目前这个功能还没有正式推出,处于Preview阶段,且在美国东部(弗吉尼亚北部)区域可用。以下来自Amazon公告,原文链接在此:
Amazon Aurora 现在支持与 Amazon Redshift 的零 ETL 集成,以使用 Amazon Redshift 对来自 Aurora 的数 PB 交易数据进行近乎实时的分析和机器学习 (ML)。在事务数据写入 Aurora 的几秒钟内,数据就可以在 Amazon Redshift 中使用,因此您不必构建和维护复杂的数据管道来执行提取、转换和加载 (ETL) 操作。
这种零 ETL 集成还使您能够在同一个新的或现有的 Amazon Redshift 实例中分析来自多个 Aurora 数据库集群的数据,以获得跨多个应用程序或分区的整体见解。通过近乎实时地访问交易数据,您可以利用 Amazon Redshift 的分析和功能,例如内置 ML、物化视图、数据共享以及对多个数据存储和数据湖的联合访问,从交易数据和其他数据中获得洞察力。
Amazon Aurora 与 Amazon Redshift 的零 ETL 集成现已在美国东部(弗吉尼亚北部)区域推出兼容 MySQL 8.0 的 Amazon Aurora MySQL 3 的有限预览版。
AWS Data Pipeline也支持aurora到redshift的持续同步
相比于DMS的缺点
AWS Data Pipeline目前只支持以下region
一个AWS Data Pipeline只能同步一张aurora表,aurora表名必须完全匹配
相比于DMS的优点
同步过程中支持自定义查询语句,可以对源表数据进行筛选之后同步到redshift
AWS Glue也支持aurora到redshift同步
相比于DMS的缺点
一个AWS Data Pipeline只能同步一张aurora表
不支持持续同步,只支持全表数据同步
需要在glue管理源表和目标表的scheme
相比于DMS的优点
可以在数据同步过程中对数据列应用mapping规则,可以按一定规则修改列值,可以删除某些列(目标表比源表少某些列),目标表与源表列名也可以不同
如果需要持续同步且要使用的region不支持AWS Data Pipeline,则只能使用AWS DMS
如果需要持续同步且要使用的region支持AWS Data Pipeline,则可以根据其他需求决定使用哪个service,比如如果需要对数据进行筛选则可以选择AWS Data Pipeline
如果只需要整表一次性同步,且没有数据修改的需求,则可以选择任意一个
如果只需要整表一次性同步,且需要在同步过程中对数据进行一定规则的修改,则选择AWS Glue
https://github.com/JessicaWin/aws-in-action/tree/master/aurora-to-redshift
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。