Are you experiencing slow transfer speeds between your GCE VM and a Cloud Storage bucket? Then read on to learn how to maximize your upload and download throughput for Linux and Windows VMs!
Overview
I recently had a DoiT International customer ask why their data on a Google Compute Engine (henceforth referred to as GCE) Windows Server's local SSD was uploading to a Cloud Storage bucket much slower than expected. At first, I began with what I thought would be a simple benchmarking of `gsutil` commands to demonstrate the effectiveness of using ideal `gsutil` arguments as recommended by the GCP documentation. Instead, my "quick" look into the issue turned into a full-blown investigation into data transfer performance between GCE and GCS, as my initial findings were unusual and very much unexpected.
If you are simply interested in knowing what the ideal methods are for moving data between GCE and GCS for a Linux or Windows machine, go ahead and scroll all the way down to “Effective Transfer Tool Use Conclusions”.
If you instead want to scratch your head over the bizarre, often counter-intuitive throughput rates achievable with commonly used commands and arguments, stay with me as we dive into the details that led to the rather involved summary of recommendations at the end of this article.
Linux VM performance with gsutil: Large files
Although the customer's request involved data transfer on a Windows server, I first performed basic benchmarking where I felt the most comfortable: Linux, via the "Debian GNU/Linux 10 (buster)" GCE public image.
Since the customer was already attempting file transfers from local SSDs, and I wanted to minimize the odds that networked disks would impact transfer speeds, I configured two VM sizes, n2-standard-4 and n2-standard-80, each with one local SSD attached on which the benchmarking would be performed.
The GCS bucket I will use, like all VMs described in this article, is created as a regional resource located in us-central1.
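For those who want to follow along, the test resources can be provisioned roughly as follows. This is a sketch rather than the exact commands used; the zone, instance names, and bucket name are illustrative:

```bash
# Regional bucket for the benchmarks (name is illustrative).
gsutil mb -l us-central1 gs://doit-speed-test-bucket/

# Debian 10 test VMs, each with one local SSD attached via NVMe.
for TYPE in n2-standard-4 n2-standard-80; do
  gcloud compute instances create "speed-test-${TYPE}" \
    --zone=us-central1-a \
    --machine-type="${TYPE}" \
    --image-family=debian-10 \
    --image-project=debian-cloud \
    --local-ssd=interface=NVME
done
```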
To simulate the customer's large file upload experience, I created an empty 30 GB file:
```bash
fallocate -l 30G temp_30GB_file
```
From here, I tested two commonly recommended `gsutil` parameters:
- `-m`: Used to perform parallel, multi-threaded copies. Useful for transferring a large number of files in parallel, not for the upload of individual files.
- `-o GSUtil:parallel_composite_upload_threshold=150M`: Used to split large files exceeding the specified threshold into parts that are then uploaded in parallel and composed into a single object once all parts finish uploading. (If you'd rather not type this flag every time, see the config sketch just after this list.)
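As an aside, the same override can be made persistent in your boto configuration file so that every `gsutil` invocation picks it up without the `-o` flag. A minimal sketch, assuming your config lives at the default `~/.boto` location:

```bash
# Append the threshold to the [GSUtil] section of the boto config
# (gsutil reads ~/.boto by default).
cat >> ~/.boto <<'EOF'
[GSUtil]
parallel_composite_upload_threshold = 150M
EOF
```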
Based on the estimated maximum performance of the local SSD on both VMs, we should be able to achieve up to 660 MB/s read and 350 MB/s write throughput with `gsutil`. Let's see what the upload benchmarks revealed:
```bash
time gsutil cp temp_30GB_file gs://doit-speed-test-bucket/
# n2-standard-4:  2m21.893s, 216.50 MB/s
# n2-standard-80: 2m11.676s, 233.30 MB/s

time gsutil -m cp temp_30GB_file gs://doit-speed-test-bucket/
# n2-standard-4:  2m48.710s, 182.09 MB/s
# n2-standard-80: 2m29.348s, 205.69 MB/s

time gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp temp_30GB_file gs://doit-speed-test-bucket/
# n2-standard-4:  1m40.104s, 306.88 MB/s
# n2-standard-80: 0m52.145s, 589.13 MB/s

time gsutil -m -o GSUtil:parallel_composite_upload_threshold=150M cp temp_30GB_file gs://doit-speed-test-bucket/
# n2-standard-4:  1m44.579s, 293.75 MB/s
# n2-standard-80: 0m51.154s, 600.54 MB/s
```
As expected based on GCP's gsutil documentation, large file uploads benefit from including `-o GSUtil:parallel_composite_upload_threshold=150M`. When more vCPUs are made available to assist in the parallel upload of file parts, upload time improves dramatically, to the point that the consistent 600 MB/s upload speed on the n2-standard-80 comes close to the SSD's max read throughput of 660 MB/s. Including `-m` for only one file shifts upload time by just a few seconds. So far, we've seen nothing out of the ordinary.
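One caveat worth noting before moving on, taken from the gsutil documentation rather than my own benchmarks: objects assembled via parallel composite upload carry a CRC32C checksum but no MD5, so `gsutil` needs a compiled CRC library on the download side to validate them at full speed. The usual fix on Debian looks roughly like this:

```bash
# Without the compiled crcmod extension, gsutil falls back to a slow
# pure-Python implementation and warns about degraded integrity checks.
sudo apt-get install -y gcc python3-dev python3-setuptools
pip3 install --no-cache-dir -U crcmod
```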
Let’s check out the download benchmarks:
```bash
time gsutil cp gs://doit-speed-test-bucket/temp_30GB_file .
# n2-standard-4:  8m3.186s, 63.58 MB/s
# n2-standard-80: 6m13.585s, 82.23 MB/s

time gsutil -m cp gs://doit-speed-test-bucket/temp_30GB_file .
# n2-standard-4:  7m57.881s, 64.28 MB/s
# n2-standard-80: 6m20.131s, 80.81 MB/s
```
Download performance on the 80 vCPU VM only achieved 23% of the maximum local SSD write throughput. Additionally, enabling multi-threading with `-m` did not improve performance for this single-file download, and although both machines were operating well under their maximum network throughput (10 Gbps for the n2-standard-4, 32 Gbps for the n2-standard-80), using a higher-tier machine within the same family evidently yields a ~30% improvement in download speed. Weird, but not as weird as getting only a quarter of the local SSD's write throughput out of an absurdly expensive VM.
What is going on?
After much searching around on this issue, I found no answers, but I did discover s5cmd, a tool designed to dramatically improve uploads to and downloads from S3 buckets. It claims to run 12X faster than the equivalent AWS CLI commands (e.g. `aws s3 cp`), due in large part to being written in Go, a compiled language, versus the AWS CLI, which is written in Python. It just so happens that `gsutil` is also written in Python. Perhaps gsutil is severely hampered by its language choice, or is simply optimized poorly? Given that GCS buckets can be configured for S3 API interoperability, is it possible to speed up uploads and downloads with `s5cmd` simply by working with a compiled tool?
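Before answering that, a quick note on setup: `s5cmd` talks to GCS through the bucket's S3-compatible XML API and authenticates with HMAC credentials rather than your usual Google credentials. A minimal sketch of that setup, where the service account email is a placeholder:

```bash
# Create an HMAC key for a service account (hypothetical email below);
# the command prints an access key and secret.
gsutil hmac create my-sa@my-project.iam.gserviceaccount.com

# s5cmd reads the standard AWS-style environment variables.
export AWS_ACCESS_KEY_ID="GOOG..."     # access key printed above
export AWS_SECRET_ACCESS_KEY="..."     # secret printed above

# Point s5cmd at the GCS interoperability endpoint.
s5cmd --endpoint-url https://storage.googleapis.com ls s3://doit-speed-test-bucket/
```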
Linux VM performance with s5cmd: Large files
It took a little while to get `s5cmd` working, mostly because I had to discover the hard way that GCS interoperability doesn't support S3's multipart upload API, and given that this tool is written only with AWS in mind, it will fail on large file uploads in GCP. You must provide `-p=1000000`, an argument that forces multi-part upload to be avoided. See `s5cmd` issues #1 and #2 for more info.
Note that `s5cmd` also offers a `-c` parameter for setting the number of concurrent parts/files transferred, with a default value of 5.
With those two arguments in mind, I performed the following Linux upload benchmarks:
```bash
time s5cmd --endpoint-url https://storage.googleapis.com cp -c=1 -p=1000000 temp_30GB_file s3://doit-speed-test-bucket/
# n2-standard-4:  6m7.459s, 83.60 MB/s
# n2-standard-80: 6m50.272s, 74.88 MB/s

time s5cmd --endpoint-url https://storage.googleapis.com cp -p=1000000 temp_30GB_file s3://doit-speed-test-bucket/
# n2-standard-4:  7m18.682s, 70.03 MB/s
# n2-standard-80: 6m48.380s, 75.22 MB/s
```
As expected, large file uploads perform considerably worse than with `gsutil`, given the lack of a multi-part upload strategy as an option. We see 75–85 MB/s uploads compared to `gsutil`'s 200–600 MB/s. Setting concurrency to 1 instead of the default 5 has only a small impact on performance. Thus, because `s5cmd` treats AWS as a first-class citizen without consideration for GCP, we cannot improve uploads by switching to it.
Below are the `s5cmd` download benchmarks:
```bash
time s5cmd --endpoint-url https://storage.googleapis.com cp -c=1 -p=1000000 s3://doit-speed-test-bucket/temp_30GB_file .
# n2-standard-4:  1m56.170s, 264.44 MB/s
# n2-standard-80: 1m46.196s, 289.28 MB/s

time s5cmd --endpoint-url https://storage.googleapis.com cp -c=1 s3://doit-speed-test-bucket/temp_30GB_file .
# n2-standard-4:  3m21.380s, 152.55 MB/s
# n2-standard-80: 3m45.414s, 136.28 MB/s

time s5cmd --endpoint-url https://storage.googleapis.com cp -p=1000000 s3://doit-speed-test-bucket/temp_30GB_file .
# n2-standard-4:  2m33.148s, 200.59 MB/s
# n2-standard-80: 2m48.071s, 182.78 MB/s

time s5cmd --endpoint-url https://storage.googleapis.com cp s3://doit-speed-test-bucket/temp_30GB_file .
# n2-standard-4:  1m46.378s, 288.78 MB/s
# n2-standard-80: 2m1.116s, 253.64 MB/s
```
What a dramatic improvement! While there is some variability in download time, it seems that leaving out `-c` and `-p`, keeping them at their defaults, achieves optimal speed. We are unable to reach the max write throughput of 350 MB/s, but ~289 MB/s on an n2-standard-4 is much closer to it than the ~64 MB/s `gsutil` managed on the same machine. That is a 4.5X increase in download speed simply by swapping out the data transfer tool.
Summarizing all of the above findings, for Linux:
- Given that `s5cmd` cannot enable multi-part uploads when working with GCS, it makes sense to continue using `gsutil` for uploads to GCS, so long as you include `-o GSUtil:parallel_composite_upload_threshold=150M`.
- `s5cmd` with its default parameters blows `gsutil` out of the water in download performance. Simply utilizing a data transfer tool written in a compiled language yields dramatic (4.5X) performance improvements. (Both practices are combined in the sketch below.)
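To put both findings into practice on Linux, here is a small, hypothetical wrapper that routes each direction through the tool that won its respective benchmark. The function and bucket names are illustrative, and it assumes the HMAC environment variables from earlier are already set:

```bash
#!/usr/bin/env bash
# Sketch: gsutil (with composite uploads) for GCE -> GCS,
# s5cmd (default -c/-p) for GCS -> GCE, per the benchmarks above.
set -euo pipefail

BUCKET="doit-speed-test-bucket"   # illustrative bucket name

upload() {   # upload <local-file>
  gsutil -o GSUtil:parallel_composite_upload_threshold=150M \
    cp "$1" "gs://${BUCKET}/"
}

download() { # download <object-name> <destination-dir>
  s5cmd --endpoint-url https://storage.googleapis.com \
    cp "s3://${BUCKET}/$1" "$2"
}

upload temp_30GB_file
download temp_30GB_file .
```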
Windows VM performance with gsutil: Large files
If you thought the above wasn't unusual enough, buckle in as we go off the deep end with Windows. Since the DoiT customer was dealing with Windows Server, after all, it was time to set out on benchmarking that OS. I began to suspect their problem was not going to be between the keyboard and the chair.
Having confirmed that on Linux `gsutil` works great for uploads when given the right parameters and `s5cmd` works great for downloads with default parameters, it was time to try these commands on Windows, where I would once again be humbled by my lack of experience with PowerShell.
I eventually managed to gather benchmarks from an n2-standard-4 machine with a local SSD attached, running the "Windows Server version 1809 Datacenter Core for Containers, built on 20200813" GCE VM image. Due to the per-vCPU licensing fees that Windows Server incurs, I opted not to gather metrics from an n2-standard-80 in this experiment.
An important side note before we dive into the metrics: the GCP documentation on attaching local SSDs recommends that for "All Windows Servers" you should use the SCSI driver to attach your local SSD rather than the NVMe driver typically used for a Linux machine, as SCSI is better optimized for achieving maximum throughput performance. I therefore provisioned two VMs, each with a local SSD attached, one via NVMe and one via SCSI, determined to compare their performance alongside the various tools and parameters investigated so far.
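For completeness, the Windows pair might be provisioned along these lines; the zone is assumed, and the image family is my guess at the one matching the build above (check `gcloud compute images list --project windows-cloud` for the exact family):

```bash
# Two otherwise-identical Windows VMs; only the local SSD interface differs.
for IFACE in NVME SCSI; do
  gcloud compute instances create "win-test-$(echo "${IFACE}" | tr '[:upper:]' '[:lower:]')" \
    --zone=us-central1-a \
    --machine-type=n2-standard-4 \
    --image-family=windows-1809-core-for-containers \
    --image-project=windows-cloud \
    --local-ssd=interface="${IFACE}"
done
```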
Below are the upload speed benchmarks:
```powershell
Measure-Command {gsutil cp temp_30GB_file gs://doit-speed-test-bucket/}
# NVMe: 3m50.064s, 133.53 MB/s
# SCSI: 4m7.256s, 124.24 MB/s

Measure-Command {gsutil -m cp temp_30GB_file gs://doit-speed-test-bucket/}
# NVMe: 3m59.462s, 128.29 MB/s
# SCSI: 3m34.013s, 143.54 MB/s

Measure-Command {gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp temp_30GB_file gs://doit-speed-test-bucket/}
# NVMe: 5m54.046s, 86.77 MB/s
# SCSI: 6m13.929s, 82.15 MB/s

Measure-Command {gsutil -m -o GSUtil:parallel_composite_upload_threshold=150M cp temp_30GB_file gs://doit-speed-test-bucket/}
# NVMe: 5m55.751s, 86.40 MB/s
# SCSI: 5m58.078s, 85.79 MB/s
```
With no arguments provided to `gsutil`, upload throughput is roughly 60% of what the same command achieved on a Linux machine, and providing any combination of arguments only degrades performance. Enabling multi-part upload, which improved upload speed by 42% on Linux, instead drops upload speed by 35% here. You may also notice that when `-m` is not provided and `gsutil` is left to upload a single large file optimally, the upload from the NVMe drive completes more quickly than from the SCSI drive, the latter of which supposedly has drivers better optimized for Windows Server. What is going on?!
Low upload performance of around 80–85 MB/s was the exact range the DoiT customer was experiencing, so their problem was at least reproducible. By removing the GCP-recommended argument `-o GSUtil:parallel_composite_upload_threshold=150M` for large file uploads, the customer could avoid a 35% performance penalty.