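This post shows how to download a dataset from an Amazon S3 bucket with Python.
Before downloading, you need to know which bucket the data lives in. "Bucket" here is the same word as a water bucket, so in other words you need to know which bucket your dataset has been put in :)
You also need two keys: an access key and a secret access key. Together they work like a username and password: the access key ID identifies the account, and the secret access key is used to sign your requests.
Here is an example: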
bucket = 'open-neurodata'
access_key = 'AKIA4XXGEV6ZQOTMTHX6'
secret_key = '4EbthK1ax145WT08GwEEW3Umw3QFclIzdsLo6tX1'
# pip install boto3
import boto3
# create the S3 client (no request is sent to AWS until the first API call)
client = boto3.client('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key)
print('S3 client created')
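Hard-coding the two keys in a script is fine for a quick demo, but they are easy to leak that way. A minimal sketch of reading them from environment variables instead is below. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard variable names that boto3 itself looks for; the helper name `load_s3_credentials` is made up for this illustration.

```python
import os


def load_s3_credentials(env=os.environ):
    """Return keyword arguments for boto3.client, read from a mapping.

    `env` defaults to the process environment; passing a plain dict
    makes the helper easy to try out. (Hypothetical helper, not part
    of boto3.)
    """
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


# usage (needs boto3 installed and both variables set):
# client = boto3.client('s3', **load_s3_credentials())
```

If both variables are set, `boto3.client('s3')` with no key arguments should also pick them up on its own; the helper just makes a missing variable fail early with a readable message.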
List the objects stored under a given prefix ("funke"); a single bucket can contain many objects.
# list data
print(client.list_objects(Bucket=bucket, Prefix="funke"))
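One thing to keep in mind: a single `list_objects` call returns at most 1000 keys, so for a large prefix the printed response is only the first page. A rough sketch of walking every page with boto3's paginator for `list_objects_v2` follows. The function name `iter_keys` is my own; it only assumes it is given an S3 client like the one created earlier.

```python
def iter_keys(client, bucket, prefix):
    """Yield every object key under `prefix`, one page at a time."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # a page has no "Contents" entry when nothing matches the prefix
        for obj in page.get("Contents", []):
            yield obj["Key"]


# usage with the client from above:
# for key in iter_keys(client, bucket, "funke"):
#     print(key)
```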
Download the directory structure file stored under the "funke" prefix
# download directory structure file - this shows exactly how the s3 data is stored
client.download_file(
    Bucket=bucket,
    Key="funke/structure.md",
    Filename="structure.md")
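This is the core function. You don't need to study it in detail; knowing what to pass in is enough. Note that the downloaded data is written under the directory the script is run from, mirroring the key paths in the bucket.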
# function to download all files nested in a bucket path
import os

def downloadDirectory(bucket_name, path, access_key, secret_key):
    resource = boto3.resource(
        's3',
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key)
    bucket = resource.Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=path):
        key = obj.key
        if key.endswith('/'):
            continue  # skip "folder" placeholder objects, there is no file to save
        # recreate the key's directory layout under the current directory
        local_dir = os.path.dirname(key)
        if local_dir:
            os.makedirs(local_dir, exist_ok=True)
        print(f'Downloading {key}')
        bucket.download_file(key, key)
# download
path = 'funke/fib25/testing/ground_truth'  # the folder (key prefix) you want to download
downloadDirectory(
    bucket,
    path,
    access_key,
    secret_key)
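Since `downloadDirectory` always writes relative to the current directory, you may want the files somewhere else. One possible approach, sketched here under my own assumptions, is a small helper that maps an S3 key to a path under a chosen folder. Both `local_path_for` and the `dest_root` parameter are hypothetical names, not part of boto3 or of the function above.

```python
import os


def local_path_for(key, dest_root="."):
    """Map an S3 key to a local path under dest_root.

    Returns None for keys ending in "/", which are folder
    placeholders rather than downloadable files.
    """
    if key.endswith("/"):
        return None
    # S3 keys always use "/" as the separator, whatever the local OS uses
    return os.path.join(dest_root, *key.split("/"))
```

Inside the download loop this could be used roughly as `target = local_path_for(obj.key, 'data')`, then `os.makedirs(os.path.dirname(target), exist_ok=True)` and `bucket.download_file(obj.key, target)` whenever `target` is not `None`.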