
Exporting data by time range from a mongo container in Docker with mongoexport

Background

Development background

  • Development environment:
    • Container name in Docker: mongo (image: mongo:4.2.3)
  • MongoDB configuration:
    • Database: audit
    • Collection (table): t_audit

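Conditional operators

Single operators

The examples use mongo shell syntax, where operator names may be left unquoted.

Operator  Meaning                   Example                  SQL equivalent
$gt       greater than              {"_id" : {$gt : 3 }}     where _id > 3
$lt       less than                 {"_id" : {$lt : 3 }}     where _id < 3
$lte      less than or equal to     {"_id" : {$lte : 3 }}    where _id <= 3
$gte      greater than or equal to  {"_id" : {$gte : 3 }}    where _id >= 3

Combined operators

Operator     Meaning        Example                         SQL equivalent
$gt and $lt  numeric range  {"_id" : {$gt : 3, $lt: 10}}    where _id > 3 and _id < 10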
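
The combined form can be assembled in the shell before it is handed to a tool that expects strict JSON. A minimal sketch, assuming numeric bounds and a hypothetical helper named range_query (it is not part of MongoDB); the operator names are double-quoted so the result is valid JSON:

```shell
#!/bin/sh
# range_query is a hypothetical helper, not part of MongoDB or mongoexport.
# It prints a strict-JSON filter for: low < field < high (numeric bounds only).
range_query() {
    field=$1
    low=$2
    high=$3
    # The format string is single-quoted, so the shell leaves $gt and $lt alone.
    printf '{"%s": {"$gt": %s, "$lt": %s}}\n' "$field" "$low" "$high"
}

range_query _id 3 10
```

The printed string corresponds to the SQL condition where _id > 3 and _id < 10.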

Time operations

Date()

Type: string

//Wed Dec 19 2012 01:03:25 GMT-0500 (EST)
Date() method which returns the current date as a string.

new Date()

Type: Date object (displayed as ISODate)

//ISODate("2012-12-19T06:01:17.171Z")
new Date() constructor which returns a Date object using the ISODate() wrapper.

ISODate()

Type: Date object (displayed as ISODate)

ISODate() constructor which returns a Date object using the ISODate() wrapper.

Introduction to mongoexport

Official description: Export data from MongoDB in CSV or JSON format.

Usage:
  mongoexport <options>

mongoexport takes many options. When in doubt, run mongoexport --help in a shell:

verbosity options:
  -v, --verbose=<level>                           more detailed log output (include multiple times for more verbosity, e.g. -vvvvv, or specify a numeric value, e.g. --verbose=N)
      --quiet                                     hide all log output

connection options:
  -h, --host=<hostname>                           mongodb host to connect to (setname/host1,host2 for replica sets)
      --port=<port>                               server port (can also use --host hostname:port)

ssl options:
      --ssl                                       connect to a mongod or mongos that has ssl enabled
      --sslCAFile=<filename>                      the .pem file containing the root certificate chain from the certificate authority
      --sslPEMKeyFile=<filename>                  the .pem file containing the certificate and key
      --sslPEMKeyPassword=<password>              the password to decrypt the sslPEMKeyFile, if necessary
      --sslCRLFile=<filename>                     the .pem file containing the certificate revocation list
      --sslAllowInvalidCertificates               bypass the validation for server certificates
      --sslAllowInvalidHostnames                  bypass the validation for server name
      --sslFIPSMode                               use FIPS mode of the installed openssl library

authentication options:
  -u, --username=<username>                       username for authentication
  -p, --password=<password>                       password for authentication
      --authenticationDatabase=<database-name>    database that holds the user's credentials
      --authenticationMechanism=<mechanism>       authentication mechanism to use

kerberos options:
      --gssapiServiceName=<service-name>          service name to use when authenticating using GSSAPI/Kerberos (default: mongodb)
      --gssapiHostName=<host-name>                hostname to use when authenticating using GSSAPI/Kerberos (default: <remote server's address>)

namespace options:
  -d, --db=<database-name>                        database to use
  -c, --collection=<collection-name>              collection to use

uri options:
      --uri=mongodb-uri                           mongodb uri connection string

output options:
  -f, --fields=<field>[,<field>]*                 comma separated list of field names (required for exporting CSV) e.g. -f "name,age"
      --fieldFile=<filename>                      file with field names - 1 per line
      --type=<type>                               the output format, either json or csv (defaults to 'json') (default: json)
  -o, --out=<filename>                            output file; if not specified, stdout is used
      --jsonArray                                 output to a JSON array rather than one object per line
      --pretty                                    output JSON formatted to be human-readable
      --noHeaderLine                              export CSV data without a list of field names at the first line
      --jsonFormat=<type>                         the extended JSON format to output, either canonical or relaxed (defaults to 'relaxed') (default: relaxed)

querying options:
  -q, --query=<json>                              query filter, as a JSON string, e.g., '{x:{$gt:1}}'
      --queryFile=<filename>                      path to a file containing a query filter (JSON)
  -k, --slaveOk                                   allow secondary reads if available (default true) (default: false)
      --readPreference=<string>|<json>            specify either a preference mode (e.g. 'nearest') or a preference json object (e.g. '{mode: "nearest", tagSets: [{a: "b"}], maxStalenessSeconds: 123}')
      --forceTableScan                            force a table scan (do not use $snapshot or hint _id). Deprecated since this is default behavior on WiredTiger
      --skip=<count>                              number of documents to skip
      --limit=<count>                             limit the number of documents to export
      --sort=<json>                               sort order, as a JSON string, e.g. '{x:1}'
      --assertExists                              if specified, export fails if the collection does not exist (default: false)
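
The -q and --queryFile options in the listing above take the same kind of filter, and keeping the filter in a file sidesteps shell quoting. A minimal sketch, assuming a temporary file created with mktemp; the output path /root/out.json is a placeholder, and the mongoexport command is only printed because running it needs a reachable server:

```shell
#!/bin/sh
# Sketch: store the filter in a file so the shell never interprets "$gte".
# The temporary file and the output path are placeholders.
query_file=$(mktemp)

# The quoted 'EOF' delimiter disables variable expansion inside the heredoc.
cat > "$query_file" <<'EOF'
{"_id": {"$gte": 100}}
EOF

# Printed rather than executed, since it needs a running MongoDB server.
echo "mongoexport -d audit -c t_audit --queryFile=$query_file -o /root/out.json"
```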

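Exporting by time range

Requirement: export the documents in collection t_audit of database audit whose createTime falls between 2020-12-10T16:00:00.000Z and 2020-12-10T16:59:59.999Z.

Generating the CSV inside the container with mongoexport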

mongoexport -h localhost -u root -p 123456 --authenticationDatabase=admin --authenticationMechanism=SCRAM-SHA-1 --port 27017 -d audit -c t_audit --type=csv  -q '{"createTime": {"$gte": {"$date" : "2020-12-09T16:00:00.000Z"}}}' -f _id,userId,userName,accessIp,accessUrl,description,createTime,pushTime  -o /root/audit20201210_1.csv
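
Note that this command sets only a lower bound ("$gte") at 2020-12-09T16:00:00.000Z. To restrict the export to a closed range, add an upper bound such as "$lte" inside the same createTime condition.

The options were already listed in the previous section, but here is a recap of the ones used: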

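  • -h host
  • -u username
  • -p password
  • --authenticationDatabase database that holds the user's credentials
  • --authenticationMechanism authentication mechanism to use
  • --port MongoDB server port
  • -d database name
  • -c collection (table) name
  • --type output format
  • -q query filter
  • -f fields to export (comma-separated)
  • -o output file name (absolute path inside the container)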
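
A filter that covers the whole range stated in the requirement needs both a lower and an upper bound. A minimal sketch, assuming a hypothetical helper named time_range_query (not part of mongoexport) and ISO-8601 UTC strings as input; the mongoexport call appears only as a comment because it needs a running server:

```shell
#!/bin/sh
# time_range_query is a hypothetical helper, not part of mongoexport.
# It prints an Extended JSON filter for: start <= field <= end,
# where start and end are ISO-8601 UTC strings.
time_range_query() {
    printf '{"%s": {"$gte": {"$date": "%s"}, "$lte": {"$date": "%s"}}}\n' "$1" "$2" "$3"
}

query=$(time_range_query createTime 2020-12-10T16:00:00.000Z 2020-12-10T16:59:59.999Z)
echo "$query"

# With a reachable server, the filter would be passed as one quoted argument:
#   mongoexport ... -d audit -c t_audit --type=csv -q "$query" -f _id,createTime -o /root/audit20201210_1.csv
```

Passing "$query" in double quotes is safe here because the variable's contents are not expanded a second time.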

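Copying the CSV from the container to the host with docker cp

Copy the CSV file from the container (the path passed to -o above) into /root/db/data on the host:

docker cp mongo:/root/audit20201210_1.csv /root/db/data/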
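
The copy step can be wrapped so that the commands are reviewed before they run. A minimal sketch, assuming the export was written to the path given to -o in the mongoexport command; the run helper and the DRY_RUN variable are hypothetical names, and by default the commands are only printed:

```shell
#!/bin/sh
# Sketch: check the exported file inside the container, then copy it to the host.
# DRY_RUN defaults to 1, so the docker commands are printed, not executed.
container=mongo
src=/root/audit20201210_1.csv
dest=/root/db/data/

# run is a hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

run docker exec "$container" ls -l "$src"
run docker cp "$container:$src" "$dest"
```

Setting DRY_RUN=0 in the environment makes the same script execute the commands.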

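Problems encountered

The query does not recognize $gt

Error: Failed: error parsing query as Extended JSON: invalid JSON input. Position: 20. Character: $

Cause: the argument passed to -q is not valid JSON, so the argument needs checking. The help text describes the options as follows (its own example leaves $gt unquoted, which this version rejected):

--query=<json> query filter, as a JSON string, e.g., '{x:{$gt:1}}'
--queryFile=<filename>

Solution: wrap $gte in double quotes.

-q '{"_id": {$gte: 100}}' -->> change to -q '{"_id": {"$gte": 100}}'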
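
A related pitfall, separate from the JSON fix above, is the choice of shell quotes around the filter. Inside double quotes the shell treats $gte as a variable and expands it before mongoexport sees the argument. A small sketch, assuming no shell variable named gte is set:

```shell
#!/bin/sh
# Assumes no shell variable named gte is set.
unset gte

# Double quotes: the shell expands $gte (to an empty string here).
broken="{\"_id\": {\"$gte\": 100}}"

# Single quotes: the text is passed through unchanged.
correct='{"_id": {"$gte": 100}}'

echo "$broken"
echo "$correct"
```

The first line printed has an empty key where "$gte" should be, which is why the commands in this article wrap the filter in single quotes.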

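The query does not recognize ISODate

Error: Failed: error parsing query as Extended JSON: invalid JSON input. Position: 20. Character: I

Cause: the query parameter cannot parse ISODate, which is mongo shell syntax rather than JSON.

Solution: use $date
(I tried every way of writing ISODate in the query condition, and each one raised the invalid JSON error. I finally found $date on Stack Overflow, which solved the problem.)

 -q '{"createTime": {"$gte": {"$date" : "2020-12-09T16:00:00.000Z"}}}'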
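
The string given to $date is an ISO-8601 timestamp, and the trailing Z marks it as UTC. When a boundary is known in local time, it has to be converted first. A minimal sketch, assuming GNU date (the -d option is not available in every date implementation) and a hypothetical helper named to_utc_iso; the UTC+8 offset is only an example:

```shell
#!/bin/sh
# to_utc_iso is a hypothetical helper; it assumes GNU date (the -d option).
# It converts a timestamp with an explicit UTC offset into the
# millisecond-precision UTC form used with $date.
to_utc_iso() {
    date -u -d "$1" '+%Y-%m-%dT%H:%M:%S.000Z'
}

# Midnight on 2020-12-10 in UTC+8 is 16:00 on 2020-12-09 in UTC.
to_utc_iso '2020-12-10 00:00:00 +0800'
```

The helper always writes .000 for the milliseconds, so it suits whole-second boundaries only.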

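This was my first contact with MongoDB and I knew very little about it, so the requirement caught me off guard. There is clearly a lot more to explore and learn.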
