
MongoDB vs. ClickHouse: A Comparison

1. Hardware Setup

(The test machines' hardware specifications appeared as an image in the original post.)

2. Installing the Databases

2.1 Installing ClickHouse with Docker

1. Pull the image:

```shell
docker pull yandex/clickhouse-server
```

2. Start a temporary container (only to copy out the default config files):

```shell
docker run --rm -d --name=clickhouse-server \
  --ulimit nofile=262144:262144 \
  -p 8123:8123 -p 9009:9009 -p 9000:9000 \
  yandex/clickhouse-server:latest
```

3. Copy the config files from the container to the host (create the target directory first):

```shell
mkdir -p /home/clickhouse/conf/
docker cp clickhouse-server:/etc/clickhouse-server/config.xml /home/clickhouse/conf/config.xml
docker cp clickhouse-server:/etc/clickhouse-server/users.xml /home/clickhouse/conf/users.xml
```

4. Stop the temporary container started in step 2 (`clickhouse-server`).

5. Set a password in the config copied in step 3: edit `/home/clickhouse/conf/users.xml` and put the password into the `password` tag (a hashed password also works).

6. Start the container again, this time with the data, config, and log paths mounted:

```shell
docker run -d --name=clickhouse-server \
  -p 8123:8123 -p 9009:9009 -p 9000:9000 \
  --ulimit nofile=262144:262144 \
  -v /home/clickhouse/data:/var/lib/clickhouse:rw \
  -v /home/clickhouse/conf/config.xml:/etc/clickhouse-server/config.xml \
  -v /home/clickhouse/conf/users.xml:/etc/clickhouse-server/users.xml \
  -v /home/clickhouse/log:/var/log/clickhouse-server:rw \
  yandex/clickhouse-server:latest
```

7. Enter the container and open a client session:

```shell
docker exec -it clickhouse-server clickhouse-client --user=default --password=your_password
```

```sql
SELECT version();  -- check the server version
```
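For step 5, the fragment of `users.xml` that carries the password looks roughly like this. This is a sketch: the legacy `yandex/clickhouse-server` image uses `<yandex>` as the config root, `your_password` is a placeholder, and `password_sha256_hex` is the hashed alternative the step alludes to:

```xml
<yandex>
    <users>
        <default>
            <password>your_password</password>
            <!-- or store a SHA-256 hash instead of plaintext:
            <password_sha256_hex>...</password_sha256_hex> -->
        </default>
    </users>
</yandex>
```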

2.2 Installing MongoDB with Docker

Config file `/home/mongo/configdb/mongod.conf`:

```yaml
storage:
    dbPath: /data/db
    journal:
        enabled: true
    engine: wiredTiger
security:
    authorization: enabled
net:
    port: 27017
    bindIp: 192.168.xx.xx
```

Start the container:

```shell
docker run -d \
  -p 27017:27017 \
  --name mongo \
  -v /home/mongo/db:/data/db \
  -v /home/mongo/configdb:/data/configdb \
  -v /etc/localtime:/etc/localtime \
  mongo -f /data/configdb/mongod.conf
```

3. Bulk-Insert Benchmark Script

```python
import json
import time
import traceback
import uuid
import random

import pymongo
from clickhouse_driver import Client


# Decorator: report each function's elapsed wall-clock time.
def cost_time(func):
    def fun(*args, **kwargs):
        t = time.perf_counter()
        result = func(*args, **kwargs)
        print(f'func {func.__name__} cost time:{time.perf_counter() - t:.8f} s')
        return result
    return fun


class MyEncoder(json.JSONEncoder):
    """Fixes "Object of type 'bytes' is not JSON serializable"."""
    def default(self, obj):
        if isinstance(obj, bytes):
            return str(obj, encoding='utf-8')
        return json.JSONEncoder.default(self, obj)


# NB: with total=100,000,000, task_size values exceed the UInt16 range (max 65,535).
create_task_table = """CREATE TABLE IF NOT EXISTS task(
    `_id` String,
    `task_name` String,
    `task_size` UInt16,
    `status` UInt8
)
ENGINE = MergeTree() PRIMARY KEY _id;
"""

ck_client = Client(host='192.168.12.199', port=9000, database="testdb",
                   user='default', send_receive_timeout=20)
mongo_client = pymongo.MongoClient("mongodb://192.168.12.199:27017/")
mongo_db = mongo_client["testdb"]
mongo_col = mongo_db["task"]


@cost_time
def insert_mongo_task_data(total, patch):
    """Insert `total` task rows into MongoDB in batches of `patch`."""
    mongo_col.drop()
    sig = 0
    data_list = []
    for i in range(total):
        sig += 1
        try:
            dicts = {
                '_id': str(uuid.uuid1()),
                'task_name': 'task_' + str(i),
                'task_size': i,
                'status': random.choice([0, 1])
            }
            data_list.append(dicts)
            if sig == patch:
                mongo_col.insert_many(data_list)
                sig = 0
                data_list = []
        except Exception:
            print("task name :%s process failed:%s" % ('task_' + str(i), traceback.format_exc()))
    if len(data_list) > 0:
        mongo_col.insert_many(data_list)


@cost_time
def insert_ck_task_data(total, patch):
    """Insert `total` task rows into ClickHouse in batches of `patch`."""
    ck_client.execute('DROP TABLE IF EXISTS task')
    ck_client.execute(create_task_table)
    sig = 0
    data_list = []
    for i in range(total):
        sig += 1
        try:
            dicts = {
                '_id': str(uuid.uuid1()),
                'task_name': 'task_' + str(i),
                'task_size': i,
                'status': random.choice([0, 1])
            }
            data_list.append(dicts)
            if sig == patch:
                ck_client.execute("INSERT INTO task (*) VALUES", data_list, types_check=True)
                sig = 0
                data_list = []
        except Exception:
            print("task name :%s process failed:%s" % ('task_' + str(i), traceback.format_exc()))
    if len(data_list) > 0:
        ck_client.execute("INSERT INTO task (*) VALUES", data_list, types_check=True)


insert_ck_task_data(100000000, 10000)
insert_mongo_task_data(100000000, 10000)
```
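The two insert functions share the same batching pattern: accumulate rows until the batch size (`patch`) is reached, flush, and flush any remainder at the end. A minimal sketch of just that pattern, with a stub `flush` callback standing in for `insert_many` / `ck_client.execute` (the names here are illustrative, not from the original script):

```python
import uuid
import random

def batched_insert(total, batch_size, flush):
    """Generate `total` task rows and hand them to `flush` in batches."""
    batch = []
    for i in range(total):
        batch.append({
            '_id': str(uuid.uuid1()),
            'task_name': 'task_' + str(i),
            'task_size': i,
            'status': random.choice([0, 1]),
        })
        if len(batch) == batch_size:
            flush(batch)
            batch = []        # start a fresh list; the flushed batch stays intact
    if batch:                 # flush the final partial batch
        flush(batch)

# Collect the flushed batches in a list instead of writing to a database.
flushed = []
batched_insert(25, 10, flushed.append)
print([len(b) for b in flushed])  # → [10, 10, 5]
```

Keeping `sig` as a separate counter, as the original does, is equivalent to checking `len(batch)`; the important part is the final flush of the last partial batch, which both versions handle.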
Insert results (batch size 10,000):

| Rows inserted | MongoDB time | ClickHouse time | MongoDB size | ClickHouse size |
|---|---|---|---|---|
| 1,000 | 0.01972508 s | 0.01732014 s | 99.5 K | 12 K |
| 10,000 | 0.12277857 s | 0.08004815 s | 1004.8 K | 119 K |
| 100,000 | 1.12529528 s | 0.73075602 s | 9.0 M | 1.9 M |
| 1,000,000 | 10.92156150 s | 7.17739819 s | 100 M | 49 M |
| 10,000,000 | 108.91806854 s | 72.16343116 s | 1009.8 M | 117 M |
| 100,000,000 | 1189.25558783 s | 748.89750133 s | 10 G | 1.1 G |
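Taking the 100-million-row line of the table, the gap can be quantified directly (figures copied from the table; simple arithmetic, not additional measurements):

```python
# Figures from the 100,000,000-row line above.
mongo_secs, ck_secs = 1189.25558783, 748.89750133
mongo_gb, ck_gb = 10.0, 1.1  # on-disk size

speedup = mongo_secs / ck_secs   # insert-time ratio
compression = mongo_gb / ck_gb   # storage ratio

print(f"insert speedup: ~{speedup:.2f}x")     # → insert speedup: ~1.59x
print(f"storage ratio: ~{compression:.1f}x")  # → storage ratio: ~9.1x
```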
 
Exact-match query times by row count:

| Rows | MongoDB `db.task.find({'task_name':'task_1'}).explain("executionStats")` | ClickHouse `select * from testdb.task where task_name = 'task_1'` |
|---|---|---|
| 1,000 | 0 ms | 0.004 sec |
| 10,000 | 3 ms | 0.008 sec |
| 100,000 | 29 ms | 0.006 sec |
| 1,000,000 | 340 ms | 0.009 sec |
| 10,000,000 | 3,281 ms | 0.035 sec |
| 100,000,000 | 165,762 ms | 0.626 sec |
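A likely explanation for the divergence (an interpretation, not stated in the original post): MongoDB only indexes `_id` by default, so the `task_name` filter is a full collection scan whose latency grows roughly linearly with row count, while ClickHouse's compressed columnar scan grows far more slowly. The table's own figures bear this out:

```python
# Times from the table above (ClickHouse seconds converted to ms).
rows     = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
mongo_ms = [0, 3, 29, 340, 3281]
ck_ms    = [4, 8, 6, 9, 35]

# Growth factor for each 10x increase in rows (skip the leading 0 ms entry).
mongo_growth = [mongo_ms[i + 1] / mongo_ms[i] for i in range(1, len(mongo_ms) - 1)]
ck_growth    = [ck_ms[i + 1] / ck_ms[i] for i in range(1, len(ck_ms) - 1)]

print(mongo_growth)  # close to 10x per step: consistent with a linear scan
print(ck_growth)     # well below 10x per step
```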
Fuzzy (pattern) query times by row count:

| Rows | MongoDB `db.task.find({'task_name':{'$regex':'.*_1.*'}}).explain("executionStats")` | ClickHouse `select * from testdb.task where task_name like '%_1%'` |
|---|---|---|
| 1,000 | 0 ms | 0.004 sec |
| 10,000 | 4 ms | 0.012 sec |
| 100,000 | 42 ms | 0.022 sec |
| 1,000,000 | 468 ms | 0.077 sec |
| 10,000,000 | 5,871 ms | 0.670 sec |
| 100,000,000 | 112,334 ms | 21.094 sec |
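One caveat about this comparison: the two patterns are not equivalent. In SQL `LIKE`, `_` is a single-character wildcard, so `LIKE '%_1%'` matches any `1` preceded by at least one character of any kind, while the Mongo regex `.*_1.*` requires a literal underscore. (To match a literal underscore in ClickHouse the pattern would need escaping, e.g. `LIKE '%\_1%'`.) A quick illustration in Python:

```python
import re

# SQL LIKE: '%' = any sequence, '_' = any single character.
like_as_regex = re.compile(r'.*.1.*')   # regex translation of LIKE '%_1%'
mongo_regex   = re.compile(r'.*_1.*')   # the Mongo pattern: literal '_'

for s in ['task_1', 'task_21', 'x1']:
    print(s, bool(like_as_regex.fullmatch(s)), bool(mongo_regex.fullmatch(s)))
# task_1 True True     (both match)
# task_21 True False   (LIKE's '_' wildcard matched the '2')
# x1 True False
```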
Group-aggregation time. MongoDB query:

```javascript
db.getCollection("task").aggregate([
    {"$group": {
        "_id": null,
        "total_num": {"$sum": 1},
        "total_size": {"$sum": "$task_size"},
        "avg_size": {"$avg": "$task_size"}
    }}
])
```

ClickHouse query: `select count(*), SUM(task_size), AVG(task_size) from task`

| Rows | MongoDB time | ClickHouse time |
|---|---|---|
| 100,000,000 | 106,775 ms | 0.035 sec |
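Counting, summing, and averaging a single column over 100 million rows is the classic columnar workload, and this row shows the largest gap of the whole benchmark. The ratio, from the table's own figures:

```python
mongo_ms = 106_775   # MongoDB aggregate over 100,000,000 rows
ck_ms    = 35        # ClickHouse: 0.035 sec

ratio = mongo_ms / ck_ms
print(f"~{ratio:.0f}x faster")  # → ~3051x faster for the full-table aggregate
```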
From the comparison above:

- ClickHouse compresses data much better; the same dataset occupies far less disk space.
- Batch inserts take less time in ClickHouse than in MongoDB, and the gap widens as the data volume grows.
- For queries over tens of thousands of rows or fewer, MongoDB is slightly faster than ClickHouse; beyond that scale, ClickHouse's query time stays well below MongoDB's as the data keeps growing.

 

Disclaimer: this article was contributed by a community user and does not represent the position of the wpsshop blog; copyright belongs to the original author, and the site assumes no legal liability. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/小蓝xlanll/article/detail/583075