
Python Full-Stack Series 137: The ESIO Microservice

Overview

A small service built on top of the ES database; this post sketches a simple design.

In the past I wrote dedicated wrapper objects for each database to implement a series of features. Most of them ended up well packaged, but every time I want to use or modify one I have to reconstruct the details from memory. So going forward I want all database interaction to go through interfaces.

PS: basic CRUD alone is not enough; in practice you usually need a "combination move", e.g. "delete if it exists", "insert if it does not exist"... (operations of this kind have a bit of a CLC flavor), and interfaces may express them better than objects. Later, wrapping SCLC as an object should cover most of the control logic.

Contents

1 Feature design

Start with a few simple features and add more over time.

No. | Name          | Description
1   | /             | main endpoint, connectivity test, GET
2   | get_stat/     | basic statistics, similar to listing databases/tables: index names, doc types, doc counts
3   | save_a_rec    | store a document; the document must have id, title, content and slot
4   | match_search  | exact search / full-text search
5   | filter_search | filtered search
6   | phrase_search | phrase search
7   | del_a_rec     | delete a document

This requires the elasticsearch package (the version must be pinned; 7.16 does not work, use pip3 install elasticsearch==7.13.4), so I rebuilt my base-flask image with this change (v8); just start the service from the latest image.

2 主要函数

Assume the ES database is already running (port 24005). There are two ways to fetch data.

One is the Python client package, mainly the methods under indices:

# local machine
from elasticsearch import Elasticsearch
es = Elasticsearch([{'host':'172.17.0.1','port':24005}])
es.indices.stats()

{'_shards': {'total': 10, 'successful': 5, 'failed': 0},
 '_all': {'primaries': {'docs': {'count': 3, 'deleted': 0},
   'store': {'size_in_bytes': 17547, 'throttle_time_in_millis': 0},
   'indexing': {'index_total': 3,
    'index_time_in_millis': 53,
    'index_current': 0,
    'index_failed': 0,
    'delete_total': 0,
    'delete_time_in_millis': 0,
    'delete_current': 0,
    'noop_update_total': 0,
    'is_throttled': False,
    'throttle_time_in_millis': 0},
   'get': {'total': 0,
    'time_in_millis': 0,
    'exists_total': 0,
    'exists_time_in_millis': 0,
    'missing_total': 0,
    'missing_time_in_millis': 0,
    'current': 0},
   'search': {'open_contexts': 0,
    'query_total': 0,
    'query_time_in_millis': 0,
    'query_current': 0,
    'fetch_total': 0,
    'fetch_time_in_millis': 0,
    'fetch_current': 0,
    'scroll_total': 0,
    'scroll_time_in_millis': 0,
    'scroll_current': 0,
    'suggest_total': 0,
    'suggest_time_in_millis': 0,
    'suggest_current': 0},
   'merges': {'current': 0,
    'current_docs': 0,
    'current_size_in_bytes': 0,
    'total': 0,
    'total_time_in_millis': 0,
    'total_docs': 0,
    'total_size_in_bytes': 0,
    'total_stopped_time_in_millis': 0,
    'total_throttled_time_in_millis': 0,
    'total_auto_throttle_in_bytes': 104857600},
   'refresh': {'total': 22, 'total_time_in_millis': 93, 'listeners': 0},
   'flush': {'total': 3, 'total_time_in_millis': 65},
   'warmer': {'current': 0, 'total': 14, 'total_time_in_millis': 6},
   'query_cache': {'memory_size_in_bytes': 0,
    'total_count': 0,
    'hit_count': 0,
    'miss_count': 0,
    'cache_size': 0,
    'cache_count': 0,
    'evictions': 0},
   'fielddata': {'memory_size_in_bytes': 0, 'evictions': 0},
   'completion': {'size_in_bytes': 0},
   'segments': {'count': 3,
    'memory_in_bytes': 11243,
    'terms_memory_in_bytes': 8868,
    'stored_fields_memory_in_bytes': 936,
    'term_vectors_memory_in_bytes': 0,
    'norms_memory_in_bytes': 960,
    'points_memory_in_bytes': 3,
    'doc_values_memory_in_bytes': 476,
    'index_writer_memory_in_bytes': 0,
    'version_map_memory_in_bytes': 0,
    'fixed_bit_set_memory_in_bytes': 0,
    'max_unsafe_auto_id_timestamp': -1,
    'file_sizes': {}},
   'translog': {'operations': 0, 'size_in_bytes': 215},
   'request_cache': {'memory_size_in_bytes': 0,
    'evictions': 0,
    'hit_count': 0,
    'miss_count': 0},
   'recovery': {'current_as_source': 0,
    'current_as_target': 0,
    'throttle_time_in_millis': 0}},
  'total': {'docs': {'count': 3, 'deleted': 0},
   'store': {'size_in_bytes': 17547, 'throttle_time_in_millis': 0},
   'indexing': {'index_total': 3,
    'index_time_in_millis': 53,
    'index_current': 0,
    'index_failed': 0,
    'delete_total': 0,
    'delete_time_in_millis': 0,
    'delete_current': 0,
    'noop_update_total': 0,
    'is_throttled': False,
    'throttle_time_in_millis': 0},
   'get': {'total': 0,
    'time_in_millis': 0,
    'exists_total': 0,
    'exists_time_in_millis': 0,
    'missing_total': 0,
    'missing_time_in_millis': 0,
    'current': 0},
   'search': {'open_contexts': 0,
    'query_total': 0,
    'query_time_in_millis': 0,
    'query_current': 0,
    'fetch_total': 0,
    'fetch_time_in_millis': 0,
    'fetch_current': 0,
    'scroll_total': 0,
    'scroll_time_in_millis': 0,
    'scroll_current': 0,
    'suggest_total': 0,
    'suggest_time_in_millis': 0,
    'suggest_current': 0},
   'merges': {'current': 0,
    'current_docs': 0,
    'current_size_in_bytes': 0,
    'total': 0,
    'total_time_in_millis': 0,
    'total_docs': 0,
    'total_size_in_bytes': 0,
    'total_stopped_time_in_millis': 0,
    'total_throttled_time_in_millis': 0,
    'total_auto_throttle_in_bytes': 104857600},
   'refresh': {'total': 22, 'total_time_in_millis': 93, 'listeners': 0},
   'flush': {'total': 3, 'total_time_in_millis': 65},
   'warmer': {'current': 0, 'total': 14, 'total_time_in_millis': 6},
   'query_cache': {'memory_size_in_bytes': 0,
    'total_count': 0,
    'hit_count': 0,
    'miss_count': 0,
    'cache_size': 0,
    'cache_count': 0,
    'evictions': 0},
   'fielddata': {'memory_size_in_bytes': 0, 'evictions': 0},
   'completion': {'size_in_bytes': 0},
   'segments': {'count': 3,
    'memory_in_bytes': 11243,
    'terms_memory_in_bytes': 8868,
    'stored_fields_memory_in_bytes': 936,
    'term_vectors_memory_in_bytes': 0,
    'norms_memory_in_bytes': 960,
    'points_memory_in_bytes': 3,
    'doc_values_memory_in_bytes': 476,
    'index_writer_memory_in_bytes': 0,
    'version_map_memory_in_bytes': 0,
    'fixed_bit_set_memory_in_bytes': 0,
    'max_unsafe_auto_id_timestamp': -1,
    'file_sizes': {}},
   'translog': {'operations': 0, 'size_in_bytes': 215},
   'request_cache': {'memory_size_in_bytes': 0,
    'evictions': 0,
    'hit_count': 0,
    'miss_count': 0},
   'recovery': {'current_as_source': 0,
    'current_as_target': 0,
    'throttle_time_in_millis': 0}}},
 'indices': {'megacorp': {'primaries': {'docs': {'count': 3, 'deleted': 0},
    'store': {'size_in_bytes': 17547, 'throttle_time_in_millis': 0},
    'indexing': {'index_total': 3,
     'index_time_in_millis': 53,
     'index_current': 0,
     'index_failed': 0,
     'delete_total': 0,
     'delete_time_in_millis': 0,
     'delete_current': 0,
     'noop_update_total': 0,
     'is_throttled': False,
     'throttle_time_in_millis': 0},
    'get': {'total': 0,
     'time_in_millis': 0,
     'exists_total': 0,
     'exists_time_in_millis': 0,
     'missing_total': 0,
     'missing_time_in_millis': 0,
     'current': 0},
    'search': {'open_contexts': 0,
     'query_total': 0,
     'query_time_in_millis': 0,
     'query_current': 0,
     'fetch_total': 0,
     'fetch_time_in_millis': 0,
     'fetch_current': 0,
     'scroll_total': 0,
     'scroll_time_in_millis': 0,
     'scroll_current': 0,
     'suggest_total': 0,
     'suggest_time_in_millis': 0,
     'suggest_current': 0},
    'merges': {'current': 0,
     'current_docs': 0,
     'current_size_in_bytes': 0,
     'total': 0,
     'total_time_in_millis': 0,
     'total_docs': 0,
     'total_size_in_bytes': 0,
     'total_stopped_time_in_millis': 0,
     'total_throttled_time_in_millis': 0,
     'total_auto_throttle_in_bytes': 104857600},
    'refresh': {'total': 22, 'total_time_in_millis': 93, 'listeners': 0},
    'flush': {'total': 3, 'total_time_in_millis': 65},
    'warmer': {'current': 0, 'total': 14, 'total_time_in_millis': 6},
    'query_cache': {'memory_size_in_bytes': 0,
     'total_count': 0,
     'hit_count': 0,
     'miss_count': 0,
     'cache_size': 0,
     'cache_count': 0,
     'evictions': 0},
    'fielddata': {'memory_size_in_bytes': 0, 'evictions': 0},
    'completion': {'size_in_bytes': 0},
    'segments': {'count': 3,
     'memory_in_bytes': 11243,
     'terms_memory_in_bytes': 8868,
     'stored_fields_memory_in_bytes': 936,
     'term_vectors_memory_in_bytes': 0,
     'norms_memory_in_bytes': 960,
     'points_memory_in_bytes': 3,
     'doc_values_memory_in_bytes': 476,
     'index_writer_memory_in_bytes': 0,
     'version_map_memory_in_bytes': 0,
     'fixed_bit_set_memory_in_bytes': 0,
     'max_unsafe_auto_id_timestamp': -1,
     'file_sizes': {}},
    'translog': {'operations': 0, 'size_in_bytes': 215},
    'request_cache': {'memory_size_in_bytes': 0,
     'evictions': 0,
     'hit_count': 0,
     'miss_count': 0},
    'recovery': {'current_as_source': 0,
     'current_as_target': 0,
     'throttle_time_in_millis': 0}},
   'total': {'docs': {'count': 3, 'deleted': 0},
    'store': {'size_in_bytes': 17547, 'throttle_time_in_millis': 0},
    'indexing': {'index_total': 3,
     'index_time_in_millis': 53,
     'index_current': 0,
     'index_failed': 0,
     'delete_total': 0,
     'delete_time_in_millis': 0,
     'delete_current': 0,
     'noop_update_total': 0,
     'is_throttled': False,
     'throttle_time_in_millis': 0},
    'get': {'total': 0,
     'time_in_millis': 0,
     'exists_total': 0,
     'exists_time_in_millis': 0,
     'missing_total': 0,
     'missing_time_in_millis': 0,
     'current': 0},
    'search': {'open_contexts': 0,
     'query_total': 0,
     'query_time_in_millis': 0,
     'query_current': 0,
     'fetch_total': 0,
     'fetch_time_in_millis': 0,
     'fetch_current': 0,
     'scroll_total': 0,
     'scroll_time_in_millis': 0,
     'scroll_current': 0,
     'suggest_total': 0,
     'suggest_time_in_millis': 0,
     'suggest_current': 0},
    'merges': {'current': 0,
     'current_docs': 0,
     'current_size_in_bytes': 0,
     'total': 0,
     'total_time_in_millis': 0,
     'total_docs': 0,
     'total_size_in_bytes': 0,
     'total_stopped_time_in_millis': 0,
     'total_throttled_time_in_millis': 0,
     'total_auto_throttle_in_bytes': 104857600},
    'refresh': {'total': 22, 'total_time_in_millis': 93, 'listeners': 0},
    'flush': {'total': 3, 'total_time_in_millis': 65},
    'warmer': {'current': 0, 'total': 14, 'total_time_in_millis': 6},
    'query_cache': {'memory_size_in_bytes': 0,
     'total_count': 0,
     'hit_count': 0,
     'miss_count': 0,
     'cache_size': 0,
     'cache_count': 0,
     'evictions': 0},
    'fielddata': {'memory_size_in_bytes': 0, 'evictions': 0},
    'completion': {'size_in_bytes': 0},
    'segments': {'count': 3,
     'memory_in_bytes': 11243,
     'terms_memory_in_bytes': 8868,
     'stored_fields_memory_in_bytes': 936,
     'term_vectors_memory_in_bytes': 0,
     'norms_memory_in_bytes': 960,
     'points_memory_in_bytes': 3,
     'doc_values_memory_in_bytes': 476,
     'index_writer_memory_in_bytes': 0,
     'version_map_memory_in_bytes': 0,
     'fixed_bit_set_memory_in_bytes': 0,
     'max_unsafe_auto_id_timestamp': -1,
     'file_sizes': {}},
    'translog': {'operations': 0, 'size_in_bytes': 215},
    'request_cache': {'memory_size_in_bytes': 0,
     'evictions': 0,
     'hit_count': 0,
     'miss_count': 0},
    'recovery': {'current_as_source': 0,
     'current_as_target': 0,
     'throttle_time_in_millis': 0}}}}}

The other way is to call the service's HTTP interface directly:

import requests as req
resp = req.get('http://172.17.0.1:24005/_stats')
resp_stat = resp.json()
resp_stat.keys()
---
dict_keys(['_shards', '_all', 'indices'])

resp_stat['indices'].keys()
dict_keys(['megacorp'])

Both approaches return the same data.

2.1 Connectivity

Start from the project template of the simple API service family (SimpleAPI), make a copy, and add the following to the config file config.py:

# base config class
class Config:
    SECRET_KEY = 'xxxxxx'
    BASE_DIR = basedir
    ES_HOST = '172.17.0.1'
    # the production deployment is assigned port 24005
    ES_PORT = 9200

Instantiate ES during app initialization. Since this is a SimpleAPI service, the app setup and the view functions all live in entry_py.py:

app = Flask(__name__)
# load the settings defined in config.py
app.config.from_object(config[run_env])
# return Chinese characters as UTF-8 rather than escaped ASCII
app.config['JSON_AS_ASCII'] = False
# json.dumps as UTF-8 -- may not be needed
app.config['ENSURE_ASCII'] = False

print('Static Data Host %s : Port %s' % ( app.config['ES_HOST'], app.config['ES_PORT'] ))

from elasticsearch import Elasticsearch
es = Elasticsearch([{'host':app.config['ES_HOST'],'port':int(app.config['ES_PORT'])}])

The corresponding view function:

# connectivity test
@app.route('/info/', methods=['GET'])
def info():
    res_dict = {}

    res_dict['status'] = True 
    res_dict['msg'] = 'ok'
    res_dict['data'] = es.info()
    # ==========  define here ==========
    return jsonify(res_dict)


Visit http://YOURIP:24999/info/ in a browser; this is equivalent to hitting the root of ES directly.

2.2 Basic queries

Query the current index names and document counts; visit http://YOURIP:24999/stat/ in a browser.

# Feature 1: list current index names and document counts
@app.route('/stat/',methods=['GET'])
def stat():
    res_dict = {}

    data_dict = {}
    # 1 number of indices
    resp_dict =  es.indices.stats()
    idx_list = []
    idx_list = sorted(list(resp_dict['indices'].keys()))
    data_dict['index_nums'] = len(idx_list)
    data_dict['index_list'] = idx_list

    tem_total_doc_cnt = 0
    tem_total_doc_del_cnt = 0 
    tem_total_doc_size_sum = 0

    data_dict['docs'] = {}

    for some_idx in idx_list:
        tem_dict = resp_dict['indices'][some_idx]['total']['docs']
        tem_dict['size'] = round(resp_dict['indices'][some_idx]['total']['store']['size_in_bytes'] / 1e6 , 3)
        data_dict['docs'][some_idx]  = tem_dict
        tem_total_doc_cnt += tem_dict['count']
        tem_total_doc_del_cnt += tem_dict['deleted']
        tem_total_doc_size_sum += tem_dict['size']

    data_dict['total_docs'] = tem_total_doc_cnt
    data_dict['total_del_docs'] =  tem_total_doc_del_cnt
    data_dict['total_docs_size_M'] =  tem_total_doc_size_sum


    res_dict['status'] = True 
    res_dict['msg'] = 'ok'
    res_dict['data'] = data_dict
    # ==========  define here ==========
    return jsonify(res_dict)

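As an offline check, the aggregation in /stat/ can be exercised against a mock stats dict. The nested shape below mirrors the `es.indices.stats()` response used above, but the second index and its numbers are made up for illustration:

```python
# Offline sketch of the /stat/ aggregation logic, run against a mock
# es.indices.stats() response (the 'dnws' entry and its numbers are invented).
resp_dict = {
    'indices': {
        'megacorp': {'total': {'docs': {'count': 3, 'deleted': 0},
                               'store': {'size_in_bytes': 17547}}},
        'dnws':     {'total': {'docs': {'count': 1, 'deleted': 0},
                               'store': {'size_in_bytes': 5000}}},
    }
}

data_dict = {}
idx_list = sorted(resp_dict['indices'].keys())
data_dict['index_nums'] = len(idx_list)
data_dict['index_list'] = idx_list
data_dict['docs'] = {}

total_cnt = total_del = total_size = 0
for some_idx in idx_list:
    tem_dict = dict(resp_dict['indices'][some_idx]['total']['docs'])
    # size reported in MB, rounded to 3 decimals (17547 bytes -> 0.018)
    tem_dict['size'] = round(resp_dict['indices'][some_idx]['total']['store']['size_in_bytes'] / 1e6, 3)
    data_dict['docs'][some_idx] = tem_dict
    total_cnt += tem_dict['count']
    total_del += tem_dict['deleted']
    total_size += tem_dict['size']

data_dict['total_docs'] = total_cnt
data_dict['total_del_docs'] = total_del
data_dict['total_docs_size_M'] = round(total_size, 3)

print(data_dict['total_docs'], data_dict['total_docs_size_M'])
```

Running this against the real response dict from the client produces exactly the payload the /stat/ view returns.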


2.3 Storing a record | Storing a document

Modeled on the command res1 = es.index(index="megacorp", doc_type='employee', id=1, body=body1), the endpoint implementation:

# Feature 2: store a record
@app.route('/save_a_rec/', methods = ['POST'])
def save_a_rec():
    res_dict = {}
    input_data = request.get_json()
    # empty-payload check
    if not input_data:
        res_dict['status'] = False 
        res_dict['msg'] = 'No Data'
        return jsonify(res_dict)
    
    # required-keys check
    fhas_key_check = vd.fhasDictKeys(input_data, key_list = ['index', 'doc_type', 'body']) 
    if not fhas_key_check:
        res_dict['status'] = False 
        res_dict['msg'] = 'Must have index, doc_type and body'
        return jsonify(res_dict)

    # use the hash of the body as the document id
    hash_id = ifuncs.get_md5_digest(input_data['body'], fs= ifuncs)
    input_data['id'] = hash_id

    data_dict_to_save = {}
    data_dict_to_save['index'] = input_data['index']
    data_dict_to_save['doc_type'] = input_data['doc_type']
    data_dict_to_save['id'] = hash_id
    data_dict_to_save['body'] = input_data['body']

    try:
        es.index(**data_dict_to_save)

        res_dict['status'] = True 
        res_dict['msg'] = 'ok'
        res_dict['data'] = None
        # ==========  define here ==========
        return jsonify(res_dict)
    except Exception:
        res_dict['status'] = False
        res_dict['msg'] = 'Saving Error'
        return jsonify(res_dict)
         

The test call:

test_post_data = {'index':'megacorp', 'doc_type':'employee', 'id' : 4, 'body':body3}
resp = req.post('http://YOURIP:24999/save_a_rec/', json = test_post_data)
resp.json()
---
{'data': None, 'msg': 'ok', 'status': True}

Note:

  • 1 Storage is keyed by id; a duplicate id only ever stores a single record.
  • 2 After storing, check the /stat/ view to see the change.
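The id-based dedup works because the service derives the id from a hash of the body (ifuncs.get_md5_digest in the view). A minimal sketch of the same idea using only the standard library; the helper name stable_doc_id is mine, not part of the service:

```python
import hashlib
import json

def stable_doc_id(body: dict) -> str:
    """Derive a deterministic document id from the body content.

    Serializing with sorted keys makes the hash independent of key order,
    so re-posting the same body always maps to the same document id,
    and ES keeps only one copy.
    """
    payload = json.dumps(body, sort_keys=True, ensure_ascii=False)
    return hashlib.md5(payload.encode('utf8')).hexdigest()

body_a = {'first_name': 'John', 'last_name': 'Smith'}
body_b = {'last_name': 'Smith', 'first_name': 'John'}  # same content, different key order
print(stable_doc_id(body_a) == stable_doc_id(body_b))  # identical ids -> one stored doc
```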


2.4 match_search | Exact search / full-text search

You can search within a specified index or doc_type, or across all of them; the number of returned records can be capped.

# Feature 3: exact search
@app.route('/match_search/', methods = ['POST'])
def match_search():
    res_dict = {}
    input_data = request.get_json()
    # empty-payload check
    if not input_data:
        res_dict['status'] = False 
        res_dict['msg'] = 'No Data'
        return jsonify(res_dict)

    # required-keys check
    fhas_key_check = vd.fhasDictKeys(input_data, key_list = ['search_index', 'search_doc_type', 'search_key', 'search_val', 'return_recs']) 
    if not fhas_key_check:
        res_dict['status'] = False 
        res_dict['msg'] = 'Must have search_index, search_doc_type, search_key, search_val and return_recs'
        return jsonify(res_dict)

    # determine the search parameters
    search_index = input_data.get('search_index')
    search_doc_type = input_data.get('search_doc_type')
    search_key = input_data.get('search_key')
    search_val = input_data.get('search_val')
    return_recs = input_data.get('return_recs')
    if search_key is None or search_val is None:
        res_dict['status'] = False 
        res_dict['msg'] = 'Neither search_key nor search_val may be None'
        return jsonify(res_dict)

    query_body = {'query':{'match':{search_key:search_val}}}
    query_res = es.search(index=search_index, doc_type = search_doc_type, body=query_body, size=return_recs)

    
    res_dict['status'] = True
    res_dict['msg'] = 'OK'
    res_dict['data'] = query_res['hits']
    return jsonify(res_dict)


Test call:

# exact match
match_search_test_dict= {}
match_search_test_dict['search_index'] = None
match_search_test_dict['search_doc_type'] = None
match_search_test_dict['search_key'] = 'last_name'
match_search_test_dict['search_val'] = 'Fir'
match_search_test_dict['return_recs'] = None

resp = req.post('http://YOURIP:24999/match_search/', json = match_search_test_dict)
resp.json()['data']
---
{'hits': [{'_id': '2790b4497e81b0b2db11d518f6bc9fa7',
   '_index': 'megacorp',
   '_score': 0.2876821,
   '_source': {'about': 'I like to build cabinets',
    'age': 35,
    'first_name': 'Douglas',
    'interests': ['forestry'],
    'last_name': 'Fir'},
   '_type': 'employee'},
  {'_id': '3',
   '_index': 'megacorp',
   '_score': 0.2876821,
   '_source': {'about': 'I like to build cabinets',
    'age': 35,
    'first_name': 'Douglas',
    'interests': ['forestry'],
    'last_name': 'Fir'},
   '_type': 'employee'}],
 'max_score': 0.2876821,
 'total': 2}


If nothing matches:

{'hits': [], 'max_score': None, 'total': 0}

The basic unit of search is the term. English is tokenized on whitespace and punctuation; for Chinese I have not installed the IK analyzer, so presumably the text is split into single characters.
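A rough illustration of why a one-character query like '润' can hit '润发': English splits into whole words while CJK text falls back to single-character tokens, so any document containing that character matches. The toy tokenizer below only mimics this behavior; it is not the actual ES analyzer:

```python
import re

def toy_tokenize(text: str) -> list:
    """Coarsely mimic the default analyzer: each lowercased English
    word is one token, each CJK character is its own token."""
    tokens = []
    for chunk in re.findall(r'[A-Za-z0-9]+|[\u4e00-\u9fff]', text):
        tokens.append(chunk.lower())
    return tokens

print(toy_tokenize('I love to go rock climbing'))
print(toy_tokenize('润发'))
# a match query for '润' hits because '润' is one of the indexed tokens
print('润' in toy_tokenize('润发'))
```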

Add a document:

body4={
    "first_name" :  "润发",
    "last_name" :   "周",
    "age" :         35,
    "about":        "拍电影的周",
    "interests":  [ "拍电影" ]
}
res4 = es.index(index="megacorp", doc_type='employee', id=4,body=body4)

Run a full-text match:

match_search_test_dict= {}
match_search_test_dict['search_index'] = None
match_search_test_dict['search_doc_type'] = None
match_search_test_dict['search_key'] = 'first_name'
match_search_test_dict['search_val'] = '润'
match_search_test_dict['return_recs'] = None

resp = req.post('http://YOURIP:24999/match_search/', json = match_search_test_dict)
resp.json()['data']
{'hits': [{'_id': '4',
   '_index': 'megacorp',
   '_score': 0.5377023,
   '_source': {'about': '拍电影的周',
    'age': 35,
    'first_name': '润发',
    'interests': ['拍电影'],
    'last_name': '周'},
   '_type': 'employee'}],
 'max_score': 0.5377023,
 'total': 1}

2.5 filter_search | Filtered search

Skipped for now.
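For reference, a filtered search would likely follow the same pattern as the other endpoints, only swapping the query_body. A sketch of a bool/filter body builder; the 'age' field and range bounds are illustrative, not part of the service yet:

```python
# Sketch of the query_body a filter_search endpoint could build.
# Field names and bounds in the example call are illustrative.
def build_filter_body(filter_key: str, gte=None, lte=None) -> dict:
    range_clause = {}
    if gte is not None:
        range_clause['gte'] = gte
    if lte is not None:
        range_clause['lte'] = lte
    return {
        'query': {
            'bool': {
                # filter context: no relevance scoring, results are cacheable
                'filter': [{'range': {filter_key: range_clause}}]
            }
        }
    }

print(build_filter_body('age', gte=30))
```

The body would then be passed to es.search exactly as match_search does with its own query_body.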

2.6 phrase_search | Phrase search

For example: match only the employee records that contain both "rock" and "climbing", with the two adjacent as the phrase "rock climbing".

Compared with the exact-match endpoint, only the key inside query_body changes in the view function.
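The one-key difference can be seen by printing the two bodies side by side:

```python
search_key, search_val = 'about', 'rock climbing'

# match: a document containing either term scores a hit
match_body = {'query': {'match': {search_key: search_val}}}
# match_phrase: both terms must appear, adjacent and in order
phrase_body = {'query': {'match_phrase': {search_key: search_val}}}

print(match_body)
print(phrase_body)
```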

# Feature 4: phrase match
@app.route('/phrase_search/', methods = ['POST'])
def phrase_search():
    res_dict = {}
    input_data = request.get_json()
    # empty-payload check
    if not input_data:
        res_dict['status'] = False 
        res_dict['msg'] = 'No Data'
        return jsonify(res_dict)

    # required-keys check
    fhas_key_check = vd.fhasDictKeys(input_data, key_list = ['search_index', 'search_doc_type', 'search_key', 'search_val', 'return_recs']) 
    if not fhas_key_check:
        res_dict['status'] = False 
        res_dict['msg'] = 'Must have search_index, search_doc_type, search_key, search_val and return_recs'
        return jsonify(res_dict)

    # determine the search parameters
    search_index = input_data.get('search_index')
    search_doc_type = input_data.get('search_doc_type')
    search_key = input_data.get('search_key')
    search_val = input_data.get('search_val')
    return_recs = input_data.get('return_recs')
    if search_key is None or search_val is None:
        res_dict['status'] = False 
        res_dict['msg'] = 'Neither search_key nor search_val may be None'
        return jsonify(res_dict)

    query_body = {'query':{'match_phrase':{search_key:search_val}}}
    query_res = es.search(index=search_index, doc_type = search_doc_type, body=query_body, size=return_recs)

    
    res_dict['status'] = True
    res_dict['msg'] = 'OK'
    res_dict['data'] = query_res['hits']
    return jsonify(res_dict)


Test:

# phrase match
match_search_test_dict= {}
match_search_test_dict['search_index'] = None
match_search_test_dict['search_doc_type'] = None
match_search_test_dict['search_key'] = 'about'
match_search_test_dict['search_val'] = 'rock climbing'
match_search_test_dict['return_recs'] = None

resp = req.post('http://YOURIP:24999/phrase_search/', json = match_search_test_dict)
resp.json()['data']

{'hits': [{'_id': '1',
   '_index': 'megacorp',
   '_score': 0.53484553,
   '_source': {'about': 'I love to go rock climbing',
    'age': 25,
    'first_name': 'John',
    'interests': ['sports', 'music'],
    'last_name': 'Smith'},
   '_type': 'employee'}],
 'max_score': 0.53484553,
 'total': 1}

2.7 Testing with DNWS

Run a test together with DNWS.

The metadata of an instantiated DataNode:

dn.meta
{'name': 'ddd',
 'dntype': 'rf',
 'data_hash': 'fbbedd77723cc073b939a1eb578d6004',
 'data_size': 0.000232,
 'source': "def ddd(nume_dict, denom_dict, suffix= None):\n    nume_s = pd.Series(nume_dict)\n    denom_s = pd.Series(denom_dict)\n    res_s = nume_s / denom_s\n    if suffix is None:\n        return dict(res_s)\n    else:\n        new_s_index = ['_'.join([x,suffix]) for x in list(res_s.index)]\n        return dict(zip(new_s_index, list(res_s)))\n",
 'glampse': 'bbb',
 'explantion': 'bbb',
 'data_org_type': 'Element'}

Store the metadata in ES:

test_post_data = {'index':'dnws', 'doc_type':'data_meta', 'body':dn.meta}
resp = req.post('http://1.15.95.79:24999/save_a_rec/', json = test_post_data)
resp.json()
{'data': None, 'msg': 'ok', 'status': True}


In this example dnws stores rf-type data; once you have the metadata you can pull the data itself.

# exact match
match_search_test_dict= {}
match_search_test_dict['search_index'] = None
match_search_test_dict['search_doc_type'] = None
match_search_test_dict['search_key'] = 'source'
match_search_test_dict['search_val'] = 'ddd'
match_search_test_dict['return_recs'] = None

resp = req.post('http://YOURIP:24999/match_search/', json = match_search_test_dict)
resp.json()['data']
{'hits': [{'_id': 'f5f3c1426de53fb9423628bd7436e3ac',
   '_index': 'dnws',
   '_score': 0.2787979,
   '_source': {'data_hash': 'fbbedd77723cc073b939a1eb578d6004',
    'data_org_type': 'Element',
    'data_size': 0.000232,
    'dntype': 'rf',
    'explantion': 'bbb',
    'glampse': 'bbb',
    'name': 'ddd',
    'source': "def ddd(nume_dict, denom_dict, suffix= None):\n    nume_s = pd.Series(nume_dict)\n    denom_s = pd.Series(denom_dict)\n    res_s = nume_s / denom_s\n    if suffix is None:\n        return dict(res_s)\n    else:\n        new_s_index = ['_'.join([x,suffix]) for x in list(res_s.index)]\n        return dict(zip(new_s_index, list(res_s)))\n"},
   '_type': 'data_meta'}],
 'max_score': 0.2787979,
 'total': 1}

Then query DNWS by name and dntype to fetch the data.

2.8 del_a_rec | Deleting a document

Skipped for now.
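When del_a_rec is filled in, it will presumably mirror save_a_rec: validate index, doc_type and id, then call es.delete. A hedged sketch of the argument handling (this is my guess at the shape, not the service's actual code); the builder is pure so it can be exercised without a running ES:

```python
# Sketch of the argument dict a del_a_rec endpoint could pass to es.delete.
# The validation mirrors save_a_rec; the endpoint itself is not implemented yet.
def build_delete_args(input_data: dict):
    required = ['index', 'doc_type', 'id']
    missing = [k for k in required if k not in input_data]
    if missing:
        # mirror the service's {'status': ..., 'msg': ...} convention
        return None, {'status': False, 'msg': 'Must have index, doc_type and id'}
    args = {k: input_data[k] for k in required}
    return args, {'status': True, 'msg': 'ok'}

args, res = build_delete_args({'index': 'megacorp', 'doc_type': 'employee', 'id': '4'})
print(args, res['msg'])
# the actual call in the view would then be: es.delete(**args)
```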

3 Other notes

  • 1 The microservice API documentation can be stored here for reference.
  • 2 DNWS metadata can be stored here.
  • 3 Tool-oriented technical documents.
  • 4 Other memos.

4 Documentation

API documentation (ESIO)

The ESIO (Elastic Search IO) service mainly stores and searches metadata; it can manage general documents as well as DNWS metadata.

  • 1 API documentation: api_doc. The p123-style prefix follows the internal microservice system's numbering convention.
  • 2 DNWS metadata: dnws. Used to store and search metadata.

1 Feature summary

ESIO is a single-host service; it does not expose a port externally.

ESIO provides storage, query, and delete operations over HTTP endpoints.

2 Deployment

ESIO depends on port 24005 (the ES database) and is deployed with a one-click script.

Switch to any directory, e.g. /tmp, then pull and run the script:

wget -N http://YOURIP:PORT/downup/download/p24006_setup_ESIO.sh
bash p24006_setup_ESIO.sh

3 Endpoint reference

3.1 Connectivity test

Hit the info endpoint to test:

import requests as req
req.get('http://172.17.0.1:24006/info/').json()

Result:

{'data': {'cluster_name': 'elasticsearch',
  'cluster_uuid': 'v3FDcLclSqq66OHHwR_3QQ',
  'name': 'SV-YTmN',
  'tagline': 'You Know, for Search',
  'version': {'build_date': '2018-09-10T20:12:43.732Z',
   'build_hash': 'cfe3d9f',
   'build_snapshot': False,
   'lucene_version': '6.6.1',
   'number': '5.6.12'}},
 'msg': 'ok',
 'status': True}

You can also open it directly in a browser, but that requires the public IP.

3.2 Status check

Hit stat to inspect the current state of the database:

resp = req.get('http://172.17.0.1:24006/stat/')

Result:

resp.json()

{'data': {'docs': {'megacorp': {'count': 3, 'deleted': 0, 'size': 0.018}},
  'index_list': ['megacorp'],
  'index_nums': 1,
  'total_del_docs': 0,
  'total_docs': 3,
  'total_docs_size_M': 0.018},
 'msg': 'ok',
 'status': True}

3.3 Storing a record

Store one record:

test_body={
    "first_name" :  "润发",
    "last_name" :   "周",
    "age" :         35,
    "about":        "拍电影的周",
    "interests":  [ "拍电影" ]
}

test_post_data = {'index':'megacorp', 'doc_type':'employee', 'body':test_body}
resp = req.post('http://172.17.0.1:24006/save_a_rec/', json = test_post_data)

Result:

resp.json()
{'data': None, 'msg': 'ok', 'status': True}

3.4 Match query

Match queries work on terms. Note that ES uses its default analyzer here: one token per English word, and apparently one token per Chinese character (no IK analyzer installed). This is probably the most widely used endpoint.

match_search_test_dict= {}
match_search_test_dict['search_index'] = None
match_search_test_dict['search_doc_type'] = None
match_search_test_dict['search_key'] = 'first_name'
match_search_test_dict['search_val'] = '润'
match_search_test_dict['return_recs'] = None

resp = req.post('http://172.17.0.1:24006/match_search/', json = match_search_test_dict)

Result:

resp.json()

{'data': {'hits': [{'_id': 'fea27cfdc0705739c65eceda605524bb',
'_index': 'megacorp',
'_score': 0.5377023,
'_source': {'about': '拍电影的周',
 'age': 35,
 'first_name': '润发',
 'interests': ['拍电影'],
 'last_name': '周'},
'_type': 'employee'}],
  'max_score': 0.5377023,
  'total': 1},
 'msg': 'OK',
 'status': False}

3.5 Phrase query

Query by phrase: a record is returned only when the whole phrase, e.g. rock climbing, is present. This is probably most useful when searching code.

match_search_test_dict= {}
match_search_test_dict['search_index'] = None
match_search_test_dict['search_doc_type'] = None
match_search_test_dict['search_key'] = 'about'
match_search_test_dict['search_val'] = 'rock climbing'
match_search_test_dict['return_recs'] = None

resp = req.post('http://172.17.0.1:24006/phrase_search/', json = match_search_test_dict)

Result:

resp.json()

{'data': {'hits': [{'_id': '1',
'_index': 'megacorp',
'_score': 0.53484553,
'_source': {'about': 'I love to go rock climbing',
 'age': 25,
 'first_name': 'John',
 'interests': ['sports', 'music'],
 'last_name': 'Smith'},
'_type': 'employee'}],
  'max_score': 0.53484553,
  'total': 1},
 'msg': 'OK',
 'status': False}