A simple design for a service built on top of an Elasticsearch (ES) database.
In the past I wrote dedicated objects for each database to implement a set of features. Most of them ended up nicely encapsulated, but using or modifying them later always meant digging the details back out of memory. So from now on I want to handle all database interaction through interfaces.
PS: basic CRUD alone is not enough. In practice you usually need a "combo" of operations, e.g. "delete if it exists", "insert if it does not exist"... (these have a bit of a CLC flavor), and an interface may express them better than an object. Once SCLC is wrapped as an object, the basic logic control should be covered.
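For illustration, one such combo operation can be sketched as a small helper that takes the primitive operations as callables, so the control logic stays database-agnostic. The names below are hypothetical, not part of the project:

```python
def delete_if_exists(exists, delete, doc_id):
    """Combo op: delete a document only when it is already present.

    `exists` and `delete` stand in for the primitive DB operations
    (e.g. es.exists / es.delete bound to a fixed index); they are
    passed in as callables so this works against any backend.
    """
    if exists(doc_id):
        delete(doc_id)
        return True   # a deletion happened
    return False      # nothing to delete

# exercise the logic with a dict-backed fake store
store = {'a1': {'title': 'doc'}}
delete_if_exists(store.__contains__, store.pop, 'a1')  # deletes 'a1', returns True
delete_if_exists(store.__contains__, store.pop, 'a1')  # now a no-op, returns False
```

The same shape covers "insert if absent" and similar combos by swapping the callables.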
Start with a few simple features and add more over time.
| No. | Name | Description |
|---|---|---|
| 1 | / | Main endpoint, used for connectivity testing, GET |
| 2 | get_stat/ | Basic statistics, similar to show-databases/show-tables commands. Returns the indices, document types, and document counts. |
| 3 | save_a_rec | Store a document; the document must have id, title, content, and slot. |
| 4 | match_search | Exact search / full-text search |
| 5 | filter_search | Filter search |
| 6 | phrase_search | Phrase search |
| 7 | del_a_rec | Delete a document |
This requires the elasticsearch package (the version must be pinned; 7.16 does not work: `pip3 install elasticsearch==7.13.4`), so I made a change (v8) to the original base-flask image; just start the service from the latest image.
Assume the ES database is already running (port 24005). There are two ways to fetch data from it. One is the Python package, mainly through the methods under `indices`:
```python
# local machine
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': '172.17.0.1', 'port': 24005}])
es.indices.stats()
```

Output (abridged; the `primaries`/`total` blocks and the per-index `megacorp` entry all repeat the same structure):

```python
{'_shards': {'total': 10, 'successful': 5, 'failed': 0},
 '_all': {'primaries': {'docs': {'count': 3, 'deleted': 0},
                        'store': {'size_in_bytes': 17547, 'throttle_time_in_millis': 0},
                        'indexing': {'index_total': 3, 'index_time_in_millis': 53, ...},
                        'search': {'query_total': 0, ...},
                        'segments': {'count': 3, 'memory_in_bytes': 11243, ...},
                        ...},
          'total': {...}},
 'indices': {'megacorp': {'primaries': {...}, 'total': {...}}}}
```
The other way is to hit the service's HTTP endpoints directly:
```python
import requests as req

resp = req.get('http://172.17.0.1:24005/_stats')
resp_stat = resp.json()
resp_stat.keys()
# dict_keys(['_shards', '_all', 'indices'])
resp_stat['indices'].keys()
# dict_keys(['megacorp'])
```
Both methods yield the same result.
Copy a fresh instance of the project template from the SimpleAPI family of services, then add the following to the config file config.py:
```python
# base class
class Config:
    SECRET_KEY = 'xxxxxx'
    BASE_DIR = basedir
    ES_HOST = '172.17.0.1'
    # the production service is assigned 24005
    ES_PORT = 9200
```
Instantiate ES during app initialization. Since this is a SimpleAPI service, the app setup and the view functions all live in entry_py.py:
```python
app = Flask(__name__)
# load the configuration defined in config.py
app.config.from_object(config[run_env])
# return Chinese as UTF-8 in responses
app.config['JSON_AS_ASCII'] = False
# json dumps as UTF-8 -- may not be needed
app.config['ENSURE_ASCII'] = False
print('Static Data Host %s : Port %s' % (app.config['ES_HOST'], app.config['ES_PORT']))

from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': app.config['ES_HOST'], 'port': int(app.config['ES_PORT'])}])
```
The corresponding view function:
```python
# connectivity test
@app.route('/info/', methods=['GET'])
def info():
    res_dict = {}
    res_dict['status'] = True
    res_dict['msg'] = 'ok'
    res_dict['data'] = es.info()
    # ========== define here ==========
    return jsonify(res_dict)
```
Visiting http://YOURIP:24999/info/ in a browser gives the same result as hitting the ES root endpoint directly.
To list the current indices and document counts, visit http://YOURIP:24999/stat/:
```python
# Feature 1: list the current index names and document counts
@app.route('/stat/', methods=['GET'])
def stat():
    res_dict = {}
    data_dict = {}
    # 1. number of indices
    resp_dict = es.indices.stats()
    idx_list = sorted(list(resp_dict['indices'].keys()))
    data_dict['index_nums'] = len(idx_list)
    data_dict['index_list'] = idx_list
    tem_total_doc_cnt = 0
    tem_total_doc_del_cnt = 0
    tem_total_doc_size_sum = 0
    data_dict['docs'] = {}
    for some_idx in idx_list:
        tem_dict = resp_dict['indices'][some_idx]['total']['docs']
        tem_dict['size'] = round(resp_dict['indices'][some_idx]['total']['store']['size_in_bytes'] / 1e6, 3)
        data_dict['docs'][some_idx] = tem_dict
        tem_total_doc_cnt += tem_dict['count']
        tem_total_doc_del_cnt += tem_dict['deleted']
        tem_total_doc_size_sum += tem_dict['size']
    data_dict['total_docs'] = tem_total_doc_cnt
    data_dict['total_del_docs'] = tem_total_doc_del_cnt
    data_dict['total_docs_size_M'] = tem_total_doc_size_sum
    res_dict['status'] = True
    res_dict['msg'] = 'ok'
    res_dict['data'] = data_dict
    # ========== define here ==========
    return jsonify(res_dict)
```
Following this command: `res1 = es.index(index="megacorp", doc_type='employee', id=1, body=body1)`, the API implementation is as follows:
```python
# Feature 2: store a record
@app.route('/save_a_rec/', methods=['POST'])
def save_a_rec():
    res_dict = {}
    input_data = request.get_json()
    # non-empty check
    if not input_data:
        res_dict['status'] = False
        res_dict['msg'] = 'No Data'
        return jsonify(res_dict)
    # required-key check
    fhas_key_check = vd.fhasDictKeys(input_data, key_list=['index', 'doc_type', 'body'])
    if not fhas_key_check:
        res_dict['status'] = False
        res_dict['msg'] = 'Must have index, doc_type and body'
        return jsonify(res_dict)
    # use the hash of the content as the id
    hash_id = ifuncs.get_md5_digest(input_data['body'], fs=ifuncs)
    input_data['id'] = hash_id
    data_dict_to_save = {}
    data_dict_to_save['index'] = input_data['index']
    data_dict_to_save['doc_type'] = input_data['doc_type']
    data_dict_to_save['id'] = hash_id
    data_dict_to_save['body'] = input_data['body']
    try:
        es.index(**data_dict_to_save)
        res_dict['status'] = True
        res_dict['msg'] = 'ok'
        res_dict['data'] = None
        # ========== define here ==========
        return jsonify(res_dict)
    except Exception:
        res_dict['status'] = False
        res_dict['msg'] = 'Saving Error'
        return jsonify(res_dict)
```
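The project helper `ifuncs.get_md5_digest` is not shown in this post. A plausible stand-in that hashes the body deterministically (serialize with sorted keys, then MD5) might look like the sketch below; this is an assumption about what the helper does, not its actual source:

```python
import hashlib
import json

def get_md5_digest(body):
    # serialize with sorted keys so the hash does not depend on dict insertion order
    payload = json.dumps(body, sort_keys=True, ensure_ascii=False).encode('utf-8')
    return hashlib.md5(payload).hexdigest()
```

Content-addressed ids like this make save_a_rec idempotent: posting the same body twice overwrites the same document instead of creating a duplicate.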
The test:
```python
test_post_data = {'index': 'megacorp', 'doc_type': 'employee', 'id': 4, 'body': body3}
resp = req.post('http://YOURIP:24999/save_a_rec/', json=test_post_data)
resp.json()
# {'data': None, 'msg': 'ok', 'status': True}
```
Note: you can search within a given index or doc_type, or across all indices and doc_types, and you can cap the number of returned records.
```python
# Feature 3: exact (match) search
@app.route('/match_search/', methods=['POST'])
def match_search():
    res_dict = {}
    input_data = request.get_json()
    # non-empty check
    if not input_data:
        res_dict['status'] = False
        res_dict['msg'] = 'No Data'
        return jsonify(res_dict)
    # required-key check
    fhas_key_check = vd.fhasDictKeys(input_data, key_list=['search_index', 'search_doc_type', 'search_key', 'search_val', 'return_recs'])
    if not fhas_key_check:
        res_dict['status'] = False
        res_dict['msg'] = 'Must have search_index, search_doc_type, search_key, search_val and return_recs'
        return jsonify(res_dict)
    # resolve the search parameters
    search_index = input_data.get('search_index')
    search_doc_type = input_data.get('search_doc_type')
    search_key = input_data.get('search_key')
    search_val = input_data.get('search_val')
    return_recs = input_data.get('return_recs')
    if search_key is None or search_val is None:
        res_dict['status'] = False
        res_dict['msg'] = 'Search Key and Value can neither be None'
        return jsonify(res_dict)
    query_body = {'query': {'match': {search_key: search_val}}}
    query_res = es.search(index=search_index, doc_type=search_doc_type, body=query_body, size=return_recs)
    res_dict['status'] = True
    res_dict['msg'] = 'OK'
    res_dict['data'] = query_res['hits']
    return jsonify(res_dict)
```
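The endpoint lets callers pass None for the index, doc_type, and record cap, relying on the client's defaults. A small explicit sketch of how those three optional parameters could be resolved (function and variable names are hypothetical):

```python
def resolve_match_search(search_index, search_key, search_val, return_recs=None):
    # None index -> search every index; None size -> ES's default page of 10
    index = search_index if search_index is not None else '_all'
    size = return_recs if return_recs is not None else 10
    body = {'query': {'match': {search_key: search_val}}}
    return index, body, size
```

Making the defaults explicit like this keeps the view function's behavior visible even if the client library's own defaults change between versions.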
The test:

```python
# exact match
match_search_test_dict = {}
match_search_test_dict['search_index'] = None
match_search_test_dict['search_doc_type'] = None
match_search_test_dict['search_key'] = 'last_name'
match_search_test_dict['search_val'] = 'Fir'
match_search_test_dict['return_recs'] = None
resp = req.post('http://YOURIP:24999/match_search/', json=match_search_test_dict)
resp.json()['data']
```

Result:

```python
{'hits': [{'_id': '2790b4497e81b0b2db11d518f6bc9fa7',
   '_index': 'megacorp',
   '_score': 0.2876821,
   '_source': {'about': 'I like to build cabinets',
    'age': 35,
    'first_name': 'Douglas',
    'interests': ['forestry'],
    'last_name': 'Fir'},
   '_type': 'employee'},
  {'_id': '3',
   '_index': 'megacorp',
   '_score': 0.2876821,
   '_source': {'about': 'I like to build cabinets',
    'age': 35,
    'first_name': 'Douglas',
    'interests': ['forestry'],
    'last_name': 'Fir'},
   '_type': 'employee'}],
 'max_score': 0.2876821,
 'total': 2}
```
If nothing matches:

```python
{'hits': [], 'max_score': None, 'total': 0}
```
The basic unit of search is the term. English is tokenized on whitespace and punctuation; for Chinese I have not installed IK, so it presumably just splits the text into individual characters.
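Roughly, without IK the standard analyzer behaves like the toy tokenizer below: English splits into lowercased word tokens, while CJK text falls back to one token per character. This is a simplification for intuition only, not the analyzer's real implementation:

```python
def rough_standard_tokens(text):
    """Crude imitation of ES's standard analyzer without the IK plugin."""
    tokens = []
    for chunk in text.split():
        if chunk.isascii():
            tokens.append(chunk.lower())  # English: word-level tokens
        else:
            tokens.extend(list(chunk))    # CJK: one character per token
    return tokens

rough_standard_tokens('Rock Climbing')  # ['rock', 'climbing']
rough_standard_tokens('拍电影的周')       # ['拍', '电', '影', '的', '周']
```

This is why a single-character query like '润' below can still hit the document whose first_name is '润发'.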
Add a document:
```python
body4 = {
    "first_name": "润发",
    "last_name": "周",
    "age": 35,
    "about": "拍电影的周",
    "interests": ["拍电影"]
}
res4 = es.index(index="megacorp", doc_type='employee', id=4, body=body4)
```
Run a full-text match:
```python
match_search_test_dict = {}
match_search_test_dict['search_index'] = None
match_search_test_dict['search_doc_type'] = None
match_search_test_dict['search_key'] = 'first_name'
match_search_test_dict['search_val'] = '润'
match_search_test_dict['return_recs'] = None
resp = req.post('http://YOURIP:24999/match_search/', json=match_search_test_dict)
resp.json()['data']
```

Result:

```python
{'hits': [{'_id': '4',
   '_index': 'megacorp',
   '_score': 0.5377023,
   '_source': {'about': '拍电影的周',
    'age': 35,
    'first_name': '润发',
    'interests': ['拍电影'],
    'last_name': '周'},
   '_type': 'employee'}],
 'max_score': 0.5377023,
 'total': 1}
```
Skipped for now.
Something like: match only employee records that contain both "rock" and "climbing", with the two words adjacent as the phrase "rock climbing".
Compared with exact match, only the key name inside query_body changes in the view function.
```python
# Feature 4: phrase search (identical to match_search except for the query key)
@app.route('/phrase_search/', methods=['POST'])
def phrase_search():
    res_dict = {}
    input_data = request.get_json()
    # non-empty check
    if not input_data:
        res_dict['status'] = False
        res_dict['msg'] = 'No Data'
        return jsonify(res_dict)
    # required-key check
    fhas_key_check = vd.fhasDictKeys(input_data, key_list=['search_index', 'search_doc_type', 'search_key', 'search_val', 'return_recs'])
    if not fhas_key_check:
        res_dict['status'] = False
        res_dict['msg'] = 'Must have search_index, search_doc_type, search_key, search_val and return_recs'
        return jsonify(res_dict)
    # resolve the search parameters
    search_index = input_data.get('search_index')
    search_doc_type = input_data.get('search_doc_type')
    search_key = input_data.get('search_key')
    search_val = input_data.get('search_val')
    return_recs = input_data.get('return_recs')
    if search_key is None or search_val is None:
        res_dict['status'] = False
        res_dict['msg'] = 'Search Key and Value can neither be None'
        return jsonify(res_dict)
    query_body = {'query': {'match_phrase': {search_key: search_val}}}
    query_res = es.search(index=search_index, doc_type=search_doc_type, body=query_body, size=return_recs)
    res_dict['status'] = True
    res_dict['msg'] = 'OK'
    res_dict['data'] = query_res['hits']
    return jsonify(res_dict)
```
Test:

```python
# phrase match
match_search_test_dict = {}
match_search_test_dict['search_index'] = None
match_search_test_dict['search_doc_type'] = None
match_search_test_dict['search_key'] = 'about'
match_search_test_dict['search_val'] = 'rock climbing'
match_search_test_dict['return_recs'] = None
resp = req.post('http://YOURIP:24999/phrase_search/', json=match_search_test_dict)
resp.json()['data']
```

Result:

```python
{'hits': [{'_id': '1',
   '_index': 'megacorp',
   '_score': 0.53484553,
   '_source': {'about': 'I love to go rock climbing',
    'age': 25,
    'first_name': 'John',
    'interests': ['sports', 'music'],
    'last_name': 'Smith'},
   '_type': 'employee'}],
 'max_score': 0.53484553,
 'total': 1}
```
Testing in combination with DNWS
The metadata of the instantiated DataNode:

```python
dn.meta
```

```python
{'name': 'ddd',
 'dntype': 'rf',
 'data_hash': 'fbbedd77723cc073b939a1eb578d6004',
 'data_size': 0.000232,
 'source': "def ddd(nume_dict, denom_dict, suffix= None):\n    nume_s = pd.Series(nume_dict)\n    denom_s = pd.Series(denom_dict)\n    res_s = nume_s / denom_s\n    if suffix is None:\n        return dict(res_s)\n    else:\n        new_s_index = ['_'.join([x,suffix]) for x in list(res_s.index)]\n        return dict(zip(new_s_index, list(res_s)))\n",
 'glampse': 'bbb',
 'explantion': 'bbb',
 'data_org_type': 'Element'}
```
Store the metadata in ES:
```python
test_post_data = {'index': 'dnws', 'doc_type': 'data_meta', 'body': dn.meta}
resp = req.post('http://1.15.95.79:24999/save_a_rec/', json=test_post_data)
resp.json()
# {'data': None, 'msg': 'ok', 'status': True}
```
In this example dnws stores data in the rf format; once you have this metadata you can pull the data.
```python
# exact match
match_search_test_dict = {}
match_search_test_dict['search_index'] = None
match_search_test_dict['search_doc_type'] = None
match_search_test_dict['search_key'] = 'source'
match_search_test_dict['search_val'] = 'ddd'
match_search_test_dict['return_recs'] = None
resp = req.post('http://YOURIP:24999/match_search/', json=match_search_test_dict)
resp.json()['data']
```

Result:

```python
{'hits': [{'_id': 'f5f3c1426de53fb9423628bd7436e3ac',
   '_index': 'dnws',
   '_score': 0.2787979,
   '_source': {'data_hash': 'fbbedd77723cc073b939a1eb578d6004',
    'data_org_type': 'Element',
    'data_size': 0.000232,
    'dntype': 'rf',
    'explantion': 'bbb',
    'glampse': 'bbb',
    'name': 'ddd',
    'source': "def ddd(nume_dict, denom_dict, suffix= None):\n    nume_s = pd.Series(nume_dict)\n    denom_s = pd.Series(denom_dict)\n    res_s = nume_s / denom_s\n    if suffix is None:\n        return dict(res_s)\n    else:\n        new_s_index = ['_'.join([x,suffix]) for x in list(res_s.index)]\n        return dict(zip(new_s_index, list(res_s)))\n"},
   '_type': 'data_meta'}],
 'max_score': 0.2787979,
 'total': 1}
```
Then query DNWS with the name and dntype.
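A lookup on both name and dntype can be expressed as a standard bool/filter query (exact, unscored matching on both fields). The service above only exposes single-field match, so this builder is a sketch of a possible extension, not existing project code:

```python
def build_meta_lookup_body(name, dntype):
    # both term filters must hold; using `filter` skips relevance scoring
    return {'query': {'bool': {'filter': [
        {'term': {'name': name}},
        {'term': {'dntype': dntype}},
    ]}}}
```

The resulting body can be passed to `es.search(index='dnws', body=...)` just like the single-field bodies above.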
Skipped for now.
The ESIO (Elastic Search IO) service is mainly for storing and searching metadata; it can manage ordinary documents as well as DNWS metadata.
ESIO is a single-host service and does not expose a public port.
ESIO provides data storage, query, and delete operations through its API.
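The delete endpoint (del_a_rec) is listed in the feature table but its code is not shown in this post. Following the same status/msg response convention as the other views, its core logic might look like the function below, with the ES client passed in so it can be tested with a fake; this is a sketch, not the actual source:

```python
def del_a_rec_logic(es_client, payload):
    """Core of a hypothetical del_a_rec endpoint (status/msg convention)."""
    res_dict = {}
    # non-empty check
    if not payload:
        res_dict['status'] = False
        res_dict['msg'] = 'No Data'
        return res_dict
    # required-key check
    missing = [k for k in ('index', 'doc_type', 'id') if k not in payload]
    if missing:
        res_dict['status'] = False
        res_dict['msg'] = 'Must have index, doc_type and id'
        return res_dict
    try:
        es_client.delete(index=payload['index'],
                         doc_type=payload['doc_type'],
                         id=payload['id'])
        res_dict['status'] = True
        res_dict['msg'] = 'ok'
    except Exception:
        res_dict['status'] = False
        res_dict['msg'] = 'Deleting Error'
    return res_dict
```

A Flask view would then just call this with the global `es` and the request JSON, and wrap the result in `jsonify`.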
ESIO depends on port 24005 (the ES database) and uses one-click deployment. Switch to any folder, e.g. /tmp, then pull and run the script:
```shell
wget -N http://YOURIP:PORT/downup/download/p24006_setup_ESIO.sh
bash p24006_setup_ESIO.sh
```
Test by calling the info endpoint:
```python
import requests as req

req.get('http://172.17.0.1:24006/info/').json()
```
Result:

```python
{'data': {'cluster_name': 'elasticsearch',
  'cluster_uuid': 'v3FDcLclSqq66OHHwR_3QQ',
  'name': 'SV-YTmN',
  'tagline': 'You Know, for Search',
  'version': {'build_date': '2018-09-10T20:12:43.732Z',
   'build_hash': 'cfe3d9f',
   'build_snapshot': False,
   'lucene_version': '6.6.1',
   'number': '5.6.12'}},
 'msg': 'ok',
 'status': True}
```
You can of course also visit it in a browser, but that requires a public address.
Check the current database status via the stat endpoint:
```python
resp = req.get('http://172.17.0.1:24006/stat/')
resp.json()
```

Result:

```python
{'data': {'docs': {'megacorp': {'count': 3, 'deleted': 0, 'size': 0.018}},
  'index_list': ['megacorp'],
  'index_nums': 1,
  'total_del_docs': 0,
  'total_docs': 3,
  'total_docs_size_M': 0.018},
 'msg': 'ok',
 'status': True}
```
Store a record:
```python
test_body = {
    "first_name": "润发",
    "last_name": "周",
    "age": 35,
    "about": "拍电影的周",
    "interests": ["拍电影"]
}
test_post_data = {'index': 'megacorp', 'doc_type': 'employee', 'body': test_body}
resp = req.post('http://172.17.0.1:24006/save_a_rec/', json=test_post_data)
resp.json()
```

Result:

```python
{'data': None, 'msg': 'ok', 'status': True}
```
Match queries operate on terms. Note that ES is using the default analyzer here: each English word is one token, and Chinese apparently becomes one token per character (no IK analyzer installed). This is probably the most widely used feature.
```python
match_search_test_dict = {}
match_search_test_dict['search_index'] = None
match_search_test_dict['search_doc_type'] = None
match_search_test_dict['search_key'] = 'first_name'
match_search_test_dict['search_val'] = '润'
match_search_test_dict['return_recs'] = None
resp = req.post('http://172.17.0.1:24006/match_search/', json=match_search_test_dict)
```
Result:

```python
resp.json()
{'data': {'hits': [{'_id': 'fea27cfdc0705739c65eceda605524bb',
    '_index': 'megacorp',
    '_score': 0.5377023,
    '_source': {'about': '拍电影的周',
     'age': 35,
     'first_name': '润发',
     'interests': ['拍电影'],
     'last_name': '周'},
    '_type': 'employee'}],
  'max_score': 0.5377023,
  'total': 1},
 'msg': 'OK',
 'status': True}
```
Phrase queries return a record only when the whole phrase is present, e.g. rock climbing must appear as a phrase. This is probably most useful when searching code.
```python
match_search_test_dict = {}
match_search_test_dict['search_index'] = None
match_search_test_dict['search_doc_type'] = None
match_search_test_dict['search_key'] = 'about'
match_search_test_dict['search_val'] = 'rock climbing'
match_search_test_dict['return_recs'] = None
resp = req.post('http://172.17.0.1:24006/phrase_search/', json=match_search_test_dict)
```
Result:

```python
resp.json()
{'data': {'hits': [{'_id': '1',
    '_index': 'megacorp',
    '_score': 0.53484553,
    '_source': {'about': 'I love to go rock climbing',
     'age': 25,
     'first_name': 'John',
     'interests': ['sports', 'music'],
     'last_name': 'Smith'},
    '_type': 'employee'}],
  'max_score': 0.53484553,
  'total': 1},
 'msg': 'OK',
 'status': True}
```