[ElasticSearch] Vector Scoring Plugin Support on ES 5.6.15

Reference:
https://github.com/lior-k/fast-elasticsearch-vector-scoring

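  1. Download the plugin (see the repository above)

  2. Install the plugin
    Place the plugin under the plugin directory, elasticsearch/plugins.
    After installation the directory looks like this: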

     plugins
     └── vector
         ├── elasticsearch-binary-vector-scoring-5.6.9.jar
         └── plugin-descriptor.properties
    

    Set elasticsearch.version in plugin-descriptor.properties to 5.6.15 (the jar is built for 5.6.9, but the ES version used here is 5.6.15), then restart ES once the installation is complete.
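The version edit can also be scripted. The following is a minimal sketch, assuming the descriptor is a plain key=value properties file. The helper name `set_es_version` is hypothetical, and the demo runs on a throwaway file with made-up content; point it at your real plugins/vector/plugin-descriptor.properties instead.

```python
import re
import tempfile
from pathlib import Path

def set_es_version(descriptor_path, es_version):
    """Rewrite the elasticsearch.version line of a plugin descriptor in place."""
    path = Path(descriptor_path)
    text = path.read_text(encoding="utf-8")
    # Replace only the value of the elasticsearch.version key; other lines stay untouched.
    new_text, count = re.subn(
        r"^(elasticsearch\.version[ \t]*=[ \t]*).*$",
        lambda m: m.group(1) + es_version,
        text,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("elasticsearch.version not found in " + str(path))
    path.write_text(new_text, encoding="utf-8")
    return count

# Demo on a throwaway copy (sample descriptor content is illustrative only).
demo_dir = tempfile.mkdtemp()
demo_file = Path(demo_dir) / "plugin-descriptor.properties"
demo_file.write_text("name=vector\nelasticsearch.version=5.6.9\n", encoding="utf-8")
set_es_version(demo_file, "5.6.15")
print(demo_file.read_text(encoding="utf-8"))
```

ES still has to be restarted afterwards for the plugin to be loaded.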

  3. Create a test index

    PUT /vector_test
    {
      "settings": {
        "index": {
          "number_of_shards": 3,
          "number_of_replicas": 0
        }
      },
      "mappings": {
        "resume": {
          "dynamic": "strict",
          "properties": {
            "file_hash": {
              "type": "keyword"
            },
            "embedding_vector": {
              "type": "binary",
              "doc_values": true
            },
            "doc": {
              "type": "text"
            }
          }
        }
      }
    }
    
  4. Generate test data

Use the following code to produce the base64 string for a vector:

import base64
import numpy as np

# Vectors are serialized as big-endian 32-bit floats.
dfloat32 = np.dtype('>f4')

def decode_float_list(base64_string):
    """Decode a base64 string back into a list of floats."""
    raw = base64.b64decode(base64_string)
    return np.frombuffer(raw, dtype=dfloat32).tolist()

def encode_array(arr):
    """Convert a list of floats to big-endian float32 bytes, then base64."""
    raw = np.array(arr).astype(dfloat32).tobytes()
    return base64.b64encode(raw).decode("utf-8")

print(encode_array([0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]))
print(encode_array([0.001,0.002,0.003,0.004,0.005,0.006,0.007,0.008,0.009,0.010]))

Put the strings printed above into the embedding_vector field of the documents below; embedding_vector must be a base64 string produced by this encoding.

PUT /vector_test/resume/1
{
  "file_hash": "hash1",
  "embedding_vector": "PczMzT5MzM0+mZmaPszMzT8AAAA/GZmaPzMzMz9MzM0/ZmZmP4AAAA==",
  "doc": "This is the content of the first document."
}

PUT /vector_test/resume/2
{
  "file_hash": "hash2",
  "embedding_vector": "OoMSbzsDEm87RJumO4MSbzuj1wo7xJumO+VgQjwDEm88E3S8PCPXCg==",
  "doc": "This is the content of the second document."
}
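As a sanity check, the stored strings can be decoded back into floats. The sketch below is a numpy-free variant that uses only the standard library (`struct` with the big-endian float32 format `>f`). The helper names `encode_vector` and `decode_vector` are illustrative and not part of the plugin.

```python
import base64
import struct

def encode_vector(values):
    """Pack floats as big-endian float32 and base64-encode the bytes."""
    raw = struct.pack(">%df" % len(values), *values)
    return base64.b64encode(raw).decode("utf-8")

def decode_vector(b64_string):
    """Inverse of encode_vector: base64 string -> list of floats."""
    raw = base64.b64decode(b64_string)
    return list(struct.unpack(">%df" % (len(raw) // 4), raw))

doc1 = "PczMzT5MzM0+mZmaPszMzT8AAAA/GZmaPzMzMz9MzM0/ZmZmP4AAAA=="
vec1 = decode_vector(doc1)
print(len(vec1))                    # number of dimensions
print([round(v, 4) for v in vec1])  # values rounded for display
# Re-encoding the original input should reproduce the stored string.
print(encode_vector([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]) == doc1)
```

Each value occupies 4 bytes, so a 10-dimensional vector is 40 bytes before base64 encoding.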

  5. Run a test query

    POST /vector_test/resume/_search
    {
      "query": {
        "function_score": {
          "boost_mode": "replace",
          "script_score": {
            "script": {
              "source": "binary_vector_score",
              "lang": "knn",
              "params": {
                "cosine": true,
                "field": "embedding_vector",
                "vector": [
                  1.0,
                  0.8,
                  0.2223,
                  0.7,
                  0.6,
                  0.5,
                  0.4,
                  0.3,
                  0.2,
                  0.1
                ]
              }
            }
          }
        }
      },
      "size": 2,
      "_source": [
        "file_hash"
      ]
    }
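Note that the query vector is passed as a plain array of floats, not as a base64 string. When the vector comes from code, the request body can be assembled programmatically. The following is a sketch only: `build_vector_query` is a hypothetical helper that reproduces the request above, and sending it to ES is left to whichever HTTP client you use.

```python
import json

def build_vector_query(vector, field="embedding_vector", cosine=True, size=2):
    """Build the function_score body that calls the binary_vector_score script."""
    return {
        "query": {
            "function_score": {
                "boost_mode": "replace",
                "script_score": {
                    "script": {
                        "source": "binary_vector_score",
                        "lang": "knn",
                        "params": {
                            "cosine": cosine,
                            "field": field,
                            # The query vector is sent as a JSON array of floats.
                            "vector": [float(x) for x in vector],
                        },
                    }
                },
            }
        },
        "size": size,
        "_source": ["file_hash"],
    }

body = build_vector_query([1.0, 0.8, 0.2223, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])
print(json.dumps(body, indent=2))
```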
    

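    Query result. The sample below appears to have been captured on an index with 5 shards and 4 documents, so its shard count, hit total, and the hit with _id 4 do not match the two-document setup above.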

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 4,
        "max_score": 0.998783,
        "hits": [
          {
            "_index": "vector_test",
            "_type": "resume",
            "_id": "4",
            "_score": 0.998783,
            "_source": {
              "file_hash": "hash4"
            }
          },
          {
            "_index": "vector_test",
            "_type": "resume",
            "_id": "1",
            "_score": 0.5818508,
            "_source": {
              "file_hash": "hash1"
            }
          }
        ]
      }
    }
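The score of 0.5818508 reported for document 1 is consistent with the ordinary cosine similarity between the query vector and the stored vector. The sketch below checks this offline, assuming that "cosine": true makes the plugin return standard cosine similarity. The code is plain Python written for illustration, not the plugin's implementation.

```python
import base64
import math
import struct

def decode_vector(b64_string):
    """Decode a base64 string of big-endian float32 values."""
    raw = base64.b64decode(b64_string)
    return struct.unpack(">%df" % (len(raw) // 4), raw)

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.8, 0.2223, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
doc1 = decode_vector("PczMzT5MzM0+mZmaPszMzT8AAAA/GZmaPzMzMz9MzM0/ZmZmP4AAAA==")
doc2 = decode_vector("OoMSbzsDEm87RJumO4MSbzuj1wo7xJumO+VgQjwDEm88E3S8PCPXCg==")

score1 = cosine_similarity(query, doc1)
score2 = cosine_similarity(query, doc2)
print(round(score1, 5))
print(round(score2, 5))
```

Document 2 is document 1 scaled by 0.01, and cosine similarity ignores vector length, so both documents get practically the same score against this query.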
    