
How to build a local knowledge base with GraphRAG using ollama's LLM and embedding model services

Stepping on every GraphRAG pitfall so you don't have to

While working with GraphRAG I hit just about every pitfall there is (I have to grumble: the official code has many lingering issues, and the team themselves admit their focus is on improving the algorithm rather than on compatibility with the many models and frameworks out there). After a great deal of reading and debugging (the indexing stage is rather demanding on the machine), I finally got the GraphRAG project running. I tested two setups, both successfully:

  1. Use ollama to serve both the local LLM model and the embedding model
  2. Use ollama to serve the LLM model and lm-studio to serve the embedding model

The reason for having ollama serve both the LLM and the embedding model is simply that ollama is so pleasant to work with: extremely easy to use and very quick to respond.

Here is how to set everything up with ollama:

  1. Install GraphRAG:

pip install graphrag -i https://pypi.tuna.tsinghua.edu.cn/simple
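Note that steps 8 through 10 below patch files inside the installed package, so the exact paths depend on the GraphRAG release. If the paths do not match your installation, pinning an early release should help (graphrag 0.1.x is an assumption based on the file layout this guide describes):

pip install graphrag==0.1.1 -i https://pypi.tuna.tsinghua.edu.cn/simple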
  2. Create the input directory ./ragtest/input:
mkdir -p ./ragtest/input
  3. Put your corpus text files in this directory. The files must be .txt and UTF-8 encoded; on Windows you can open a file in Notepad and Save As with UTF-8 encoding (or use the script sketched after the next step).
  4. Initialize the project:
python -m graphrag.index --init --root ./ragtest
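On the UTF-8 requirement from step 3: if you have many corpus files, converting them by hand in Notepad gets tedious. A minimal re-encoding sketch (the source encoding "gbk" and the file names are assumptions; adjust them to your corpus):

import pathlib

# Hypothetical paths; adjust to your corpus. The source encoding "gbk" is an
# assumption for Chinese text saved by older Windows tools.
src = pathlib.Path("book.txt")
dst = pathlib.Path("./ragtest/input/book.txt")

text = src.read_text(encoding="gbk")
dst.write_text(text, encoding="utf-8")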
  5. Edit the .env file so it contains:
GRAPHRAG_API_KEY=ollama
GRAPHRAG_CLAIM_EXTRACTION_ENABLED=True

Note: the parameter GRAPHRAG_CLAIM_EXTRACTION_ENABLED=True is required; without it the covariates are never generated, and Local Search will fail.

  6. Modify the settings.yaml file as follows:
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ollama
  type: openai_chat # or azure_openai_chat
  model: qwen2
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://localhost:11434/v1/
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ollama
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text
    api_base: http://localhost:11434/v1/
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
    ...
  
  7. Pull the models and start the LLM and embedding services with ollama; the embedding model is nomic-embed-text:
ollama pull qwen2
ollama pull nomic-embed-text
ollama serve
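Before patching any GraphRAG source files, it is worth checking that ollama is actually answering. A minimal sketch (assuming the default port 11434 and the openai and ollama Python packages are installed; the API key string can be anything, since ollama ignores it):

from openai import OpenAI
import ollama

# chat goes through ollama's OpenAI-compatible endpoint
client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")
resp = client.chat.completions.create(
    model="qwen2",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(resp.choices[0].message.content)

# embeddings go through the native ollama client, matching the patches below
emb = ollama.embeddings(model="nomic-embed-text", prompt="hello world")
print(len(emb["embedding"]))  # dimensionality of the embedding vector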
  8. Modify the file D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\llm\openai\openai_embeddings_llm.py (look it up under your own GraphRAG install path) so that it calls the ollama service:
import ollama

# ....

class OpenAIEmbeddingsLLM(BaseLLM[EmbeddingInput, EmbeddingOutput]):
    """A text-embedding generator LLM."""

    _client: OpenAIClientTypes
    _configuration: OpenAIConfiguration

    def __init__(self, client: OpenAIClientTypes, configuration: OpenAIConfiguration):
        self.client = client
        self.configuration = configuration

    async def _execute_llm(
        self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
    ) -> EmbeddingOutput | None:
        args = {
            "model": self.configuration.model,
            **(kwargs.get("model_parameters") or {}),
        }
        # original OpenAI client call, kept for reference:
        '''
        embedding = await self.client.embeddings.create(
            input=input,
            **args,
        )
        return [d.embedding for d in embedding.data]
        '''
        # replacement: call ollama directly, one prompt at a time
        embedding_list = []
        for inp in input:
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list


The block inside the triple quotes above is the original official code; the added code is:

        embedding_list = []
        for inp in input:
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list
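One small refinement worth considering: hard-coding "nomic-embed-text" means the model: entry under embeddings in settings.yaml is silently ignored. A variant that reads the configured name instead (a sketch under the same assumptions as the patch above):

        embedding_list = []
        for inp in input:
            # use the model name from settings.yaml instead of a hard-coded one
            embedding = ollama.embeddings(model=self.configuration.model, prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list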
  9. Modify the file D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\query\llm\oai\embedding.py so it calls the model service provided by ollama. The change is:
import ollama
#.....

embedding = ollama.embeddings(model='nomic-embed-text', prompt=chunk)['embedding']
  • 1
  • 2
  • 3
  • 4

(The original post shows a screenshot at this point.) In that screenshot, the commented-out lines are the official code, and the arrow points to the ollama.embeddings(...) line above, which is the code to add.
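Since the screenshot does not survive in this text version, here is roughly what the patched embed() method ends up doing. The structure is reconstructed from the traceback shown further below (the np.average over chunk_lens appears there), not copied from the GraphRAG source, so treat the names and details as approximate:

import ollama
import numpy as np

def embed(self, text: str, **kwargs) -> list[float]:
    """Embed text chunk by chunk, then take a length-weighted average."""
    chunk_embeddings = []
    chunk_lens = []
    for chunk in chunk_text(
        text=text, max_tokens=self.max_tokens, token_encoder=self.token_encoder
    ):
        # official OpenAI client call commented out; ollama used instead
        embedding = ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]
        chunk_embeddings.append(embedding)
        chunk_lens.append(len(chunk))
    chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
    return chunk_embeddings.tolist()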

  10. Modify the definition of the chunk_text() function in D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\query\llm\text_utils.py:
def chunk_text(
    text: str, max_tokens: int, token_encoder: tiktoken.Encoding | None = None
):
    """Chunk text by token length."""
    if token_encoder is None:
        token_encoder = tiktoken.get_encoding("cl100k_base")
    tokens = token_encoder.encode(text)  # type: ignore
    tokens = token_encoder.decode(tokens)  # decode the token ids back into a string

    chunk_iterator = batched(iter(tokens), max_tokens)
    yield from chunk_iterator

The added statement is:

tokens = token_encoder.decode(tokens)  # decode the token ids back into a string
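To see what this decode changes, compare what the batching helper receives with and without it. A standalone sketch (batched() here is a local stand-in for the helper chunk_text() actually uses):

from itertools import islice

import tiktoken

def batched(iterable, n):
    # local stand-in for the batching helper used by chunk_text()
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("GraphRAG local search test")

print(next(batched(iter(tokens), 4)))              # without the fix: a tuple of token ids (ints)
print(next(batched(iter(enc.decode(tokens)), 4)))  # with the fix: a tuple of characters

Without the decode, every chunk sent to the embedding endpoint is a tuple of integers; each one fails to embed, so no valid chunk embeddings accumulate, and the weighted average divides by zero, as the trace below shows.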

This appears to be a bug in the official GraphRAG code: the developers forgot to decode the tokenized text back into strings, which causes the subsequent embedding step to fail with ZeroDivisionError: Weights sum to zero, can't be normalized:

(graphrag) D:\Learn\GraphRAG>python -m graphrag.query --root ./newTest12 --method local "谁是叶文洁"


INFO: Reading settings from newTest12\settings.yaml
creating llm client with {'api_key': 'REDACTED,len=6', 'type': "openai_chat", 'model': 'qwen2', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1/', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
creating embedding llm client with {'api_key': 'REDACTED,len=9', 'type': "openai_embedding", 'model': 'nomic-ai/nomic-embed-text-v1.5/nomic-embed-text-v1.5.Q8_0.gguf', 'max_tokens': 4000, 'temperature': 0, 'top_p': 1, 'n': 1, 'request_timeout': 180.0, 'api_base': 'http://localhost:1234/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 1}
Error embedding chunk {'OpenAIEmbedding': 'Error code: 400 - {\'error\': "\'input\' field must be a string or an array of strings"}'}
Traceback (most recent call last):
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\__main__.py", line 76, in <module>
    run_local_search(
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\cli.py", line 153, in run_local_search
    result = search_engine.search(query=query)
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\structured_search\local_search\search.py", line 118, in search
    context_text, context_records = self.context_builder.build_context(
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\structured_search\local_search\mixed_context.py", line 139, in build_context
    selected_entities = map_query_to_entities(
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\context_builder\entity_extraction.py", line 55, in map_query_to_entities
    search_results = text_embedding_vectorstore.similarity_search_by_text(
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\vector_stores\lancedb.py", line 118, in similarity_search_by_text
    query_embedding = text_embedder(text)
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\context_builder\entity_extraction.py", line 57, in <lambda>
    text_embedder=lambda t: text_embedder.embed(t),
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\llm\oai\embedding.py", line 96, in embed
    chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\numpy\lib\function_base.py", line 550, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized
  11. Start the indexing run:
python -m graphrag.index --root ./ragtest

Sample output:

(graphrag) D:\Learn\GraphRAG>python -m graphrag.index --root ./newTest12
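Once indexing completes, you can query the knowledge base. The local-search command mirrors the invocation from the error log above; the global-search line is an assumed example of the other query mode:

python -m graphrag.query --root ./ragtest --method local "谁是叶文洁"
python -m graphrag.query --root ./ragtest --method global "What are the top themes in this corpus?"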