
GraphRAG Local Deployment (Xinference Local Models) + Neo4j Visualization

Setup used in this post:

For the model provider you can choose Ollama, Xinference, or a similar backend, or an LLM wrapped behind your own local API.

1. Pulling the GraphRAG framework

GraphRAG offers a quick start, so there is no need to clone the official project:

pip install graphrag

Just run that and it's installed.

Then create a project folder locally and initialize it:

mkdir my_graphrag
cd my_graphrag
python -m graphrag.index --init --root ./ragtest

After it runs successfully, the directory looks like this:

Next you will see a yaml file; open it to edit the configuration directly. Note that we are using Xinference, so remember to capitalize the first letter. Many blog posts write it in lowercase, which can be a pitfall, so be careful here.

We only need to modify these two sections; everything else can be customized as needed.
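As a hedged illustration of what those two sections end up looking like (the exact keys depend on your GraphRAG version, and the model names and port are placeholders for whatever you launched in Xinference, which exposes an OpenAI-compatible endpoint at /v1):

```yaml
llm:
  api_key: EMPTY                 # Xinference does not check the key
  type: openai_chat
  model: qwen2-instruct          # placeholder: your chat model name in Xinference
  api_base: http://localhost:9997/v1

embeddings:
  llm:
    api_key: EMPTY
    type: openai_embedding
    model: bge-large-zh          # placeholder: your embedding model name
    api_base: http://localhost:9997/v1
```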

Next, put a .txt file into input (we don't use the official sample; just create the folder ourselves):

mkdir input
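For example (the file name and contents are placeholders; any UTF-8 .txt file placed under input/ will be indexed):

```shell
# Create the input folder inside the project root and drop a plain-text
# document into it; the indexer picks up every .txt file found here.
mkdir -p ./ragtest/input
printf 'A Christmas Carol by Charles Dickens. Marley was dead, to begin with...\n' > ./ragtest/input/book.txt
```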

Now run the indexing step:

python -m graphrag.index --root ./ragtest

Entity extraction can be slow depending on your network, so wait a while until you see output like the screenshot below, indicating a normal run:

If you hit a create_base_entity_graph error, you need to patch a source file:

① anaconda -> envs -> bzp_graphrag (whatever you named your environment)

② /lib/python3.11/site-packages/graphrag/llm/openai/openai_chat_llm.py

Inside the OpenAIChatLLM class, find the _invoke_json() method, at roughly line 60.

Then locate the line below, which is the culprit. What it expresses is: if kwargs.get("is_response_valid") finds a validator, use it; otherwise fall back to the default (lambda _x: True). When we are not using OpenAI, kwargs.get("is_response_valid") is always found, but the validator it returns always evaluates the response as invalid (False).

is_response_valid = kwargs.get("is_response_valid") or (lambda _x: True)


So we simply drop the kwargs['is_response_valid'] option. Add the following change just above that line, so the fallback lambda (which always returns True) is used every time:

if 'is_response_valid' in kwargs:
    del kwargs['is_response_valid']
is_response_valid = kwargs.get("is_response_valid") or (lambda _x: True)
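To see why the patch works, here is a self-contained sketch of the behavior (function names are illustrative, not the real graphrag internals). The `x or default` idiom only falls back when the key is missing or falsy; a validator that is present but always rejects the response still wins.

```python
def invoke_json(response, **kwargs):
    # Original behavior: a present-but-strict validator is used as-is.
    is_response_valid = kwargs.get("is_response_valid") or (lambda _x: True)
    return is_response_valid(response)

def invoke_json_patched(response, **kwargs):
    # The workaround: drop the key first, so the permissive default applies.
    kwargs.pop("is_response_valid", None)
    is_response_valid = kwargs.get("is_response_valid") or (lambda _x: True)
    return is_response_valid(response)

# What a non-OpenAI backend effectively ends up with: a validator that
# never accepts the response.
always_reject = lambda _r: False
```

With `always_reject` passed in, `invoke_json` fails validation on every call (hence the endless retries), while `invoke_json_patched` accepts the response.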

After the patch, rerun the indexing; once it succeeds we can talk to the model.

Run a global query:

python -m graphrag.query \
--root ./ragtest \
--method global \
"What are the top themes in this story?"

Run a local query:

python -m graphrag.query \
--root ./ragtest \
--method local \
"Who is Scrooge, and what are his main relationships?"

These usually run and answer normally, so no demo here.

2. Importing into Neo4j

We prepare a script that converts all the .parquet files under output/xxxx/artifacts/ to CSV:

import os
import csv

import pandas as pd

# Source: GraphRAG artifacts; target: Neo4j's import directory.
parquet_dir = 'graphrag_cs_glm/ragtest/output/20240729-141251/artifacts'
csv_dir = 'neo4j-community-4.3.5/import'

def clean_quotes(value):
    # Normalize quoting so the CSV survives Neo4j's LOAD CSV parsing.
    if isinstance(value, str):
        value = value.strip().replace('""', '"').replace('"', '')
        if ',' in value or '"' in value:
            value = f'"{value}"'
    return value

for file_name in os.listdir(parquet_dir):
    if file_name.endswith('.parquet'):
        parquet_file = os.path.join(parquet_dir, file_name)
        csv_file = os.path.join(csv_dir, file_name.replace('.parquet', '.csv'))
        df = pd.read_parquet(parquet_file)
        for column in df.select_dtypes(include=['object']).columns:
            df[column] = df[column].apply(clean_quotes)
        df.to_csv(csv_file, index=False, quoting=csv.QUOTE_NONNUMERIC)
        print(f'Converted {parquet_file} to {csv_file} successfully')
print('All parquet files have been converted to CSV.')

Replace the paths with your own Neo4j directories as needed.

Next comes the Neo4j side.

In the conf directory, uncomment the import setting in neo4j.conf:

dbms.directories.import=import
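Depending on the Neo4j version, LOAD CSV from file URLs may also need to be allowed; as an assumption about typical 4.x defaults, the relevant neo4j.conf lines look like:

```
dbms.directories.import=import
dbms.security.allow_csv_import_from_file_urls=true
```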

Now open Neo4j and run the following commands:

// 1. Import Documents
LOAD CSV WITH HEADERS FROM 'file:///create_final_documents.csv' AS row
CREATE (d:Document {
    id: row.id,
    title: row.title,
    raw_content: row.raw_content,
    text_unit_ids: row.text_unit_ids
});

// 2. Import Text Units
LOAD CSV WITH HEADERS FROM 'file:///create_final_text_units.csv' AS row
CREATE (t:TextUnit {
    id: row.id,
    text: row.text,
    n_tokens: toFloat(row.n_tokens),
    document_ids: row.document_ids,
    entity_ids: row.entity_ids,
    relationship_ids: row.relationship_ids
});

// 3. Import Entities
LOAD CSV WITH HEADERS FROM 'file:///create_final_entities.csv' AS row
CREATE (e:Entity {
    id: row.id,
    name: row.name,
    type: row.type,
    description: row.description,
    human_readable_id: toInteger(row.human_readable_id),
    text_unit_ids: row.text_unit_ids
});

// 4. Import Relationships
LOAD CSV WITH HEADERS FROM 'file:///create_final_relationships.csv' AS row
CREATE (r:Relationship {
    source: row.source,
    target: row.target,
    weight: toFloat(row.weight),
    description: row.description,
    id: row.id,
    human_readable_id: row.human_readable_id,
    source_degree: toInteger(row.source_degree),
    target_degree: toInteger(row.target_degree),
    rank: toInteger(row.rank),
    text_unit_ids: row.text_unit_ids
});

// 5. Import Nodes
LOAD CSV WITH HEADERS FROM 'file:///create_final_nodes.csv' AS row
CREATE (n:Node {
    id: row.id,
    level: toInteger(row.level),
    title: row.title,
    type: row.type,
    description: row.description,
    source_id: row.source_id,
    community: row.community,
    degree: toInteger(row.degree),
    human_readable_id: toInteger(row.human_readable_id),
    size: toInteger(row.size),
    entity_type: row.entity_type,
    top_level_node_id: row.top_level_node_id,
    x: toInteger(row.x),
    y: toInteger(row.y)
});

// 6. Import Communities
LOAD CSV WITH HEADERS FROM 'file:///create_final_communities.csv' AS row
CREATE (c:Community {
    id: row.id,
    title: row.title,
    level: toInteger(row.level),
    raw_community: row.raw_community,
    relationship_ids: row.relationship_ids,
    text_unit_ids: row.text_unit_ids
});

// 7. Import Community Reports
LOAD CSV WITH HEADERS FROM 'file:///create_final_community_reports.csv' AS row
CREATE (cr:CommunityReport {
    id: row.id,
    community: row.community,
    full_content: row.full_content,
    level: toInteger(row.level),
    rank: toFloat(row.rank),
    title: row.title,
    rank_explanation: row.rank_explanation,
    summary: row.summary,
    findings: row.findings,
    full_content_json: row.full_content_json
});

// 8. Create indexes for better performance
CREATE INDEX FOR (d:Document) ON (d.id);
CREATE INDEX FOR (t:TextUnit) ON (t.id);
CREATE INDEX FOR (e:Entity) ON (e.id);
CREATE INDEX FOR (r:Relationship) ON (r.id);
CREATE INDEX FOR (n:Node) ON (n.id);
CREATE INDEX FOR (c:Community) ON (c.id);
CREATE INDEX FOR (cr:CommunityReport) ON (cr.id);

// 9. Create relationships after all nodes are imported
MATCH (d:Document)
UNWIND split(d.text_unit_ids, ',') AS textUnitId
MATCH (t:TextUnit {id: trim(textUnitId)})
CREATE (d)-[:HAS_TEXT_UNIT]->(t);

MATCH (t:TextUnit)
UNWIND split(t.entity_ids, ',') AS entityId
MATCH (e:Entity {id: trim(entityId)})
CREATE (t)-[:HAS_ENTITY]->(e);

MATCH (t:TextUnit)
UNWIND split(t.relationship_ids, ',') AS relId
MATCH (r:Relationship {id: trim(relId)})
CREATE (t)-[:HAS_RELATIONSHIP]->(r);

MATCH (e:Entity)
UNWIND split(e.text_unit_ids, ',') AS textUnitId
MATCH (t:TextUnit {id: trim(textUnitId)})
CREATE (e)-[:MENTIONED_IN]->(t);

MATCH (r:Relationship)
MATCH (source:Entity {name: r.source})
MATCH (target:Entity {name: r.target})
CREATE (source)-[:RELATES_TO]->(target);

MATCH (r:Relationship)
UNWIND split(r.text_unit_ids, ',') AS textUnitId
MATCH (t:TextUnit {id: trim(textUnitId)})
CREATE (r)-[:MENTIONED_IN]->(t);

MATCH (c:Community)
UNWIND split(c.relationship_ids, ',') AS relId
MATCH (r:Relationship {id: trim(relId)})
CREATE (c)-[:HAS_RELATIONSHIP]->(r);

MATCH (c:Community)
UNWIND split(c.text_unit_ids, ',') AS textUnitId
MATCH (t:TextUnit {id: trim(textUnitId)})
CREATE (c)-[:HAS_TEXT_UNIT]->(t);

MATCH (cr:CommunityReport)
MATCH (c:Community {id: cr.community})
CREATE (cr)-[:REPORTS_ON]->(c);

You can also have an LLM generate this script for you; paste it into the Neo4j Browser and run it:
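If you would rather drive the import from code than paste it into the Neo4j Browser, the script above can be split into individual statements and executed one by one. A minimal sketch of the splitting step (the driver usage at the bottom is an assumption: it uses the official `neo4j` Python package against a default local bolt endpoint, and is left commented out here):

```python
def split_cypher(script: str) -> list[str]:
    """Split a multi-statement Cypher script on ';', dropping // comment
    lines and empty fragments, so each statement can be run separately."""
    lines = [ln for ln in script.splitlines() if not ln.strip().startswith('//')]
    statements = [s.strip() for s in '\n'.join(lines).split(';')]
    return [s for s in statements if s]

# Hypothetical usage with the official neo4j driver (not run here):
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# with driver.session() as session:
#     for stmt in split_cypher(open("import.cypher").read()):
#         session.run(stmt)
```

Running statements one at a time also makes it easier to skip a failing section (such as the index creation on a rerun) without abandoning the rest of the import.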

 

I have already run this here, so I won't repeat it; if an id error appears on a rerun, delete section 8 (the index-creation statements) and run it again.

 

That wraps it up. I have also been experimenting with prompt adjustments recently; everyone is welcome to discuss.
