
【A Complete Tutorial: Building a RAG Knowledge Base with LangChain】


RAG

Let's look at adding a retrieval step to a prompt and an LLM, which together add up to a "retrieval-augmented generation" chain.

Install the required libraries

!pip install langchain openai faiss-cpu tiktoken
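The OpenAI-backed classes below need an API key. Here is a minimal setup sketch, assuming the key is supplied via the `OPENAI_API_KEY` environment variable (the default place LangChain's OpenAI integrations look):

import getpass
import os

# ChatOpenAI and OpenAIEmbeddings read the key from this environment
# variable when none is passed explicitly.
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")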

Import the packages

from operator import itemgetter

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

Initialize the components (vector store, prompt template, LLM)

vectorstore = FAISS.from_texts(
    ["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI()
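As a quick sanity check (not part of the original tutorial), you can query the retriever directly. Assuming a langchain version recent enough that retrievers are Runnables, `invoke` returns the matching documents:

# The retriever should surface our single seed document
docs = retriever.invoke("where did harrison work?")
print(docs)
# expected: [Document(page_content='harrison worked at kensho')]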

Build the chain with LCEL (LangChain Expression Language)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
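One detail worth spelling out: the dict at the head of the pipeline is LCEL shorthand that gets coerced into a `RunnableParallel`, so both branches receive the same input string; the retriever fetches the context while `RunnablePassthrough` forwards the question unchanged. A sketch of the equivalent explicit form:

from langchain_core.runnables import RunnableParallel

# Equivalent to the dict shorthand: both branches run on the same input;
# "context" gets the retrieved docs, "question" gets the raw string.
setup = RunnableParallel(context=retriever, question=RunnablePassthrough())
chain = setup | prompt | model | StrOutputParser()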

Run the chain

chain.invoke("where did harrison work?")
# output
'Harrison worked at Kensho.'
We can also pass extra variables through the chain. Here the prompt gains a `language` field, so the input becomes a dict and `itemgetter` pulls each key out of it:

template = """Answer the question based only on the following context:
{context}

Question: {question}

Answer in the following language: {language}
"""
prompt = ChatPromptTemplate.from_template(template)

chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
        "language": itemgetter("language"),
    }
    | prompt
    | model
    | StrOutputParser()
)

chain.invoke({"question": "where did harrison work", "language": "italian"})
# output
'Harrison ha lavorato a Kensho.'
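Since `language` is just another prompt variable, any language string should work. For example (the exact wording of the answer depends on the model):

chain.invoke({"question": "where did harrison work", "language": "french"})
# expected: an answer in French, roughly 'Harrison a travaillé chez Kensho.'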

Conversational Retrieval Chain

We can easily add in conversation history. This primarily means adding in chat message history.

Add the ability to keep track of the conversation history

from langchain.prompts.prompt import PromptTemplate
from langchain.schema import format_document
from langchain_core.messages import AIMessage, HumanMessage, get_buffer_string
from langchain_core.runnables import RunnableParallel

_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)
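The condense prompt expects `chat_history` as a plain string, which is what `get_buffer_string` produces from a list of message objects. A small illustration, using the function's default `Human:`/`AI:` prefixes:

# get_buffer_string flattens message objects into a transcript string
history = [
    HumanMessage(content="Who wrote this notebook?"),
    AIMessage(content="Harrison"),
]
print(get_buffer_string(history))
# Human: Who wrote this notebook?
# AI: Harrison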
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(template)
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")


def _combine_documents(
    docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"
):
    # Render each retrieved document through the prompt and join them
    # into a single context string.
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)
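To make the helper concrete, here is what it does to two retrieved documents (the second document is hypothetical, added just for illustration):

from langchain.schema import Document

# Each document is rendered through DEFAULT_DOCUMENT_PROMPT ("{page_content}")
# and the results are joined with a blank line.
docs = [
    Document(page_content="harrison worked at kensho"),
    Document(page_content="kensho is a fintech company"),  # hypothetical
]
print(_combine_documents(docs))
# harrison worked at kensho
#
# kensho is a fintech company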
_inputs = RunnableParallel(
    standalone_question=RunnablePassthrough.assign(
        chat_history=lambda x: get_buffer_string(x["chat_history"])
    )
    | CONDENSE_QUESTION_PROMPT
    | ChatOpenAI(temperature=0)
    | StrOutputParser(),
)
_context = {
    "context": itemgetter("standalone_question") | retriever | _combine_documents,
    "question": lambda x: x["standalone_question"],
}
conversational_qa_chain = _inputs | _context | ANSWER_PROMPT | ChatOpenAI()
conversational_qa_chain.invoke(
    {
        "question": "where did harrison work?",
        "chat_history": [],
    }
)
# output
AIMessage(content='Harrison was employed at Kensho.')
conversational_qa_chain.invoke(
    {
        "question": "where did he work?",
        "chat_history": [
            HumanMessage(content="Who wrote this notebook?"),
            AIMessage(content="Harrison"),
        ],
    }
)
# output
AIMessage(content='Harrison worked at Kensho.')

With Memory and returning source documents

This shows how to add memory to the chain above. Memory has to be managed outside the chain for now; to return the retrieved documents, we simply pass them through all the way to the output.

Add the ability to return the retrieved documents

from operator import itemgetter

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    return_messages=True, output_key="answer", input_key="question"
)
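A fresh `ConversationBufferMemory` starts out empty. `load_memory_variables` returns the messages under the `history` key, which is exactly what the `itemgetter("history")` step below relies on:

# A fresh buffer memory holds no messages yet
print(memory.load_memory_variables({}))
# {'history': []}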
# First we add a step to load memory
# This adds a "memory" key to the input object
loaded_memory = RunnablePassthrough.assign(
    chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)
# Now we calculate the standalone question
standalone_question = {
    "standalone_question": {
        "question": lambda x: x["question"],
        "chat_history": lambda x: get_buffer_string(x["chat_history"]),
    }
    | CONDENSE_QUESTION_PROMPT
    | ChatOpenAI(temperature=0)
    | StrOutputParser(),
}
# Now we retrieve the documents
retrieved_documents = {
    "docs": itemgetter("standalone_question") | retriever,
    "question": lambda x: x["standalone_question"],
}
# Now we construct the inputs for the final prompt
final_inputs = {
    "context": lambda x: _combine_documents(x["docs"]),
    "question": itemgetter("question"),
}
# And finally, we do the part that returns the answers
answer = {
    "answer": final_inputs | ANSWER_PROMPT | ChatOpenAI(),
    "docs": itemgetter("docs"),
}
# And now we put it all together!
final_chain = loaded_memory | standalone_question | retrieved_documents | answer
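Because every stage is itself a Runnable, you can invoke any prefix of the pipeline to inspect intermediate values. A debugging sketch (not part of the original tutorial; note that it does call the LLM once):

# Inspect what the first two stages produce, before retrieval happens
partial_chain = loaded_memory | standalone_question
print(partial_chain.invoke({"question": "where did harrison work?"}))
# expected: {'standalone_question': <the rephrased standalone question>}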
inputs = {"question": "where did harrison work?"}
result = final_chain.invoke(inputs)
result
# output
{'answer': AIMessage(content='Harrison was employed at Kensho.'),
 'docs': [Document(page_content='harrison worked at kensho')]}

 

# Note that the memory does not save automatically
# This will be improved in the future
# For now you need to save it yourself
memory.save_context(inputs, {"answer": result["answer"].content})

memory.load_memory_variables({})
# output
{'history': [HumanMessage(content='where did harrison work?'),
 AIMessage(content='Harrison was employed at Kensho.')]}

 

inputs = {"question": "but where did he really work?"}
result = final_chain.invoke(inputs)
result
# output
{'answer': AIMessage(content='Harrison actually worked at Kensho.'),
 'docs': [Document(page_content='harrison worked at kensho')]}
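Since saving to memory is manual, a small convenience wrapper (hypothetical, not part of the original tutorial) keeps invoke-and-save in one place:

def ask(question: str) -> str:
    """Run the chain and persist the turn to memory (hypothetical helper)."""
    inputs = {"question": question}
    result = final_chain.invoke(inputs)
    # save_context expects the raw inputs plus a dict keyed by output_key
    memory.save_context(inputs, {"answer": result["answer"].content})
    return result["answer"].content

print(ask("where did harrison work?"))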