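So let's dive into LangChain together and build a simple RAG (Retrieval-Augmented Generation) application in Google Colab.
First, you need a Google Colab account (https://colab.research.google.com/). Then prepare your data file. For example, I prepared a file, data.txt, containing the following line: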
Nedved yang likes to eat chicken rice
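Next come the setup steps: importing the necessary libraries, setting the OpenAI API key, and putting the file into Google Drive (here at /content/drive/MyDrive/LCTest/data.txt).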
```python
# Import necessary libraries and modules from LangChain and other packages
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.vectorstores import DocArrayInMemorySearch, FAISS
from langchain.embeddings import OpenAIEmbeddings, HuggingFaceInstructEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.indexes import VectorstoreIndexCreator
from langchain_experimental.agents.agent_toolkits.csv.base import create_csv_agent
from langchain.agents.agent_types import AgentType
import openai
import os
from getpass import getpass

# For Google Colab users: mount Google Drive to access files
from google.colab import drive
drive.mount('/content/drive')

# Request and configure the OpenAI API key.
# getpass hides the key as you type, so it does not end up in the saved notebook output.
api_key = getpass("OpenAI API key: ")
os.environ["OPENAI_API_KEY"] = api_key
print("OPENAI_API_KEY has been successfully configured.")

# Display utilities from IPython for enhanced output formatting
from IPython.display import display, Markdown

# Note: this snippet assumes a Google Colab environment and an OpenAI API key.
# It mounts Google Drive for file access and sets the environment variable read by the OpenAI client.
```
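If the file is not in Google Drive yet, it can also be created from the notebook once the drive is mounted. This is a minimal sketch: the helper name `write_data_file` is hypothetical, and its default path simply mirrors the `/content/drive/MyDrive/LCTest/data.txt` path used in the loading step; adjust it to your own Drive layout.

```python
from pathlib import Path

# Hypothetical helper: writes the sample data file only if it does not exist yet.
# The default path mirrors the one passed to TextLoader later on.
def write_data_file(path="/content/drive/MyDrive/LCTest/data.txt",
                    text="Nedved yang likes to eat chicken rice\n"):
    file_path = Path(path)
    file_path.parent.mkdir(parents=True, exist_ok=True)  # create the LCTest folder if missing
    if not file_path.exists():                            # never overwrite an existing file
        file_path.write_text(text, encoding="utf-8")
    return file_path

# Usage in Colab, after drive.mount(...):
# write_data_file()
```

Skipping existing files keeps the helper safe to re-run when you execute all cells again.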

```python
# Split 'data.txt' into chunks and create embeddings from those chunks.
# Check your OpenAI API quota before running this: every chunk is sent to the embeddings endpoint.
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS  # needs the faiss-cpu (or faiss-gpu) package installed

# Path to the text file in Google Drive
text_file_path = '/content/drive/MyDrive/LCTest/data.txt'

# Load the text data from the file
text_data_loader = TextLoader(file_path=text_file_path, encoding="utf-8")
text_data = text_data_loader.load()

# Split the loaded documents into chunks of up to 1000 characters, with 200 characters of overlap
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunked_data = splitter.split_documents(text_data)

# Embed the chunks and index them in an in-memory FAISS vector store
embedder = OpenAIEmbeddings()
vector_store = FAISS.from_documents(chunked_data, embedding=embedder)
```
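To get a feel for what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window splitter in plain Python. It is an illustration only, not LangChain's algorithm: `CharacterTextSplitter` first splits on a separator (by default `"\n\n"`) and then merges the pieces up to `chunk_size`, whereas this sketch cuts at fixed character offsets. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters;
    consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk. The one-line data.txt above is far shorter than 1000 characters, so with the settings used here it ends up as a single chunk either way.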

```python
# Initialize a conversational retrieval chain backed by a chat model.
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# Chat model that generates the answers; a low temperature keeps them close to the retrieved text
language_model = ChatOpenAI(temperature=0.1, model_name="gpt-3.5-turbo")

# Memory buffer that stores the conversation history
conversation_memory = ConversationBufferMemory(
    memory_key='chat_history',  # key under which the chain reads the history
    return_messages=True        # return the history as message objects rather than one string
)

# Conversational retrieval chain combining the model, the retriever and the memory
conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=language_model,
    chain_type="stuff",                     # "stuff" puts all retrieved chunks into a single prompt
    retriever=vector_store.as_retriever(),  # the FAISS store built above acts as the retriever
    memory=conversation_memory              # lets follow-up questions build on earlier turns
)
```
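Roughly, `as_retriever()` returns the chunks whose embeddings are closest to the question, and the `"stuff"` strategy (the default `chain_type` of `from_llm`) pastes those chunks into one prompt for the chat model. The sketch below mimics both steps without any API calls. The word-count "embedding", the cosine scoring, the prompt wording and all function names are assumptions for illustration; they are not what OpenAIEmbeddings, FAISS or LangChain's built-in prompt actually use.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    # Rank chunks by similarity to the question (highest first) and keep the top k
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def stuff_prompt(question: str, docs: list[str]) -> str:
    # "Stuff" strategy: every retrieved chunk goes into one prompt (hypothetical wording)
    context = "\n\n".join(docs)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nHelpful Answer:"
    )

chunks = ["Nedved yang likes to eat chicken rice", "Singapore has many hawker centres"]
question = "What is the favorite food for Nedved Yang?"
print(stuff_prompt(question, retrieve(question, chunks)))
```

"Stuff" is the simplest strategy and works well here because the whole data file fits comfortably in one prompt; for large document sets the other chain types exist to stay within the model's context window.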

```python
# Ask for Nedved Yang's favorite food through the conversational chain.
query_text = "What is the favorite food for Nedved Yang?"
# Run the query; the chain takes a dict of inputs keyed by "question"
query_response = conversation_chain({"question": query_text})
# Extract the answer from the response
favorite_food_answer = query_response["answer"]
# Display the answer (Colab renders the last expression of a cell)
favorite_food_answer
```
Nedved Yang likes to eat chicken rice.
```python
# Follow-up: ask for places in Singapore where Nedved Yang can buy it.
purchase_query = "Can you suggest places in Singapore for Nedved Yang to buy?"
# Submit the query to the conversational chain and capture the response
purchase_response = conversation_chain({"question": purchase_query})
# Extract the suggested places from the response
suggested_places = purchase_response["answer"]
# Output the suggested places
suggested_places
```
Nedved Yang can buy chicken rice, his favorite food, at various hawker centers and food courts in Singapore. Some popular places to try chicken rice include Maxwell Food Centre, Tian Tian Hainanese Chicken Rice at Maxwell Road, and Chinatown Complex Food Centre.
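The memory was preserved as intended: the second question never mentions chicken rice, yet the answer builds on the earlier turn, and the prompt carrying the information retrieved from data.txt reached GPT-3.5 Turbo successfully. (The specific hawker centres come from the model's own knowledge, since data.txt contains only the single line above.) Looks great.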
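The follow-up works because `ConversationalRetrievalChain` does not retrieve with the second question as-is: when `chat_history` is non-empty, it first asks the model to rewrite the follow-up as a standalone question using that history, and only then queries the retriever. The sketch below imitates the bookkeeping side of this in plain Python. The `BufferMemory` class, the prompt wording and the function names are hypothetical and only approximate what `ConversationBufferMemory` and LangChain's condense-question prompt do.

```python
from dataclasses import dataclass, field

@dataclass
class BufferMemory:
    """Hypothetical stand-in for a conversation buffer: keeps every turn in order."""
    turns: list[tuple[str, str]] = field(default_factory=list)  # (role, text) pairs

    def save(self, question: str, answer: str) -> None:
        self.turns.append(("Human", question))
        self.turns.append(("Assistant", answer))

    def history_text(self) -> str:
        return "\n".join(f"{role}: {text}" for role, text in self.turns)

def condense_prompt(memory: BufferMemory, follow_up: str) -> str:
    # Ask the model to turn the follow-up into a standalone question,
    # so the retrieval step also sees what was discussed earlier
    return (
        "Given the following conversation and a follow up question, "
        "rephrase the follow up question to be a standalone question.\n\n"
        f"Chat History:\n{memory.history_text()}\n"
        f"Follow Up Input: {follow_up}\nStandalone question:"
    )

memory = BufferMemory()
memory.save("What is the favorite food for Nedved Yang?",
            "Nedved Yang likes to eat chicken rice.")
print(condense_prompt(memory, "Can you suggest places in Singapore for Nedved Yang to buy?"))
```

A plain buffer keeps every turn verbatim, which is fine for a short chat like this one; in long conversations the history grows without bound, so a windowed or summarizing memory may be a better fit.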
