
Deploying a Local Large Language Model on Windows 10 with LangChain + Ollama

I. Environment

Windows 10, Python 3.9.18, langchain==0.1.9
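Besides langchain itself, the examples in section V assume langchain-community (installed alongside langchain 0.1.x) plus faiss-cpu and pypdf for the FAISS vector store and PDF loading; install them with pip if they are missing.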

II. Downloading Ollama

Download Ollama on Windows

Link to version 0.1.33: https://objects.githubusercontent.com/github-production-release-asset-2e65be/658928958/35e38c8d-b7f6-48ed-8a9c-f053d04b01a9?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240503%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240503T004753Z&X-Amz-Expires=300&X-Amz-Signature=ead8e1666fde6b2f23c86dec1c46ef2759fa6f05f60de5a506103db53e03478f&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=658928958&response-content-disposition=attachment%3B%20filename%3DOllamaSetup.exe&response-content-type=application%2Foctet-stream

III. Installing Ollama and moving it to another drive

1. Installing Ollama

Simply run the downloaded .exe installer. After installation, press Win+R, open cmd, and type ollama; if the command prints its usage/help output, the installation succeeded.

2. Downloading a model

See the Ollama model library for the full list. Here, ollama run qwen:7b was used to pull Alibaba's Qwen 7B model (about 4.2 GB).

3. Default locations

Ollama and the models it downloads live on the C: drive by default, which can take up a lot of space.

Ollama installs to C:\Users\XX\AppData\Local\Programs\Ollama by default.

Downloaded models are stored in C:\Users\XX\.ollama by default.

4. Migration steps

(1) Move the folder C:\Users\XX\AppData\Local\Programs\Ollama to another drive (e.g. D:\Ollama).

(2) In the user environment variables, edit the PATH variable and replace C:\Users\XX\AppData\Local\Programs\Ollama with the location from step (1).

(3) In the system environment variables, create a new variable named OLLAMA_MODELS and point it at a location with enough free space, for example inside the Ollama folder (D:\Ollama\models).

(4) Move the two items holding the downloaded model data under C:\Users\XX\.ollama into the new OLLAMA_MODELS directory (a quick verification sketch follows).
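After moving everything, a quick sanity check helps. A minimal Python sketch, assuming the example locations D:\Ollama and D:\Ollama\models from steps (1) and (3); run it from a newly opened terminal so it sees the updated environment variables:

import os
import shutil
import subprocess

# ollama.exe should now resolve from the new install location on PATH.
print(shutil.which("ollama"))            # expected: something under D:\Ollama

# Newly started processes should see the model directory override.
print(os.environ.get("OLLAMA_MODELS"))   # expected: D:\Ollama\models

# The previously downloaded models should still be listed.
print(subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout)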

IV. Using Ollama on its own

1. Print the version: ollama -v

2. List downloaded models: ollama list

3. Start a model: ollama run qwen:7b (if the model is not present, it is downloaded first)

4. Exit the session: Ctrl+D

5. Quit Ollama: use Quit on the tray icon

V. Building an application with Ollama + LangChain

1. Start the Ollama server

Run ollama serve in cmd; the server listens on port 11434 by default.
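Once the server is running, any HTTP client can call it. A minimal sketch using the requests library against Ollama's /api/generate endpoint (the model name assumes the qwen:7b model pulled earlier):

import requests

# One-shot, non-streaming generation against the local Ollama server.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen:7b", "prompt": "Introduce yourself in one sentence.", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])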

2. The LangChain wrapper

The wrapper lives at miniconda3\envs\py39\Lib\site-packages\langchain_community\llms\ollama.py; the relevant class and its configuration fields are:

class _OllamaCommon(BaseLanguageModel):
    base_url: str = "http://localhost:11434"
    """Base url the model is hosted under."""

    model: str = "llama2"
    """Model name to use."""

    mirostat: Optional[int] = None
    """Enable Mirostat sampling for controlling perplexity.
    (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)"""

    mirostat_eta: Optional[float] = None
    """Influences how quickly the algorithm responds to feedback
    from the generated text. A lower learning rate will result in
    slower adjustments, while a higher learning rate will make
    the algorithm more responsive. (Default: 0.1)"""

    mirostat_tau: Optional[float] = None
    """Controls the balance between coherence and diversity
    of the output. A lower value will result in more focused and
    coherent text. (Default: 5.0)"""

    num_ctx: Optional[int] = None
    """Sets the size of the context window used to generate the
    next token. (Default: 2048)"""

    num_gpu: Optional[int] = None
    """The number of GPUs to use. On macOS it defaults to 1 to
    enable metal support, 0 to disable."""

    num_thread: Optional[int] = None
    """Sets the number of threads to use during computation.
    By default, Ollama will detect this for optimal performance.
    It is recommended to set this value to the number of physical
    CPU cores your system has (as opposed to the logical number of cores)."""

    num_predict: Optional[int] = None
    """Maximum number of tokens to predict when generating text.
    (Default: 128, -1 = infinite generation, -2 = fill context)"""

    repeat_last_n: Optional[int] = None
    """Sets how far back for the model to look back to prevent
    repetition. (Default: 64, 0 = disabled, -1 = num_ctx)"""

    repeat_penalty: Optional[float] = None
    """Sets how strongly to penalize repetitions. A higher value (e.g., 1.5)
    will penalize repetitions more strongly, while a lower value (e.g., 0.9)
    will be more lenient. (Default: 1.1)"""

    temperature: Optional[float] = None
    """The temperature of the model. Increasing the temperature will
    make the model answer more creatively. (Default: 0.8)"""

    stop: Optional[List[str]] = None
    """Sets the stop tokens to use."""

    tfs_z: Optional[float] = None
    """Tail free sampling is used to reduce the impact of less probable
    tokens from the output. A higher value (e.g., 2.0) will reduce the
    impact more, while a value of 1.0 disables this setting. (default: 1)"""

    top_k: Optional[int] = None
    """Reduces the probability of generating nonsense. A higher value (e.g. 100)
    will give more diverse answers, while a lower value (e.g. 10)
    will be more conservative. (Default: 40)"""

    top_p: Optional[float] = None
    """Works together with top-k. A higher value (e.g., 0.95) will lead
    to more diverse text, while a lower value (e.g., 0.5) will
    generate more focused and conservative text. (Default: 0.9)"""

    system: Optional[str] = None
    """system prompt (overrides what is defined in the Modelfile)"""

    template: Optional[str] = None
    """full prompt or prompt template (overrides what is defined in the Modelfile)"""

    format: Optional[str] = None
    """Specify the format of the output (e.g., json)"""

    timeout: Optional[int] = None
    """Timeout for the request stream"""

    headers: Optional[dict] = None
    """Additional headers to pass to endpoint (e.g. Authorization, Referer).
    This is useful when Ollama is hosted on cloud services that require
    tokens for authentication.
    """

3. Examples

(1) Answering a user's question from given information

Here the given information is ["小明是一位科学家", "小明在balala地区工作"] (Xiao Ming is a scientist; Xiao Ming works in the balala area), and the model is asked to introduce Xiao Ming.
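A minimal sketch of this example, in the same style as the code for example (2) below; the vector store is built directly from the two sentences with FAISS.from_texts, and the exact question wording is an assumption:

from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

model = "qwen:7b"
llm = ChatOllama(model=model, temperature=0)
embeddings = OllamaEmbeddings(model=model)

# Index the two given facts directly; no document loading is needed.
vector = FAISS.from_texts(["小明是一位科学家", "小明在balala地区工作"], embeddings)
retriever = vector.as_retriever()

# Answer strictly from the retrieved context.
prompt = ChatPromptTemplate.from_messages([
    ("system", "根据下面的信息回答问题:{context}"),
    ("user", "{input}"),
])
document_chain = create_stuff_documents_chain(llm, prompt)
retrieval_chain = create_retrieval_chain(retriever, document_chain)

response = retrieval_chain.invoke({"input": "请介绍一下小明"})
print(response["answer"])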

(2) "Reading" a document and then answering questions about its content

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain.chains import create_history_aware_retriever
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings

model = "qwen:7b"
llm = ChatOllama(model=model, temperature=0)

# Load the PDF, split it into chunks, and index only the first 10 chunks.
loader = PyPDFLoader('../file/test.pdf')
docs = loader.load()
embeddings = OllamaEmbeddings(model=model)
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)[:10]
vector = FAISS.from_documents(documents, embeddings)
# vector = FAISS.from_texts(["小明是一位科学家", "小明在balala地区工作"], embeddings)
retriever = vector.as_retriever()

# Rewrite the latest question into a standalone search query using the chat history.
prompt1 = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    ("user", "在给定上述对话的情况下,生成一个要查找的搜索查询,以获取与对话相关的信息"),
])
retriever_chain = create_history_aware_retriever(llm, retriever, prompt1)

# Answer from the retrieved document chunks.
prompt2 = ChatPromptTemplate.from_messages([
    ("system", "根据文章内容回答问题:{context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
])
document_chain = create_stuff_documents_chain(llm, prompt2)
retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)

chat_history = []
while True:
    question = input('用户:')
    response = retrieval_chain.invoke({
        "chat_history": chat_history,
        "input": question,
    })
    answer = response["answer"]
    # Store the turn as proper message objects so the placeholder sees correct roles.
    chat_history.extend([HumanMessage(content=question), AIMessage(content=answer)])
    print('AI:', answer)
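Run the script with the Ollama server (or the tray application) running. On each turn, once there is prior chat history, the new question is first rewritten into a standalone search query, the matching chunks of the PDF are retrieved from FAISS, and the model answers from that retrieved context, so follow-up questions that refer back to earlier turns can still find the right passages.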
