Available for macOS, Linux, and Windows (preview)
On both Windows and macOS, you can download it from the website:
https://ollama.com/download
On Linux, install it from the command line:
curl -fsSL https://ollama.com/install.sh | sh
You can also run it with Docker:
https://hub.docker.com/r/ollama/ollama
Here we take macOS as an example.
The download is a file named Ollama-darwin.zip; unzipping it yields the Ollama.app application bundle. I dragged the app into the Applications folder so it can later be opened from Launchpad.
Double-clicking the app prompts you to install the command-line tool, which requires an administrator password.
Once the command-line tool is installed, you are prompted to run the following command:
ollama run llama2
Running this first downloads the model and then runs it. Make sure the amount of RAM you have matches the model's parameter count.
The supported models are listed at https://ollama.com/library
Currently, the main ones are:
Model | Parameters | Size | Download |
---|---|---|---|
Llama 2 | 7B | 3.8GB | ollama run llama2 |
Mistral | 7B | 4.1GB | ollama run mistral |
Dolphin Phi | 2.7B | 1.6GB | ollama run dolphin-phi |
Phi-2 | 2.7B | 1.7GB | ollama run phi |
Neural Chat | 7B | 4.1GB | ollama run neural-chat |
Starling | 7B | 4.1GB | ollama run starling-lm |
Code Llama | 7B | 3.8GB | ollama run codellama |
Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
Llama 2 13B | 13B | 7.3GB | ollama run llama2:13b |
Llama 2 70B | 70B | 39GB | ollama run llama2:70b |
Orca Mini | 3B | 1.9GB | ollama run orca-mini |
Vicuna | 7B | 3.8GB | ollama run vicuna |
LLaVA | 7B | 4.5GB | ollama run llava |
Gemma | 2B | 1.4GB | ollama run gemma:2b |
Gemma | 7B | 4.8GB | ollama run gemma:7b |
You need at least 8 GB of RAM to run the 7B models, 16 GB for the 13B models, and 32 GB for the 33B models.
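As a minimal sketch of that rule of thumb (the thresholds come from the sentence above; actual requirements vary with quantization and context size):

# Rough rule of thumb from above: 8 GB of RAM for 7B models,
# 16 GB for 13B models, and 32 GB for 33B models.
def min_ram_gb(params_billion: float) -> int:
    if params_billion <= 7:
        return 8
    if params_billion <= 13:
        return 16
    return 32

print(min_ram_gb(7))   # 8
print(min_ram_gb(33))  # 32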
Ollama supports importing GGUF models via a Modelfile:
1. Create a file named Modelfile, with a FROM instruction pointing to the local path of the model you want to import:
FROM ./vicuna-33b.Q4_0.gguf
2. Create the model in Ollama:
ollama create example -f Modelfile
3. Run the model:
ollama run example
See the guide for more information on importing models.
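The same import can also be done from Python with the ollama library's create call (shown later in this article); a minimal sketch, assuming the GGUF file sits in the current directory:

import ollama

# Same three steps as above, but via the Python library.
# The Modelfile's FROM points at a local GGUF file.
modelfile = '''
FROM ./vicuna-33b.Q4_0.gguf
'''

ollama.create(model='example', modelfile=modelfile)

# Then run it like any other model.
response = ollama.chat(model='example',
                       messages=[{'role': 'user', 'content': 'Hello!'}])
print(response['message']['content'])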
Models from the Ollama library can be customized with a prompt. The following example customizes the llama2 model:
ollama pull llama2
Create a Modelfile:
FROM llama2
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
Next, create and run the model:
ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.
More examples are available in the examples directory.
For more information on working with a Modelfile, see the Modelfile documentation.
ollama create is used to create a model from a Modelfile:
ollama create mymodel -f ./Modelfile
Pull a model:
ollama pull llama2
This command can also be used to update a local model; only the diff will be pulled.
Remove a model:
ollama rm llama2
Copy a model:
ollama cp llama2 my-llama2
For multiline input, you can wrap the text in """:
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.
>>> What's in this image? /Users/jmorgan/Desktop/smile.png
The image features a yellow smiley face, which is likely the central focus of the picture.
$ ollama run llama2 "Summarize this file: $(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
ollama list lists the models on your computer.
ollama serve is used when you want to start Ollama without running the desktop application.
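A quick way to check that the server is actually up is to hit the default address http://localhost:11434, which the REST API examples below also use; a minimal sketch in Python, assuming the default port has not been changed:

import urllib.request

# The Ollama server answers with plain text (normally "Ollama is running")
# on its root path when it is listening.
try:
    with urllib.request.urlopen('http://localhost:11434', timeout=2) as resp:
        print(resp.read().decode())
except OSError:
    print('Ollama is not reachable on localhost:11434')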
To build Ollama from source, install cmake and go:
brew install cmake go
Then generate the dependencies:
go generate ./...
Then build the binary:
go build .
More detailed instructions can be found in the developer guide.
Next, start the server:
./ollama serve
Finally, in a separate shell, run a model:
./ollama run llama2
Ollama has a REST API for running and managing models.
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
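The same endpoints can be called from any HTTP client. Below is a minimal sketch using Python's standard library against the default port; by default, /api/generate streams its answer as newline-delimited JSON objects:

import json
import urllib.request

# Send the same request as the curl example above and print the streamed tokens.
payload = json.dumps({'model': 'llama2', 'prompt': 'Why is the sky blue?'}).encode()
req = urllib.request.Request(
    'http://localhost:11434/api/generate',
    data=payload,
    headers={'Content-Type': 'application/json'},
)

with urllib.request.urlopen(req) as resp:
    for line in resp:                      # one JSON object per line
        chunk = json.loads(line)
        print(chunk.get('response', ''), end='', flush=True)
        if chunk.get('done'):
            break
print()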
Requirements: Python 3.8+
Integrating the Ollama Python library is very simple:
https://github.com/ollama/ollama
pip install ollama
import ollama

response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])
Setting stream=True enables streaming responses; it modifies the function call to return a Python generator:
import ollama

stream = ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
The Ollama Python library is designed around the Ollama REST API:
ollama.chat(model='llama2', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
ollama.generate(model='llama2', prompt='Why is the sky blue?')
ollama.list()
ollama.show('llama2')
modelfile='''
FROM llama2
SYSTEM You are mario from super mario bros.
'''
ollama.create(model='example', modelfile=modelfile)
ollama.copy('llama2', 'user/llama2')
ollama.delete('llama2')
ollama.pull('llama2')
ollama.push('user/llama2')
ollama.embeddings(model='llama2', prompt='They sky is blue because of rayleigh scattering')
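For example, the embeddings call returns a vector that can be compared across prompts; the cosine-similarity helper below is my own illustration, not part of the library:

import math
import ollama

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

e1 = ollama.embeddings(model='llama2', prompt='The sky is blue because of Rayleigh scattering')['embedding']
e2 = ollama.embeddings(model='llama2', prompt='Why is the sky blue?')['embedding']
print(cosine(e1, e2))  # closer to 1.0 means more similar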
A custom client can be created with the following fields:
host: The Ollama host to connect to
timeout: The timeout for requests

from ollama import Client

client = Client(host='http://localhost:11434')
response = client.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    response = await AsyncClient().chat(model='llama2', messages=[message])

asyncio.run(chat())
Setting stream=True modifies functions to return a Python asynchronous generator:
import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    async for part in await AsyncClient().chat(model='llama2', messages=[message], stream=True):
        print(part['message']['content'], end='', flush=True)

asyncio.run(chat())
Errors are raised if requests return an error status or if an error is detected while streaming.
model = 'does-not-yet-exist'

try:
    ollama.chat(model)
except ollama.ResponseError as e:
    print('Error:', e.error)
    if e.status_code == 404:
        ollama.pull(model)
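That pattern can be wrapped in a small helper that pulls a missing model once and retries; this is just a sketch built on the snippet above, not an official API:

import ollama

def chat_with_pull(model, messages):
    # Chat with a model, pulling it first if it is not available locally.
    try:
        return ollama.chat(model=model, messages=messages)
    except ollama.ResponseError as e:
        if e.status_code == 404:
            ollama.pull(model)   # download the missing model, then retry once
            return ollama.chat(model=model, messages=messages)
        raise

reply = chat_with_pull('llama2', [{'role': 'user', 'content': 'hi'}])
print(reply['message']['content'])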
伊织 2024-03-05 (Tuesday)