Available for macOS, Linux, and Windows (preview)
On both Windows and macOS, you can download it from the website:
https://ollama.com/download
On Linux, install it from the command line:
curl -fsSL https://ollama.com/install.sh | sh
You can also run it with Docker:
https://hub.docker.com/r/ollama/ollama
Here we take macOS as an example.
The download is a file named Ollama-darwin.zip; unzipping it yields the Ollama.app application bundle. I dragged the app into the Applications folder so it can later be opened from Launchpad.
Double-clicking the app prompts you to install the command-line tool, which requires an administrator password.
Once the command-line tool is installed, you are prompted to run the following command:
ollama run llama2
Running this first downloads the model and then runs it. Make sure the amount of RAM you have matches the model's parameter count.
The supported models are listed at https://ollama.com/library
Currently, the main ones are:
Model | Parameters | Size | Download |
---|---|---|---|
Llama 2 | 7B | 3.8GB | ollama run llama2 |
Mistral | 7B | 4.1GB | ollama run mistral |
Dolphin Phi | 2.7B | 1.6GB | ollama run dolphin-phi |
Phi-2 | 2.7B | 1.7GB | ollama run phi |
Neural Chat | 7B | 4.1GB | ollama run neural-chat |
Starling | 7B | 4.1GB | ollama run starling-lm |
Code Llama | 7B | 3.8GB | ollama run codellama |
Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
Llama 2 13B | 13B | 7.3GB | ollama run llama2:13b |
Llama 2 70B | 70B | 39GB | ollama run llama2:70b |
Orca Mini | 3B | 1.9GB | ollama run orca-mini |
Vicuna | 7B | 3.8GB | ollama run vicuna |
LLaVA | 7B | 4.5GB | ollama run llava |
Gemma | 2B | 1.4GB | ollama run gemma:2b |
Gemma | 7B | 4.8GB | ollama run gemma:7b |
You need at least 8 GB of RAM to run the 7B models, 16 GB for the 13B models, and 32 GB for the 33B models.
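As a minimal sketch of that rule of thumb (the thresholds come from the sentence above; actual requirements vary with quantization and context size):

# Rough rule of thumb from above: 8 GB of RAM for 7B models,
# 16 GB for 13B models, and 32 GB for 33B models.
def min_ram_gb(params_billion: float) -> int:
    if params_billion <= 7:
        return 8
    if params_billion <= 13:
        return 16
    return 32

print(min_ram_gb(7))   # 8
print(min_ram_gb(33))  # 32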
Ollama supports importing GGUF models via a Modelfile:
1. Create a file named Modelfile, with a FROM instruction pointing to the local path of the model you want to import:
FROM ./vicuna-33b.Q4_0.gguf
2. Create the model in Ollama:
ollama create example -f Modelfile
3. Run the model:
ollama run example
See the guide for more information on importing models.
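The same import can also be done from Python with the ollama library's create call (shown later in this article); a minimal sketch, assuming the GGUF file sits in the current directory:

import ollama

# Same three steps as above, but via the Python library.
# The Modelfile's FROM points at a local GGUF file.
modelfile = '''
FROM ./vicuna-33b.Q4_0.gguf
'''

ollama.create(model='example', modelfile=modelfile)

# Then run it like any other model.
response = ollama.chat(model='example',
                       messages=[{'role': 'user', 'content': 'Hello!'}])
print(response['message']['content'])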
Models from the Ollama library can be customized with a prompt. The following example customizes the llama2 model:
ollama pull llama2
Create a Modelfile:
FROM llama2
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
Next, create and run the model:
ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.
More examples are available in the examples directory.
For more information on working with a Modelfile, see the Modelfile documentation.
ollama create is used to create a model from a Modelfile:
ollama create mymodel -f ./Modelfile
Pull a model:
ollama pull llama2
This command can also be used to update a local model; only the diff will be pulled.
Remove a model:
ollama rm llama2
Copy a model:
ollama cp llama2 my-llama2
For multiline input, you can wrap the text in """:
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.
>>> What's in this image? /Users/jmorgan/Desktop/smile.png
The image features a yellow smiley face, which is likely the central focus of the picture.
$ ollama run llama2 "Summarize this file: $(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
ollama list lists the models on your computer.
ollama serve is used when you want to start Ollama without running the desktop application.
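A quick way to check that the server is actually up is to hit the default address http://localhost:11434, which the REST API examples below also use; a minimal sketch in Python, assuming the default port has not been changed:

import urllib.request

# The Ollama server answers with plain text (normally "Ollama is running")
# on its root path when it is listening.
try:
    with urllib.request.urlopen('http://localhost:11434', timeout=2) as resp:
        print(resp.read().decode())
except OSError:
    print('Ollama is not reachable on localhost:11434')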
To build Ollama from source, install cmake and go:
brew install cmake go
Then generate the dependencies:
go generate ./...
Then build the binary:
go build .
More detailed instructions can be found in the developer guide.
Next, start the server:
./ollama serve
Finally, in a separate shell, run a model:
./ollama run llama2
Ollama has a REST API for running and managing models.
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
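The same endpoints can be called from any HTTP client. Below is a minimal sketch using Python's standard library against the default port; by default, /api/generate streams its answer as newline-delimited JSON objects:

import json
import urllib.request

# Send the same request as the curl example above and print the streamed tokens.
payload = json.dumps({'model': 'llama2', 'prompt': 'Why is the sky blue?'}).encode()
req = urllib.request.Request(
    'http://localhost:11434/api/generate',
    data=payload,
    headers={'Content-Type': 'application/json'},
)

with urllib.request.urlopen(req) as resp:
    for line in resp:                      # one JSON object per line
        chunk = json.loads(line)
        print(chunk.get('response', ''), end='', flush=True)
        if chunk.get('done'):
            break
print()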
Requirements: Python 3.8+
Integrating the Ollama Python library is very simple:
https://github.com/ollama/ollama
pip install ollama
import ollama

response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])
Setting stream=True enables streaming responses; it modifies the function call to return a Python generator:
import ollama

stream = ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
The Ollama Python library is designed around the Ollama REST API:
ollama.chat(model='llama2', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
ollama.generate(model='llama2', prompt='Why is the sky blue?')
ollama.list()
ollama.show('llama2')
modelfile='''
FROM llama2
SYSTEM You are mario from super mario bros.
'''
ollama.create(model='example', modelfile=modelfile)
ollama.copy('llama2', 'user/llama2')
ollama.delete('llama2')
ollama.pull('llama2')
ollama.push('user/llama2')
ollama.embeddings(model='llama2', prompt='They sky is blue because of rayleigh scattering')
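For example, the embeddings call returns a vector that can be compared across prompts; the cosine-similarity helper below is my own illustration, not part of the library:

import math
import ollama

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

e1 = ollama.embeddings(model='llama2', prompt='The sky is blue because of Rayleigh scattering')['embedding']
e2 = ollama.embeddings(model='llama2', prompt='Why is the sky blue?')['embedding']
print(cosine(e1, e2))  # closer to 1.0 means more similar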
A custom client can be created with the following fields:
host: The Ollama host to connect to
timeout: The timeout for requests

from ollama import Client

client = Client(host='http://localhost:11434')
response = client.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    response = await AsyncClient().chat(model='llama2', messages=[message])

asyncio.run(chat())
Setting stream=True modifies functions to return a Python asynchronous generator:
import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    async for part in await AsyncClient().chat(model='llama2', messages=[message], stream=True):
        print(part['message']['content'], end='', flush=True)

asyncio.run(chat())
Errors are raised if requests return an error status or if an error is detected while streaming.
model = 'does-not-yet-exist'

try:
    ollama.chat(model)
except ollama.ResponseError as e:
    print('Error:', e.error)
    if e.status_code == 404:
        ollama.pull(model)
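That pattern can be wrapped in a small helper that pulls a missing model once and retries; this is just a sketch built on the snippet above, not an official API:

import ollama

def chat_with_pull(model, messages):
    # Chat with a model, pulling it first if it is not available locally.
    try:
        return ollama.chat(model=model, messages=messages)
    except ollama.ResponseError as e:
        if e.status_code == 404:
            ollama.pull(model)   # download the missing model, then retry once
            return ollama.chat(model=model, messages=messages)
        raise

reply = chat_with_pull('llama2', [{'role': 'user', 'content': 'hi'}])
print(reply['message']['content'])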
伊织 2024-03-05 (Tuesday)