
Trying to install ollama on FreeBSD


Ollama is an open-source framework for running large language models (LLMs) locally. It supports many operating systems, but FreeBSD is not one of them, so I tried compiling and installing it on FreeBSD myself.

Conclusions up front: the official ollama did not build, but a patched fork installed successfully. Since the fork modifies the code, for safety's sake all of this was done inside a FreeBSD jail.

Installing ollama on FreeBSD (first attempt, failed)

Setting up the build environment

First, install the latest Go:

pkg install go122-1.22.5 cmake

That appeared not to work at first, so the default go package got installed as well (it later turned out the versioned package has to be invoked as the go122 command):

pkg install go

But that version is too old.

Let's try a newer release instead. Download https://go.dev/dl/go1.22.5.freebsd-amd64.tar.gz:

wget https://go.dev/dl/go1.22.5.freebsd-amd64.tar.gz

Extract it:

tar -xzvf go1.22.5.freebsd-amd64.tar.gz

Add it to the PATH:

export PATH=/home/skywalk/work/go/bin:$PATH
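This only affects the current shell. To make it persist across logins (my addition; the path assumes the tarball was unpacked under /home/skywalk/work), the same line can be appended to the shell profile:

echo 'export PATH=/home/skywalk/work/go/bin:$PATH' >> ~/.profile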

Now go reports version 1.22.5:

$ go version
go version go1.22.5 freebsd/amd64

Speeding up Go module downloads

# Set the GOPROXY environment variable
export GOPROXY=https://goproxy.io,direct
# Set an environment variable that allows bypassing the proxy for specified repos (optional)
export GOPRIVATE=git.mycompany.com,github.com/my/private
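To confirm the settings took effect, go can print them back; a quick sanity check of my own, not from the original notes:

go env GOPROXY GOPRIVATE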

Building ollama

Clone ollama from the official repository:

git clone https://github.com/ollama/ollama

Generate (for ollama, go generate runs the scripts that compile the bundled llama.cpp runners):

go generate ./...

Build:

go build . 

But the build did not succeed; it ended with this error (more on it in the debugging notes at the end):

skywalk@fbhost:~/github/ollama $ go build .
package github.com/ollama/ollama
    imports github.com/ollama/ollama/cmd
    imports github.com/ollama/ollama/server
    imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cudart.c gpu_info_nvcuda.c gpu_info_nvml.c gpu_info_oneapi.c

Debugging in a FreeBSD jail (second attempt, failed)

Create a FreeBSD jail and log in:

# cbsd jlogin fb12

The login shell is csh; if that feels awkward, it can be changed to bash.

Install the required packages:

# pkg install -y git go122 cmake vulkan-headers vulkan-loader

Clone the patched fork:

# git clone --depth 1 https://github.com/prep/ollama.git

The shallow clone turned out to block the branch checkout below (see the debugging notes), so clone it in full instead:

# git clone https://github.com/prep/ollama.git

Switch to the BSD branch (with the shallow clone this step failed):

# cd ollama && git checkout feature/add-bsd-support

First, set up the download acceleration.

Under csh:

# setenv GO111MODULE on

# setenv GOPROXY https://goproxy.io,direct
# setenv GOPRIVATE git.mycompany.com,github.com/my/private

(Note: csh needs setenv here; a plain set only creates a shell variable, which child processes such as go cannot see.)

Under bash:

# Enable Go modules
export GO111MODULE=on

# Set the GOPROXY environment variable
export GOPROXY=https://goproxy.io,direct
# Set an environment variable that allows bypassing the proxy for specified repos (optional)
export GOPRIVATE=git.mycompany.com,github.com/my/private

Run go generate and build:

# go122 generate ./...

# go122 build .

It ended with this error:

go122 build .
go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
convert/gemma.go:12:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
convert/gemma.go:13:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9

Building the patched ollama as a regular user in the FreeBSD jail (third attempt, success)

If errors come up, the go.sum and go.mod files need to be modified (details below).

Use the following commands:

bash
mkdir github.com
cd github.com
git clone https://github.com/prep/ollama.git
cd ollama && git checkout feature/add-bsd-support
# Enable Go modules
export GO111MODULE=on
# Set the GOPROXY environment variable
export GOPROXY=https://goproxy.io,direct
# Set an environment variable that allows bypassing the proxy for specified repos (optional)
export GOPRIVATE=git.mycompany.com,github.com/my/private
go122 generate ./...
go122 build .

Debugging the errors

The error was still there:

go122 build .
go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
convert/gemma.go:12:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
convert/gemma.go:13:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9

Edit the go.sum file and replace its pdevine/tensor entries with:

github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c h1:GwiUUjKefgvSNmv3NCvI/BL0kDebW6Xa+kcdpdc1mTY=
github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c/go.mod h1:PSojXDXF7TbgQiD6kkd98IHOS0QqTyUEaWRiS8+BLu8=

The go.mod file also needs its pdevine/tensor requirement bumped to the latest, May 10 revision:

github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c
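As an alternative to hand-editing (a sketch of mine using the standard go tooling, invoked via the go122 command from the pkg install above), go mod edit can rewrite the requirement and go get will record fresh go.sum hashes for the new revision:

go122 mod edit -require=github.com/pdevine/tensor@v0.0.0-20240510204454-f88f4562727c
go122 get github.com/pdevine/tensor@v0.0.0-20240510204454-f88f4562727c

Letting go get write go.sum should avoid the stale-checksum trouble described further down.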

Then re-run generate and build.

Depending on your situation, if you skip the re-generate, the error message suggests running a go get first:

go122  get github.com/ollama/ollama/convert

Then continue with the build:

go122 build .
 

Done!

Give it a quick test:

./ollama help | head -n 5
Large language model runner

Usage:
  ollama [flags]
  ollama [command]
Which proves the build really succeeded!

Starting ollama

First, start the ollama server:

./ollama serve
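ollama serve stays in the foreground, so the run step below needs a second terminal. Alternatively (my sketch, assuming a bash shell; not how the original session was run), it can be backgrounded with its output kept in a log file:

./ollama serve > /tmp/ollama.log 2>&1 &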

Run the llama3 model

./ollama run llama3

ollama downloads the model automatically. Once the download finishes, an interactive prompt appears.
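The server also answers HTTP on port 11434 by default, so the model can be exercised without the interactive prompt. A quick smoke test along the lines of the upstream ollama API docs (the payload fields are the standard ones, not taken from this session):

curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "hello", "stream": false}'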

ollama's interactive output

A single answer took 50 minutes... but at least it worked, running successfully on FreeBSD!

[skywalk@fb12 ~/gihub.com/ollama]$ ./ollama run llama3
[GIN] 2024/07/15 - 12:01:47 | 200 | 466.704µs | 10.0.0.12 | HEAD "/"
[GIN] 2024/07/15 - 12:01:47 | 404 | 450.54µs | 10.0.0.12 | POST "/api/show"
time=2024-07-15T12:01:50.016+08:00 level=INFO source=download.go:136 msg="downloading 6a0746a1ec1a in 47 100 MB part(s)"
pulling manifest
pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB
pulling 4fa551d4f938... 100% ▕████████████████▏ 12 KB
pulling 8ab4849b038c... 100% ▕████████████████▏ 254 B
pulling 577073ffcc6c... 100% ▕████████████████▏ 110 B
pulling 3f8eb4da87fa... 100% ▕████████████████▏ 485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
[GIN] 2024/07/15 - 12:22:06 | 200 | 1.786897ms | 10.0.0.12 | POST "/api/show"
[GIN] 2024/07/15 - 12:22:06 | 200 | 1.384117ms | 10.0.0.12 | POST "/api/show"
time=2024-07-15T12:22:06.288+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
time=2024-07-15T12:22:20.820+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
time=2024-07-15T12:22:20.821+08:00 level=INFO source=server.go:289 msg="starting llama server" cmd="/tmp/ollama1084183988/runners/cpu/ollama_llama_server --model /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 62268"
time=2024-07-15T12:22:20.847+08:00 level=INFO source=sched.go:340 msg="loaded runners" count=1
time=2024-07-15T12:22:20.847+08:00 level=INFO source=server.go:432 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2604,"msg":"logging to file is disabled.","tid":"0x10139f812000","timestamp":1721017340}
{"build":2770,"commit":"952d03db","function":"main","level":"INFO","line":2821,"msg":"build info","tid":"0x10139f812000","timestamp":1721017340}
{"function":"main","level":"INFO","line":2828,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"0x10139f812000","timestamp":1721017340,"total_threads":4}
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_0: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 4.33 GiB (4.64 BPW)
llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size = 0.15 MiB
llm_load_tensors: CPU buffer size = 4437.80 MiB
.......................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.50 MiB
llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
{"function":"initialize","level":"INFO","line":448,"msg":"initializing slots","n_slots":1,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"initialize","level":"INFO","line":460,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"main","level":"INFO","line":3065,"msg":"model loaded","tid":"0x10139f812000","timestamp":1721017395}
{"function":"main","hostname":"127.0.0.1","level":"INFO","line":3268,"msg":"HTTP server listening","n_threads_http":"3","port":"62268","tid":"0x10139f812000","timestamp":1721017395}
{"function":"update_slots","level":"INFO","line":1579,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"0x10139f812000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":0,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":1,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":37211,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":2,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":60236,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":3,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":43135,"status":200,"tid":"0x1013dbe0a000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":4,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":31620,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":5,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":56527,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":6,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":53213,"status":200,"tid":"0x1013dbe0a000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":7,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":21875,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":47567,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":8,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":56264,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
[GIN] 2024/07/15 - 12:23:15 | 200 | 1m8s | 10.0.0.12 | POST "/api/chat"
>>> hello
time=2024-07-15T14:22:47.710+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
time=2024-07-15T14:23:02.785+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
time=2024-07-15T14:23:02.789+08:00 level=INFO source=server.go:289 msg="starting llama server" cmd="/tmp/ollama1084183988/runners/cpu/ollama_llama_server --model /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 61604"
time=2024-07-15T14:23:02.811+08:00 level=INFO source=sched.go:340 msg="loaded runners" count=1
time=2024-07-15T14:23:02.812+08:00 level=INFO source=server.go:432 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2604,"msg":"logging to file is disabled.","tid":"0x20da49412000","timestamp":1721024582}
{"build":2770,"commit":"952d03db","function":"main","level":"INFO","line":2821,"msg":"build info","tid":"0x20da49412000","timestamp":1721024582}
{"function":"main","level":"INFO","line":2828,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"0x20da49412000","timestamp":1721024582,"total_threads":4}
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_0: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 4.33 GiB (4.64 BPW)
llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size = 0.15 MiB
llm_load_tensors: CPU buffer size = 4437.80 MiB
.......................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.50 MiB
llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
{"function":"initialize","level":"INFO","line":448,"msg":"initializing slots","n_slots":1,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"initialize","level":"INFO","line":460,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"main","level":"INFO","line":3065,"msg":"model loaded","tid":"0x20da49412000","timestamp":1721024651}
{"function":"main","hostname":"127.0.0.1","level":"INFO","line":3268,"msg":"HTTP server listening","n_threads_http":"3","port":"61604","tid":"0x20da49412000","timestamp":1721024651}
{"function":"update_slots","level":"INFO","line":1579,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"0x20da49412000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":0,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":1,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":48229,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":2,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":33319,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":3,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":54187,"status":200,"tid":"0x20da85a0ae00","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":4,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":28162,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":33773,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":5,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":6,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":19633,"status":200,"tid":"0x20da85a0ae00","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":7,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":35779,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":18413,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":8,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":9,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"POST","msg":"request","params":{},"path":"/tokenize","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":10,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"launch_slot_with_data","level":"INFO","line":833,"msg":"slot is processing task","slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1817,"msg":"slot progression","n_past":0,"n_past_se":0,"n_prompt_tokens_processed":10,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"update_slots","level":"INFO","line":1841,"msg":"kv cache rm [p0, end)","p0":0,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
Hello! It's nice to meet you. Is there something I can help you with, or
would you like to chat?{"function":"print_timings","level":"INFO","line":276,"msg":"prompt eval time = 106459.91 ms / 10 tokens (10645.99 ms per token, 0.09 tokens per second)","n_prompt_tokens_processed":10,"n_tokens_second":0.09393207617523164,"slot_id":0,"t_prompt_processing":106459.906,"t_token":10645.990600000001,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
{"function":"print_timings","level":"INFO","line":290,"msg":"generation eval time = 2868918.63 ms / 26 runs (110343.02 ms per token, 0.01 tokens per second)","n_decoded":26,"n_tokens_second":0.00906264811318913,"slot_id":0,"t_token":110343.0241923077,"t_token_generation":2868918.629,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
{"function":"print_timings","level":"INFO","line":299,"msg":" total time = 2975378.54 ms","slot_id":0,"t_prompt_processing":106459.906,"t_token_generation":2868918.629,"t_total":2975378.535,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
{"function":"update_slots","level":"INFO","line":1649,"msg":"slot released","n_cache_tokens":36,"n_ctx":2048,"n_past":35,"n_system_tokens":0,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627,"truncated":false}
{"function":"log_server_request","level":"INFO","line":2742,"method":"POST","msg":"request","params":{},"path":"/completion","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721027627}
[GIN] 2024/07/15 - 15:13:47 | 200 | 50m59s | 10.0.0.12 | POST "/api/chat"

Summary

ollama can be compiled on FreeBSD, but it takes a patched fork. Upstream: GitHub - ollama/ollama: Get up and running with Llama 3, Mistral, Gemma 2, and other large language models. The fork: https://github.com/prep/ollama

If the fork fails to build, read the error message and update the github.com/pdevine/tensor line in both go.sum and go.mod to the May 10 revision, v0.0.0-20240510204454-f88f4562727c.

Everything was debugged successfully on a J1900 CPU with 8 GB of RAM, running FreeBSD 14.1-RELEASE (host fbhost). ollama is painfully slow there, taking about 50 minutes per answer, but at least it genuinely works!

Debugging notes

Error from go build

skywalk@fbhost:~/github/ollama $ go build .
package github.com/ollama/ollama
    imports github.com/ollama/ollama/cmd
    imports github.com/ollama/ollama/server
    imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cudart.c gpu_info_nvcuda.c gpu_info_nvml.c gpu_info_oneapi.c

Why is GPU code being pulled in at all? What is misconfigured?
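My reading, an assumption rather than something verified in these notes: this is the message the go tool prints when a package contains .c files but cgo is disabled, so the likely culprit is CGO_ENABLED=0 in the environment rather than any real GPU dependency. A check along these lines might get past this particular symptom:

go env CGO_ENABLED
CGO_ENABLED=1 go build .

Even with cgo on, the upstream tree may still fail on FreeBSD for other reasons, which is why the patched fork below exists.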

To get a FreeBSD build going, I looked through the ollama issue tracker:

Ollama on FreeBSD · Issue #1102 · ollama/ollama · GitHub

This issue describes a working method using another repo:

# pkg install -y git go122 cmake vulkan-headers vulkan-loader

# git clone https://github.com/prep/ollama.git

# cd ollama && git checkout feature/add-bsd-support

# go122 generate ./...

# go122 build .

# ./ollama help | head -n 5
Large language model runner
Usage:
ollama [flags]
ollama [command]

Works fine for me, no problems encountered.

The main repo apparently used to build on FreeBSD as well, but that stopped working after May 6: Make maximum pending request configurable by dhiltgen · Pull Request #4144 · ollama/ollama · GitHub

git checkout feature/add-bsd-support fails

git checkout feature/add-bsd-support
error: pathspec 'feature/add-bsd-support' did not match any file(s) known to git

The cause: the earlier clone had not downloaded the full repository.

# git clone --depth 1 https://github.com/prep/ollama.git

Switching branches (which failed at this point):

# cd ollama && git checkout feature/add-bsd-support

The --depth 1 option cannot be used here; drop it:

git clone https://github.com/prep/ollama.git

After that, git checkout feature/add-bsd-support succeeds.
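As an aside (my addition; standard git behavior, not from the original notes), an existing shallow clone can usually be deepened in place instead of re-cloning, because --depth 1 also narrows the fetch refspec to the default branch:

git remote set-branches origin '*'
git fetch --unshallow
git checkout feature/add-bsd-support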

What the vulkan-headers and vulkan-loader packages do

vulkan-headers and vulkan-loader are two key components related to the Vulkan API; they matter when building applications that use Vulkan for graphics and compute. Vulkan is a cross-platform graphics and compute API developed by the Khronos Group, designed to deliver high-performance 3D rendering.

The "C source files not allowed" build error in the jail

Conclusion first: GitHub connectivity was acting up.

Building in the jail reported: imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cpu.c gpu_info_cudart.c

At the same time there were errors about GitHub being unreachable:

 fatal: unable to access 'https://github.com/pdevine/tensor/': Failed to connect to github.com port 443 after 75025 ms: Couldn't connect to server
 

go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
package github.com/ollama/ollama
    imports github.com/ollama/ollama/cmd
    imports github.com/ollama/ollama/server
    imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cpu.c gpu_info_cudart.c
convert/gemma.go:12:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* in /root/go/pkg/mod/cache/vcs/6bf5b14e60582bdf39d55e6388653dd8c2addad6937480b86ddb5a729a838afe: exit status 128:
    fatal: unable to access 'https://github.com/pdevine/tensor/': Failed to connect to github.com port 443 after 75025 ms: Couldn't connect to server
convert/gemma.go:13:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* in /root/go/pkg/mod/cache/vcs/6bf5b14e60582bdf39d55e6388653dd8c2addad6937480b86ddb5a729a838afe: exit status 128:
    fatal: unable to access 'https://github.com/pdevine/tensor/': Failed to connect to github.com port 443 after 75025 ms: Couldn't connect to server
 

After the first generate, the build still failed

+ echo 'go generate completed.  LLM runners: cpu cpu_avx cpu_avx2 vulkan'
go generate completed.  LLM runners: cpu cpu_avx cpu_avx2 vulkan
[root@fb12 ollama]# go122 build .
go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
convert/gemma.go:12:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
convert/gemma.go:13:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
No idea why; quite possibly GitHub acting up again...

Re-ran generate; things were still flaky.

Everything so far had been done as root; next, try building as a regular user.

The regular user hit the same error.

Edit go.sum, replacing its pdevine/tensor entries with:

github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c h1:GwiUUjKefgvSNmv3NCvI/BL0kDebW6Xa+kcdpdc1mTY=
github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c/go.mod h1:PSojXDXF7TbgQiD6kkd98IHOS0QqTyUEaWRiS8+BLu8=

After that edit, go build still failed.

go build reported: convert/gemma.go:13:2: missing go.sum entry for module providing package

go122 build .
convert/gemma.go:12:2: missing go.sum entry for module providing package github.com/pdevine/tensor (imported by github.com/ollama/ollama/convert); to add:
    go get github.com/ollama/ollama/convert
convert/gemma.go:13:2: missing go.sum entry for module providing package github.com/pdevine/tensor/native (imported by github.com/ollama/ollama/convert); to add:
    go get github.com/ollama/ollama/convert
It turns out go.mod pins a version as well; change it to the current one:

github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c

But then another error appeared.

Error after bumping the tensor version in go.sum and go.mod


go122 generate ./...
go: downloading github.com/google/flatbuffers v1.12.0
go: downloading gonum.org/v1/gonum v0.8.2
verifying github.com/google/flatbuffers@v1.12.0: checksum mismatch
    downloaded: h1:N8EguYFm2wwdpoNcpchQY0tPs85vOJkboFb2dPxmixo=
    go.sum:     h1:/PtAHvnBY4Kqnx/xCQ3OIV9uYcSFGScBsWI3Oogeh6w=

SECURITY ERROR
This download does NOT match an earlier download recorded in go.sum.
The bits may have been replaced on the origin server, or an attacker may
have intercepted the download attempt.
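For what it's worth, a generic way out of such a mismatch (my suggestion, not the route taken here) is to delete the stale lines for the affected module from go.sum and let the toolchain re-record them:

# after removing the old flatbuffers lines from go.sum
go122 mod tidy

go mod tidy re-downloads the module and writes fresh hashes; if the mismatch persists, the module contents really do differ from what the checksum database expects.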

Ugh, this patched fork has its problems.

Try setting go.mod to: github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c

Then run:

go122  get github.com/ollama/ollama/convert

Then run:

go122 build .

And the build finally completed.
