
[Large Models] LLM CPU Inference with llama.cpp

llama.cpp

  • Description

    The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.

    • Plain C/C++ implementation without any dependencies
    • Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
    • AVX, AVX2 and AVX512 support for x86 architectures
    • 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
    • Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP)
    • Vulkan, SYCL, and (partial) OpenCL backend support
    • CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity
  • Official repository
    https://github.com/ggerganov/llama.cpp
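
  • Quick CPU-inference example

    As a hands-on illustration of the CPU inference described above, the sketch below uses the llama-cpp-python binding (a separate Python wrapper around llama.cpp, installed with `pip install llama-cpp-python`) to load a quantized GGUF model and generate text on the CPU. The model path, thread count, and prompt are placeholder assumptions for illustration, not values taken from llama.cpp itself.

    ```python
    # Minimal CPU-inference sketch via the llama-cpp-python binding.
    # Assumptions: the binding is installed (pip install llama-cpp-python) and a
    # quantized GGUF model file exists at the placeholder path below.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical 4-bit (Q4_K_M) GGUF file
        n_ctx=2048,      # context window length in tokens
        n_threads=8,     # CPU threads used for generation
        n_gpu_layers=0,  # 0 = pure CPU; a positive value offloads layers for CPU+GPU hybrid inference
    )

    out = llm("Q: What is llama.cpp used for? A:", max_tokens=64, stop=["Q:"], echo=False)
    print(out["choices"][0]["text"])
    ```

    The `n_threads` and `n_gpu_layers` knobs correspond to the CPU-oriented and hybrid-inference features listed above; with `n_gpu_layers=0` everything runs on the CPU, which is the scenario this article focuses on.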

  • Supported platforms:

    • Mac OS
    • Linux
    • Windows (via CMake)
    • Docker
    • FreeBSD
    
  • Supported models:

    • Typically finetunes of the base models below are supported as well.

    LLaMA
