weixin_40725706

这个屌丝很懒，什么也没留下！

热门标签

热门文章

当前位置: article > 正文

【大模型】大模型 CPU 推理之 llama.cpp

作者：weixin_40725706 | 2024-04-07 12:22:01

赞

踩

【大模型】大模型 CPU 推理之 llama.cpp

【大模型】大模型 CPU 推理之 llama.cpp

llama.cpp
安装llama.cpp
Memory/Disk Requirements
Quantization
测试推理
- 下载模型
- 测试
参考

llama.cpp

描述

The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.
- Plain C/C++ implementation without any dependencies
- Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
- AVX, AVX2 and AVX512 support for x86 architectures
- 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
- Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP)
- Vulkan, SYCL, and (partial) OpenCL backend support
- CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity
官网
https://github.com/ggerganov/llama.cpp

Supported platforms:

 Mac OS
 Linux
 Windows (via CMake)
 Docker
 FreeBSD
1
2
3
4
5

Supported models:
- Typically finetunes of the base models below are supported as well.
LLaMA
声明：本文内容由网友自发贡献，不代表【wpsshop博客】立场，版权归原作者所有，本站不承担相应法律责任。如您发现有侵权的内容，请联系我们。转载请注明出处：https://www.wpsshop.cn/w/weixin_40725706/article/detail/378566

推荐阅读

相关标签

Copyright © 2003-2013 www.wpsshop.cn 版权所有，并保留所有权利。

闽ICP备14008679号