花生_TL007

这个屌丝很懒，什么也没留下！

热门标签

Meta LLaMA 2实操：从零到一搭建顶尖开源大语言模型【超详篇】

作者：花生_TL007 | 2024-04-07 01:56:30

踩

前言

本文章由 [jfj] 编写，所有内容均为原创。涉及的软件环境是在nvidia-docker环境进行的，如对docker未了解如何使用的，可以移步上一篇文章nvidia-docker安装详解。

在 nvidia-docker 容器中运行时，Docker 已经与 NVIDIA GPU 集成，使得容器能够直接访问和利用宿主机的 GPU 资源。这意味着，只要您的容器内安装了正确的工具和库（如 CUDA、cuDNN 及其他必要的 NVIDIA 库），您的深度学习模型就可以直接利用 GPU 加速。因此，直接在您的 nvidia-docker 容器中按照项目要求安装依赖，确实是最直接、最有效的方式来开始 GPU 加速训练。

一.【为何要本地搭建】

首先，我们需要知道自己这么做的目的：

1.数据隐私和安全性：

●本地处理数据意味着您的数据不必传输到外部服务器，降低了数据泄露的风险。
●您可以在符合严格的数据隐私法规的环境中工作，如GDPR或HIPAA。

2.自定义训练：

●您可以用自己的特定数据集训练模型，这可以使模型更好地适应您的特定用例和业务需求。
●可以调整模型架构和训练过程，以优化模型的表现。

3.避免使用费用：

●使用自己的硬件资源，您不需要为API调用支付费用，这在长期运行大量推理任务时尤其经济。

4.低延迟：

●本地运行模型通常可以降低推理延迟，因为不涉及网络传输。
●对于实时或延迟敏感的应用来说，这是一个显著的优势。

5.离线使用：

●您可以在没有互联网连接的环境下使用模型，这对于一些离线应用或在网络连接不稳定的区域很有帮助。

6.知识和技术积累：

●搭建自己的AI模型可以增强团队的技术能力和对AI内部工作原理的理解。
●长期来看，这有助于建立内部专业知识并减少对第三方服务提供商的依赖。

二.【然而面临的挑战】

搭建和运行自己的大型AI模型也有一些挑战：

●硬件成本：需要投资高性能的硬件，尤其是高端GPU，这对于个人用户或小公司来说可能成本很高。
●技术要求：需要有相应的技术专业知识来搭建、训练和维护这样的模型。
●时间消耗：从头开始训练一个大型模型需要大量时间和资源。
●能源消耗：大规模的模型训练和持续运行需要大量电力。

三.【了解微调】

微调（Fine-tuning）在机器学习和特别是在深度学习领域，是一个特定的过程，其中预训练的模型（比如GPT）使用新的数据集进行额外的训练。这个过程通常涉及以下几个方面：

保留学习：微调保留了模型在大规模数据集上学到的知识，这些知识是通用的、跨域的。

针对性优化：通过在特定领域的较小数据集上继续训练，模型被优化以更好地执行特定任务。例如，尽管GPT已经知道如何生成文本，但它可以通过微调来专门生成医学相关的文本。

较短的训练周期：与从头开始训练相比，微调通常只需要相对较少的训练时间和数据。

参数更新：在微调期间，模型的参数（权重和偏差）被更新，以适应新的任务或数据集。

四.【LLaMA、ChatGLM-6B和BLOOM的技术对比】

通过一些对比（如参数模型），如果说目前最强大的，而且是完全免费开源的AI大模型语言的话，那肯定是由MeTa推出的llama2语言模型，用来做微调/商用都是没问题的。所有AI开源的模型里,Llama肯定是最强的..SO倾向于搭建LLaMA的模型。

五.【下载模型以及运行】

建议硬件配置:

显卡：显存为24G，显存位宽384-bit，显存带宽显存类型：GDDR6，显存带宽：360GB/S或以上

内存：32G或更低,其实训练过程CPU和内存扮演角色的配置并不是要求特高。当然对于特别大的数据集或需要在CPU上进行大量预处理的任务，较好的CPU和更多的内存仍然能带来性能上的提升

电源，其实初级750W够了。

建议软件配置：
宿主机：Ubuntu22.04（这里建议科学上网，毕竟要下载很多插件）

docker：nvidia/cuda 12.2.0-base-ubuntu22.04

总的来说需要看模型的训练程度需求、去选型配置。如果只是想体验本地搭建，白嫖一波阿里云什么的也是相当可以的~_~。

1.Meta-LLaMa

这个GitHub仓库是由Meta（Facebook的母公司）官方提供的，用于执行Llama模型的推理。这些模型是一系列预训练和微调的大型语言模型，参数范围从7B（10亿）到70B不等。该仓库旨在提供一个简单的示例，用于加载Llama 2模型并在本地进行快速推理。

因此，我参考了它：GitHub - meta-llama/llama: Inference code for Llama models

2.通过 Meta 网站下载所需的模型权重和 Tokenizer

这里附上Meta官网下载地址：https://llama.meta.com/llama-downloads/

访问 Meta 网站提交邮箱相关信息并接受许可协议以下载模型权重和 Tokenizer

注册后，不到几分钟（虽然说是24小时）您会通过电子邮件接收到一个包含下载 URL 的邮件。

作为被允许下载，运行提供的 download.sh 脚本。然后可以在你的服务器上开始下载（其实这部分可以在docker里面直接干就行了，但是我打算保留在宿主机）：


noname@noname-System-Product-Name:~/Desktop/chat$ ./download.sh

Enter the URL from email: https://download.llamameta.netXXXXXXXXXX

然后选择你准备部署的模型，这里可以先选择7B-chat，下载多个，我这里还下载了13B-chat等，因为我比较喜欢官方原汁原味的下载渠道（后面再去讲明）

Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all:

下载后，后续请使用docker cp命令拷贝你的模型文件进入容器里面（这里我还是选择在物理机下载，以免有什么意外导致重新下载）

3.进入容器

当拷贝完了模型文件，我们进入docker环境：

docker run -p 7860:7860 --gpus all -it nvidia/cuda:12.2.0-base-ubuntu22.04

请加上--gpus all以便于容器能够使用我们的GPU。

3.软件环境安装

这里废话一下，我们需要安装anaconda，conda 使得在不同的 Python 版本之间轻松切换、安装和管理不同版本的软件包、创建和管理环境等操作变得更加简单。这里先创建一个版本为3.11的虚拟环境

conda create -n textgen python=3.11

如果您的 shell 配置中没有包含 conda 相关的初始化代码，您可能需要执行 conda init 来添加。这样做可以确保您可以在新的 shell 会话中直接使用 conda activate 命令。您可以通过执行以下命令来初始化 shell：

conda init <shell_name>

如果您不确定是否需要执行 conda init，可以先尝试使用 conda activate 命令来激活环境。如果命令能够正常工作，则说明您的环境已经设置正确，不需要执行 conda init。

激活且进入该环境（在该环境中执行的 Python 命令和其他命令将会使用该环境中安装的 Python 版本和库。）：

conda activate textgen

克隆仓库（如果您还没有将代码复制或克隆到容器中）：

apt-get install -y git

然后，克隆llama项目：


git clone https://github.com/meta-llama/llama.git
cd llama

在这个项目中,所需的依赖已经在 setup.py 文件中列出，在顶级目录中运行,自动安装这些依赖：

pip install -e .

4.运行

现在，以下命令在本地运行该模型。


torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir ./Llama-13b-chat/ \
    --tokenizer_path ./Llama-13b-chat/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6

需要注意的是

●替换 llama-2-7b-chat/为你检查点目录的路径和tokenizer.model分词器模型的路径。

●应将其–nproc_per_node设置为您正在使用的型号的MP值。

●根据需要调整max_seq_len和参数max_batch_size。

●--nproc_per_node 参数应根据你使用的模型设置为正确的值。README 文件提供了一个表格,列出了不同模型所需的模型并行度(MP)值。

直至我发现了错误：

xxxxx generator = Llama.build(
File "/llama/llama/generation.py", line 103, in build
assert model_parallel_size == len(
AssertionError: Loading a checkpoint for MP=2 but world size is 1

有趣的是，我没认真阅读官方文件的这里说明，13B模型的checkpoint是为模型并行（MP=2）设置的，这就造成了不匹配。

在单GPU环境下运行本来需要在两个GPU上运行的模型，并行设置会导致错误。因此，你想要选择的模型得依据你的物理硬件再去决定，所以我目前选择7B模型。

重新运行7B-chat模型：

torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir ./llama-2-7b-chat/ --tokenizer_path ./llama-2-7b-chat/tokenizer.model --max_seq_len 512 --max_batch_size 6

接下来，我们可以看到，预训练模型已经正常运行（需要注意的是这些模型未针对聊天或问答进行微调。）

(textgen) root@24a17a3c97f3:/llama# torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir ./llama-2-7b-chat/ --tokenizer_path ./llama-2-7b-chat/tokenizer.model --max_seq_len 512 --max_batch_size 6
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
/root/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/__init__.py:696: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:451.)
_C._set_default_tensor_type(t)
Loaded in 42.28 seconds
User: what is the recipe of mayonnaise?

> Assistant: Mayonnaise is a thick, creamy condiment made from a mixture of egg yolks, oil, and an acid, such as vinegar or lemon juice. Here is a basic recipe for homemade mayonnaise:
Ingredients:
* 2 egg yolks
* 1/2 cup (120 ml) neutral-tasting oil, such as canola or grapeseed
* 1 tablespoon (15 ml) vinegar or lemon juice
* Salt and pepper to taste
Instructions:
1. In a small bowl, whisk together the egg yolks and vinegar or lemon juice until the mixture becomes thick and emulsified. This should take about 5-7 minutes.
2. Slowly pour the oil into the egg yolk mixture while continuously whisking. The mixture should start to thicken and emulsify as you add the oil.
3. Continue whisking until the mixture becomes thick and creamy, and all of the oil has been incorporated. This can take up to 10-15 minutes.
4. Taste and adjust the seasoning as needed. You may want to add more salt, pepper, or vinegar to taste.
5. Transfer the mayonnaise to a jar or airtight container and store it in the fridge for up to 1 week.
Note: It's important to use a low-speed mixer or whisk when making mayonnaise, as it can easily break or turn into scrambled eggs if mixed too quickly.
Here are some tips to help you make the best mayonnaise:
* Use a neutral-tasting oil, such as canola or grapeseed, as it will not affect the flavor of the mayonnaise.
* Use a high-quality egg yolk, as it will give the mayonnaise a richer and creamier texture.
* Make sure the egg yolk is at room temperature before whisking, as this will help the mixture emulsify more easily.
* Don't over-whisk the mayonnaise, as it can break and become too thick.
* Add the oil slowly

==================================

User: I am going to Paris, what should I see?

Assistant: Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:

1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.
2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.
3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.

These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.

User: What is so great about #1?

> Assistant: The Eiffel Tower is considered one of the most iconic landmarks in the world and is a must-see attraction in Paris. Here are some reasons why it's so great:
1. Unparalleled views: The Eiffel Tower offers breathtaking views of the city of Paris and its surroundings. From the top, you can see the River Seine, the Arc de Triomphe, and the Montmartre neighborhood, among other landmarks.
2. Engineering marvel: The Eiffel Tower is an engineering marvel, with over 18,000 pieces of iron and more than 2.5 million rivets holding it together. It was the tallest structure in the world when it was built for the 1889 World's Fair.
3. Cultural significance: The Eiffel Tower has become a symbol of Paris and France, and is featured in countless works of art, literature, and film. It's a place where people come to propose, to celebrate special occasions, and to make memories with loved ones.
4. Historical significance: The Eiffel Tower was built for the 1889 World's Fair, held in Paris to celebrate the

==================================

System: Always answer with Haiku

User: I am going to Paris, what should I see?

> Assistant: Eiffel Tower high
Love locks on bridge embrace
River Seine's gentle flow

==================================

System: Always answer with emojis

User: How to go from Beijing to NY?

> Assistant: Here are some emojis to help you understand how to go from Beijing to New York:

声明：本文内容由网友自发贡献，不代表【wpsshop博客】立场，版权归原作者所有，本站不承担相应法律责任。如您发现有侵权的内容，请联系我们。转载请注明出处：https://www.wpsshop.cn/w/花生_TL007/article/detail/375663