Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a Single AMD GPU

Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a single AMD GPU — ROCm Blogs

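Building on the previous blog post, Fine-tune Llama 2 with LoRA, this post takes a closer look at a parameter-efficient fine-tuning (PEFT) method called Quantized Low-Rank Adaptation (QLoRA). The focus is on using QLoRA to fine-tune the Llama-2 7B model on a single AMD GPU with ROCm. QLoRA helps address the memory and compute constraints that usually make this difficult. The goal is to show how QLoRA can make open-source large language models more accessible.

QLoRA fine-tuning

QLoRA is a fine-tuning technique that combines high-precision computation with low-precision storage. This keeps the model small while preserving its performance and accuracy.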

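How does QLoRA work?

In short, QLoRA reduces the memory needed to fine-tune an LLM without sacrificing performance relative to standard 16-bit fine-tuning. Specifically, QLoRA compresses the pretrained language model with 4-bit quantization, freezes its parameters, and adds a small number of trainable parameters in the form of Low-Rank Adapters. During fine-tuning, gradients are backpropagated through the frozen 4-bit quantized model into the adapters, so only the LoRA layers are updated. For more detail on LoRA, see the original LoRA paper.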
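To make the role of the adapters concrete, here is a minimal pure-Python sketch of a single linear layer with a frozen weight and a low-rank update. It is an illustration under stated assumptions, not the PEFT implementation: the matrix sizes and values are hypothetical toy numbers, the helper names (`matvec`, `lora_forward`, `adapter_param_count`) are made up for this example, and the frozen weight is kept in full precision rather than 4 bits.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(frozen_w, lora_a, lora_b, x, alpha, r):
    """Frozen base output plus the scaled low-rank update B(Ax)."""
    base = matvec(frozen_w, x)
    update = matvec(lora_b, matvec(lora_a, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def adapter_param_count(d_in, d_out, r):
    """Number of trainable values in A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


# Toy sizes (hypothetical): 3 inputs, 2 outputs, rank 1.
frozen_w = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]]  # never updated
lora_a = [[0.1, 0.2, -0.1]]                       # trainable
lora_b = [[0.0], [0.0]]                           # trainable, starts at zero
x = [1.0, 2.0, 4.0]

# With B at zero the adapter contributes nothing, so the output
# equals that of the frozen model.
y_start = lora_forward(frozen_w, lora_a, lora_b, x, alpha=1, r=1)

# Pretend one optimizer step changed a single adapter weight.
lora_b[0][0] = 1.0
y_trained = lora_forward(frozen_w, lora_a, lora_b, x, alpha=1, r=1)

print("before training:", y_start)
print("after one update:", y_trained)
print("adapter values for a 4096x4096 layer at r=8:",
      adapter_param_count(4096, 4096, 8), "vs", 4096 * 4096, "frozen")
```

The frozen matrix is only read, never written, which is why it can be stored in a compact format while the small matrices A and B carry all of the learning.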

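QLoRA versus LoRA

QLoRA and LoRA are both parameter-efficient fine-tuning techniques. LoRA works as a standalone fine-tuning method, whereas QLoRA uses LoRA as an auxiliary mechanism to correct the errors introduced by quantization and to further reduce resource requirements during fine-tuning.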

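Step-by-step fine-tuning of Llama 2 with QLoRA

This section walks through fine-tuning the 7-billion-parameter Llama 2 model with QLoRA on a single AMD GPU. QLoRA is what makes this possible, because it substantially reduces the memory required.
We use the following setup:
- Hardware & OS: see this link for the list of hardware and operating systems supported by ROCm.
- Software:
    - ROCm 6.1.0+
    - PyTorch for ROCm 2.0+
- Libraries: `transformers`, `accelerate`, `peft`, `trl`, `bitsandbytes`, `scipy`

For this blog, we ran our experiments on a single MI250 GPU with the Docker image rocm/pytorch:rocm6.1_ubuntu20.04_py3.9_pytorch_2.1.2.

You can find the complete code used in this blog in the GitHub repository.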

1. Getting started

Our first step is to confirm that a GPU is available.

    !rocm-smi --showproductname
    ========================= ROCm System Management Interface =========================
    =================================== Product Info ===================================
    GPU[0] : Card series: AMD INSTINCT MI250 (MCM) OAM AC MBA
    GPU[0] : Card model: 0x0b0c
    GPU[0] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
    GPU[0] : Card SKU: D65209
    GPU[1] : Card series: AMD INSTINCT MI250 (MCM) OAM AC MBA
    GPU[1] : Card model: 0x0b0c
    GPU[1] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
    GPU[1] : Card SKU: D65207
    ====================================================================================
    =============================== End of ROCm SMI Log ===============================

If your AMD machine has more than one GCD or GPU, let's use only one Graphics Compute Die (GCD) or GPU.

    import os
    # Expose only the first GCD/GPU to this process (set before importing torch)
    os.environ["HIP_VISIBLE_DEVICES"] = "0"

    import torch

    # ROCm builds of PyTorch report AMD GPUs through the torch.cuda API
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        print('__CUDNN VERSION:', torch.backends.cudnn.version())
        print('__Number CUDA Devices:', torch.cuda.device_count())
        device_count = torch.cuda.device_count()

    __CUDNN VERSION: 2020000
    __Number CUDA Devices: 1

Next, we install the required libraries.

    !pip install -q pandas peft==0.9.0 transformers==4.31.0 trl==0.4.7 accelerate scipy

Installing bitsandbytes

ROCm needs a special version of bitsandbytes (bitsandbytes-rocm).

1. Install bitsandbytes using the following code.

    git clone --recurse https://github.com/ROCm/bitsandbytes
    cd bitsandbytes
    git checkout rocm_enabled
    pip install -r requirements-dev.txt
    cmake -DCOMPUTE_BACKEND=hip -S . # Use -DBNB_ROCM_ARCH="gfx90a;gfx942" to target specific gpu arch
    make
    pip install .

2. Check the bitsandbytes version.

At the time of writing this blog, the version is 0.43.0.

    %%bash
    pip list | grep bitsandbytes

3. Import the required packages.

    import torch
    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
        TrainingArguments,
        pipeline
    )
    from peft import LoraConfig
    from trl import SFTTrainer

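2. Configuring the model and data

Model configuration

You can access Meta's official Llama-2 model on Hugging Face after submitting a request and waiting a few days for approval. As an alternative, we use NousResearch's Llama-2-7b-chat-hf as our base model (it is the same as the original but easier to access).

    # Model and tokenizer names
    base_model_name = "NousResearch/Llama-2-7b-chat-hf"
    new_model_name = "llama-2-7b-enhanced"  # You can give the fine-tuned model your own name
    # Tokenizer
    llama_tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
    llama_tokenizer.pad_token = llama_tokenizer.eos_token
    llama_tokenizer.padding_side = "right"

QLoRA 4-bit quantization configuration

As described in the paper, QLoRA stores weights in 4 bits while allowing computation in 16-bit or 32-bit precision. This means that whenever a QLoRA weight tensor is used, it is dequantized to 16-bit or 32-bit precision before the matrix multiplication is performed. Various compute types can be chosen, such as float16, bfloat16, or float32. Different 4-bit quantization variants are also available, including NormalFloat4 (NF4) and plain 4-bit floating point (FP4). Based on the theoretical considerations and empirical results in the paper, NF4 is recommended because it tends to perform better.
In our case, we chose the following configuration:
- 4-bit quantization with the NF4 type
- 16-bit (float16) for computation
- Double quantization, which applies a second quantization to the constants produced by the first one and saves roughly an additional 0.37 bits per parameter according to the QLoRA paper
The quantization parameters are controlled through BitsAndBytesConfig (see the Hugging Face documentation: https://huggingface.co/docs/transformers/main_classes/quantization#transformers.BitsAndBytesConfig) as follows:
- load_in_4bit activates 4-bit loading.
- bnb_4bit_quant_type specifies the data type used for quantization. Two quantization data types are supported: fp4 (4-bit floating point) and nf4 (normalized 4-bit floating point). The latter is theoretically optimal for normally distributed weights, so we recommend nf4.
- bnb_4bit_compute_dtype specifies the data type used for computation in the linear layers.
- bnb_4bit_use_double_quant activates nested quantization.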

    # Quantization configuration
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True
    )
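The idea of storing weights as 4-bit codes and dequantizing them on the fly can be illustrated with a small pure-Python sketch. This is a simplification and not the bitsandbytes kernel: it uses an evenly spaced 16-level codebook as a stand-in for the real NF4 values, a hypothetical block of eight weights, and helper names invented for this example.

```python
def make_codebook(levels=16):
    """Evenly spaced values in [-1, 1]; a stand-in for the NF4 codebook."""
    return [-1.0 + 2.0 * i / (levels - 1) for i in range(levels)]


def quantize_block(block, codebook):
    """Return (absmax scale, list of 4-bit indices) for one block of weights."""
    absmax = max(abs(w) for w in block) or 1.0  # avoid dividing by zero
    indices = []
    for w in block:
        normalized = w / absmax
        nearest = min(range(len(codebook)),
                      key=lambda i: abs(codebook[i] - normalized))
        indices.append(nearest)
    return absmax, indices


def dequantize_block(absmax, indices, codebook):
    """Map 4-bit indices back to approximate full-precision weights."""
    return [codebook[i] * absmax for i in indices]


codebook = make_codebook()
weights = [0.02, -0.11, 0.30, 0.07, -0.25, 0.01, 0.15, -0.04]  # hypothetical block

absmax, indices = quantize_block(weights, codebook)
restored = dequantize_block(absmax, indices, codebook)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("stored indices (0-15):", indices)
print("per-block scale:", absmax)
print("largest round-trip error:", max_error)
```

Each weight costs 4 bits plus a share of the per-block scale. The compute dtype set in the configuration above only matters after dequantization, when the restored values enter the matrix multiplication.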

Load the model with the quantization configuration.

    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        quantization_config=quant_config,
        device_map="auto"
    )
    base_model.config.use_cache = False
    base_model.config.pretraining_tp = 1
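A rough back-of-the-envelope estimate shows why 4-bit loading matters for a 7B model. The sketch below is an approximation under stated assumptions: it treats every parameter as quantized (in practice some layers stay in higher precision), and it takes the block size of 64, the group size of 256, and the 8-bit and 32-bit constant widths from the QLoRA paper. It covers weight storage only, not activations, gradients, or optimizer state.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed to store num_params values, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 6_738_415_616  # Llama-2 7B parameter count quoted in this post

# Assumed from the QLoRA paper: each block of 64 weights shares one constant.
# Double quantization stores those constants in 8 bits, in groups of 256 that
# share one 32-bit constant.
BLOCK_SIZE = 64
GROUP_SIZE = 256

overhead_plain = 32 / BLOCK_SIZE                                    # bits per parameter
overhead_double = 8 / BLOCK_SIZE + 32 / (BLOCK_SIZE * GROUP_SIZE)   # bits per parameter

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4 + overhead_plain)
nf4_dq_gb = weight_memory_gb(NUM_PARAMS, 4 + overhead_double)

print(f"float16 weights:          {fp16_gb:.2f} GB")
print(f"4-bit weights:            {nf4_gb:.2f} GB")
print(f"4-bit + double quant:     {nf4_dq_gb:.2f} GB")
print(f"bits saved per parameter: {overhead_plain - overhead_double:.3f}")
```

Under these assumptions the weights shrink from about 13.5 GB to about 3.5 GB, and double quantization accounts for roughly 0.37 bits per parameter of that saving.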

Dataset configuration

We fine-tune our base model for a question-answering task using a small dataset called mlabonne/guanaco-llama2-1k, a 1,000-sample subset of the timdettmers/openassistant-guanaco dataset. The source data is a human-generated, human-annotated, assistant-style conversation corpus containing 161,443 messages in 35 different languages, annotated with 461,292 quality ratings. This results in more than 10,000 fully annotated conversation trees.

    # Dataset
    data_name = "mlabonne/guanaco-llama2-1k"
    training_data = load_dataset(data_name, split="train")
    # Check the data
    print(training_data.shape)
    # Sample #11 is a QA sample in English
    print(training_data[11])

    (1000, 1)
    {'text': '<s>[INST] write me a 1000 words essay about deez nuts. [/INST] The Deez Nuts meme first gained popularity in 2015 on the social media platform Vine. The video featured a young man named Rodney Bullard, who recorded himself asking people if they had heard of a particular rapper. When they responded that they had not, he would respond with the phrase "Deez Nuts" and film their reactions. The video quickly went viral, and the phrase became a popular meme. \n\nSince then, Deez Nuts has been used in a variety of contexts to interrupt conversations, derail discussions, or simply add humor to a situation. It has been used in internet memes, in popular music, and even in politics. In the 2016 US presidential election, a 15-year-old boy named Brady Olson registered as an independent candidate under the name Deez Nuts. He gained some traction in the polls and even made appearances on national news programs.\n\nThe Deez Nuts meme has had a significant impact on popular culture. It has become a recognizable catchphrase that people use to add humor to everyday conversations. The meme has also been used to satirize politics and other serious issues. For example, in 2016, a group of activists in the UK used the phrase "Deez Nuts for President" as part of a campaign to encourage young people to vote in the EU referendum. </s><s>[INST] Rewrite the essay in a more casual way. Instead of sounding proffesional, sound like a college student who is forced to write the essay but refuses to do so in the propper way. Use casual words and slang when possible. [/INST] Yo, so you want me to write a 1000-word essay about Deez Nuts? Alright, fine. So, this whole thing started on Vine back in 2015. Some dude named Rodney Bullard made a video where he would ask people if they knew a rapper, and when they said no, he would hit them with the classic line: "Deez Nuts!" People loved it, and it became a viral meme.\n\nNowadays, Deez Nuts is used for all kinds of stuff. You can throw it out there to interrupt someone or just to be funny. It\'s all over the internet, in music, and even in politics. In fact, during the 2016 US presidential election, a kid named Brady Olson registered as an independent candidate under the name Deez Nuts. He actually got some attention from the media and made appearances on TV and everything.\n\nThe impact of Deez Nuts on our culture is pretty huge. It\'s become a thing that everyone knows and uses to add some humor to their everyday conversations. Plus, people have used it to make fun of politics and serious issues too. Like, in the UK, some groups of activists used the phrase "Deez Nuts for President" to encourage young people to vote in the EU referendum.\n\nThere you have it, a thousand words about Deez Nuts in a more casual tone. Can I go back to playing video games now? </s>'}

    # tensorboardX is a dependency during training
    !pip install tensorboardX
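Each record in this dataset packs a whole conversation into one string in the Llama 2 chat format, with every turn wrapped as `<s>[INST] user text [/INST] assistant text </s>`. As a small sketch for inspecting such records, the snippet below splits a string into turns with a regular expression. The `split_turns` helper and the sample string are hypothetical and only meant for exploration; training itself consumes the raw `text` field unchanged.

```python
import re

# One turn: <s>[INST] user text [/INST] assistant text </s>
TURN_PATTERN = re.compile(
    r"<s>\[INST\]\s*(.*?)\s*\[/INST\]\s*(.*?)\s*</s>", re.DOTALL
)


def split_turns(text):
    """Return a list of {'user', 'assistant'} dicts, one per conversation turn."""
    return [{"user": user, "assistant": assistant}
            for user, assistant in TURN_PATTERN.findall(text)]


# Hypothetical two-turn record in the same layout as the dataset sample.
sample = ("<s>[INST] Hello? [/INST] Hi there. </s>"
          "<s>[INST] Shorter. [/INST] Hi. </s>")

turns = split_turns(sample)
for number, turn in enumerate(turns, start=1):
    print(number, "user:", turn["user"], "| assistant:", turn["assistant"])
```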

3. Starting the fine-tuning

Set the training parameters using the following code:

    # Training parameters
    train_params = TrainingArguments(
        output_dir="./results_modified",
        num_train_epochs=1,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=1,
        optim="paged_adamw_32bit",
        save_steps=50,
        logging_steps=50,
        learning_rate=2e-4,
        weight_decay=0.001,
        fp16=False,
        bf16=False,
        max_grad_norm=0.3,
        max_steps=-1,
        warmup_ratio=0.03,
        group_by_length=True,
        lr_scheduler_type="constant",
        report_to="tensorboard"
    )
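These settings determine how many optimizer steps one epoch takes and when logs and checkpoints are written. The following sketch works that out for the 1,000-sample dataset used here. It is a simplified calculation, assuming a single device and a final partial batch that is kept rather than dropped; the `steps_per_epoch` helper is hypothetical and not part of the Trainer API.

```python
import math


def steps_per_epoch(num_samples, per_device_batch, grad_accum, num_devices=1):
    """Optimizer steps needed to see every sample once."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return math.ceil(num_samples / effective_batch)


num_samples = 1000   # size of mlabonne/guanaco-llama2-1k
num_epochs = 1       # num_train_epochs
total_steps = steps_per_epoch(num_samples, per_device_batch=4, grad_accum=1) * num_epochs

# logging_steps=50 and save_steps=50 fire at every multiple of 50.
logging_points = list(range(50, total_steps + 1, 50))

print("total optimizer steps:", total_steps)
print("steps with a log entry and checkpoint:", logging_points)
```

Raising `gradient_accumulation_steps` would shrink the step count proportionally while keeping per-step memory roughly the same.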

Training with the QLoRA configuration

Now you can integrate LoRA into the base model and check how many parameters it adds. LoRA essentially adds a pair of rank-decomposition weight matrices (called update matrices) to existing weights and trains only the newly added weights.

    from peft import get_peft_model
    # LoRA configuration
    peft_parameters = LoraConfig(
        lora_alpha=8,
        lora_dropout=0.1,
        r=8,
        bias="none",
        task_type="CAUSAL_LM"
    )
    model = get_peft_model(base_model, peft_parameters)
    model.print_trainable_parameters()

    trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199
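The trainable parameter count reported above can be reproduced by hand. The sketch below assumes that LoRA is attached to the `q_proj` and `v_proj` projections of every decoder layer (the PEFT default for Llama when no target modules are given) and that Llama-2 7B has 32 layers with a hidden size of 4096. Those architecture figures are assumptions of this example rather than values read from the model.

```python
def lora_params(r, d_in, d_out):
    """Values in one adapter pair: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


# Assumed Llama-2 7B architecture and default LoRA targets.
hidden_size = 4096
num_layers = 32
target_modules = ["q_proj", "v_proj"]
r = 8  # rank from the LoraConfig above

trainable = num_layers * len(target_modules) * lora_params(r, hidden_size, hidden_size)

base_params = 6_738_415_616            # full-parameter count quoted in this post
all_params = base_params + trainable
trainable_percent = 100 * trainable / all_params

print(f"trainable params: {trainable:,}")
print(f"all params: {all_params:,}")
print(f"trainable%: {trainable_percent}")
```

Doubling `r` doubles the adapter size, so the trainable share stays well below 1% even at much higher ranks.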

Note that LoRA adds only 0.062% of parameters, a small fraction of the original model. This is the share of parameters we update through fine-tuning, as follows.

    # Trainer with the QLoRA configuration
    fine_tuning = SFTTrainer(
        model=base_model,
        train_dataset=training_data,
        peft_config=peft_parameters,
        dataset_text_field="text",
        tokenizer=llama_tokenizer,
        args=train_params
    )
    # Training
    fine_tuning.train()

The output looks like this:

    [250/250 05:31, Epoch 1/1]
    Step    Training Loss
    50      1.557800
    100     1.348100
    150     1.277000
    200     1.324300
    250     1.347700
    TrainOutput(global_step=250, training_loss=1.3709784088134767, metrics={'train_runtime': 335.085, 'train_samples_per_second': 2.984, 'train_steps_per_second': 0.746, 'total_flos': 8679674339426304.0, 'train_loss': 1.3709784088134767, 'epoch': 1.0})

Checking memory usage during training with QLoRA

During training, you can check memory usage by running the `rocm-smi` command in a terminal. The command produces the following output, which reports memory and GPU utilization.

    ========================= ROCm System Management Interface =========================
    =================================== Concise Info ===================================
    GPU Temp (DieEdge) AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
    0 50.0c 352.0W 1700Mhz 1600Mhz 0% auto 560.0W 17% 100%
    ====================================================================================
    =============================== End of ROCm SMI Log ================================
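The VRAM% column can be converted into an absolute figure once the memory capacity of the device is known. The sketch below parses the data row shown above and assumes 64 GB of memory per GCD, based on the MI250's 128 GB split across two GCDs. The capacity constant and the parsing approach are assumptions of this example, not part of `rocm-smi`.

```python
# Data row copied from the rocm-smi output above.
row = "0 50.0c 352.0W 1700Mhz 1600Mhz 0% auto 560.0W 17% 100%"

# Assumption: one MI250 GCD exposes 64 GB (128 GB across two GCDs).
VRAM_GB_PER_GCD = 64

fields = row.split()
vram_percent = float(fields[-2].rstrip("%"))  # second-to-last column: VRAM%
gpu_percent = float(fields[-1].rstrip("%"))   # last column: GPU%

used_gb = vram_percent / 100 * VRAM_GB_PER_GCD

print(f"VRAM in use: {used_gb:.2f} GB of {VRAM_GB_PER_GCD} GB")
print(f"GPU utilization: {gpu_percent:.0f}%")
```

With these numbers the estimate comes to 10.88 GB for the QLoRA run on a single GCD.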

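To better understand the impact of QLoRA on training, we compare QLoRA, LoRA, and full-parameter fine-tuning quantitatively. The comparison covers memory usage, training speed, training loss, and other relevant metrics.

4. Comparison of QLoRA, LoRA, and full-parameter fine-tuning

Our previous blog post on fine-tuning Llama 2 with LoRA reported results for LoRA and full-parameter fine-tuning. Here we add the QLoRA results to give an overview of all three fine-tuning methods.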

| Metric | Full-parameter | LoRA | QLoRA |
|---|---|---|---|
| Trainable parameters | 6,738,415,616 | 4,194,304 | 4,194,304 |
| Memory usage (GB) | 128 | 83.2 | 10.88 |
| Number of GCDs | 2 | 2 | 1 |
| Training time | 3 hours | 9 minutes | 6 minutes |
| Training loss | 1.368 | 1.377 | 1.347 |

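- Memory usage:
    - Full-parameter fine-tuning has 6,738,415,616 trainable parameters, which leads to significant memory consumption during the backward pass of training.
    - In contrast, LoRA and QLoRA introduce only 4,194,304 trainable parameters, just *0.062%* of the trainable parameters in full-parameter fine-tuning.
    - Monitoring memory usage during training shows that fine-tuning with LoRA uses only 65% of the memory consumed by full-parameter fine-tuning. QLoRA goes further and cuts memory consumption to about 8.5%.
    - This leaves room to increase the batch size and maximum sequence length, and to train on larger datasets, within limited hardware resources.
- Training speed:
    - Full-parameter fine-tuning takes hours to complete, whereas fine-tuning with LoRA or QLoRA finishes in *minutes*.
    - Several factors contribute to this speedup:
        - Fewer trainable parameters in LoRA mean fewer derivative calculations and less memory needed to store and update weights.
        - Full-parameter fine-tuning is more likely to be memory-bound, with data movement becoming the training bottleneck. This shows up as lower GPU utilization. Adjusting the training settings can alleviate this, but doing so may require more resources (additional GPUs) and a smaller batch size.
- Accuracy:
    - A notable reduction in training loss was observed in both training sessions. All three fine-tuning approaches reached closely aligned training losses.
    - In the original QLoRA paper, the authors state that the performance lost to imprecise quantization can be fully recovered through adapter fine-tuning after quantization. Our experiments are consistent with this observation and underline the effectiveness of adapter fine-tuning in recovering performance after quantization.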
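The percentages quoted in this comparison follow directly from the table. As a quick check, the sketch below recomputes the memory share and the speedup of each method relative to full-parameter fine-tuning. The numbers are copied from the table, with "3 hours" converted to 180 minutes; the dictionary layout is just a convenience for this example.

```python
# Values copied from the comparison table (training time in minutes).
results = {
    "Full-parameter": {"memory_gb": 128, "minutes": 180},
    "LoRA": {"memory_gb": 83.2, "minutes": 9},
    "QLoRA": {"memory_gb": 10.88, "minutes": 6},
}

baseline = results["Full-parameter"]

# Fraction of the full-parameter memory footprint used by each method.
memory_share = {name: row["memory_gb"] / baseline["memory_gb"]
                for name, row in results.items()}
# How many times faster each method finished than full-parameter fine-tuning.
speedup = {name: baseline["minutes"] / row["minutes"]
           for name, row in results.items()}

for name in results:
    print(f"{name}: {memory_share[name] * 100:.1f}% of memory, "
          f"{speedup[name]:.0f}x faster")
```

Keep in mind that the full-parameter and LoRA runs used two GCDs while the QLoRA run used one, so the timing comparison is not like-for-like in hardware terms.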

5. Testing the model fine-tuned with QLoRA

    # Reload the model in FP16 and merge it with the fine-tuned weights
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        low_cpu_mem_usage=True,
        return_dict=True,
        torch_dtype=torch.float16,
        device_map="auto"
    )
    from peft import LoraConfig, PeftModel
    # Assumes the trained adapter was saved under new_model_name beforehand
    model = PeftModel.from_pretrained(base_model, new_model_name)
    model = model.merge_and_unload()
    # Reload the tokenizer so it can be saved
    tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"
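Merging folds the adapter into the base weight so that inference no longer needs a separate LoRA path. Conceptually, each adapted weight W becomes W + (alpha / r) * B * A. The pure-Python sketch below checks on toy matrices that the merged weight gives the same output as the base weight plus the adapter. The sizes, values, and helper names are hypothetical, and this illustrates the arithmetic rather than what `merge_and_unload` does internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def merge_lora(base_w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A as a new matrix."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base_w, delta)]


# Toy example (hypothetical values): d_in = 3, d_out = 2, rank r = 1.
base_w = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]]
lora_a = [[0.1, 0.2, -0.1]]   # r x d_in
lora_b = [[1.0], [-2.0]]      # d_out x r
alpha, r = 2, 1
x = [1.0, 2.0, 4.0]

# Output with the adapter kept separate from the base weight.
scale = alpha / r
adapter_out = [b + scale * u
               for b, u in zip(matvec(base_w, x),
                               matvec(lora_b, matvec(lora_a, x)))]

# Output after folding the adapter into the weight.
merged_w = merge_lora(base_w, lora_a, lora_b, alpha, r)
merged_out = matvec(merged_w, x)

max_gap = max(abs(p - q) for p, q in zip(adapter_out, merged_out))
print("adapter path:", adapter_out)
print("merged path: ", merged_out)
print("largest difference:", max_gap)
```

The two paths agree up to floating-point rounding, which is why the merged model can be saved and served like an ordinary checkpoint.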

Now let's upload the model to Hugging Face so that we can run further tests or share it with others. You need a valid Hugging Face account for this step.

Now we can test with both the base model (the original one) and the fine-tuned model.

Test the base model

    # Generate text using the base model
    query = "What do you think is the most important part of building an AI chatbot?"
    text_gen = pipeline(task="text-generation", model=base_model_name, tokenizer=llama_tokenizer, max_length=200)
    output = text_gen(f"<s>[INST] {query} [/INST]")
    print(output[0]['generated_text'])

Test the fine-tuned model

    # Generate text using the fine-tuned model
    query = "What do you think is the most important part of building an AI chatbot?"
    text_gen = pipeline(task="text-generation", model=new_model_name, tokenizer=llama_tokenizer, max_length=200)
    output = text_gen(f"<s>[INST] {query} [/INST]")
    print(output[0]['generated_text'])

    <s>[INST] What do you think is the most important part of building an AI chatbot? [/INST] The most important part of building an AI chatbot is to ensure that it is able to understand and respond to user input in a way that is both accurate and natural-sounding.
    To achieve this, you will need to use a combination of natural language processing (NLP) techniques and machine learning algorithms to enable the chatbot to understand and interpret user input, and to generate appropriate responses.
    Some of the key considerations when building an AI chatbot include:
    1. Defining the scope and purpose of the chatbot: What kind of tasks or questions will the chatbot be able to handle? What kind of user input will it be able to understand?
    2. Choosing the right NLP and machine learning algorithms: There are many different NLP and machine learning algorithms available, and the right ones will depend on the

Now you can compare the outputs of the two models for the given query. As expected, the two outputs differ slightly because the fine-tuning process changed the model weights.

Run results:

Step 1: Getting started
Our first step is to confirm the availability of GPU.

    !rocm-smi --showproductname
    ============================ ROCm System Management Interface ============================
    ====================================== Product Info ======================================
    GPU[0] : Card Series: 0x7448
    GPU[0] : Card Model: 0x7448
    GPU[0] : Card Vendor: Advanced Micro Devices, Inc. [AMD/ATI]
    GPU[0] : Card SKU: D7070100
    GPU[0] : Subsystem ID: 0x0e0d
    GPU[0] : Device Rev: 0x00
    GPU[0] : Node ID: 1
    GPU[0] : GUID: 19246
    GPU[0] : GFX Version: gfx11000
    ==========================================================================================
    ================================== End of ROCm SMI Log ===================================

Let's use only one Graphics Compute Die (GCD) or GPU, in case you have more than one GCDs or GPUs on your AMD machine.

    import os
    # Expose only the first GCD/GPU to this process (set before importing torch)
    os.environ["HIP_VISIBLE_DEVICES"] = "0"

    import torch

    # ROCm builds of PyTorch report AMD GPUs through the torch.cuda API
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        print('__CUDNN VERSION:', torch.backends.cudnn.version())
        print('__Number CUDA Devices:', torch.cuda.device_count())
        device_count = torch.cuda.device_count()

    __CUDNN VERSION: 3000000
    __Number CUDA Devices: 1

We will start by installing the required libraries.
 

    !pip install -q pandas peft==0.9.0 transformers==4.31.0 trl==0.4.7 accelerate scipy
    [notice] A new release of pip is available: 24.0 -> 24.1.1
    [notice] To update, run: python3 -m pip install --upgrade pip

Installing bitsandbytes

ROCm needs a special version of bitsandbytes (bitsandbytes-rocm).

Install bitsandbytes using the following code.

    %%bash
    git clone --recurse https://github.com/ROCm/bitsandbytes
    cd bitsandbytes
    git checkout rocm_enabled
    pip install -r requirements-dev.txt
    cmake -DCOMPUTE_BACKEND=hip -S . # Use -DBNB_ROCM_ARCH="gfx90a;gfx942" to target specific gpu arch
    make
    pip install .

    Cloning into 'bitsandbytes'...
    Already on 'rocm_enabled'
    Your branch is up to date with 'origin/rocm_enabled'.
    Defaulting to user installation because normal site-packages is not writeable
    Requirement already satisfied: setuptools>=63 in /usr/local/lib/python3.10/dist-packages (from -r requirements-dev.txt (line 2)) (70.1.0)
    Requirement already satisfied: pytest~=8.2.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements-dev.txt (line 3)) (8.2.2)
    Requirement already satisfied: einops~=0.8.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements-dev.txt (line 4)) (0.8.0)
    Requirement already satisfied: wheel~=0.43.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements-dev.txt (line 5)) (0.43.0)
    Requirement already satisfied: lion-pytorch~=0.1.4 in /usr/local/lib/python3.10/dist-packages (from -r requirements-dev.txt (line 6)) (0.1.4)
    Requirement already satisfied: scipy~=1.13.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements-dev.txt (line 7)) (1.13.1)
    Requirement already satisfied: pandas~=2.2.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements-dev.txt (line 8)) (2.2.2)
    Requirement already satisfied: matplotlib~=3.8.4 in /usr/local/lib/python3.10/dist-packages (from -r requirements-dev.txt (line 9)) (3.8.4)
    Requirement already satisfied: iniconfig in /usr/local/lib/python3.10/dist-packages (from pytest~=8.2.0->-r requirements-dev.txt (line 3)) (2.0.0)
    Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from pytest~=8.2.0->-r requirements-dev.txt (line 3)) (24.0)
    Requirement already satisfied: pluggy<2.0,>=1.5 in /usr/local/lib/python3.10/dist-packages (from pytest~=8.2.0->-r requirements-dev.txt (line 3)) (1.5.0)
    Requirement already satisfied: exceptiongroup>=1.0.0rc8 in /usr/local/lib/python3.10/dist-packages (from pytest~=8.2.0->-r requirements-dev.txt (line 3)) (1.2.1)
    Requirement already satisfied: tomli>=1 in /usr/local/lib/python3.10/dist-packages (from pytest~=8.2.0->-r requirements-dev.txt (line 3)) (2.0.1)
    Requirement already satisfied: torch>=1.6 in /usr/local/lib/python3.10/dist-packages (from lion-pytorch~=0.1.4->-r requirements-dev.txt (line 6)) (2.4.0.dev20240424+rocm6.0)
    Requirement already satisfied: numpy<2.3,>=1.22.4 in /home/yong/.local/lib/python3.10/site-packages (from scipy~=1.13.0->-r requirements-dev.txt (line 7)) (1.23.5)
    Requirement already satisfied: python-dateutil>=2.8.2 in /home/yong/.local/lib/python3.10/site-packages (from pandas~=2.2.2->-r requirements-dev.txt (line 8)) (2.9.0.post0)
    Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas~=2.2.2->-r requirements-dev.txt (line 8)) (2022.1)
    Requirement already satisfied: tzdata>=2022.7 in /home/yong/.local/lib/python3.10/site-packages (from pandas~=2.2.2->-r requirements-dev.txt (line 8)) (2024.1)
    Requirement already satisfied: contourpy>=1.0.1 in /home/yong/.local/lib/python3.10/site-packages (from matplotlib~=3.8.4->-r requirements-dev.txt (line 9)) (1.2.1)
    Requirement already satisfied: cycler>=0.10 in /home/yong/.local/lib/python3.10/site-packages (from matplotlib~=3.8.4->-r requirements-dev.txt (line 9)) (0.12.1)
    Requirement already satisfied: fonttools>=4.22.0 in /home/yong/.local/lib/python3.10/site-packages (from matplotlib~=3.8.4->-r requirements-dev.txt (line 9)) (4.51.0)
    Requirement already satisfied: kiwisolver>=1.3.1 in /home/yong/.local/lib/python3.10/site-packages (from matplotlib~=3.8.4->-r requirements-dev.txt (line 9)) (1.4.5)
    Requirement already satisfied: pillow>=8 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.8.4->-r requirements-dev.txt (line 9)) (9.3.0)
    Requirement already satisfied: pyparsing>=2.3.1 in /usr/lib/python3/dist-packages (from matplotlib~=3.8.4->-r requirements-dev.txt (line 9)) (2.4.7)
    Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas~=2.2.2->-r requirements-dev.txt (line 8)) (1.16.0)
    Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.6->lion-pytorch~=0.1.4->-r requirements-dev.txt (line 6)) (3.13.1)
    Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6->lion-pytorch~=0.1.4->-r requirements-dev.txt (line 6)) (4.8.0)
    Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.6->lion-pytorch~=0.1.4->-r requirements-dev.txt (line 6)) (1.12)
    Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.6->lion-pytorch~=0.1.4->-r requirements-dev.txt (line 6)) (3.2.1)
    Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6->lion-pytorch~=0.1.4->-r requirements-dev.txt (line 6)) (3.1.3)
    Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch>=1.6->lion-pytorch~=0.1.4->-r requirements-dev.txt (line 6)) (2024.2.0)
    Requirement already satisfied: pytorch-triton-rocm==3.0.0+0a22a91d04 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6->lion-pytorch~=0.1.4->-r requirements-dev.txt (line 6)) (3.0.0+0a22a91d04)
    Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.6->lion-pytorch~=0.1.4->-r requirements-dev.txt (line 6)) (2.1.5)
    Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.6->lion-pytorch~=0.1.4->-r requirements-dev.txt (line 6)) (1.2.1)
    [notice] A new release of pip is available: 24.0 -> 24.1.1
    [notice] To update, run: python3 -m pip install --upgrade pip
    -- The CXX compiler identification is GNU 11.4.0
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: /usr/bin/c++ - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Configuring bitsandbytes (Backend: hip)
    -- NO_CUBLASLT := OFF
    -- The HIP compiler identification is Clang 17.0.0
    -- Detecting HIP compiler ABI info
    -- Detecting HIP compiler ABI info - done
    -- Check for working HIP compiler: /home/yong/llvm/bin/clang++ - skipped
    -- Detecting HIP compile features
    -- Detecting HIP compile features - done
    -- HIP Compiler: /home/yong/llvm/bin/clang++
    -- HIP Targets: gfx900;gfx906;gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx940;gfx941;gfx942
    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
    -- Found Threads: TRUE
    hipblas VERSION: 2.1.0
    hiprand VERSION: 2.10.16
    hipsparse VERSION: 3.0.1
    -- Configuring done (8.2s)
    -- Generating done (0.0s)
    -- Build files have been written to: /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes
    [ 16%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/common.cpp.o
    [ 33%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/cpu_ops.cpp.o
    [ 50%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/pythonInterface.cpp.o
    [ 66%] Building HIP object CMakeFiles/bitsandbytes.dir/csrc/ops.hip.o
    [ 83%] Building HIP object CMakeFiles/bitsandbytes.dir/csrc/kernels.hip.o
  79. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  80. __global__ void kspmm_coo_very_sparse_naive(int *max_count, int *max_idx, int *offset_rowidx, int *rowidx, int *colidx, half *values, T *B, half *out, float * __restrict__ const dequant_stats, int nnz, int rowsA, int rowsB, int colsB)
  81. ^
  82. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  83. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  84. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  85. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  86. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  87. 6 warnings generated when compiling for gfx1030.
  88. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  89. __global__ void kspmm_coo_very_sparse_naive(int *max_count, int *max_idx, int *offset_rowidx, int *rowidx, int *colidx, half *values, T *B, half *out, float * __restrict__ const dequant_stats, int nnz, int rowsA, int rowsB, int colsB)
  90. ^
  91. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  92. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  93. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  94. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  95. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  96. 6 warnings generated when compiling for gfx1100.
  97. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  98. __global__ void kspmm_coo_very_sparse_naive(int *max_count, int *max_idx, int *offset_rowidx, int *rowidx, int *colidx, half *values, T *B, half *out, float * __restrict__ const dequant_stats, int nnz, int rowsA, int rowsB, int colsB)
  99. ^
  100. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  101. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  102. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  103. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  104. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  105. 6 warnings generated when compiling for gfx1101.
  106. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  107. __global__ void kspmm_coo_very_sparse_naive(int *max_count, int *max_idx, int *offset_rowidx, int *rowidx, int *colidx, half *values, T *B, half *out, float * __restrict__ const dequant_stats, int nnz, int rowsA, int rowsB, int colsB)
  108. ^
  109. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  110. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  111. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  112. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  113. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  114. 6 warnings generated when compiling for gfx900.
  115. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  116. __global__ void kspmm_coo_very_sparse_naive(int *max_count, int *max_idx, int *offset_rowidx, int *rowidx, int *colidx, half *values, T *B, half *out, float * __restrict__ const dequant_stats, int nnz, int rowsA, int rowsB, int colsB)
  117. ^
  118. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  119. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  120. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  121. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  122. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  123. 6 warnings generated when compiling for gfx906.
  124. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  125. __global__ void kspmm_coo_very_sparse_naive(int *max_count, int *max_idx, int *offset_rowidx, int *rowidx, int *colidx, half *values, T *B, half *out, float * __restrict__ const dequant_stats, int nnz, int rowsA, int rowsB, int colsB)
  126. ^
  127. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  128. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  129. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  130. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  131. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  132. 6 warnings generated when compiling for gfx908.
  133. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  134. __global__ void kspmm_coo_very_sparse_naive(int *max_count, int *max_idx, int *offset_rowidx, int *rowidx, int *colidx, half *values, T *B, half *out, float * __restrict__ const dequant_stats, int nnz, int rowsA, int rowsB, int colsB)
  135. ^
  136. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  137. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  138. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  139. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  140. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  141. 6 warnings generated when compiling for gfx90a.
  142. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  143. __global__ void kspmm_coo_very_sparse_naive(int *max_count, int *max_idx, int *offset_rowidx, int *rowidx, int *colidx, half *values, T *B, half *out, float * __restrict__ const dequant_stats, int nnz, int rowsA, int rowsB, int colsB)
  144. ^
  145. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  146. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  147. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  148. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  149. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  150. 6 warnings generated when compiling for gfx940.
  151. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  152. __global__ void kspmm_coo_very_sparse_naive(int *max_count, int *max_idx, int *offset_rowidx, int *rowidx, int *colidx, half *values, T *B, half *out, float * __restrict__ const dequant_stats, int nnz, int rowsA, int rowsB, int colsB)
  153. ^
  154. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  155. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  156. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  157. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  158. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  159. 6 warnings generated when compiling for gfx941.
  160. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  161. __global__ void kspmm_coo_very_sparse_naive(int *max_count, int *max_idx, int *offset_rowidx, int *rowidx, int *colidx, half *values, T *B, half *out, float * __restrict__ const dequant_stats, int nnz, int rowsA, int rowsB, int colsB)
  162. ^
  163. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  164. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  165. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  166. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  167. /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes/csrc/kernels.hip:2758:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
  168. 6 warnings generated when compiling for gfx942.
  169. [100%] Linking CXX shared library bitsandbytes/libbitsandbytes_hip_nohipblaslt.so
  170. [100%] Built target bitsandbytes
  171. Defaulting to user installation because normal site-packages is not writeable
  172. Processing /home/yong/rocm-blogs/blogs/artificial-intelligence/llama2-Qlora/src/bitsandbytes
  173. Installing build dependencies: started
  174. Installing build dependencies: finished with status 'done'
  175. Getting requirements to build wheel: started
  176. Getting requirements to build wheel: finished with status 'done'
  177. Preparing metadata (pyproject.toml): started
  178. Preparing metadata (pyproject.toml): finished with status 'done'
  179. Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from bitsandbytes==0.43.2.dev0) (2.4.0.dev20240424+rocm6.0)
  180. Requirement already satisfied: numpy in /home/yong/.local/lib/python3.10/site-packages (from bitsandbytes==0.43.2.dev0) (1.23.5)
  181. Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes==0.43.2.dev0) (3.13.1)
  182. Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes==0.43.2.dev0) (4.8.0)
  183. Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes==0.43.2.dev0) (1.12)
  184. Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes==0.43.2.dev0) (3.2.1)
  185. Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes==0.43.2.dev0) (3.1.3)
  186. Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes==0.43.2.dev0) (2024.2.0)
  187. Requirement already satisfied: pytorch-triton-rocm==3.0.0+0a22a91d04 in /usr/local/lib/python3.10/dist-packages (from torch->bitsandbytes==0.43.2.dev0) (3.0.0+0a22a91d04)
  188. Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->bitsandbytes==0.43.2.dev0) (2.1.5)
  189. Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->bitsandbytes==0.43.2.dev0) (1.2.1)
  190. Building wheels for collected packages: bitsandbytes
  191. Building wheel for bitsandbytes (pyproject.toml): started
  192. Building wheel for bitsandbytes (pyproject.toml): finished with status 'done'
  193. Created wheel for bitsandbytes: filename=bitsandbytes-0.43.2.dev0-cp310-cp310-linux_x86_64.whl size=2686797 sha256=77f00c80c2ed346bf1e74a841cad65a3fbca9faecb0114ed04d1129b175e4235
  194. Stored in directory: /tmp/pip-ephem-wheel-cache-ee2fao5j/wheels/d9/69/ae/c69fa5ff700a71615645b1dec2bd1bb1eb18b0c50110020499
  195. Successfully built bitsandbytes
  196. Installing collected packages: bitsandbytes
  197. Successfully installed bitsandbytes-0.43.2.dev0
  198. [notice] A new release of pip is available: 24.0 -> 24.1.1
  199. [notice] To update, run: python3 -m pip install --upgrade pip

Check the installed bitsandbytes version. The build above produces version 0.43.2.dev0.

    %%bash
    pip list | grep bitsandbytes

    [notice] A new release of pip is available: 24.0 -> 24.1.1
    [notice] To update, run: python3 -m pip install --upgrade pip
    bitsandbytes 0.43.2.dev0
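The same check can be done from Python, which is handy inside a notebook cell. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name introduced here for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if bitsandbytes is installed, otherwise None.
print(installed_version("bitsandbytes"))
```

Returning `None` instead of raising keeps the cell usable as a quick guard before the heavier imports.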

Import the required packages.

    import torch
    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
        TrainingArguments,
        pipeline
    )
    from peft import LoraConfig
    from trl import SFTTrainer

2. Configuring the model and data

Model configuration

You can access Meta's official Llama-2 model from Hugging Face after making a request, which can take a couple of days. Instead of waiting, we'll use NousResearch’s Llama-2-7b-chat-hf as our base model (it's the same as the original, but quicker to access).

    # Model and tokenizer names
    # base_model_name points to a local copy of NousResearch/Llama-2-7b-chat-hf;
    # replace it with your own path or with the Hugging Face Hub ID.
    base_model_name = "/home/yong/Llama-2-7b-chat-hf"
    new_model_name = "llama-2-7b-enhanced"  # Choose any name for the fine-tuned model

    # Tokenizer: Llama 2 has no pad token, so reuse the end-of-sequence token
    llama_tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
    llama_tokenizer.pad_token = llama_tokenizer.eos_token
    llama_tokenizer.padding_side = "right"

QLoRA 4-bit quantization configuration

As outlined in the paper, QLoRA stores weights in 4 bits while computation occurs in 16- or 32-bit precision. Whenever a quantized weight tensor is used, it is dequantized to the compute precision and then used in a matrix multiplication. The compute type can be float16, bfloat16, or float32. Different 4-bit quantization variants are available, including 4-bit NormalFloat (NF4) and pure float4 (FP4). Guided by the theoretical considerations and empirical findings in the paper, the recommendation is to use NF4, as it tends to deliver better performance.
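To make the store-in-4-bit, compute-in-higher-precision idea concrete, here is a minimal sketch of blockwise absmax quantization in plain Python. It is not the bitsandbytes implementation: the 16-level uniform codebook is a stand-in for the NF4 levels, the block size is shrunk for readability, and all function names are hypothetical.

```python
# Sketch: blockwise absmax quantization to 16 levels (4 bits), then dequantization.
# Assumption: a uniform codebook stands in for the NF4 levels used by bitsandbytes.

BLOCK_SIZE = 4  # tiny block for illustration; real kernels use larger blocks
CODEBOOK = [-1.0 + 2.0 * i / 15 for i in range(16)]  # 16 evenly spaced levels in [-1, 1]


def quantize_block(block):
    """Return (scale, codes): one float scale plus one 4-bit code per weight."""
    scale = max(abs(w) for w in block) or 1.0  # absmax; fall back to 1.0 for an all-zero block
    codes = [
        min(range(16), key=lambda i: abs(CODEBOOK[i] - w / scale))
        for w in block
    ]
    return scale, codes


def dequantize_block(scale, codes):
    """Map 4-bit codes back to floats for use in a higher-precision matmul."""
    return [CODEBOOK[c] * scale for c in codes]


def quantize(weights):
    blocks = [weights[i:i + BLOCK_SIZE] for i in range(0, len(weights), BLOCK_SIZE)]
    return [quantize_block(b) for b in blocks]


def dequantize(packed):
    return [w for scale, codes in packed for w in dequantize_block(scale, codes)]


weights = [0.12, -0.40, 0.05, 0.33, -0.02, 0.01, 0.07, -0.09]
packed = quantize(weights)
restored = dequantize(packed)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(packed)
print(restored, max_error)
```

Each block keeps one full-precision scale, which is the quantization constant that double quantization later compresses. NF4 replaces the evenly spaced levels with levels placed according to a normal distribution, which suits typical weight distributions better.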

In our case, we chose the following configuration:

- 4-bit quantization with the NF4 type
- 16-bit (float16) for computation
- Double quantization, which quantizes the quantization constants a second time to save roughly an additional 0.37 bits per parameter (the figure reported in the QLoRA paper)
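The saving from double quantization can be worked out by hand. The sketch below assumes the block sizes described in the QLoRA paper (64 weights per block, one FP32 constant per block, and 256 constants per second-level block quantized to 8 bits); these values are not stated in this post, and the 7-billion weight count is a round number used only for illustration.

```python
# Sketch: storage overhead of quantization constants, with and without double quantization.
# Assumptions (from the QLoRA paper, not from this post): 64 weights per block,
# one FP32 constant per block, second-level blocks of 256 constants stored in 8 bits.

WEIGHTS_PER_BLOCK = 64
CONSTANTS_PER_SUPERBLOCK = 256

# Single quantization: one 32-bit constant shared by 64 weights.
overhead_single = 32 / WEIGHTS_PER_BLOCK

# Double quantization: 8-bit constants, plus one 32-bit constant per 256 first-level constants.
overhead_double = 8 / WEIGHTS_PER_BLOCK + 32 / (WEIGHTS_PER_BLOCK * CONSTANTS_PER_SUPERBLOCK)

saved_bits = overhead_single - overhead_double

# Rough upper bound on memory saved, assuming 7e9 quantized weights (illustrative round number).
quantized_weights = 7_000_000_000
saved_gb = saved_bits * quantized_weights / 8 / 1e9

print(f"overhead without double quantization: {overhead_single:.3f} bits/param")
print(f"overhead with double quantization:    {overhead_double:.3f} bits/param")
print(f"saved: {saved_bits:.3f} bits/param, about {saved_gb:.2f} GB for 7e9 weights")
```

The saving per parameter is small, but it applies to every quantized weight, so it adds up to a few hundred megabytes on a 7B model.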

Quantization parameters are controlled through BitsAndBytesConfig (see the Hugging Face documentation) as follows:

- Loading in 4 bits is activated through load_in_4bit.
- The datatype used for quantization is specified with bnb_4bit_quant_type. Two quantization datatypes are supported: fp4 (4-bit float) and nf4 (4-bit NormalFloat). The latter is theoretically optimal for normally distributed weights, so we recommend using nf4.
- The datatype used for the linear layer computations is specified with bnb_4bit_compute_dtype.
- Nested (double) quantization is activated through bnb_4bit_use_double_quant.

    # Quantization Config
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True
    )

    # Model
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        quantization_config=quant_config,
        device_map="auto"
    )
    base_model.config.use_cache = False
    base_model.config.pretraining_tp = 1

    Loading checkpoint shards: 100% 2/2 [01:07<00:00, 31.14s/it]

Dataset configuration

We fine-tune our base model for a question-and-answer task using a small data set called mlabonne/guanaco-llama2-1k, a 1,000-sample subset of the timdettmers/openassistant-guanaco data set. The OpenAssistant Conversations corpus that these data sets are drawn from is a human-generated, human-annotated, assistant-style conversation corpus that contains 161,443 messages in 35 different languages, annotated with 461,292 quality ratings. This results in over 10,000 fully annotated conversation trees.
 

    # Dataset (local copy of mlabonne/guanaco-llama2-1k)
    data_name = "/home/yong/guanaco-llama2-1k"
    training_data = load_dataset(data_name, split="train")
    # Check the data
    print(training_data.shape)
    # Sample 11 is a QA sample in English
    print(training_data[11])

    (1000, 1)
    {'text': '<s>[INST] write me a 1000 words essay about deez nuts. [/INST] The Deez Nuts meme first gained popularity in 2015 on the social media platform Vine. The video featured a young man named Rodney Bullard, who recorded himself asking people if they had heard of a particular rapper. When they responded that they had not, he would respond with the phrase "Deez Nuts" and film their reactions. The video quickly went viral, and the phrase became a popular meme. \n\nSince then, Deez Nuts has been used in a variety of contexts to interrupt conversations, derail discussions, or simply add humor to a situation. It has been used in internet memes, in popular music, and even in politics. In the 2016 US presidential election, a 15-year-old boy named Brady Olson registered as an independent candidate under the name Deez Nuts. He gained some traction in the polls and even made appearances on national news programs.\n\nThe Deez Nuts meme has had a significant impact on popular culture. It has become a recognizable catchphrase that people use to add humor to everyday conversations. The meme has also been used to satirize politics and other serious issues. For example, in 2016, a group of activists in the UK used the phrase "Deez Nuts for President" as part of a campaign to encourage young people to vote in the EU referendum. </s><s>[INST] Rewrite the essay in a more casual way. Instead of sounding proffesional, sound like a college student who is forced to write the essay but refuses to do so in the propper way. Use casual words and slang when possible. [/INST] Yo, so you want me to write a 1000-word essay about Deez Nuts? Alright, fine. So, this whole thing started on Vine back in 2015. Some dude named Rodney Bullard made a video where he would ask people if they knew a rapper, and when they said no, he would hit them with the classic line: "Deez Nuts!" People loved it, and it became a viral meme.\n\nNowadays, Deez Nuts is used for all kinds of stuff. You can throw it out there to interrupt someone or just to be funny. It\'s all over the internet, in music, and even in politics. In fact, during the 2016 US presidential election, a kid named Brady Olson registered as an independent candidate under the name Deez Nuts. He actually got some attention from the media and made appearances on TV and everything.\n\nThe impact of Deez Nuts on our culture is pretty huge. It\'s become a thing that everyone knows and uses to add some humor to their everyday conversations. Plus, people have used it to make fun of politics and serious issues too. Like, in the UK, some groups of activists used the phrase "Deez Nuts for President" to encourage young people to vote in the EU referendum.\n\nThere you have it, a thousand words about Deez Nuts in a more casual tone. Can I go back to playing video games now? </s>'}
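The sample shows the prompt layout the data set uses: each turn wraps the instruction in `[INST] ... [/INST]` and encloses the whole turn in `<s> ... </s>`. As a minimal sketch, a helper like the following reproduces that layout for a single turn; `format_turn` is a hypothetical name introduced for illustration and is not part of transformers or trl.

```python
# Sketch: build a single-turn prompt in the layout seen in the data set sample.
# format_turn is a hypothetical helper, not a library function.

def format_turn(instruction, response=None):
    """Wrap an instruction (and optionally its answer) in Llama 2 chat markers."""
    prompt = f"<s>[INST] {instruction} [/INST]"
    if response is not None:
        # Training samples include the answer and a closing </s> marker.
        prompt += f" {response} </s>"
    return prompt


# Inference-style prompt: the model is expected to continue after [/INST].
print(format_turn("What is QLoRA?"))
# Training-style sample: instruction and answer in one string.
print(format_turn("What is QLoRA?", "A 4-bit fine-tuning method."))
```

Leaving the response out gives the form used when querying the model, and passing it gives the form used for training samples.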
    # tensorboardX is a dependency during training
    !pip install tensorboardX

    Defaulting to user installation because normal site-packages is not writeable
    Requirement already satisfied: tensorboardX in /usr/local/lib/python3.10/dist-packages (2.6.2.2)
    Requirement already satisfied: numpy in /home/yong/.local/lib/python3.10/site-packages (from tensorboardX) (1.23.5)
    Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from tensorboardX) (24.0)
    Requirement already satisfied: protobuf>=3.20 in /usr/local/lib/python3.10/dist-packages (from tensorboardX) (3.20.2)
    [notice] A new release of pip is available: 24.0 -> 24.1.1
    [notice] To update, run: python3 -m pip install --upgrade pip

3. Start fine-tuning

To set your training parameters, use the following code:
 

    # Training Params
    train_params = TrainingArguments(
        output_dir="./results_modified",
        num_train_epochs=1,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=1,
        optim="paged_adamw_32bit",
        save_steps=50,
        logging_steps=50,
        learning_rate=2e-4,
        weight_decay=0.001,
        fp16=False,
        bf16=False,
        max_grad_norm=0.3,
        max_steps=-1,
        warmup_ratio=0.03,
        group_by_length=True,
        lr_scheduler_type="constant",
        report_to="tensorboard"
    )
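The number of optimizer steps these settings imply can be worked out by hand. The sketch below copies the values from the training arguments and the data set size; the single-device count is an assumption about how the Trainer runs in this setup.

```python
import math

# Values copied from the training arguments and the data set above.
num_samples = 1000
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
num_train_epochs = 1
save_steps = 50

# Assumption: the Trainer runs as a single data-parallel process.
num_devices = 1

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
num_checkpoints = total_steps // save_steps

print(f"effective batch size: {effective_batch}")
print(f"total optimizer steps: {total_steps}")
print(f"checkpoints written: {num_checkpoints}")
```

With 1,000 samples and a batch of 4, one epoch is 250 steps, so the loss is logged and a checkpoint is saved five times.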

Training with QLoRA configuration

Now you can integrate LoRA into the base model and count the parameters it adds. LoRA adds pairs of rank-decomposition weight matrices (called update matrices) to existing weights, and trains only the newly added weights.
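As a minimal sketch of how the update matrices enter the forward pass, the plain-Python example below computes `y = W x + (alpha / r) * B (A x)` for a tiny layer. The sizes and values are made up for illustration, and the helper names are hypothetical; PEFT implements this on GPU tensors.

```python
# Sketch: forward pass of a linear layer with a LoRA adapter, using plain lists.
# W is frozen; only A (r x d) and B (d x r) would receive gradient updates.

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


d, r, alpha = 4, 2, 2
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen weight (identity here)
A = [[0.1 * (i + j) for j in range(d)] for i in range(r)]           # small nonzero values
B = [[0.0] * r for _ in range(d)]                                   # zeros: adapter starts as a no-op
x = [1.0, 2.0, 3.0, 4.0]

y_init = lora_forward(W, A, B, x, alpha, r)   # equals the frozen layer's output
B[0][0] = 0.5                                 # pretend one training step changed B
y_trained = lora_forward(W, A, B, x, alpha, r)

print(y_init)
print(y_trained)
```

Because B starts at zero, the adapted model initially behaves exactly like the base model, and fine-tuning moves it away only through the small A and B matrices.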
 

    from peft import get_peft_model

    # LoRA Config
    peft_parameters = LoraConfig(
        lora_alpha=8,
        lora_dropout=0.1,
        r=8,
        bias="none",
        task_type="CAUSAL_LM"
    )
    model = get_peft_model(base_model, peft_parameters)
    model.print_trainable_parameters()

    trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199

Note that LoRA's trainable parameters make up only about 0.062% of the total (4,194,304 of 6,742,609,920), a tiny fraction of the model. These are the only parameters updated during fine-tuning, as follows.
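The trainable-parameter count can be reproduced with a few lines of arithmetic. This sketch relies on assumptions not stated in the post: that Llama-2-7B has 32 decoder layers with hidden size 4096, and that PEFT's default LoRA targets for Llama models are the `q_proj` and `v_proj` matrices.

```python
# Sketch: reproduce the trainable-parameter count reported by print_trainable_parameters().
# Assumptions: 32 decoder layers, hidden size 4096, LoRA applied to q_proj and v_proj only.

num_layers = 32
hidden_size = 4096
targets_per_layer = 2   # q_proj and v_proj
r = 8                   # LoRA rank from LoraConfig

# Each adapted matrix gets A (r x hidden) and B (hidden x r).
params_per_target = r * hidden_size + hidden_size * r
trainable = num_layers * targets_per_layer * params_per_target

all_params = 6_742_609_920  # total reported by print_trainable_parameters()
trainable_pct = 100 * trainable / all_params

print(f"trainable params: {trainable:,}")
print(f"trainable%: {trainable_pct:.4f}")
```

The count scales linearly with `r`, so doubling the rank doubles the number of trainable parameters.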
 

    # Trainer with QLoRA configuration
    fine_tuning = SFTTrainer(
        model=base_model,
        train_dataset=training_data,
        peft_config=peft_parameters,
        dataset_text_field="text",
        tokenizer=llama_tokenizer,
        args=train_params
    )

    # Training
    fine_tuning.train()
  1. /usr/local/lib/python3.10/dist-packages/peft/utils/other.py:145: FutureWarning: prepare_model_for_int8_training is deprecated and will be removed in a future version. Use prepare_model_for_kbit_training instead.
  2. warnings.warn(
  3. /usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:159: UserWarning: You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to 1024
  4. warnings.warn(
  5. Map: 100%
  6.  1000/1000 [00:00<00:00, 4212.19 examples/s]
  7. You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
  8. /usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py:36: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
  9. return fn(*args, **kwargs)
  10. [250/250 18:43, Epoch 1/1]
  11. Step Training Loss
  12. 50 1.566100
  13. 100 1.349400
  14. 150 1.278100
  15. 200 1.325100
  16. 250 1.347800
  17. /usr/local/lib/python3.10/dist-packages/peft/utils/save_and_load.py:154: UserWarning: Could not find a config file in /home/yong/Llama-2-7b-chat-hf - will assume that the vocabulary was not modified.
  18. warnings.warn(
  19. /usr/local/lib/python3.10/dist-packages/peft/utils/save_and_load.py:154: UserWarning: Could not find a config file in /home/yong/Llama-2-7b-chat-hf - will assume that the vocabulary was not modified.
  20. warnings.warn(
  21. /usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py:36: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
  22. return fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/peft/utils/save_and_load.py:154: UserWarning: Could not find a config file in /home/yong/Llama-2-7b-chat-hf - will assume that the vocabulary was not modified.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py:36: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
  return fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/peft/utils/save_and_load.py:154: UserWarning: Could not find a config file in /home/yong/Llama-2-7b-chat-hf - will assume that the vocabulary was not modified.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py:36: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
  return fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/peft/utils/save_and_load.py:154: UserWarning: Could not find a config file in /home/yong/Llama-2-7b-chat-hf - will assume that the vocabulary was not modified.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py:36: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
  return fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/peft/utils/save_and_load.py:154: UserWarning: Could not find a config file in /home/yong/Llama-2-7b-chat-hf - will assume that the vocabulary was not modified.
  warnings.warn(
TrainOutput(global_step=250, training_loss=1.373295639038086, metrics={'train_runtime': 1135.0078, 'train_samples_per_second': 0.881, 'train_steps_per_second': 0.22, 'total_flos': 8679674339426304.0, 'train_loss': 1.373295639038086, 'epoch': 1.0})
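
The throughput figures in `TrainOutput` can be cross-checked with a little arithmetic. The sketch below uses only the numbers printed above; the variable names are ours, and the samples-per-step figure is inferred from the rounded log values rather than read from the training configuration.

```python
# Values copied from the TrainOutput line above.
global_step = 250               # optimizer steps completed
train_runtime_s = 1135.0078     # wall-clock training time in seconds
samples_per_second = 0.881      # as reported (rounded) by the trainer

# Derived quantities.
steps_per_second = global_step / train_runtime_s
seconds_per_step = train_runtime_s / global_step
approx_samples = samples_per_second * train_runtime_s
approx_samples_per_step = approx_samples / global_step

print(f"steps per second : {steps_per_second:.3f}")
print(f"seconds per step : {seconds_per_step:.2f}")
print(f"samples seen     : {approx_samples:.0f}")
print(f"samples per step : {approx_samples_per_step:.2f}")
```

The derived steps-per-second value agrees with the reported `train_steps_per_second` of 0.22, and the run works out to roughly 1,000 samples at about 4 samples per optimizer step, which would be consistent with an effective batch size of 4.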
# Save Model
fine_tuning.model.save_pretrained(new_model_name)

5. Test the fine-tuned model with LoRA

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Attach the saved adapter, then fold it into the base weights
model = PeftModel.from_pretrained(base_model, new_model_name)
model = model.merge_and_unload()
# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Loading checkpoint shards: 100% 2/2 [00:04<00:00,  2.13s/it]
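
`merge_and_unload()` folds each adapter into the frozen weight it was attached to, so the result is an ordinary model without PEFT wrappers. As a minimal sketch of the underlying algebra (not PEFT's actual implementation), the toy example below uses plain Python lists and made-up values; `W`, `A`, `B`, `r`, and `lora_alpha` are illustrative names, and the `lora_alpha / r` scaling is the standard LoRA convention.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * x for x in row] for row in X]

# Toy values (assumptions for illustration only).
W = [[1.0, 2.0], [3.0, 4.0]]   # frozen base weight, shape (out=2, in=2)
B = [[0.5], [-1.0]]            # LoRA "up" matrix, shape (out=2, r=1)
A = [[2.0, 0.0]]               # LoRA "down" matrix, shape (r=1, in=2)
r, lora_alpha = 1, 2
scaling = lora_alpha / r

x = [[1.0], [1.0]]             # input as a column vector

# During fine-tuning: base path plus scaled adapter path.
y_adapter = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

# After merging: a single dense weight W + scaling * (B @ A).
W_merged = add(W, scale(matmul(B, A), scaling))
y_merged = matmul(W_merged, x)

print("merged weight:", W_merged)
print("outputs match:", y_adapter == y_merged)
```

Because the two paths give the same output, the merged model can be used for inference with no adapter overhead.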

Test the base model

# Generate Text using base model
query = "What do you think is the most important part of building an AI chatbot?"
text_gen = pipeline(task="text-generation", model=base_model_name, tokenizer=llama_tokenizer, max_length=200)
output = text_gen(f"<s>[INST] {query} [/INST]")
print(output[0]['generated_text'])

Loading checkpoint shards: 100% 2/2 [00:03<00:00,  1.73s/it]
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1270: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
  warnings.warn(
<s>[INST] What do you think is the most important part of building an AI chatbot? [/INST] There are several important aspects to consider when building an AI chatbot, but here are some of the most critical elements:
1. Natural Language Processing (NLP): A chatbot's ability to understand and interpret human language is crucial for effective communication. NLP is the foundation of any chatbot, and it involves training the AI model to recognize patterns in language, interpret meaning, and generate responses.
2. Conversational Flow: A chatbot's conversational flow refers to the way it interacts with users. A well-designed conversational flow should be intuitive, easy to follow, and adaptable to different user scenarios. This involves creating a dialogue flowchart that guides the conversation and ensures the chatbot responds appropriately to user inputs.
3. Domain Knowledge: A chat
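
The prompts in this section wrap the query in Llama 2's chat markers by hand. A small helper keeps that formatting in one place. The function name `build_llama2_prompt` and its `system_prompt` and `add_bos` arguments are our own; the `<<SYS>>` block follows the published Llama 2 chat format, and the sketch covers single-turn prompts only.

```python
def build_llama2_prompt(query, system_prompt=None, add_bos=True):
    """Format a single-turn prompt in the Llama 2 chat style."""
    if system_prompt:
        # An optional system message is wrapped in <<SYS>> markers inside [INST].
        body = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{query}"
    else:
        body = query
    # The blog's examples spell out the BOS token explicitly; the Llama
    # tokenizer may also add one, so add_bos=False leaves that to the tokenizer.
    prefix = "<s>" if add_bos else ""
    return f"{prefix}[INST] {body} [/INST]"


query = "What do you think is the most important part of building an AI chatbot?"
prompt = build_llama2_prompt(query)
print(prompt)
```

With the defaults, this produces the same string as the f-string passed to `text_gen` above. Note also that `max_length=200` counts the prompt tokens as well as the generated ones, which is likely why the response stops mid-sentence; `max_new_tokens` is the generation argument that limits only the newly generated tokens.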

Test the fine-tuned model

# Generate Text using fine-tuned model
query = "What do you think is the most important part of building an AI chatbot?"
text_gen = pipeline(task="text-generation", model=new_model_name, tokenizer=llama_tokenizer, max_length=200)
output = text_gen(f"<s>[INST] {query} [/INST]")
print(output[0]['generated_text'])

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[19], line 3
      1 # Generate Text using fine-tuned model
      2 query = "What do you think is the most important part of building an AI chatbot?"
----> 3 text_gen = pipeline(task="text-generation", model=new_model_name, tokenizer=llama_tokenizer, max_length=200)
      4 output = text_gen(f"<s>[INST] {query} [/INST]")
      5 print(output[0]['generated_text'])

File /usr/local/lib/python3.10/dist-packages/transformers/pipelines/__init__.py:705, in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, use_auth_token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
    703 hub_kwargs["_commit_hash"] = config._commit_hash
    704 elif config is None and isinstance(model, str):
--> 705 config = AutoConfig.from_pretrained(model, _from_pipeline=task, **hub_kwargs, **model_kwargs)
    706 hub_kwargs["_commit_hash"] = config._commit_hash
    708 custom_tasks = {}

File /usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py:983, in AutoConfig.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    981 kwargs["name_or_path"] = pretrained_model_name_or_path
    982 trust_remote_code = kwargs.pop("trust_remote_code", None)
--> 983 config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
    984 has_remote_code = "auto_map" in config_dict and "AutoConfig" in config_dict["auto_map"]
    985 has_local_code = "model_type" in config_dict and config_dict["model_type"] in CONFIG_MAPPING

File /usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py:617, in PretrainedConfig.get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    615 original_kwargs = copy.deepcopy(kwargs)
    616 # Get config dict associated with the base config file
--> 617 config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
    618 if "_commit_hash" in config_dict:
    619 original_kwargs["_commit_hash"] = config_dict["_commit_hash"]

File /usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py:672, in PretrainedConfig._get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    668 configuration_file = kwargs.pop("_configuration_file", CONFIG_NAME)
    670 try:
    671 # Load from local folder or from cache or download from model Hub and cache
--> 672 resolved_config_file = cached_file(
    673 pretrained_model_name_or_path,
    674 configuration_file,
    675 cache_dir=cache_dir,
    676 force_download=force_download,
    677 proxies=proxies,
    678 resume_download=resume_download,
    679 local_files_only=local_files_only,
    680 use_auth_token=use_auth_token,
    681 user_agent=user_agent,
    682 revision=revision,
    683 subfolder=subfolder,
    684 _commit_hash=commit_hash,
    685 )
    686 commit_hash = extract_commit_hash(resolved_config_file, commit_hash)
    687 except EnvironmentError:
    688 # Raise any environment error raise by `cached_file`. It will have a helpful error message adapted to
    689 # the original exception.

File /usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:388, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash)
    386 if not os.path.isfile(resolved_file):
    387 if _raise_exceptions_for_missing_entries:
--> 388 raise EnvironmentError(
    389 f"{path_or_repo_id} does not appear to have a file named {full_filename}. Checkout "
    390 f"'https://huggingface.co/{path_or_repo_id}/{revision}' for available files."
    391 )
    392 else:
    393 return None

OSError: llama-2-7b-enhanced does not appear to have a file named config.json. Checkout 'https://huggingface.co/llama-2-7b-enhanced/None' for available files.
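
The `OSError` above arises because `new_model_name` points at the directory written by `fine_tuning.model.save_pretrained(new_model_name)`. For a PEFT-wrapped model, that call stores only the adapter (`adapter_config.json` plus the adapter weights) and no `config.json`, so `pipeline` cannot build a model from the path alone. One possible workaround, sketched below, is to pass the merged `model` and the reloaded `tokenizer` objects from step 5 to `pipeline` instead of a path. The helper names `checkpoint_kind` and `make_text_generator` are hypothetical and not part of the original post.

```python
from pathlib import Path


def checkpoint_kind(path):
    """Report whether a directory holds a full model or only a PEFT adapter."""
    p = Path(path)
    if (p / "config.json").is_file():
        return "full model"
    if (p / "adapter_config.json").is_file():
        return "adapter only"
    return "unknown"


def make_text_generator(model, tokenizer, max_length=200):
    """Build a text-generation pipeline from in-memory objects, not a path."""
    # Imported lazily so checkpoint_kind works without transformers installed.
    from transformers import pipeline

    return pipeline(
        task="text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=max_length,
    )


# Usage, assuming `new_model_name`, `query`, and the merged `model` and
# `tokenizer` from step 5 are in scope:
# print(checkpoint_kind(new_model_name))
# text_gen = make_text_generator(model, tokenizer)
# output = text_gen(f"<s>[INST] {query} [/INST]")
# print(output[0]["generated_text"])
```

Alternatively, calling `save_pretrained` on the merged model and the tokenizer with a new directory writes a `config.json` alongside the weights, after which that directory path can be passed to `pipeline` directly.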
