conda create --name llama_factory python=3.11
conda activate llama_factory
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip3 install -e ".[torch,metrics]"
# To enable quantized LoRA (QLoRA) on Windows, install the pre-built bitsandbytes wheel
pip3 install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl
# Install PyTorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# If startup fails with ImportError: cannot import name 'get_full_repo_name' from 'huggingface_hub', install chardet
pip3 install chardet
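After installing, it is worth verifying that PyTorch actually sees the GPU before launching anything. A minimal sanity check (run inside the llama_factory environment; purely optional) is:

```python
# Quick sanity check for the PyTorch + CUDA install (run inside the llama_factory env)
import torch

print(torch.__version__)                 # installed PyTorch version
print(torch.cuda.is_available())         # should print True if the CUDA 11.8 build is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0)) # name of the first GPU
```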
Run python src/webui.py to launch the web UI, then proceed as follows:
(1) Prepare a sample dataset, for example (a small helper script for writing such records is sketched after the example):
{
"db_id": "department_management",
"instruction": "I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.\n\"\n##Instruction:\ndepartment_management contains tables such as department, head, management. Table department has columns such as Department_ID, Name, Creation, Ranking, Budget_in_Billions, Num_Employees. Department_ID is the primary key.\nTable head has columns such as head_ID, name, born_state, age. head_ID is the primary key.\nTable management has columns such as department_ID, head_ID, temporary_acting. department_ID is the primary key.\nThe head_ID of management is the foreign key of head_ID of head.\nThe department_ID of management is the foreign key of Department_ID of department.\n\n",
"input": "###Input:\nHow many heads of the departments are older than 56 ?\n\n###Response:",
"output": "SELECT count(*) FROM head WHERE age > 56",
"history": []
}
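Every training record follows the same alpaca-style layout. A minimal sketch for writing such records into text2sql_train.json (the record content below is a placeholder copied from the example, not real data) could look like this:

```python
# Sketch: write alpaca-style text2sql samples into a JSON file for LLaMA-Factory.
# Field names match the example above; the sample content is a placeholder.
import json

samples = [
    {
        "db_id": "department_management",
        "instruction": "I want you to act as a SQL terminal ... (schema description)",
        "input": "###Input:\nHow many heads of the departments are older than 56 ?\n\n###Response:",
        "output": "SELECT count(*) FROM head WHERE age > 56",
        "history": []
    }
]

with open("text2sql_train.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```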
(2) Register the dataset. Copy the dataset JSON files into the LLaMA-Factory/data directory and add the following entries to dataset_info.json:
"text2sql_train": { "file_name": "text2sql_train.json", "columns": { "prompt": "instruction", "query": "input", "response": "output", "history": "history" } }, "text2sql_dev": { "file_name": "text2sql_dev.json", "columns": { "prompt": "instruction", "query": "input", "response": "output", "history": "history" } }
(3) Configure the parameters on the LLaMA-Factory web page. The previewed command looks like the following (adjust the parameters to your own setup):
llamafactory-cli train `
    --stage sft `
    --do_train True `
    --model_name_or_path D:\\LLM\\Qwen2-1.5B-Instruct `
    --preprocessing_num_workers 16 `
    --finetuning_type lora `
    --quantization_method bitsandbytes `
    --template qwen `
    --flash_attn auto `
    --dataset_dir D:\\python_project\\LLaMA-Factory\\data `
    --dataset text2sql_train `
    --cutoff_len 1024 `
    --learning_rate 0.0001 `
    --num_train_epochs 2.0 `
    --max_samples 100000 `
    --per_device_train_batch_size 2 `
    --gradient_accumulation_steps 4 `
    --lr_scheduler_type cosine `
    --max_grad_norm 1.0 `
    --logging_steps 20 `
    --save_steps 500 `
    --warmup_steps 0 `
    --optim adamw_torch `
    --packing False `
    --report_to none `
    --output_dir saves\Qwen2-1.5B-Chat\lora\train_2024-07-19-19-45-59 `
    --bf16 True `
    --plot_loss True `
    --ddp_timeout 180000000 `
    --include_num_input_tokens_seen True `
    --lora_rank 8 `
    --lora_alpha 16 `
    --lora_dropout 0 `
    --lora_target all
(4) After fine-tuning, run inference evaluation on the test set:
{
"predict_bleu-4": 88.43791015473889,
"predict_rouge-1": 92.31425483558995,
"predict_rouge-2": 85.43010570599614,
"predict_rouge-l": 89.06327794970986,
"predict_runtime": 1027.4111,
"predict_samples_per_second": 1.006,
"predict_steps_per_second": 0.503
}
Judging from these metrics, the model performs quite well; you can set the various parameters yourself based on your own sample data. The evaluation metrics are interpreted as follows:
1. predict_bleu-4:
   * BLEU (Bilingual Evaluation Understudy) is a common metric for assessing machine-translation quality.
   * BLEU-4 is the 4-gram BLEU score; it measures the n-gram overlap (n = 4) between the generated text and the reference text.
   * Higher values mean the generated text is closer to the reference; the maximum is 100.
2. predict_rouge-1 and predict_rouge-2:
   * ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a metric for evaluating automatic summarization and text-generation models.
   * ROUGE-1 is the unigram ROUGE score and ROUGE-2 the bigram ROUGE score; they measure how well single words and two-word sequences in the generated text match the reference.
   * Higher values mean the generated text is closer to the reference; the maximum is 100.
3. predict_rouge-l:
   * ROUGE-L measures the overlap between the generated text and the reference text in terms of their longest common subsequence (see the sketch after this list).
   * Higher values mean the generated text is closer to the reference; the maximum is 100.
4. predict_runtime:
   * Total prediction time, i.e. how long the model took to generate the whole batch of samples, usually in seconds.
5. predict_samples_per_second:
   * Number of samples generated per second; commonly used to gauge inference speed.
6. predict_steps_per_second:
   * Number of steps executed per second; for generative models this is generally the number of generation steps per second.
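To make the ROUGE-L definition concrete, here is a minimal, self-contained sketch that scores one prediction against one reference via the longest common subsequence. It uses simple whitespace tokenization and is for illustration only; it is not the exact scorer LLaMA-Factory uses:

```python
# Minimal ROUGE-L (F1) sketch based on the longest common subsequence (LCS).
# Illustration only; LLaMA-Factory's evaluation uses its own tokenization and scorer.
def lcs_len(a, b):
    # classic dynamic-programming LCS length
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(prediction, reference):
    pred, ref = prediction.split(), reference.split()
    lcs = lcs_len(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall) * 100  # scaled to 0-100

print(rouge_l("SELECT count(*) FROM head WHERE age > 56",
              "SELECT count(*) FROM head WHERE age > 56"))  # 100.0 for an exact match
```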
(5) If the evaluation meets your fine-tuning expectations, you can export the fine-tuned model directly (make sure to select the checkpoint produced by fine-tuning):
The export above produces a safetensors model; we can use llama.cpp to convert it to a GGUF model. For building llama.cpp on Windows, the w64devkit + make approach is recommended.
Download: https://gnuwin32.sourceforge.net/packages/make.htm
Click the download link, install it, and add the install path to the PATH environment variable. Running the following command in a terminal should then print the Make version information:
(llama_factory) D:\LLM\llama.cpp>make -v
GNU Make 3.81
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
This program built for i386-pc-mingw32
Download w64devkit and add its mingw64/bin directory to the PATH environment variable.
Download: https://github.com/skeeto/w64devkit/releases
Download w64devkit-fortran-1.23.0.zip and extract it. Note: the extraction path must not contain any Chinese characters.
Then download the llama.cpp source.
Download: https://github.com/ggerganov/llama.cpp
Open the llama.cpp directory in the w64devkit.exe shell we just downloaded and run make; once it succeeds you will get a set of exe files.
~ $ cd D:
D:/ $ cd /LLM/llama.cpp
D:/LLM/llama.cpp $ make
After make succeeds, install the Python dependencies:
conda create --name llama_cpp python=3.11
conda activate llama_cpp
pip3 install -r requirements.txt
# model_path/mymodel is the actual path of our model
python convert_hf_to_gguf.py model_path/mymodel
Run log:
(llama_cpp) D:\LLM\llama.cpp>python convert_hf_to_gguf.py D:/python_project/LLaMA-Factory/output_model/model2 INFO:hf-to-gguf:Loading model: model2 INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only INFO:hf-to-gguf:Exporting model... INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json' INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00002.safetensors' INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {1536, 151936} INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.0.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.0.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.0.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.1.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.1.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.1.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.10.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.10.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.10.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.10.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.10.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.10.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} 
INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.11.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.11.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.11.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.11.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.11.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.11.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.11.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.12.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.12.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.12.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.12.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.12.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.12.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.13.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.13.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.13.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.13.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.13.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.13.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.13.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.14.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.14.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.14.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.14.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 
1536} INFO:hf-to-gguf:blk.14.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.14.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.15.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.15.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.15.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.15.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.15.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.15.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.15.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.16.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.16.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.16.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.16.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.16.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.16.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.16.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.2.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.2.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.2.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.3.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.3.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.3.attn_v.bias, torch.bfloat16 --> F32, shape = {256} 
INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.4.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.4.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.4.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.5.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.5.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.5.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.6.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.6.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.6.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.7.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} 
INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.7.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.7.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.8.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.8.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.8.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.8.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.8.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.8.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.9.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.9.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.9.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.9.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.9.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.9.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.9.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:gguf: loading model part 'model-00002-of-00002.safetensors' INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.16.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.16.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.16.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.17.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.17.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.17.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.17.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.17.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.17.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.17.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} 
INFO:hf-to-gguf:blk.17.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.17.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.17.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.18.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.18.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.18.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.18.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.18.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.18.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.18.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.18.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.18.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.18.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.19.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.19.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.19.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.19.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.19.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.19.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.19.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.19.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.19.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.19.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.20.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.20.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.20.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.20.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.20.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.20.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.20.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.20.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.20.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.20.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.21.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.21.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.21.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.bfloat16 --> F32, shape = 
{1536} INFO:hf-to-gguf:blk.21.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.21.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.21.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.21.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.21.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.21.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.21.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.22.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.22.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.22.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.22.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.22.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.22.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.22.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.22.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.22.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.22.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.23.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.23.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.23.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.23.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.23.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.23.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.23.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.23.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.23.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.23.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.24.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.24.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.24.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.24.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.24.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.24.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.24.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.24.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.24.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.24.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.24.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.24.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.25.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} 
INFO:hf-to-gguf:blk.25.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.25.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.25.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.25.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.25.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.25.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.25.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.25.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.25.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.25.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.25.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.26.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.26.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.26.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.26.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.26.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.26.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.26.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.26.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.26.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.26.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.26.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.26.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.27.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.27.ffn_down.weight, torch.bfloat16 --> F16, shape = {8960, 1536} INFO:hf-to-gguf:blk.27.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.27.ffn_up.weight, torch.bfloat16 --> F16, shape = {1536, 8960} INFO:hf-to-gguf:blk.27.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.27.attn_k.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.27.attn_k.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:blk.27.attn_output.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.27.attn_q.bias, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:blk.27.attn_q.weight, torch.bfloat16 --> F16, shape = {1536, 1536} INFO:hf-to-gguf:blk.27.attn_v.bias, torch.bfloat16 --> F32, shape = {256} INFO:hf-to-gguf:blk.27.attn_v.weight, torch.bfloat16 --> F16, shape = {1536, 256} INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {1536} INFO:hf-to-gguf:Set meta model INFO:hf-to-gguf:Set model parameters INFO:hf-to-gguf:gguf: context length = 32768 INFO:hf-to-gguf:gguf: embedding length = 1536 INFO:hf-to-gguf:gguf: feed forward length = 8960 INFO:hf-to-gguf:gguf: head count = 12 INFO:hf-to-gguf:gguf: key-value head count = 2 INFO:hf-to-gguf:gguf: rope theta = 1000000.0 INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06 INFO:hf-to-gguf:gguf: file type = 1 INFO:hf-to-gguf:Set model tokenizer Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. INFO:gguf.vocab:Adding 151387 merge(s). 
INFO:gguf.vocab:Setting special token type eos to 151645 INFO:gguf.vocab:Setting special token type pad to 151643 INFO:gguf.vocab:Setting special token type bos to 151643 INFO:gguf.vocab:Setting chat_template to {% set system_message = 'You are a helpful assistant.' %}{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% if system_message is defined %}{{ '<|im_start|>system ' + system_message + '<|im_end|> ' }}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|im_start|>user ' + content + '<|im_end|> <|im_start|>assistant ' }}{% elif message['role'] == 'assistant' %}{{ content + '<|im_end|>' + ' ' }}{% endif %}{% endfor %} INFO:hf-to-gguf:Set model quantization version INFO:gguf.gguf_writer:Writing the following files: INFO:gguf.gguf_writer:D:\Llm\Qwen2-1.5B-Instruct-F16.gguf: n_tensors = 338, total_size = 3.1G Writing: 100%|██████████████████████████████████████████████████████████████████| 3.09G/3.09G [00:14<00:00, 218Mbyte/s] INFO:hf-to-gguf:Model successfully exported to D:\Llm\Qwen2-1.5B-Instruct-F16.gguf
# model_path/mymodel-F16.gguf is the FP16 model we just obtained, model_path/mymodel_Q4_k_M.gguf is the quantized model to produce, and Q4_K_M selects the Q4_K_M quantization method
llama-quantize.exe model_path/mymodel-F16.gguf model_path/mymodel_Q4_k_M.gguf Q4_K_M
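After quantization you may want to sanity-check the GGUF file. One option (not part of the original workflow, and it ignores the Qwen chat template for brevity) is the llama-cpp-python binding, installed with pip install llama-cpp-python; a minimal sketch, assuming the output path from the log below, is:

```python
# Sketch: load the quantized GGUF with llama-cpp-python and run one raw prompt.
# The model path is an example; for proper chat behavior the Qwen chat template should be applied.
from llama_cpp import Llama

llm = Llama(model_path="D:/Llm/Qwen2-1.5B-Instruct_Q4_k_M.gguf", n_ctx=2048)
out = llm("How many heads of the departments are older than 56 ?", max_tokens=64)
print(out["choices"][0]["text"])
```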
Run log:
(llama_cpp) D:\LLM\llama.cpp>llama-quantize.exe D:/Llm/Qwen2-1.5B-Instruct-F16.gguf D:/Llm/Qwen2-1.5B-Instruct_Q4_k_M.gguf Q4_K_M main: build = 0 (unknown) main: built with cc (GCC) 14.1.0 for x86_64-w64-mingw32 main: quantizing 'D:/Llm/Qwen2-1.5B-Instruct-F16.gguf' to 'D:/Llm/Qwen2-1.5B-Instruct_Q4_k_M.gguf' as Q4_K_M llama_model_loader: loaded meta data with 25 key-value pairs and 338 tensors from D:/Llm/Qwen2-1.5B-Instruct-F16.gguf (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = qwen2 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = D:\\LLM\\Qwen2 1.5B Instruct llama_model_loader: - kv 3: general.finetune str = Instruct llama_model_loader: - kv 4: general.basename str = D:\\LLM\\Qwen2 llama_model_loader: - kv 5: general.size_label str = 1.5B llama_model_loader: - kv 6: qwen2.block_count u32 = 28 llama_model_loader: - kv 7: qwen2.context_length u32 = 32768 llama_model_loader: - kv 8: qwen2.embedding_length u32 = 1536 llama_model_loader: - kv 9: qwen2.feed_forward_length u32 = 8960 llama_model_loader: - kv 10: qwen2.attention.head_count u32 = 12 llama_model_loader: - kv 11: qwen2.attention.head_count_kv u32 = 2 llama_model_loader: - kv 12: qwen2.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 13: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 14: general.file_type u32 = 1 llama_model_loader: - kv 15: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 16: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 17: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 19: tokenizer.ggml.merges arr[str,151387] = ["臓 臓", "臓臓 臓臓", "i n", "臓 t",... llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 151645 llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 151643 llama_model_loader: - kv 22: tokenizer.ggml.bos_token_id u32 = 151643 llama_model_loader: - kv 23: tokenizer.chat_template str = {% set system_message = 'You are a he... llama_model_loader: - kv 24: general.quantization_version u32 = 2 llama_model_loader: - type f32: 141 tensors llama_model_loader: - type f16: 197 tensors [ 1/ 338] token_embd.weight - [ 1536, 151936, 1, 1], type = f16, converting to q6_K .. size = 445.12 MiB -> 182.57 MiB [ 2/ 338] blk.0.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 3/ 338] blk.0.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K .. size = 26.25 MiB -> 10.77 MiB [ 4/ 338] blk.0.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 5/ 338] blk.0.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 6/ 338] blk.0.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 7/ 338] blk.0.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 8/ 338] blk.0.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 9/ 338] blk.0.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 10/ 338] blk.0.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 11/ 338] blk.0.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. 
size = 4.50 MiB -> 1.27 MiB [ 12/ 338] blk.0.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 13/ 338] blk.0.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K .. size = 0.75 MiB -> 0.31 MiB [ 14/ 338] blk.1.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 15/ 338] blk.1.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K .. size = 26.25 MiB -> 10.77 MiB [ 16/ 338] blk.1.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 17/ 338] blk.1.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 18/ 338] blk.1.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 19/ 338] blk.1.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 20/ 338] blk.1.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 21/ 338] blk.1.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 22/ 338] blk.1.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 23/ 338] blk.1.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 24/ 338] blk.1.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 25/ 338] blk.1.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K .. size = 0.75 MiB -> 0.31 MiB [ 26/ 338] blk.10.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 27/ 338] blk.10.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K .. size = 26.25 MiB -> 10.77 MiB [ 28/ 338] blk.10.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 29/ 338] blk.10.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 30/ 338] blk.10.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 31/ 338] blk.10.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 32/ 338] blk.10.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 33/ 338] blk.10.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 34/ 338] blk.10.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 35/ 338] blk.10.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 36/ 338] blk.10.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 37/ 338] blk.10.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K .. size = 0.75 MiB -> 0.31 MiB [ 38/ 338] blk.11.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 39/ 338] blk.11.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 40/ 338] blk.11.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 41/ 338] blk.11.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 42/ 338] blk.11.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 43/ 338] blk.11.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 44/ 338] blk.11.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 45/ 338] blk.11.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. 
size = 4.50 MiB -> 1.27 MiB [ 46/ 338] blk.11.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 47/ 338] blk.11.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 48/ 338] blk.11.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 49/ 338] blk.11.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 50/ 338] blk.12.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 51/ 338] blk.12.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 52/ 338] blk.12.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 53/ 338] blk.12.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 54/ 338] blk.12.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 55/ 338] blk.12.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 56/ 338] blk.12.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 57/ 338] blk.12.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 58/ 338] blk.12.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 59/ 338] blk.12.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 60/ 338] blk.12.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 61/ 338] blk.12.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 62/ 338] blk.13.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 63/ 338] blk.13.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K .. size = 26.25 MiB -> 10.77 MiB [ 64/ 338] blk.13.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 65/ 338] blk.13.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 66/ 338] blk.13.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 67/ 338] blk.13.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 68/ 338] blk.13.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 69/ 338] blk.13.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 70/ 338] blk.13.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 71/ 338] blk.13.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 72/ 338] blk.13.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 73/ 338] blk.13.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K .. size = 0.75 MiB -> 0.31 MiB [ 74/ 338] blk.14.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 75/ 338] blk.14.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 76/ 338] blk.14.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 77/ 338] blk.14.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. 
size = 26.25 MiB -> 7.38 MiB [ 78/ 338] blk.14.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 79/ 338] blk.14.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 80/ 338] blk.14.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 81/ 338] blk.14.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 82/ 338] blk.14.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 83/ 338] blk.14.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 84/ 338] blk.14.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 85/ 338] blk.14.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 86/ 338] blk.15.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 87/ 338] blk.15.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 88/ 338] blk.15.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 89/ 338] blk.15.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 90/ 338] blk.15.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 91/ 338] blk.15.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 92/ 338] blk.15.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 93/ 338] blk.15.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 94/ 338] blk.15.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 95/ 338] blk.15.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 96/ 338] blk.15.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 97/ 338] blk.15.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 98/ 338] blk.16.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 99/ 338] blk.16.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 100/ 338] blk.16.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 101/ 338] blk.16.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 102/ 338] blk.16.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 103/ 338] blk.16.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 104/ 338] blk.16.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K .. size = 0.75 MiB -> 0.31 MiB [ 105/ 338] blk.2.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 106/ 338] blk.2.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K .. size = 26.25 MiB -> 10.77 MiB [ 107/ 338] blk.2.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 108/ 338] blk.2.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 109/ 338] blk.2.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 110/ 338] blk.2.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 111/ 338] blk.2.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. 
size = 0.75 MiB -> 0.21 MiB [ 112/ 338] blk.2.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 113/ 338] blk.2.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 114/ 338] blk.2.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 115/ 338] blk.2.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 116/ 338] blk.2.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 117/ 338] blk.3.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 118/ 338] blk.3.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 119/ 338] blk.3.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 120/ 338] blk.3.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 121/ 338] blk.3.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 122/ 338] blk.3.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 123/ 338] blk.3.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 124/ 338] blk.3.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 125/ 338] blk.3.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 126/ 338] blk.3.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 127/ 338] blk.3.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 128/ 338] blk.3.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 129/ 338] blk.4.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 130/ 338] blk.4.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 131/ 338] blk.4.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 132/ 338] blk.4.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 133/ 338] blk.4.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 134/ 338] blk.4.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 135/ 338] blk.4.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 136/ 338] blk.4.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 137/ 338] blk.4.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 138/ 338] blk.4.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 139/ 338] blk.4.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 140/ 338] blk.4.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K .. size = 0.75 MiB -> 0.31 MiB [ 141/ 338] blk.5.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 142/ 338] blk.5.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K .. size = 26.25 MiB -> 10.77 MiB [ 143/ 338] blk.5.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 144/ 338] blk.5.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. 
size = 26.25 MiB -> 7.38 MiB [ 145/ 338] blk.5.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 146/ 338] blk.5.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 147/ 338] blk.5.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 148/ 338] blk.5.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 149/ 338] blk.5.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 150/ 338] blk.5.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 151/ 338] blk.5.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 152/ 338] blk.5.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 153/ 338] blk.6.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 154/ 338] blk.6.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 155/ 338] blk.6.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 156/ 338] blk.6.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 157/ 338] blk.6.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 158/ 338] blk.6.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 159/ 338] blk.6.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 160/ 338] blk.6.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 161/ 338] blk.6.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 162/ 338] blk.6.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 163/ 338] blk.6.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 164/ 338] blk.6.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 165/ 338] blk.7.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 166/ 338] blk.7.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 167/ 338] blk.7.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 168/ 338] blk.7.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 169/ 338] blk.7.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 170/ 338] blk.7.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 171/ 338] blk.7.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 172/ 338] blk.7.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 173/ 338] blk.7.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 174/ 338] blk.7.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 175/ 338] blk.7.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 176/ 338] blk.7.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K .. size = 0.75 MiB -> 0.31 MiB [ 177/ 338] blk.8.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 178/ 338] blk.8.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K .. 
size = 26.25 MiB -> 10.77 MiB [ 179/ 338] blk.8.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 180/ 338] blk.8.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 181/ 338] blk.8.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 182/ 338] blk.8.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 183/ 338] blk.8.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 184/ 338] blk.8.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 185/ 338] blk.8.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 186/ 338] blk.8.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 187/ 338] blk.8.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 188/ 338] blk.8.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 189/ 338] blk.9.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 190/ 338] blk.9.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 191/ 338] blk.9.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 192/ 338] blk.9.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 193/ 338] blk.9.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 194/ 338] blk.9.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 195/ 338] blk.9.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 196/ 338] blk.9.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 197/ 338] blk.9.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 198/ 338] blk.9.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB [ 199/ 338] blk.9.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 200/ 338] blk.9.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB [ 201/ 338] blk.16.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 202/ 338] blk.16.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 203/ 338] blk.16.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 204/ 338] blk.16.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 205/ 338] blk.16.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 206/ 338] blk.17.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 207/ 338] blk.17.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K .. size = 26.25 MiB -> 10.77 MiB [ 208/ 338] blk.17.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 209/ 338] blk.17.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB [ 210/ 338] blk.17.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB [ 211/ 338] blk.17.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB [ 212/ 338] blk.17.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. 
size = 0.75 MiB -> 0.21 MiB
[ 213/ 338] blk.17.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 214/ 338] blk.17.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 215/ 338] blk.17.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 216/ 338] blk.17.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 217/ 338] blk.17.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K .. size = 0.75 MiB -> 0.31 MiB
[ 218/ 338] blk.18.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 219/ 338] blk.18.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 220/ 338] blk.18.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 221/ 338] blk.18.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 222/ 338] blk.18.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 223/ 338] blk.18.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 224/ 338] blk.18.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 225/ 338] blk.18.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 226/ 338] blk.18.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 227/ 338] blk.18.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 228/ 338] blk.18.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 229/ 338] blk.18.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 230/ 338] blk.19.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 231/ 338] blk.19.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 232/ 338] blk.19.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 233/ 338] blk.19.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 234/ 338] blk.19.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 235/ 338] blk.19.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 236/ 338] blk.19.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 237/ 338] blk.19.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 238/ 338] blk.19.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 239/ 338] blk.19.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 240/ 338] blk.19.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 241/ 338] blk.19.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 242/ 338] blk.20.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 243/ 338] blk.20.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K .. size = 26.25 MiB -> 10.77 MiB
[ 244/ 338] blk.20.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 245/ 338] blk.20.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K ..
size = 26.25 MiB -> 7.38 MiB
[ 246/ 338] blk.20.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 247/ 338] blk.20.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 248/ 338] blk.20.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 249/ 338] blk.20.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 250/ 338] blk.20.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 251/ 338] blk.20.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 252/ 338] blk.20.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 253/ 338] blk.20.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K .. size = 0.75 MiB -> 0.31 MiB
[ 254/ 338] blk.21.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 255/ 338] blk.21.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 256/ 338] blk.21.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 257/ 338] blk.21.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 258/ 338] blk.21.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 259/ 338] blk.21.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 260/ 338] blk.21.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 261/ 338] blk.21.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 262/ 338] blk.21.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 263/ 338] blk.21.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 264/ 338] blk.21.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 265/ 338] blk.21.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 266/ 338] blk.22.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 267/ 338] blk.22.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 268/ 338] blk.22.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 269/ 338] blk.22.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 270/ 338] blk.22.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 271/ 338] blk.22.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 272/ 338] blk.22.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 273/ 338] blk.22.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 274/ 338] blk.22.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 275/ 338] blk.22.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 276/ 338] blk.22.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 277/ 338] blk.22.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 278/ 338] blk.23.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 279/ 338] blk.23.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K ..
size = 26.25 MiB -> 10.77 MiB
[ 280/ 338] blk.23.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 281/ 338] blk.23.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 282/ 338] blk.23.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 283/ 338] blk.23.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 284/ 338] blk.23.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 285/ 338] blk.23.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 286/ 338] blk.23.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 287/ 338] blk.23.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 288/ 338] blk.23.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 289/ 338] blk.23.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K .. size = 0.75 MiB -> 0.31 MiB
[ 290/ 338] blk.24.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 291/ 338] blk.24.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K .. size = 26.25 MiB -> 10.77 MiB
[ 292/ 338] blk.24.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 293/ 338] blk.24.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 294/ 338] blk.24.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 295/ 338] blk.24.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 296/ 338] blk.24.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 297/ 338] blk.24.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 298/ 338] blk.24.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 299/ 338] blk.24.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 300/ 338] blk.24.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 301/ 338] blk.24.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K .. size = 0.75 MiB -> 0.31 MiB
[ 302/ 338] blk.25.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 303/ 338] blk.25.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K .. size = 26.25 MiB -> 10.77 MiB
[ 304/ 338] blk.25.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 305/ 338] blk.25.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 306/ 338] blk.25.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 307/ 338] blk.25.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 308/ 338] blk.25.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 309/ 338] blk.25.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 310/ 338] blk.25.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 311/ 338] blk.25.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 312/ 338] blk.25.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 313/ 338] blk.25.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K ..
size = 0.75 MiB -> 0.31 MiB
[ 314/ 338] blk.26.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 315/ 338] blk.26.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K .. size = 26.25 MiB -> 10.77 MiB
[ 316/ 338] blk.26.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 317/ 338] blk.26.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 318/ 338] blk.26.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 319/ 338] blk.26.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 320/ 338] blk.26.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 321/ 338] blk.26.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 322/ 338] blk.26.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 323/ 338] blk.26.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 324/ 338] blk.26.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 325/ 338] blk.26.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K .. size = 0.75 MiB -> 0.31 MiB
[ 326/ 338] blk.27.attn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 327/ 338] blk.27.ffn_down.weight - [ 8960, 1536, 1, 1], type = f16, converting to q6_K .. size = 26.25 MiB -> 10.77 MiB
[ 328/ 338] blk.27.ffn_gate.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 329/ 338] blk.27.ffn_up.weight - [ 1536, 8960, 1, 1], type = f16, converting to q4_K .. size = 26.25 MiB -> 7.38 MiB
[ 330/ 338] blk.27.ffn_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 331/ 338] blk.27.attn_k.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 332/ 338] blk.27.attn_k.weight - [ 1536, 256, 1, 1], type = f16, converting to q4_K .. size = 0.75 MiB -> 0.21 MiB
[ 333/ 338] blk.27.attn_output.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 334/ 338] blk.27.attn_q.bias - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 335/ 338] blk.27.attn_q.weight - [ 1536, 1536, 1, 1], type = f16, converting to q4_K .. size = 4.50 MiB -> 1.27 MiB
[ 336/ 338] blk.27.attn_v.bias - [ 256, 1, 1, 1], type = f32, size = 0.001 MB
[ 337/ 338] blk.27.attn_v.weight - [ 1536, 256, 1, 1], type = f16, converting to q6_K .. size = 0.75 MiB -> 0.31 MiB
[ 338/ 338] output_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
llama_model_quantize_internal: model size = 2944.68 MB
llama_model_quantize_internal: quant size = 934.69 MB
main: quantize time = 14228.85 ms
main: total time = 14228.85 ms
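For reference, a log like the one above is normally produced in three steps: merge the LoRA adapter into the base model, convert the merged checkpoint to an f16 GGUF file, and then quantize it with llama.cpp. The exact commands are not shown in this excerpt, so the lines below are only a sketch with placeholder paths and output names (the quantize binary may be named quantize or llama-quantize depending on the llama.cpp build):
# 1) Merge the LoRA adapter into the base model with LLaMA-Factory (placeholder paths)
llamafactory-cli export --model_name_or_path D:\LLM\Qwen2-1.5B-Instruct --adapter_name_or_path saves\Qwen2-1.5B-Chat\lora\train_2024-07-19-19-45-59 --template qwen --finetuning_type lora --export_dir D:\LLM\Qwen2-1.5B-Instruct-sft
# 2) Convert the merged checkpoint to an f16 GGUF file (run inside the llama.cpp repository)
python convert_hf_to_gguf.py D:\LLM\Qwen2-1.5B-Instruct-sft --outtype f16 --outfile Qwen2-1.5B-Instruct-f16.gguf
# 3) Quantize to Q4_K_M; this step prints the per-tensor log shown above
llama-quantize Qwen2-1.5B-Instruct-f16.gguf Qwen2-1.5B-Instruct_Q4_k_M.gguf Q4_K_M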
Finally, we obtain the quantized model Qwen2-1.5B-Instruct_Q4_k_M.gguf. According to the summary at the end of the log, quantization shrinks the model from 2944.68 MB (f16) to 934.69 MB. Note that Q4_K_M is a mixed-precision scheme: as the per-tensor log shows, most weights are converted to q4_K, while the ffn_down and attn_v tensors of some layers are kept at the higher-precision q6_K.
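To sanity-check the quantized model, the GGUF file can be loaded directly with llama.cpp or registered with Ollama. The commands below are only an illustrative example: the model name qwen2-text2sql and the test prompt are placeholders, and the llama.cpp chat binary may be named main or llama-cli depending on the build:
# Quick interactive test with llama.cpp
llama-cli -m Qwen2-1.5B-Instruct_Q4_k_M.gguf -p "your test prompt here" -n 128
# Or serve it through Ollama: write a Modelfile whose only line is
# FROM ./Qwen2-1.5B-Instruct_Q4_k_M.gguf
ollama create qwen2-text2sql -f Modelfile
ollama run qwen2-text2sql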