
[Notes] Fine-tuning Llama3 on Chinese data in Ubuntu and loading the fine-tuned model: an introduction to Chinese fine-tuning datasets, how to load the fine-tuned model locally with Ollama and LM Studio, and how to install, use, and uninstall Ollama

error: could not connect to ollama app, is it running?

Hands-on: about Ollama

Installation

curl -fsSL https://ollama.com/install.sh | sh

Deployment

ollama create example -f Modelfile

Run

ollama run example

Stop (when stopped, the large model loaded by Ollama releases its VRAM, but Ollama itself becomes unreachable: create and run operations fail and report the error

Error: could not connect to ollama app, is it running?

The service has to be started again before you can create or run models.)

systemctl stop ollama.service

Start after stopping (once the service is running again, you can continue using Ollama to deploy and run large models)

systemctl start ollama.service
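If you are unsure whether the service is currently up (and want to avoid the "could not connect" error above), a quick way to check is to probe Ollama's local HTTP endpoint before running create or run. This is a minimal sketch, assuming the default port 11434 and the third-party requests package (pip install requests), neither of which is mentioned in the original article:

import requests  # third-party package: pip install requests

def ollama_is_running(base_url="http://localhost:11434"):
    """Return True if the local Ollama server answers on its default port."""
    try:
        return requests.get(base_url, timeout=2).status_code == 200
    except requests.exceptions.ConnectionError:
        return False

if __name__ == "__main__":
    if ollama_is_running():
        print("Ollama is up; create/run commands should work.")
    else:
        print("Ollama is not reachable; start it with: systemctl start ollama.service")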

Modelfile contents:

FROM /home/wangbin/Desktop/Llama3/dir-unsloth.F16.gguf
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
TEMPLATE """
<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.8
PARAMETER num_ctx 8192
PARAMETER stop "<|system|>"
PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
SYSTEM """You are a helpful, smart, kind, and efficient AI assistant. Your name is Aila. You always fulfill the user's requests to the best of your ability."""
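After ollama create example -f Modelfile and ollama run example, the deployed model can also be queried programmatically instead of through the interactive prompt. A minimal sketch, assuming the official ollama Python client is installed (pip install ollama) and the model was created under the name example as above; the client package is an assumption, not something used in the original article:

import ollama  # official Python client for the local Ollama server

# Ask the fine-tuned model a question; the SYSTEM prompt and TEMPLATE from the
# Modelfile are applied by Ollama automatically.
response = ollama.chat(
    model="example",
    messages=[{"role": "user", "content": "用中文介绍一下你自己"}],  # "Introduce yourself in Chinese"
)
print(response["message"]["content"])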

Ollama command-line reference:

(unsloth_env) wangbin@wangbin-LEGION-REN9000K-34IRZ:~/Desktop/Llama3$ ollama
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Uninstall

1. Stop the Ollama service
First things first, we need to stop the Ollama service from running. This ensures a smooth uninstallation process. Open your terminal and enter the following command:
sudo systemctl stop ollama
This command halts the Ollama service.

2. Disable the Ollama service
Now that the service is stopped, we need to disable it so that it doesn't start up again upon system reboot. Enter the following command:
sudo systemctl disable ollama
This ensures that Ollama won't automatically start up in the future.

3. Remove the service file
We need to tidy up by removing the service file associated with Ollama. Enter the following command:
sudo rm /etc/systemd/system/ollama.service
This deletes the service file from your system.

4. Delete the Ollama binary
Next up, we'll remove the Ollama binary itself. Enter the following command:
sudo rm $(which ollama)
This command removes the binary from your bin directory.

5. Remove downloaded models and the Ollama user
Lastly, we'll clean up any remaining bits and pieces. Enter the following commands one by one:
sudo rm -r /usr/share/ollama
sudo userdel ollama
sudo groupdel ollama
These commands delete any downloaded models and remove the Ollama user and group from your system.

Main article:

Cleaning the PDF:

# Clean text extracted from a PDF
import PyPDF2
import re

def clean_extracted_text(text):
    """Clean and preprocess extracted text."""
    # Remove chapter titles and sections
    text = re.sub(r'^(Introduction|Chapter \d+:|What is|Examples:|Chapter \d+)', '', text, flags=re.MULTILINE)
    text = re.sub(r'ctitious', 'fictitious', text)
    text = re.sub(r'ISBN[- ]13: \d{13}', '', text)
    text = re.sub(r'ISBN[- ]10: \d{10}', '', text)
    text = re.sub(r'Library of Congress Control Number : \d+', '', text)
    text = re.sub(r'(\.|\?|\!)(\S)', r'\1 \2', text)  # Ensure space after punctuation
    text = re.sub(r'All rights reserved|Copyright \d{4}', '', text)
    text = re.sub(r'\n\s*\n', '\n', text)
    text = re.sub(r'[^\x00-\x7F]+', ' ', text)
    text = re.sub(r'\s{2,}', ' ', text)
    # Remove all newlines and re-insert newlines only after periods
    text = text.replace('\n', ' ')
    text = re.sub(r'(\.)(\s)', r'\1\n', text)
    return text

def extract_text_from_pdf(pdf_path):
    """Extract text from a PDF file."""
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = ''
        for page in reader.pages:
            if page.extract_text():
                text += page.extract_text() + ' '  # Append text of each page
        return text

def main():
    pdf_path = '/Users/charlesqin/Documents/The Art of Asking ChatGPT.pdf'  # Path to your PDF file
    extracted_text = extract_text_from_pdf(pdf_path)
    cleaned_text = clean_extracted_text(extracted_text)
    # Output the cleaned text to a file
    with open('cleaned_text_output.txt', 'w', encoding='utf-8') as file:
        file.write(cleaned_text)

if __name__ == '__main__':
    main()
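The fine-tuning code below expects records with instruction / input / output fields (the format of alpaca_gpt4_data_zh.json). If you wanted to turn the cleaned PDF text into your own dataset in that same format, a sketch along the following lines could be a starting point; the instruction text, the one-sentence-per-record split, and the empty output field are all assumptions for illustration and are not part of the original article:

import json

# Hypothetical conversion: one cleaned sentence per training record.
with open('cleaned_text_output.txt', 'r', encoding='utf-8') as f:
    sentences = [line.strip() for line in f if line.strip()]

records = [
    {
        "instruction": "Summarize the following passage in one sentence.",  # assumed instruction
        "input": sentence,
        "output": "",  # outputs would still need to be written or generated separately
    }
    for sentence in sentences
]

with open('my_dataset.json', 'w', encoding='utf-8') as f:
    json.dump(records, f, ensure_ascii=False, indent=2)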

Fine-tuning code:

from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",
    "unsloth/gemma-7b-it-bnb-4bit",  # Instruct version of Gemma 7b
    "unsloth/gemma-2b-bnb-4bit",
    "unsloth/gemma-2b-it-bnb-4bit",  # Instruct version of Gemma 2b
    "unsloth/llama-3-8b-bnb-4bit",   # [NEW] 15 Trillion token Llama-3
]  # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,  # Supports any, but = 0 is optimized
    bias = "none",     # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,   # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

from datasets import load_dataset
file_path = "/home/Ubuntu/alpaca_gpt4_data_zh.json"
dataset = load_dataset("json", data_files={"train": file_path}, split="train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,  # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

trainer_stats = trainer.train()

model.save_pretrained_gguf("dir", tokenizer, quantization_method = "q4_k_m")
model.save_pretrained_gguf("dir", tokenizer, quantization_method = "q8_0")
model.save_pretrained_gguf("dir", tokenizer, quantization_method = "f16")
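Before exporting to GGUF, it can be worth sanity-checking the LoRA-tuned model directly in Python. A minimal sketch, assuming it is appended to the training script above (it reuses model, tokenizer, and alpaca_prompt) and following unsloth's documented inference pattern; the example question is an assumption:

# Switch unsloth into its faster inference mode
FastLanguageModel.for_inference(model)

# Build a prompt in the same Alpaca format used for training; the response slot stays empty.
prompt = alpaca_prompt.format("用中文解释一下什么是大语言模型。", "", "")  # "Explain what a large language model is, in Chinese."
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])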

Ollama:

LM Studio:

We ask the fine-tuned Llama3 model a question:

Then we ask the Llama3 model that has not been fine-tuned the same question:
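To reproduce this comparison outside the GUI, both models can be given the same prompt through Ollama's local REST API. A minimal sketch, assuming the fine-tuned model was created as example (see the Modelfile above) and that a base model is available locally, e.g. after ollama pull llama3; the model names and the prompt are assumptions for illustration:

import requests

PROMPT = "用中文介绍一下大语言模型的微调。"  # "Explain fine-tuning of large language models, in Chinese."

def ask(model, prompt):
    """Send one non-streaming generate request to the local Ollama server."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# "example" is the fine-tuned model from the Modelfile; "llama3" is an assumed base model.
for model in ("example", "llama3"):
    print(f"=== {model} ===")
    print(ask(model, PROMPT))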

Reference link:https://www.youtube.com/watch?v=oxTVzGwKeoU
