
Problems encountered while fine-tuning alpaca-lora

Contents

1. Environment

2. Dtype mismatch errors between tensors in mixed-precision training

3. Inference error when loading the LoRA adapter: SafetensorError: Error while deserializing header: InvalidHeaderDeserialization

4. peft (0.9.0) save_pretrained does not save adapter_model.bin

5. Annotated code

6. Q&A

  6.1 Why is prepare_model_for_int8_training applied after the model has already been loaded with load_in_8bit?

  6.2 Compatibility of int8/int4 training and deepspeed with V100 GPUs (to be verified)


1. Environment

Environment:

        OS: Ubuntu

        torch: 2.2.1

        python: 3.10

        GPU: V100 16 GB

        peft: 0.9.0

The model llama-2-7b-hf is fine-tuned with the LoRA method from PEFT; project: alpaca-lora

2. Dtype mismatch errors between tensors in mixed-precision training

Error message:

Fix:

Modify finetune.py as follows:

# Before
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
# After
with torch.autocast("cuda"):  # add this line so tensor dtypes are converted automatically
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
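
For reference, a minimal sketch (not from the original post) of the kind of dtype mismatch that torch.autocast resolves, assuming a CUDA device is available:

import torch

a = torch.randn(4, 4, device="cuda", dtype=torch.float16)  # e.g. an fp16 model weight
b = torch.randn(4, 4, device="cuda", dtype=torch.float32)  # e.g. an fp32 activation

# a @ b  # raises a RuntimeError because the two dtypes differ

with torch.autocast("cuda"):
    c = a @ b  # inside autocast both operands are cast to a common dtype
print(c.dtype)  # torch.float16

Wrapping trainer.train() in the same context manager applies this automatic casting to every eligible operation in the training step.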

3. Inference error when loading the LoRA adapter: SafetensorError: Error while deserializing header: InvalidHeaderDeserialization

  • peft (0.9.0) save_pretrained() saves only the LoRA weights (the base model weights are not saved)
  • Hugging Face blog: using PEFT
  • Hugging Face docs: PEFT parameters

When loading the LoRA weights at inference time, the following error is raised:

SafetensorError: Error while deserializing header: InvalidHeaderDeserialization

Delete the following code from finetune.py:

# Reason for deletion: this code replaces the model's state_dict method so that only the LoRA
# weights are returned. Newer peft versions already save only the LoRA weights (not the base
# model weights), so it is no longer needed.
old_state_dict = model.state_dict
model.state_dict = (
    lambda self, *_, **__: get_peft_model_state_dict(
        self, old_state_dict()
    )
).__get__(model, type(model))
# Reason for deletion: torch.compile and peft 0.9.0 currently appear to be incompatible; with
# this enabled the saved LoRA weight file is an empty dict, and loading it at inference fails.
if torch.__version__ >= "2" and sys.platform != "win32":
    model = torch.compile(model)
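
For context, the error appears at the point where the saved adapter is attached to the base model for inference, which typically looks like the sketch below (model and adapter paths are placeholders):

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model first, then attach the saved LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "llama-2-7b-hf",            # placeholder: path to the base model
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "./lora-alpaca")  # reads adapter_config.json and the adapter weights

If the adapter file was saved as an empty dict (the torch.compile issue described above), this from_pretrained call is where the error surfaces.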

4. peft (0.9.0) save_pretrained does not save adapter_model.bin

Saved LoRA weight files:

Modify finetune.py to pass safe_serialization=False:

# Before
model.save_pretrained("save_dir")
# After
model.save_pretrained("save_dir", safe_serialization=False)
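
If an adapter has already been saved in safetensors format and the .bin file is needed, the weights can be converted with the safetensors library; a sketch (paths are placeholders):

import torch
from safetensors.torch import load_file

# Read the safetensors adapter and re-save it in the classic PyTorch format.
state_dict = load_file("lora-alpaca/adapter_model.safetensors")  # placeholder path
print(len(state_dict))  # an empty dict here would reproduce the loading error from section 3
torch.save(state_dict, "lora-alpaca/adapter_model.bin")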

5. Annotated code

# Imports
import os
import sys
from typing import List
import fire
import torch
import torch.distributed
import transformers
from datasets import load_dataset
from typing import List, Optional, Union
"""
Unused imports:
import torch.nn as nn
import bitsandbytes as bnb
"""
# Imports from the peft framework
from peft import (  # noqa: E402
    LoraConfig,
    # BottleneckConfig,  # TODO modified from the official code
    get_peft_model,
    get_peft_model_state_dict,
    # TODO modified from the official code
    prepare_model_for_kbit_training,
    set_peft_model_state_dict,
)
# Libraries needed to load the LLaMA model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Seq2SeqTrainingArguments,
    set_seed
)  # noqa: F402
# TODO added code
import os
set_seed(42)
os.environ["WANDB_DISABLED"] = "true"  # disable wandb
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # disable tokenizer parallelism to avoid deadlocks

# TODO added code: print each parameter's name, dtype, shape and whether it is trainable
def print_model_allarguments_name_dtype(model):
    for n, v in model.named_parameters():
        if v.requires_grad:
            print(f"trainable model arguments: {n} - {v.dtype} - {v.shape}")
        else:
            print(f"not trainable model arguments: {n} - {v.dtype} - {v.shape}")

def train(
    # model/data params
    base_model: str = "",  # required: path or name of the base model
    data_path: str = "yahma/alpaca-cleaned",  # data path
    output_dir: str = "./lora-alpaca",  # output directory
    adapter_name: str = "lora",  # fine-tune with LoRA or a Bottleneck adapter
    # training hyperparams
    batch_size: int = 128,  # global batch size
    micro_batch_size: int = 4,  # per-device batch size (gradient_accumulation_steps = batch_size // micro_batch_size)
    num_epochs: int = 3,  # number of training epochs
    learning_rate: float = 3e-4,  # learning rate
    cutoff_len: int = 256,  # maximum input length before truncation
    val_set_size: int = 2000,  # validation set size
    use_gradient_checkpointing: bool = False,  # gradient checkpointing (trades compute time for GPU memory)
    eval_step: int = 200,  # evaluate every this many steps
    save_step: int = 200,  # save every this many steps
    # lora hyperparams
    lora_r: int = 8,  # LoRA rank
    lora_alpha: int = 16,  # LoRA alpha
    lora_dropout: float = 0.05,  # LoRA dropout rate
    lora_target_modules: List[str] = None,  # modules LoRA is applied to
    # bottleneck adapter hyperparams
    # TODO modified: this adapter was removed
    bottleneck_size: int = 256,  # bottleneck adapter size
    non_linearity: str = "tanh",  # non-linear activation
    adapter_dropout: float = 0.0,  # adapter dropout rate
    use_parallel_adapter: bool = False,  # whether to use a parallel adapter
    use_adapterp: bool = False,  # whether to use adapterp
    target_modules: List[str] = None,  # modules the adapter is applied to
    scaling: Union[float, str] = 1.0,  # scaling parameter
    # llm hyperparams
    train_on_inputs: bool = True,  # whether to compute loss on the prompt part as well
    group_by_length: bool = False,  # whether to group samples by length
    # wandb params
    wandb_project: str = "",  # wandb project name
    wandb_run_name: str = "",  # wandb run name
    wandb_watch: str = "",  # wandb watch option
    wandb_log_model: str = "",  # whether to log the model to wandb
    resume_from_checkpoint: str = None,  # resume training from a checkpoint
):
    print(
        f"Finetuning model with params:\n"
        f"base_model: {base_model}\n"
        f"data_path: {data_path}\n"
        f"output_dir: {output_dir}\n"
        f"batch_size: {batch_size}\n"
        f"micro_batch_size: {micro_batch_size}\n"
        f"num_epochs: {num_epochs}\n"
        f"learning_rate: {learning_rate}\n"
        f"cutoff_len: {cutoff_len}\n"
        f"val_set_size: {val_set_size}\n"
        f"use_gradient_checkpointing: {use_gradient_checkpointing}\n"
        f"lora_r: {lora_r}\n"
        f"lora_alpha: {lora_alpha}\n"
        f"lora_dropout: {lora_dropout}\n"
        f"lora_target_modules: {lora_target_modules}\n"
        f"bottleneck_size: {bottleneck_size}\n"
        f"non_linearity: {non_linearity}\n"
        f"adapter_dropout: {adapter_dropout}\n"
        f"use_parallel_adapter: {use_parallel_adapter}\n"
        f"use_adapterp: {use_adapterp}\n"
        f"train_on_inputs: {train_on_inputs}\n"
        f"scaling: {scaling}\n"
        f"adapter_name: {adapter_name}\n"
        f"target_modules: {target_modules}\n"
        f"group_by_length: {group_by_length}\n"
        f"wandb_project: {wandb_project}\n"
        f"wandb_run_name: {wandb_run_name}\n"
        f"wandb_watch: {wandb_watch}\n"
        f"wandb_log_model: {wandb_log_model}\n"
        f"resume_from_checkpoint: {resume_from_checkpoint}\n"
    )
    assert (
        base_model
    ), "Please specify a --base_model, e.g. --base_model='decapoda-research/LLaMA-7b-hf'"
    # number of gradient accumulation steps
    gradient_accumulation_steps = batch_size // micro_batch_size
    device_map = "auto"
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    ddp = world_size != 1
    if ddp:
        device_map = {"": int(os.environ.get("LOCAL_RANK") or 0)}
        gradient_accumulation_steps = gradient_accumulation_steps // world_size
    # Check if parameter passed or if set within environ (Weights & Biases)
    use_wandb = len(wandb_project) > 0 or (
        "WANDB_PROJECT" in os.environ and len(os.environ["WANDB_PROJECT"]) > 0
    )
    # Only overwrite environ if wandb param passed
    if len(wandb_project) > 0:
        os.environ["WANDB_PROJECT"] = wandb_project
    if len(wandb_watch) > 0:
        os.environ["WANDB_WATCH"] = wandb_watch
    if len(wandb_log_model) > 0:
        os.environ["WANDB_LOG_MODEL"] = wandb_log_model
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        load_in_8bit=True,  # static quantization at load time to save memory; the layers that are not quantized are loaded in fp16 and would still be trainable, so prepare_model_for_kbit_training is used below to set their requires_grad to False
        torch_dtype=torch.float16,
        device_map=device_map,
    )
    # TODO modified from the official code
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    # set pad_token_id to 0 (unknown token) so that it differs from eos_token_id
    tokenizer.pad_token_id = (0)
    # decoder-only LLMs generally use left padding so that inputs and outputs stay contiguous
    tokenizer.padding_side = "left"  # Allow batched inference

    def tokenize(prompt, add_eos_token=True):  # append EOS
        # there's probably a way to do this with the tokenizer settings
        # but again, gotta move fast
        result = tokenizer(
            prompt,
            truncation=True,
            max_length=cutoff_len,
            padding=False,
            return_tensors=None,  # None returns plain python lists
        )
        if (
            result["input_ids"][-1] != tokenizer.eos_token_id
            and len(result["input_ids"]) < cutoff_len
            and add_eos_token
        ):
            result["input_ids"].append(tokenizer.eos_token_id)  # append EOS
            result["attention_mask"].append(1)  # extend the attention mask
        result["labels"] = result["input_ids"].copy()
        # TODO returns input_ids, labels and attention_mask
        # Question: should position_ids be returned as well? The tokenizer pads on the left and
        # the position ids are used to compute RoPE; should they look like 0,0,0,0,1,2,3,4,...?
        return result

    # generate_and_tokenize_prompt builds a complete training/validation sample
    def generate_and_tokenize_prompt(data_point):
        full_prompt = generate_prompt(data_point)
        tokenized_full_prompt = tokenize(full_prompt)
        if not train_on_inputs:
            user_prompt = generate_prompt({**data_point, "output": ""})
            tokenized_user_prompt = tokenize(user_prompt, add_eos_token=False)
            user_prompt_len = len(tokenized_user_prompt["input_ids"])
            # if the prompt part is not trained on, set its labels to -100 (ignored) so that only the response is predicted
            tokenized_full_prompt["labels"] = [
                -100
            ] * user_prompt_len + tokenized_full_prompt["labels"][
                user_prompt_len:
            ]  # could be sped up, probably
        return tokenized_full_prompt

    # TODO added code: print model parameters
    print("---> after load_in_8bit or load_in_4bit, the layers that were not quantized are still trainable; see:")
    print_model_allarguments_name_dtype(model)
    # TODO int8 quantization was already applied above; what does prepare_model_for_kbit_training add?
    """
    load_in_8bit=True quantizes the model parameters to 8-bit integers when the pretrained model is
    loaded, which reduces memory usage (static quantization).
    Note that the layers that were not quantized still have requires_grad = True, so their gradients
    need to be disabled explicitly.
    prepare_model_for_kbit_training sets requires_grad = False on the non-quantized layers (such as
    lm_head and the position embeddings), casts them to 32-bit float, and enables gradient
    checkpointing, among other things.
    """
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=use_gradient_checkpointing)
    # TODO added code: print model parameters
    print("---> after prepare_model_for_kbit_training, the non-quantized layers are no longer trainable; see:")
    print_model_allarguments_name_dtype(model)
    # two fine-tuning methods were provided here: lora and bottleneck
    if adapter_name == "lora":
        config = LoraConfig(
            r=lora_r,
            lora_alpha=lora_alpha,
            target_modules=lora_target_modules,
            lora_dropout=lora_dropout,
            bias="none",
            task_type="CAUSAL_LM",
        )
    # TODO commented out from the official code
    # elif adapter_name == "bottleneck":
    #     config = BottleneckConfig(
    #         bottleneck_size=bottleneck_size,
    #         non_linearity=non_linearity,
    #         adapter_dropout=adapter_dropout,
    #         use_parallel_adapter=use_parallel_adapter,
    #         use_adapterp=use_adapterp,
    #         target_modules=target_modules,
    #         scaling=scaling,
    #         bias="none",
    #         task_type="CAUSAL_LM",
    #     )
    model = get_peft_model(model, config)
    # TODO added code: print model parameters
    print("---> LORA model arguments; see:")
    print_model_allarguments_name_dtype(model)
    print(f"---> model:\n{model}")
    if data_path.endswith(".json"):  # todo: support jsonl
        data = load_dataset("json", data_files=data_path)
    else:
        data = load_dataset(data_path)
    if resume_from_checkpoint:  # load checkpoint weights when resuming training
        # Check the available weights and load them
        checkpoint_name = os.path.join(
            resume_from_checkpoint, "pytorch_model.bin"
        )  # Full checkpoint
        if not os.path.exists(checkpoint_name):
            checkpoint_name = os.path.join(
                resume_from_checkpoint, "adapter_model.bin"
            )  # only LoRA model - LoRA config above has to fit
            resume_from_checkpoint = (
                False  # So the trainer won't try loading its state
            )
        # The two files above have a different name depending on how they were saved, but are actually the same.
        if os.path.exists(checkpoint_name):
            print(f"Restarting from {checkpoint_name}")
            adapters_weights = torch.load(checkpoint_name)
            model = set_peft_model_state_dict(model, adapters_weights)
        else:
            print(f"Checkpoint {checkpoint_name} not found")
    model.print_trainable_parameters()  # Be more transparent about the % of trainable params.
    # build the train/validation splits depending on val_set_size
    if val_set_size > 0:
        train_val = data["train"].train_test_split(
            test_size=val_set_size, shuffle=True, seed=42
        )
        train_data = (
            train_val["train"].shuffle().map(generate_and_tokenize_prompt)
        )
        val_data = (
            train_val["test"].shuffle().map(generate_and_tokenize_prompt)
        )
    else:
        train_data = data["train"].shuffle().map(generate_and_tokenize_prompt)
        val_data = None
    print(f"one sample of train datas:\n{train_data[0]}")
    # if not running distributed training but several GPUs are visible, mark the model as parallelizable
    # TODO question: can ddp be False while torch.cuda.device_count() > 1?
    if not ddp and torch.cuda.device_count() > 1:
        # keeps Trainer from trying its own DataParallelism when more than 1 gpu is available
        model.is_parallelizable = True
        model.model_parallel = True
    trainer = transformers.Trainer(
        model=model,
        train_dataset=train_data,
        eval_dataset=val_data,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=micro_batch_size,
            gradient_accumulation_steps=gradient_accumulation_steps,
            warmup_steps=10,
            num_train_epochs=num_epochs,
            learning_rate=learning_rate,
            bf16=True,  # bf16 mixed-precision training
            logging_steps=1,
            optim="adamw_torch",
            evaluation_strategy="steps" if val_set_size > 0 else "no",  # evaluate only when a validation set exists
            save_strategy="steps",
            eval_steps=eval_step if val_set_size > 0 else None,
            save_steps=save_step,
            output_dir=output_dir,
            save_total_limit=3,
            load_best_model_at_end=True if val_set_size > 0 else False,  # load the best checkpoint at the end when a validation set exists
            ddp_find_unused_parameters=False if ddp else None,  # disable the unused-parameter check under DDP
            group_by_length=group_by_length,
            report_to="wandb" if use_wandb else None,
            run_name=wandb_run_name if use_wandb else None,
        ),
        data_collator=transformers.DataCollatorForSeq2Seq(
            tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
        ),
    )
    # disable the KV cache to save GPU memory during training
    model.config.use_cache = False
    # replace the model's state_dict method
    old_state_dict = model.state_dict
    model.state_dict = (
        lambda self, *_, **__: get_peft_model_state_dict(
            self, old_state_dict()
        )
    ).__get__(model, type(model))
    # use torch.compile to speed up the model on supported platforms
    # TODO peft and torch.compile do not seem to work together; consider commenting this out
    if torch.__version__ >= "2" and sys.platform != "win32":
        model = torch.compile(model)
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
    model.save_pretrained(output_dir)  # save the trained adapter
    print(
        "\n If there's a warning about missing keys above, please disregard :)"
    )

# build the prompt
def generate_prompt(data_point):
    # sorry about the formatting disaster gotta move fast
    if data_point["input"]:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{data_point["instruction"]}
### Input:
{data_point["input"]}
### Response:
{data_point["output"]}"""  # noqa: E501
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{data_point["instruction"]}
### Response:
{data_point["output"]}"""  # noqa: E501

if __name__ == "__main__":
    fire.Fire(train)
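
On the position_ids question raised in the tokenize() comments above: during generation, transformers derives the position ids for left-padded Llama inputs from the attention mask (see prepare_inputs_for_generation), roughly as in the sketch below; in a plain training forward pass they default to a simple arange unless passed in explicitly, which is exactly what the comment is questioning.

import torch

# Left-padded batch: zeros in the attention mask mark padding slots.
attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                               [1, 1, 1, 1, 1]])

# Running count of real tokens, shifted to be 0-based; padding slots are then
# filled with 1 (their value is irrelevant because they are masked out anyway).
position_ids = attention_mask.cumsum(-1) - 1
position_ids = position_ids.masked_fill(attention_mask == 0, 1)

print(position_ids)
# tensor([[1, 1, 0, 1, 2],
#         [0, 1, 2, 3, 4]])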

6. Q&A

  6.1 Why is prepare_model_for_int8_training applied after the model has already been loaded with load_in_8bit?

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,  # static quantization at load time to save memory
    # The layers that are not quantized are loaded in fp16 and remain trainable, so
    # prepare_model_for_int8_training / prepare_model_for_kbit_training is used afterwards
    # to set their requires_grad to False.
    torch_dtype=torch.float16,
    # TODO comment this out so the script works with deepspeed
    device_map=device_map,
)
model = prepare_model_for_int8_training(model, use_gradient_checkpointing=use_gradient_checkpointing)

  • load_in_8bit=True quantizes most of the weights to 8-bit integers when the pretrained model is loaded, which reduces memory usage. The layers that are not quantized (layer norms, embeddings, lm_head) are loaded in fp16 and still have requires_grad = True.
  • prepare_model_for_int8_training (renamed prepare_model_for_kbit_training in newer peft versions) does not quantize anything further. It freezes all base-model parameters, casts the small non-quantized layers to fp32 for numerical stability, and optionally enables gradient checkpointing.
  • The two steps therefore serve different purposes:
    • load_in_8bit reduces the memory footprint when loading the pretrained model.
    • prepare_model_for_int8_training makes the 8-bit model safe to fine-tune, so that only the LoRA adapter weights added afterwards are trained.
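
A simplified sketch of what prepare_model_for_int8_training / prepare_model_for_kbit_training does (an approximation for illustration, not peft's actual source):

import torch

def prepare_for_kbit_training_sketch(model, use_gradient_checkpointing=False):
    for param in model.parameters():
        # Freeze every base-model parameter, including the fp16 layers that
        # 8-bit loading left trainable.
        param.requires_grad = False
        # Keep the small non-quantized parameters (layer norms, biases) in fp32
        # for numerical stability.
        if param.ndim == 1:
            param.data = param.data.to(torch.float32)
    if use_gradient_checkpointing:
        # Trade compute for memory; inputs must require grad so that
        # checkpointing still produces gradients for the LoRA layers added later.
        model.enable_input_require_grads()
        model.gradient_checkpointing_enable()
    return model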

  6.2 Compatibility of int8/int4 training and deepspeed with V100 GPUs (to be verified)

DeepSpeed-related

Error messages and suggested fixes (for reference):

  • RuntimeError: CUDA error: an illegal memory access was encountered. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect → see pytorch/pytorch#21819
  • RuntimeError: Error building extension 'fused_adam' → sudo ln -s /usr/local/cuda/lib64/libcudart.so /usr/lib/libcudart.so
  • RuntimeError: expected scalar type Float but found Half → use_int8_training and deepspeed cannot be specified at the same time
  • RuntimeError: expected scalar type Float but found Half → on V100 GPUs, use_int8_training and fp16 cannot be specified at the same time

transformers-related

Error messages and suggested fixes (for reference):

  • AutoTokenizer.from_pretrained("llama_model_path") fails with RecursionError: maximum recursion depth exceeded → probably a transformers version issue; for Llama models, load the tokenizer with LlamaTokenizer instead
  • torch.distributed.distributed_c10d.init_process_group() got multiple values for keyword argument 'backend' → downgrade transformers to 4.28.1
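
For the recursion error, loading the tokenizer class explicitly is the usual workaround mentioned above; a minimal sketch (the model path is a placeholder):

from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("llama_model_path")  # placeholder path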

Other issues

Error messages and suggested fixes (for reference):

  • 8-bit quantized training fails or the loss is abnormal on V100 machines → see:

    Facico/Chinese-Vicuna#39
    TimDettmers/bitsandbytes#100
    mymusise/ChatGLM-Tuning#19
    tloen/alpaca-lora#170

  • huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': . Use repo_type argument if needed. → the docker container cannot see model_name_or_path; mount the corresponding host directory into the container.

Some practical suggestions:

  1. Running without deepspeed uses more GPU memory, so full-parameter fine-tuning should use deepspeed whenever possible.
  2. If LoRA training uses 8-bit quantization, deepspeed cannot be used; if deepspeed is used, do not set use_int8_training.
  3. use_int8_training / use_int4_training should be run on Ampere-architecture GPUs such as the 3090, 4090, A40, or A100.

For deepspeed configuration, see:

  1. microsoft/DeepSpeed#2187
  2. Installation Details - DeepSpeed
  3. pyg-team/pytorch_geometric#1001

Original GitHub source: BELLE/train/docs/FAQ.md at main · LianjiaTech/BELLE: https://github.com/LianjiaTech/BELLE/blob/main/train/docs/FAQ.md

To be updated...
