This mainly reuses the neural-network layers written earlier in numpy,
numpy_transformer/net at master · ZouJiu1/numpy_transformer (github.com), including the fully connected, softmax, embedding, attention and decoder layers; the details are covered in the articles listed below. Training accuracy reaches 96%.
九是否随意的称呼: numpy implementation of the embedding layer's forward and backward pass
Forward and backward pass of the loss functions - 知乎 (zhihu.com)
Forward and backward pass of the fully connected layer - 知乎 (zhihu.com)
九是否随意的称呼: numpy implementation of layer norm's forward and backward pass
九是否随意的称呼: numpy implementation of the multi-head attention layer's forward and backward pass
九是否随意的称呼: forward and backward pass of the BCE and BCEWithLogits loss functions
九是否随意的称呼: the attention masks used inside the transformer network
The data comes from
Werneror/Poetry (github.com): a very comprehensive collection of classical Chinese poetry, with more than 850,000 poems ranging from the pre-Qin period to modern times.
The poems were tokenized with hanlp, the token frequencies were counted and sorted, and the 3,000 most frequent tokens were kept. Every poem was then scanned and any poem containing a token outside this set was discarded, which left a total of 6,000 lines of poems in train_3000.txt, used for training. Doing the same with the top 6,000 high-frequency tokens and dropping poems containing other tokens yields a total of 30,000 lines. This article uses the train_3000.txt file, i.e. the model is trained on the poems covered by the top 3,000 high-frequency tokens; the filtering step is sketched just below. The tokenization program is
numpy_lstm_RNN/tokenlzh.py at master · ZouJiu1/numpy_lstm_RNN (github.com).
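A minimal sketch of that frequency-count-and-filter step (here tokenize stands in for the hanlp tokenizer, and build_train_subset is a made-up helper name, not code from the repository):

from collections import Counter

def build_train_subset(poems, tokenize, top_k=3000):
    # poems: list of poem strings; tokenize: any word tokenizer, e.g. hanlp
    tokenized = [tokenize(p) for p in poems]
    freq = Counter(tok for toks in tokenized for tok in toks)    # token frequencies
    vocab = {tok for tok, _ in freq.most_common(top_k)}          # top_k most frequent tokens
    # keep only the poems whose tokens all fall inside that vocabulary
    return [p for p, toks in zip(poems, tokenized) if all(t in vocab for t in toks)]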
The GPT decoder starts with position embedding + word embedding: the position embedding uses fixed cosine (sinusoidal) values, while the word embedding is a regular learned embedding layer. Both live in the Position_Embedding layer in PatchEmbed.py.
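The idea can be sketched in a few lines of numpy (an illustration assuming the standard sinusoidal table, not the repository's Position_Embedding code; the sizes 3000/100/192 only mirror the training settings below):

import numpy as np

def sinusoidal_position_encoding(context_length, embed_dim):
    # fixed sin/cos table of shape (context_length, embed_dim)
    pos = np.arange(context_length)[:, None]
    i = np.arange(embed_dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / embed_dim)
    table = np.zeros((context_length, embed_dim))
    table[:, 0::2] = np.sin(angle[:, 0::2])
    table[:, 1::2] = np.cos(angle[:, 1::2])
    return table

vocab_size, context_length, embed_dim = 3000, 100, 192
word_table = np.random.randn(vocab_size, embed_dim) * 0.02          # learned word embedding
tokens = np.random.randint(0, vocab_size, (4, context_length))      # a batch of token ids
x = word_table[tokens] + sinusoidal_position_encoding(context_length, embed_dim)[None]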
Next come the decoder layers (attdecoderblock_layer). The backward pass of softmax needs special care: it involves the Jacobian matrix of the softmax, so the gradients cannot simply be accumulated elementwise; the gradient passed back from the loss function has to be multiplied by the Jacobian as a matrix product. This is implemented in attdecoderblock.py, which contains both the forward and backward passes of the decoder; a sketch of the softmax backward step follows.
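Concretely, for one row y = softmax(x) the Jacobian is diag(y) - y yᵀ, and the incoming gradient dL/dy has to be multiplied by it. A minimal numpy sketch of that backward step (illustrative only, not the exact code in attdecoderblock.py):

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def softmax_backward(y, delta):
    # y = softmax(x), delta = dL/dy; per row dL/dx = (diag(y) - y y^T) @ delta,
    # which can be evaluated without building the Jacobian explicitly:
    return y * (delta - np.sum(delta * y, axis=-1, keepdims=True))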
To train, run
python gpt_train_potry3000.py
To generate poems with a trained model, run
python gpt_predict_poetrythree.py
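Generation is autoregressive: the characters produced so far are fed back through the model and the next character is chosen from the last position's distribution. A rough sketch of that loop (model_forward is a placeholder for the stacked layers' forward pass; this is not the actual code of gpt_predict_poetrythree.py):

import numpy as np

def generate(prompt, model_forward, char2id, id2char, steps=100):
    ids = [char2id[c] for c in prompt]
    for _ in range(steps):
        logits = model_forward(np.array([ids]))      # shape (1, len(ids), vocab_size)
        next_id = int(np.argmax(logits[0, -1]))      # greedy pick of the next character
        ids.append(next_id)
    return ''.join(id2char[i] for i in ids)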
Classical poems generated by the trained model; the checkpoint used is gpt_poetry3000_iters1999_1_loss_3259.634242.pkl and the script is gpt_predict_poetrythree.py:
妾如江边花,君如江上水。花落随水流,东风吹不起。 妾家横塘东,与郎乍相逢。郎来不须问, 门外植梧桐。 昼静暖风微,帘垂客到稀。画梁双燕子,不敢傍人飞。 水抱孤村远,山通一
去愁架子之酒楼。 危桥当古寺,闲倚喜同僧。极浦霁秋雨,扁舟明夜灯。风沈人语远,潮涨月华升。万事空凝念,其如总未能。 松间灯夕过,顾影在天涯。雪暝迷归鹤,春寒误早花。艰难知世味,贫病厌年华。故国
拂马似尘飞。 叶浓知柳密,花尽觉梅疏。兰生未可握,蒲小不堪书。 梅含今春树,还临先日池。人怀前岁忆, 花发故年枝。 池平生已合,林花发稍稠。风入花枝动,日照水光浮。 空庭高楼月,非复三五圆。 一戍鸣烟直,平沙落日迟。 露寒金掌重,天近玉绳低。 惊蝉移古柳,斗雀堕寒庭。 鹤传沧海信,僧和白云诗。 鸟暝风沉角,天清月上旗。 多年不道姓,几日旋移家。 喧风生木末,迟景入泉心。 湘云随雁断, 其如旅病牵。抱琴传此意,栖岳计何年。暮倚中流楫,闻歌徒自怜。 侍史趋清禁,承恩下直庐。瓶馀赊得酒, 架积赐来书。刻凤才何有,雕虫习未除。由来少尘事,寂寞意何如。 锦石带寒英,秋光澹客情。色增 一天渠终更谁先,聊复怜渠与酒钱。富贵不愁天不管,不应丘壑也关天。 雨涨平池绿似淮,半扉春水眼慵开。 无钱得买扁舟去,莫道旧来今不来。相望千家信不通,悬知春在雨声中。蹇驴欲去愁泥滑,安得西飞六尺 谁寞经旬见此枝。 花开犹未报人知,花下行吟漫自思。花若能言应笑我,年年无酒只题诗。 晚发西山舟欲北, 天风吹我复还家。十年一到非容易,独立平原看稻花。 新开竹径贮秋多,携酒烦公每见过。月出未高公已 去愁架子之酒楼。 危桥当古寺,闲倚喜同僧。极浦霁秋雨,扁舟明夜灯。风沈人语远,潮涨月华升。万事空凝念, 其如总未能。 松间灯夕过,顾影在天涯。雪暝迷归鹤,春寒误早花。艰难知世味,贫病厌年华。故国 忽孙共读书。 云沈秋驿雨,鸡送晓窗灯。 门当车马道,帘隔利名尘。 云开千里月,风动一天星。 绿涨他山雨, 青浮近市烟。 月色四时好,人心此夜偏。 春水有秀色,野云无俗姿。 出处自有时,人生安得偕 归弱。 问落莫空山里,唤入诗人几案来。 云欲开时又不开,问天觅阵好风催。雨无多落泥偏滑,溪不胜深岸故颓。 添尽红炉著尽衣,一杯方觉暖如痴。人言霜后寒无奈,春在瓮中渠不知。 梅不嫌疏杏要繁,主人何 黯其将雨。嗟我怀人,道修且阻。眷此区区,俯仰再抚。良辰过鸟,逝不我伫。 意不若义,义不若利。利之使人, 能忘生死。利不若义,义不若意。意之使人,能动天地。 居暗观明,居静观动。居简观繁,居轻 更长烛屡花。一轮观浴兔,两部听鸣蛙。 诚能得初心,何必返初服。有以固中扃,不须防外逐。 野驿人稀到, 空庭草自生。霜清殊未觉,雨细更含晴。 老屋愁风破,空林过雨乾。飘零黄叶满,寂寞野香残 月在画楼西。 妾如江边花,君如江上水。花落随水流,东风吹不起。 妾家横塘东,与郎乍相逢。郎来不须问, 门外植梧桐。 昼静暖风微,帘垂客到稀。画梁双燕子,不敢傍人飞。 水抱孤村远,山通一 何山起暮馀。当庭波始阔,峡水月常阴。魂梦犹难到,愁君白发侵。 水如树欲静,滩如风不宁。百里断肠声, 当年游子听。一往不可复,此行安所欲。千古流水心,耿耿在幽独。 疏放难违性,苔荒野巷深。到门黄叶雨 片处处云生。 零落雪霜后,犹含千载春。一株化$石,谁是种时人。 佛心随处见,层出更分明。不用催灯火, 天高月自生。 乾坤皆数五,日月正符同。但仰重离照,难名厚载功。 水畔幡竿险,分符得异恩。 仰彼苍苍可奈何。浊酒一杯愁未解,唾壶击碎不成歌。 木犀香透越山云,记得根从海上分。恨杀西风夜来恶, 一枝摧处正愁君。 天意于人有浅深,人于天意岂容心。一行一止惟时耳,此道堂堂古到今。 丹鼎刀圭炼 镜日上。我怀前岁忆,花发故年枝。 池平生已合,林花发稍稠。风入花枝动,日照水光浮。 空庭高楼月, 非复三五圆。何须照床里,终是一人眠。 别怨凄歌响,离啼湿舞衣。愿假乌栖曲,翻从南向飞。 三洲断江口 月上秋来醉,空斋夜落声。隔床惊昨梦,隐几话平生。灯净书还读,香销句忽成。他年相望处,吾亦用吾情。 羡棹吴松曲,来寻独冷盟。误听对床雨,唤作打篷声。漏缓更筹滴,春从水驿生。晓云驱宿翳,我欲趁新晴
import os
abspath = os.path.abspath(__file__)
filename = os.sep.join(abspath.split(os.sep)[-2:])
abspath = abspath.replace(filename, "")
import sys
sys.path.append(abspath)
from net.loss import cross_entropy_loss
import numpy as np
import pickle
from net.layernorm import layer_norm
from PatchEmbed import Position_Embedding
from attdecoderblock import attdecoderblock_layer
from net.fullconnect import fclayer
from gpt.gpt_linear import gpt_linear_layer
import re
from classify import classify_layer
from net.flatten import flatten_layer
from copy import deepcopy
import json

# https://en.wikipedia.org/wiki/AlexNet
# https://pytorch.org/vision/stable/_modules/torchvision/models/alexnet.html#alexnet
# https://github.com/l5shi/Image-Recognition-on-MNIST-dataset/blob/master/AlexNet.ipynb

def getdata():
    # read the poem corpus, build the char <-> id mappings and cache them as json
    dataset = os.path.join(abspath, 'dataset')
    os.makedirs(dataset, exist_ok=True)
    id2char_char2id = os.path.join(abspath, 'dataset', r"gptpoetry3000.json")
    # inpath = os.path.join(abspath, 'dataset', r"train_10000.txt")
    inpath = r'C:\Users\10696\Desktop\access\numpy_transformer\dataset\train_3000.txt'
    with open(inpath, 'r', encoding='utf-8') as obj:
        readcontent = obj.read()
    kk = [i if i != '\n' else " " for i in readcontent]
    kk = "".join(kk)
    kk = re.sub(r'  ', " ", kk)
    kk = re.sub(r'  ', " ", kk)
    kk = list(kk)
    # inpath = os.path.join(abspath, 'dataset', r"train_token_1000.txt")
    # with open(inpath, 'r', encoding='utf-8') as obj:
    #     for i in obj.readlines():
    #         kk.extend(i.strip().split(" "))
    while '□' in kk:
        kk.remove("□")
    unique = np.unique(kk)
    length = len(unique)
    id2char = {i: char for i, char in enumerate(unique)}
    char2id = {char: i for i, char in enumerate(unique)}
    if not os.path.exists(id2char_char2id):
        with open(id2char_char2id, 'w', encoding='utf-8') as obj:
            json.dump({"id2char": id2char, 'char2id': char2id}, obj, indent=2, separators=(",", ":"), ensure_ascii=False)
    else:
        with open(id2char_char2id, 'r', encoding='utf-8') as obj:
            jsonfile = json.load(obj)
        id2chark = jsonfile["id2char"]
        char2id = jsonfile["char2id"]
        length = len(id2chark)
        id2char = {}
        for key, value in id2chark.items():
            id2char[int(key)] = value
    return length, id2char, char2id, kk

def create_masks_future(inputs):
    # causal (future) mask: 0 on and below the diagonal, -inf above it
    n, sequence_length = inputs.shape
    input_mask = np.tril(np.ones((sequence_length, sequence_length)))
    input_mask[input_mask==0] = -np.inf
    # input_mask[input_mask==0] = -1e6
    input_mask[input_mask==1] = 0
    return input_mask

def create_masks_pad(input_mask):
    # padding mask: -inf wherever a padded position is involved
    input_mask = np.array(input_mask)
    n, sequence_length = input_mask.shape
    k1 = input_mask[:, None, :]
    k2 = np.ones_like(input_mask)[:, :, None]
    k = k1 * k2
    k = (1.0 - k)
    k[k==1.0] = -np.inf
    return k
# k = create_masks_pad([[1, 1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1, 0]])

def getinputs(context_length, batchsize, input_texts, char2id, id2char):
    # sample random windows of context_length characters; labels are the inputs shifted by one
    inputs = []
    label = []
    input_mask = []
    id_start = np.random.randint(0, len(input_texts) - context_length - 1, (batchsize))
    markedchar = [',', '.']
    for id in id_start:
        tmp = [char2id[ci] for ci in input_texts[id: id + context_length + 1]]
        # inputchar = "".join([id2char[ci] for ci in tmp])
        inputs.append(tmp[:-1])
        label.append(tmp[1:])
    inputs = np.array(inputs)
    if len(input_mask)==0:
        input_mask = np.ones_like(inputs)
    input_mask_fut = create_masks_future(inputs)
    # input_mask_pad = create_masks_pad(input_mask)
    input_mask = input_mask_fut
    label_single = np.array(label)  # .reshape(-1)
    return inputs, input_mask, label_single

def transformer_image_train():
    vocab_size, id2char, char2id, input_texts = getdata()
    all_steps = 3000 - 1000
    batchsize = 63 + 1
    learning_rate = 0.003
    embed_dim = 192  # vocab_size if vocab_size%3==0 else (vocab_size//3) * 3 + 3
    num_layer = 10 + 1 + 1
    num_h = [3] * num_layer
    context_length = 100
    ADAM = True
    cls_token = True
    float32 = True
    logfile = os.path.join(logdir, 'log_gpt_poetry3000.txt')
    fpwrite = open(logfile, 'w', encoding='utf-8')

    # embedding layer, 12 decoder blocks, a layer norm and the output projection
    patchemb = Position_Embedding(context_length, vocab_size, embed_dim, adam=ADAM)
    layers = [patchemb]
    at0 = attdecoderblock_layer(embed_dim, num_h[0], adam=ADAM, float32=float32)
    at1 = attdecoderblock_layer(embed_dim, num_h[1], adam=ADAM, float32=float32)
    at2 = attdecoderblock_layer(embed_dim, num_h[2], adam=ADAM, float32=float32)
    at3 = attdecoderblock_layer(embed_dim, num_h[3], adam=ADAM, float32=float32)
    at4 = attdecoderblock_layer(embed_dim, num_h[4], adam=ADAM, float32=float32)
    at5 = attdecoderblock_layer(embed_dim, num_h[5], adam=ADAM, float32=float32)
    at6 = attdecoderblock_layer(embed_dim, num_h[6], adam=ADAM, float32=float32)
    at7 = attdecoderblock_layer(embed_dim, num_h[7], adam=ADAM, float32=float32)
    at8 = attdecoderblock_layer(embed_dim, num_h[8], adam=ADAM, float32=float32)
    at9 = attdecoderblock_layer(embed_dim, num_h[9], adam=ADAM, float32=float32)
    at10 = attdecoderblock_layer(embed_dim, num_h[10], adam=ADAM, float32=float32)
    at11 = attdecoderblock_layer(embed_dim, num_h[11], adam=ADAM, float32=float32)
    layers += [at0, at1, at2, at3, at4, at5, at6, at7, at8, at9, at10, at11]
    norm = layer_norm(embed_dim, adam=ADAM)
    # if not cls_token:
    #     cll = classify_layer(embed_dim, batchsize, 1, vocab_size, cls_token, adam=ADAM, relu=False, float32=float32)
    # else:
    cll = fclayer(embed_dim, vocab_size, True, adam=ADAM, float32=float32)
    layers += [norm, cll]

    datapath = os.path.join(abspath, 'dataset')
    os.makedirs(datapath, exist_ok=True)
    modelpath = os.path.join(abspath, 'gpt', 'model')
    os.makedirs(modelpath, exist_ok=True)

    # optionally restore a pretrained checkpoint
    if os.path.exists(pretrained_model):
        with open(pretrained_model, 'rb') as obj:
            models = pickle.load(obj)
        cnt = 0
        for l in layers:
            k = dir(l)
            if 'restore_model' in k and 'save_model' in k:
                l.restore_model(models[cnt])
                cnt += 1
        del models

    alliter = 0
    lr = learning_rate
    start_epoch = 1
    try:
        if os.path.exists(pretrained_model):
            start_epoch = int(pretrained_model.split(os.sep)[-1].split("_")[3]) + 1
    except:
        start_epoch = 1

    while alliter < all_steps:
        meanloss = 0
        jk = 0
        pre_col = []
        while True:
            if alliter > all_steps:
                break
            # linear warmup for the first 100 iterations, then two step decays
            if alliter <= 100:
                lr = learning_rate * alliter / 100
            if alliter==23*all_steps//30:
                lr = learning_rate * 0.1
            elif alliter==28*all_steps//30:
                lr = learning_rate * 0.1 * 0.1
            alliter += 1
            jk += 1
            inputs, input_mask, label_single = getinputs(context_length, batchsize, input_texts, char2id, id2char)
            # forward pass; decoder blocks also take the causal mask
            for l in range(len(layers)):
                if isinstance(layers[l], attdecoderblock_layer):
                    inputs = layers[l].forward(inputs, input_mask)
                else:
                    inputs = layers[l].forward(inputs)
            ishape = inputs.shape
            inputs = np.reshape(inputs, (-1, vocab_size))
            labels = np.zeros_like(inputs)
            labels[np.arange(len(inputs)), label_single.reshape(-1)] = 1
            loss, delta, predict = cross_entropy_loss(inputs, labels)
            delta = np.reshape(delta, ishape)
            # backward pass, parameter update and gradient reset, layer by layer
            for l in range(len(layers)-1, -1, -1):
                delta = layers[l].backward(delta)
                layers[l].update(lr)
                layers[l].setzero()
            p = np.argmax(predict, axis=-1)
            precision = np.sum(label_single.reshape(-1)==p) / len(p)
            pre_col.append(precision)
            meanloss += loss
            i = alliter * (context_length + 1) // len(input_texts)
            # every 30 iterations run one extra forward pass and decode its prediction
            if alliter%30==0:
                inputs, input_mask, label_single = getinputs(context_length, batchsize, input_texts, char2id, id2char)
                for l in range(len(layers)):
                    if isinstance(layers[l], attdecoderblock_layer):
                        inputs = layers[l].forward(inputs, input_mask)
                    else:
                        inputs = layers[l].forward(inputs)
                ishape = inputs.shape
                inputs = np.reshape(inputs, (-1, vocab_size))
                labels = np.zeros_like(inputs)
                labels[np.arange(len(inputs)), label_single.reshape(-1)] = 1
                _, _, predict = cross_entropy_loss(inputs, labels)
                p = np.argmax(predict, axis=-1)
                valpre = np.sum(label_single.reshape(-1)==p) / len(p)
                output = ''.join([id2char[int(ij)] for ij in p[:(len(p)//batchsize)]]) + "\n"
            else:
                output = "\n"
                valpre = 0
            fpwrite.write("epoch:{}, lr: {:.6f}, loss: {:.6f}, iters: {}, precision: {:.6f}, valpre: {:.6f}\n{}".
                          format(i, lr, loss, str(jk) + "_" + str(alliter), precision, valpre, output))
            fpwrite.flush()
            # save a checkpoint every 100 iterations
            if (alliter + 1) % 100==0:
                allmodel = []
                for l in layers:
                    k = dir(l)
                    if 'restore_model' in k and 'save_model' in k:
                        allmodel.append(l.save_model())
                name = f"gpt_poetry3000_iters{alliter}_" + str(i) + "_loss_" + str(round(meanloss, 6)) + ".pkl"
                with open(os.path.join(modelpath, name), 'wb') as obj:
                    pickle.dump(allmodel, obj)
        meanloss /= jk
        fpwrite.write("epoch: {}, {}\n\n".format(i, ''.join(output[:200])))
        fpwrite.flush()
    fpwrite.close()

if __name__ =="__main__":
    savepath = abspath
    pretrained_model = r''
    logdir = os.path.join(savepath, 'gpt', 'log')
    os.makedirs(logdir, exist_ok=True)
    transformer_image_train()

'''
https://github.com/google-research/vision_transformer/blob/main/vit_jax/models_vit.py
https://github.com/UdbhavPrasad072300/Transformer-Implementations/blob/main/notebooks/MNIST%20Classification%20-%20ViT.ipynb
https://github.com/s-chh/PyTorch-Vision-Transformer-ViT-MNIST/tree/main
https://itp.uni-frankfurt.de/~gros/StudentProjects/WS22_23_VisualTransformer/
https://jamesmccaffrey.wordpress.com/2023/01/10/a-naive-transformer-architecture-for-mnist-classification-using-pytorch/
https://medium.com/mlearning-ai/vision-transformers-from-scratch-pytorch-a-step-by-step-guide-96c3313c2e0c
https://github.com/BrianPulfer/PapersReimplementations/blob/main/vit/vit_torch.py
https://github.com/microsoft/Swin-Transformer
https://huggingface.co/docs/transformers/v4.27.0/model_doc/vit
'''
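For reference, the causal mask built by create_masks_future above is a lower-triangular matrix with 0 on and below the diagonal and -inf above it, so each position can only attend to itself and earlier positions. For a sequence length of 4:

>>> create_masks_future(np.zeros((1, 4)))
array([[  0., -inf, -inf, -inf],
       [  0.,   0., -inf, -inf],
       [  0.,   0.,   0., -inf],
       [  0.,   0.,   0.,   0.]])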