
[NLP] A Baseline for the Kaggle "NBME - Score Clinical Patient Notes" Competition


Source: contributed post   Author: William

Editor: 学姐

William

Graduate degree from a top-20 university in the US

Currently working at a major Chinese internet company

Competition Analysis + Baseline

 

1. Competition Link

https://www.kaggle.com/c/nbme-score-clinical-patient-notes

2. Task Description

The goal of this competition is to build a model that locates the clinical presentation of different conditions within patient notes. Concretely, the feature (condition) description and the patient note are fed into the model together, and the model extracts the corresponding span positions from the note.

Business value: it helps doctors quickly pinpoint a patient's condition and prescribe the right treatment.

※ Competition Timeline

  • February 1, 2022 - Start date.

  • April 26, 2022 - Entry deadline. You must accept the competition rules before this date in order to compete.

  • April 26, 2022 - Team merger deadline. This is the last day participants may join or merge teams.

  • May 3, 2022 - Final submission deadline.

※ Prizes

  • 1st place: $15,000

  • 2nd place: $10,000

  • 3rd place: $8,000

  • 4th place: $7,000

  • 5th place: $5,000

  • 6th place: $5,000

3. Data Description

The competition provides five files: train, test, features, patient_notes, and submission; test and submission are only used when submitting predictions.

The key files are the following three:

train marks, for each patient note, the text that describes each clinical feature

features gives the name and id of every clinical feature

patient_notes gives the full text of every patient note

3.1 Training data analysis

  • id - a unique identifier for each (patient note number, feature number) pair.

  • pn_num - the patient note number; think of it as the note (case record) id.

  • feature_num - the feature number; an id for each clinical feature.

  • case_num - the clinical case the note belongs to; it is later used to join the patient note text with the corresponding feature text.

  • annotation - the text in the patient note that expresses the feature; a single note may describe the same feature in several places.

  • location - the character-level position(s) of the annotation within the note.

Basic statistics:

  Number of rows in train data: 14300
  Number of columns in train data: 6
  Number of values in train data: 85800
  Number of empty annotations and locations = 4399

In the labeled data, some conditions are mentioned in more than one place within a note; the distribution is as follows:

 

Most conditions have only one matching span in the note, and roughly 4,399 rows have no matching span in the note text at all.

The average character-level length of an annotation is 16.53; the distribution is as follows:

 

3.2 patient_notes analysis

Column names and meanings:

  • pn_num - the patient note id.

  • case_num - the case number, used to join the patient note text with the corresponding feature text.

  Number of rows in patient_notes: 42146
  Number of columns in patient_notes: 3
  Number of values in patient_notes: 126438

Here is an example patient note:

 

The character-level length distribution of the patient notes is shown below:

 

3.3 features analysis

  • feature_num - the feature number; an id for each clinical feature.

  • case_num - the case the feature belongs to, used to join the patient note text with the corresponding feature text.

  • feature_text - the text description of the feature.

  Number of rows in features: 143
  Number of columns in features: 3
  Number of values in features: 429

A feature is simply the professional description of a condition, for example:

'Family-history-of-MI-OR-Family-history-of-myocardial-infarction'

The character-level length distribution of the feature texts is shown below; the average length is 23 characters.

 

4. Evaluation Metric

The competition metric is a micro-averaged F1 score computed at the character level.

micro-averaged:

https://scikit-learn.org/stable/modules/model_evaluation.html#from-binary-to-multiclass-and-multilabel

F1:

https://scikit-learn.org/stable/modules/model_evaluation.html#precision-recall-f-measure-metrics
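In other words, every character position of a patient note is treated as a binary prediction, and precision, recall, and F1 are pooled over all characters of all (note, feature) pairs. Below is a minimal illustrative sketch of that computation; the function names and the (start, end) span representation are assumptions for illustration, not competition code.

import numpy as np

def spans_to_binary(spans, length):
    # Mark every character covered by any (start, end) span with 1 (end exclusive).
    binary = np.zeros(length)
    for start, end in spans:
        binary[start:end] = 1
    return binary

def micro_f1_char(true_spans_list, pred_spans_list, lengths):
    # Micro-averaging: pool TP/FP/FN over all characters of all examples.
    tp = fp = fn = 0
    for true_spans, pred_spans, length in zip(true_spans_list, pred_spans_list, lengths):
        t = spans_to_binary(true_spans, length)
        p = spans_to_binary(pred_spans, length)
        tp += int(np.sum((t == 1) & (p == 1)))
        fp += int(np.sum((t == 0) & (p == 1)))
        fn += int(np.sum((t == 1) & (p == 0)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# e.g. micro_f1_char([[(4, 26)]], [[(4, 20)]], [60]) -> precision 1.0, recall ~0.73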

5. Building the Training Data

The snippet below shows how to merge the three files together; see the baseline code for a more detailed walkthrough.

To get the baseline code, follow the WeChat public account 【学姐带你玩AI】 and reply "top" to receive it.

 

import pandas as pd
import ast

train = pd.read_csv('train.csv')
train['annotation'] = train['annotation'].apply(ast.literal_eval)
train['location'] = train['location'].apply(ast.literal_eval)
features = pd.read_csv('features.csv')
patient_notes = pd.read_csv('patient_notes.csv')  # needed for the second merge below
train = train.merge(features, on=['feature_num', 'case_num'], how='left')
train = train.merge(patient_notes, on=['pn_num', 'case_num'], how='left')

After the merge the data looks like this: feature_text is the feature (condition) name, pn_history is the patient note, and annotation is the specific text in the note that expresses that condition.
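One detail worth noting: the TrainDataset defined later reads an annotation_length column, which the merge above does not create. A minimal sketch of how it can be derived (the column name follows the later code; the derivation itself is an assumption):

# After ast.literal_eval, annotation is a list of span texts per row,
# so its length is the number of annotated spans for that (note, feature) pair.
train['annotation_length'] = train['annotation'].apply(len)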

 

Next we generate the label tensor from these annotations. Let's walk through the code to see how the labels are built.

Here is an example (it does not necessarily correspond to real data and is for illustration only):

PN: 17-year-old male, has come to the student health clinic complaining of heart pounding. Mr. Cleveland's mother has given verbal consent for a history, physical examination, and treatment
-began 2-3 months ago,sudden,intermittent for 2 days(lasting 3-4 min),worsening,non-allev/aggrav
-associated with dispnea on exersion and rest,stressed out about school
-reports fe feels like his heart is jumping out of his chest
-ros:denies chest pain,dyaphoresis,wt loss,chills,fever,nausea,vomiting,pedal edeam
-pmh:non,meds :aderol (from a friend),nkda
-fh:father had MI recently,mother has thyroid dz
-sh:non-smoker,mariguana 5-6 months ago,3 beers on the weekend, basketball at school
-sh:no std

feature: 'Family-history-of-MI-OR-Family-history-of-myocardial-infarction'
annotation: father had MI recently

What we need to do: after the PN is tokenized, mark the token positions that cover the original span "father had MI recently" as 1, mark all other positions as 0, and mark the CLS, SEP, and padding positions as -1.

import numpy as np
import torch

def create_label(cfg, text, annotation_length, location_list):
    encoded = cfg.tokenizer(text,
                            add_special_tokens=True,
                            max_length=CFG.max_len,
                            padding="max_length",
                            return_offsets_mapping=True)
    offset_mapping = encoded['offset_mapping']

    # tokens that do not belong to the patient note (CLS/SEP/padding) get label -1
    ignore_idxes = np.where(np.array(encoded.sequence_ids()) != 0)[0]
    label = np.zeros(len(offset_mapping))
    label[ignore_idxes] = -1
    if annotation_length != 0:
        for location in location_list:
            for loc in [s.split() for s in location.split(';')]:
                start_idx = -1
                end_idx = -1
                start, end = int(loc[0]), int(loc[1])
                for idx in range(len(offset_mapping)):
                    if (start_idx == -1) & (start < offset_mapping[idx][0]):
                        start_idx = idx - 1
                    if (end_idx == -1) & (end <= offset_mapping[idx][1]):
                        end_idx = idx + 1
                if start_idx == -1:
                    start_idx = end_idx
                if (start_idx != -1) & (end_idx != -1):
                    label[start_idx:end_idx] = 1
    return torch.tensor(label, dtype=torch.float)
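A quick, self-contained usage sketch for the example above; the toy CFG, the max_len value, and the character offsets are illustrative stand-ins (in the real pipeline they come from the configuration in section 6.1 and from the location column).

from transformers.models.deberta_v2 import DebertaV2TokenizerFast

class CFG:
    tokenizer = DebertaV2TokenizerFast.from_pretrained("microsoft/deberta-v3-large")
    max_len = 466  # toy value; the real pipeline derives it from the data

text = "fh: father had MI recently, mother has thyroid dz"
label = create_label(CFG, text, annotation_length=1, location_list=["4 26"])
# Characters 4..26 (end exclusive) cover "father had MI recently": tokens overlapping
# that span get label 1, other note tokens get 0, and CLS/SEP/padding positions get -1.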

6. Baseline Pipeline

1. Load the data, split CV folds, and define the DataLoader

import numpy as np
import torch
from torch.utils.data import Dataset
from tqdm.auto import tqdm
from transformers.models.deberta_v2 import DebertaV2TokenizerFast

tokenizer = DebertaV2TokenizerFast.from_pretrained(CFG.model)
#tokenizer = AutoTokenizer.from_pretrained(CFG.model)
CFG.tokenizer = tokenizer
# ====================================================
# Define max_len
# ====================================================
for text_col in ['pn_history']:
    pn_history_lengths = []
    tk0 = tqdm(patient_notes[text_col].fillna("").values, total=len(patient_notes))
    for text in tk0:
        length = len(tokenizer(text, add_special_tokens=False)['input_ids'])
        pn_history_lengths.append(length)
    LOGGER.info(f'{text_col} max(lengths): {max(pn_history_lengths)}')
for text_col in ['feature_text']:
    features_lengths = []
    tk0 = tqdm(features[text_col].fillna("").values, total=len(features))
    for text in tk0:
        length = len(tokenizer(text, add_special_tokens=False)['input_ids'])
        features_lengths.append(length)
    LOGGER.info(f'{text_col} max(lengths): {max(features_lengths)}')
CFG.max_len = max(pn_history_lengths) + max(features_lengths) + 3 # cls & sep & sep
LOGGER.info(f"max_len: {CFG.max_len}")
# ====================================================
# Dataset
# ====================================================
def prepare_input(cfg, text, feature_text):
    inputs = cfg.tokenizer(text, feature_text,
                           add_special_tokens=True,
                           max_length=CFG.max_len,
                           padding="max_length",
                           return_offsets_mapping=False)
    for k, v in inputs.items():
        inputs[k] = torch.tensor(v, dtype=torch.long)
    return inputs

def create_label(cfg, text, annotation_length, location_list):
    encoded = cfg.tokenizer(text,
                            add_special_tokens=True,
                            max_length=CFG.max_len,
                            padding="max_length",
                            return_offsets_mapping=True)
    offset_mapping = encoded['offset_mapping']
    ignore_idxes = np.where(np.array(encoded.sequence_ids()) != 0)[0]
    label = np.zeros(len(offset_mapping))
    label[ignore_idxes] = -1
    if annotation_length != 0:
        for location in location_list:
            for loc in [s.split() for s in location.split(';')]:
                start_idx = -1
                end_idx = -1
                start, end = int(loc[0]), int(loc[1])
                for idx in range(len(offset_mapping)):
                    if (start_idx == -1) & (start < offset_mapping[idx][0]):
                        start_idx = idx - 1
                    if (end_idx == -1) & (end <= offset_mapping[idx][1]):
                        end_idx = idx + 1
                if start_idx == -1:
                    start_idx = end_idx
                if (start_idx != -1) & (end_idx != -1):
                    label[start_idx:end_idx] = 1
    return torch.tensor(label, dtype=torch.float)

class TrainDataset(Dataset):
    def __init__(self, cfg, df):
        self.cfg = cfg
        self.feature_texts = df['feature_text'].values
        self.pn_historys = df['pn_history'].values
        self.annotation_lengths = df['annotation_length'].values
        self.locations = df['location'].values
    def __len__(self):
        return len(self.feature_texts)
    def __getitem__(self, item):
        inputs = prepare_input(self.cfg,
                               self.pn_historys[item],
                               self.feature_texts[item])
        label = create_label(self.cfg,
                             self.pn_historys[item],
                             self.annotation_lengths[item],
                             self.locations[item])
        return inputs, label
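A quick sanity check of the dataset pipeline could look like the sketch below; it assumes the merged train dataframe from section 5 (including the annotation_length column derived there) and that CFG.tokenizer and CFG.max_len have already been set as above.

from torch.utils.data import DataLoader

dataset = TrainDataset(CFG, train)
loader = DataLoader(dataset, batch_size=2, shuffle=False)
inputs, labels = next(iter(loader))
print(inputs['input_ids'].shape)  # torch.Size([2, CFG.max_len])
print(labels.shape)               # torch.Size([2, CFG.max_len])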

2. Define the model

# ====================================================
# Model
# ====================================================
import torch
import torch.nn as nn
from transformers import AutoConfig, AutoModel

class CustomModel(nn.Module):
    def __init__(self, cfg, config_path=None, pretrained=False):
        super().__init__()
        self.cfg = cfg
        if config_path is None:
            self.config = AutoConfig.from_pretrained(cfg.model, output_hidden_states=True)
        else:
            self.config = torch.load(config_path)
        if pretrained:
            self.model = AutoModel.from_pretrained(cfg.model, config=self.config)
        else:
            self.model = AutoModel.from_config(self.config)
        self.fc_dropout_0 = nn.Dropout(0.1)
        self.fc_dropout_1 = nn.Dropout(cfg.fc_dropout)
        self.fc_dropout_2 = nn.Dropout(0.3)
        self.fc_dropout_3 = nn.Dropout(0.4)
        self.fc_dropout_4 = nn.Dropout(0.5)
        self.fc = nn.Linear(self.config.hidden_size, 1)
        self._init_weights(self.fc)

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.Embedding):
            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
            if module.padding_idx is not None:
                module.weight.data[module.padding_idx].zero_()
        elif isinstance(module, nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)

    def feature(self, inputs):
        outputs = self.model(**inputs)
        last_hidden_states = outputs[0]
        return last_hidden_states

    def forward(self, inputs):
        feature = self.feature(inputs)
        # Multi-sample dropout: average predictions over several dropout rates
        # (only one branch is enabled here; uncomment the rest to average).
        #output_0 = self.fc(self.fc_dropout_0(feature))
        output_1 = self.fc(self.fc_dropout_1(feature))
        #output_2 = self.fc(self.fc_dropout_2(feature))
        #output_3 = self.fc(self.fc_dropout_3(feature))
        #output_4 = self.fc(self.fc_dropout_4(feature))
        output = output_1  # (output_0 + output_1 + output_2 + output_3 + output_4) / 5
        return output

3. Define the training functions

# ====================================================
# Helper functions
# ====================================================
# LOGGER, device, OUTPUT_DIR, FGM, valid_fn and the scoring helpers used below
# are defined elsewhere in the full baseline notebook.
import gc
import math
import time

import numpy as np
import torch
import torch.nn as nn
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import get_linear_schedule_with_warmup, get_cosine_schedule_with_warmup

class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self):
        self.reset()
    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0
    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

def timeSince(since, percent):
    now = time.time()
    s = now - since
    es = s / (percent)
    rs = es - s
    return '%s (remain %s)' % (asMinutes(s), asMinutes(rs))

def train_fn(fold, train_loader, model, criterion, optimizer, epoch, scheduler, device):
    model.train()
    scaler = torch.cuda.amp.GradScaler(enabled=CFG.apex)
    losses = AverageMeter()
    start = end = time.time()
    global_step = 0
    for step, (inputs, labels) in enumerate(train_loader):
        for k, v in inputs.items():
            inputs[k] = v.to(device)
        labels = labels.to(device)
        batch_size = labels.size(0)
        y_preds = model(inputs)
        loss = criterion(y_preds.view(-1, 1), labels.view(-1, 1))
        loss = torch.masked_select(loss, labels.view(-1, 1) != -1).mean()
        if CFG.gradient_accumulation_steps > 1:
            loss = loss / CFG.gradient_accumulation_steps
        losses.update(loss.item(), batch_size)
        loss.backward()
        #grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), CFG.max_grad_norm)
        if (step + 1) % CFG.gradient_accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            global_step += 1
            if CFG.batch_scheduler:
                scheduler.step()
        end = time.time()
        if step % CFG.print_freq == 0 or step == (len(train_loader)-1):
            print('Epoch: [{0}][{1}/{2}] '
                  'Elapsed {remain:s} '
                  'Loss: {loss.val:.4f}({loss.avg:.4f}) '
                  'LR: {lr:.8f}  '
                  .format(epoch+1, step, len(train_loader),
                          remain=timeSince(start, float(step+1)/len(train_loader)),
                          loss=losses,
                          lr=scheduler.get_lr()[0]))

    return losses.avg

def train_fn_adv(fold, train_loader, model, criterion, optimizer, epoch, scheduler, device):
    model.train()
    scaler = torch.cuda.amp.GradScaler(enabled=CFG.apex)
    losses = AverageMeter()
    start = end = time.time()
    global_step = 0
    fgm = FGM(model)
    for step, (inputs, labels) in enumerate(train_loader):
        for k, v in inputs.items():
            inputs[k] = v.to(device)
        labels = labels.to(device)
        batch_size = labels.size(0)
        y_preds = model(inputs)
        loss = criterion(y_preds.view(-1, 1), labels.view(-1, 1))
        loss = torch.masked_select(loss, labels.view(-1, 1) != -1).mean()
        if CFG.gradient_accumulation_steps > 1:
            loss = loss / CFG.gradient_accumulation_steps
        losses.update(loss.item(), batch_size)
        loss.backward()
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), CFG.max_grad_norm)
        if (step + 1) % CFG.gradient_accumulation_steps == 0:
            fgm.attack()
            # The embedding parameters are now perturbed, so the same input
            # sequence produces different embedding representations.
            y_preds_adv = model(inputs)
            loss_adv = criterion(y_preds_adv.view(-1, 1), labels.view(-1, 1))
            loss_adv = torch.masked_select(loss_adv, labels.view(-1, 1) != -1).mean()
            # Backward pass: the adversarial gradient is accumulated on top of
            # the gradient from the normal forward pass.
            loss_adv.backward()
            # Restore the original embedding parameters.
            fgm.restore()
            optimizer.step()
            optimizer.zero_grad()
            global_step += 1
            if CFG.batch_scheduler:
                scheduler.step()
        end = time.time()
        if step % CFG.print_freq == 0 or step == (len(train_loader)-1):
            print('Epoch: [{0}][{1}/{2}] '
                  'Elapsed {remain:s} '
                  'Loss: {loss.val:.4f}({loss.avg:.4f}) '
                  'LR: {lr:.8f}  '
                  .format(epoch+1, step, len(train_loader),
                          remain=timeSince(start, float(step+1)/len(train_loader)),
                          loss=losses,
                          lr=scheduler.get_lr()[0]))

    return losses.avg

# ====================================================
# train loop
# ====================================================
def train_loop(folds, fold):

    LOGGER.info(f"========== fold: {fold} training ==========")
    # ====================================================
    # loader
    # ====================================================
    train_folds = folds[folds['fold'] != fold].reset_index(drop=True)
    valid_folds = folds[folds['fold'] == fold].reset_index(drop=True)
    valid_texts = valid_folds['pn_history'].values
    valid_labels = create_labels_for_scoring(valid_folds)

    train_dataset = TrainDataset(CFG, train_folds)
    valid_dataset = TrainDataset(CFG, valid_folds)
    train_loader = DataLoader(train_dataset,
                              batch_size=CFG.batch_size,
                              shuffle=True,
                              num_workers=CFG.num_workers, pin_memory=True, drop_last=True)
    valid_loader = DataLoader(valid_dataset,
                              batch_size=CFG.batch_size,
                              shuffle=False,
                              num_workers=CFG.num_workers, pin_memory=True, drop_last=False)
    # calculate warm up steps
    CFG.num_warmup_steps = int(CFG.num_warmup_steps * len(train_dataset) / CFG.batch_size * CFG.epochs)
    # ====================================================
    # model & optimizer
    # ====================================================
    model = CustomModel(CFG, config_path=None, pretrained=True)
    torch.save(model.config, OUTPUT_DIR+'config.pth')
    model.to(device)

    def get_optimizer_params(model, encoder_lr, decoder_lr, weight_decay=0.0):
        param_optimizer = list(model.named_parameters())
        no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
        optimizer_parameters = [
            {'params': [p for n, p in model.model.named_parameters() if not any(nd in n for nd in no_decay)],
             'lr': encoder_lr, 'weight_decay': weight_decay},
            {'params': [p for n, p in model.model.named_parameters() if any(nd in n for nd in no_decay)],
             'lr': encoder_lr, 'weight_decay': 0.0},
            {'params': [p for n, p in model.named_parameters() if "model" not in n],
             'lr': decoder_lr, 'weight_decay': 0.0}
        ]
        return optimizer_parameters

    optimizer_parameters = get_optimizer_params(model,
                                                encoder_lr=CFG.encoder_lr,
                                                decoder_lr=CFG.decoder_lr,
                                                weight_decay=CFG.weight_decay)
    optimizer = AdamW(optimizer_parameters, lr=CFG.encoder_lr, eps=CFG.eps, betas=CFG.betas)

    # ====================================================
    # scheduler
    # ====================================================
    def get_scheduler(cfg, optimizer, num_train_steps):
        if cfg.scheduler == 'linear':
            scheduler = get_linear_schedule_with_warmup(
                optimizer, num_warmup_steps=cfg.num_warmup_steps, num_training_steps=num_train_steps
            )
        elif cfg.scheduler == 'cosine':
            scheduler = get_cosine_schedule_with_warmup(
                optimizer, num_warmup_steps=cfg.num_warmup_steps, num_training_steps=num_train_steps, num_cycles=cfg.num_cycles
            )
        return scheduler

    num_train_steps = int(len(train_folds) / CFG.batch_size * CFG.epochs)
    scheduler = get_scheduler(CFG, optimizer, num_train_steps)
    # ====================================================
    # loop
    # ====================================================
    criterion = nn.BCEWithLogitsLoss(reduction="none")

    best_score = 0.
    for epoch in range(CFG.epochs):
        start_time = time.time()
        # train
        avg_loss = train_fn(fold, train_loader, model, criterion, optimizer, epoch, scheduler, device)
        # eval
        avg_val_loss, predictions = valid_fn(valid_loader, model, criterion, device)
        predictions = predictions.reshape((len(valid_folds), CFG.max_len))

        # scoring
        char_probs = get_char_probs(valid_texts, predictions, CFG.tokenizer)
        results = get_results(char_probs, th=0.5)
        preds = get_predictions(results)
        score = get_score(valid_labels, preds)
        elapsed = time.time() - start_time
        LOGGER.info(f'Epoch {epoch+1} - avg_train_loss: {avg_loss:.4f}  avg_val_loss: {avg_val_loss:.4f}  time: {elapsed:.0f}s')
        LOGGER.info(f'Epoch {epoch+1} - Score: {score:.4f}')

        if best_score < score:
            best_score = score
            LOGGER.info(f'Epoch {epoch+1} - Save Best Score: {best_score:.4f} Model')
            torch.save({'model': model.state_dict(),
                        'predictions': predictions},
                       OUTPUT_DIR+f"{CFG.model.replace('/', '-')}_fold{fold}_best.pth")

    predictions = torch.load(OUTPUT_DIR+f"{CFG.model.replace('/', '-')}_fold{fold}_best.pth",
                             map_location=torch.device('cpu'))['predictions']
    valid_folds[[i for i in range(CFG.max_len)]] = predictions
    torch.cuda.empty_cache()
    gc.collect()
    del scheduler
    del optimizer
    del model
    return valid_folds
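train_fn_adv above relies on an FGM helper that the article never defines. Below is a minimal sketch of the commonly used FGM (Fast Gradient Method) wrapper; the epsilon value and the assumption that the embedding parameter name contains 'word_embeddings' are illustrative choices, not taken from the competition code.

import torch

class FGM():
    """Fast Gradient Method: perturb the word-embedding weights along the
    gradient direction so a second forward pass sees adversarial inputs."""
    def __init__(self, model, eps=1.0):
        self.model = model
        self.eps = eps
        self.backup = {}

    def attack(self, emb_name='word_embeddings'):
        # Back up the embedding weights, then add an eps-scaled step in the
        # direction of the gradient computed by the normal backward pass.
        for name, param in self.model.named_parameters():
            if param.requires_grad and emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.eps * param.grad / norm)

    def restore(self, emb_name='word_embeddings'):
        # Put the original embedding weights back after the adversarial step.
        for name, param in self.model.named_parameters():
            if param.requires_grad and emb_name in name and name in self.backup:
                param.data = self.backup[name]
        self.backup = {}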

4. Define the evaluation functions

def valid_fn(valid_loader, model, criterion, device):
    losses = AverageMeter()
    model.eval()
    preds = []
    start = end = time.time()
    for step, (inputs, labels) in enumerate(valid_loader):
        for k, v in inputs.items():
            inputs[k] = v.to(device)
        labels = labels.to(device)
        batch_size = labels.size(0)
        with torch.no_grad():
            y_preds = model(inputs)
        loss = criterion(y_preds.view(-1, 1), labels.view(-1, 1))
        loss = torch.masked_select(loss, labels.view(-1, 1) != -1).mean()
        if CFG.gradient_accumulation_steps > 1:
            loss = loss / CFG.gradient_accumulation_steps
        losses.update(loss.item(), batch_size)
        preds.append(y_preds.sigmoid().to('cpu').numpy())
        end = time.time()
        if step % CFG.print_freq == 0 or step == (len(valid_loader)-1):
            print('EVAL: [{0}/{1}] '
                  'Elapsed {remain:s} '
                  'Loss: {loss.val:.4f}({loss.avg:.4f}) '
                  .format(step, len(valid_loader),
                          loss=losses,
                          remain=timeSince(start, float(step+1)/len(valid_loader))))
    predictions = np.concatenate(preds)
    return losses.avg, predictions

def inference_fn(test_loader, model, device):
    preds = []
    model.eval()
    model.to(device)
    tk0 = tqdm(test_loader, total=len(test_loader))
    for inputs in tk0:
        for k, v in inputs.items():
            inputs[k] = v.to(device)
        with torch.no_grad():
            y_preds = model(inputs)
        preds.append(y_preds.sigmoid().to('cpu').numpy())
    predictions = np.concatenate(preds)
    return predictions
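The training loop in step 3 also calls several post-processing helpers (create_labels_for_scoring, get_char_probs, get_results, get_predictions, get_score) that ship with the full baseline code and are not reproduced in this article. Below is a rough sketch of three of them, assuming token-level probabilities aligned with the tokenizer's offset_mapping; this is an illustrative reconstruction, not the author's exact code.

import numpy as np

def get_char_probs(texts, predictions, tokenizer):
    # Project token-level probabilities back onto character positions via offset mappings.
    results = [np.zeros(len(t)) for t in texts]
    for i, (text, pred) in enumerate(zip(texts, predictions)):
        encoded = tokenizer(text, add_special_tokens=True, return_offsets_mapping=True)
        for (start, end), p in zip(encoded['offset_mapping'], pred):
            results[i][start:end] = p
    return results

def get_results(char_probs, th=0.5):
    # Threshold character probabilities and turn each consecutive run into a
    # "start end" string, joined by ';' like the location column.
    results = []
    for probs in char_probs:
        mask = list(probs >= th) + [False]  # sentinel closes the final run
        spans, start = [], None
        for i, m in enumerate(mask):
            if m and start is None:
                start = i
            elif not m and start is not None:
                spans.append(f"{start} {i}")
                start = None
        results.append(";".join(spans))
    return results

def get_predictions(results):
    # Turn "start end;start end" strings back into lists of (start, end) spans.
    predictions = []
    for result in results:
        spans = []
        if result:
            for span in result.split(';'):
                start, end = span.split()
                spans.append((int(start), int(end)))
        predictions.append(spans)
    return predictions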

5. Tune hyperparameters, finish training, upload the weights, and submit

# ====================================================
# CFG
# ====================================================
class CFG:
    debug=False
    apex=False
    print_freq=100
    num_workers=4
    model="microsoft/deberta-v3-large"
    scheduler='cosine' # ['linear', 'cosine']
    batch_scheduler=True
    num_cycles=0.5
    num_warmup_steps=0.1
    epochs=5
    encoder_lr=2e-5
    decoder_lr=2e-5
    min_lr=1e-6
    eps=1e-6
    betas=(0.9, 0.999)
    batch_size=8
    fc_dropout=0.2
    max_len=512
    weight_decay=0.01
    gradient_accumulation_steps=1
    max_grad_norm=500
    seed=42
    n_fold=5
    trn_fold=[4]
    train=True

if CFG.debug:
    CFG.epochs = 5
    CFG.trn_fold = [0,1,2,3,4]

7. Thoughts on the Hard Parts of the Task

1. Label imbalance: the vast majority of token labels are 0 and very few are 1. How does this imbalance affect the model?

2. The clinical domain contains many abbreviations and terms missing from the tokenizer vocabulary. Do they hurt the model?

8. Painless Score-Boosting Tricks

  • Multi-dropout / contrastive learning

  • Adversarial training (FGM, as in train_fn_adv above)

  • R-Drop (see the sketch after this list)

  • Model ensembling
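As a concrete illustration of the R-Drop idea, adapted to the binary token-labelling setup used here (an illustrative sketch, not code from the competition): pass the same batch through the model twice, let dropout produce two different outputs, and add a symmetric KL term between the two predictions on top of the usual masked BCE loss. logits1 and logits2 are assumed to be the two forward passes squeezed to shape (batch, max_len), and labels follow the 1/0/-1 convention from section 5.

import torch
import torch.nn.functional as F

def r_drop_bce_loss(logits1, logits2, labels, alpha=1.0):
    # Usage (hypothetical): logits1 = model(inputs).squeeze(-1); logits2 = model(inputs).squeeze(-1)
    mask = (labels != -1).float()   # ignore CLS/SEP/padding positions
    target = labels.clamp(min=0)    # map -1 to 0; those positions are masked out anyway
    # Supervised BCE loss on both forward passes.
    bce = (F.binary_cross_entropy_with_logits(logits1, target, reduction='none')
           + F.binary_cross_entropy_with_logits(logits2, target, reduction='none'))
    bce = (bce * mask).sum() / mask.sum()
    # Symmetric KL divergence between the two Bernoulli predictions.
    p1, p2 = torch.sigmoid(logits1), torch.sigmoid(logits2)
    eps = 1e-7
    kl12 = p1 * ((p1 + eps).log() - (p2 + eps).log()) + (1 - p1) * ((1 - p1 + eps).log() - (1 - p2 + eps).log())
    kl21 = p2 * ((p2 + eps).log() - (p1 + eps).log()) + (1 - p2) * ((1 - p2 + eps).log() - (1 - p1 + eps).log())
    kl = (((kl12 + kl21) / 2) * mask).sum() / mask.sum()
    return bce + alpha * kl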

