Experiment Record 1_this is not expected if you are initializing bertm

/home/zyp/module/anaconda3/envs/idna/bin/python3.9 /home/zyp/project/iDNA_ABF/main/train.py 

train../data/DNA_MS/tsv/4mC/4mC_C.equisetifolia/train.tsv test../data/DNA_MS/tsv/4mC/4mC_C.equisetifolia/test.tsv

2023-04-02_11:32:27 INFO: Set IO Over.

2023-04-02_11:32:27 INFO: Set Visualization Over.

len(data_loader) 23

len(data_loader) 23

2023-04-02_11:32:27 INFO: Load Data Over.

Some weights of the model checkpoint at ../pretrain/DNAbert_3mer were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight']

- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).

- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
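The unused weights listed in the warning above are all `cls.predictions.*` keys, which belong to the masked-language-model head of the pretraining checkpoint. A bare `BertModel` has no such head, so this run falls under the "IS expected" case. The sketch below illustrates the key-matching idea in simplified form; it is not the actual `transformers` loading code, and `unused_checkpoint_keys` as well as the shortened key lists are hypothetical names chosen for illustration.

```python
def unused_checkpoint_keys(checkpoint_keys, model_keys, base_prefix="bert."):
    """Return checkpoint keys that have no counterpart in the model.

    Simplified illustration: keys saved from a model with a head carry a
    base prefix (assumed here to be "bert."), which is stripped before
    comparing against the keys of a bare encoder.
    """
    model_keys = set(model_keys)
    unused = []
    for key in checkpoint_keys:
        stripped = key[len(base_prefix):] if key.startswith(base_prefix) else key
        if stripped not in model_keys:
            unused.append(key)
    return unused


# Hypothetical, heavily shortened key lists for illustration only.
checkpoint_keys = [
    "bert.embeddings.word_embeddings.weight",
    "bert.pooler.dense.weight",
    "cls.predictions.bias",
    "cls.predictions.decoder.weight",
]
encoder_keys = [
    "embeddings.word_embeddings.weight",
    "pooler.dense.weight",
]

print(unused_checkpoint_keys(checkpoint_keys, encoder_keys))
```

Only the head keys are reported as unused; the encoder weights themselves are matched and loaded.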

2023-04-02_11:32:29 INFO: Init Model Over.
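Each entry of the `Model.named_parameters` dump in this log follows the pattern `[name]->[torch.Size([...])],[requires_grad:...]`. As a rough way to total the trainable parameters from such a dump, the lines can be parsed with a regular expression. This is a minimal sketch that assumes the line format seen in this log; `parse_param_line` and `count_parameters` are hypothetical helpers and not part of the training script.

```python
import re
from math import prod

# Matches lines such as: [Ws]->[torch.Size([1, 768])],[requires_grad:True]
LINE_RE = re.compile(
    r"^\[(?P<name>[^\]]+)\]->\[torch\.Size\(\[(?P<shape>[\d, ]*)\]\)\],"
    r"\[requires_grad:(?P<grad>True|False)\]$"
)


def parse_param_line(line):
    """Return (name, shape, requires_grad), or None for non-parameter lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    shape = tuple(int(d) for d in match.group("shape").split(",") if d.strip())
    return match.group("name"), shape, match.group("grad") == "True"


def count_parameters(lines):
    """Sum the element counts of all parameters marked requires_grad:True."""
    total = 0
    for line in lines:
        parsed = parse_param_line(line)
        if parsed is None:
            continue  # skip warnings, timestamps, and blank lines
        _, shape, requires_grad = parsed
        if requires_grad:
            total += prod(shape)
    return total


sample = [
    "[Ws]->[torch.Size([1, 768])],[requires_grad:True]",
    "2023-04-02_11:32:29 INFO: Init Model Over.",
    "[bertone.bert.encoder.layer.0.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]",
    "[bertone.bert.encoder.layer.0.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]",
]
print(count_parameters(sample))
```

Lines that do not match the pattern, such as the interleaved checkpoint warnings, are skipped rather than treated as errors.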

-------------------------------------------------- Model.named_parameters --------------------------------------------------

[Ws]->[torch.Size([1, 768])],[requires_grad:True]

[Wh]->[torch.Size([1, 768])],[requires_grad:True]

[bertone.bert.embeddings.word_embeddings.weight]->[torch.Size([69, 768])],[requires_grad:True]

[bertone.bert.embeddings.position_embeddings.weight]->[torch.Size([512, 768])],[requires_grad:True]

[bertone.bert.embeddings.token_type_embeddings.weight]->[torch.Size([2, 768])],[requires_grad:True]

[bertone.bert.embeddings.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

Some weights of the model checkpoint at ../pretrain/DNAbert_6mer were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight']

- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).

- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
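The two encoders in the listing differ in the first dimension of `word_embeddings.weight`: 69 rows for `bertone` and 4101 rows for `berttwo`. These sizes are consistent with a k-mer vocabulary of 4^k tokens plus five special tokens, for k = 3 and k = 6 respectively. The count of five special tokens is an assumption inferred from the sizes rather than something the log states, and `kmer_vocab_size` is a hypothetical helper.

```python
def kmer_vocab_size(k, alphabet_size=4, num_special_tokens=5):
    """Vocabulary size for all k-mers over the alphabet plus special tokens.

    num_special_tokens=5 is an assumption (e.g. padding, unknown, class,
    separator, and mask tokens); the log itself does not list them.
    """
    return alphabet_size ** k + num_special_tokens


for k in (3, 6):
    print(k, kmer_vocab_size(k))
```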

[bertone.bert.embeddings.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[bertone.bert.encoder.layer.0.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[bertone.bert.encoder.layer.0.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.0.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[bertone.bert.encoder.layer.1.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[bertone.bert.encoder.layer.1.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.1.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[bertone.bert.encoder.layer.2.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[bertone.bert.encoder.layer.2.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.2.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[bertone.bert.encoder.layer.3.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[bertone.bert.encoder.layer.3.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.3.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[bertone.bert.encoder.layer.4.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[bertone.bert.encoder.layer.4.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.4.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[bertone.bert.encoder.layer.5.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[bertone.bert.encoder.layer.5.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.5.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[bertone.bert.encoder.layer.6.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[bertone.bert.encoder.layer.6.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.6.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[bertone.bert.encoder.layer.7.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[bertone.bert.encoder.layer.7.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.7.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[bertone.bert.encoder.layer.8.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[bertone.bert.encoder.layer.8.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.8.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[bertone.bert.encoder.layer.9.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[bertone.bert.encoder.layer.9.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.9.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[bertone.bert.encoder.layer.10.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[bertone.bert.encoder.layer.10.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.10.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[bertone.bert.encoder.layer.11.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[bertone.bert.encoder.layer.11.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.encoder.layer.11.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[bertone.bert.pooler.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[bertone.bert.pooler.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.embeddings.word_embeddings.weight]->[torch.Size([4101, 768])],[requires_grad:True]

[berttwo.bert.embeddings.position_embeddings.weight]->[torch.Size([512, 768])],[requires_grad:True]

[berttwo.bert.embeddings.token_type_embeddings.weight]->[torch.Size([2, 768])],[requires_grad:True]

[berttwo.bert.embeddings.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.embeddings.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.0.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.1.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.2.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.3.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.4.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.5.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.6.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.7.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.8.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.9.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.10.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.attention.self.query.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.attention.self.query.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.attention.self.key.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.attention.self.key.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.attention.self.value.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.attention.self.value.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.attention.output.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.attention.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.attention.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.attention.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.intermediate.dense.weight]->[torch.Size([3072, 768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.intermediate.dense.bias]->[torch.Size([3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.output.dense.weight]->[torch.Size([768, 3072])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.output.dense.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.output.LayerNorm.weight]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.encoder.layer.11.output.LayerNorm.bias]->[torch.Size([768])],[requires_grad:True]

[berttwo.bert.pooler.dense.weight]->[torch.Size([768, 768])],[requires_grad:True]

[berttwo.bert.pooler.dense.bias]->[torch.Size([768])],[requires_grad:True]

[classification.0.weight]->[torch.Size([20, 768])],[requires_grad:True]

[classification.0.bias]->[torch.Size([20])],[requires_grad:True]

[classification.3.weight]->[torch.Size([2, 20])],[requires_grad:True]

[classification.3.bias]->[torch.Size([2])],[requires_grad:True]

================================================== Number of total parameters:175302206 ==================================================
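
The total above can be reproduced by hand from the shapes in the parameter listing. The sketch below is only a cross-check, not code from the project. It assumes both encoders are standard 12-layer BERT-base stacks with the shapes printed above, and that the 6-mer vocabulary has 4^6 + 5 = 4101 entries, by analogy with the 69 = 4^3 + 5 rows of `bertone`'s word embeddings. The helper names are hypothetical.

```python
# Cross-check of "Number of total parameters" from the logged tensor shapes.
HIDDEN, FFN, LAYERS, MAX_POS, TYPES = 768, 3072, 12, 512, 2


def linear(n_in, n_out):
    """Weight matrix plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * n


def bert_params(vocab_size):
    """Parameters of one BertModel (embeddings, encoder, pooler)."""
    embeddings = (vocab_size + MAX_POS + TYPES) * HIDDEN + layer_norm(HIDDEN)
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)  # Q, K, V, output
    feed_forward = linear(HIDDEN, FFN) + linear(FFN, HIDDEN) + layer_norm(HIDDEN)
    pooler = linear(HIDDEN, HIDDEN)
    return embeddings + LAYERS * (attention + feed_forward) + pooler


VOCAB_3MER = 69          # from bertone word_embeddings: [69, 768]
VOCAB_6MER = 4 ** 6 + 5  # assumption: 4096 k-mers + 5 special tokens

fusion = 2 * HIDDEN                               # Ws and Wh, each [1, 768]
classifier = linear(HIDDEN, 20) + linear(20, 2)   # classification.0 and .3
total = bert_params(VOCAB_3MER) + bert_params(VOCAB_6MER) + fusion + classifier
print(total)  # 175302206
```

Under these assumptions the sum matches the logged figure exactly, which suggests every listed tensor is counted once and nothing is frozen or shared between the two encoders.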

2023-04-02_11:32:29 INFO: Adjust Model Over.

2023-04-02_11:32:29 INFO: Init Optimizer Over.

2023-04-02_11:32:29 INFO: Define Loss Function Over.

2023-04-02_11:32:29 INFO: Train Model Start.

2023-04-02_11:32:29 INFO: Learn Name: trainCross

2023-04-02_11:32:29 INFO: Config: Namespace(learn_name='trainCross', path_save='../result/', save_best=False, threshold=0.95, cuda=True, device=0, num_workers=4, num_class=2, kmer=6, adversarial=True, train_name=None, test_name=None, path_train_data='../data/DNA_MS/tsv/4mC/4mC_C.equisetifolia/train.tsv', path_test_data='../data/DNA_MS/tsv/4mC/4mC_C.equisetifolia/test.tsv', path_params=None, model_save_name='BERT', save_figure_type='png', mode='train-test', model='FusionDNAbert', interval_log=10, interval_test=1, epoch=20, optimizer='AdamW', loss_func='CE', batch_size=16, lr=5e-05, reg=0.003, b=0.06, gamma=3, alpha=0.4, kmers=[3, 6])

Epoch[1] Batch[10] - loss: 0.615995 | ACC: 68.7500%(11/16)

Epoch[1] Batch[20] - loss: 0.629160 | ACC: 75.0000%(12/16)

Evaluation - loss: 0.634505  ACC: 66.9399%(245/366)

2023-04-02_11:35:27 INFO: 

==================== Test Performance. Epoch[1] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.6694,   0.3497,   0.9891,   0.6991,   0.4406

============================================================

Epoch[2] Batch[30] - loss: 0.657815 | ACC: 62.5000%(10/16)

Epoch[2] Batch[40] - loss: 0.526238 | ACC: 87.5000%(14/16)

Evaluation - loss: 0.502189  ACC: 80.0546%(293/366)

2023-04-02_11:38:24 INFO: 

==================== Test Performance. Epoch[2] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.8005,   0.7814,   0.8197,   0.8480,   0.6015

============================================================

Epoch[3] Batch[50] - loss: 0.405876 | ACC: 87.5000%(14/16)

Epoch[3] Batch[60] - loss: 0.338735 | ACC: 100.0000%(16/16)

Evaluation - loss: 0.479141  ACC: 79.2350%(290/366)

2023-04-02_11:41:21 INFO: 

==================== Test Performance. Epoch[3] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.7923,   0.7213,   0.8634,   0.8709,   0.5907

============================================================

Epoch[4] Batch[70] - loss: 0.433974 | ACC: 81.2500%(13/16)

Epoch[4] Batch[80] - loss: 0.387404 | ACC: 68.7500%(11/16)

Epoch[4] Batch[90] - loss: 0.531118 | ACC: 68.7500%(11/16)

Evaluation - loss: 0.502320  ACC: 73.4973%(269/366)

2023-04-02_11:44:18 INFO: 

==================== Test Performance. Epoch[4] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.7350,   0.5246,   0.9454,   0.8786,   0.5180

============================================================

Epoch[5] Batch[100] - loss: 0.801469 | ACC: 62.5000%(10/16)

Epoch[5] Batch[110] - loss: 0.442303 | ACC: 87.5000%(14/16)

Evaluation - loss: 0.525165  ACC: 77.5956%(284/366)

2023-04-02_11:47:15 INFO: 

==================== Test Performance. Epoch[5] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.7760,   0.6448,   0.9071,   0.8565,   0.5719

============================================================

Epoch[6] Batch[120] - loss: 0.343935 | ACC: 87.5000%(14/16)

Epoch[6] Batch[130] - loss: 0.425401 | ACC: 75.0000%(12/16)

Evaluation - loss: 0.488533  ACC: 79.5082%(291/366)

2023-04-02_11:50:12 INFO: 

==================== Test Performance. Epoch[6] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.7951,   0.6940,   0.8962,   0.8621,   0.6026

============================================================

Epoch[7] Batch[140] - loss: 0.462007 | ACC: 75.0000%(12/16)

Epoch[7] Batch[150] - loss: 0.306372 | ACC: 87.5000%(14/16)

Epoch[7] Batch[160] - loss: 0.286699 | ACC: 100.0000%(16/16)

Evaluation - loss: 0.522383  ACC: 78.6885%(288/366)

2023-04-02_11:53:09 INFO: 

==================== Test Performance. Epoch[7] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.7869,   0.6776,   0.8962,   0.8554,   0.5880

============================================================

Epoch[8] Batch[170] - loss: 0.375740 | ACC: 87.5000%(14/16)

Epoch[8] Batch[180] - loss: 0.187097 | ACC: 100.0000%(16/16)

Evaluation - loss: 0.500395  ACC: 78.4153%(287/366)

2023-04-02_11:56:05 INFO: 

==================== Test Performance. Epoch[8] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.7842,   0.6831,   0.8852,   0.8795,   0.5803

============================================================

Epoch[9] Batch[190] - loss: 0.431559 | ACC: 87.5000%(14/16)

Epoch[9] Batch[200] - loss: 0.451815 | ACC: 87.5000%(14/16)

Evaluation - loss: 0.556607  ACC: 72.9508%(267/366)

2023-04-02_11:59:03 INFO: 

==================== Test Performance. Epoch[9] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.7295,   0.4973,   0.9617,   0.8664,   0.5183

============================================================

Epoch[10] Batch[210] - loss: 0.298760 | ACC: 87.5000%(14/16)

Epoch[10] Batch[220] - loss: 0.278708 | ACC: 93.7500%(15/16)

Epoch[10] Batch[230] - loss: 0.136271 | ACC: 100.0000%(14/14)

Evaluation - loss: 0.495753  ACC: 80.6011%(295/366)

2023-04-02_12:02:00 INFO: 

==================== Test Performance. Epoch[10] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.8060,   0.7650,   0.8470,   0.8724,   0.6141

============================================================

Epoch[11] Batch[240] - loss: 0.687751 | ACC: 75.0000%(12/16)

Epoch[11] Batch[250] - loss: 0.395731 | ACC: 87.5000%(14/16)

Evaluation - loss: 0.468437  ACC: 80.6011%(295/366)

2023-04-02_12:04:57 INFO: 

==================== Test Performance. Epoch[11] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.8060,   0.7486,   0.8634,   0.8574,   0.6161

============================================================

Epoch[12] Batch[260] - loss: 0.257664 | ACC: 93.7500%(15/16)

Epoch[12] Batch[270] - loss: 0.500586 | ACC: 87.5000%(14/16)

Evaluation - loss: 0.502179  ACC: 77.8689%(285/366)

2023-04-02_12:07:54 INFO: 

==================== Test Performance. Epoch[12] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.7787,   0.6612,   0.8962,   0.8466,   0.5734

============================================================

Epoch[13] Batch[280] - loss: 0.475970 | ACC: 81.2500%(13/16)

Epoch[13] Batch[290] - loss: 0.281460 | ACC: 93.7500%(15/16)

Evaluation - loss: 0.521349  ACC: 75.6831%(277/366)

2023-04-02_12:10:51 INFO: 

==================== Test Performance. Epoch[13] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.7568,   0.5847,   0.9290,   0.8672,   0.5471

============================================================

Epoch[14] Batch[300] - loss: 0.495558 | ACC: 75.0000%(12/16)

Epoch[14] Batch[310] - loss: 0.684222 | ACC: 50.0000%(8/16)

Epoch[14] Batch[320] - loss: 0.674430 | ACC: 50.0000%(8/16)

Evaluation - loss: 0.706398  ACC: 50.0000%(183/366)

2023-04-02_12:13:48 INFO: 

==================== Test Performance. Epoch[14] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.5000,   1.0000,   0.0000,   0.7776,   0.0000

============================================================
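
From epoch 14 onward the test metrics look like a collapse to a single predicted class: SE 1.0 and SP 0.0 at epoch 14, then the reverse from epoch 15. The sketch below recomputes the four threshold metrics from confusion-matrix counts. The counts are not printed in the log; they are inferred here from the logged ratios, assuming 183 positive and 183 negative test samples (consistent with ACC 183/366 when only one class is predicted). Returning 0 for MCC when its denominator is zero is also an assumption, chosen because the log prints a bare 0 in that case. `binary_metrics` is a hypothetical helper, not the project's function.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (ACC, SE, SP, MCC) for a binary confusion matrix."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    se = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall on positives)
    sp = tn / (tn + fp) if tn + fp else 0.0   # specificity (recall on negatives)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: MCC is reported as 0 when a whole row or column is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, se, sp, mcc


# Epoch 14 (inferred counts): every sample predicted positive.
print(binary_metrics(tp=183, fp=183, tn=0, fn=0))

# Epoch 1 (inferred counts): SE 64/183, SP 181/183, ACC 245/366.
print(binary_metrics(tp=64, fp=2, tn=181, fn=119))
```

With the inferred epoch-1 counts the MCC comes out at about 0.4406, in line with the logged value, so the logged SE, SP and MCC appear to follow these standard definitions.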

Epoch[15] Batch[330] - loss: 0.681710 | ACC: 62.5000%(10/16)

Epoch[15] Batch[340] - loss: 0.731509 | ACC: 31.2500%(5/16)

Evaluation - loss: 0.694631  ACC: 50.0000%(183/366)

2023-04-02_12:16:45 INFO: 

==================== Test Performance. Epoch[15] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.5000,   0.0000,   1.0000,   0.2438,   0.0000

============================================================

Epoch[16] Batch[350] - loss: 0.692716 | ACC: 56.2500%(9/16)

Epoch[16] Batch[360] - loss: 0.709091 | ACC: 25.0000%(4/16)

Evaluation - loss: 0.693373  ACC: 50.0000%(183/366)

2023-04-02_12:19:42 INFO: 

==================== Test Performance. Epoch[16] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.5000,   0.0000,   1.0000,   0.2462,   0.0000

============================================================

Epoch[17] Batch[370] - loss: 0.693505 | ACC: 43.7500%(7/16)

Epoch[17] Batch[380] - loss: 0.698691 | ACC: 37.5000%(6/16)

Epoch[17] Batch[390] - loss: 0.713705 | ACC: 31.2500%(5/16)

Evaluation - loss: 0.694033  ACC: 50.0000%(183/366)

2023-04-02_12:22:39 INFO: 

==================== Test Performance. Epoch[17] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.5000,   0.0000,   1.0000,   0.2652,   0.0000

============================================================

Epoch[18] Batch[400] - loss: 0.688194 | ACC: 50.0000%(8/16)

Epoch[18] Batch[410] - loss: 0.704442 | ACC: 37.5000%(6/16)

Evaluation - loss: 0.693286  ACC: 50.0000%(183/366)

2023-04-02_12:25:37 INFO: 

==================== Test Performance. Epoch[18] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.5000,   0.0000,   1.0000,   0.3027,   0.0000

============================================================

Epoch[19] Batch[420] - loss: 0.694271 | ACC: 37.5000%(6/16)

Epoch[19] Batch[430] - loss: 0.705233 | ACC: 37.5000%(6/16)

Evaluation - loss: 0.693144  ACC: 50.0000%(183/366)

2023-04-02_12:28:34 INFO: 

==================== Test Performance. Epoch[19] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.5000,   0.0000,   1.0000,   0.2426,   0.0000

============================================================

Epoch[20] Batch[440] - loss: 0.689379 | ACC: 62.5000%(10/16)

Epoch[20] Batch[450] - loss: 0.679922 | ACC: 50.0000%(8/16)

Epoch[20] Batch[460] - loss: 0.687366 | ACC: 50.0000%(7/14)

Evaluation - loss: 0.693312  ACC: 50.0000%(183/366)

2023-04-02_12:31:31 INFO: 

==================== Test Performance. Epoch[20] ====================

[ACC,      SE,          SP,         AUC,      MCC]

0.5000,   0.0000,   1.0000,   0.2419,   0.0000

============================================================

2023-04-02_12:31:31 INFO: Best Performance: [0.8060109289617486, 0.7650273224043715, 0.8469945355191257, 0.8724058646122608, 0.6140882486291177]

2023-04-02_12:31:31 INFO: Performance: [[0.6693989071038251, 0.34972677595628415, 0.9890710382513661, 0.6991250858490847, 0.4406148138015949], [0.8005464480874317, 0.7814207650273224, 0.819672131147541, 0.8479799337095763, 0.6015331289822348], [0.7923497267759563, 0.7213114754098361, 0.8633879781420765, 0.8709128370509719, 0.5906916183066317], [0.7349726775956285, 0.5245901639344263, 0.9453551912568307, 0.8785571381647705, 0.518034691835255], [0.7759562841530054, 0.644808743169399, 0.907103825136612, 0.8564901908089224, 0.5719374037244465], [0.7950819672131147, 0.6939890710382514, 0.8961748633879781, 0.8621039744393681, 0.6026095183710446], [0.7868852459016393, 0.6775956284153005, 0.8961748633879781, 0.8553554898623429, 0.5879885225760578], [0.7841530054644809, 0.6830601092896175, 0.8852459016393442, 0.8794529547015437, 0.5802906473202653], [0.7295081967213115, 0.4972677595628415, 0.9617486338797814, 0.866374033264654, 0.518321055348816], [0.8060109289617486, 0.7650273224043715, 0.8469945355191257, 0.8724058646122608, 0.6140882486291177], [0.8060109289617486, 0.7486338797814208, 0.8633879781420765, 0.857445728448147, 0.6160918045188195], [0.7786885245901639, 0.6612021857923497, 0.8961748633879781, 0.8466063483531905, 0.5734320125978848], [0.7568306010928961, 0.5846994535519126, 0.9289617486338798, 0.8672399892502016, 0.5471037316861695], [0.5, 1.0, 0.0, 0.777613544746036, 0], [0.5, 0.0, 1.0, 0.24375167965600644, 0], [0.5, 0.0, 1.0, 0.2462450356833587, 0], [0.5, 0.0, 1.0, 0.26519155543611334, 0], [0.5, 0.0, 1.0, 0.3026964077756875, 0], [0.5, 0.0, 1.0, 0.24260204843381408, 0], [0.5, 0.0, 1.0, 0.24187046492878256, 0]]
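
The "Best Performance" row is identical to the epoch-10 entry, although epoch 11 has the same ACC and a slightly higher MCC. One reading, which the log itself does not confirm, is that the best epoch is chosen by ACC alone and that the first epoch wins a tie. The sketch below illustrates that reading with the per-epoch values from the tables above (rounded to four decimals); it is not the project's selection code.

```python
# Per-epoch test ACC and MCC, copied from the "Test Performance" tables.
acc = [0.6694, 0.8005, 0.7923, 0.7350, 0.7760, 0.7951, 0.7869, 0.7842,
       0.7295, 0.8060, 0.8060, 0.7787, 0.7568,
       0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
mcc = [0.4406, 0.6015, 0.5907, 0.5180, 0.5719, 0.6026, 0.5880, 0.5803,
       0.5183, 0.6141, 0.6161, 0.5734, 0.5471,
       0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# max() returns the first maximal element, so ties go to the earlier epoch.
best_by_acc = max(range(len(acc)), key=acc.__getitem__) + 1
best_by_mcc = max(range(len(mcc)), key=mcc.__getitem__) + 1

print("best epoch by ACC:", best_by_acc)
print("best epoch by MCC:", best_by_mcc)
```

Selecting by ACC picks epoch 10 and selecting by MCC picks epoch 11, so the choice of criterion changes which checkpoint would be kept if `save_best` were enabled.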

[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460]
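
The list above holds the global step numbers at which a training line was logged. The batch counter is not reset between epochs, which is why some epochs show two `Batch[...]` lines and others three. A small sketch of that bookkeeping, assuming 23 training batches per epoch (the `len(data_loader) 23` line earlier in the log) and `interval_log=10` from the Config; `epoch_of` is a hypothetical helper:

```python
from collections import Counter

STEPS_PER_EPOCH = 23   # assumption: len(data_loader) for the training set
INTERVAL_LOG = 10      # from the Config line
EPOCHS = 20

# Global steps at which a training log line is emitted.
logged_steps = list(range(INTERVAL_LOG, STEPS_PER_EPOCH * EPOCHS + 1, INTERVAL_LOG))


def epoch_of(step):
    """Map a 1-based global step to its 1-based epoch."""
    return (step - 1) // STEPS_PER_EPOCH + 1


lines_per_epoch = Counter(epoch_of(s) for s in logged_steps)
three_line_epochs = sorted(e for e, n in lines_per_epoch.items() if n == 3)
print(three_line_epochs)  # [4, 7, 10, 14, 17, 20]
```

Steps 230 and 460 fall on the last batch of epochs 10 and 20, which is also where the log shows a batch of 14 instead of 16 samples.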

2023-04-02_12:31:37 INFO: Train Model Over.

Process finished with exit code 0

 
