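Word2vec is a word-embedding technique released by Google and one of the most widely used methods for learning word representations. Its main purpose is to obtain word vectors efficiently, and these vectors are used as input features in many NLP tasks. Given a corpus, Word2vec trains an optimized model that quickly maps each word to a vector, which gave natural language processing research a new tool. It offers two architectures for learning word embeddings: Skip-gram and CBOW (Continuous Bag of Words). Skip-gram predicts the context words (the words before and after) from the current word; CBOW predicts the current word from its context words.
Skip-gram predicts the context words (the words before and after) from the current word.
(Figure: Skip-gram model structure)
For example, take the phrase 呼伦贝尔大草原 (the Hulunbuir Grassland), treating each character as one token: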
_ _贝_ _草原
呼_ _尔_ _原
呼伦_ _大_ _
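The number of words predicted on each side of the current word is called window_size (2 in the example above). In effect, training maximizes the following probabilities:
P(呼|贝)P(伦|贝)P(尔|贝)P(大|贝)
P(伦|尔)P(贝|尔)P(大|尔)P(草|尔)
P(贝|大)P(尔|大)P(草|大)P(原|大)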
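The (current word, context word) pairs behind these probabilities can be enumerated mechanically. The following is a minimal sketch, assuming each character is one token; the function name `skipgram_pairs` is made up for this illustration and is not part of any library.

```python
def skipgram_pairs(tokens, window_size):
    """Return (center, context) pairs for every position in `tokens`."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window_size)                 # clip the window at the left edge
        hi = min(len(tokens), i + window_size + 1)   # and at the right edge
        for j in range(lo, hi):
            if j != i:                               # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


tokens = list("呼伦贝尔大草原")  # treat each character as one token
pairs = skipgram_pairs(tokens, window_size=2)

# Context words predicted from the center word "贝"
contexts_of_bei = [c for w, c in pairs if w == "贝"]
print(contexts_of_bei)
```

Each pair (w, c) contributes one factor P(c|w) to the product that training maximizes.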
The probability can be written in general form. Let the text Text consist of N words:
$$Text = \{w_1, w_2, w_3, \ldots, w_N\}$$
The objective function can be written as:
$$\arg\max \prod_{w \in Text} \ \prod_{c \in c(w)} P(c \mid w; \theta)$$
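where $w$ is the current word, $c$ is a context word of $w$, and $\theta$ is the parameter to be optimized. This parameter is the dense vector representation of every word (or character); each row is the $n$-dimensional vector of one character, of the form:
$$
\theta = \left[ \begin{matrix}
\text{呼}: & \theta_{11} & \theta_{12} & \theta_{13} & \cdots & \theta_{1n} \\
\text{伦}: & \theta_{21} & \theta_{22} & \theta_{23} & \cdots & \theta_{2n} \\
\text{贝}: & \theta_{31} & \theta_{32} & \theta_{33} & \cdots & \theta_{3n} \\
\text{尔}: & \theta_{41} & \theta_{42} & \theta_{43} & \cdots & \theta_{4n} \\
\text{大}: & \theta_{51} & \theta_{52} & \theta_{53} & \cdots & \theta_{5n} \\
\text{草}: & \theta_{61} & \theta_{62} & \theta_{63} & \cdots & \theta_{6n} \\
\text{原}: & \theta_{71} & \theta_{72} & \theta_{73} & \cdots & \theta_{7n}
\end{matrix} \right]
$$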
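The goal is to find the $\theta$ that maximizes the objective function. Every probability lies between 0 and 1, so the product of many of them quickly becomes too small to compute reliably; the objective is therefore rewritten as a sum of logarithms: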
$$\arg\max \sum_{w \in Text} \ \sum_{c \in c(w)} \log P(c \mid w; \theta)$$
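A small numeric sketch of why the product is replaced by a sum of logarithms. The probabilities below are invented for illustration; only the floating-point behavior matters.

```python
import math

probs = [1e-5] * 100  # 100 hypothetical conditional probabilities P(c|w)

product = 1.0
for p in probs:
    product *= p  # the true value 1e-500 is far below what a double can hold

log_sum = sum(math.log(p) for p in probs)  # the same quantity on a log scale

print(product)  # the product has underflowed
print(log_sum)  # the log-sum is an ordinary finite number
```

Because the logarithm is monotonically increasing, the $\theta$ that maximizes the log-sum also maximizes the original product.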
Writing the probability in softmax form gives:
$$\arg\max \sum_{w \in Text} \sum_{c \in c(w)} \log \left( \frac{e^{u_c \cdot v_w}}{\sum_{c' \in vocab} e^{u_{c'} \cdot v_w}} \right)$$
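Here $U$ is the matrix of context-word vectors and $V$ is the matrix of center-word vectors. The two have the same size, because every word can act both as a context word and as a center word. $u_c \cdot v_w$ is the inner product of the context-word vector and the center-word vector (the inner product measures how similar the two vectors are): the larger the similarity, the higher the probability. The denominator takes $w$ as the center word and sums $e^{u_{c'} \cdot v_w}$ over every context word $c'$ in the vocabulary. The previous formula then simplifies to: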
$$
\begin{aligned}
&= \arg\max \sum_{w \in Text} \sum_{c \in c(w)} \left( \log\left(e^{u_c \cdot v_w}\right) - \log\left(\sum_{c' \in vocab} e^{u_{c'} \cdot v_w}\right) \right) \\
&= \arg\max \sum_{w \in Text} \sum_{c \in c(w)} \left( u_c \cdot v_w - \log \sum_{c' \in vocab} e^{u_{c'} \cdot v_w} \right)
\end{aligned}
$$
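To make the softmax term concrete, here is a minimal sketch that evaluates P(c|w) for one center word. The two-dimensional vectors and the three-word vocabulary are invented for illustration; real embeddings are learned and much larger.

```python
import math

# Hypothetical vectors, chosen only for illustration.
V = {"贝": [1.0, 0.0]}                  # center-word vectors v_w
U = {"呼": [2.0, 0.0],                  # context-word vectors u_c
     "伦": [1.0, 1.0],
     "原": [-1.0, 0.5]}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_prob(context, center):
    """P(context | center) = exp(u_c . v_w) / sum over c' of exp(u_c' . v_w)."""
    scores = {c: dot(u, V[center]) for c, u in U.items()}  # one inner product per vocabulary word
    m = max(scores.values())                               # subtract the max for numerical stability
    exp_scores = {c: math.exp(s - m) for c, s in scores.items()}
    return exp_scores[context] / sum(exp_scores.values())


probs = {c: softmax_prob(c, "贝") for c in U}
print(probs)
```

Note that the denominator touches every word in the vocabulary for a single (w, c) pair, which is the cost that grows with vocabulary size.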
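In the softmax objective, the normalization term requires a pass over the entire vocabulary, so computation becomes very inefficient when the vocabulary is large. In practice, therefore, a different form of the objective is optimized. For example, take the following corpus:
Text: 呼伦贝尔大草原
Set window_size to 1 and build the set of positive samples and the set of negative samples (in general, the negative set is much larger than the positive set):
Positive samples: D = {(呼,伦),(伦,呼),(伦,贝),(贝,伦),(贝,尔),(尔,贝),(尔,大),(大,尔),(大,草),(草,大),(草,原),(原,草)}
Negative samples: D' = {(呼,贝),(呼,尔),(呼,大),(呼,草),(呼,原),(伦,尔),(伦,大),(伦,草),(伦,原),(贝,呼),(贝,大),(贝,草),(贝,原),(尔,呼),(尔,伦),(尔,草),(尔,原),(大,呼),(大,伦),(大,贝),(大,原),(草,呼),(草,伦),(草,贝),(草,尔),(原,呼),(原,伦),(原,贝),(原,尔),(原,大)}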
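The two sets can also be built programmatically. This is a minimal sketch, assuming a negative sample is any ordered pair of distinct characters that never co-occur within the window; `build_sample_sets` is a made-up helper name.

```python
def build_sample_sets(tokens, window_size):
    """Return (positives, negatives) as sets of (center, context) pairs."""
    positives = set()
    for i, center in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                positives.add((center, tokens[j]))

    vocab = set(tokens)
    # Every ordered pair of distinct tokens that is not a positive pair
    negatives = {(w, c) for w in vocab for c in vocab
                 if w != c and (w, c) not in positives}
    return positives, negatives


D, D_neg = build_sample_sets(list("呼伦贝尔大草原"), window_size=1)
print(len(D), len(D_neg))
```

Even for this seven-character text the negative set is already larger than the positive set, and the gap widens quickly as the vocabulary grows.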
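The objective for optimizing the word vectors is defined as maximizing the joint probability of the positive and negative samples: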
$$
\begin{aligned}
&\arg\max \left( \prod_{w,c \in D} P(D=1 \mid w,c;\theta) \prod_{w,c \in D'} P(D=0 \mid w,c;\theta) \right) \\
&= \arg\max \left( \prod_{w,c \in D} \frac{1}{1+\exp(-u_c \cdot v_w)} \prod_{w,c \in D'} \left[ 1 - \frac{1}{1+\exp(-u_c \cdot v_w)} \right] \right) \\
&= \arg\max \left( \sum_{w,c \in D} \log \sigma(u_c \cdot v_w) + \sum_{w,c \in D'} \log \sigma(-u_c \cdot v_w) \right)
\end{aligned}
$$
Here $\sigma(x) = \frac{1}{1+e^{-x}}$ is the sigmoid function. The last step takes the logarithm of the product and uses the identity $1 - \sigma(x) = \sigma(-x)$.
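During actual training, only a subset of the negative samples is drawn for each computation (this is called "negative sampling"), which reduces the amount of computation. Training word vectors also relies on a language model.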
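The sketch below evaluates the negative-sampling objective for a single positive pair with two sampled negatives. It is an illustration under stated assumptions, not the reference implementation: the vectors are random, the embedding size of 4 is arbitrary, and negatives are drawn uniformly, whereas the original word2vec tool draws them from a smoothed word-frequency distribution.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def negative_sampling_objective(v_w, u_pos, u_negs):
    """log sigma(u_c . v_w) plus the sum of log sigma(-u_c' . v_w) over sampled negatives."""
    value = math.log(sigmoid(dot(u_pos, v_w)))
    for u_neg in u_negs:
        value += math.log(sigmoid(-dot(u_neg, v_w)))
    return value


rng = random.Random(0)              # fixed seed so the sketch is repeatable
chars = list("呼伦贝尔大草原")
dim = 4                             # hypothetical embedding size
V = {ch: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for ch in chars}  # center vectors
U = {ch: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for ch in chars}  # context vectors

center, positive = "呼", "伦"       # a positive pair from D
# With window_size = 1, every remaining character forms a negative pair with "呼".
candidates = [ch for ch in chars if ch not in (center, positive)]
sampled = rng.sample(candidates, k=2)   # keep only 2 of the 5 negatives

objective = negative_sampling_objective(
    V[center], U[positive], [U[ch] for ch in sampled])
print(sampled, objective)
```

Each training step now costs a handful of inner products instead of one per vocabulary word.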
CBOW stands for Continuous Bag of Words. Its core idea is to predict the center word from its context, for example:
呼伦贝_大草原
(Figure: CBOW model structure)
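As a counterpart to Skip-gram, the CBOW training examples for the phrase can be sketched as (context words, center word) pairs. The helper name `cbow_examples` and the window size of 3 are assumptions made so that the output matches the example 呼伦贝_大草原.

```python
def cbow_examples(tokens, window_size):
    """Return (context_list, center) examples, one per position."""
    examples = []
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window_size):i]       # up to window_size words before
        right = tokens[i + 1:i + 1 + window_size]      # up to window_size words after
        examples.append((left + right, center))
    return examples


tokens = list("呼伦贝尔大草原")
examples = cbow_examples(tokens, window_size=3)

context, center = examples[3]   # position of the blank in 呼伦贝_大草原
print(context, "->", center)
```

The model receives the whole context list as input and is trained to output the missing center word.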
Dataset: Chinese Wikipedia articles. On AI Studio the dataset is named 中文维基百科语料库 (Chinese Wikipedia corpus).
Code: running it on AI Studio is recommended.
!pip install gensim==3.8.1  # remove the leading "!" when not running in AI Studio
Output:
Looking in indexes: https://mirror.baidu.com/pypi/simple/, https://mirrors.aliyun.com/pypi/simple/
Collecting gensim==3.8.1
  Downloading https://mirrors.aliyun.com/pypi/packages/44/93/c6011037f24e3106d13f3be55297bf84ece2bf15b278cc4776339dc52db5/gensim-3.8.1-cp37-cp37m-manylinux1_x86_64.whl (24.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.2/24.2 MB 4.4 MB/s eta 0:00:00
Collecting smart-open>=1.8.1
  Downloading https://mirrors.aliyun.com/pypi/packages/ad/08/dcd19850b79f72e3717c98b2088f8a24b549b29ce66849cd6b7f44679683/smart_open-7.0.1-py3-none-any.whl (60 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.8/60.8 kB 5.0 MB/s eta 0:00:00
Requirement already satisfied: scipy>=0.18.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from gensim==3.8.1) (1.6.3)
Requirement already satisfied: numpy>=1.11.3 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from gensim==3.8.1) (1.19.5)
Requirement already satisfied: six>=1.5.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from gensim==3.8.1) (1.16.0)
Requirement already satisfied: wrapt in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from smart-open>=1.8.1->gensim==3.8.1) (1.12.1)
Installing collected packages: smart-open, gensim
Successfully installed gensim-3.8.1 smart-open-7.0.1
[notice] A new release of pip available: 22.1.2 -> 24.0
[notice] To update, run: pip install --upgrade pip
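# Train word vectors on the Wikipedia corpus
##################### Extract the corpus ###################
from gensim.corpora import WikiCorpus

in_file = "data/data104767/articles.xml.bz2"  # input: compressed Wikipedia dump
out_path = "wiki.zh.text"                      # output: one article per line
max_articles = 20000                           # stop after this many articles

# lemmatize: whether to lemmatize tokens (a gensim 3.x parameter)
# dictionary={}: pass an empty dictionary so no vocabulary is built at this stage
wiki = WikiCorpus(in_file, lemmatize=False, dictionary={})

count = 0
with open(out_path, "w", encoding="utf-8") as out_file:  # closed automatically
    for text in wiki.get_texts():              # iterate over the articles
        out_file.write(" ".join(text) + "\n")  # write one article per line
        count += 1
        if count % 200 == 0:                   # report progress every 200 articles
            print("处理笔数:", count)           # label means "articles processed"
        if count >= max_articles:
            break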
Output:
处理笔数: 200 处理笔数: 400 处理笔数: 600 处理笔数: 800 处理笔数: 1000 处理笔数: 1200 处理笔数: 1400 处理笔数: 1600 处理笔数: 1800 处理笔数: 2000 处理笔数: 2200 处理笔数: 2400 处理笔数: 2600 处理笔数: 2800 处理笔数: 3000 处理笔数: 3200 处理笔数: 3400 处理笔数: 3600 处理笔数: 3800 处理笔数: 4000 处理笔数: 4200 处理笔数: 4400 处理笔数: 4600 处理笔数: 4800 处理笔数: 5000 处理笔数: 5200 处理笔数: 5400 处理笔数: 5600 处理笔数: 5800 处理笔数: 6000 处理笔数: 6200 处理笔数: 6400 处理笔数: 6600 处理笔数: 6800 处理笔数: 7000 处理笔数: 7200 处理笔数: 7400 处理笔数: 7600 处理笔数: 7800 处理笔数: 8000 处理笔数: 8200 处理笔数: 8400 处理笔数: 8600 处理笔数: 8800 处理笔数: 9000 处理笔数: 9200 处理笔数: 9400 处理笔数: 9600 处理笔数: 9800 处理笔数: 10000 处理笔数: 10200 处理笔数: 10400 处理笔数: 10600 处理笔数: 10800 处理笔数: 11000 处理笔数: 11200 处理笔数: 11400 处理笔数: 11600 处理笔数: 11800 处理笔数: 12000 处理笔数: 12200 处理笔数: 12400 处理笔数: 12600 处理笔数: 12800 处理笔数: 13000 处理笔数: 13200 处理笔数: 13400 处理笔数: 13600 处理笔数: 13800 处理笔数: 14000 处理笔数: 14200 处理笔数: 14400 处理笔数: 14600 处理笔数: 14800 处理笔数: 15000 处理笔数: 15200 处理笔数: 15400 处理笔数: 15600 处理笔数: 15800 处理笔数: 16000 处理笔数: 16200 处理笔数: 16400 处理笔数: 16600 处理笔数: 16800 处理笔数: 17000 处理笔数: 17200 处理笔数: 17400 处理笔数: 17600 处理笔数: 17800 处理笔数: 18000 处理笔数: 18200 处理笔数: 18400 处理笔数: 18600 处理笔数: 18800 处理笔数: 19000 处理笔数: 19200 处理笔数: 19400 处理笔数: 19600 处理笔数: 19800 处理笔数: 20000
##################### Word segmentation ###################
import jieba


def process_wiki_text(src_file, dest_file):
    """Segment each line of src_file with jieba and write it to dest_file."""
    with open(src_file, "r", encoding="utf-8") as f_in, \
         open(dest_file, "w", encoding="utf-8") as f_out:
        num = 0  # number of lines segmented so far
        for line in f_in:  # stream line by line instead of loading the whole file
            # The line keeps its trailing newline, so each segmented
            # article still ends up on its own line.
            f_out.write(" ".join(jieba.cut(line)))
            num += 1
            if num % 200 == 0:
                print("完成笔数:", num)  # label means "lines completed"


process_wiki_text("wiki.zh.text", "wiki.zh.text.seg")
Output:
Building prefix dict from the default dictionary ... Dumping model to file cache /tmp/jieba.cache Loading model cost 0.835 seconds. Prefix dict has been built successfully. 完成笔数: 200 完成笔数: 400 完成笔数: 600 完成笔数: 800 完成笔数: 1000 完成笔数: 1200 完成笔数: 1400 完成笔数: 1600 完成笔数: 1800 完成笔数: 2000 完成笔数: 2200 完成笔数: 2400 完成笔数: 2600 完成笔数: 2800 完成笔数: 3000 完成笔数: 3200 完成笔数: 3400 完成笔数: 3600 完成笔数: 3800 完成笔数: 4000 完成笔数: 4200 完成笔数: 4400 完成笔数: 4600 完成笔数: 4800 完成笔数: 5000 完成笔数: 5200 完成笔数: 5400 完成笔数: 5600 完成笔数: 5800 完成笔数: 6000 完成笔数: 6200 完成笔数: 6400 完成笔数: 6600 完成笔数: 6800 完成笔数: 7000 完成笔数: 7200 完成笔数: 7400 完成笔数: 7600 完成笔数: 7800 完成笔数: 8000 完成笔数: 8200 完成笔数: 8400 完成笔数: 8600 完成笔数: 8800 完成笔数: 9000 完成笔数: 9200 完成笔数: 9400 完成笔数: 9600 完成笔数: 9800 完成笔数: 10000 完成笔数: 10200 完成笔数: 10400 完成笔数: 10600 完成笔数: 10800 完成笔数: 11000 完成笔数: 11200 完成笔数: 11400 完成笔数: 11600 完成笔数: 11800 完成笔数: 12000 完成笔数: 12200 完成笔数: 12400 完成笔数: 12600 完成笔数: 12800 完成笔数: 13000 完成笔数: 13200 完成笔数: 13400 完成笔数: 13600 完成笔数: 13800 完成笔数: 14000 完成笔数: 14200 完成笔数: 14400 完成笔数: 14600 完成笔数: 14800 完成笔数: 15000 完成笔数: 15200 完成笔数: 15400 完成笔数: 15600 完成笔数: 15800 完成笔数: 16000 完成笔数: 16200 完成笔数: 16400 完成笔数: 16600 完成笔数: 16800 完成笔数: 17000 完成笔数: 17200 完成笔数: 17400 完成笔数: 17600 完成笔数: 17800 完成笔数: 18000 完成笔数: 18200 完成笔数: 18400 完成笔数: 18600 完成笔数: 18800 完成笔数: 19000 完成笔数: 19200 完成笔数: 19400 完成笔数: 19600 完成笔数: 19800 完成笔数: 20000
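The segmented file holds one article per line with tokens separated by spaces, which is the layout gensim's `LineSentence` reads during training. A minimal pure-Python stand-in shows what the training step receives; `iter_sentences` is a made-up name and the two sample lines are invented, not taken from the corpus.

```python
import io


def iter_sentences(handle):
    """Yield one token list per non-empty line, splitting on whitespace."""
    for line in handle:
        tokens = line.split()
        if tokens:
            yield tokens


# Two made-up segmented lines standing in for wiki.zh.text.seg
sample = io.StringIO("呼伦贝尔 大 草原\n数学 是 一门 学科\n")
sentences = list(iter_sentences(sample))
print(sentences)
```

With the real file, the handle would be `open("wiki.zh.text.seg", encoding="utf-8")`. As far as I know, `LineSentence` also splits very long lines into chunks, so the sentence count it reports can be slightly higher than the number of lines in the file.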
##################### Training ###################
import logging
import multiprocessing

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence  # reads one sentence per line

# Fields used in the log format:
#   %(asctime)s    time of the log record
#   %(levelname)s  name of the log level
#   %(message)s    the log message
logging.basicConfig(format='%(asctime)s: %(levelname)s: %(message)s')
logging.root.setLevel(level=logging.INFO)

in_file = "wiki.zh.text.seg"       # input: the segmented corpus
out_file1 = "wiki.zh.text.model"   # full model
out_file2 = "wiki.zh.text.vector"  # weights (word vectors)

# sg is left at its default of 0, so this trains CBOW (the log reports sg=0).
model = Word2Vec(LineSentence(in_file),  # input
                 size=100,      # vector dimensionality (50 to 300 recommended); named `size` in gensim 3.x
                 window=3,      # window size
                 min_count=5,   # ignore words that occur fewer than 5 times
                 workers=multiprocessing.cpu_count())  # one worker thread per CPU core

model.save(out_file1)  # save the model
model.wv.save_word2vec_format(out_file2,     # word-vector file
                              binary=False)  # plain text rather than binary
Output:
2024-02-29 18:46:07,204: INFO: collecting all words and their counts 2024-02-29 18:46:07,206: INFO: PROGRESS: at sentence #0, processed 0 words, keeping 0 word types 2024-02-29 18:46:12,568: INFO: PROGRESS: at sentence #10000, processed 12880963 words, keeping 865015 word types 2024-02-29 18:46:16,906: INFO: PROGRESS: at sentence #20000, processed 22396155 words, keeping 1278795 word types 2024-02-29 18:46:16,946: INFO: collected 1282620 word types from a corpus of 22481838 raw words and 20090 sentences 2024-02-29 18:46:16,947: INFO: Loading a fresh vocabulary 2024-02-29 18:46:18,289: INFO: effective\_min\_count=5 retains 240560 unique words (18% of original 1282620, drops 1042060) 2024-02-29 18:46:18,290: INFO: effective\_min\_count=5 leaves 20963673 word corpus (93% of original 22481838, drops 1518165) 2024-02-29 18:46:19,098: INFO: deleting the raw counts dictionary of 1282620 items 2024-02-29 18:46:19,158: INFO: sample=0.001 downsamples 17 most-common words 2024-02-29 18:46:19,159: INFO: downsampling leaves estimated 19623071 word corpus (93.6% of prior 20963673) 2024-02-29 18:46:20,367: INFO: estimated required memory for 240560 words and 100 dimensions: 312728000 bytes 2024-02-29 18:46:20,368: INFO: resetting layer weights 2024-02-29 18:47:00,223: INFO: training model with 24 workers on 240560 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=5 window=3 2024-02-29 18:47:01,239: INFO: EPOCH 1 - PROGRESS: at 0.69% examples, 331735 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:47:02,292: INFO: EPOCH 1 - PROGRESS: at 1.81% examples, 374411 words/s, in\_qsize 0, out\_qsize 2 2024-02-29 18:47:03,300: INFO: EPOCH 1 - PROGRESS: at 3.23% examples, 403992 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:47:04,310: INFO: EPOCH 1 - PROGRESS: at 4.81% examples, 418015 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:47:05,318: INFO: EPOCH 1 - PROGRESS: at 6.47% examples, 423913 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:47:06,318: INFO: EPOCH 1 - 
PROGRESS: at 8.22% examples, 426120 words/s, in\_qsize 1, out\_qsize 0 2024-02-29 18:47:07,319: INFO: EPOCH 1 - PROGRESS: at 9.80% examples, 428029 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:47:08,325: INFO: EPOCH 1 - PROGRESS: at 11.57% examples, 426433 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:47:09,344: INFO: EPOCH 1 - PROGRESS: at 13.26% examples, 428083 words/s, in\_qsize 6, out\_qsize 0 2024-02-29 18:47:10,406: INFO: EPOCH 1 - PROGRESS: at 15.17% examples, 428272 words/s, in\_qsize 7, out\_qsize 0 2024-02-29 18:47:11,489: INFO: EPOCH 1 - PROGRESS: at 17.59% examples, 424947 words/s, in\_qsize 13, out\_qsize 6 2024-02-29 18:47:12,501: INFO: EPOCH 1 - PROGRESS: at 19.53% examples, 429682 words/s, in\_qsize 33, out\_qsize 0 2024-02-29 18:47:13,503: INFO: EPOCH 1 - PROGRESS: at 21.84% examples, 432925 words/s, in\_qsize 36, out\_qsize 0 2024-02-29 18:47:14,532: INFO: EPOCH 1 - PROGRESS: at 23.66% examples, 432996 words/s, in\_qsize 45, out\_qsize 2 2024-02-29 18:47:15,538: INFO: EPOCH 1 - PROGRESS: at 25.82% examples, 434819 words/s, in\_qsize 43, out\_qsize 0 2024-02-29 18:47:16,593: INFO: EPOCH 1 - PROGRESS: at 28.29% examples, 432322 words/s, in\_qsize 41, out\_qsize 3 2024-02-29 18:47:17,598: INFO: EPOCH 1 - PROGRESS: at 30.61% examples, 433390 words/s, in\_qsize 46, out\_qsize 1 2024-02-29 18:47:18,614: INFO: EPOCH 1 - PROGRESS: at 32.65% examples, 432558 words/s, in\_qsize 43, out\_qsize 0 2024-02-29 18:47:19,692: INFO: EPOCH 1 - PROGRESS: at 35.02% examples, 432228 words/s, in\_qsize 46, out\_qsize 1 2024-02-29 18:47:20,694: INFO: EPOCH 1 - PROGRESS: at 37.10% examples, 431887 words/s, in\_qsize 36, out\_qsize 1 2024-02-29 18:47:21,792: INFO: EPOCH 1 - PROGRESS: at 39.20% examples, 427554 words/s, in\_qsize 43, out\_qsize 4 2024-02-29 18:47:22,793: INFO: EPOCH 1 - PROGRESS: at 41.81% examples, 430783 words/s, in\_qsize 38, out\_qsize 4 2024-02-29 18:47:23,816: INFO: EPOCH 1 - PROGRESS: at 44.25% examples, 431738 words/s, in\_qsize 41, 
out\_qsize 3 2024-02-29 18:47:24,827: INFO: EPOCH 1 - PROGRESS: at 46.95% examples, 433221 words/s, in\_qsize 40, out\_qsize 0 2024-02-29 18:47:25,928: INFO: EPOCH 1 - PROGRESS: at 48.98% examples, 431594 words/s, in\_qsize 39, out\_qsize 1 2024-02-29 18:47:26,974: INFO: EPOCH 1 - PROGRESS: at 51.14% examples, 432264 words/s, in\_qsize 44, out\_qsize 0 2024-02-29 18:47:27,998: INFO: EPOCH 1 - PROGRESS: at 53.19% examples, 431828 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:47:29,095: INFO: EPOCH 1 - PROGRESS: at 55.78% examples, 430677 words/s, in\_qsize 43, out\_qsize 4 2024-02-29 18:47:30,144: INFO: EPOCH 1 - PROGRESS: at 58.52% examples, 431111 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:47:31,204: INFO: EPOCH 1 - PROGRESS: at 61.01% examples, 431090 words/s, in\_qsize 44, out\_qsize 0 2024-02-29 18:47:32,213: INFO: EPOCH 1 - PROGRESS: at 63.87% examples, 431603 words/s, in\_qsize 39, out\_qsize 1 2024-02-29 18:47:33,216: INFO: EPOCH 1 - PROGRESS: at 66.45% examples, 430454 words/s, in\_qsize 37, out\_qsize 2 2024-02-29 18:47:34,232: INFO: EPOCH 1 - PROGRESS: at 68.92% examples, 429998 words/s, in\_qsize 38, out\_qsize 2 2024-02-29 18:47:35,245: INFO: EPOCH 1 - PROGRESS: at 71.61% examples, 429666 words/s, in\_qsize 39, out\_qsize 0 2024-02-29 18:47:36,303: INFO: EPOCH 1 - PROGRESS: at 74.06% examples, 429054 words/s, in\_qsize 45, out\_qsize 0 2024-02-29 18:47:37,308: INFO: EPOCH 1 - PROGRESS: at 76.31% examples, 429393 words/s, in\_qsize 39, out\_qsize 0 2024-02-29 18:47:38,316: INFO: EPOCH 1 - PROGRESS: at 78.99% examples, 428799 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:47:39,361: INFO: EPOCH 1 - PROGRESS: at 81.92% examples, 428748 words/s, in\_qsize 36, out\_qsize 0 2024-02-29 18:47:40,394: INFO: EPOCH 1 - PROGRESS: at 84.19% examples, 427402 words/s, in\_qsize 36, out\_qsize 2 2024-02-29 18:47:41,414: INFO: EPOCH 1 - PROGRESS: at 86.84% examples, 426989 words/s, in\_qsize 35, out\_qsize 3 2024-02-29 18:47:42,432: INFO: EPOCH 1 - 
PROGRESS: at 89.64% examples, 426468 words/s, in\_qsize 42, out\_qsize 0 2024-02-29 18:47:43,441: INFO: EPOCH 1 - PROGRESS: at 92.60% examples, 426671 words/s, in\_qsize 46, out\_qsize 0 2024-02-29 18:47:44,492: INFO: EPOCH 1 - PROGRESS: at 95.35% examples, 425763 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:47:45,518: INFO: EPOCH 1 - PROGRESS: at 98.36% examples, 427010 words/s, in\_qsize 37, out\_qsize 0 2024-02-29 18:47:45,798: INFO: worker thread finished; awaiting finish of 23 more threads 2024-02-29 18:47:45,820: INFO: worker thread finished; awaiting finish of 22 more threads 2024-02-29 18:47:45,821: INFO: worker thread finished; awaiting finish of 21 more threads 2024-02-29 18:47:45,822: INFO: worker thread finished; awaiting finish of 20 more threads 2024-02-29 18:47:45,890: INFO: worker thread finished; awaiting finish of 19 more threads 2024-02-29 18:47:45,892: INFO: worker thread finished; awaiting finish of 18 more threads 2024-02-29 18:47:45,905: INFO: worker thread finished; awaiting finish of 17 more threads 2024-02-29 18:47:45,920: INFO: worker thread finished; awaiting finish of 16 more threads 2024-02-29 18:47:46,012: INFO: worker thread finished; awaiting finish of 15 more threads 2024-02-29 18:47:46,014: INFO: worker thread finished; awaiting finish of 14 more threads 2024-02-29 18:47:46,028: INFO: worker thread finished; awaiting finish of 13 more threads 2024-02-29 18:47:46,030: INFO: worker thread finished; awaiting finish of 12 more threads 2024-02-29 18:47:46,085: INFO: worker thread finished; awaiting finish of 11 more threads 2024-02-29 18:47:46,088: INFO: worker thread finished; awaiting finish of 10 more threads 2024-02-29 18:47:46,089: INFO: worker thread finished; awaiting finish of 9 more threads 2024-02-29 18:47:46,091: INFO: worker thread finished; awaiting finish of 8 more threads 2024-02-29 18:47:46,093: INFO: worker thread finished; awaiting finish of 7 more threads 2024-02-29 18:47:46,096: INFO: worker thread finished; 
awaiting finish of 6 more threads 2024-02-29 18:47:46,097: INFO: worker thread finished; awaiting finish of 5 more threads 2024-02-29 18:47:46,101: INFO: worker thread finished; awaiting finish of 4 more threads 2024-02-29 18:47:46,103: INFO: worker thread finished; awaiting finish of 3 more threads 2024-02-29 18:47:46,109: INFO: worker thread finished; awaiting finish of 2 more threads 2024-02-29 18:47:46,111: INFO: worker thread finished; awaiting finish of 1 more threads 2024-02-29 18:47:46,112: INFO: worker thread finished; awaiting finish of 0 more threads 2024-02-29 18:47:46,113: INFO: EPOCH - 1 : training on 22481838 raw words (19622451 effective words) took 45.9s, 427691 effective words/s 2024-02-29 18:47:47,204: INFO: EPOCH 2 - PROGRESS: at 1.01% examples, 449689 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:47:48,207: INFO: EPOCH 2 - PROGRESS: at 2.25% examples, 457446 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:47:49,216: INFO: EPOCH 2 - PROGRESS: at 3.82% examples, 451966 words/s, in\_qsize 0, out\_qsize 1 2024-02-29 18:47:50,219: INFO: EPOCH 2 - PROGRESS: at 5.19% examples, 454192 words/s, in\_qsize 1, out\_qsize 0 2024-02-29 18:47:51,223: INFO: EPOCH 2 - PROGRESS: at 6.85% examples, 450118 words/s, in\_qsize 6, out\_qsize 1 2024-02-29 18:47:52,231: INFO: EPOCH 2 - PROGRESS: at 8.74% examples, 449910 words/s, in\_qsize 8, out\_qsize 1 2024-02-29 18:47:53,294: INFO: EPOCH 2 - PROGRESS: at 10.49% examples, 448478 words/s, in\_qsize 24, out\_qsize 0 2024-02-29 18:47:54,298: INFO: EPOCH 2 - PROGRESS: at 12.07% examples, 444214 words/s, in\_qsize 34, out\_qsize 0 2024-02-29 18:47:55,302: INFO: EPOCH 2 - PROGRESS: at 13.65% examples, 440894 words/s, in\_qsize 25, out\_qsize 2 2024-02-29 18:47:56,318: INFO: EPOCH 2 - PROGRESS: at 15.50% examples, 438951 words/s, in\_qsize 35, out\_qsize 3 2024-02-29 18:47:57,319: INFO: EPOCH 2 - PROGRESS: at 18.03% examples, 440375 words/s, in\_qsize 45, out\_qsize 1 2024-02-29 18:47:58,408: INFO: EPOCH 2 - PROGRESS: 
at 19.82% examples, 435335 words/s, in\_qsize 40, out\_qsize 4 2024-02-29 18:47:59,411: INFO: EPOCH 2 - PROGRESS: at 21.98% examples, 436860 words/s, in\_qsize 35, out\_qsize 5 2024-02-29 18:48:00,497: INFO: EPOCH 2 - PROGRESS: at 23.67% examples, 432678 words/s, in\_qsize 38, out\_qsize 9 2024-02-29 18:48:01,515: INFO: EPOCH 2 - PROGRESS: at 25.70% examples, 432856 words/s, in\_qsize 42, out\_qsize 5 2024-02-29 18:48:02,543: INFO: EPOCH 2 - PROGRESS: at 28.45% examples, 433628 words/s, in\_qsize 37, out\_qsize 0 2024-02-29 18:48:03,598: INFO: EPOCH 2 - PROGRESS: at 30.65% examples, 432969 words/s, in\_qsize 47, out\_qsize 1 2024-02-29 18:48:04,608: INFO: EPOCH 2 - PROGRESS: at 32.73% examples, 433091 words/s, in\_qsize 40, out\_qsize 4 2024-02-29 18:48:05,613: INFO: EPOCH 2 - PROGRESS: at 34.90% examples, 432350 words/s, in\_qsize 34, out\_qsize 4 2024-02-29 18:48:06,691: INFO: EPOCH 2 - PROGRESS: at 37.35% examples, 433678 words/s, in\_qsize 42, out\_qsize 0 2024-02-29 18:48:07,700: INFO: EPOCH 2 - PROGRESS: at 39.60% examples, 432978 words/s, in\_qsize 46, out\_qsize 1 2024-02-29 18:48:08,818: INFO: EPOCH 2 - PROGRESS: at 41.95% examples, 430732 words/s, in\_qsize 38, out\_qsize 10 2024-02-29 18:48:09,898: INFO: EPOCH 2 - PROGRESS: at 44.53% examples, 431617 words/s, in\_qsize 37, out\_qsize 7 2024-02-29 18:48:10,914: INFO: EPOCH 2 - PROGRESS: at 47.03% examples, 431466 words/s, in\_qsize 40, out\_qsize 1 2024-02-29 18:48:12,015: INFO: EPOCH 2 - PROGRESS: at 49.07% examples, 430086 words/s, in\_qsize 37, out\_qsize 9 2024-02-29 18:48:13,023: INFO: EPOCH 2 - PROGRESS: at 51.26% examples, 432197 words/s, in\_qsize 46, out\_qsize 0 2024-02-29 18:48:14,103: INFO: EPOCH 2 - PROGRESS: at 53.28% examples, 430321 words/s, in\_qsize 38, out\_qsize 7 2024-02-29 18:48:15,110: INFO: EPOCH 2 - PROGRESS: at 56.13% examples, 432085 words/s, in\_qsize 36, out\_qsize 2 2024-02-29 18:48:16,128: INFO: EPOCH 2 - PROGRESS: at 58.71% examples, 431576 words/s, in\_qsize 43, out\_qsize 
1 2024-02-29 18:48:17,130: INFO: EPOCH 2 - PROGRESS: at 61.10% examples, 432362 words/s, in\_qsize 39, out\_qsize 0 2024-02-29 18:48:18,132: INFO: EPOCH 2 - PROGRESS: at 63.86% examples, 432119 words/s, in\_qsize 33, out\_qsize 0 2024-02-29 18:48:19,136: INFO: EPOCH 2 - PROGRESS: at 66.49% examples, 431220 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:48:20,189: INFO: EPOCH 2 - PROGRESS: at 69.01% examples, 430994 words/s, in\_qsize 48, out\_qsize 0 2024-02-29 18:48:21,195: INFO: EPOCH 2 - PROGRESS: at 71.91% examples, 431854 words/s, in\_qsize 44, out\_qsize 2 2024-02-29 18:48:22,198: INFO: EPOCH 2 - PROGRESS: at 74.54% examples, 432033 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:48:23,198: INFO: EPOCH 2 - PROGRESS: at 76.90% examples, 432597 words/s, in\_qsize 37, out\_qsize 0 2024-02-29 18:48:24,200: INFO: EPOCH 2 - PROGRESS: at 79.89% examples, 432334 words/s, in\_qsize 38, out\_qsize 2 2024-02-29 18:48:25,213: INFO: EPOCH 2 - PROGRESS: at 82.33% examples, 431913 words/s, in\_qsize 43, out\_qsize 2 2024-02-29 18:48:26,297: INFO: EPOCH 2 - PROGRESS: at 85.37% examples, 432540 words/s, in\_qsize 41, out\_qsize 0 2024-02-29 18:48:27,308: INFO: EPOCH 2 - PROGRESS: at 88.31% examples, 432506 words/s, in\_qsize 45, out\_qsize 0 2024-02-29 18:48:28,333: INFO: EPOCH 2 - PROGRESS: at 91.29% examples, 432265 words/s, in\_qsize 46, out\_qsize 1 2024-02-29 18:48:29,367: INFO: EPOCH 2 - PROGRESS: at 94.23% examples, 432832 words/s, in\_qsize 39, out\_qsize 1 2024-02-29 18:48:30,411: INFO: EPOCH 2 - PROGRESS: at 97.04% examples, 432374 words/s, in\_qsize 46, out\_qsize 1 2024-02-29 18:48:31,206: INFO: worker thread finished; awaiting finish of 23 more threads 2024-02-29 18:48:31,216: INFO: worker thread finished; awaiting finish of 22 more threads 2024-02-29 18:48:31,224: INFO: worker thread finished; awaiting finish of 21 more threads 2024-02-29 18:48:31,237: INFO: worker thread finished; awaiting finish of 20 more threads 2024-02-29 18:48:31,246: INFO: worker 
thread finished; awaiting finish of 19 more threads 2024-02-29 18:48:31,246: INFO: worker thread finished; awaiting finish of 18 more threads 2024-02-29 18:48:31,248: INFO: worker thread finished; awaiting finish of 17 more threads 2024-02-29 18:48:31,255: INFO: worker thread finished; awaiting finish of 16 more threads 2024-02-29 18:48:31,256: INFO: worker thread finished; awaiting finish of 15 more threads 2024-02-29 18:48:31,259: INFO: worker thread finished; awaiting finish of 14 more threads 2024-02-29 18:48:31,261: INFO: worker thread finished; awaiting finish of 13 more threads 2024-02-29 18:48:31,391: INFO: worker thread finished; awaiting finish of 12 more threads 2024-02-29 18:48:31,396: INFO: worker thread finished; awaiting finish of 11 more threads 2024-02-29 18:48:31,398: INFO: worker thread finished; awaiting finish of 10 more threads 2024-02-29 18:48:31,399: INFO: worker thread finished; awaiting finish of 9 more threads 2024-02-29 18:48:31,400: INFO: worker thread finished; awaiting finish of 8 more threads 2024-02-29 18:48:31,400: INFO: worker thread finished; awaiting finish of 7 more threads 2024-02-29 18:48:31,401: INFO: worker thread finished; awaiting finish of 6 more threads 2024-02-29 18:48:31,401: INFO: worker thread finished; awaiting finish of 5 more threads 2024-02-29 18:48:31,402: INFO: worker thread finished; awaiting finish of 4 more threads 2024-02-29 18:48:31,403: INFO: worker thread finished; awaiting finish of 3 more threads 2024-02-29 18:48:31,421: INFO: EPOCH 2 - PROGRESS: at 99.88% examples, 433489 words/s, in\_qsize 2, out\_qsize 1 2024-02-29 18:48:31,422: INFO: worker thread finished; awaiting finish of 2 more threads 2024-02-29 18:48:31,424: INFO: worker thread finished; awaiting finish of 1 more threads 2024-02-29 18:48:31,426: INFO: worker thread finished; awaiting finish of 0 more threads 2024-02-29 18:48:31,426: INFO: EPOCH - 2 : training on 22481838 raw words (19623552 effective words) took 45.2s, 433810 effective 
words/s 2024-02-29 18:48:32,490: INFO: EPOCH 3 - PROGRESS: at 0.99% examples, 418910 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:48:33,500: INFO: EPOCH 3 - PROGRESS: at 2.25% examples, 446236 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:48:34,505: INFO: EPOCH 3 - PROGRESS: at 3.85% examples, 447744 words/s, in\_qsize 0, out\_qsize 1 2024-02-29 18:48:35,519: INFO: EPOCH 3 - PROGRESS: at 5.21% examples, 449918 words/s, in\_qsize 1, out\_qsize 0 2024-02-29 18:48:36,523: INFO: EPOCH 3 - PROGRESS: at 7.03% examples, 450463 words/s, in\_qsize 0, out\_qsize 1 2024-02-29 18:48:37,534: INFO: EPOCH 3 - PROGRESS: at 8.89% examples, 450600 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:48:38,539: INFO: EPOCH 3 - PROGRESS: at 10.50% examples, 448792 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:48:39,600: INFO: EPOCH 3 - PROGRESS: at 12.22% examples, 447007 words/s, in\_qsize 0, out\_qsize 2 2024-02-29 18:48:40,612: INFO: EPOCH 3 - PROGRESS: at 14.11% examples, 448933 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:48:41,622: INFO: EPOCH 3 - PROGRESS: at 16.20% examples, 449907 words/s, in\_qsize 0, out\_qsize 1 2024-02-29 18:48:42,632: INFO: EPOCH 3 - PROGRESS: at 18.66% examples, 450790 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:48:43,637: INFO: EPOCH 3 - PROGRESS: at 20.68% examples, 450063 words/s, in\_qsize 0, out\_qsize 1 2024-02-29 18:48:44,700: INFO: EPOCH 3 - PROGRESS: at 22.46% examples, 445005 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:48:45,711: INFO: EPOCH 3 - PROGRESS: at 24.22% examples, 441886 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:48:46,725: INFO: EPOCH 3 - PROGRESS: at 26.33% examples, 440543 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:48:47,736: INFO: EPOCH 3 - PROGRESS: at 29.09% examples, 440090 words/s, in\_qsize 0, out\_qsize 1 2024-02-29 18:48:48,790: INFO: EPOCH 3 - PROGRESS: at 31.03% examples, 439551 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:48:49,794: INFO: EPOCH 3 - PROGRESS: at 33.19% examples, 
439530 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:48:50,803: INFO: EPOCH 3 - PROGRESS: at 35.60% examples, 440299 words/s, in\_qsize 0, out\_qsize 1 2024-02-29 18:48:51,806: INFO: EPOCH 3 - PROGRESS: at 37.85% examples, 440844 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:48:52,823: INFO: EPOCH 3 - PROGRESS: at 40.15% examples, 439859 words/s, in\_qsize 0, out\_qsize 2 2024-02-29 18:48:53,841: INFO: EPOCH 3 - PROGRESS: at 42.83% examples, 440999 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:48:54,900: INFO: EPOCH 3 - PROGRESS: at 45.18% examples, 440537 words/s, in\_qsize 11, out\_qsize 1 2024-02-29 18:48:55,908: INFO: EPOCH 3 - PROGRESS: at 47.63% examples, 441675 words/s, in\_qsize 6, out\_qsize 0 2024-02-29 18:48:56,927: INFO: EPOCH 3 - PROGRESS: at 49.79% examples, 441534 words/s, in\_qsize 5, out\_qsize 0 2024-02-29 18:48:57,987: INFO: EPOCH 3 - PROGRESS: at 51.68% examples, 440260 words/s, in\_qsize 20, out\_qsize 1 2024-02-29 18:48:59,021: INFO: EPOCH 3 - PROGRESS: at 53.91% examples, 439462 words/s, in\_qsize 40, out\_qsize 4 2024-02-29 18:49:00,103: INFO: EPOCH 3 - PROGRESS: at 56.47% examples, 437661 words/s, in\_qsize 44, out\_qsize 2 2024-02-29 18:49:01,192: INFO: EPOCH 3 - PROGRESS: at 59.31% examples, 437396 words/s, in\_qsize 42, out\_qsize 5 2024-02-29 18:49:02,195: INFO: EPOCH 3 - PROGRESS: at 61.75% examples, 438319 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:49:03,207: INFO: EPOCH 3 - PROGRESS: at 64.32% examples, 436848 words/s, in\_qsize 38, out\_qsize 6 2024-02-29 18:49:04,210: INFO: EPOCH 3 - PROGRESS: at 67.21% examples, 437024 words/s, in\_qsize 45, out\_qsize 0 2024-02-29 18:49:05,230: INFO: EPOCH 3 - PROGRESS: at 69.48% examples, 435721 words/s, in\_qsize 35, out\_qsize 4 2024-02-29 18:49:06,234: INFO: EPOCH 3 - PROGRESS: at 72.35% examples, 436815 words/s, in\_qsize 44, out\_qsize 0 2024-02-29 18:49:07,296: INFO: EPOCH 3 - PROGRESS: at 74.96% examples, 435926 words/s, in\_qsize 43, out\_qsize 1 2024-02-29 18:49:08,311: 
INFO: EPOCH 3 - PROGRESS: at 77.20% examples, 435575 words/s, in\_qsize 42, out\_qsize 0 2024-02-29 18:49:09,311: INFO: EPOCH 3 - PROGRESS: at 80.20% examples, 435663 words/s, in\_qsize 37, out\_qsize 1 2024-02-29 18:49:10,327: INFO: EPOCH 3 - PROGRESS: at 82.80% examples, 435220 words/s, in\_qsize 39, out\_qsize 1 2024-02-29 18:49:11,402: INFO: EPOCH 3 - PROGRESS: at 85.69% examples, 435276 words/s, in\_qsize 40, out\_qsize 0 2024-02-29 18:49:12,490: INFO: EPOCH 3 - PROGRESS: at 88.34% examples, 433332 words/s, in\_qsize 44, out\_qsize 3 2024-02-29 18:49:13,511: INFO: EPOCH 3 - PROGRESS: at 91.61% examples, 434387 words/s, in\_qsize 41, out\_qsize 0 2024-02-29 18:49:14,520: INFO: EPOCH 3 - PROGRESS: at 94.45% examples, 434627 words/s, in\_qsize 39, out\_qsize 0 2024-02-29 18:49:15,619: INFO: EPOCH 3 - PROGRESS: at 97.01% examples, 432646 words/s, in\_qsize 42, out\_qsize 7 2024-02-29 18:49:16,301: INFO: worker thread finished; awaiting finish of 23 more threads 2024-02-29 18:49:16,332: INFO: worker thread finished; awaiting finish of 22 more threads 2024-02-29 18:49:16,349: INFO: worker thread finished; awaiting finish of 21 more threads 2024-02-29 18:49:16,350: INFO: worker thread finished; awaiting finish of 20 more threads 2024-02-29 18:49:16,350: INFO: worker thread finished; awaiting finish of 19 more threads 2024-02-29 18:49:16,360: INFO: worker thread finished; awaiting finish of 18 more threads 2024-02-29 18:49:16,396: INFO: worker thread finished; awaiting finish of 17 more threads 2024-02-29 18:49:16,503: INFO: worker thread finished; awaiting finish of 16 more threads 2024-02-29 18:49:16,505: INFO: worker thread finished; awaiting finish of 15 more threads 2024-02-29 18:49:16,521: INFO: worker thread finished; awaiting finish of 14 more threads 2024-02-29 18:49:16,522: INFO: worker thread finished; awaiting finish of 13 more threads 2024-02-29 18:49:16,524: INFO: worker thread finished; awaiting finish of 12 more threads 2024-02-29 18:49:16,526: INFO: 
worker thread finished; awaiting finish of 11 more threads 2024-02-29 18:49:16,528: INFO: worker thread finished; awaiting finish of 10 more threads 2024-02-29 18:49:16,530: INFO: worker thread finished; awaiting finish of 9 more threads 2024-02-29 18:49:16,534: INFO: worker thread finished; awaiting finish of 8 more threads 2024-02-29 18:49:16,588: INFO: worker thread finished; awaiting finish of 7 more threads 2024-02-29 18:49:16,590: INFO: worker thread finished; awaiting finish of 6 more threads 2024-02-29 18:49:16,592: INFO: worker thread finished; awaiting finish of 5 more threads 2024-02-29 18:49:16,593: INFO: worker thread finished; awaiting finish of 4 more threads 2024-02-29 18:49:16,595: INFO: worker thread finished; awaiting finish of 3 more threads 2024-02-29 18:49:16,601: INFO: worker thread finished; awaiting finish of 2 more threads 2024-02-29 18:49:16,603: INFO: worker thread finished; awaiting finish of 1 more threads 2024-02-29 18:49:16,604: INFO: worker thread finished; awaiting finish of 0 more threads 2024-02-29 18:49:16,604: INFO: EPOCH - 3 : training on 22481838 raw words (19623091 effective words) took 45.2s, 434423 effective words/s 2024-02-29 18:49:17,620: INFO: EPOCH 4 - PROGRESS: at 0.98% examples, 429701 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:49:18,622: INFO: EPOCH 4 - PROGRESS: at 2.10% examples, 430190 words/s, in\_qsize 1, out\_qsize 0 2024-02-29 18:49:19,635: INFO: EPOCH 4 - PROGRESS: at 3.71% examples, 437753 words/s, in\_qsize 10, out\_qsize 0 2024-02-29 18:49:20,693: INFO: EPOCH 4 - PROGRESS: at 5.06% examples, 436125 words/s, in\_qsize 15, out\_qsize 0 2024-02-29 18:49:21,701: INFO: EPOCH 4 - PROGRESS: at 6.59% examples, 432637 words/s, in\_qsize 13, out\_qsize 1 2024-02-29 18:49:22,707: INFO: EPOCH 4 - PROGRESS: at 8.38% examples, 432137 words/s, in\_qsize 23, out\_qsize 4 2024-02-29 18:49:23,731: INFO: EPOCH 4 - PROGRESS: at 10.00% examples, 433981 words/s, in\_qsize 40, out\_qsize 2 2024-02-29 18:49:24,794: INFO: 
EPOCH 4 - PROGRESS: at 11.84% examples, 432224 words/s, in\_qsize 43, out\_qsize 0 2024-02-29 18:49:25,798: INFO: EPOCH 4 - PROGRESS: at 13.42% examples, 431804 words/s, in\_qsize 44, out\_qsize 3 2024-02-29 18:49:26,844: INFO: EPOCH 4 - PROGRESS: at 15.42% examples, 431292 words/s, in\_qsize 35, out\_qsize 0 2024-02-29 18:49:27,892: INFO: EPOCH 4 - PROGRESS: at 17.81% examples, 429704 words/s, in\_qsize 41, out\_qsize 2 2024-02-29 18:49:28,893: INFO: EPOCH 4 - PROGRESS: at 19.74% examples, 431790 words/s, in\_qsize 41, out\_qsize 0 2024-02-29 18:49:29,898: INFO: EPOCH 4 - PROGRESS: at 21.87% examples, 432649 words/s, in\_qsize 42, out\_qsize 0 2024-02-29 18:49:30,909: INFO: EPOCH 4 - PROGRESS: at 23.74% examples, 434112 words/s, in\_qsize 42, out\_qsize 0 2024-02-29 18:49:31,991: INFO: EPOCH 4 - PROGRESS: at 25.82% examples, 432283 words/s, in\_qsize 45, out\_qsize 2 2024-02-29 18:49:33,001: INFO: EPOCH 4 - PROGRESS: at 28.64% examples, 434800 words/s, in\_qsize 46, out\_qsize 0 2024-02-29 18:49:34,103: INFO: EPOCH 4 - PROGRESS: at 30.87% examples, 433345 words/s, in\_qsize 43, out\_qsize 2 2024-02-29 18:49:35,112: INFO: EPOCH 4 - PROGRESS: at 33.13% examples, 435904 words/s, in\_qsize 41, out\_qsize 0 2024-02-29 18:49:36,128: INFO: EPOCH 4 - PROGRESS: at 35.30% examples, 433737 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:49:37,212: INFO: EPOCH 4 - PROGRESS: at 37.32% examples, 430731 words/s, in\_qsize 40, out\_qsize 6 2024-02-29 18:49:38,213: INFO: EPOCH 4 - PROGRESS: at 39.66% examples, 431553 words/s, in\_qsize 42, out\_qsize 5 2024-02-29 18:49:39,293: INFO: EPOCH 4 - PROGRESS: at 42.19% examples, 431437 words/s, in\_qsize 44, out\_qsize 3 2024-02-29 18:49:40,304: INFO: EPOCH 4 - PROGRESS: at 44.65% examples, 432954 words/s, in\_qsize 45, out\_qsize 0 2024-02-29 18:49:41,333: INFO: EPOCH 4 - PROGRESS: at 47.12% examples, 432816 words/s, in\_qsize 37, out\_qsize 3 2024-02-29 18:49:42,410: INFO: EPOCH 4 - PROGRESS: at 49.09% examples, 431043 words/s, 
in\_qsize 39, out\_qsize 8 2024-02-29 18:49:43,429: INFO: EPOCH 4 - PROGRESS: at 51.39% examples, 433326 words/s, in\_qsize 44, out\_qsize 1 2024-02-29 18:49:44,440: INFO: EPOCH 4 - PROGRESS: at 53.24% examples, 431354 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:49:45,491: INFO: EPOCH 4 - PROGRESS: at 55.48% examples, 428766 words/s, in\_qsize 38, out\_qsize 9 2024-02-29 18:49:46,559: INFO: EPOCH 4 - PROGRESS: at 57.71% examples, 426289 words/s, in\_qsize 35, out\_qsize 12 2024-02-29 18:49:47,608: INFO: EPOCH 4 - PROGRESS: at 60.59% examples, 427738 words/s, in\_qsize 45, out\_qsize 2 2024-02-29 18:49:48,625: INFO: EPOCH 4 - PROGRESS: at 63.18% examples, 427658 words/s, in\_qsize 39, out\_qsize 1 2024-02-29 18:49:49,637: INFO: EPOCH 4 - PROGRESS: at 65.67% examples, 426155 words/s, in\_qsize 37, out\_qsize 1 2024-02-29 18:49:50,664: INFO: EPOCH 4 - PROGRESS: at 67.88% examples, 424348 words/s, in\_qsize 44, out\_qsize 0 2024-02-29 18:49:51,688: INFO: EPOCH 4 - PROGRESS: at 70.41% examples, 423689 words/s, in\_qsize 42, out\_qsize 3 2024-02-29 18:49:52,705: INFO: EPOCH 4 - PROGRESS: at 72.83% examples, 423343 words/s, in\_qsize 40, out\_qsize 4 2024-02-29 18:49:53,715: INFO: EPOCH 4 - PROGRESS: at 75.34% examples, 424144 words/s, in\_qsize 45, out\_qsize 1 2024-02-29 18:49:54,785: INFO: EPOCH 4 - PROGRESS: at 78.02% examples, 423646 words/s, in\_qsize 37, out\_qsize 0 2024-02-29 18:49:55,797: INFO: EPOCH 4 - PROGRESS: at 81.04% examples, 424429 words/s, in\_qsize 44, out\_qsize 0 2024-02-29 18:49:56,821: INFO: EPOCH 4 - PROGRESS: at 83.55% examples, 424469 words/s, in\_qsize 39, out\_qsize 1 2024-02-29 18:49:57,893: INFO: EPOCH 4 - PROGRESS: at 86.34% examples, 424022 words/s, in\_qsize 43, out\_qsize 2 2024-02-29 18:49:58,894: INFO: EPOCH 4 - PROGRESS: at 89.27% examples, 424351 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:49:59,911: INFO: EPOCH 4 - PROGRESS: at 92.15% examples, 424126 words/s, in\_qsize 39, out\_qsize 1 2024-02-29 18:50:00,919: INFO: 
EPOCH 4 - PROGRESS: at 95.19% examples, 424954 words/s, in\_qsize 38, out\_qsize 3 2024-02-29 18:50:02,003: INFO: EPOCH 4 - PROGRESS: at 97.84% examples, 423982 words/s, in\_qsize 46, out\_qsize 3 2024-02-29 18:50:02,395: INFO: worker thread finished; awaiting finish of 23 more threads 2024-02-29 18:50:02,417: INFO: worker thread finished; awaiting finish of 22 more threads 2024-02-29 18:50:02,418: INFO: worker thread finished; awaiting finish of 21 more threads 2024-02-29 18:50:02,505: INFO: worker thread finished; awaiting finish of 20 more threads 2024-02-29 18:50:02,508: INFO: worker thread finished; awaiting finish of 19 more threads 2024-02-29 18:50:02,509: INFO: worker thread finished; awaiting finish of 18 more threads 2024-02-29 18:50:02,519: INFO: worker thread finished; awaiting finish of 17 more threads 2024-02-29 18:50:02,520: INFO: worker thread finished; awaiting finish of 16 more threads 2024-02-29 18:50:02,536: INFO: worker thread finished; awaiting finish of 15 more threads 2024-02-29 18:50:02,603: INFO: worker thread finished; awaiting finish of 14 more threads 2024-02-29 18:50:02,615: INFO: worker thread finished; awaiting finish of 13 more threads 2024-02-29 18:50:02,617: INFO: worker thread finished; awaiting finish of 12 more threads 2024-02-29 18:50:02,620: INFO: worker thread finished; awaiting finish of 11 more threads 2024-02-29 18:50:02,621: INFO: worker thread finished; awaiting finish of 10 more threads 2024-02-29 18:50:02,629: INFO: worker thread finished; awaiting finish of 9 more threads 2024-02-29 18:50:02,631: INFO: worker thread finished; awaiting finish of 8 more threads 2024-02-29 18:50:02,632: INFO: worker thread finished; awaiting finish of 7 more threads 2024-02-29 18:50:02,694: INFO: worker thread finished; awaiting finish of 6 more threads 2024-02-29 18:50:02,697: INFO: worker thread finished; awaiting finish of 5 more threads 2024-02-29 18:50:02,699: INFO: worker thread finished; awaiting finish of 4 more threads 
2024-02-29 18:50:02,701: INFO: worker thread finished; awaiting finish of 3 more threads 2024-02-29 18:50:02,701: INFO: worker thread finished; awaiting finish of 2 more threads 2024-02-29 18:50:02,702: INFO: worker thread finished; awaiting finish of 1 more threads 2024-02-29 18:50:02,702: INFO: worker thread finished; awaiting finish of 0 more threads 2024-02-29 18:50:02,703: INFO: EPOCH - 4 : training on 22481838 raw words (19624175 effective words) took 46.1s, 425765 effective words/s 2024-02-29 18:50:03,718: INFO: EPOCH 5 - PROGRESS: at 0.99% examples, 448165 words/s, in\_qsize 0, out\_qsize 0 2024-02-29 18:50:04,724: INFO: EPOCH 5 - PROGRESS: at 2.21% examples, 445807 words/s, in\_qsize 3, out\_qsize 2 2024-02-29 18:50:05,726: INFO: EPOCH 5 - PROGRESS: at 3.81% examples, 451942 words/s, in\_qsize 0, out\_qsize 1 2024-02-29 18:50:06,806: INFO: EPOCH 5 - PROGRESS: at 5.07% examples, 433831 words/s, in\_qsize 9, out\_qsize 7 2024-02-29 18:50:07,816: INFO: EPOCH 5 - PROGRESS: at 6.80% examples, 440929 words/s, in\_qsize 16, out\_qsize 3 2024-02-29 18:50:08,850: INFO: EPOCH 5 - PROGRESS: at 8.61% examples, 437993 words/s, in\_qsize 22, out\_qsize 0 2024-02-29 18:50:09,891: INFO: EPOCH 5 - PROGRESS: at 10.26% examples, 438057 words/s, in\_qsize 39, out\_qsize 0 2024-02-29 18:50:10,893: INFO: EPOCH 5 - PROGRESS: at 11.96% examples, 436458 words/s, in\_qsize 42, out\_qsize 1 2024-02-29 18:50:11,921: INFO: EPOCH 5 - PROGRESS: at 13.59% examples, 435171 words/s, in\_qsize 41, out\_qsize 0 2024-02-29 18:50:12,937: INFO: EPOCH 5 - PROGRESS: at 15.48% examples, 434585 words/s, in\_qsize 41, out\_qsize 0 2024-02-29 18:50:14,065: INFO: EPOCH 5 - PROGRESS: at 17.99% examples, 430844 words/s, in\_qsize 43, out\_qsize 0 2024-02-29 18:50:15,097: INFO: EPOCH 5 - PROGRESS: at 19.98% examples, 431863 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:50:16,109: INFO: EPOCH 5 - PROGRESS: at 21.77% examples, 427576 words/s, in\_qsize 42, out\_qsize 5 2024-02-29 18:50:17,119: INFO: 
EPOCH 5 - PROGRESS: at 23.64% examples, 429198 words/s, in\_qsize 38, out\_qsize 2 2024-02-29 18:50:18,194: INFO: EPOCH 5 - PROGRESS: at 25.58% examples, 425934 words/s, in\_qsize 42, out\_qsize 4 2024-02-29 18:50:19,198: INFO: EPOCH 5 - PROGRESS: at 28.08% examples, 426551 words/s, in\_qsize 44, out\_qsize 0 2024-02-29 18:50:20,211: INFO: EPOCH 5 - PROGRESS: at 30.47% examples, 428039 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:50:21,246: INFO: EPOCH 5 - PROGRESS: at 32.64% examples, 428557 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:50:22,249: INFO: EPOCH 5 - PROGRESS: at 34.85% examples, 429321 words/s, in\_qsize 42, out\_qsize 0 2024-02-29 18:50:23,330: INFO: EPOCH 5 - PROGRESS: at 37.07% examples, 428671 words/s, in\_qsize 48, out\_qsize 0 2024-02-29 18:50:24,394: INFO: EPOCH 5 - PROGRESS: at 39.49% examples, 428834 words/s, in\_qsize 38, out\_qsize 0 2024-02-29 18:50:25,396: INFO: EPOCH 5 - PROGRESS: at 41.68% examples, 427864 words/s, in\_qsize 41, out\_qsize 3 2024-02-29 18:50:26,400: INFO: EPOCH 5 - PROGRESS: at 43.97% examples, 427630 words/s, in\_qsize 47, out\_qsize 0 2024-02-29 18:50:27,412: INFO: EPOCH 5 - PROGRESS: at 46.74% examples, 429013 words/s, in\_qsize 39, out\_qsize 4 2024-02-29 18:50:28,488: INFO: EPOCH 5 - PROGRESS: at 48.66% examples, 428027 words/s, in\_qsize 41, out\_qsize 3 2024-02-29 18:50:29,507: INFO: EPOCH 5 - PROGRESS: at 50.84% examples, 428856 words/s, in\_qsize 45, out\_qsize 2 2024-02-29 18:50:30,512: INFO: EPOCH 5 - PROGRESS: at 52.94% examples, 429820 words/s, in\_qsize 44, out\_qsize 1 2024-02-29 18:50:31,546: INFO: EPOCH 5 - PROGRESS: at 55.59% examples, 430131 words/s, in\_qsize 44, out\_qsize 0 2024-02-29 18:50:32,548: INFO: EPOCH 5 - PROGRESS: at 58.32% examples, 430904 words/s, in\_qsize 36, out\_qsize 2 2024-02-29 18:50:33,599: INFO: EPOCH 5 - PROGRESS: at 60.84% examples, 430905 words/s, in\_qsize 43, out\_qsize 1 2024-02-29 18:50:34,610: INFO: EPOCH 5 - PROGRESS: at 63.49% examples, 430790 words/s, 
in\_qsize 43, out\_qsize 4 2024-02-29 18:50:35,695: INFO: EPOCH 5 - PROGRESS: at 66.64% examples, 431336 words/s, in\_qsize 42, out\_qsize 4 2024-02-29 18:50:36,700: INFO: EPOCH 5 - PROGRESS: at 69.35% examples, 432565 words/s, in\_qsize 44, out\_qsize 1 2024-02-29 18:50:37,702: INFO: EPOCH 5 - PROGRESS: at 72.20% examples, 433320 words/s, in\_qsize 44, out\_qsize 0 2024-02-29 18:50:38,702: INFO: EPOCH 5 - PROGRESS: at 74.68% examples, 432933 words/s, in\_qsize 40, out\_qsize 1 2024-02-29 18:50:39,731: INFO: EPOCH 5 - PROGRESS: at 77.04% examples, 433221 words/s, in\_qsize 44, out\_qsize 0 2024-02-29 18:50:40,794: INFO: EPOCH 5 - PROGRESS: at 80.27% examples, 433750 words/s, in\_qsize 39, out\_qsize 0 2024-02-29 18:50:41,813: INFO: EPOCH 5 - PROGRESS: at 82.70% examples, 432642 words/s, in\_qsize 39, out\_qsize 4 2024-02-29 18:50:42,822: INFO: EPOCH 5 - PROGRESS: at 85.57% examples, 433342 words/s, in\_qsize 36, out\_qsize 2 2024-02-29 18:50:43,823: INFO: EPOCH 5 - PROGRESS: at 88.48% examples, 433558 words/s, in\_qsize 45, out\_qsize 0 2024-02-29 18:50:44,850: INFO: EPOCH 5 - PROGRESS: at 91.58% examples, 433633 words/s, in\_qsize 48, out\_qsize 0 2024-02-29 18:50:45,909: INFO: EPOCH 5 - PROGRESS: at 94.50% examples, 433724 words/s, in\_qsize 39, out\_qsize 0 2024-02-29 18:50:46,918: INFO: EPOCH 5 - PROGRESS: at 97.38% examples, 433581 words/s, in\_qsize 46, out\_qsize 1 2024-02-29 18:50:47,544: INFO: worker thread finished; awaiting finish of 23 more threads 2024-02-29 18:50:47,589: INFO: worker thread finished; awaiting finish of 22 more threads 2024-02-29 18:50:47,591: INFO: worker thread finished; awaiting finish of 21 more threads 2024-02-29 18:50:47,612: INFO: worker thread finished; awaiting finish of 20 more threads 2024-02-29 18:50:47,823: INFO: worker thread finished; awaiting finish of 19 more threads 2024-02-29 18:50:47,825: INFO: worker thread finished; awaiting finish of 18 more threads 2024-02-29 18:50:47,829: INFO: worker thread finished; awaiting 
finish of 17 more threads 2024-02-29 18:50:47,833: INFO: worker thread finished; awaiting finish of 16 more threads 2024-02-29 18:50:47,836: INFO: worker thread finished; awaiting finish of 15 more threads 2024-02-29 18:50:47,839: INFO: worker thread finished; awaiting finish of 14 more threads 2024-02-29 18:50:47,843: INFO: worker thread finished; awaiting finish of 13 more threads 2024-02-29 18:50:47,846: INFO: worker thread finished; awaiting finish of 12 more threads 2024-02-29 18:50:47,888: INFO: worker thread finished; awaiting finish of 11 more threads 2024-02-29 18:50:47,892: INFO: worker thread finished; awaiting finish of 10 more threads 2024-02-29 18:50:47,893: INFO: worker thread finished; awaiting finish of 9 more threads 2024-02-29 18:50:47,898: INFO: worker thread finished; awaiting finish of 8 more threads 2024-02-29 18:50:47,900: INFO: worker thread finished; awaiting finish of 7 more threads 2024-02-29 18:50:47,901: INFO: worker thread finished; awaiting finish of 6 more threads 2024-02-29 18:50:47,902: INFO: worker thread finished; awaiting finish of 5 more threads 2024-02-29 18:50:47,986: INFO: EPOCH 5 - PROGRESS: at 99.79% examples, 432737 words/s, in\_qsize 4, out\_qsize 1 2024-02-29 18:50:47,988: INFO: worker thread finished; awaiting finish of 4 more threads 2024-02-29 18:50:47,989: INFO: worker thread finished; awaiting finish of 3 more threads 2024-02-29 18:50:47,989: INFO: worker thread finished; awaiting finish of 2 more threads 2024-02-29 18:50:47,995: INFO: worker thread finished; awaiting finish of 1 more threads 2024-02-29 18:50:48,000: INFO: worker thread finished; awaiting finish of 0 more threads 2024-02-29 18:50:48,001: INFO: EPOCH - 5 : training on 22481838 raw words (19624912 effective words) took 45.3s, 433313 effective words/s 2024-02-29 18:50:48,002: INFO: training on a 112409190 raw words (98118181 effective words) took 227.8s, 430764 effective words/s 2024-02-29 18:50:48,002: INFO: saving Word2Vec object under 
wiki.zh.text.model, separately None 2024-02-29 18:50:48,003: INFO: storing np array 'vectors' to wiki.zh.text.model.wv.vectors.npy 2024-02-29 18:50:48,099: INFO: not storing attribute vectors\_norm 2024-02-29 18:50:48,100: INFO: storing np array 'syn1neg' to wiki.zh.text.model.trainables.syn1neg.npy 2024-02-29 18:50:48,189: INFO: not storing attribute cum\_table 2024-02-29 18:50:48,857: INFO: saved wiki.zh.text.model 2024-02-29 18:50:48,859: INFO: storing 240560x100 projection weights into wiki.zh.text.vector
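Each `EPOCH - n` summary line in the log above reports a raw word count and a smaller effective word count. In gensim, the effective count is roughly what remains after dropping words outside the vocabulary and downsampling frequent words. As a quick sanity check, the sketch below parses the three summary lines copied from the log and computes the kept fraction and throughput. The regular expression and the helper name `parse_epoch_summary` are illustrative assumptions and not part of gensim.

```python
import re

# Epoch summary lines copied verbatim from the training log above.
LOG_LINES = [
    "EPOCH - 3 : training on 22481838 raw words (19623091 effective words) took 45.2s, 434423 effective words/s",
    "EPOCH - 4 : training on 22481838 raw words (19624175 effective words) took 46.1s, 425765 effective words/s",
    "EPOCH - 5 : training on 22481838 raw words (19624912 effective words) took 45.3s, 433313 effective words/s",
]

# Hypothetical pattern written for this log format only.
SUMMARY_RE = re.compile(
    r"EPOCH - (\d+) : training on (\d+) raw words "
    r"\((\d+) effective words\) took ([\d.]+)s"
)


def parse_epoch_summary(line):
    """Return the numbers in one epoch summary line, or None if it does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    epoch, raw, effective, seconds = match.groups()
    return {
        "epoch": int(epoch),
        "raw": int(raw),
        "effective": int(effective),
        "seconds": float(seconds),
    }


summaries = [parse_epoch_summary(line) for line in LOG_LINES]
for s in summaries:
    kept = s["effective"] / s["raw"]          # fraction of raw words actually trained on
    speed = s["effective"] / s["seconds"]     # recomputed from the rounded duration
    print(f"epoch {s['epoch']}: kept {kept:.1%} of raw words, about {speed:,.0f} words/s")
```

Because the logged duration is rounded to one decimal place, the recomputed throughput only approximately matches the words/s figure printed by gensim.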
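##################### Test ###################
from gensim.models import Word2Vec

model = Word2Vec.load("wiki.zh.text.model")  # load the trained model

# Read vectors through model.wv. Calling model[word] or model.most_similar()
# directly is deprecated in gensim 3.x (it triggers the DeprecationWarning
# lines seen in the output) and is removed in gensim 4.0.
# The vocabulary list is wv.index2word in gensim 3.x and wv.index_to_key in 4.x.
vocab = getattr(model.wv, "index_to_key", None) or model.wv.index2word

# Print the first 10 words together with their vectors
for word in vocab[:10]:
    print(word, '[', model.wv[word], ']')
print("")

# For each query word, print the most similar words (top 10 by default):
# 铁路 (railway), 中药 (traditional Chinese medicine), 普京 (Putin)
for query in ("铁路", "中药", "普京"):
    for r in model.wv.most_similar(query):
        print(r)
    print("")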
Output (training log omitted):
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel\_launcher.py:9: DeprecationWarning: Call to deprecated \`\_\_getitem\_\_\` (Method will be removed in 4.0.0, use self.wv.\_\_getitem\_\_() instead).
  if \_\_name\_\_ == '\_\_main\_\_':
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel\_launcher.py:15: DeprecationWarning: Call to deprecated \`most\_similar\` (Method will be removed in 4.0.0, use self.wv.most\_similar() instead).
  from ipykernel import kernelapp as app
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel\_launcher.py:20: DeprecationWarning: Call to deprecated \`most\_similar\` (Method will be removed in 4.0.0, use self.wv.most\_similar() instead).

的 \[ \[ 0.05327865 -0.53783256 0.4011491 1.0377467 -0.37736186 -1.3369029 1.0646014 -0.1514761 1.5452628 -1.0377175 0.0472149 -0.8363005 -0.10084558 -0.48065135 -0.3601034 0.94604933 0.8394771 -0.6131299 1.5977417 -2.3976393 0.4921375 0.7588338 -0.32347357 -0.1854495 -0.5323685 -0.5173837 0.51421505 -0.52293605 -0.36417717 -0.90888894 -0.13158794 1.4198147 -0.8280145 -0.3019174 2.1073706 -0.760788 1.2119099 0.3257739 -0.8752619 0.13358122 -1.3738849 -0.57065696 0.5482845 -0.44655856 -0.2226158 -0.274896 0.6865202 -0.0033878 -0.2763316 -0.36525854 -0.70503396 -0.64678556 -0.29910237 0.38098514 -1.4872898 0.03365944 -0.82742816 -0.43514818 -1.2130035 -0.11949506 -0.29346514 2.0837998 0.17063631 0.17331794 0.33808464 -0.4261683 1.0569005 -1.4714183 0.33974665 -1.5394073 -0.9799224 -0.54741603 0.48417726 0.51358485 -0.5715329 -0.12952082 0.4293968 -1.0172116 1.1407273 -0.88506544 -0.5702839 -1.3481365 -0.39994067 -2.0000238 1.2328296 -1.1719111 -0.9050281 1.1634829 0.07408974 -1.2275641 0.27946717 -1.2653685 1.3553606 -1.6927024 -0.7033033 0.2693373 1.253629 0.6496037 0.2191684 -0.78412926\] \]
在 \[ \[-0.42463258 -2.4437954 0.53217393 0.9996975 0.37987134 -0.88477343 1.0971447 1.5294083 1.0648928 -1.558458 -0.36555728 -0.42607346 -1.3693268 -0.6986576 0.5938768 0.0783068 0.758885 -0.83025175 1.2274495 -0.9937148 -0.33092266 2.093802 -0.33651614 0.21035707 0.63450843 1.1645601 -1.1898849 -0.07593375 -0.88322973 -0.19563046 0.9769286 0.74906284 -0.70940083 1.4368366 0.9067723 -0.44570225 1.0358443 1.1667545 1.0775245 -0.9527623 -0.95970166 -0.70979124 -0.5310172 0.36139968 -0.18026341 1.4736971 1.6084458 -1.705582 -0.10648517 -0.1105919 -0.25159562 -0.00873835 -0.26249817 2.1622958 -1.5742291 0.14910135 -1.2894114 0.2511249 -1.1792454 -0.72360325 -0.07263664 3.9779882 -0.82457787 0.02922174 -0.57287693 0.34086442 1.1984884 1.0886639 0.8197982 -0.77552193 0.70042676 -1.4123865 -0.54220575 -1.0212927 0.2889944 0.24239118 0.1648594 -0.7769327 0.16848186 1.2421886 -1.6019603 -1.6944915 1.0919546 -0.3363789 -0.82180744 1.2388902 -1.4864118 0.7689477 -1.170416 -1.4182153 1.2648599 -2.2608922 1.408338 -1.6608157 -0.6797598 1.494119 1.197311 1.7018986 -1.045801 -0.14495309\] \]
是 \[ \[ 2.0627701 -1.1885465 1.3413388 -0.63310647 -1.0378323 0.191146 -1.0518003 -1.0862353 1.4891928 0.37143752 -0.70622504 0.0780537 0.02657107 -1.2314191 -0.09800842 0.7191717 1.0831716 -0.6678763 2.553883 -1.2007903 -0.73821676 0.3530014 1.721368 1.3866593 0.7923605 -1.3692601 0.41672337 -0.96140575 -2.3858385 -0.26267844 1.114126 1.1806949 1.0037898 1.5768572 2.0220714 0.5763852 1.6764815 1.9266133 -1.358343 0.15191413 -0.5946121 -2.6195357 1.6187361 -0.7356461 -1.473615 -0.76726705 2.1406848 0.30505633 2.9442768 0.3789943 -1.1278807 0.3917617 0.770161 0.73717993 -0.9430313 -0.2679599 -2.0000083 -0.7008843 -1.0499135 -0.6178511 2.1850324 2.9828587 0.6661941 2.053006 -1.086323 0.04475425 2.53099 -0.8302406 0.64353186 -1.8250593 1.1532271 -1.144374 0.589161 -1.6655053 -0.9623368 1.7211503 0.9463574 -1.9714044 -1.3936982 -1.1316496 0.17267536 -0.4145161 -0.3099934 0.256562 1.015426 0.3788599 -1.9755847 0.81467634 1.02809 -1.4076906 -1.868017 -1.1592947 -0.26673457 -1.2610213 0.5215924 -1.102127 -0.27354524 -1.1902376 -0.36140302 0.69883007\] \]
年 \[ \[-0.7686671 0.08463874 1.32112 0.90194964 1.3142313 0.2762654 -0.92944527 0.7551384 1.0904723 -1.7634455 -0.14772087 0.9270074 0.70371234 0.0248665 2.1429276 0.3456471 1.2979926 1.3280556 -1.0709156 -1.4325314 -2.0553591 1.9210931 -0.2635952 0.89939356 -0.24535367 0.12382335 -0.34222543 -1.4257516 -0.16413423 2.005949 1.1495656 0.9052044 -1.0064452 1.6927723 1.2470132 0.85299474 1.7945793 -2.3215466 -1.4006611 0.16407704 -0.17039333 0.59470963 -1.1873173 1.6482116 -0.9101744 2.4193552 0.07334835 -1.1106066 -0.32891974 -0.10809796 -1.2178708 1.6356549 -0.46528345 1.185685 -1.0507497 -0.16127113 -0.40930077 -1.7317686 -1.3865246 1.4947401 0.03928837 1.915953 0.2928778 1.169346 -1.2584093 0.75220495 -1.5323597 0.6966527 1.0286992 -3.1477675 -0.01011731 0.47440663 -0.01000467 -1.3776034 -0.16935979 -0.2138399 0.5649436 -2.1339822 -1.8523834 -0.82796097 0.86329234 -4.4993134 0.31964254 -0.16166072 1.3355536 -1.6848409 -0.40493208 -0.1744769 -0.0065302 -1.1696439 -0.29254827 2.644544 1.0520754 -2.0030859 1.9328154 -1.7976549 1.9641755 -0.9358227 -2.0198176 1.7621045 \] \]
和 \[ \[-1.7936664 -0.60258466 -0.33815056 1.7664076 1.1888294 -0.5124309 -0.982421 -1.5381647 1.3690406 -0.34940612 -0.18159316 0.34924024 -2.1175427 -0.39139643 -0.97609144 0.7091031 1.2836043 0.59916985 -0.44169012 -1.2265047 0.25998732 0.9211792 -0.4099178 0.11590376 -0.28670695 2.7602255 -0.77220744 -0.7016491 -1.6584926 -1.2257215 0.88776666 -0.25778687 0.49061418 0.48738685 -0.56769925 -2.2035594 2.6515436 -0.37563872 -0.08108984 0.17916249 -1.954872 -0.32587513 -0.2813556 -0.71491474 -0.55605733 0.27773035 0.22445679 -0.1038675 -0.66065305 1.0714678 -1.412449 -0.14055876 0.17481208 0.6475259 1.9205297 -0.85978746 -1.7288083 -0.92688423 -0.1334583 0.09569005 -1.290859 0.77196133 -0.01910409 0.5789173 -0.51498264 -0.8700445 2.5325863 -0.4368111 0.79267114 -0.28794852 0.7135503 0.00821535 -0.13613094 0.7516194 -0.8653614 1.2800741 -0.51343066 -1.5026168 0.54250664 -1.3580089 -1.1880492 -1.5932618 1.1894258 -1.4418019 -0.41850865 -1.1452755 -0.9339831 -0.12613311 0.7358218 0.08830387 -0.5067824 0.03577109 -2.0329843 -0.35796794 -1.0161774 -1.0131522 -0.43137753 0.33253494 0.4018132 -1.41702 \] \]
了 \[ \[-1.9436138e+00 1.6483014e+00 7.3397523e-01 2.5087681e+00 1.2167963e-01 -4.3142447e+00 1.8856712e-01 -5.9046990e-01 8.0753857e-01 -1.4349873e+00 -2.4201753e+00 -9.8747307e-01 -1.1762297e+00 -4.0771633e-01 -4.7250494e-01 1.4274366e+00 4.2959139e-01 1.1849896e+00 1.4658893e+00 -2.1643031e+00 -4.9282961e-02 -2.5011623e-01 -7.6726717e-01 2.0264297e+00 -9.4920123e-01 -1.2521985e+00 -2.0591247e+00 -1.2519429e+00 -1.5353471e+00 9.2382416e-02 4.3579984e-01 3.0063396e+00 -2.4839160e-01 8.4310241e-02 2.0635054e+00 -1.1391885e+00 2.2873564e+00 -1.3363756e+00 2.0226948e+00 -1.0125631e+00 5.2646023e-01 -2.0331869e+00 -1.9216677e+00 4.7612253e-01 -9.8945385e-01 6.6175139e-01 1.1421987e+00 2.1541378e-01 7.3244750e-01 2.6114142e-01 1.7791729e-01 -2.2847609e-01 4.1287571e-01 8.8611461e-02 -6.1350155e-01 2.7324381e+00 -2.9383843e+00 1.7865591e+00 1.0036942e+00 -2.0990545e-01 2.7161896e-01 2.4153254e+00 1.5154275e-01 -1.2099750e+00 -1.5965548e+00 1.6759452e+00 -8.3815706e-01 9.7805393e-01 1.5085987e+00 3.0611422e-02 1.8509774e+00 7.3120952e-01 1.6457441e+00 -2.7104132e+00 1.2034345e+00 -2.1080136e+00 -7.5097762e-02 -6.3763016e-01 1.1206281e+00 6.5688306e-01 -8.1922483e-01 -6.7665690e-01 -9.2754817e-01 -2.1539629e+00 3.3879298e-01 -8.6143786e-01 3.0885071e-01 1.6986367e-03 -2.8715498e+00 -2.4140685e+00 -7.0239681e-01 -3.5281119e-01 -1.1388317e+00 -2.9193931e+00 -8.3260250e-01 1.1267102e+00 6.9696531e-02 7.8351122e-01 -1.2417021e+00 5.3507799e-01\] \]
於 \[ \[-0.53269184 -0.36012843 -0.90692663 -0.362973 1.6366956 0.43958563 3.5067036 2.6491318 2.0490243 -2.5787504 -0.21314327 0.4410392 -1.6150179 -1.46432 -1.2484831 0.1407568 1.9192587 -2.6820233 1.0737547 -0.24800494 -1.0269834 1.7373953 1.3810781 -0.4585215 0.2519634 3.2757533 -0.6035296 0.35779628 -0.948003 -0.16447543 0.31204602 0.15876536 1.2921379 0.35225153 2.5887783 0.41650772 0.35305426 2.0292234 0.10431505 1.3056135 -2.4140575 -2.002597 -1.6638165 2.0507886 0.98914206 -0.43331012 1.8916605 0.05047855 1.0437186 -1.1184938 -2.9102814 -0.44862002 -1.5645779 2.8235185 1.924463 -3.3811007 -0.35813704 -1.3784553 -0.13405009 -0.46785113 0.43813846 -0.22884116 -2.258125 1.3256868 0.9471638 -0.7640105 1.5571808 3.67068 -1.6710087 -0.7790608 2.6240816 -0.45023972 1.9663687 1.0383086 -1.4122825 -1.4562843 0.91329277 1.5664321 -2.1609735 0.69369364 0.80576926 -2.248431 -0.3148186 3.7729433 1.9283202 1.3789957 1.7161353 0.41128302 -3.1765096 -2.251166 0.8560807 -2.5485353 1.842208 -0.95322704 -0.38343704 1.9791752 1.5500413 1.2364751 -2.791452 0.86852646\] \]
有 \[ \[ 0.18258221 -2.3049097 1.497023 -1.3487074 -3.727378 0.50723153 0.24911746 -0.9672969 -0.48191318 -1.9050581 -0.43654805 2.3170543 1.1036432 -0.10333104 -0.600115 1.1932797 -0.2998131 -1.5992122 1.2433572 -2.4837353 -2.0780206 0.5371243 -0.26808724 0.8863078 0.1829582 -1.0246236 -1.1198604 -0.9830677 0.32930338 -1.9471537 -0.50897574 1.2545011 -0.21091224 0.8305599 1.4226604 0.5430306 2.6515467 0.5407714 0.9081938 0.97991794 2.4480078 -0.37713084 -0.6813183 2.2026594 -0.98280716 0.64117205 0.84968406 2.2401826 0.03101416 0.7823304 -0.7782107 -2.7762618 0.13561708 2.07437 -2.5257614 0.05108101 -1.4901428 -1.2402513 0.8309426 -0.7211121 1.9019513 3.960054 -1.8887383 0.39035076 -0.8641255 -0.61770415 1.5876379 -2.6439407 0.40260366 -0.29260564 2.5874152 -0.9009424 -1.1908774 -0.9699979 0.5119676 0.45808062 -1.3735437 -0.36920473 1.3141601 -2.0569677 -0.38173217 0.6845369 0.42968374 0.6309501 -0.107759 -1.9057575 -1.5358291 -1.5482831 -1.9485159 -1.0988526 0.13084021 -2.2655997 1.5260206 -2.5642633 -0.5744566 -0.08428011 -1.4260695 0.19865772 0.4512534 -0.9715485 \] \]
為 \[ \[ 0.1896151 -1.8181373 3.3572814 -1.0685782 0.49208143 2.378115 0.20101228 1.4529976 0.9294902 -2.4987757 -1.0333834 1.4453149 0.16965653 1.0906299 0.0800195 0.340483 0.4134813 -1.2748445 -0.36895636 -2.4484384 2.7829435 2.1415136 1.1581845 1.3086056 2.269279 -3.1014345 -1.1449586 -1.3106853 0.72433496 1.2913316 0.58723754 0.7778022 -0.15846896 -0.8188704 3.476332 -0.39220434 0.4255195 1.1091534 0.5255888 1.3774976 1.6827629 -1.4297134 3.231139 -1.0098424 2.2370512 1.4140384 -1.3761084 -0.91320014 3.5050528 0.13942066 -2.3848357 -0.13616315 -0.7808771 -2.407467 0.86087775 -1.0559365 -0.48088276 -0.06485796 -1.6015757 -0.09015828 1.7890335 1.3084754 -0.24928983 0.4265427 0.5235788 -2.5840218 3.446129 -0.38306347 -1.2760977 -1.5464348 0.5212346 -0.13336264 2.2953007 2.7280948 -0.96069217 2.5137131 0.42497525 -0.70072883 -2.267927 1.168881 -0.9020308 -0.09913182 1.2413316 -0.3555878 2.8514225 -0.75041753 1.0707642 -0.24892485 0.1871088 2.4870126 0.85072696 -0.09363261 -1.7077832 -1.4241735 2.1220694 0.6601661 2.066149 0.6744381 -1.6973271 1.0169114 \] \]
中 \[ \[ 1.4874371 1.2361885 -0.65242714 -0.34126204 0.57889384 -0.23008534 -0.30447042 -0.57755953 -1.7507975 -0.81123376 -1.793821 1.1634924 -1.3117273 -2.5925312 -0.09789076 -0.9175423 -1.2740778 -0.10426655 -1.0767089 -2.4226363 -2.5593858 -0.3632854 -2.9014764 -1.2972332 -0.47120202 -0.43429497 1.5338455 -0.04679739 -0.6710644 -0.3594158 0.19282657 3.6521492 -0.92375535 1.4474547 3.8261755 -1.9815593 0.447895 0.19661807 -0.46812743 1.6137925 -5.175673 3.2903085 -1.316274 -3.0216594 1.1939027 -0.09521949 -0.5766235 -2.2970293 1.2614003 2.533164 -3.493826 1.6417836 1.1442548 1.4973636 -2.3305657 1.1097547 -1.1043872 1.1209316 -1.3346034 -2.21357 3.2227323 1.248244 -1.1044198 -0.26365292 -0.5852314 -1.1633244 2.2887418 1.0944943 0.03403454 -3.1973522 -0.07695059 -1.8952466 -2.0021343 -0.9144443 2.0054252 -1.1020823 -1.0487053 3.0473692 1.7026799 -0.8561669 -0.61132085 0.48333165 0.35812032 0.08411325 -0.31317002 -1.3281633 -1.668745 0.05250288 1.6916655 -2.6553738 -0.35415223 -0.31771886 0.18930124 0.3642429 0.06295905 -1.6075228 1.0787768 -1.5568794 -1.8582122 2.0176907 \] \]

('高铁', 0.8286420106887817)
('客运专线', 0.8204636573791504)
('高速铁路', 0.8059936761856079)
('城际', 0.803480327129364)
('支线', 0.8015587329864502)
('枢纽', 0.7872340679168701)
('环线', 0.7759756445884705)
('干线', 0.7744237184524536)
('通车', 0.774136483669281)
('胶济', 0.7733167409896851)

('草药', 0.8472081422805786)
('气功', 0.8342577219009399)
('生药', 0.8325771689414978)
('矿物', 0.8319119215011597)
('药用', 0.829121470451355)
('调味', 0.8272895812988281)
('中药材', 0.8242448568344116)
('有机合成', 0.8207097053527832)
('名贵', 0.8184367418289185)
('工艺品', 0.8182870149612427)

('布什', 0.8476501703262329)
('李光耀', 0.79795241355896)
('尼克森', 0.7966246604919434)
('里根', 0.7949744462966919)
('歐巴馬', 0.784071683883667)
('奥巴马', 0.7817918658256531)
('希特勒', 0.779426097869873)
('希拉里', 0.7774235010147095)
('巴拉克', 0.7772145867347717)
('雷根', 0.7767146229743958)

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel\_launcher.py:25: DeprecationWarning: Call to deprecated \`most\_similar\` (Method will be removed in 4.0.0, use self.wv.most\_similar() instead).
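Each tuple printed above pairs a word with its similarity score to the query word; gensim's `most_similar` ranks candidates by the cosine similarity between word vectors. A minimal pure-Python sketch of that ranking follows. The three-dimensional toy vectors and the helper names `cosine` and `most_similar_toy` are made up for illustration and are not taken from the trained model, which stores 100-dimensional vectors.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
TOY_VECTORS = {
    "railway": [0.9, 0.1, 0.0],
    "high_speed_rail": [0.8, 0.2, 0.1],
    "herb": [0.0, 0.9, 0.3],
    "president": [0.1, 0.0, 0.95],
}


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_toy(query, vectors, topn=10):
    """Rank every other word by cosine similarity to the query word."""
    q = vectors[query]
    scored = [(word, cosine(q, vec)) for word, vec in vectors.items() if word != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


for word, score in most_similar_toy("railway", TOY_VECTORS):
    print(word, round(score, 4))
```

The score depends only on the angle between two vectors, not on their lengths, which is why frequent words with large vector norms do not automatically dominate the ranking.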