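PT - BERT: A Multi-Class News Text Classification Case on the UCI News Dataset Using Transformer-BERT in the torch Framework (Feature Encoding + BERT as Text Encoder + Classifier, with the Model Saved During Training)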
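Contents
Multi-class news text classification on the UCI news dataset using Transformer-BERT in the torch framework (feature encoding + BERT as text encoder + classifier, with the model saved during training)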
2.1. Feature selection: the dataset keeps two columns, the title (title) and the category (category)
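A minimal sketch of this selection step, assuming the data is loaded with pandas. The inline CSV text is a stand-in for the real file, whose path is not given here; the column names `TITLE` and `CATEGORY` and the two rows are taken from the sample table, and the `dropna` call is an assumption about how missing values might be handled.

```python
import io

import pandas as pd

# Inline stand-in for the real CSV file (its path is not given in the article).
# Both rows come from the sample table; only four columns are kept for brevity.
csv_text = """ID,TITLE,PUBLISHER,CATEGORY
1,"Fed official says weak data caused by weather, should not slow taper",Los Angeles Times,b
6,Plosser: Fed May Have to Accelerate Tapering Pace,NASDAQ,b
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep only the text to classify and its label, and drop rows missing either one.
news = df[["TITLE", "CATEGORY"]].dropna().reset_index(drop=True)
print(news)
```

Everything downstream (tokenization, label encoding) then only needs these two columns.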
# 2.4. Dataset normalization: convert the data into torch tensors that the model accepts, for use in training or inference
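The tensor form mentioned here can be illustrated without a tokenizer. The sketch below pads lists of token ids to a fixed length and builds the matching attention masks. The helper `pad_and_mask` is hypothetical, the ids are copied from the printed output further down, and the length of 19 is simply the length visible in those printed samples. In the article the ids were presumably produced by a BERT tokenizer (the 101 and 102 markers and the 0 padding are consistent with that), but that step is not shown.

```python
import torch


def pad_and_mask(token_id_lists, max_length, pad_id=0):
    """Pad or truncate each id list to max_length and build matching attention masks."""
    input_ids, attention_masks = [], []
    for ids in token_id_lists:
        # Naive truncation: a real tokenizer would keep the closing 102 marker.
        ids = ids[:max_length]
        n_pad = max_length - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)               # right-pad with the pad id
        attention_masks.append([1] * len(ids) + [0] * n_pad)   # 1 = real token, 0 = padding
    return torch.tensor(input_ids), torch.tensor(attention_masks)


# Token ids copied from the article's printed samples (101 opens, 102 closes a sequence).
title_a = [101, 7349, 2880, 2758, 5410, 2951, 3303, 2011, 4633, 1010,
           2323, 2025, 4030, 6823, 2099, 102]
title_b = [101, 20228, 15094, 2121, 1024, 7349, 2089, 2031, 2000, 23306,
           6823, 4892, 6393, 102]

input_ids, attention_masks = pad_and_mask([title_a, title_b], max_length=19)
labels = torch.tensor([0, 0])  # both titles belong to encoded category 0

print(input_ids.shape, attention_masks.shape, labels.shape)
```

The mask tells the encoder which positions are padding, so the number of 1s in each mask row should equal the number of real tokens in the same row of `input_ids`.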
# 3.2. Wrap the dataset as a torch Dataset and feed it into data loaders: the batch size and the maximum sequence length must be set
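A minimal sketch of this step, assuming a dataset class that returns an `(input_ids, attention_mask, label)` tuple per sample, which is the shape of the printed samples further down. The class name `NewsDataset` appears in that output; its body here is a guess, and the data, `max_length`, and `batch_size` values are illustrative. Some printed samples show a mask whose count of 1s differs from the count of non-zero ids, which may be a print artifact or a sign that ids and masks were split separately. Indexing all three fields with one index, as below, avoids that kind of drift.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class NewsDataset(Dataset):
    """Holds aligned input ids, attention masks, and labels."""

    def __init__(self, input_ids, attention_masks, labels):
        assert len(input_ids) == len(attention_masks) == len(labels)
        self.input_ids = input_ids
        self.attention_masks = attention_masks
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # One index selects all three fields, so they stay aligned.
        return self.input_ids[idx], self.attention_masks[idx], self.labels[idx]


max_length = 19  # assumed maximum sequence length (illustrative)
batch_size = 2   # assumed batch size (illustrative)

# Synthetic stand-in data: six sequences with different numbers of real tokens.
torch.manual_seed(0)
lengths = torch.tensor([16, 19, 14, 15, 16, 12])
positions = torch.arange(max_length).unsqueeze(0)              # (1, max_length)
attention_masks = (positions < lengths.unsqueeze(1)).long()    # (6, max_length)
input_ids = torch.randint(1000, 30000, (6, max_length)) * attention_masks
labels = torch.tensor([0, 0, 1, 2, 3, 2])

dataset = NewsDataset(input_ids, attention_masks, labels)
# A training loader would usually pass shuffle=True; evaluation keeps the order.
loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

ids, masks, y = next(iter(loader))
print(ids.shape, masks.shape, y)
```

The default collate function stacks the per-sample tuples, so each batch arrives as three tensors: ids, masks, and labels.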
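Related articles
PT - BERT: A Multi-Class News Text Classification Case on the UCI News Dataset Using Transformer-BERT in the torch Framework (Feature Encoding + BERT as Text Encoder + Classifier, with the Model Saved During Training)
PT - BERT: A Multi-Class News Text Classification Case on the UCI News Dataset Using Transformer-BERT in the torch Framework (Feature Encoding + BERT as Text Encoder + Classifier, with the Model Saved During Training): Implementation Code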
ID | TITLE | URL | PUBLISHER | CATEGORY | STORY | HOSTNAME | TIMESTAMP |
1 | Fed official says weak data caused by weather, should not slow taper | http://www.latimes.com/business/money/la-fi-mo-federal-reserve-plosser-stimulus-economy-20140310,0,1312750.story?track=rss | Los Angeles Times | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.latimes.com | 1.39447E+12 |
2 | Fed's Charles Plosser sees high bar for change in pace of tapering | http://www.livemint.com/Politics/H2EvwJSK2VE6OF7iK1g3PP/Feds-Charles-Plosser-sees-high-bar-for-change-in-pace-of-ta.html | Livemint | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.livemint.com | 1.39447E+12 |
3 | US open: Stocks fall after Fed official hints at accelerated tapering | http://www.ifamagazine.com/news/us-open-stocks-fall-after-fed-official-hints-at-accelerated-tapering-294436 | IFA Magazine | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.ifamagazine.com | 1.39447E+12 |
4 | Fed risks falling 'behind the curve', Charles Plosser says | http://www.ifamagazine.com/news/fed-risks-falling-behind-the-curve-charles-plosser-says-294430 | IFA Magazine | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.ifamagazine.com | 1.39447E+12 |
5 | Fed's Plosser: Nasty Weather Has Curbed Job Growth | http://www.moneynews.com/Economy/federal-reserve-charles-plosser-weather-job-growth/2014/03/10/id/557011 | Moneynews | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.moneynews.com | 1.39447E+12 |
6 | Plosser: Fed May Have to Accelerate Tapering Pace | http://www.nasdaq.com/article/plosser-fed-may-have-to-accelerate-tapering-pace-20140310-00371 | NASDAQ | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.nasdaq.com | 1.39447E+12 |
7 | Fed's Plosser: Taper pace may be too slow | http://www.marketwatch.com/story/feds-plosser-taper-pace-may-be-too-slow-2014-03-10?reflink=MW_news_stmp | MarketWatch | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.marketwatch.com | 1.39447E+12 |
8 | Fed's Plosser expects US unemployment to fall to 6.2% by the end of 2014 | http://www.fxstreet.com/news/forex-news/article.aspx?storyid=23285020-b1b5-47ed-a8c4-96124bb91a39 | FXstreet.com | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.fxstreet.com | 1.39447E+12 |
9 | US jobs growth last month hit by weather:Fed President Charles Plosser | http://economictimes.indiatimes.com/news/international/business/us-jobs-growth-last-month-hit-by-weatherfed-president-charles-plosser/articleshow/31788000.cms | Economic Times | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | economictimes.indiatimes.com | 1.39447E+12 |
10 | ECB unlikely to end sterilisation of SMP purchases - traders | http://www.iii.co.uk/news-opinion/reuters/news/152615 | Interactive Investor | b | dPhGU51DcrolUIMxbRm0InaHGA2XM | www.iii.co.uk | 1.39447E+12 |
- <class 'pandas.core.frame.DataFrame'>
- RangeIndex: 422419 entries, 0 to 422418
- Data columns (total 8 columns):
- # Column Non-Null Count Dtype
- --- ------ -------------- -----
- 0 ID 422419 non-null int64
- 1 TITLE 422419 non-null object
- 2 URL 422419 non-null object
- 3 PUBLISHER 422417 non-null object
- 4 CATEGORY 422419 non-null object
- 5 STORY 422419 non-null object
- 6 HOSTNAME 422419 non-null object
- 7 TIMESTAMP 422419 non-null int64
- dtypes: int64(2), object(6)
- memory usage: 25.8+ MB
- TITLE CATEGORY
- 0 Fed official says weak data caused by weather,... b
- 1 Fed's Charles Plosser sees high bar for change... b
- 2 US open: Stocks fall after Fed official hints ... b
- 3 Fed risks falling 'behind the curve', Charles ... b
- 4 Fed's Plosser: Nasty Weather Has Curbed Job Gr... b
- ... ... ...
- 422414 Surgeons to remove 4-year-old's rib to rebuild... m
- 422415 Boy to have surgery on esophagus after battery... m
- 422416 Child who swallowed battery to have reconstruc... m
- 422417 Phoenix boy undergoes surgery to repair throat... m
- 422418 Phoenix boy undergoes surgery to repair throat... m
-
- [422419 rows x 2 columns]
- TITLE CATEGORY
- 0 Fed official says weak data caused by weather,... b
- 1 Fed's Charles Plosser sees high bar for change... b
- 2 US open: Stocks fall after Fed official hints ... b
- 3 Fed risks falling 'behind the curve', Charles ... b
- 4 Fed's Plosser: Nasty Weather Has Curbed Job Gr... b
- ... ... ...
- 422414 Surgeons to remove 4-year-old's rib to rebuild... m
- 422415 Boy to have surgery on esophagus after battery... m
- 422416 Child who swallowed battery to have reconstruc... m
- 422417 Phoenix boy undergoes surgery to repair throat... m
- 422418 Phoenix boy undergoes surgery to repair throat... m
-
- [422419 rows x 2 columns]
- TITLE CATEGORY
- 0 Fed official says weak data caused by weather,... 0
- 1 Fed's Charles Plosser sees high bar for change... 0
- 2 US open: Stocks fall after Fed official hints ... 0
- 3 Fed risks falling 'behind the curve', Charles ... 0
- 4 Fed's Plosser: Nasty Weather Has Curbed Job Gr... 0
- ... ... ...
- 422414 Surgeons to remove 4-year-old's rib to rebuild... 2
- 422415 Boy to have surgery on esophagus after battery... 2
- 422416 Child who swallowed battery to have reconstruc... 2
- 422417 Phoenix boy undergoes surgery to repair throat... 2
- 422418 Phoenix boy undergoes surgery to repair throat... 2
-
- [422419 rows x 2 columns]
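The printout above shows the CATEGORY letters replaced by integers, with `b` mapped to 0 and `m` mapped to 2. That is what an alphabetical integer encoding would give if the column holds the four codes `b`, `e`, `m`, and `t`. The four-code set is an assumption not stated in the text, and the encoder the article actually used is not recoverable. A minimal sketch of such a mapping:

```python
# Sample category letters; the full set of four codes is an assumption.
categories = ["b", "t", "e", "m", "b", "m"]

# Assign ids in alphabetical order of the distinct labels.
label_to_id = {label: idx for idx, label in enumerate(sorted(set(categories)))}
id_to_label = {idx: label for label, idx in label_to_id.items()}

encoded = [label_to_id[c] for c in categories]

print(label_to_id)  # {'b': 0, 'e': 1, 'm': 2, 't': 3}
print(encoded)      # [0, 3, 1, 2, 0, 2]
```

Keeping `id_to_label` around makes it possible to turn a predicted class index back into its category letter at inference time.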
- input_ids tensor([[ 101, 7349, 2880, ..., 0, 0, 0],
- [ 101, 7349, 1005, ..., 0, 0, 0],
- [ 101, 2149, 2330, ..., 0, 0, 0],
- ...,
- [ 101, 2878, 1011, ..., 0, 0, 0],
- [ 101, 2878, 1011, ..., 0, 0, 0],
- [ 101, 20077, 1996, ..., 0, 0, 0]])
- attention_masks tensor([[1, 1, 1, ..., 0, 0, 0],
- [1, 1, 1, ..., 0, 0, 0],
- [1, 1, 1, ..., 0, 0, 0],
- ...,
- [1, 1, 1, ..., 0, 0, 0],
- [1, 1, 1, ..., 0, 0, 0],
- [1, 1, 1, ..., 0, 0, 0]])
- labels tensor([0, 0, 0, ..., 2, 2, 2])
- train_dataset
- <__main__.NewsDataset object at 0x0000021A1EE9ECA0>
- (tensor([ 101, 20228, 15094, 2121, 1024, 7349, 2089, 2031, 2000, 23306,
- 6823, 4892, 6393, 102, 0, 0, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]), tensor(0))
- (tensor([ 101, 2149, 2330, 1024, 15768, 2991, 2044, 7349, 2880, 20385,
- 2012, 14613, 6823, 4892, 102, 0, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), tensor(0))
- (tensor([ 101, 7349, 1005, 1055, 20228, 15094, 2121, 1024, 11808, 4633,
- 2038, 13730, 2098, 3105, 3930, 102, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]), tensor(0))
- (tensor([ 101, 7349, 10831, 4634, 1005, 2369, 1996, 7774, 1005, 1010,
- 2798, 20228, 15094, 2121, 2758, 102, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]), tensor(0))
- test_dataloader
- <torch.utils.data.dataloader.DataLoader object at 0x0000021A1EF422E0>
- [tensor([[ 101, 7349, 2880, 2758, 5410, 2951, 3303, 2011, 4633, 1010,
- 2323, 2025, 4030, 6823, 2099, 102, 0, 0, 0],
- [ 101, 7349, 1005, 1055, 2798, 20228, 15094, 2121, 5927, 2152,
- 3347, 2005, 2689, 1999, 6393, 1997, 6823, 4892, 102]]), tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
- [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]]), tensor([0, 0])]
- Epoch: 01
- Train Loss: 0.7342, Train Acc: 0.7198
- Eval Loss: 0.2669, Eval Acc: 46.0000
- Epoch: 02
- Train Loss: 0.1879, Train Acc: 0.9464
- Eval Loss: 0.1194, Eval Acc: 48.2812
- Epoch: 03
- Train Loss: 0.0991, Train Acc: 0.9731
- Eval Loss: 0.1043, Eval Acc: 48.2500
- Epoch: 04
- Train Loss: 0.0630, Train Acc: 0.9811
- Eval Loss: 0.1025, Eval Acc: 48.5312
- Epoch: 05
- Train Loss: 0.0439, Train Acc: 0.9866
- Eval Loss: 0.1078, Eval Acc: 48.5938
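In the log above, Train Acc stays below 1 while Eval Acc is reported as values such as 46.0000 and 48.5938. This suggests the evaluation accuracy was accumulated without being divided by the number of samples, though that is a guess, since the evaluation code is not shown. The sketch below normalizes accuracy to a fraction and saves the model's weights whenever the evaluation loss improves, which is one way to read the "model saved during training" in the title. The linear model, the random data, and the file name `best_model.pt` are all hypothetical stand-ins.

```python
import os
import tempfile

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate(model, loader, loss_fn):
    """Return (mean loss per sample, accuracy as a fraction in [0, 1])."""
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    for features, labels in loader:
        logits = model(features)
        # loss_fn averages over the batch, so scale back up to a per-batch sum.
        total_loss += loss_fn(logits, labels).item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return total_loss / total, correct / total


# Stand-in data and model (hypothetical); a held-out loader would be used in practice.
torch.manual_seed(0)
features = torch.randn(64, 8)
labels = torch.randint(0, 4, (64,))
loader = DataLoader(TensorDataset(features, labels), batch_size=16)

model = nn.Linear(8, 4)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

checkpoint_path = os.path.join(tempfile.mkdtemp(), "best_model.pt")  # hypothetical name
best_loss = float("inf")

for epoch in range(1, 4):
    model.train()
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

    eval_loss, eval_acc = evaluate(model, loader, loss_fn)
    print(f"Epoch: {epoch:02d}  Eval Loss: {eval_loss:.4f}, Eval Acc: {eval_acc:.4f}")

    # Save as soon as an epoch improves on the best evaluation loss so far.
    if eval_loss < best_loss:
        best_loss = eval_loss
        torch.save(model.state_dict(), checkpoint_path)
```

Saving on improvement means the file on disk always holds the best weights seen so far, even if later epochs start to overfit, as the slight rise in Eval Loss at epoch 05 above hints.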
- This is a breaking news about politics
- Predicted class: 0
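The prediction above comes from passing a tokenized sentence through the encoder and the classifier and taking the highest-scoring class. Given the encoded labels shown earlier, class 0 would correspond to category `b`. The sketch below shows that structure with a small stand-in encoder, so it runs without downloading pretrained weights. In the article the encoder is BERT; the `MeanPoolEncoder` class, the hidden size of 32, and the four output classes are assumptions made for illustration, and the model is untrained, so its prediction is arbitrary.

```python
import torch
from torch import nn


class MeanPoolEncoder(nn.Module):
    """Stand-in for BERT: embeds tokens and averages over non-padding positions."""

    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size, padding_idx=0)

    def forward(self, input_ids, attention_mask):
        embedded = self.embedding(input_ids)            # (batch, seq, hidden)
        mask = attention_mask.unsqueeze(-1).float()     # (batch, seq, 1)
        return (embedded * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)


class TextClassifier(nn.Module):
    """Text encoder followed by a linear classification head."""

    def __init__(self, encoder, hidden_size, num_classes):
        super().__init__()
        self.encoder = encoder
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        return self.classifier(self.encoder(input_ids, attention_mask))


@torch.no_grad()
def predict(model, input_ids, attention_mask):
    """Return the index of the highest-scoring class for each sequence."""
    model.eval()
    return model(input_ids, attention_mask).argmax(dim=1)


torch.manual_seed(0)
encoder = MeanPoolEncoder(vocab_size=30522, hidden_size=32)  # sizes are illustrative
model = TextClassifier(encoder, hidden_size=32, num_classes=4)

# A short padded sequence in the same layout as the printed samples.
input_ids = torch.tensor([[101, 7349, 2880, 102, 0, 0]])
attention_mask = torch.tensor([[1, 1, 1, 1, 0, 0]])

predicted = predict(model, input_ids, attention_mask)
print("Predicted class:", predicted.item())
```

Swapping the stand-in for a pretrained BERT encoder would only change how the pooled sentence vector is produced; the linear head and the argmax step stay the same.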