SemEval 2014 Restaurant in Practice

1. Preparing the data

1.1 Preparing the training data

I use the original XML data (the v2 version) and, following this article, convert it to CSV files.

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

import pandas as pd

path = 'Restaurants_Train_v2.xml'
tree = ET.parse(path)
root = tree.getroot()

# Category level: one row per (sentence, aspect category) pair
data = []
for sentence in root.findall('sentence'):
    text = sentence.find('text').text
    aspectCategories = sentence.find('aspectCategories')
    for aspectCategory in aspectCategories.findall('aspectCategory'):
        category = aspectCategory.get('category')
        polarity = aspectCategory.get('polarity')
        data.append((text, category, polarity))

df = pd.DataFrame(data, columns=['text', 'category', 'polarity'])
df.to_csv('restaurant_train_category.csv', index=False)
df.head()
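
To make the category-level extraction easier to check without the full dataset, here is a minimal sketch that runs on a tiny inline sample. The sample XML is made up for illustration (it only mimics the `sentence` / `text` / `aspectCategories` layout), and the helper name `extract_category_rows` is hypothetical. Unlike the loop above, it skips any sentence that has no `aspectCategories` element, which is a defensive assumption rather than something the dataset necessarily requires.

```python
import xml.etree.ElementTree as ET

# Made-up sample that mimics the SemEval-2014 restaurant layout (not real data).
SAMPLE_XML = """
<sentences>
  <sentence id="1">
    <text>The pasta was great but the waiter was rude.</text>
    <aspectTerms>
      <aspectTerm term="pasta" polarity="positive" from="4" to="9"/>
      <aspectTerm term="waiter" polarity="negative" from="28" to="34"/>
    </aspectTerms>
    <aspectCategories>
      <aspectCategory category="food" polarity="positive"/>
      <aspectCategory category="service" polarity="negative"/>
    </aspectCategories>
  </sentence>
  <sentence id="2">
    <text>We went there on Friday.</text>
  </sentence>
</sentences>
"""


def extract_category_rows(root):
    """Return (text, category, polarity) tuples, one per aspect category."""
    rows = []
    for sentence in root.findall('sentence'):
        text = sentence.find('text').text
        categories = sentence.find('aspectCategories')
        if categories is None:
            # Defensive: skip sentences without category annotations.
            continue
        for cat in categories.findall('aspectCategory'):
            rows.append((text, cat.get('category'), cat.get('polarity')))
    return rows


category_rows = extract_category_rows(ET.fromstring(SAMPLE_XML))
for row in category_rows:
    print(row)
```

The resulting list of tuples can be passed to `pd.DataFrame(..., columns=['text', 'category', 'polarity'])` exactly as in the code above.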

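# Aspect-term level: one row per (sentence, aspect term) pair
data = []
# './/aspectTerms/..' selects only the sentences that contain an <aspectTerms> element
for sentence in root.findall('.//aspectTerms/..'):
    text = sentence.find('text').text
    aspectTerms = sentence.find('aspectTerms')
    for aspectTerm in aspectTerms.findall('aspectTerm'):
        term = aspectTerm.get('term')
        polarity = aspectTerm.get('polarity')
        data.append((text, term, polarity))

df = pd.DataFrame(data, columns=['text', 'term', 'polarity'])
# Keep only the three standard labels (this drops others such as 'conflict'),
# then map them to integers; .copy() avoids a SettingWithCopyWarning
df = df[df['polarity'].isin(['positive', 'negative', 'neutral'])].copy()
df['polarity'] = df['polarity'].map({'positive': 1, 'neutral': 0, 'negative': -1})

df.to_csv('restaurant_train_aspectterm.csv', index=False)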
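
The aspect-term step can be sketched the same way on a small inline sample, using only the standard library so that it runs without pandas. The sample XML is made up for illustration, and `extract_term_rows` and `POLARITY_TO_ID` are hypothetical names. The mapping mirrors the one above (positive 1, neutral 0, negative -1), and any other label, such as `conflict`, is dropped. The CSV is written to an in-memory buffer here instead of a file, purely to keep the example self-contained.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample that mimics the SemEval-2014 restaurant layout (not real data).
SAMPLE_XML = """
<sentences>
  <sentence id="1">
    <text>The soup was fine and the dessert was awful.</text>
    <aspectTerms>
      <aspectTerm term="soup" polarity="neutral" from="4" to="8"/>
      <aspectTerm term="dessert" polarity="negative" from="26" to="33"/>
    </aspectTerms>
  </sentence>
  <sentence id="2">
    <text>The menu is hit and miss.</text>
    <aspectTerms>
      <aspectTerm term="menu" polarity="conflict" from="4" to="8"/>
    </aspectTerms>
  </sentence>
  <sentence id="3">
    <text>We arrived late.</text>
  </sentence>
</sentences>
"""

# Same label-to-integer mapping as in the pandas version above.
POLARITY_TO_ID = {'positive': 1, 'neutral': 0, 'negative': -1}


def extract_term_rows(root):
    """Return (text, term, polarity_id) tuples, skipping unmapped labels."""
    rows = []
    for sentence in root.findall('sentence'):
        terms = sentence.find('aspectTerms')
        if terms is None:
            # Sentence has no aspect-term annotations.
            continue
        text = sentence.find('text').text
        for term in terms.findall('aspectTerm'):
            polarity = term.get('polarity')
            if polarity not in POLARITY_TO_ID:
                # Drops labels outside the three kept classes, e.g. 'conflict'.
                continue
            rows.append((text, term.get('term'), POLARITY_TO_ID[polarity]))
    return rows


term_rows = extract_term_rows(ET.fromstring(SAMPLE_XML))

# Write the rows as CSV into an in-memory buffer instead of a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['text', 'term', 'polarity'])
writer.writerows(term_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Filtering before mapping matters: if unmapped labels were left in, the pandas `map` call in the code above would turn them into `NaN` rather than raising an error.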