These are my own study notes; if you spot any mistakes, please point them out~
Make sure doccano is already installed.
For installation, see the article:
[doccano] Text Annotation Tool: Installation and Running Tutorial
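Select Sequence Labeling as the project type.
To allow annotated spans to overlap when labeling text:
check allow overlapping spans
To annotate relations between entities in the text:
check use relation labeling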
The dataset is a plain txt file,
with one comment per line.
Select textline as the file format and import.
The import is complete.
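The one-comment-per-line file can be prepared with a small script before importing. The following is a minimal sketch, not part of the original workflow: the function name `to_textline`, the sample comments, and the file name `comments.txt` are all hypothetical. It flattens line breaks inside a comment so that each comment stays on a single line, and drops empty comments.

```python
def to_textline(comments):
    """Return file content with exactly one comment per line.

    Internal whitespace (including line breaks) is collapsed to single
    spaces, and comments that are empty after cleaning are skipped.
    """
    flat = (" ".join(comment.split()) for comment in comments)
    return "".join(line + "\n" for line in flat if line)


# Hypothetical sample comments (the second one is blank and gets dropped)
sample_comments = ["电池很耐用\n拍照也清晰", "   ", "信号一般"]
content = to_textline(sample_comments)

# To save the result for import (hypothetical file name):
# with open("comments.txt", "w", encoding="utf-8") as f:
#     f.write(content)
```

Collapsing line breaks matters here because the importer treats every line as a separate document, so a comment that spans two lines would otherwise be split in two.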
Alternatively, import custom labels from a JSON file:
[
{
"text": "体验:1",
"background_color": "#FF0000",
"text_color": "#ffffff"
},
{
"text": "体验:-1",
"background_color": "#FF0000",
"text_color": "#ffffff"
},
{
"text": "设计:1",
"background_color": "#00FF00",
"text_color": "#000000"
},
{
"text": "设计:-1",
"background_color": "#00FF00",
"text_color": "#000000"
},
{
"text": "电池:1",
"background_color": "#0000FF",
"text_color": "#ffffff"
},
{
"text": "电池:-1",
"background_color": "#0000FF",
"text_color": "#ffffff"
},
{
"text": "性能:1",
"background_color": "#FFFF00",
"text_color": "#000000"
},
{
"text": "性能:-1",
"background_color": "#FFFF00",
"text_color": "#000000"
},
{
"text": "摄像:1",
"background_color": "#FF00FF",
"text_color": "#ffffff"
},
{
"text": "摄像:-1",
"background_color": "#FF00FF",
"text_color": "#ffffff"
},
{
"text": "通信:1",
"background_color": "#00FFFF",
"text_color": "#000000"
},
{
"text": "通信:-1",
"background_color": "#00FFFF",
"text_color": "#000000"
}
]
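Since every category appears twice (polarity 1 and -1) with the same colors, the label file can also be generated rather than typed by hand. This is a hedged sketch: the names `CATEGORY_COLORS` and `build_labels` are hypothetical, and the categories and colors are copied from the label list above. It assumes the label text joins category and polarity with the same full-width colon used there.

```python
import json

# Category -> (background color, text color), copied from the label list above
CATEGORY_COLORS = {
    "体验": ("#FF0000", "#ffffff"),
    "设计": ("#00FF00", "#000000"),
    "电池": ("#0000FF", "#ffffff"),
    "性能": ("#FFFF00", "#000000"),
    "摄像": ("#FF00FF", "#ffffff"),
    "通信": ("#00FFFF", "#000000"),
}


def build_labels(colors=CATEGORY_COLORS, polarities=("1", "-1")):
    """Build one label entry per (category, polarity) pair."""
    labels = []
    for category, (background, foreground) in colors.items():
        for polarity in polarities:
            labels.append({
                "text": f"{category}:{polarity}",
                "background_color": background,
                "text_color": foreground,
            })
    return labels


labels = build_labels()
# ensure_ascii=False keeps the Chinese label names readable in the file
labels_json = json.dumps(labels, ensure_ascii=False, indent=2)
```

Writing `labels_json` to a file gives a strictly valid JSON array, which avoids hand-editing mistakes such as a stray comma after the last entry.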
Export the annotated data in jsonl format, then change the file extension to .json.
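Renaming the file changes only the extension: the content is still JSON Lines, with one JSON object per line, so calling `json.load` on the whole file would fail once there is more than one record. A minimal sketch of parsing such content line by line follows. The helper `parse_jsonl` and the two sample records are hypothetical, and the record shape (`id`, `text`, and `label` as `[start, end, name]` triples) is an assumption that mirrors the fields read by the conversion script in these notes.

```python
import json


def parse_jsonl(content):
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in content.split("\n") if line.strip()]


# Hypothetical two-record export
sample_export = (
    '{"id": 1, "text": "电池很耐用", "label": [[0, 2, "电池:1"]]}\n'
    '{"id": 2, "text": "信号一般", "label": []}\n'
)
records = parse_jsonl(sample_export)
```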
Convert it to txt format:
import json

# Read the exported file (one JSON record per line) and append one
# tab-separated row per labeled span to output.txt
with open('admin.json', 'r', encoding='utf-8') as file, \
        open('output.txt', 'a', encoding='utf-8') as output_file:
    for line in file:
        if not line.strip():
            continue  # skip blank lines
        data = json.loads(line)
        text = data['text']
        # Each label is expected to look like [start, end, "category:tag"]
        for start, end, name in data['label']:
            # The custom labels use a full-width colon; accept an ASCII colon too
            category, tag = name.replace(':', ':').split(':', 1)
            output_file.write(f"{tag}\t{category}#{text[start:end]}\t{text}\n")
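Output format: one line per labeled span, containing the tag, category#span text, and the full comment, separated by tabs.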
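To make the layout concrete, here is a small sketch that builds the output rows for a single in-memory record instead of a file. The function `spans_to_rows` and the sample record are hypothetical, made up for illustration. It assumes each label is a `[start, end, "category:tag"]` triple and accepts either a full-width or an ASCII colon as the separator.

```python
def spans_to_rows(record):
    """Return one 'tag<TAB>category#span<TAB>text' row per labeled span."""
    text = record["text"]
    rows = []
    for start, end, name in record["label"]:
        # Normalize a full-width colon to ASCII before splitting
        category, tag = name.replace(":", ":").split(":", 1)
        rows.append(f"{tag}\t{category}#{text[start:end]}\t{text}")
    return rows


# Hypothetical record with two labeled spans
sample = {
    "id": 1,
    "text": "电池很耐用,但信号一般",
    "label": [[0, 2, "电池:1"], [7, 9, "通信:-1"]],
}
rows = spans_to_rows(sample)
# rows[0] is "1<TAB>电池#电池<TAB>电池很耐用,但信号一般"
# rows[1] is "-1<TAB>通信#信号<TAB>电池很耐用,但信号一般"
```

A comment with several labeled spans therefore produces several rows, each repeating the full comment in the last field.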