Text summarization compresses a source text into a shorter version while preserving its information content and overall meaning.
Single-document and multi-document summarization
Extractive and abstractive summarization
Most current automated text summarization systems use extractive methods. The extractive summarization process can be divided into three phases.
The first phase is pre-processing and the second phase is processing.
(1) Part-of-Speech (POS) Tagging
(2) Stop Word Filtering
Words such as a, an, in, and by can be treated as stop words and filtered out of the plain text.
(3) Stemming
Words are reduced to their stems, e.g. by removing -ed or -ing from verbs and using the singular instead of the plural form of nouns.
(4) Feature Calculation
The total term weight is calculated by computing tf (term frequency) and idf for the document.
Here idf refers to inverse document frequency, which indicates whether a term is common or rare across all documents.
The importance score wi of word i can be calculated by the traditional tf-idf method.
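The tf-idf weighting described above can be sketched as follows. This is a minimal illustration, not the method of any particular system: the stop-word list, the `tokenize` and `tf_idf` helper names, the sample sentences, and the choice of `log(N / df)` as the idf variant are all assumptions made for the example.

```python
import math
from collections import Counter

# Illustrative stop-word list only; real systems use much larger lists.
STOP_WORDS = {"a", "an", "in", "by", "the", "of", "is"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]

def tf_idf(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # df[t] = number of documents that contain term t at least once
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights

docs = [
    "The cat sat in the garden.",
    "A dog sat by the door.",
    "The cat chased a mouse.",
]
weights = tf_idf(docs)
```

In this toy collection "garden" appears in only one document, so it receives a higher weight in the first document than "cat", which appears in two.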
This feature is suitable for eliminating sentences that are too short, such as datelines or author names.
This feature relates to domain-specific words: words that occur frequently in a document are probably related to its topic.
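In query-based text summarization, the sentences of a given document are scored based on frequency counts of words or phrases. Sentences containing query phrases are given higher scores than sentences containing only single query words.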
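One way to picture query-based scoring is the small sketch below. The function name `query_score` and the two weights (2.0 for a full phrase match, 1.0 for each single query word) are hypothetical values chosen for illustration; the notes do not specify the actual weighting.

```python
def query_score(sentence, query, phrase_weight=2.0, word_weight=1.0):
    """Score a sentence by query-word counts plus a bonus for full phrase matches."""
    tokens = [w.strip(".,;:!?").lower() for w in sentence.split()]
    tokens = [w for w in tokens if w]
    query_words = query.lower().split()

    # Each occurrence of a single query word contributes word_weight.
    score = word_weight * sum(tokens.count(w) for w in set(query_words))

    # Each occurrence of the whole query phrase contributes an extra phrase_weight.
    # Padding with spaces prevents matches that start or end inside a longer word.
    padded_sentence = " " + " ".join(tokens) + " "
    padded_query = " " + " ".join(query_words) + " "
    score += phrase_weight * padded_sentence.count(padded_query)
    return score

score_phrase = query_score("Text summarization compresses a document.", "text summarization")
score_words = query_score("Summarization of text is hard.", "text summarization")
```

Both sentences contain the two query words, but only the first contains them as a contiguous phrase, so it scores higher.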
The HMM does not assume that the probability that sentence i is in the summary is independent of whether sentence i-1 is in the summary.
The main idea is to use a sequential model to account for local dependencies between sentences. In the HMM model, three features were used:
position of the sentence in the document,
number of terms in the sentence,
likelihood of the sentence terms given the document terms.
The maximum-likelihood estimate of each transition probability is obtained from training data, forming the estimate of the transition matrix.
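The maximum-likelihood estimate of a transition matrix amounts to counting observed transitions and normalizing each row. The sketch below assumes a simplified two-state labelling (0 = sentence not in summary, 1 = sentence in summary) and made-up training sequences; the state layout of the actual HMM is not given in these notes, and `estimate_transitions` is a hypothetical helper name.

```python
def estimate_transitions(sequences, n_states):
    """Estimate P(next state | previous state) by relative frequency."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        # Count every adjacent pair (previous state, next state).
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no observations are left as all zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Hypothetical labelled documents: one state per sentence.
labelled = [
    [1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0],
]
M = estimate_transitions(labelled, 2)
```

Here `M[1][1]` is the estimated probability that a summary sentence is followed by another summary sentence, which is exactly the kind of local dependency the HMM is meant to capture.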
f1 = Paragraph follows title (Paragraph Position)
f2 = Paragraph location in document
f3 = Sentence location in paragraph
f4 = First sentence in paragraph
f5 = Sentence Length
f6 = Number of thematic words in sentence
f7 = Number of title words in sentence
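As a rough illustration of the last three features in the list above, the sketch below computes f5, f6 and f7 for a single sentence. The notes do not define "thematic words", so the code assumes they are the `n_thematic` most frequent words in the document; that definition, the helper names, and the sample text are all assumptions. No stop-word filtering is applied here, so on real text the thematic set would need it.

```python
from collections import Counter

def words(text):
    """Lowercase tokens with surrounding punctuation stripped."""
    stripped = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in stripped if w]

def sentence_features(sentence, title, document, n_thematic=2):
    sent = words(sentence)
    title_words = set(words(title))
    # Assumption: thematic words are the most frequent words in the document.
    thematic = {w for w, _ in Counter(words(document)).most_common(n_thematic)}
    return {
        "f5": len(sent),                            # sentence length in words
        "f6": sum(w in thematic for w in sent),     # thematic words in sentence
        "f7": sum(w in title_words for w in sent),  # title words in sentence
    }

title = "Text Summarization Methods"
document = (
    "Summarization shortens text. "
    "Extractive summarization selects text sentences. "
    "Summarization helps readers."
)
feats = sentence_features(
    "Extractive summarization selects text sentences.", title, document
)
```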
The text summarization process consists of three phases: training, feature fusion, and sentence selection.
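The fuzzy logic approach uses fuzzy rules and triangular membership functions. The fuzzy rules take the IF-THEN form. The triangular membership function fuzzifies each score into one of three values: LOW, MEDIUM, or HIGH.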
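A triangular membership function and a single IF-THEN rule can be sketched as below. The breakpoints of the three fuzzy sets, the assumption that scores are normalized to [0, 1], the use of `min` for AND, and the example rule itself are illustrative choices, not values taken from the notes.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets over a score normalized to [0, 1]: (left, peak, right).
SETS = {
    "LOW": (-0.5, 0.0, 0.5),
    "MEDIUM": (0.0, 0.5, 1.0),
    "HIGH": (0.5, 1.0, 1.5),
}

def fuzzify(score):
    """Return the best-matching label and all membership degrees for a score."""
    memberships = {name: triangular(score, *abc) for name, abc in SETS.items()}
    return max(memberships, key=memberships.get), memberships

def rule_important(title_score, length_score):
    """IF title words is HIGH AND sentence length is MEDIUM THEN sentence is important.

    AND is modelled as min, a common choice in fuzzy inference.
    """
    return min(fuzzify(title_score)[1]["HIGH"], fuzzify(length_score)[1]["MEDIUM"])

label, degrees = fuzzify(0.9)
strength = rule_important(0.9, 0.5)
```

A score of 0.9 belongs mostly to HIGH and slightly to MEDIUM, so the rule fires with the HIGH membership degree when the sentence length sits exactly at the MEDIUM peak.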