
NLP-Lecture 3 Text Classification and Ranking


Learning Objectives

  • Text Classification
    – Linguistic Features
    – Classification Models
    – Performance Evaluation
  • Text Ranking
    – Cosine Similarity
    – Performance Evaluation

Text Classification

Text Classification Task

  1. Binary Classification
    Sentiment Classification: Determines whether the sentiment orientation that a writer expresses towards some object is positive or negative.
    Email Spam Detection: Detects whether an email is spam or not-spam.
  2. Multi-Class Classification
    News Categorization: Identifies the topic that a news article covers, such as business, technology, entertainment, sports, science and health, etc.

Sentiment Classification


Sentiment Analysis and Opinion Mining (p. 23):
Document sentiment classification is perhaps the most extensively studied topic. It aims to classify an opinion document as expressing a positive or negative opinion or sentiment. A large majority of research papers on this topic classify online reviews.
Problem definition: Given an opinion document d evaluating an entity, determine the overall sentiment s of the opinion holder about the entity, i.e., determine the sentiment s expressed on the aspect GENERAL in the quintuple (_, GENERAL, s, _, _), where the entity e, the opinion holder h, and the time of opinion t are assumed to be known or irrelevant (do not care).
Assumption: Sentiment classification or regression assumes that the opinion document d (e.g., a product review) expresses opinions on a single entity e and contains opinions from a single opinion holder h. This assumption holds for reviews of products and services because each review usually focuses on evaluating a single product or service and is written by a single reviewer.
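To make the problem definition concrete, here is a minimal sketch of the opinion quintuple as a Python data structure. It assumes the field order (entity, aspect, sentiment, holder, time), which matches the placement of GENERAL and s in (_, GENERAL, s, _, _); the class and function names are hypothetical. At document level only the sentiment field is actually predicted, and the other "don't care" slots are left as None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OpinionQuintuple:
    """Hypothetical container for (e, a, s, h, t)."""
    entity: Optional[str]   # e: assumed known or irrelevant at document level
    aspect: str             # a: fixed to "GENERAL" for document-level classification
    sentiment: str          # s: the only value the classifier has to determine
    holder: Optional[str]   # h: assumed known or irrelevant
    time: Optional[str]     # t: assumed known or irrelevant


def document_level_quintuple(predicted_sentiment: str) -> OpinionQuintuple:
    # Document-level task: fill in s on aspect GENERAL; e, h and t stay "don't care" (None).
    return OpinionQuintuple(entity=None, aspect="GENERAL",
                            sentiment=predicted_sentiment, holder=None, time=None)


q = document_level_quintuple("positive")
print(q)  # OpinionQuintuple(entity=None, aspect='GENERAL', sentiment='positive', holder=None, time=None)
```

Freezing the dataclass is just a design choice here: once a document's sentiment is decided, the quintuple is treated as an immutable record.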

If sentiment takes categorical values, e.g., positive and negative, then it is a typical classification problem.
If sentiment takes numeric values or ordinal scores within a given range, e.g., 1 to 5, the problem becomes regression.
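The difference between the two framings is mostly in how the target is built from the same review score. The sketch below keeps a 1 to 5 star rating as a numeric regression target, or maps it to a categorical label. The cut-offs (4 to 5 positive, 1 to 2 negative, 3 dropped as neutral) are an assumed convention for illustration, not something stated in the lecture.

```python
from typing import Optional


def regression_target(stars: int) -> float:
    """Regression: keep the ordinal score (1..5) as a numeric target."""
    if not 1 <= stars <= 5:
        raise ValueError("rating must be in 1..5")
    return float(stars)


def classification_target(stars: int) -> Optional[str]:
    """Classification: map the score to a categorical label.

    Assumed convention: 4-5 -> "positive", 1-2 -> "negative",
    3 -> None (treated as neutral and dropped from a binary task).
    """
    if not 1 <= stars <= 5:
        raise ValueError("rating must be in 1..5")
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return None


ratings = [5, 1, 3, 4, 2]
print([regression_target(r) for r in ratings])      # [5.0, 1.0, 3.0, 4.0, 2.0]
print([classification_target(r) for r in ratings])  # ['positive', 'negative', None, 'positive', 'negative']
```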

Most existing techniques for document-level classification use supervised learning, although there are also unsupervised methods. Sentiment regression has been done mainly using supervised learning.
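As an illustration of the supervised setting, the sketch below trains a multinomial Naive Bayes classifier with add-one (Laplace) smoothing on bag-of-words features. Naive Bayes is only one common choice for this task and is swapped in here for illustration; the class name, the whitespace tokenizer, and the four toy training reviews are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical helper)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        doc_counts = Counter(labels)
        # log P(c): fraction of training documents with label c
        self.log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # words never seen in training are ignored
                # log P(w | c) with add-one smoothing
                score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


train_docs = [
    "great phone excellent battery",
    "amazing screen and great camera",
    "terrible battery and bad screen",
    "worst phone horrible camera",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_docs, train_labels)
print(model.predict("excellent camera"))  # positive
print(model.predict("horrible battery"))  # negative
```

Log probabilities are summed instead of multiplying raw probabilities so that long documents do not underflow to zero.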

Sentiment classification is essentially a text classification problem.
Traditional text classification mainly classifies documents of different topics, e.g., politics, sciences, and sports. In such classifications, topic-related words are the key features.
However, in sentiment classification, sentiment or opinion words that indicate positive or negative opinions are more important, e.g., great, excellent, amazing, horrible, bad, and worst.
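Because opinion words carry the signal, even an unsupervised classifier can simply count them. The sketch below scores a document as (number of positive words) minus (number of negative words). The tiny word sets are hypothetical placeholders; a published opinion lexicon such as the Hu and Liu opinion lexicon (which NLTK distributes as `nltk.corpus.opinion_lexicon`) could be substituted. Negation and intensity are ignored on purpose.

```python
import re

# Tiny, hypothetical opinion lexicon used only for illustration.
POSITIVE_WORDS = {"great", "excellent", "amazing", "good", "love"}
NEGATIVE_WORDS = {"horrible", "bad", "worst", "terrible", "poor"}


def lexicon_sentiment(text: str) -> str:
    """Classify a document by counting opinion words: #positive - #negative."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("The camera is amazing and the battery is great."))  # positive
print(lexicon_sentiment("Worst purchase ever, horrible screen."))            # negative
print(lexicon_sentiment("The phone arrived on Tuesday."))                    # neutral
```

Topic words such as "phone", "camera" or "battery" contribute nothing to the score, which is exactly the contrast with topic classification drawn above.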

