Stanford NLP CS224N Winter 2024: Assignment 1

Original assignment
https://web.stanford.edu/class/cs224n/assignments/a1_preview/exploring_word_vectors.html

Question 1.1: Implement distinct_words [code] (2 points)

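Input:

  • corpus: a corpus made up of multiple sentences (each sentence is a list of word strings)

Output:

  • corpus_words: the list of words, deduplicated and then sorted
  • n_corpus_words: the number of distinct words after deduplication
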
def distinct_words(corpus):
    """ Determine a list of distinct words for the corpus.
        Params:
            corpus (list of list of strings): corpus of documents
        Return:
            corpus_words (list of strings): sorted list of distinct words across the corpus
            n_corpus_words (integer): number of distinct words across the corpus
    """
    corpus_words = []
    n_corpus_words = -1

    # ------------------
    # Write your implementation here.
    # Collect every token from every document into a set to drop duplicates.
    distinct_words_set = set()
    for sentence in corpus:
        distinct_words_set.update(sentence)

    # sorted() accepts any iterable and returns a new sorted list.
    corpus_words = sorted(distinct_words_set)
    n_corpus_words = len(corpus_words)

    # ------------------

    return corpus_words, n_corpus_words
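
A quick sanity check can make the expected behavior concrete. The snippet below is only a sketch: the toy corpus is made up for illustration (it is not taken from the assignment), and the function is restated in a compact form so the snippet runs on its own.

```python
def distinct_words(corpus):
    """Compact restatement of the function above so this snippet runs standalone."""
    # Flatten all documents into one set, then sort the unique tokens.
    corpus_words = sorted({word for sentence in corpus for word in sentence})
    return corpus_words, len(corpus_words)


# Hypothetical toy corpus: two tokenized documents with overlapping words.
toy_corpus = [
    ["<START>", "all", "that", "glitters", "is", "not", "gold", "<END>"],
    ["<START>", "all", "is", "well", "that", "ends", "well", "<END>"],
]

words, n_words = distinct_words(toy_corpus)
print(words)    # unique tokens in sorted order
print(n_words)  # 10
```

Because Python compares strings by code point, tokens that start with `<` sort ahead of the lowercase words, so the list begins with `<END>` and `<START>`.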

Question 1.2: Implement compute_co_occurrence_matrix [code] (3 points)