Python - Stemming Algorithms

  • Overview

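    In natural language processing, we often encounter two or more words that share a common root. For example, the words agreed, agreeing, and agreeable all share the root agree. A search involving any of these words should treat them as the same word, namely the root. It is therefore essential to link every word to its root. The NLTK library provides methods that perform this linking and output the root word.
    NLTK provides three of the most commonly used stemming algorithms: Porter, Lancaster, and Snowball. Their results differ slightly, and a stem is not always a dictionary word (famili for family, for instance). The example below applies all three stemmers to the same sentence and prints their results.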
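    To make the search idea concrete, here is a minimal sketch of a stem-keyed index, assuming NLTK's PorterStemmer. The helper names build_stem_index and search and the three sample documents are hypothetical, introduced only for illustration. Words are split on whitespace so that no tokenizer data is required.

```python
from collections import defaultdict

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()


def build_stem_index(documents):
    """Map each stem to the set of document ids that contain a word with that stem."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        # Plain whitespace split; a real pipeline would use a proper tokenizer.
        for word in text.split():
            index[stemmer.stem(word)].add(doc_id)
    return index


def search(index, query_word):
    """Return the sorted ids of documents sharing a stem with query_word."""
    return sorted(index.get(stemmer.stem(query_word), set()))


# Hypothetical sample documents, used only for this sketch.
documents = [
    "the head of the family decides",
    "she decided to transfer",
    "a famous crime",
]

index = build_stem_index(documents)
hits = search(index, "deciding")
print(hits)
```

    Because the query is stemmed in the same way as the indexed words, a search for deciding can reach documents that contain decides or decided, even though the query string itself appears in none of them.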
    
    import nltk
    from nltk.stem.porter import PorterStemmer
    from nltk.stem.lancaster import LancasterStemmer
    from nltk.stem import SnowballStemmer

    porter_stemmer = PorterStemmer()
    lanca_stemmer = LancasterStemmer()
    sb_stemmer = SnowballStemmer("english")
    word_data = "Aging head of famous crime family decides to transfer his position to one of his subalterns"

    # First, split the sentence into word tokens.
    # word_tokenize needs NLTK's tokenizer data to be downloaded beforehand.
    nltk_tokens = nltk.word_tokenize(word_data)

    # Next, find the stem of every token with each stemmer.
    print('***PorterStemmer****\n')
    for w_port in nltk_tokens:
        print("Actual: %s  || Stem: %s" % (w_port, porter_stemmer.stem(w_port)))

    print('\n***LancasterStemmer****\n')
    for w_lanca in nltk_tokens:
        print("Actual: %s  || Stem: %s" % (w_lanca, lanca_stemmer.stem(w_lanca)))

    print('\n***SnowballStemmer****\n')
    for w_snow in nltk_tokens:
        print("Actual: %s  || Stem: %s" % (w_snow, sb_stemmer.stem(w_snow)))
    
    When we run the above program, we get the following output:
    
    ***PorterStemmer****
    Actual: Aging  || Stem: age
    Actual: head  || Stem: head
    Actual: of  || Stem: of
    Actual: famous  || Stem: famou
    Actual: crime  || Stem: crime
    Actual: family  || Stem: famili
    Actual: decides  || Stem: decid
    Actual: to  || Stem: to
    Actual: transfer  || Stem: transfer
    Actual: his  || Stem: hi
    Actual: position  || Stem: posit
    Actual: to  || Stem: to
    Actual: one  || Stem: one
    Actual: of  || Stem: of
    Actual: his  || Stem: hi
    Actual: subalterns  || Stem: subaltern
    ***LancasterStemmer****
    Actual: Aging  || Stem: ag
    Actual: head  || Stem: head
    Actual: of  || Stem: of
    Actual: famous  || Stem: fam
    Actual: crime  || Stem: crim
    Actual: family  || Stem: famy
    Actual: decides  || Stem: decid
    Actual: to  || Stem: to
    Actual: transfer  || Stem: transf
    Actual: his  || Stem: his
    Actual: position  || Stem: posit
    Actual: to  || Stem: to
    Actual: one  || Stem: on
    Actual: of  || Stem: of
    Actual: his  || Stem: his
    Actual: subalterns  || Stem: subaltern
    ***SnowballStemmer****
    Actual: Aging  || Stem: age
    Actual: head  || Stem: head
    Actual: of  || Stem: of
    Actual: famous  || Stem: famous
    Actual: crime  || Stem: crime
    Actual: family  || Stem: famili
    Actual: decides  || Stem: decid
    Actual: to  || Stem: to
    Actual: transfer  || Stem: transfer
    Actual: his  || Stem: his
    Actual: position  || Stem: posit
    Actual: to  || Stem: to
    Actual: one  || Stem: one
    Actual: of  || Stem: of
    Actual: his  || Stem: his
    Actual: subalterns  || Stem: subaltern
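
    The three listings above agree on most tokens and differ on only a few. As a rough sketch for surfacing just those differences, the snippet below runs the same sentence through all three stemmers and keeps only the tokens on which they disagree. The helper stem_disagreements is a hypothetical name, and the sentence is split on whitespace rather than with word_tokenize so that no tokenizer data is needed. Exact stems may vary between NLTK versions.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

stemmers = {
    "porter": PorterStemmer(),
    "lancaster": LancasterStemmer(),
    "snowball": SnowballStemmer("english"),
}


def stem_disagreements(tokens):
    """Return {token: {stemmer_name: stem}} for tokens the stemmers do not all agree on."""
    diffs = {}
    for token in tokens:
        stems = {name: stemmer.stem(token) for name, stemmer in stemmers.items()}
        # More than one distinct stem means at least two stemmers disagree.
        if len(set(stems.values())) > 1:
            diffs[token] = stems
    return diffs


sentence = ("Aging head of famous crime family decides to transfer "
            "his position to one of his subalterns")
diffs = stem_disagreements(sentence.split())

for token, stems in diffs.items():
    print(f"{token:<10} {stems}")
```

    A comparison like this can help when choosing a stemmer: in the output above, Lancaster truncates most aggressively (fam, crim, transf), while Snowball leaves famous and his intact.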