
Getting The Closest Noun From A Stemmed Word

Short version: If I have a stemmed word, say 'comput' for 'computing' or 'sugari' for 'sugary', is there a way to construct its closest noun form, that is, 'computer' or 'sugar'?

Solution 1:

You might want to look at this example:

>>> from nltk.stem.wordnet import WordNetLemmatizer
>>> WordNetLemmatizer().lemmatize('having', 'v')
'have'

(from this SO answer) to see if it sends you in the right direction.

Solution 2:

First extract all the possible candidates from the WordNet noun synsets. Then use difflib to compare those strings against your target stem.

>>> from nltk.corpus import wordnet as wn
>>> from itertools import chain
>>> from difflib import get_close_matches as gcm
>>> target = "comput"
>>> # In NLTK 3+, lemma_names is a method; older releases exposed it as an attribute.
>>> candidates = set(chain(*[ss.lemma_names() for ss in wn.all_synsets('n') if len([i for i in ss.lemma_names() if target in i]) > 0]))
>>> gcm(target, candidates)[0]

A more human-readable way to compute the candidates is:

candidates = set()
for ss in wn.all_synsets('n'):
  for lemma in ss.lemma_names():  # all lemma names for this synset
    if target in lemma:
      candidates.add(lemma)  # keep the matching lemma, not the target itself
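
To try the matching step without downloading the WordNet data, here is a minimal sketch that runs difflib against a small hand-written noun list. The function name `closest_noun`, the `require_substring` flag, and the sample `nouns` list are all made up for illustration; they are not part of NLTK or of the answer above. Note that a plain substring filter cannot recover 'sugar' from 'sugari', because 'sugari' is not contained in 'sugar', so the sketch lets you switch the filter off.

```python
from difflib import get_close_matches


def closest_noun(stem, nouns, require_substring=True, cutoff=0.6):
    """Return the noun that best matches `stem`, or None if nothing is close.

    `nouns` is any iterable of candidate strings (hypothetical input here;
    in the answer above it would come from the WordNet noun synsets).
    """
    if require_substring:
        # Same filter as in the answer: keep only nouns containing the stem.
        candidates = {n for n in nouns if stem in n}
    else:
        # Assumption: skip the filter and let difflib rank every noun.
        candidates = set(nouns)
    matches = get_close_matches(stem, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hand-picked sample data, standing in for the WordNet lemma names.
nouns = ["computer", "computation", "computing", "sugar", "sugariness", "table"]

print(closest_noun("comput", nouns))                           # computer
print(closest_noun("sugari", nouns))                           # sugariness
print(closest_noun("sugari", nouns, require_substring=False))  # sugar
```

Whether to keep the substring filter is a trade-off: it shrinks the candidate set a lot when run over all of WordNet, but it misses nouns that are shorter than the stem.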
