Getting The Closest Noun From A Stemmed Word
Short version: if I have a stemmed word, say 'comput' for 'computing' or 'sugari' for 'sugary', is there a way to construct its closest noun form, that is, 'computer' or 'sugar'?
Solution 1:
You might want to look at this example:
>>> from nltk.stem.wordnet import WordNetLemmatizer
>>> WordNetLemmatizer().lemmatize('having', 'v')
'have'
(from this SO answer) to see if it sends you in the right direction.
Solution 2:
First, extract all the possible candidates from the WordNet synsets. Then use difflib to compare the strings against your target stem.
>>> from nltk.corpus import wordnet as wn
>>> from itertools import chain
>>> from difflib import get_close_matches as gcm
>>> target = "comput"
>>> # lemma_names is a method in NLTK 3; older releases exposed it as an attribute.
>>> candidates = set(chain(*[ss.lemma_names() for ss in wn.all_synsets('n') if len([i for i in ss.lemma_names() if target in i]) > 0]))
>>> gcm(target, candidates)[0]
A more human-readable way to compute the candidates is as follows:
candidates = set()
for ss in wn.all_synsets('n'):
    for lemma in ss.lemma_names():  # every lemma name for this synset
        if target in lemma:
            candidates.add(lemma)  # keep the matching lemma, not the stem itself
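To try the difflib step on its own, without loading WordNet, here is a minimal sketch. The `NOUNS` list and the `closest_noun` helper are hypothetical names made up for illustration; in practice the candidates would come from the synsets as described above. It skips the substring filter, since 'sugar' does not contain 'sugari' and would otherwise never become a candidate.

```python
from difflib import get_close_matches

# Hypothetical stand-in for the noun lemmas you would pull from WordNet.
NOUNS = ["computer", "computation", "computing", "sugar", "sugariness"]


def closest_noun(stem, nouns, cutoff=0.6):
    """Return the noun most similar to `stem`, or None if nothing clears `cutoff`."""
    # get_close_matches ranks by difflib's similarity ratio, best match first.
    matches = get_close_matches(stem, nouns, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(closest_noun("comput", NOUNS))  # computer
print(closest_noun("sugari", NOUNS))  # sugar
```

Returning None when nothing matches avoids the IndexError that indexing an empty result with [0] would raise. Keep in mind that the ranking is purely string similarity, so the result is only as good as the candidate list you feed it.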