Lemmatization Pandas (python)
I am a beginner at Pandas and I am trying to figure out how to lemmatize a single column of my dataframe. Take the following example (this is some text after (un)common word remova
Solution 1:
You may no longer need a solution, but if you want to lemmatize using several parts of speech (POS) instead of the lemmatizer's default of treating every word as a noun, you can try the following code:
import nltk
from nltk.corpus import wordnet
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One lemmatizer instance is enough; it is reused for every row.
lemmatizer = WordNetLemmatizer()

def nltk_tag_to_wordnet_tag(nltk_tag):
    # Map a Penn Treebank tag from nltk.pos_tag to a WordNet POS constant.
    if nltk_tag.startswith('J'):
        return wordnet.ADJ
    elif nltk_tag.startswith('V'):
        return wordnet.VERB
    elif nltk_tag.startswith('N'):
        return wordnet.NOUN
    elif nltk_tag.startswith('R'):
        return wordnet.ADV
    else:
        return None

def lemmatize_sentence(sentence):
    # Tokenize the sentence and find the POS tag for each token.
    nltk_tagged = nltk.pos_tag(word_tokenize(sentence))
    # List of (token, wordnet_tag) pairs.
    wordnet_tagged = [(word, nltk_tag_to_wordnet_tag(tag)) for word, tag in nltk_tagged]
    lemmatized_sentence = []
    for word, tag in wordnet_tagged:
        if tag is None:
            # If there is no available tag, append the token as is.
            lemmatized_sentence.append(word)
        else:
            # Otherwise use the tag to lemmatize the token.
            lemmatized_sentence.append(lemmatizer.lemmatize(word, tag))
    return " ".join(lemmatized_sentence)

# Lemmatizing
df['Lemmatize'] = df['word'].apply(lemmatize_sentence)
print(df.head())
df result:
   word                              | Lemmatize
0  Best scores, good cats, it rocks  | Best score , good cat , it rock
1  You received best scores          | You receive best score
2  Good news                         | Good news
3  Bad news                          | Bad news
4  I am loving it                    | I be love it
5  it rocks a lot                    | it rock a lot
6  it is still good todo better      | it be still good todo good
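The rows above depend on translating each Penn Treebank tag into the part of speech WordNet expects. As a rough sketch of just that step, the mapping can also be written as a dictionary lookup on the first letter of the tag. It has no NLTK dependency, so it can be tried on its own. The names TAG_PREFIX_TO_WORDNET_POS and treebank_to_wordnet_pos are made up for this illustration, and the single-character codes are assumed to stand in for wordnet.ADJ, wordnet.VERB, wordnet.NOUN and wordnet.ADV.

```python
# Hypothetical, dependency-free variant of the tag-mapping step.
# Penn Treebank tags start with a letter that identifies the broad word class.
# The single characters below are the POS codes WordNet uses (assumed here to
# match wordnet.ADJ, wordnet.VERB, wordnet.NOUN and wordnet.ADV).
TAG_PREFIX_TO_WORDNET_POS = {
    "J": "a",  # adjective (JJ, JJR, JJS)
    "V": "v",  # verb (VB, VBD, VBG, ...)
    "N": "n",  # noun (NN, NNS, NNP, ...)
    "R": "r",  # adverb (RB, RBR, RBS)
}


def treebank_to_wordnet_pos(treebank_tag):
    """Return the WordNet POS code for a Treebank tag, or None if unmapped."""
    # Slicing with [:1] is safe even for an empty string.
    return TAG_PREFIX_TO_WORDNET_POS.get(treebank_tag[:1])


# Tags without a WordNet counterpart (pronouns, determiners, punctuation)
# map to None, so the caller can leave those tokens unchanged.
for tag in ["JJR", "VBG", "NNS", "RB", "PRP", "DT"]:
    print(tag, "->", treebank_to_wordnet_pos(tag))
```

Returning None for unmapped tags keeps the same contract as nltk_tag_to_wordnet_tag in the answer, so lemmatize_sentence could call either one.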