Lemmatization Pandas (python)

February 24, 2024 Post a Comment

I am a beginner at Pandas and I am trying to figure out how to lemmatize a single column of my dataframe. Take the following example (this is some text after (un)common word remova

Solution 1:

You probably don't need anymore solution, but if you want to lemmatize on many pos, you can use:

If you want more, you can try the following code:

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet
lemmatizer = nltk.stem.WordNetLemmatizer()
wordnet_lemmatizer = WordNetLemmatizer()
stop = stopwords.words('english')


defnltk_tag_to_wordnet_tag(nltk_tag):
    if nltk_tag.startswith('J'):
        return wordnet.ADJ
    elif nltk_tag.startswith('V'):
        return wordnet.VERB
    elif nltk_tag.startswith('N'):
        return wordnet.NOUN
    elif nltk_tag.startswith('R'):
        return wordnet.ADV
    else:
        returnNonedeflemmatize_sentence(sentence):
    #tokenize the sentence and find the POS tag for each token
    nltk_tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    #tuple of (token, wordnet_tag)
    wordnet_tagged = map(lambda x: (x[0], nltk_tag_to_wordnet_tag(x[1])), nltk_tagged)
    lemmatized_sentence = []
    for word, tag in wordnet_tagged:
        if tag isNone:
            #if there is no available tag, append the token as is
            lemmatized_sentence.append(word)
        else:
            #else use the tag to lemmatize the token
            lemmatized_sentence.append(lemmatizer.lemmatize(word, tag))
    return" ".join(lemmatized_sentence)



# Lemmatizing
df['Lemmatize'] = df['word'].apply(lambda x: lemmatize_sentence(x))
print(df.head())

df result:

Baca Juga

         word                       |        Lemmatize

0  Best scores, good cats, it rocks | Best score , good cat , it rock

1          You received best scores |          You receive best score

2                         Good news |                       Good news

3                          Bad news |                        Bad news

4                    I am loving it |                    I be love it

5                    it rocks a lot |                   it rock a lot

6     it is still good todo better |     it be still good todo good

Getting Started with Python

Lemmatization Pandas (python)

Solution 1:

Post a Comment for "Lemmatization Pandas (python)"