
Use Of Num_words In The Tokenizer Class In Keras

Wanted to understand the difference between word_index and the num_words argument of tensorflow.keras.preprocessing.text.Tokenizer when fitting on sentences such as 'i love my dog', 'I, love my cat', 'You love my dog!'.

Solution 1:

word_index is simply a mapping of words to ids for the entire text corpus passed to fit_on_texts, regardless of what num_words is.
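To make that concrete, here is a minimal pure-Python sketch (not Keras' actual implementation) of how fit_on_texts builds word_index: count word frequencies over the whole corpus, then assign ids 1, 2, 3, ... in descending frequency order. num_words plays no role at this stage.

```python
from collections import Counter

def build_word_index(sentences):
    counts = Counter()
    for s in sentences:
        # crude stand-in for Keras' text_to_word_sequence:
        # lowercase and strip basic punctuation
        words = s.lower().replace(',', ' ').replace('!', ' ').split()
        counts.update(words)
    # most frequent word gets id 1; ties keep first-seen order
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}

sentences = ['i love my dog', 'I, love my cat', 'You love my dog!']
print(build_word_index(sentences))
# {'love': 1, 'my': 2, 'i': 3, 'dog': 4, 'cat': 5, 'you': 6}
```

Note that every word in the corpus gets an id, even the ones num_words would later filter out.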

The difference becomes evident in usage, for example when we call texts_to_sequences:

from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
    'i love my dog',
    'I, love my cat',
    'You love my dog!'
]

tokenizer = Tokenizer(num_words = 1+1)
tokenizer.fit_on_texts(sentences)
tokenizer.texts_to_sequences(sentences) # [[1], [1], [1]]

Only the id of 'love' is returned: with num_words = 2, only words whose id is strictly less than 2 are kept, i.e. the single most frequent word.

instead

sentences = [
    'i love my dog',
    'I, love my cat',
    'You love my dog!'
]

tokenizer = Tokenizer(num_words = 100+1)
tokenizer.fit_on_texts(sentences)
tokenizer.texts_to_sequences(sentences) # [[3, 1, 2, 4], [3, 1, 2, 5], [6, 1, 2, 4]]

The ids of the 100 most frequent words (every id < 101) are returned, which here covers the whole vocabulary.
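The filtering rule can be sketched in a few lines of plain Python (again, a simplified stand-in for Keras' code, not the real implementation): a word is emitted only if its id is strictly smaller than num_words.

```python
def texts_to_sequences(sentences, word_index, num_words):
    seqs = []
    for s in sentences:
        words = s.lower().replace(',', ' ').replace('!', ' ').split()
        # keep only words whose id is below the num_words cutoff
        seqs.append([word_index[w] for w in words
                     if word_index[w] < num_words])
    return seqs

# word_index as built over the full corpus (independent of num_words)
word_index = {'love': 1, 'my': 2, 'i': 3, 'dog': 4, 'cat': 5, 'you': 6}
sentences = ['i love my dog', 'I, love my cat', 'You love my dog!']

print(texts_to_sequences(sentences, word_index, num_words=2))
# [[1], [1], [1]]
print(texts_to_sequences(sentences, word_index, num_words=101))
# [[3, 1, 2, 4], [3, 1, 2, 5], [6, 1, 2, 4]]
```

This also explains the common "num_words + 1" idiom: to keep the top N words you pass num_words = N + 1, because the comparison is strict and ids start at 1.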
