Use of num_words in the Tokenizer Class in Keras
Wanted to understand the difference the num_words argument makes in the Tokenizer, since word_index seems to contain every word no matter what num_words is set to:

from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
    'i love my dog',
    'I, love my cat',
    'You love my dog!'
]
Solution 1:
word_index is simply a mapping of words to IDs for the entire text corpus the tokenizer was fitted on, regardless of what num_words is set to.
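A quick way to verify this (a minimal sketch using the same three sentences as the examples below): fit two tokenizers with very different num_words values and compare their word_index attributes.

from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
    'i love my dog',
    'I, love my cat',
    'You love my dog!'
]

small = Tokenizer(num_words = 1+1)
large = Tokenizer(num_words = 100+1)
small.fit_on_texts(sentences)
large.fit_on_texts(sentences)

# Both print the full six-word vocabulary, e.g.
# {'love': 1, 'my': 2, 'i': 3, 'dog': 4, 'cat': 5, 'you': 6}
print(small.word_index)
print(large.word_index)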
The difference shows up in how the tokenizer is used. For example, if we call texts_to_sequences:
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
    'i love my dog',
    'I, love my cat',
    'You love my dog!'
]

# num_words = 2 keeps only index 1, i.e. the single most frequent word
tokenizer = Tokenizer(num_words = 1+1)
tokenizer.fit_on_texts(sentences)
tokenizer.texts_to_sequences(sentences) # [[1], [1], [1]]
Only the ID for love is returned, because love is the most frequent word and num_words = 2 keeps only indices strictly below 2 (index 0 is never assigned to a word, so exactly one word survives the cutoff).
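The same cutoff logic holds for intermediate values. As a sketch, num_words = 3 keeps the two most frequent words (indices 1 and 2, i.e. love and my):

tokenizer = Tokenizer(num_words = 3)
tokenizer.fit_on_texts(sentences)
tokenizer.texts_to_sequences(sentences) # [[1, 2], [1, 2], [1, 2]]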
By contrast, with a num_words large enough to cover the whole vocabulary:
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
    'i love my dog',
    'I, love my cat',
    'You love my dog!'
]

tokenizer = Tokenizer(num_words = 100+1)
tokenizer.fit_on_texts(sentences)
tokenizer.texts_to_sequences(sentences) # [[3, 1, 2, 4], [3, 1, 2, 5], [6, 1, 2, 4]]
This time the IDs of the 100 most frequent words are returned; since the corpus contains only six distinct words, every word survives the cutoff.
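The ordering itself comes from word frequency: fit_on_texts tallies occurrences in word_counts, and word_index assigns IDs in descending count order (in the standard keras preprocessing implementation, ties keep first-seen order, which is why love gets ID 1 ahead of my). For the example corpus:

print(tokenizer.word_counts)
# OrderedDict([('i', 2), ('love', 3), ('my', 3), ('dog', 2), ('cat', 1), ('you', 1)])
print(tokenizer.word_index)
# {'love': 1, 'my': 2, 'i': 3, 'dog': 4, 'cat': 5, 'you': 6}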