Different Models With Gensim Word2vec On Python
I am trying to apply the word2vec model implemented in the gensim library in Python. I have a list of sentences (each sentence is a list of words). For instance let us have: sente
Solution 1:
Looking at the gensim documentation, there is some randomization when you run Word2Vec:
seed = for the random number generator. Initial vectors for each word are seeded with a hash of the concatenation of word + str(seed). Note that for a fully deterministically-reproducible run, you must also limit the model to a single worker thread, to eliminate ordering jitter from OS thread scheduling.
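Thus, if you want reproducible results, you will need to set the seed (and, as the documentation notes, limit the model to a single worker thread for a fully deterministic run):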
In [1]: import gensim
In [2]: sentences=[['first','second','third','fourth']]*1000
In [3]: model1 = gensim.models.Word2Vec(sentences, min_count = 1, size = 2)
In [4]: model2 = gensim.models.Word2Vec(sentences, min_count = 1, size = 2)
In [5]: print(all(model1['first']==model2['first']))
False
In [6]: model3 = gensim.models.Word2Vec(sentences, min_count = 1, size = 2, seed = 1234)
In [7]: model4 = gensim.models.Word2Vec(sentences, min_count = 1, size = 2, seed = 1234)
In [11]: print(all(model3['first']==model4['first']))
True
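To make the quoted seeding rule more concrete, here is a minimal standard-library sketch of the idea that a word's starting vector is derived from a hash of word + str(seed). It is not gensim's code: the helper name initial_vector, the use of SHA-256 and the scaling of the values are all assumptions chosen for illustration.

```python
import hashlib
import random


def initial_vector(word, seed, size=2):
    """Return a deterministic pseudo-random start vector for `word`.

    Hypothetical helper for illustration only: it uses a stable SHA-256
    digest, which is an assumption and not what gensim does internally.
    """
    # Hash the concatenation of the word and the seed, as the quoted
    # documentation describes.
    digest = hashlib.sha256((word + str(seed)).encode("utf-8")).digest()
    # Use part of the digest to seed a private random number generator.
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Small values centred on zero (the scaling is an arbitrary choice).
    return [(rng.random() - 0.5) / size for _ in range(size)]


same_seed_a = initial_vector("first", seed=1234)
same_seed_b = initial_vector("first", seed=1234)
other_seed = initial_vector("first", seed=4321)

# Same word and same seed give the same starting point.
print(same_seed_a == same_seed_b)
# A different seed gives a different starting point.
print(same_seed_a == other_seed)
```

Identical starting vectors are only part of the story: as the documentation quoted above says, training with more than one worker thread can still introduce ordering differences between runs.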