Word2vec - Get Rank Of Similarity
Given a word2vec model (from gensim), I want to get the rank of similarity between two words. For example, let's say I have the word 'desk' and the most similar words to 'desk' are
Solution 1:
You can use rank(entity1, entity2) to get the rank of the second word by its distance from the first. This is the same as its 1-based position in the most_similar() output:
model.wv.rank(sample_word, most_similar_word)
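To make the idea of a rank concrete, here is a minimal sketch that does not use gensim at all. It assumes the rank is one plus the number of words that are more similar to the first word than the second word is. The toy vectors and the toy_rank helper are invented for illustration and are not part of any library.

```python
import math

# Toy 2-D vectors, invented for illustration only.
toy_vectors = {
    "desk": (1.0, 0.0),
    "table": (0.9, 0.1),
    "chair": (0.8, 0.3),
    "book": (0.6, 0.6),
    "pencil": (0.3, 0.9),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def toy_rank(vectors, word1, word2):
    """Hypothetical helper: 1 + number of words closer to word1 than word2 is."""
    target = cosine_similarity(vectors[word1], vectors[word2])
    closer = [
        word
        for word, vec in vectors.items()
        if word != word1 and cosine_similarity(vectors[word1], vec) > target
    ]
    return len(closer) + 1

print(toy_rank(toy_vectors, "desk", "book"))
```

Here 'table' and 'chair' are both closer to 'desk' than 'book' is, so 'book' gets rank 3.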
Given rank(), the separate function below isn't necessary. It is kept here for reference.
Assume model.wv.most_similar(sample_word) returns a list of (word, similarity) tuples, as shown:
[('table', 0.64), ('chair', 0.61), ('book', 0.59), ('pencil', 0.52)]
The following function accepts the sample word and the most similar word as params. It returns the 1-based rank as a one-element list (e.g. [3] for 'book' in the output above), or an empty list if the word is not present in the output.
def rank_of_most_similar_word(sample_word, most_similar_word):
    l = model.wv.most_similar(sample_word)
    return [x + 1 for x, y in enumerate(l) if y[0] == most_similar_word]
sample_word = 'desk'
most_similar_word = 'book'
rank_of_most_similar_word(sample_word, most_similar_word)
Note: pass topn=x to model.wv.most_similar() to get the top x most similar words (the default is 10), as suggested in the comments.
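Because most_similar() only returns a limited number of neighbors, a lookup in its output can miss the word entirely. The sketch below shows one way to handle that, using a hard-coded neighbor list in place of a real model. The rank_in_neighbors helper is a hypothetical name, not a gensim function.

```python
def rank_in_neighbors(neighbors, target_word):
    """Return the 1-based position of target_word in neighbors, or None if absent.

    neighbors is a list of (word, similarity) tuples, best match first,
    in the same shape as the most_similar() output shown earlier.
    """
    for position, (word, _similarity) in enumerate(neighbors, start=1):
        if word == target_word:
            return position
    return None

# Stand-in for model.wv.most_similar('desk'); values copied from the example above.
neighbors = [('table', 0.64), ('chair', 0.61), ('book', 0.59), ('pencil', 0.52)]

print(rank_in_neighbors(neighbors, 'book'))
print(rank_in_neighbors(neighbors, 'lamp'))
```

Returning None rather than an empty list makes the "not in the top x" case explicit. If that happens, a larger topn may be needed.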