Spacy Similarity Function

I'm trying to use the spaCy library for sentence similarity, and I want to understand how it works. Their documentation is not clear: By default, spaCy uses an average-of-vectors a

Solution 1:

A closer look at the documentation should give you what you are looking for.

First of all, a Doc object is made up of tokens. The vector for the Doc is the average of the vectors of its tokens, and similarity() compares two such vectors using cosine similarity.
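To make the averaging concrete, here is a minimal sketch in plain Python. It does not call spaCy: the three-dimensional token vectors are made-up toy values, and the helper names `average_vector` and `cosine_similarity` are hypothetical, chosen only for illustration. It shows what an average-of-vectors document representation and a cosine comparison compute.

```python
import math

def average_vector(token_vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(token_vectors)
    return [sum(components) / n for components in zip(*token_vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy token vectors (assumed values, not taken from any real model).
doc_a_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
doc_b_tokens = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
doc_c_tokens = [[0.0, 0.0, 1.0]]

# Each "document" vector is the mean of its token vectors.
doc_a = average_vector(doc_a_tokens)
doc_b = average_vector(doc_b_tokens)
doc_c = average_vector(doc_c_tokens)

sim_ab = cosine_similarity(doc_a, doc_b)  # same direction, so close to 1
sim_ac = cosine_similarity(doc_a, doc_c)  # orthogonal, so 0
print(sim_ab, sim_ac)
```

Note that `doc_a` and `doc_b` are built from different token vectors yet end up pointing in the same direction. Averaging discards word order and individual token identity, which is one reason the quality of the underlying token vectors matters so much.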

Now, what is the vector of a token? If you use an md or lg model, token.vector gives you the token's word vector, which is actually a GloVe vector. If you are using an sm model, which ships without word vectors, the documentation says that this vector contains structural information instead. That means that tokens which have the same PoS tag and similar DEP behavior will have higher similarity scores than other tokens, even if their semantics are quite different.

As a general rule, do not use the similarity method with sm models if what you want is semantic similarity, because you will most probably get inaccurate results.