Python Gensim Word2vec Vocabulary Key
I want to train a word2vec model with gensim. I heard that the vocabulary corpus should be unicode, so I converted it to unicode.
# -*- encoding:utf-8 -*-
# !/usr/bin/env python
import sys
reloa
Solution 1:
Word2Vec requires each text example to be a list of word tokens. It appears you are passing plain strings to Word2Vec, so when it iterates over each string it sees only single characters as words.
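To make the difference concrete, here is a minimal sketch in plain Python (no gensim needed). The sample sentences and the helper name `tokens_seen` are made up for illustration and are not taken from the question's code; the helper just mimics what any consumer gets when it iterates over a single text example.

```python
# Hypothetical sample sentences, for illustration only.
raw_sentences = ["나는 학교에 간다", "나는 밥을 먹는다"]


def tokens_seen(text_example):
    """Return the items produced by iterating over one text example."""
    return [item for item in text_example]


# Iterating over a str yields one character at a time (spaces included).
as_string = tokens_seen(raw_sentences[0])

# Iterating over a list of words yields the words themselves.
as_word_list = tokens_seen(raw_sentences[0].split())

print(as_string)
print(as_word_list)
```

If the vocabulary of a trained model contains only single characters, this is the most likely cause.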
Does Korean use spaces to delimit words? If so, split each text on spaces and pass the resulting list of words to Word2Vec as one text example.
If not, you'll need to run an external word tokenizer (not part of gensim) over your sentences before passing them to Word2Vec.
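Either route can be sketched with one small helper. This is only an illustration under assumptions: `tokenize_corpus` and the sample `corpus` are hypothetical names, whitespace splitting is used as the default, and the `tokenizer` argument is where an external tokenizer function would be plugged in if splitting on spaces is not good enough for your text.

```python
def tokenize_corpus(lines, tokenizer=str.split):
    """Turn raw text lines into lists of word tokens, skipping empty lines.

    `tokenizer` is any callable that maps one string to a list of tokens.
    The default splits on whitespace; swap in an external tokenizer if needed.
    """
    tokenized = []
    for line in lines:
        tokens = tokenizer(line)
        if tokens:  # drop lines that produce no tokens
            tokenized.append(tokens)
    return tokenized


# Hypothetical sample corpus, for illustration only.
corpus = ["나는 학교에 간다", "", "나는 밥을 먹는다"]
sentences = tokenize_corpus(corpus)
print(sentences)
```

The resulting list of token lists is the shape Word2Vec expects, so it could then be passed along as, for example, `Word2Vec(sentences, min_count=1)` after `from gensim.models import Word2Vec`. The `min_count=1` setting is only there because the sample corpus is tiny.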