Skip to content Skip to sidebar Skip to footer

How Can We Train Onevsrestclassifier With The Output Of Fasttext

Can we train OneVsRestClassifier with the output of FastText as shown below: GENSIM Library with FastText fasttext_out=model_ted.wv.most_similar('The Lemon Drop Kid , a New York Ci

Solution 1:

The Gensim .most_similar() method, in the form you're using, expects a single word, not a full string sentence.

But, the FastText model vectors can take any word, even one that's out-of-vocabulary ("OOV") compared to tis training data, and synthesize a guess vector for it.

By providing it that long string, "The Lemon Drop Kid , a New York City swindler, is illegally touting horses at a Florida racetrack. After several successful hustles, the Kid comes across a beautiful, but gullible, woman intending to bet a lot of money. The Kid convinces her to switch her bet, employing a prefabricated con. Unfortunately for the Kid, the woman belongs to notorious gangster Moose Moran , as does the money.", you've asked it to synthesize a word-vector for a 391-letter word – where some of the 'letters' are spaces and punctuation. And then, to report other known, in-vocabulary words that are close to that vector.

So, it'll calculate such a vector as requested, but it's probably not very good! And the list of nearest-neighbors to that vector, as returned by .most_similar(), aren't what you'd need for a downstream classifier.

FastText inherently only gives you word-vectors: a vector per word. If you want a vector for a longer run of text, like a lot of words, you'll need to make more decisions about how you want to turn a bunch of individual word-vectors into something else.

It's an OK decision to simply try: averaging all those words together. (There are many other ways to represent larger texts as vectors, or bags-of-other-values.)

You could then try passing those averages as the features to a downstream classifier.

The FastText -supervised mode, available in the original FaceBook FastText but not Gensim, is a way to combine FastText training and a simple classifier at the same time, with the word-vector optimized for how well they help the classifier, under this same sort of "Average all the words together" combination. You may want to try that, too, though it may be harder to customize in Python.

Post a Comment for "How Can We Train Onevsrestclassifier With The Output Of Fasttext"