Classification Is Poor Although Term Frequency Is Right
I am using the function below to check which words are most frequent in each category, and then to observe how some sentences are classified. The results are surprisingly wrong:
Solution 1:
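The order of the names in the cats variable differs from the order in newsgroups_train.target_names. The labels in target_names are sorted alphabetically, regardless of the order in which the categories were requested, so a predicted index refers to a position in target_names, not in cats.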
Output of:
print(cats)
['sci.space','rec.autos','rec.motorcycles']
print(newsgroups_train.target_names)
['rec.autos', 'rec.motorcycles', 'sci.space']
You should change this line:
print(" - Predicted as: '{}'".format(cats[predicted]))
to
print(" - Predicted as: '{}'".format(newsgroups_train.target_names[predicted]))
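The mismatch can be reproduced without downloading the dataset. The following is only a minimal sketch: it assumes target_names is the alphabetically sorted list of the requested categories, and predicted is a hypothetical class index standing in for the classifier's output; it does not use the data or model from the question.

```python
# Categories in the order they were requested (as in the question).
cats = ['sci.space', 'rec.autos', 'rec.motorcycles']

# Assumption: the dataset loader stores the label names sorted alphabetically.
target_names = sorted(cats)

# Hypothetical classifier output: index 2 in the sorted label list.
predicted = 2

# Indexing the original list gives the wrong name for that index.
wrong_label = cats[predicted]

# Indexing the sorted list gives the name the classifier actually meant.
right_label = target_names[predicted]

print(" - Looked up in cats:         '{}'".format(wrong_label))
print(" - Looked up in target_names: '{}'".format(right_label))
```

The classifier only ever sees integer labels, so any lookup of a predicted index should go through the same list that defined those integers.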