
Classification Is Poor Although Term Frequency Is Right

I am using the function below to check the most frequent words per category, and then observing how some sentences are classified. The results are surprisingly wrong:

Solution 1:

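The order of names in the cats variable differs from the order in newsgroups_train.target_names. The labels in target_names are sorted alphabetically, so the integer returned by the classifier is an index into target_names, not into cats.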

Output of: print(cats)

['sci.space','rec.autos','rec.motorcycles']

print(newsgroups_train.target_names)

['rec.autos', 'rec.motorcycles', 'sci.space']

You should change this line:

print(" - Predicted as: '{}'".format(cats[predicted]))

to

print(" - Predicted as: '{}'".format(newsgroups_train.target_names[predicted]))
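The mismatch can be reproduced without downloading the dataset. The following is only a minimal sketch: it assumes the list is named cats, uses sorted(cats) as a stand-in for newsgroups_train.target_names (which matches the output printed above), and uses a made-up value for predicted rather than a real classifier result.

```python
# Category list in the order it was written in the question (name assumed).
cats = ['sci.space', 'rec.autos', 'rec.motorcycles']

# Stand-in for newsgroups_train.target_names, which holds the same names
# in sorted order. Using sorted(cats) avoids downloading the dataset.
target_names = sorted(cats)

# Hypothetical classifier output: an integer index into target_names.
predicted = 2

wrong_label = cats[predicted]          # looks up the unsorted list
right_label = target_names[predicted]  # looks up the list the labels refer to

print("Indexing cats:         {}".format(wrong_label))
print("Indexing target_names: {}".format(right_label))
```

With this ordering, every index maps to a different name in the two lists, so a correct prediction is always printed under the wrong category name when cats is indexed.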

