
How To Run A Random Classifier In The Following Case

I am experimenting with a sentiment analysis case and trying to run a random classifier on the following: |Topic |value|label| |Apples are great |-0.99|

Solution 1:

The issue is concatenating another column to a sparse matrix (the output of countvector.fit_transform). For simplicity's sake, let's say your training data is:

import pandas as pd

x = pd.DataFrame({'Topics': ['Apples are great', 'Balloon is red', 'cars are running',
                             'dear diary', 'elephant is huge', 'facebook is great'],
                  'value': [-0.99, -0.98, -0.93, 0.8, 0.91, 0.97],
                  'label': [0, 1, 0, 1, 1, 0]})

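Wrapping the sparse matrix in pd.DataFrame and concatenating gives you something weird, with each row's sparse entries crammed into a single column: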

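from sklearn.feature_extraction.text import CountVectorizer

# Count bigrams only (ngram_range=(2, 2)); fit_transform returns a scipy sparse matrix
countvector = CountVectorizer(ngram_range=(2, 2))
traindataset = countvector.fit_transform(x['Topics'])

train_set = pd.concat([x['value'], pd.DataFrame(traindataset)], axis=1)

train_set.head(2)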

   value  0
0  -0.99  (0, 0)\t1\n (0, 1)\t1
1  -0.98  (0, 3)\t1\n (0, 10)\t1

It is possible to convert your sparse matrix to a dense numpy array, after which the pandas DataFrame will work; however, if your dataset is huge this is extremely costly. To keep it sparse, you can do:

from scipy import sparse

# Turn 'value' into an (n_samples, 1) sparse column and stack it beside the bigram counts
train_set = sparse.hstack([sparse.csr_matrix(x['value']).reshape(-1, 1), traindataset])

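from sklearn.ensemble import RandomForestClassifier

# RandomForestClassifier accepts scipy sparse input directly
randomclassifier = RandomForestClassifier(n_estimators=200, criterion='entropy')
randomclassifier.fit(train_set, x['label'])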
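
To predict on new rows, the same column layout has to be rebuilt with the already fitted vectorizer (transform, not fit_transform), otherwise the feature count will not match what the forest was trained on. A minimal sketch of that, assuming the toy training frame from above; the helper build_features, the new_rows frame and random_state=0 are hypothetical additions for illustration, not part of the original answer:

```python
import pandas as pd
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer


def build_features(df, vectorizer, fit=False):
    """Hypothetical helper: put 'value' next to the bigram counts, staying sparse."""
    if fit:
        counts = vectorizer.fit_transform(df['Topics'])  # learns the vocabulary
    else:
        counts = vectorizer.transform(df['Topics'])      # reuses the learned vocabulary
    value_col = sparse.csr_matrix(df['value'].to_numpy().reshape(-1, 1))
    return sparse.hstack([value_col, counts], format='csr')


x = pd.DataFrame({'Topics': ['Apples are great', 'Balloon is red', 'cars are running',
                             'dear diary', 'elephant is huge', 'facebook is great'],
                  'value': [-0.99, -0.98, -0.93, 0.8, 0.91, 0.97],
                  'label': [0, 1, 0, 1, 1, 0]})

# Made-up unseen rows, only for illustration
new_rows = pd.DataFrame({'Topics': ['Apples are huge', 'dear diary is red'],
                         'value': [-0.5, 0.7]})

countvector = CountVectorizer(ngram_range=(2, 2))
train_set = build_features(x, countvector, fit=True)
new_set = build_features(new_rows, countvector)

randomclassifier = RandomForestClassifier(n_estimators=200, criterion='entropy',
                                          random_state=0)
randomclassifier.fit(train_set, x['label'])
predictions = randomclassifier.predict(new_set)
```

Bigrams in the new rows that never appeared in training are silently dropped by transform, so both matrices end up with the same number of columns.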

See also the scipy.sparse documentation.
