
H2O vs Scikit-Learn Confusion Matrix

Has anyone been able to match the scikit-learn confusion matrix to H2O's? They never match. Doing something similar with Keras produces a perfect match, but with H2O the counts are always off.

Solution 1:

This does the trick, thanks for the hunch, Vivek. It's still not an exact match, but it's extremely close.

# Find the threshold that maximizes F1 on the training data
perf = model.model_performance(train)
threshold = perf.find_threshold_by_max_metric('f1')

# Build the test-set confusion matrix at that same threshold
model.model_performance(test).confusion_matrix(thresholds=threshold)
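
To see the two confusion matrices side by side, here is a minimal sketch of the same idea; the frame names train/test, the response column name y, and a 0/1-coded target are assumptions for illustration, not part of the original answer:

from sklearn.metrics import confusion_matrix

# H2O side: test-set confusion matrix at the training F1-max threshold
perf = model.model_performance(train)
threshold = perf.find_threshold_by_max_metric('f1')
print(model.model_performance(test).confusion_matrix(thresholds=threshold))

# scikit-learn side: apply the same threshold to the predicted probabilities
pred_df = model.predict(test).as_data_frame()   # columns: predict, p0, p1
y_true = test[y].as_data_frame()[y].values      # assumes y names a 0/1 response column
y_hat = (pred_df['p1'] >= threshold).astype(int)
print(confusion_matrix(y_true, y_hat))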

Solution 2:

I ran into the same issue. Here is what I would do to make a fair comparison:

# Train with both a training frame and a validation frame
model.train(x=x, y=y, training_frame=train, validation_frame=test)

# Confusion matrix at the F1-maximizing threshold on the validation frame
cm1 = model.confusion_matrix(metrics=['F1'], valid=True)
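
For reference, a minimal setup that those two lines assume might look like the following; the estimator choice, file paths, and column names are illustrative, not from the original answer:

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()
train = h2o.import_file('train.csv')        # illustrative file paths
test = h2o.import_file('test.csv')

y = 'label'                                 # assumed binary response column
x = [c for c in train.columns if c != y]
train[y] = train[y].asfactor()              # treat the target as categorical (classification)
test[y] = test[y].asfactor()

model = H2OGradientBoostingEstimator(seed=42)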

Since we train the model with both a training frame and a validation frame, pred['predict'] will use the threshold that maximizes the F1 score on the validation data. To double-check, one can recompute the labels manually:

perf = model.model_performance(valid=True)  # validation-set performance
threshold = perf.find_threshold_by_max_metric('f1')
# Recreate the hard labels from the positive-class probabilities
pred_df['predict'] = pred_df['p1'].apply(lambda x: 0 if x < threshold else 1)

To get the corresponding confusion matrix from scikit-learn:

from sklearn.metrics import confusion_matrix

cm2 = confusion_matrix(y_true, pred_df['predict'])
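
For completeness, pred_df and y_true are not defined in the answer itself; a minimal way to obtain them, assuming the frame name test and a 0/1-coded response column named y:

pred_df = model.predict(test).as_data_frame()    # columns: predict, p0, p1
y_true = test[y].as_data_frame()[y].values       # ground-truth labels from the same frame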

In my case, I don't understand why I still get slightly different results. For example:

print(cm1)
>> [[3063  176]
    [  94  146]]

print(cm2)
>> [[3063  176]
    [  95  145]]
