
H2O vs Scikit-Learn Confusion Matrix

Has anyone been able to match the scikit-learn confusion matrix to H2O's? They never match. Doing something similar with Keras produces a perfect match, but with H2O the counts are always off.

Solution 1:

This does the trick, thanks for the hunch, Vivek. It's still not an exact match, but it's extremely close.

# Find the threshold that maximizes F1 on the training data
perf = model.model_performance(train)
threshold = perf.find_threshold_by_max_metric('f1')

# Build the test-set confusion matrix at that same threshold
model.model_performance(test).confusion_matrix(thresholds=threshold)
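
To see the two confusion matrices side by side, here is a minimal sketch of the same idea; the frame names train/test, the response column name y, and a 0/1-coded target are assumptions for illustration, not part of the original answer:

from sklearn.metrics import confusion_matrix

# H2O side: test-set confusion matrix at the training F1-max threshold
perf = model.model_performance(train)
threshold = perf.find_threshold_by_max_metric('f1')
print(model.model_performance(test).confusion_matrix(thresholds=threshold))

# scikit-learn side: apply the same threshold to the predicted probabilities
pred_df = model.predict(test).as_data_frame()   # columns: predict, p0, p1
y_true = test[y].as_data_frame()[y].values      # assumes y names a 0/1 response column
y_hat = (pred_df['p1'] >= threshold).astype(int)
print(confusion_matrix(y_true, y_hat))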

Solution 2:

I ran into the same issue. Here is what I would do to make a fair comparison:

# Train with both a training frame and a validation frame
model.train(x=x, y=y, training_frame=train, validation_frame=test)

# Confusion matrix at the F1-maximizing threshold on the validation frame
cm1 = model.confusion_matrix(metrics=['F1'], valid=True)
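
For reference, a minimal setup that those two lines assume might look like the following; the estimator choice, file paths, and column names are illustrative, not from the original answer:

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()
train = h2o.import_file('train.csv')        # illustrative file paths
test = h2o.import_file('test.csv')

y = 'label'                                 # assumed binary response column
x = [c for c in train.columns if c != y]
train[y] = train[y].asfactor()              # treat the target as categorical (classification)
test[y] = test[y].asfactor()

model = H2OGradientBoostingEstimator(seed=42)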

Since we train the model with both a training frame and a validation frame, pred['predict'] will use the threshold that maximizes the F1 score on the validation data. To double-check, one can recompute the labels manually:

perf = model.model_performance(valid=True)  # validation-set performance
threshold = perf.find_threshold_by_max_metric('f1')
# Recreate the hard labels from the positive-class probabilities
pred_df['predict'] = pred_df['p1'].apply(lambda x: 0 if x < threshold else 1)

To get the corresponding confusion matrix from scikit-learn:

from sklearn.metrics import confusion_matrix

cm2 = confusion_matrix(y_true, pred_df['predict'])
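
For completeness, pred_df and y_true are not defined in the answer itself; a minimal way to obtain them, assuming the frame name test and a 0/1-coded response column named y:

pred_df = model.predict(test).as_data_frame()    # columns: predict, p0, p1
y_true = test[y].as_data_frame()[y].values       # ground-truth labels from the same frame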

In my case, I don't understand why I still get slightly different results. For example:

print(cm1)
>> [[3063  176]
    [  94  146]]

print(cm2)
>> [[3063  176]
    [  95  145]]
