ValueError: Array Length Does Not Match Index Length
I am practicing for contests like Kaggle, and I have been trying to use XGBoost and to get familiar with third-party Python libraries like pandas and NumPy. I have be
Solution 1:
The problem is that you define X_test twice, as @maxymoo mentioned. First you define it as:
X_test = df_test.drop(['ID'], axis=1).values
And then you redefine that with:
X_train, X_test, y_train, y_test = cv.train_test_split(X_train, y_train, random_state=1301, test_size=0.4)
This means X_test now has a size of 0.4 * len(X_train). Then, after:
y_pred = clf.predict_proba(X_test)
you get predictions for that held-out part of X_train, and you then try to create a DataFrame from them together with the initial id_test, which still has the length of the original X_test. The two lengths differ, which raises the ValueError.
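The mismatch can be reproduced on synthetic data. This is only a sketch: the array sizes (100 training rows, 50 test rows) and the zero-filled y_pred are made up for illustration, and train_test_split is imported from sklearn.model_selection, which is where current scikit-learn versions keep it (the cv alias in the question refers to the older sklearn.cross_validation module).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: 100 training rows, 50 test rows.
X_train = rng.normal(size=(100, 3))
y_train = rng.integers(0, 2, size=100)
df_test = pd.DataFrame({"ID": np.arange(50)})
id_test = df_test["ID"]

# This call overwrites X_test with 40% of the *training* rows.
X_train, X_test, y_train, y_test = train_test_split(
    X_train, y_train, random_state=1301, test_size=0.4
)
print(len(X_test), len(id_test))  # 40 50

# Stand-in for clf.predict_proba(X_test)[:, 1]: one value per row of X_test.
y_pred = np.zeros(len(X_test))

try:
    pd.DataFrame({"ID": id_test, "TARGET": y_pred})
    raised = False
except ValueError as exc:
    # pandas refuses to combine a 50-row Series with a 40-element array.
    raised = True
    print("ValueError:", exc)
```

The exact wording of the message depends on the pandas version, but the cause is the same: the predictions have 40 entries while id_test has 50.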
You could use X_fit and X_eval in train_test_split so that you don't overwrite the initial X_train and X_test. As written, your cross-validation also runs on a different X_train, which means you won't get the right answer, or your CV score will be inconsistent with the public/private score.
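A minimal sketch of that renaming, under some assumptions: the data is synthetic, the column names are invented, and LogisticRegression stands in for the XGBoost classifier from the question only so that the example runs with scikit-learn alone. The point is just that X_test and id_test are never reassigned.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical training data and test frame (200 and 50 rows).
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] > 0).astype(int)
df_test = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
df_test.insert(0, "ID", np.arange(1000, 1050))

id_test = df_test["ID"]
X_test = df_test.drop(["ID"], axis=1).values

# Split the training data under new names so X_train / X_test stay intact.
X_fit, X_eval, y_fit, y_eval = train_test_split(
    X_train, y_train, random_state=1301, test_size=0.4
)

# Stand-in classifier (assumption); the question uses XGBoost here.
clf = LogisticRegression()
clf.fit(X_fit, y_fit)
eval_score = clf.score(X_eval, y_eval)  # validation on the held-out part

# Predict on the real test set, which still lines up with id_test.
y_pred = clf.predict_proba(X_test)
submission = pd.DataFrame({"ID": id_test, "TARGET": y_pred[:, 1]})
print(submission.shape)
```

Because the held-out rows live in X_eval, the evaluation score and the submission come from two clearly separate sets, and the DataFrame constructor receives two columns of the same length.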