
ValueError: Array Length Does Not Match Index Length

I am practicing for contests like Kaggle, and I have been trying to use XGBoost while getting familiar with Python third-party libraries like pandas and NumPy. I have be

Solution 1:

The problem is that you are defining X_test twice, as @maxymoo mentioned. First you defined it as:

X_test = df_test.drop(['ID'], axis=1).values

And then you redefined it with:

X_train, X_test, y_train, y_test = cv.train_test_split(X_train, y_train, random_state=1301, test_size=0.4)

This means X_test now has 0.4 * len(X_train) rows (40% of the original training set), not the number of rows in df_test. Then after:

y_pred = clf.predict_proba(X_test)

you have predictions for that held-out part of X_train, and you are trying to build a DataFrame from them together with the initial id_test, which has the length of the original X_test. The two lengths differ, so pandas raises the ValueError.
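A minimal sketch of the mismatch, assuming toy sizes (10 test IDs, 20 training rows) because the question's real data isn't shown. Random numbers stand in for the `predict_proba` output, and `make_submission` is a hypothetical helper, not something from the question. The exact ValueError text differs between pandas versions, but the cause is the same length mismatch.

```python
import numpy as np
import pandas as pd

# Hypothetical sizes; the question's real data is not shown.
n_test = 10    # rows in df_test, hence in the original X_test and id_test
n_train = 20   # rows in the original X_train

rng = np.random.default_rng(0)
id_test = np.arange(n_test)  # IDs taken from df_test

# train_test_split(..., test_size=0.4) turns X_test into 40% of X_train,
# so predict_proba returns one row per held-out training row (8 here), not 10.
n_holdout = int(np.ceil(0.4 * n_train))
y_pred = rng.random(n_holdout)  # stand-in for clf.predict_proba(X_test)[:, 1]


def make_submission(ids, preds):
    """Hypothetical helper: build a submission frame from IDs and predictions."""
    return pd.DataFrame({"ID": ids, "TARGET": preds})


try:
    make_submission(id_test, y_pred)
except ValueError as exc:
    # pandas refuses to build a frame whose columns have different lengths.
    print(f"ValueError: {exc} ({len(id_test)} IDs vs {len(y_pred)} predictions)")
```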
Instead, give the outputs of train_test_split new names such as X_fit and X_eval so they don't shadow the original X_train and X_test. Reassigning X_train during your cross-validation also means the model is trained on a different, smaller X_train, so either you won't get the right answer or your CV score will be inaccurate compared with the public/private leaderboard score.
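A sketch of the renamed split, under several assumptions: the toy `df_train`/`df_test` frames are made up, `LogisticRegression` stands in for the XGBoost classifier from the question, and the splitter comes from `sklearn.model_selection` (the question's `cv` alias likely refers to `sklearn.cross_validation`, which was removed in scikit-learn 0.20). The point is only that X_train and X_test keep their original meaning.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the competition's train/test files.
rng = np.random.default_rng(1301)
df_train = pd.DataFrame({
    "ID": np.arange(100),
    "f1": rng.normal(size=100),
    "f2": rng.normal(size=100),
    "TARGET": rng.integers(0, 2, size=100),
})
df_test = pd.DataFrame({
    "ID": np.arange(100, 130),
    "f1": rng.normal(size=30),
    "f2": rng.normal(size=30),
})

X_train = df_train.drop(["ID", "TARGET"], axis=1).values
y_train = df_train["TARGET"].values
X_test = df_test.drop(["ID"], axis=1).values
id_test = df_test["ID"].values

# New names for the validation split, so X_train and X_test are not overwritten.
X_fit, X_eval, y_fit, y_eval = train_test_split(
    X_train, y_train, random_state=1301, test_size=0.4
)

# LogisticRegression is only a stand-in for the XGBoost model in the question.
clf = LogisticRegression()
clf.fit(X_fit, y_fit)
print("validation accuracy:", clf.score(X_eval, y_eval))

# Refit on the full training set, then predict the real test rows.
clf.fit(X_train, y_train)
y_pred = clf.predict_proba(X_test)[:, 1]

# One prediction per test ID, so the lengths match.
submission = pd.DataFrame({"ID": id_test, "TARGET": y_pred})
```

Validating on X_eval and then refitting on all of X_train is one common pattern; keeping the model fit on X_fit also works. Either way, the submission has exactly one row per ID in df_test.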

