I do not understand why I get the error KeyError: '[ 1351 1352 1353 ... 13500 13501 13502] not in index'
when I run this code:
from sklearn.model_selection import KFold

cv = KFold(n_splits=10)
for train_index, test_index in cv.split(X):
    f_train_X, f_valid_X = X[train_index], X[test_index]
    f_train_y, f_valid_y = y[train_index], y[test_index]
I use X (a pandas DataFrame) for the split in cv.split(X).
X.shape
Out: (13503, 17)
y.shape
Out: (13503,)
How to fix a KeyError? When the error comes from a misspelled key, correct the spelling of the key. If you are not sure about the spelling, print the list of all column names and cross-check.
Use the DataFrame.reset_index() function to reset the index of a DataFrame. By default, it adds the current row index as a new column called 'index' and creates a new row index as a range of numbers starting at 0.
To get the index of a pandas DataFrame, use the DataFrame.index property. It returns an Index object representing the index of this DataFrame.
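As a rough illustration of reset_index() and the index property described above, here is a minimal sketch. The toy frame and its column name "value" are made up for the example and do not come from the question.

```python
import pandas as pd

# Hypothetical toy frame; the column name "value" is illustrative only.
df = pd.DataFrame({"value": [10, 20, 30, 40]})

# Filtering keeps the original row labels, so the index now starts at 1.
filtered = df[df["value"] > 15]
print(list(filtered.index))      # [1, 2, 3]

# reset_index() moves the old labels into an 'index' column
# and renumbers the rows starting at 0.
renumbered = filtered.reset_index()
print(list(renumbered.columns))  # ['index', 'value']
print(list(renumbered.index))    # [0, 1, 2]

# drop=True discards the old labels instead of keeping them as a column.
dropped = filtered.reset_index(drop=True)
print(list(dropped.columns))     # ['value']
```

Note that resetting the index changes the row labels only. It does not change what X[train_index] means on a DataFrame, which is still a column lookup.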
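The problem is the way you are trying to index X using X[train_index]. On a DataFrame, X[...] with a list or array selects columns by label, so pandas looks for columns named 1351, 1352, and so on, and raises the KeyError.
You need to use .loc or .iloc since X is a pandas DataFrame. cv.split(X) returns integer positions, so .iloc is the right choice here: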
cv = KFold(n_splits=10)
for train_index, test_index in cv.split(X):
    f_train_X, f_valid_X = X.iloc[train_index], X.iloc[test_index]
    f_train_y, f_valid_y = y.iloc[train_index], y.iloc[test_index]
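For anyone who wants to try the fix without the original data, here is a self-contained sketch of the same loop. The synthetic X and y (50 rows, columns "a", "b", "c") are assumptions made for the example. The real X in the question has shape (13503, 17).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Synthetic stand-ins for the question's X and y (assumed shapes and names).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
y = pd.Series(rng.integers(0, 2, size=50))

cv = KFold(n_splits=10)
fold_sizes = []
for train_index, test_index in cv.split(X):
    # train_index and test_index are arrays of row positions, so use iloc.
    f_train_X, f_valid_X = X.iloc[train_index], X.iloc[test_index]
    f_train_y, f_valid_y = y.iloc[train_index], y.iloc[test_index]
    fold_sizes.append((len(f_train_X), len(f_valid_X)))

print(fold_sizes[0])  # (45, 5)
```

With 50 rows and 10 splits, each fold holds out 5 rows and trains on the other 45.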
Here is a small example of the difference with iloc:
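import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
# [] with a list selects columns by label, and there are no columns named 1 or 2
df[[1, 2]]
# KeyError: '[1 2] not in index'
# iloc selects rows by integer position
df.iloc[[1, 2]]
#     A   B   C   D
# 1  25  97  78  74
# 2   6  84  16  21
# converting to a NumPy array also works, since arrays are indexed by position
df = df.values
df[[1, 2]]
# array([[25, 97, 78, 74],
#        [ 6, 84, 16, 21]])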
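The answer mentions both .loc and .iloc. The two only agree when the row labels happen to equal the row positions. The sketch below uses a made-up frame with labels 10, 20, 30 (an assumption for illustration) to show why .iloc is the safer match for the positional indices that KFold produces.

```python
import pandas as pd

# Hypothetical frame whose row labels do not match row positions.
df = pd.DataFrame({"A": [1, 2, 3]}, index=[10, 20, 30])
positions = [0, 2]

# iloc treats the list as row positions.
by_position = df.iloc[positions]
print(by_position["A"].tolist())  # [1, 3]

# loc treats the list as row labels; labels 0 and 2 do not exist here.
loc_failed = False
try:
    df.loc[positions]
except KeyError:
    loc_failed = True
print(loc_failed)  # True

# Converting to a NumPy array also gives positional indexing.
arr = df.to_numpy()
print(arr[positions].ravel().tolist())  # [1, 3]
```

If the DataFrame still has its default RangeIndex, .loc and .iloc return the same rows, which is why either can appear to work on freshly loaded data.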