KeyError: "None of [Index(['Location_Adelaide', 'Location_Albany', 'Location_Albury',\n 'Location_AliceSprings', 'Location_BadgerysCreek', 'Location_Ballarat',\n 'Location_Bendigo', 'Location_Brisbane', 'Location_Cairns',\n 'Location_Canberra',\n …\n 'WindDir3pm_SE', 'WindDir3pm_SSE', 'WindDir3pm_SSW', 'WindDir3pm_SW',\n 'WindDir3pm_Unknown', 'WindDir3pm_W', 'WindDir3pm_WNW',\n 'WindDir3pm_WSW', 'RainToday_No', 'RainToday_Yes'],\n dtype='object', length=102)] are in the [columns]"
This is the error that I am getting while trying to encode the categorical values. Please help.
Thank you.

Hey, welcome to the community.
Can you please share screenshots of the code or share your notebook?

Yes, I also get that error, after applying the following code, as mentioned in the lecture:

train_inputs[encoded_cols] = encoder.transform(train_inputs[categorical_cols].fillna('Unknown'))
val_inputs[encoded_cols] = encoder.transform(val_inputs[categorical_cols].fillna('Unknown'))
test_inputs[encoded_cols] = encoder.transform(test_inputs[categorical_cols].fillna('Unknown'))

This is the whole Error Message:

The rest of the code is the same as the notebook used in the video.

categorical_cols should contain the names of the categorical columns in the raw df. Please check the categorical_cols variable.

It does have the names of the categorical columns in the raw df.

Can you please share your notebook link? I have to run your notebook and check what’s the error.

How do I share the link? I have downloaded the HTML file.

You just have to open your notebook on the Jovian site and copy its link to share it.
Btw… This is your notebook’s link…

This notebook is working fine for me. Can you restart the runtime and run all cells again? @abhiramdegwekar

Yes sir. I tried, but it's not working.

Actually, I have noticed that the following does seem to work, where an intermediate variable ‘new_cols_train_inputs’ has been created. It must be noted that immediately after the Encoder is applied, an array is created, and not a DataFrame, so conversion to the latter is required. This has been demonstrated for train_inputs, but a similar methodology holds for the other two variables:

new_cols_train_inputs = encoder.transform(train_inputs[categorical_cols].fillna('Unknown'))

train_inputs[encoded_cols] = pd.DataFrame(new_cols_train_inputs)

I think wrapping it in a DataFrame will cause an index problem. The new DataFrame gets a fresh 0-based index, while column assignment aligns rows by index label. You can see it when test_inputs is displayed: the categorical columns don't match up with the one-hot encoded ones, and the tail end of the DataFrame is filled with NaN.
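The index problem described above can be reproduced on a tiny example. This is only a sketch: the frame, the `enc_a`/`enc_b` column names and the encoded values are made up for illustration and are not taken from the course notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a split such as test_inputs: row labels do not start at 0.
df = pd.DataFrame({"Location": ["Albury", "Cairns", "Perth"]}, index=[1, 2, 3])
encoded = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # made-up encoder output
new_cols = ["enc_a", "enc_b"]  # hypothetical column names

# The wrapped array gets the default index 0, 1, 2. Assignment aligns on labels,
# so labels 1 and 2 receive the wrong rows and label 3 receives NaN.
misaligned = df.copy()
misaligned[new_cols] = pd.DataFrame(encoded, columns=new_cols)

# Reusing df.index keeps each encoded row next to the row it was computed from.
aligned = df.copy()
aligned[new_cols] = pd.DataFrame(encoded, columns=new_cols, index=df.index)

print(misaligned)
print(aligned)
```

In the misaligned frame the first rows hold values that belong to a different row, and the last row is empty, which matches the symptoms described in the post above.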

I know that somewhere I had to perform a reset_index type operation for it to work.
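A reset_index step of the kind mentioned above might look roughly like the following. It is a sketch on made-up data, assuming the original row labels are not needed afterwards, since `reset_index(drop=True)` discards them.

```python
import numpy as np
import pandas as pd

# Hypothetical split whose row labels are left over from the original dataset.
inputs = pd.DataFrame({"RainToday": ["No", "Yes", "No"]}, index=[7, 42, 99])
encoded = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # made-up encoder output
encoded_cols = ["RainToday_No", "RainToday_Yes"]

# Relabel the rows 0..n-1 so they match the default index of the new DataFrame.
inputs = inputs.reset_index(drop=True)
inputs[encoded_cols] = pd.DataFrame(encoded, columns=encoded_cols)

print(inputs)
```

If the targets were split using the same original index, they would need the same reset to stay in step with the inputs.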

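Yeah, I ended up doing something like this:

# pass the original index so the encoded rows align with train_inputs on assignment
encoded_train = pd.DataFrame(encoder.transform(train_inputs[categorical_cols]), columns=encoded_cols, index=train_inputs.index)
train_inputs[encoded_cols] = encoded_train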
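A possible explanation for the original KeyError, offered as an assumption rather than something confirmed in this thread: some older pandas releases appear to reject `df[new_cols] = array` when none of `new_cols` exist yet, whereas a DataFrame on the right-hand side is assigned column by column. The sketch below avoids assigning to a list of new columns altogether by using `pd.concat`. The data is a made-up miniature of the weather set, and the column names are rebuilt from `encoder.categories_` so the code does not depend on a particular scikit-learn version.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up miniature of the weather data; the real train_inputs comes from the notebook.
train_inputs = pd.DataFrame(
    {"Location": ["Albury", None, "Cairns"], "RainToday": ["No", "Yes", "No"]},
    index=[5, 8, 13],
)
categorical_cols = ["Location", "RainToday"]
filled = train_inputs[categorical_cols].fillna("Unknown")

encoder = OneHotEncoder(handle_unknown="ignore").fit(filled)

# Build names such as 'Location_Albury' from the fitted categories.
encoded_cols = [
    f"{col}_{cat}"
    for col, cats in zip(categorical_cols, encoder.categories_)
    for cat in cats
]

encoded = encoder.transform(filled)
if hasattr(encoded, "toarray"):  # the default output is a SciPy sparse matrix
    encoded = encoded.toarray()

# Keep the original row labels, then append the new columns side by side.
encoded_train = pd.DataFrame(encoded, columns=encoded_cols, index=train_inputs.index)
train_inputs = pd.concat([train_inputs, encoded_train], axis=1)

print(train_inputs)
```

The same steps would apply to val_inputs and test_inputs, reusing the encoder fitted once rather than fitting a new one per split.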