Logistic Regression for Classification | Lesson 2 of 6

KeyError: "None of [Index(['Location_Adelaide', 'Location_Albany', 'Location_Albury',\n 'Location_AliceSprings', 'Location_BadgerysCreek', 'Location_Ballarat',\n 'Location_Bendigo', 'Location_Brisbane', 'Location_Cairns',\n 'Location_Canberra',\n ...\n 'WindDir3pm_SE', 'WindDir3pm_SSE', 'WindDir3pm_SSW', 'WindDir3pm_SW',\n 'WindDir3pm_Unknown', 'WindDir3pm_W', 'WindDir3pm_WNW',\n 'WindDir3pm_WSW', 'RainToday_No', 'RainToday_Yes'],\n dtype='object', length=102)] are in the [columns]"

This is the error that I am getting while trying to encode the categorical values. Please help.
Thank you.

Hey, welcome to the community.
Can you please share screenshots of the code or share your notebook?

Yes, I also get that error, after applying the following code, as mentioned in the lecture:

train_inputs[encoded_cols] = encoder.transform(train_inputs[categorical_cols].fillna('Unknown'))
val_inputs[encoded_cols] = encoder.transform(val_inputs[categorical_cols].fillna('Unknown'))
test_inputs[encoded_cols] = encoder.transform(test_inputs[categorical_cols].fillna('Unknown'))

This is the whole Error Message:

KeyError                                  Traceback (most recent call last)
<ipython-input-76-6befa68430f6> in <module>
----> 1 train_inputs[encoded_cols] = encoder.transform(train_inputs2[categorical_cols].fillna('Unknown'))
      2 val_inputs[encoded_cols] = encoder.transform(val_inputs2[categorical_cols].fillna('Unknown'))
      3 test_inputs[encoded_cols] = encoder.transform(test_inputs2[categorical_cols].fillna('Unknown'))

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   3365             self._setitem_frame(key, value)
   3366         elif isinstance(key, (Series, np.ndarray, list, Index)):
-> 3367             self._setitem_array(key, value)
   3368         else:
   3369             # set column

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _setitem_array(self, key, value)
   3391                     self[k1] = value[k2]
   3392             else:
-> 3393                 indexer = self.loc._convert_to_indexer(key, axis=1)
   3394                 self._check_setitem_copy()
   3395                 self.loc._setitem_with_indexer((slice(None), indexer), value)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter, raise_missing)
   1352                 kwargs = {'raise_missing': True if is_setter else
   1353                           raise_missing}
-> 1354                 return self._get_listlike_indexer(obj, axis, **kwargs)[1]
   1355         else:
   1356             try:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1159         self._validate_read_indexer(keyarr, indexer,
   1160                                     o._get_axis_number(axis),
-> 1161                                     raise_missing=raise_missing)
   1162         return keyarr, indexer

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1244                 raise KeyError(
   1245                     u"None of [{key}] are in the [{axis}]".format(
-> 1246                         key=key, axis=self.obj._get_axis_name(axis)))
   1248             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Index(['Location_Adelaide', 'Location_Albany', 'Location_Albury',\n       'Location_AliceSprings', 'Location_BadgerysCreek', 'Location_Ballarat',\n       'Location_Bendigo', 'Location_Brisbane', 'Location_Cairns',\n       'Location_Canberra',\n       ...\n       'WindDir3pm_SE', 'WindDir3pm_SSE', 'WindDir3pm_SSW', 'WindDir3pm_SW',\n       'WindDir3pm_Unknown', 'WindDir3pm_W', 'WindDir3pm_WNW',\n       'WindDir3pm_WSW', 'RainToday_No', 'RainToday_Yes'],\n      dtype='object', length=102)] are in the [columns]"

The rest of the code is the same as the notebook used in the video.

categorical_inputs should have the names of the categorical columns in the raw df. Please check the categorical_inputs variable.

It does have the names of the categorical columns in the raw df.

Can you please share your notebook link? I need to run your notebook to see what the error is.

How do I share the link? I have downloaded the HTML file.

You just have to open your notebook on the Jovian site and copy its link to share it.
Btw… This is your notebook’s link…


This notebook is working fine. Can you restart the runtime and run all cells again? @abhiramdegwekar

Yes sir. I tried, but it's not working.

Actually, I have noticed that the following does seem to work, using an intermediate variable new_cols_train_inputs. Note that the encoder returns an array, not a DataFrame, so it has to be converted to one. This is shown for train_inputs, but the same approach applies to the other two variables:

new_cols_train_inputs = encoder.transform(train_inputs[categorical_cols].fillna('Unknown'))

train_inputs[encoded_cols] = pd.DataFrame(new_cols_train_inputs)


I think there will be a problem with indexes if you wrap it in a DataFrame. When test_inputs is displayed, the categorical columns don't match up with the one-hot encoded ones, and the tail end of the DataFrame is filled with NaN.

I know that somewhere I had to perform a reset_index type operation for it to work.
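The index issue described above can be reproduced in a small sketch. Everything here is made up for illustration: the tiny frame, its row labels, and the two column names. The hard-coded array stands in for the output of encoder.transform. Columns are assigned one at a time so that the sketch does not depend on how a given pandas version handles assigning to a list of new columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_inputs after a train/test split:
# the row labels are no longer 0, 1, 2.
inputs = pd.DataFrame({"RainToday": ["No", "Yes", "No"]}, index=[5, 0, 9])

# Stand-in for the array returned by encoder.transform(...).
encoded = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
encoded_cols = ["RainToday_No", "RainToday_Yes"]

# Wrapping the array in a DataFrame gives it a fresh 0..n-1 index.
wrapped = pd.DataFrame(encoded, columns=encoded_cols)

# Column assignment aligns on index labels, not on row position,
# so labels 5 and 9 find no match and the row labelled 0 receives
# the values that belong to the first row.
misaligned = inputs.copy()
for col in encoded_cols:
    misaligned[col] = wrapped[col]

# Resetting the index first makes the labels line up again.
aligned = inputs.reset_index(drop=True)
for col in encoded_cols:
    aligned[col] = wrapped[col]

print(misaligned)
print(aligned)
```

In the misaligned frame, the "Yes" row ends up flagged as RainToday_No and the other rows are NaN, which matches the symptoms described for test_inputs.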

Yeah, I ended up doing something like this:

encoded_train = pd.DataFrame(encoder.transform(train_inputs[categorical_cols]), columns=encoded_cols)
train_inputs[encoded_cols] = encoded_train
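A possible variation on the snippet above, offered as a sketch under assumptions and not as the course's method: passing index=train_inputs.index when building the DataFrame keeps the encoded rows aligned with the original rows without resetting any index, and pd.concat attaches the new columns without assigning to a list of column names that do not exist yet (the operation that raised the KeyError in the traceback). The small frame and column names are invented, and the hard-coded array stands in for encoder.transform(...).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_inputs with a shuffled index.
train_inputs = pd.DataFrame({"RainToday": ["No", "Yes", None]}, index=[5, 0, 9])
categorical_cols = ["RainToday"]
encoded_cols = ["RainToday_No", "RainToday_Unknown", "RainToday_Yes"]

# Same missing-value handling as in the lecture code.
filled = train_inputs[categorical_cols].fillna("Unknown")

# Stand-in for encoder.transform(filled): one row per input row,
# one column per name in encoded_cols.
encoded = np.array(
    [[1.0, 0.0, 0.0],   # "No"
     [0.0, 0.0, 1.0],   # "Yes"
     [0.0, 1.0, 0.0]]   # "Unknown"
)

# Reuse the original row labels so alignment is by construction.
encoded_train = pd.DataFrame(encoded, columns=encoded_cols, index=train_inputs.index)

# Attach the new columns side by side.
train_inputs = pd.concat([train_inputs, encoded_train], axis=1)

print(train_inputs)
```

The same pattern would apply to val_inputs and test_inputs, each with its own index.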