Assignment 1 : Getting errors with NaN values


Getting error : Input contains NaN
due to this code :
input_df[encoded_cols] = encoder.transform(input_df[categorical_cols].values)
I fixed it by replacing .values with .fillna('unknown'). Is this the correct way?
Can you tell me why the error goes away after replacing the NaN values with 'unknown'?
Thanks in advance.

Hey, you are getting this error because the categorical columns contain missing values (NaN), which the encoder could not encode. When you do fillna('unknown') you are basically creating a new category 'unknown', which will now be included in the encoded_cols. Newer versions of scikit-learn also let OneHotEncoder() treat NaN as a category of its own, and the handle_unknown="ignore" parameter makes the encoder skip categories it did not see during fit instead of raising an error. For this, the scikit-learn library needs to be updated using !pip install scikit-learn --upgrade
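To make the explanation above concrete, here is a minimal sketch of the fillna approach. The toy DataFrame, the column name Location, and its values are made up for illustration and are not from the assignment. It assumes a reasonably recent scikit-learn and pandas.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column with a missing value.
train_df = pd.DataFrame({"Location": ["Albury", "Perth", np.nan, "Perth"]})
categorical_cols = ["Location"]

# Replace NaN with an explicit placeholder category before fitting.
filled = train_df[categorical_cols].fillna("unknown")

# handle_unknown="ignore" encodes categories unseen during fit as all zeros
# instead of raising an error at transform time.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(filled)

# transform returns a sparse matrix by default, so convert it to a dense array.
encoded = encoder.transform(filled).toarray()
categories = list(encoder.categories_[0])

# A category the encoder never saw during fit ("Sydney" is made up here).
unseen = encoder.transform(pd.DataFrame({"Location": ["Sydney"]})).toarray()

print(categories)
print(encoded)
print(unseen)
```

The placeholder shows up as one more entry in encoder.categories_, which is why the row that used to be NaN now gets its own one-hot column rather than triggering the error.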


Thanks for the fix @birajde. It is really insightful!!

However, !pip install scikit-learn --update was not working in Google Colab. To install the newer version in Google Colab, I used !pip install -U scikit-learn and it worked.

After the package is installed, please restart the notebook so that the changes take effect in the current session.
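A quick way to confirm that the restart picked up the upgrade is to print the version loaded in the running session. This is only a sketch. The 0.24 threshold for NaN handling in OneHotEncoder is my assumption from the release notes, so please check the changelog for your version.

```python
import sklearn

# Show the version that is actually loaded in the running session.
print(sklearn.__version__)

# Compare only the major and minor parts, e.g. "1.3.2" gives (1, 3).
major, minor = (int(part) for part in sklearn.__version__.split(".")[:2])

# Assumption: OneHotEncoder treats NaN as its own category from 0.24 onwards.
supports_nan_categories = (major, minor) >= (0, 24)
print(supports_nan_categories)
```

If the old version is still printed, the notebook runtime has most likely not been restarted yet.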

Sorry, it should be !pip install scikit-learn --upgrade. It was a typo on my end. I will update the previous post. Thanks for replying.

Hey Hi

Great!!!
I didn’t know that !pip install scikit-learn --upgrade would also work.
Now I know 2 ways to update the package.

Thanks anyways!!
