I am getting the error: Input contains NaN
It is raised by this line: input_df[encoded_cols] = encoder.transform(input_df[categorical_cols].values)
I fixed it by replacing .values with .fillna('unknown'). Is this the correct way?
Can you tell me why the error goes away after replacing the NaN values with 'unknown'?
Thanks in advance.
Hey, you are getting this error because the categorical columns contain missing values (NaN), which the encoder could not process. When you call fillna("unknown"), you are basically creating a new category "unknown", which will now be included in encoded_cols. You can also handle this with a newer version of scikit-learn: versions 0.24 and later treat NaN as a category of its own, and passing handle_unknown="ignore" to OneHotEncoder() encodes any category that was not seen during fit as all zeros instead of raising an error. For this, the scikit-learn library needs to be updated using !pip install scikit-learn --upgrade
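To make the fillna approach and handle_unknown="ignore" concrete, here is a minimal sketch. It assumes a made-up column named city and toy data (neither comes from the original notebook), and it builds the encoded column names by hand from encoder.categories_ so that it does not depend on a particular scikit-learn version.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data for illustration only; "city" is an assumed column name.
train_df = pd.DataFrame({"city": ["Delhi", "Mumbai", np.nan, "Delhi"]})
input_df = pd.DataFrame({"city": ["Mumbai", np.nan, "Pune"]})
categorical_cols = ["city"]

# Replace NaN with a placeholder in BOTH frames, so the encoder learns
# "unknown" as a regular category during fit.
train_filled = train_df[categorical_cols].fillna("unknown")
input_filled = input_df[categorical_cols].fillna("unknown")

# handle_unknown="ignore" encodes categories not seen during fit as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train_filled)

# Build one output column name per learned category.
encoded_cols = [
    f"{col}_{cat}"
    for col, cats in zip(categorical_cols, encoder.categories_)
    for cat in cats
]

# transform returns a sparse matrix by default, so convert it to a dense array.
encoded = encoder.transform(input_filled).toarray()
input_df[encoded_cols] = encoded
print(input_df)
```

In this sketch the NaN row ends up in the city_unknown column, while "Pune", which the encoder never saw during fit, becomes a row of zeros rather than an error.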
Thanks for the fix, @birajde. It is really insightful!
However, !pip install scikit-learn --update was not working in Google Colab. To install the newer version in Google Colab, use !pip install -U scikit-learn instead; that worked for me.
After the package is installed, please restart the notebook so that the changes take effect in the current session.
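A quick way to confirm that the upgrade took effect after the restart is to print the installed version. This is only a sketch; installed_version is a name chosen here for illustration.

```python
import sklearn

# The version string of the scikit-learn package loaded in this session.
installed_version = sklearn.__version__
print("scikit-learn version:", installed_version)
```

If this still shows the old version, the runtime was probably not restarted after the install.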