Error in OneHotEncoder

When I ran this, I encountered the error below.

Kindly review.

ValueError Traceback (most recent call last)
in ()
----> 1 enc.fit(medical_df[['region']])
2 enc.categories_

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\preprocessing\data.py in fit(self, X, y)
1954 self
1955 """
-> 1956 self.fit_transform(X)
1957 return self
1958

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\preprocessing\data.py in fit_transform(self, X, y)
2017 """
2018 return _transform_selected(X, self._fit_transform,
-> 2019 self.categorical_features, copy=True)
2020
2021 def _transform(self, X):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\preprocessing\data.py in _transform_selected(X, transform, selected, copy)
1807 X : array or sparse matrix, shape=(n_samples, n_features_new)
1808 """
-> 1809 X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
1810
1811 if isinstance(selected, six.string_types) and selected == "all":

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
431 force_all_finite)
432 else:
--> 433 array = np.array(array, dtype=dtype, order=order, copy=copy)
434
435 if ensure_2d:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in __array__(self, dtype)
1779
1780 def __array__(self, dtype=None) -> np.ndarray:
-> 1781 return np.asarray(self._values, dtype=dtype)
1782
1783 def __array_wrap__(self, result, context=None):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85

ValueError: could not convert string to float: 'southwest'


The error doesn't seem to originate from this cell. Have you reassigned enc in an earlier cell? Use type(enc) to check whether it is an instance of the OneHotEncoder class.
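A minimal sketch of that check, assuming a small hypothetical stand-in for medical_df that contains only the region column. It confirms that enc really is a OneHotEncoder (and was not overwritten in an earlier cell), then fits it on the string column. On a recent scikit-learn, categories_ lists the learned regions in sorted order.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for medical_df: only the 'region' column matters here.
medical_df = pd.DataFrame(
    {"region": ["southwest", "southeast", "northwest", "northeast", "southwest"]}
)

enc = OneHotEncoder(handle_unknown="ignore")

# Confirm enc is still a OneHotEncoder and has not been reassigned elsewhere.
print(type(enc))
assert isinstance(enc, OneHotEncoder)

# Fit on a 2-D selection (double brackets give a DataFrame, not a Series).
enc.fit(medical_df[["region"]])
print(enc.categories_)
```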

Me too:

KeyError: "None of [Index(['northeast', 'northwest', 'southeast', 'southwest'], dtype='object')] are in the [columns]"

What can I do?

Hey, maybe you are using df[['northeast', 'northwest', 'southeast', 'southwest']]. That syntax accesses the listed columns of the DataFrame, but these columns are not in the DataFrame yet. Use df['northeast', 'northwest', 'southeast', 'southwest'] to set the values of these columns.

medical_df['northeast', 'northwest', 'southeast', 'southwest'] = one_hot

ValueError: Wrong number of items passed 4, placement implies 1

How can I add these 4 columns, please?
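Why that assignment fails: inside df[...], names separated by commas form a single tuple key, so pandas treats them as one column label and cannot place 4 columns of values under it. The sketch below uses a small hypothetical DataFrame and a hypothetical one_hot array standing in for the encoder's output. It shows the tuple key, the KeyError you get when reading columns that don't exist yet, and a version-independent workaround that assigns each new column separately.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"region": ["northeast", "southwest", "southeast"]})
new_cols = ["northeast", "northwest", "southeast", "southwest"]

# Hypothetical encoder output: one row per sample, one column per region.
one_hot = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

# Names separated by commas inside [] form ONE tuple key, i.e. one column label,
# so pandas cannot place 4 columns of values under it.
key = "northeast", "northwest", "southeast", "southwest"
print(type(key))  # <class 'tuple'>

# A list inside [] selects columns; reading names that don't exist raises KeyError.
try:
    df[new_cols]
except KeyError as exc:
    print("KeyError:", exc)

# Assigning the new columns one at a time works regardless of pandas version.
for i, col in enumerate(new_cols):
    df[col] = one_hot[:, i]

print(df)
```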

Try this:
df[['northeast', 'northwest', 'southeast', 'southwest']] = encoder.transform(train_inputs[categorical_cols])

Hey, please run the cells before this one. It seems you have not executed the cell where train_inputs is defined.

Thank you for your patience, sir.
I have tried various approaches, but unfortunately nothing works :pensive:
I did all the exercises, but I can't go on without solving this problem.

Can you try executing a cell that shows what's inside train_inputs? It seems train_inputs is not defined anywhere in this notebook.
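One way to check this is to look the name up in the notebook's global namespace before using it. The helper is_defined below is hypothetical (it is not part of Jupyter or the course notebook); in Jupyter, the %who magic also lists the variables currently defined.

```python
def is_defined(name, namespace=None):
    """Return True if `name` exists in `namespace` (defaults to this module's globals)."""
    if namespace is None:
        namespace = globals()
    return name in namespace


# Report which of the variables mentioned in this thread exist right now.
for name in ["train_inputs", "inputs_train", "medical_df"]:
    status = "defined" if is_defined(name) else "NOT defined"
    print(f"{name}: {status}")
```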


What do I need to type to define it?

Is this the lecture notebook? It should be defined in the lecture notebook. Can you share the link to this notebook with me?

Here is my notebook: https://jovian.ai/math-nights/python-sklearn-linear-regression

I download it, run it in Jupyter Notebook on my computer, do my work and solve the exercises, then save it to Jovian.

In this notebook, train_inputs is named inputs_train: that is the name assigned when splitting the input data into training and validation sets. Use inputs_train in place of train_inputs.
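To illustrate why the name matters: whatever names the splitting cell assigns are the only ones later cells can use. A sketch, assuming scikit-learn's train_test_split and a few hypothetical rows with age, region and charges columns; the actual split cell in the notebook may use different columns and arguments.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical placeholder rows; the real medical_df has more columns and rows.
medical_df = pd.DataFrame({
    "age": [19, 18, 28, 33, 32, 31, 46, 37],
    "region": ["southwest", "southeast", "southeast", "northwest",
               "northwest", "southeast", "southeast", "northwest"],
    "charges": [float(1000 * i) for i in range(1, 9)],
})

input_cols = ["age", "region"]
target_col = "charges"

# The names on the left are the ones later cells must use (here: inputs_train).
inputs_train, inputs_val, targets_train, targets_val = train_test_split(
    medical_df[input_cols], medical_df[target_col], test_size=0.25, random_state=42
)

print(len(inputs_train), len(inputs_val))
```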


When I run the cell on Binder, everything is OK.
But on my computer, the cell doesn't work at all.

Maybe it's a version issue. Try updating pandas: !pip install pandas --upgrade
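Since the cell works on Binder but not locally, comparing library versions between the two environments is a reasonable next step. The traceback in the first post goes through categorical_features and _transform_selected in sklearn\preprocessing\data.py, which appears to match the OneHotEncoder from scikit-learn releases before 0.20. Those releases expected integer-coded categories; string categories such as 'southwest' (and the categories_ attribute) arrived in 0.20. If that is the cause here, upgrading scikit-learn (for example !pip install scikit-learn --upgrade) may matter more than upgrading pandas. A small sketch for printing the versions in each environment (library_versions is a hypothetical helper):

```python
import numpy
import pandas
import sklearn


def library_versions():
    """Return the installed versions of the libraries involved in the error."""
    return {
        "numpy": numpy.__version__,
        "pandas": pandas.__version__,
        "sklearn": sklearn.__version__,
    }


# Run this in both environments (Binder and the local Jupyter) and compare.
for name, version in library_versions().items():
    print(f"{name:>8}: {version}")
```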

It’s still showing the same error, unfortunately!

Hey, I ran your code and debugged the error.


You were facing the issue because:

  • inputs_target is defined after this section and had not been executed yet, which caused the NameError. Use medical_df instead.
  • categorical_cols should be the names of the categorical columns in medical_df, not the new columns created by one-hot encoding.
  • The new columns created by one-hot encoding are named encoded_cols in the code given above.
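The three points above can be sketched as follows, assuming a small hypothetical medical_df and categorical_cols = ['region']: fit the encoder on the original categorical columns of medical_df, take the new column names (encoded_cols) from the learned categories, and write the encoded values into those new columns. Setting several new columns at once this way relies on a reasonably recent pandas; the last step is a quick sanity check of the kind one might use to test the encoder.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample rows standing in for medical_df.
medical_df = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "region": ["southwest", "southeast", "northwest", "northeast"],
})

# Existing categorical columns of medical_df (NOT the one-hot column names).
categorical_cols = ["region"]

enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(medical_df[categorical_cols])

# Names of the new columns produced by one-hot encoding.
encoded_cols = list(enc.categories_[0])

# Fill the new columns; toarray() converts the sparse output to a dense array.
medical_df[encoded_cols] = enc.transform(medical_df[categorical_cols]).toarray()

# Sanity check: a single known region should map to exactly one 1.
sample = pd.DataFrame({"region": ["northeast"]})
print(enc.transform(sample).toarray())
print(medical_df)
```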

If all of this seems difficult now, don't worry: one-hot encoding is covered in more depth in the next lecture and in the assignment notebook. Go through them, and come back to this once you have had some practice with the lecture notebook.
PS: Cell In [162] is not required; I used it to test whether the encoder was working correctly.


Finally :grinning:

Thanks for your patience with us, sir.
Thank you for your great effort in answering our questions in a clear, understandable, and simple manner.
Best Regards.


I have been able to resolve this. I'm moving on to the next video.