I’m working on the Diabetes dataset and I’m stuck on the data cleaning phase.
Can anyone tell me how to decide, by looking at the histogram, whether to impute the missing values with the mean, median, or mode?
I want to replace the zero values in Glucose, BP, SkinThickness, Insulin, and BMI with values that fit best.
I would recommend checking the post below.
I think it depends on how you plan to work and what you are trying to build. Regardless, I would not recommend filling the values with the mean where there are outliers, because outliers pull the mean away from the typical value. Where the data is concentrated in one region, for example in Glucose, I would suggest the mean; for the others you can use the median. Where a large share of the data sits at a single value, for example in Insulin, the mode can be a good option too. That said, imputation depends on your plan of work. You might not even need to fill in the missing values.
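In case it helps, here is a rough sketch of how that could look in pandas. The data frame is made-up toy data, not the real dataset, and the column names and the per-column choice of mean or median are only assumptions to illustrate the idea. The zeros are turned into NaN first so they do not distort the statistics used for filling.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the diabetes dataset (values are invented).
df = pd.DataFrame({
    "Glucose": [148, 85, 0, 89, 137],
    "Insulin": [0, 0, 94, 168, 0],
    "BMI": [33.6, 26.6, 0.0, 28.1, 43.1],
    "Outcome": [1, 0, 1, 0, 1],  # zeros here are real labels, leave them alone
})

# Columns where a zero really means "missing" (assumed for this sketch).
cols = ["Glucose", "Insulin", "BMI"]

clean = df.copy()
for col in cols:
    # Mark zeros as NaN so they are excluded from mean/median.
    clean[col] = clean[col].replace(0, np.nan)

# Hypothetical per-column strategy: mean for roughly symmetric columns,
# median for skewed ones or ones with outliers.
strategy = {"Glucose": "mean", "Insulin": "median", "BMI": "median"}
fill_values = {col: getattr(clean[col], how)() for col, how in strategy.items()}

df_filled = clean.fillna(fill_values)
print(df_filled)
```

Note that if you want to try the mode, compute it after the zeros are marked as NaN. Otherwise zero itself is likely to come out as the most frequent value.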