Remove the Outliers

Why did he not eliminate the outliers?

I am not sure, but it was Lesson 1, so Sir was trying to keep it simple. I guess that is why he didn't go into much detail, and I think that's good for everyone…
Maybe he will explain it in future lessons.

Hey @amaniharv20, welcome to the community.
I don't see any outliers in the first lecture's dataset. If you look at the image below, it mentions that:

The ranges of values in the numerical columns seem reasonable too (no negative ages!), so we may not have to do much data cleaning or correction. The “charges” column seems to be significantly skewed however, as the median (50 percentile) is much lower than the maximum value.
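To make the "skewed" point concrete, here is a minimal sketch of how one might compare the median, mean, and maximum of a column with pandas. The numbers are made-up stand-ins, not the lecture's dataset, and `skew_summary` is just a hypothetical helper name.

```python
import pandas as pd

# Hypothetical values standing in for a "charges" column (NOT the course data).
charges = pd.Series(
    [1200.0, 2100.0, 3500.0, 4800.0, 6400.0,
     9300.0, 11000.0, 14500.0, 39000.0, 63000.0]
)

def skew_summary(values: pd.Series) -> dict:
    """Collect the statistics that hint at a right-skewed distribution."""
    return {
        "median": values.median(),   # the 50th percentile
        "mean": values.mean(),       # pulled upward by a long right tail
        "max": values.max(),
        "skewness": values.skew(),   # positive for a right-skewed column
    }

summary = skew_summary(charges)
print(summary)
```

When the mean sits well above the median and the maximum is far above both, the column has a long right tail, which is what the quoted note describes for "charges".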


But he said many times that there were outliers.

I guess by outliers Sir means that these values are affecting the result.

Well, when you look at the graph, these look like outliers, but they are necessary. They differ from the most common values of charges, yet they are still within an acceptable range and don't appear to be extraordinary values, experimental errors, or input errors. We don't need to remove them because they are real-world data and justifiable (some people of a certain age might have high medical charges due to accidents or serious illness). If we remove them, our model will only learn from the remaining values and won't perform well in the real world.
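If you want to see which rows a common rule of thumb would call outliers without deleting anything, a rough sketch is below. It assumes the 1.5 × IQR rule, which is my choice for illustration and not something the lecture prescribes; the data and the `flag_iqr_outliers` helper are hypothetical.

```python
import pandas as pd

def flag_iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Hypothetical rows standing in for the insurance data (NOT the course data).
df = pd.DataFrame({
    "age": [19, 23, 28, 31, 36, 42, 47, 53, 58, 62],
    "charges": [1200.0, 2100.0, 3500.0, 4800.0, 6400.0,
                9300.0, 11000.0, 14500.0, 39000.0, 63000.0],
})

# Mark the rows instead of dropping them, so the real-world cases stay in the data.
df["is_outlier"] = flag_iqr_outliers(df["charges"])
print(df[df["is_outlier"]])
```

Flagging keeps every row available for training while still letting you inspect the high-charge cases and decide whether any of them are actual data-entry errors.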


Thank you, I got it. I really appreciate your dedication.
