How do I decide whether .describe() shows reasonable values?

Do the ranges of the numeric columns seem reasonable? If not, we may have to do some data cleaning as well.

How can I check whether something is reasonable or not?
What steps should I follow to clean the data based on what is reasonable?

Hey @prashantbharti19, welcome to the forum.
When you work with a dataset, you need some knowledge of the domain it comes from. If you don't have it, search online or ask experts in that field. For example, suppose there is a feature "highest rainfall (in cm)" and its max value is 100000. We know that 100000 cm (1 km) of rainfall is not possible on Earth, so that value is not reasonable.

Another example is the age feature in the StackOverflow survey. Suppose the minimum age in the survey is 1 and the maximum is 120 or 130, which seems unreasonable. It might be a typing error, or one or two people aged 120 to 130 might really have filled in the survey. In cases like this we can ignore those few responses, because we can't be sure whether the values were entered intentionally or by mistake.
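A minimal pandas sketch of that idea, assuming your data is already in a DataFrame. The column names, the sample values and the 10 to 100 age bounds are all made up for illustration; the bounds should come from your own domain knowledge, not from the data. `out_of_range_rows` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Hypothetical survey data (values invented for illustration only).
df = pd.DataFrame({
    "Age": [25, 31, 1, 42, 130, 28],
    "YearsCode": [5, 10, 0, 20, 3, 4],
})

# Step 1: look at the min/max that describe() reports for each numeric column.
print(df.describe().loc[["min", "max"]])

# Step 2: write down the range you believe is plausible for each column.
# These bounds are an assumption chosen from domain knowledge.
plausible_ranges = {"Age": (10, 100)}

def out_of_range_rows(frame, ranges):
    """Hypothetical helper: True for rows with any value outside its plausible range.

    Note: Series.between returns False for NaN, so missing values are flagged too.
    """
    mask = pd.Series(False, index=frame.index)
    for col, (low, high) in ranges.items():
        mask |= ~frame[col].between(low, high)
    return mask

suspicious = out_of_range_rows(df, plausible_ranges)
print(df[suspicious])        # inspect these rows before deciding what to do
cleaned = df[~suspicious]    # one option: ignore (drop) them
```

Dropping is only one option; depending on the data you might instead replace the suspicious values with NaN and handle them as missing.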

Okay, got it.
Can you tell me which steps I should follow while cleaning the data? Or you could send me links to some articles.
Thanks in advance!


Here’s an excellent article by Tableau → https://www.tableau.com/learn/articles/what-is-data-cleaning
There are many other great articles out there (just google it), and I'll send any good ones I find in the future. Beyond that, all of this comes with practice and working with real data.
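For a rough idea of what common cleaning steps can look like in pandas, here is a minimal sketch. `basic_clean` is a hypothetical helper, the sample data is invented, and the order of steps and the choice to drop rows with missing values are assumptions; real datasets usually need steps tailored to them.

```python
import pandas as pd

def basic_clean(frame, numeric_cols, ranges):
    """Hypothetical helper showing a few common cleaning steps."""
    out = frame.copy()
    # 1. Fix structural issues: strip stray whitespace from column names.
    out.columns = out.columns.str.strip()
    # 2. Remove exact duplicate rows.
    out = out.drop_duplicates()
    # 3. Convert numeric columns; entries that can't be parsed become NaN.
    for col in numeric_cols:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # 4. Treat values outside the plausible range (assumed bounds) as missing.
    for col, (low, high) in ranges.items():
        out[col] = out[col].where(out[col].between(low, high))
    # 5. Handle missing values; here we simply drop those rows.
    out = out.dropna(subset=numeric_cols)
    return out.reset_index(drop=True)

# Invented example data with a messy column name, a duplicate row,
# a non-numeric entry and an implausible age.
raw = pd.DataFrame({
    " Age ": ["25", "31", "thirty", "130", "25"],
    "Country": ["India", "USA", "UK", "India", "India"],
})

cleaned = basic_clean(raw, numeric_cols=["Age"], ranges={"Age": (10, 100)})
print(cleaned)
```

Dropping rows in step 5 is the simplest choice; filling missing values (for example with a median) is a common alternative when you can't afford to lose rows.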