How to encode ordinal variables for machine learning?

  • What are ordinal variables?
  • When should one encode ordinal variables?
  • How to encode ordinal variables?

Categorical variables can be divided into two subcategories based on the kind of elements they group:

  • Nominal variables are those whose categories do not have a natural order or ranking. Examples that fit in this category are gender, postal codes, hair colour, etc.
  • Ordinal variables have an inherent order that carries meaning. An example is student grades, where Grade 1 > Grade 2 > Grade 3. Another example is the socio-economic status of people, where “high income” > “low income”.

To learn more about encoding nominal variables, check out “What is dummy variable trap?”


Let us look at the following column from a data set containing candidates for a job. The column contains the highest qualification of each candidate, and educational background matters to the company hiring for a particular role.


Here, the highest qualification holds significance and can be encoded in the following way:
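As a minimal sketch of such an encoding, the snippet below maps each qualification to an integer weight that preserves the order. The qualification names, their ordering, and the weights are assumptions made for illustration; they are not taken from the original data set. Plain Python is used to keep the example dependency-free.

```python
# Hypothetical qualification levels, listed from lowest to highest.
# The names and the resulting weights are assumptions for illustration only.
QUALIFICATION_ORDER = ["High School", "Bachelor's", "Master's", "PhD"]

# Map each level to an integer weight that preserves the order (1 = lowest).
QUALIFICATION_WEIGHTS = {
    level: rank for rank, level in enumerate(QUALIFICATION_ORDER, start=1)
}


def encode_qualifications(values):
    """Replace each qualification label with its ordinal weight."""
    return [QUALIFICATION_WEIGHTS[value] for value in values]


# Hypothetical column of candidate qualifications.
candidates = ["Master's", "High School", "PhD", "Bachelor's", "Master's"]
encoded = encode_qualifications(candidates)
print(encoded)  # [3, 1, 4, 2, 3]
```

Note that evenly spaced integers imply that the gap between adjacent levels is the same everywhere. If that does not match the problem, the weights can be chosen differently as long as the order is preserved.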



Now consider another example, where the column records the day on which the candidate filed the application:

The days do have a natural ordering, but that order holds no significance in this case. The column can therefore be treated as a nominal variable, or dropped altogether, depending on the desired output.
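Treating the day column as nominal could look like the sketch below, which one-hot encodes each day so that no ordering is implied. The day values are assumptions made for illustration, and `one_hot` is a hypothetical helper written for this example, not a library function.

```python
# Hypothetical application days; the values are assumptions for illustration.
application_days = ["Monday", "Wednesday", "Monday", "Friday"]

# Collect the distinct categories. Sorting only fixes a stable column order;
# it does not give the categories any meaning as ranks.
categories = sorted(set(application_days))  # ['Friday', 'Monday', 'Wednesday']


def one_hot(value, categories):
    """Return a 0/1 indicator vector with a single 1 at the value's category."""
    return [1 if value == category else 0 for category in categories]


encoded_days = [one_hot(day, categories) for day in application_days]

for day, row in zip(application_days, encoded_days):
    print(day, row)
```

Each row contains exactly one 1, so the model sees the days as distinct labels rather than as points on a scale.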



To summarise, ordinal variables can be encoded by assigning weights in the order of significance, after which the desired machine learning algorithm can be applied to the data set.
Ordinal variables can also be treated as nominal variables if their natural ordering holds no significance for the desired output.


Many thanks for this clarification.


Thanks for the nice explanation.
