Lesson 4 - Random Forests and Regularization

:arrow_forward: Lecture Video will be available on the course page :point_up_2:

Topics Covered:

  • Training and interpreting random forests
  • Overfitting, hyperparameter tuning & regularization
  • Making predictions on single inputs

:spiral_notepad: Notebooks used in this lesson:

:writing_hand: Please provide your valuable feedback on this link to help us improve the course experience.

:computer: Join the Jovian Discord Server to interact with the course team, share resources and attend the study hours :point_right: Jovian Discord Server

:question: Asking/Answering Questions

Reply to this thread to ask questions. Before asking, scroll through the thread and check if your question (or a similar one) is already present. If yes, just like it. We will give priority to the questions with the most likes. The rest will be answered by our mentors or the community. If you see a question you know the answer to, please post your answer as a reply to that question. Let’s help each other learn!

When sweeping over hyperparameters of a decision tree/random forest, is there a widely accepted order in which to optimize them?

In other words, we don’t want to spend several hours optimizing one subset of parameters, only to find that optimizing a different parameter completely dominates the reduction in the loss.
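One common way to sidestep the ordering question is to search several hyperparameters jointly instead of one at a time, for example with scikit-learn's RandomizedSearchCV, which samples random combinations from a search space and cross-validates each one. This is a minimal sketch, not a recommended recipe: make_classification data stands in for the course dataset, and the search space and n_iter are illustrative assumptions rather than values from the lesson.

```python
# Sketch: tune several random forest hyperparameters jointly.
# Synthetic data and the search space below are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Each key is a hyperparameter; each list holds candidate values to sample from
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10, 20],
    "min_samples_leaf": [1, 3, 5],
    "max_features": ["sqrt", "log2", None],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=8,          # number of random combinations to evaluate
    cv=3,              # 3-fold cross-validation per combination
    random_state=42,   # makes the sampled combinations reproducible
)
search.fit(X, y)

print(search.best_params_)  # best combination found among those sampled
print(search.best_score_)   # its mean cross-validated accuracy
```

Because every sampled combination varies all parameters at once, interactions between them (for example max_depth and min_samples_leaf) are evaluated together rather than in a fixed sequence.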

How can I use a decision tree from a random forest as a separate model?

There’s a question about validating an individual tree of a random forest against the whole forest. However, if I use the estimators_ attribute to get an individual tree, I can’t use the score method on that tree.

For example, if I get a tree:

tree = model.estimators_[1]
tree.score(X_train, train_target)

I get an error.


My feature importances differ from what Aakash got in the video. Is that alright? They also don’t match the importances from my decision tree. @birajde

I don’t think they should differ if you set random_state=42 when defining the decision tree model and trained it on the same data. Even so, a difference is fine: different models can learn different splits, so their feature importances won’t always match.
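On why a forest's importances won't match a single tree's: in scikit-learn, a RandomForestClassifier's feature_importances_ is the normalized average of the impurity-based importances of all its trees, each trained on a different bootstrap sample, so it generally differs from one DecisionTreeClassifier's. A minimal sketch comparing the two side by side, assuming synthetic make_classification data and made-up column names in place of the course dataset:

```python
# Sketch: compare feature importances of one decision tree and a random forest
# trained on the same synthetic data (a stand-in for the course dataset).
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=42)
columns = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

tree = DecisionTreeClassifier(random_state=42).fit(X, y)
forest = RandomForestClassifier(random_state=42).fit(X, y)

# One row per feature, one column per model, sorted by the forest's ranking
importance_df = pd.DataFrame({
    "feature": columns,
    "tree": tree.feature_importances_,
    "forest": forest.feature_importances_,
}).sort_values("forest", ascending=False)

print(importance_df)
```

Both columns sum to 1, so they can be compared directly, but expect the values (and sometimes the ranking) to differ between the two models.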

What’s the error you are getting?

TypeError: Labels in y_true and y_pred should be of the same type. Got y_true=['No' 'Yes'] and y_pred=[0. 1.]. Make sure that the predictions provided by the classifier coincides with the true labels.

You’re checking the score, right? You have to pass the targets and the predictions to check the score, not the input columns and the targets.

I thought so too, but everywhere else we check the model score as:
model.score(X_train, train_target)
So I tried getting predictions from the tree first and then checking the score:

tree1 = model.estimators_[1]
pred = tree1.predict(X_train)
model.score(pred, train_target)

…and I get the same error.

Run it and check what’s inside pred and train_target. If pred contains only 0 and 1, try converting train_target from Yes → 1 and No → 0, and then try model.score() again.

Even after hyperparameter tuning, my baseline random forest has a higher accuracy score than the tuned one. Is that normal?

Sure, but why does this only happen with an individual tree? I don’t get this error when I use the score method on a regular decision tree or on the random forest itself.
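The difference comes from how scikit-learn's RandomForestClassifier is fit: it encodes the target labels as integer indices into model.classes_ before training its trees, so each tree in estimators_ learns the labels 0. and 1. instead of 'No' and 'Yes'. The forest's own predict and score map those indices back through classes_, and a standalone DecisionTreeClassifier is trained on the string labels directly, so neither hits the mismatch. A minimal sketch of two workarounds, using synthetic data with 'No'/'Yes' labels as a stand-in for the course's X_train and train_target:

```python
# Sketch: score one tree of a random forest when the targets are strings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for X_train / train_target with 'No'/'Yes' labels
X_train, y_num = make_classification(n_samples=300, n_features=5, random_state=42)
train_target = np.where(y_num == 1, "Yes", "No")

model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X_train, train_target)
tree = model.estimators_[1]

print(model.classes_)  # ['No' 'Yes']
print(tree.classes_)   # [0. 1.]  (the tree was fit on encoded labels)

# Option 1: map the tree's index predictions back to the original labels
tree_preds = model.classes_[tree.predict(X_train).astype(int)]
print(accuracy_score(train_target, tree_preds))

# Option 2: encode the targets the way the forest did, then use tree.score
encoded_target = np.searchsorted(model.classes_, train_target)  # No -> 0, Yes -> 1
print(tree.score(X_train, encoded_target))
```

Option 2 is the Yes → 1, No → 0 conversion suggested in the thread, derived from model.classes_ instead of being hard-coded; both options report the same accuracy.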

Hi, I couldn’t understand this statement: “max_depth_error(md) for md in range(1, 21)” in the line:

pd.DataFrame([max_depth_error(md) for md in range(1, 21)])
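The expression inside the brackets is a list comprehension: it calls max_depth_error once for each md in range(1, 21), that is, depths 1 through 20 (the end value is excluded), and collects the return values into a list. pd.DataFrame then turns that list of dicts into a table with one row per depth. The thread doesn't show max_depth_error itself, so the body below is a hypothetical stand-in that assumes it trains a DecisionTreeClassifier with max_depth=md and returns a dict of training and validation error; synthetic data replaces the course dataset.

```python
# Sketch of the pattern: pd.DataFrame([max_depth_error(md) for md in range(1, 21)])
# max_depth_error's body is an assumption; the original isn't shown in the thread.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=42)
X_train, X_val, train_targets, val_targets = train_test_split(
    X, y, test_size=0.25, random_state=42
)

def max_depth_error(md):
    """Hypothetical helper: fit a tree limited to depth md, return its errors."""
    model = DecisionTreeClassifier(max_depth=md, random_state=42)
    model.fit(X_train, train_targets)
    train_error = 1 - model.score(X_train, train_targets)
    val_error = 1 - model.score(X_val, val_targets)
    return {"Max Depth": md, "Training Error": train_error, "Validation Error": val_error}

# List comprehension: one call per depth 1..20, results collected in a list
errors_df = pd.DataFrame([max_depth_error(md) for md in range(1, 21)])

# Equivalent explicit loop
rows = []
for md in range(1, 21):
    rows.append(max_depth_error(md))
errors_loop_df = pd.DataFrame(rows)

print(errors_df.head())
```

Plotting the Training Error and Validation Error columns against Max Depth is a typical next step for spotting where deeper trees start to overfit.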

Why do we always set random_state = 42?
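random_state seeds the random number generator a model uses: for a random forest, the bootstrap samples and the random feature subsets tried at each split, and even a DecisionTreeClassifier randomly permutes features at each split, which can change which of two equally good splits it picks. Fixing the seed means rerunning the same code on the same data produces the same model. The value 42 is an arbitrary convention; any fixed integer gives reproducibility, while random_state=None draws fresh randomness on every fit. A minimal sketch with synthetic data:

```python
# Sketch: a fixed random_state makes a random forest reproducible.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

def fit_forest(seed):
    # Small n_estimators so the example runs quickly
    return RandomForestClassifier(n_estimators=20, random_state=seed).fit(X, y)

a = fit_forest(42)
b = fit_forest(42)  # same seed: identical bootstrap samples and feature draws
c = fit_forest(7)   # different seed: different random choices during training

print(np.array_equal(a.feature_importances_, b.feature_importances_))  # True
print(np.array_equal(a.feature_importances_, c.feature_importances_))
```

The first comparison is always True; the second compares forests built from different random draws, so their importances will generally not be identical.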