As mentioned, when test data is available we can split the train-validation data 75-25 or 70-30. I am confused about how to split in this situation.
If you have a separate test dataset available, divide the training set using any of these ratios:
Training/Validation
- 70/30
- 75/25
- 80/20
- 90/10
Keep the majority chunk for training and a small portion for validation.
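As a rough sketch of how these ratios translate into code, assuming scikit-learn's train_test_split is used for the split: the validation share goes in as test_size. The toy arrays X and y and the ratios dictionary below are made up purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, made up for illustration: 100 rows with 3 features each
X = np.arange(300).reshape(100, 3)
y = np.arange(100)

# Each training/validation ratio mapped to the fraction held out for validation
ratios = {"70/30": 0.30, "75/25": 0.25, "80/20": 0.20, "90/10": 0.10}

split_sizes = {}
for name, val_fraction in ratios.items():
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=val_fraction, random_state=42)
    # Record how many rows ended up in each part
    split_sizes[name] = (len(X_train), len(X_val))
    print(name, split_sizes[name])
```

Whichever ratio you pick, the separate test set is left out of this split entirely.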
Thank you for your quick reply. I got your point, but my question is: when we have separate test data, how do we assign the inputs and target columns?
We don't need to split the test data; we can directly assign the inputs and targets for the test set.
For example:
For the training data, we split it into training and validation sets:
# train_test_split comes from scikit-learn
from sklearn.model_selection import train_test_split
# Create training and validation sets
train_inputs, val_inputs, train_targets, val_targets = train_test_split(
    inputs_df[numeric_cols + encoded_cols], targets, test_size=0.25, random_state=42)
# For the test set (if the target variable is not available)
test_input = test_df[numeric_cols + encoded_cols]
# For the test set (if the target variable is available)
test_input = test_df[numeric_cols + encoded_cols]
test_target = test_df[target]
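To make the idea concrete, here is a minimal end-to-end sketch on toy data. The DataFrames, the column names (age, income, bought) and the input_cols / target_col variables are all hypothetical; input_cols stands in for numeric_cols + encoded_cols from the snippet above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data; the column names are made up for illustration
train_df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63, 41, 38],
    "income": [30, 52, 71, 64, 45, 80, 58, 49],
    "bought": [0, 1, 1, 1, 0, 1, 0, 0],
})
test_df = pd.DataFrame({
    "age": [33, 56],
    "income": [48, 77],
    "bought": [0, 1],  # this column may be missing in a real test set
})

input_cols = ["age", "income"]
target_col = "bought"

# Split only the training data into training and validation sets
train_inputs, val_inputs, train_targets, val_targets = train_test_split(
    train_df[input_cols], train_df[target_col], test_size=0.25, random_state=42)

# The test set is used whole: select the same input columns, no split
test_inputs = test_df[input_cols]

# Assign test targets only if the target column actually exists
test_targets = test_df[target_col] if target_col in test_df.columns else None

print(train_inputs.shape, val_inputs.shape, test_inputs.shape)
```

The key point is that train_test_split is only ever called on the training data; the test set keeps all of its rows.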
This is just for illustration, though. Hope it helps.