Default classification reporting
It's time to take a closer look at the evaluation of the model. Setting a threshold on the probability of default lets you analyze the model's performance through classification reporting.
Creating a data frame of the probabilities makes them easier to work with, because you can use all the power of pandas. Apply the threshold to the data and check the value counts for both classes of loan_status to see how many predictions of each class are being made. These counts help you interpret the scores in the classification report.
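As a rough illustration of the thresholding step, here is a minimal sketch on a handful of made-up probabilities. The values in example_probs and the names example_df and threshold are hypothetical and are not part of the exercise workspace.

```python
import pandas as pd

# Hypothetical probabilities of default (illustrative values only)
example_probs = [0.05, 0.62, 0.50, 0.91, 0.33, 0.48]
example_df = pd.DataFrame(example_probs, columns=['prob_default'])

# Label a loan as default (1) only when its probability exceeds the threshold
threshold = 0.50
example_df['loan_status'] = example_df['prob_default'].apply(
    lambda x: 1 if x > threshold else 0
)

# Count how many loans fall into each predicted class
counts = example_df['loan_status'].value_counts()
print(counts)
```

Because the comparison is strict, a probability of exactly 0.50 is labeled as non-default in this sketch.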
The data set cr_loan_prep, the trained logistic regression clf_logistic, the true loan status values y_test, and the predicted probabilities preds are loaded in the workspace.
This exercise is part of the course
Credit Risk Modeling in Python
Exercise instructions
- Create a data frame of just the probabilities of default from preds called preds_df.
- Reassign loan_status values based on a threshold of 0.50 for probability of default in preds_df.
- Print the value counts of the number of rows for each loan_status.
- Print the classification report using y_test and preds_df.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a dataframe for the probabilities of default
____ = pd.____(____[:,1], columns = ['prob_default'])
# Reassign loan status based on the threshold
____[____] = ____[____].apply(lambda x: 1 if x > ____ else 0)
# Print the row counts for each loan status
print(____[____].____())
# Print the classification report
target_names = ['Non-Default', 'Default']
print(____(____, ____['loan_status'], target_names=target_names))
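To see what the classification report contains without touching the exercise data, the following sketch calls scikit-learn's classification_report on a few hand-written labels. The lists y_true_example and y_pred_example are hypothetical, and the sketch assumes the report function used in the exercise is the one from sklearn.metrics.

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted loan statuses (0 = non-default, 1 = default)
y_true_example = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred_example = [0, 0, 0, 1, 1, 0, 0, 0]

target_names = ['Non-Default', 'Default']

# Text report with precision, recall, F1 score, and support per class
print(classification_report(y_true_example, y_pred_example,
                            target_names=target_names))

# The same report as a dictionary, convenient for pulling out single scores
report = classification_report(y_true_example, y_pred_example,
                               target_names=target_names, output_dict=True)
print(report['Default']['recall'])
```

In this toy data only one of the three true defaults is predicted as a default, so recall for the Default class is one third, while one of the two predicted defaults is correct, giving a precision of one half.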