Default classification reporting

It's time to take a closer look at how the model is evaluated. Setting a threshold on the probability of default turns the predicted probabilities into class labels, which lets you analyze the model's performance with a classification report.

Creating a data frame of the probabilities makes them easier to work with, because you can use all the power of pandas. Apply the threshold to the data and check the value counts for both classes of loan_status to see how many predictions of each class are produced. These counts help you interpret the scores in the classification report.
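As a rough illustration of the thresholding step, the sketch below uses a small hand-made array in place of the real preds. The five rows of probabilities are invented for the example and are not from cr_loan_prep. It assumes preds has two columns, with the probability of default in the second column. It also uses a vectorized comparison rather than the apply with a lambda shown in the exercise template; both give the same labels here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for preds: one row per loan,
# column 0 = probability of non-default, column 1 = probability of default.
preds = np.array([
    [0.90, 0.10],
    [0.35, 0.65],
    [0.50, 0.50],
    [0.20, 0.80],
    [0.70, 0.30],
])

# Keep only the probability of default (the second column)
preds_df = pd.DataFrame(preds[:, 1], columns=["prob_default"])

# Label a loan as default (1) only when its probability is strictly above the threshold
threshold = 0.50
preds_df["loan_status"] = (preds_df["prob_default"] > threshold).astype(int)

# Count how many predictions fall into each class
counts = preds_df["loan_status"].value_counts()
print(counts)
```

Note that a probability of exactly 0.50 is labeled non-default, because the comparison is strict.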

The cr_loan_prep data set, the trained logistic regression clf_logistic, the true loan status values y_test, and the predicted probabilities preds are loaded in the workspace.

This exercise is part of the course

Credit Risk Modeling in Python

Exercise instructions

Create a data frame of just the probabilities of default from preds called preds_df.
Assign loan_status values in preds_df based on a threshold of 0.50 for probability of default.
Print the number of rows for each loan_status using value counts.
Print the classification report using y_test and preds_df.
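The final step can be sketched in isolation as follows. The true labels and predicted labels below are made up so the numbers are easy to check by hand; in the exercise they would come from y_test and preds_df['loan_status']. The name predicted is a hypothetical placeholder, and output_dict=True is used only to read individual scores programmatically.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Hypothetical true labels and thresholded predictions (1 = default)
y_test = pd.Series([0, 0, 0, 0, 1, 1, 1, 0])
predicted = pd.Series([0, 0, 1, 0, 1, 0, 1, 0])

# Class 0 maps to the first name and class 1 to the second
target_names = ["Non-Default", "Default"]

# Text report with precision, recall, and F1 score for each class
print(classification_report(y_test, predicted, target_names=target_names))

# The same report as a dictionary, convenient for pulling out single values
report = classification_report(
    y_test, predicted, target_names=target_names, output_dict=True
)
default_recall = report["Default"]["recall"]
print(f"Recall for defaults: {default_recall:.2f}")
```

In this toy data, two of the three true defaults are caught, so recall for the Default class is about 0.67, while overall accuracy is 0.75. Comparing these per-class scores with the value counts shows how few predicted defaults can sit behind a reasonable-looking accuracy.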

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create a dataframe for the probabilities of default
____ = pd.____(____[:,1], columns = ['prob_default'])

# Reassign loan status based on the threshold
____[____] = ____[____].apply(lambda x: 1 if x > ____ else 0)

# Print the row counts for each loan status
print(____[____].____())

# Print the classification report
target_names = ['Non-Default', 'Default']
print(____(____, ____['loan_status'], target_names=target_names))
