Confusion Matrix is Not Confusing

By Rosa Tiara

Overview

A confusion matrix is a powerful tool for evaluating the performance of a machine learning model. It provides a clear visual representation of the model's predictions and how they compare to the true labels of the data.

It may seem confusing at first glance, but it actually provides a wealth of information about your model's performance. In this blog post, we'll take a closer look at what a confusion matrix is and how it can be used to assess the accuracy of a classification model.


What is a confusion matrix?

If you're familiar with machine learning, you may have tried several methods and wondered which model would be the best fit for your dataset. A confusion matrix is simply an NxN matrix that answers that question by evaluating the performance of a classification model: it shows how accurately a model predicts the actual values. Let's take a deeper look:

[Image: confusion matrix]

In the figure above, the rows of the confusion matrix correspond to what the algorithm predicted, and the columns correspond to the actual values. The green squares on the diagonal tell us how many times the model predicted the correct output, whereas the red squares tell us how many times it messed up.

Terms & definitions:

  • True Positives (TP) = positive cases that are correctly classified.

  • False Positives (FP) = negative cases that are incorrectly classified as positive.

  • False Negatives (FN) = positive cases that are incorrectly classified as negative.

  • True Negatives (TN) = negative cases that are correctly classified.

  • Positives (P = TP + FN) = number of real positive cases in the data.

  • Negatives (N = FP + TN) = number of real negative cases in the data.

You can also think of it in terms of this formula:

[Image: confusion matrix formula]
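
To make these definitions concrete, here's a small sketch (the toy label arrays are made up for illustration, and it assumes scikit-learn is installed) that pulls the four counts out of scikit-learn's confusion_matrix:

from sklearn.metrics import confusion_matrix

# made-up toy labels, just to show how the terms map to counts
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# for binary labels, scikit-learn returns [[TN, FP], [FN, TP]],
# so ravel() unpacks the four counts directly
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("TP =", tp, "FP =", fp, "FN =", fn, "TN =", tn)  # TP = 3 FP = 1 FN = 1 TN = 3
print("P =", tp + fn, "N =", fp + tn)                  # 4 real positives, 4 real negatives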

Practical Example

Now let's say you want to predict whether a student will pass their Linear Algebra class, and you've tried Random Forest, Stochastic Gradient Descent, and Naive Bayes models. Since there are only two possible outputs, "pass" or "does not pass", we'll have a 2x2 matrix.

Now, our next task is to make a confusion matrix for each model, compare them, and choose the best one.
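
The post doesn't show the student dataset or the training code behind these confusion matrices, but a minimal sketch of the workflow might look like this, using scikit-learn's RandomForestClassifier, SGDClassifier, and GaussianNB with synthetic stand-in data from make_classification (the data and all parameters here are assumptions for illustration only):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

# synthetic stand-in for the (unshown) student-grades dataset
X, y = make_classification(n_samples=400, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "SGD": SGDClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name)
    # note: scikit-learn puts the actual labels on the rows and the predictions on the columns
    print(confusion_matrix(y_test, y_pred))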

How do we evaluate which model is the best?

We can evaluate each model by calculating its precision and recall. For your model to be considered a good classifier, you want both of them to be 1 (or as close to 1 as possible), so we use a metric called the F1-Score, which takes both of them into account.

Precision

Precision is the ratio of correct positive predictions to the total number of positive predictions.

P = {TP \over TP + FP}

Recall

Recall, also called sensitivity or the true positive rate, is the ratio of correctly predicted positives (TP) to the total number of actual positives (P).

R = {TP \over P}

F1-Score

F1-Score is the harmonic mean of the recall rate and precision.

F1 = 2 \cdot {Precision \cdot Recall \over Precision + Recall}
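
To see why the harmonic mean is used instead of a plain average, here's a quick sketch with made-up precision and recall values: a model that is very precise but misses most of the positives still ends up with a low F1.

precision, recall = 0.9, 0.1  # made-up values for illustration

arithmetic_mean = (precision + recall) / 2
f1 = 2 * precision * recall / (precision + recall)

print(arithmetic_mean)  # 0.5   -- looks deceptively decent
print(f1)               # ~0.18 -- the harmonic mean punishes the imbalance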

Let's start by evaluating each of our models.

[Image: confusion matrix of Random Forest]

P = {101 \over 101 + 57} = 0.63

R = {101 \over 101 + 99} = 0.505

F1 = 2 \cdot {0.63 \cdot 0.505 \over 0.63 + 0.505} = 0.56 = 56\%

[Image: confusion matrix of Stochastic Gradient Descent]

P = {147 \over 147 + 100} = 0.59

R = {147 \over 147 + 23} = 0.86

F1 = 2 \cdot {0.59 \cdot 0.86 \over 0.59 + 0.86} = 0.7 = 70\%

[Image: confusion matrix of Naive Bayes]

P = {89 \over 89 + 61} = 0.593

R = {89 \over 89 + 88} = 0.502

F1 = 2 \cdot {0.593 \cdot 0.502 \over 0.593 + 0.502} = 0.54 = 54\%

Based on the calculations above, our Stochastic Gradient Descent (SGD) model has the highest F1-Score, at 70%, so we're going to use SGD as our model.
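
As a sanity check, here's a small script (the scores helper is mine, not from the post) that re-derives precision, recall, and F1 from each model's TP, FP, and FN counts; up to rounding it reproduces the numbers above, with SGD again coming out on top.

def scores(tp, fp, fn):
    """Precision, recall, and F1 from the raw confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# (TP, FP, FN) read off each model's confusion matrix above
models = {
    "Random Forest": (101, 57, 99),
    "SGD": (147, 100, 23),
    "Naive Bayes": (89, 61, 88),
}

for name, (tp, fp, fn) in models.items():
    precision, recall, f1 = scores(tp, fp, fn)
    print(f"{name}: precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")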

Calculating Precision, Recall, and F1-Score in Python (Fraud Detection)

Let's assume we are classifying whether a transaction in a bank is fraudulent or not. In this case, we will have two categories:

  • 1 = fraudulent (positive)
  • 0 = not fraudulent, or normal (negative)

We'll have two arrays, one for the actual data (stored as actual_data) and the other for the predictions (stored as predictions).

actual_data = [1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1]
predictions = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1]

Calculating All Elements of a Confusion Matrix

For this worked example, let's use a slightly longer pair of arrays:

actual_data = [1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1]
predictions = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]

TP, TN, FP, FN = 0, 0, 0, 0
for i in range(len(actual_data)):
    if actual_data[i] == predictions[i] and actual_data[i] == 1:
        TP += 1  # true positive: actual 1, predicted 1
    elif actual_data[i] == predictions[i] and actual_data[i] == 0:
        TN += 1  # true negative: actual 0, predicted 0
    elif actual_data[i] != predictions[i] and actual_data[i] == 1:
        FN += 1  # false negative: actual 1, predicted 0
    elif actual_data[i] != predictions[i] and actual_data[i] == 0:
        FP += 1  # false positive: actual 0, predicted 1
    else:
        print("Error")  # a label other than 0 or 1 was found

print("True Positives = ", TP)
print("True Negatives = ", TN)
print("False Positives = ", FP)
print("False Negatives = ", FN)

Output:

True Positives =  13
True Negatives =  17
False Positives =  2
False Negatives =  8

Creating a Confusion Matrix

To create a confusion matrix, there are a few ways you can go about it. Here we'll see how the pandas package can help us with it.

import pandas as pd

data = {'Actual': actual_data, 'Predictions': predictions}
df = pd.DataFrame(data, columns=['Actual', 'Predictions'])

# crosstab counts how often each (actual, predicted) pair occurs
confusion_matrix = pd.crosstab(df['Actual'], df['Predictions'], rownames=['Actual'], colnames=['Predictions'])
print(confusion_matrix)

Output:

Predictions   0   1
Actual
0            17   2
1             8  13
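
Continuing from the snippet above, pandas' crosstab also accepts a margins flag that appends row and column totals, which makes the number of real positives and negatives easy to read off (an optional tweak, not part of the original example):

confusion_with_totals = pd.crosstab(df['Actual'], df['Predictions'],
                                    rownames=['Actual'], colnames=['Predictions'],
                                    margins=True)  # adds an "All" row and column with totals
print(confusion_with_totals)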

If you want your matrix to be more aesthetically pleasing, you can build it with scikit-learn's confusion_matrix and display it as a heatmap with seaborn:

from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# build the matrix with scikit-learn, then draw it with seaborn
confusion = confusion_matrix(actual_data, predictions)

sns.heatmap(confusion, annot=True,
            xticklabels=['Negative', 'Positive'],
            yticklabels=['Negative', 'Positive'])
plt.ylabel("Label")
plt.xlabel("Predicted")
plt.show()

[Output: heatmap of the confusion matrix]
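
In recent versions of scikit-learn (1.0 and later), ConfusionMatrixDisplay can draw a similar plot directly from the two label arrays, without building the matrix first; a minimal sketch, reusing the actual_data and predictions arrays from above:

from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# draws the confusion matrix straight from the label arrays
ConfusionMatrixDisplay.from_predictions(
    actual_data, predictions, display_labels=['Negative', 'Positive']
)
plt.show()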

Now let's calculate the precision, recall, and F1-Score.

# Precision
precision = TP / (TP + FP)
print("Precision = ", precision*100, "%")

# Recall
recall = TP / (TP + FN)
print("Recall = ", recall*100, "%")

# F1 Score
f1_score = 2 * (precision * recall) / (precision + recall)
print("F1 Score = ", f1_score*100, "%")

With the arrays above, this prints a precision of roughly 86.7%, a recall of roughly 61.9%, and an F1 score of roughly 72.2%.
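
If scikit-learn is already installed, you can also cross-check the manual calculation with its built-in metric functions (a quick sketch reusing the same arrays):

from sklearn.metrics import precision_score, recall_score, f1_score

# these treat 1 as the positive class by default, matching the manual calculation
print("Precision =", precision_score(actual_data, predictions))
print("Recall    =", recall_score(actual_data, predictions))
print("F1 Score  =", f1_score(actual_data, predictions))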

Conclusion

In conclusion, a confusion matrix is a powerful tool for evaluating the performance of a classification model. It allows us to visualize the number of true positive, true negative, false positive, and false negative predictions made by the model, and to calculate various evaluation metrics such as precision, recall, and F1-score. Understanding the strengths and limitations of a confusion matrix can help us improve the performance of our models and make more informed decisions about their use. Whether you are a beginner or an experienced data scientist, learning how to interpret and use a confusion matrix is an essential skill for any machine learning practitioner.

Good luck and happy learning! :)