Confusion Matrix is Not Confusing
Author: Rosa Tiara
Overview
A confusion matrix is a powerful tool for evaluating the performance of a classification model. It provides a clear, tabular view of the model's predictions and how they compare to the true labels of the data.
It may seem confusing at first glance, but it actually provides a wealth of information about your model's performance. In this blog post, we'll take a closer look at what a confusion matrix is and how it can be used to assess a classification model.
What is a confusion matrix?
If you've worked with machine learning, you may have implemented several methods and then wondered which model is the best fit for your dataset. A confusion matrix helps answer that question. It is an N×N matrix, where N is the number of classes, that evaluates a classification model by showing how often its predictions match the actual values. Let's take a deeper look:
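Here, the rows of the confusion matrix correspond to what the algorithm predicted, and the columns correspond to the actual values. The green squares on the diagonal count how many times the model predicted the correct output, whereas the red squares off the diagonal count how many times it got it wrong. Keep in mind that this orientation is only a convention: scikit-learn's confusion_matrix, used later in this post, puts the actual values in the rows and the predictions in the columns.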
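To see the N×N structure in code, here is a minimal pure-Python sketch. It assumes rows are actual classes and columns are predicted classes (the scikit-learn convention, which is the transpose of the layout described above). The function name confusion_matrix_nxn and the three animal labels are hypothetical, chosen only for illustration.

```python
def confusion_matrix_nxn(actual, predicted, classes):
    """Hypothetical helper: N x N counts, rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1  # one count per (actual, predicted) pair
    return matrix


# Hypothetical 3-class example (N = 3), invented for illustration.
classes = ["cat", "dog", "bird"]
actual = ["cat", "dog", "bird", "cat", "dog", "bird"]
predicted = ["cat", "cat", "bird", "cat", "dog", "dog"]

m = confusion_matrix_nxn(actual, predicted, classes)
for label, row in zip(classes, m):
    print(f"{label:>4}: {row}")

# The diagonal holds the correct predictions; every other cell is a mistake.
correct = sum(m[i][i] for i in range(len(classes)))
print("correct:", correct, "of", len(actual))
```

With these toy labels each row sums to the number of real samples of that class, and the diagonal accounts for 4 of the 6 predictions.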
Terms & definitions:

True Positives (TP) = positive cases that are correctly classified.

False Positives (FP) = negative cases that are incorrectly classified as positive.

False Negatives (FN) = positive cases that are incorrectly classified as negative.

True Negatives (TN) = negative cases that are correctly classified.

Positives (P = TP + FN) = number of real positive cases in the data.

Negatives (N = FP + TN) = number of real negative cases in the data.
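The four definitions boil down to a small decision rule: a real positive is either caught (TP) or missed (FN), and a real negative is either correctly rejected (TN) or raised as a false alarm (FP). Here is a minimal sketch assuming the encoding 1 = positive, 0 = negative; the function name outcome and the toy labels are hypothetical.

```python
def outcome(actual: int, predicted: int) -> str:
    """Hypothetical helper: name the confusion-matrix cell for one prediction (1 = positive, 0 = negative)."""
    if actual == 1:
        return "TP" if predicted == 1 else "FN"  # a real positive: caught or missed
    return "FP" if predicted == 1 else "TN"  # a real negative: false alarm or correctly rejected


# Hypothetical toy labels, for illustration only.
actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1]

cells = [outcome(a, p) for a, p in zip(actual, predicted)]
print(cells)  # ['TP', 'FN', 'FP', 'TN', 'TP']

P = cells.count("TP") + cells.count("FN")  # real positives
N = cells.count("FP") + cells.count("TN")  # real negatives
print("P =", P, "N =", N)  # P = 3 N = 2
```

Note that P and N depend only on the actual labels, which is why P = TP + FN and N = FP + TN always hold.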
You can think of it with this formula as well:
Practical Example
Let's say you want to predict whether a student will pass their Linear Algebra class, and you've tried Random Forest, Stochastic Gradient Descent (SGD), and Naive Bayes classifiers. Since each model produces only two possible outputs, "pass" or "does not pass", each confusion matrix will be 2x2.
Our next task is to build a confusion matrix for each model, compare them, and choose the best one.
How do we evaluate which model is the best?
We can evaluate each model by calculating its precision and recall. For a model to be considered a good classifier, you want both to be 1 (or as close to 1 as possible). To capture both in a single number, we use the F1 score, which combines them.
Precision
Precision is the ratio of correct positive predictions to the total number of positive predictions.
$Precision = { TP \over TP+FP}$
Recall
Recall, also called sensitivity or true positive rate, is the ratio of correct positive predictions to the total number of actual positives.
$Recall = { TP \over TP+FN} = { TP \over P}$
F1 Score
The F1 score is the harmonic mean of precision and recall.
$F1 = { 2*{Precision * Recall \over Precision + Recall}}$
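Substituting the definitions of precision and recall into this formula shows that F1 can be computed straight from the counts (assuming TP > 0, so that both ratios are nonzero):

$$
F1 = 2 \cdot {Precision \cdot Recall \over Precision + Recall}
   = 2 \cdot { {TP^2 \over (TP+FP)(TP+FN)} \over {TP\,(2TP+FP+FN) \over (TP+FP)(TP+FN)} }
   = { 2\,TP \over 2\,TP+FP+FN }
$$

For instance, the second model in the worked example below has TP = 147, FP = 100 and FN = 23, which gives 294 / 417 ≈ 0.705 without rounding any intermediate values.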
Let's evaluate each of our models by plugging the counts from its confusion matrix into these formulas.
Random Forest:
$Precision = { 101 \over 101+57} = 0.639$
$Recall = { 101 \over 101+99} = 0.505$
$F1 = { 2*{0.639 * 0.505 \over 0.639 + 0.505}} = 0.564 \approx 56\%$
Stochastic Gradient Descent (SGD):
$Precision = { 147 \over 147+100} = 0.595$
$Recall = { 147 \over 147+23} = 0.865$
$F1 = { 2*{0.595 * 0.865 \over 0.595 + 0.865}} = 0.705 \approx 70.5\%$
Naive Bayes:
$Precision = { 89 \over 89+61} = 0.593$
$Recall = { 89 \over 89+88} = 0.503$
$F1 = { 2*{0.593 * 0.503 \over 0.593 + 0.503}} = 0.544 \approx 54\%$
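Hand arithmetic like this is easy to get slightly wrong, so here is a short sketch that recomputes it from the raw counts. The (TP, FP, FN) triples are copied from the formulas above; assigning the first and third triples to Random Forest and Naive Bayes follows the order in which the models were listed, which is an assumption. The helper name scores is hypothetical.

```python
# (TP, FP, FN) per model, copied from the worked calculations above.
# The model-name mapping follows the order the models were listed (an assumption).
models = {
    "Random Forest": (101, 57, 99),
    "SGD": (147, 100, 23),
    "Naive Bayes": (89, 61, 88),
}


def scores(tp, fp, fn):
    """Hypothetical helper: return (precision, recall, F1) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


results = {name: scores(*counts) for name, counts in models.items()}
for name, (p, r, f1) in results.items():
    print(f"{name}: precision={p:.3f}, recall={r:.3f}, F1={f1:.3f}")

# Pick the model whose F1 score (index 2) is highest.
best = max(results, key=lambda name: results[name][2])
print("Best model by F1:", best)
```

Because it never rounds intermediate values, this version is a good way to check the two- and three-decimal figures written out by hand.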
Based on the calculations above, the Stochastic Gradient Descent (SGD) model has the highest F1 score, roughly 70%, so we'll choose SGD as our model.
Calculating Precision, Recall, and F1 Score in Python (Fraud Detection)
Let's assume we are classifying whether a transaction in a bank is fraudulent or not. In this case, we will have two categories:
 1 = fraudulent (positive)
 0 = not fraudulent, or normal (negative)
We'll have two lists, one for the actual labels (stored as actual_data) and the other for the predictions (stored as predictions). For example:
actual_data = [1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1]
predictions = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1]
Calculating All Elements of a Confusion Matrix
# A longer sample of 40 transactions (these lists replace the 20-item example above)
actual_data = [1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1]
predictions = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
# Compare each true label with its prediction and count the outcome
TP, TN, FP, FN = 0, 0, 0, 0
for actual, predicted in zip(actual_data, predictions):
    if actual == predicted and actual == 1:
        TP += 1  # true positive: fraud correctly flagged
    elif actual == predicted and actual == 0:
        TN += 1  # true negative: normal transaction correctly passed
    elif actual != predicted and actual == 1:
        FN += 1  # false negative: fraud the model missed
    elif actual != predicted and actual == 0:
        FP += 1  # false positive: normal transaction flagged as fraud
    else:
        print("Error")  # reached only when the actual label is neither 0 nor 1
print("True Positives = ", TP)
print("True Negatives = ", TN)
print("False Positives = ", FP)
print("False Negatives = ", FN)
Output:
Create Confusion Matrix
There are a few ways to create a confusion matrix. Here we'll see how the pandas package can help.
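import pandas as pd
# Put the labels side by side, then count every (Actual, Predictions) combination
data = {'Actual': actual_data, 'Predictions': predictions}
df = pd.DataFrame(data, columns=['Actual', 'Predictions'])
# Named confusion_df so it does not clash with sklearn's confusion_matrix function
confusion_df = pd.crosstab(df['Actual'], df['Predictions'], rownames=['Actual'], colnames=['Predictions'])
print(confusion_df)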
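If you'd rather not pull in pandas, the same counts can be taken with the standard library alone. This is a minimal sketch using collections.Counter over (actual, predicted) pairs; it repeats the 40-transaction lists from the counting code above so that it runs on its own.

```python
from collections import Counter

# The 40-transaction lists from the counting example above.
actual_data = [1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1]
predictions = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]

# Each (actual, predicted) pair lands in exactly one cell of the 2x2 matrix.
pair_counts = Counter(zip(actual_data, predictions))
TN, FP = pair_counts[(0, 0)], pair_counts[(0, 1)]  # row for actual = 0
FN, TP = pair_counts[(1, 0)], pair_counts[(1, 1)]  # row for actual = 1

# Same layout as the crosstab: rows = actual, columns = predicted.
print("          pred 0  pred 1")
print(f"actual 0 {TN:7d} {FP:7d}")
print(f"actual 1 {FN:7d} {TP:7d}")
```

A Counter returns 0 for a pair it has never seen, so a cell with no samples still prints as 0 instead of raising a KeyError.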
If you want a more aesthetically pleasing matrix, you can compute it with scikit-learn and display it as a heatmap with seaborn:
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
# scikit-learn puts actual labels in rows and predictions in columns, both sorted as [0, 1]
confusion = confusion_matrix(actual_data, predictions)
# fmt="d" prints the counts as integers inside each cell
sns.heatmap(confusion, annot=True, fmt="d", xticklabels=['Negative', 'Positive'], yticklabels=['Negative', 'Positive'])
plt.ylabel("Actual")
plt.xlabel("Predicted")
plt.show()
Now let's calculate the precision, recall, and F1 score.
# Precision: share of flagged transactions that really are fraud
precision = TP / (TP + FP)
print(f"Precision = {precision:.2%}")
# Recall: share of real frauds that the model caught
recall = TP / (TP + FN)
print(f"Recall = {recall:.2%}")
# F1 Score: harmonic mean of precision and recall
f1_score = 2 * (precision * recall) / (precision + recall)
print(f"F1 Score = {f1_score:.2%}")
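One caveat with these formulas: precision is undefined when the model predicts no positives (TP + FP = 0) and recall is undefined when there are no real positives, so the code would raise ZeroDivisionError. Here is a minimal sketch that falls back to 0.0 in those cases. The helper names are hypothetical, and returning 0.0 is a choice rather than the only convention (scikit-learn's precision_score, recall_score and f1_score expose the same choice through their zero_division parameter).

```python
def safe_ratio(numerator, denominator, default=0.0):
    """Hypothetical helper: numerator / denominator, or `default` when the denominator is 0."""
    return numerator / denominator if denominator else default


def precision_recall_f1(tp, fp, fn):
    """Hypothetical helper: (precision, recall, F1) that never divides by zero."""
    precision = safe_ratio(tp, tp + fp)  # undefined if the model never predicts positive
    recall = safe_ratio(tp, tp + fn)  # undefined if there are no real positives
    f1 = safe_ratio(2 * tp, 2 * tp + fp + fn)  # equals 2PR / (P + R) whenever that is defined
    return precision, recall, f1


# A model that never flags anything as fraud: TP = FP = 0.
print(precision_recall_f1(tp=0, fp=0, fn=21))  # (0.0, 0.0, 0.0)
# The counts the loop above produces for the 40-transaction lists.
print(precision_recall_f1(tp=13, fp=2, fn=8))
```

Computing F1 as 2TP / (2TP + FP + FN) avoids the 0/0 case that the harmonic-mean form hits when both precision and recall are 0.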
Conclusion
In conclusion, a confusion matrix is a powerful tool for evaluating the performance of a classification model. It lets us see the number of true positive, true negative, false positive, and false negative predictions made by the model, and calculate evaluation metrics such as precision, recall, and F1 score. Understanding the strengths and limitations of a confusion matrix can help us improve our models and make more informed decisions about their use. Whether you are a beginner or an experienced data scientist, learning how to read and use a confusion matrix is an essential skill for any machine learning practitioner.
Good luck and happy learning! :)