Of course! Let's dive deep into accuracy_score from Python's popular scikit-learn library.

What is accuracy_score?
accuracy_score is a function used to evaluate the performance of a classification model. It measures the fraction of correct predictions made by the model compared to the total number of predictions.
In simple terms, it answers the question: "Out of all the predictions, what percentage did the model get right?"
The Formula
The formula for accuracy is straightforward:
$$ \text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} = \frac{TP + TN}{TP + TN + FP + FN} $$

Where:
- TP (True Positives): The model correctly predicted the positive class.
- TN (True Negatives): The model correctly predicted the negative class.
- FP (False Positives): The model incorrectly predicted the positive class (Type I Error).
- FN (False Negatives): The model incorrectly predicted the negative class (Type II Error).
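As a quick sanity check, you can recover the same number from a confusion matrix. Here is a minimal sketch with made-up labels, using scikit-learn's `confusion_matrix` to pull out the four counts and plug them into the formula:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative (made-up) binary labels
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

# For binary labels {0, 1}, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
manual_accuracy = (tp + tn) / (tp + tn + fp + fn)

print(manual_accuracy)                 # 0.75
print(accuracy_score(y_true, y_pred))  # 0.75 -- matches the formula
```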
How to Use It (Code Examples)
First, you need to have scikit-learn installed. If you don't, run this in your terminal:
```bash
pip install scikit-learn
```
Example 1: Basic Usage
This is the simplest example where you provide the true labels and the model's predicted labels.
```python
from sklearn.metrics import accuracy_score

# The actual, correct labels
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
# The labels predicted by our model
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)

print(f"True Labels: {y_true}")
print(f"Predicted Labels: {y_pred}")
print(f"Accuracy: {accuracy:.2f}")  # Format to 2 decimal places
# Output: Accuracy: 0.80
```
Explanation:

- Total predictions = 10
- Correct predictions = 8 (indices 0, 1, 2, 4, 5, 7, 8, 9)
- Incorrect predictions = 2 (indices 3 and 6)
- Accuracy = 8 / 10 = 0.80 or 80%.
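If you want to see exactly where those two errors sit, a per-sample correctness mask makes the count explicit; accuracy is simply the mean of that mask. A small sketch with the same labels:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

# Boolean mask of per-sample correctness; its mean is exactly the accuracy
correct = y_true == y_pred
print(np.flatnonzero(~correct))  # [3 6] -- the two misclassified indices
print(correct.mean())            # 0.8
```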
Example 2: With a Real Model (e.g., Logistic Regression)
This is a more realistic workflow where you train a model and then evaluate it.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Generate a sample dataset
X, y = make_classification(
    n_samples=1000,     # 1000 data points
    n_features=20,      # 20 features
    n_informative=10,   # 10 useful features
    n_redundant=5,      # 5 redundant features
    n_classes=2,        # 2 classes (0 and 1)
    random_state=42     # for reproducibility
)

# 2. Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 3. Initialize and train a model
model = LogisticRegression(max_iter=1000)  # raise max_iter to avoid convergence warnings
model.fit(X_train, y_train)

# 4. Make predictions on the test set
y_pred = model.predict(X_test)

# 5. Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the Logistic Regression model: {accuracy:.4f}")
# Output might be: Accuracy of the Logistic Regression model: 0.8567
```
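A small convenience worth knowing: scikit-learn classifiers expose a `.score()` method that computes accuracy on the given data by default, so steps 4 and 5 can be collapsed into one call:

```python
# Equivalent shortcut: for classifiers, .score() returns mean accuracy
# on the supplied test data, matching the accuracy_score call above
print(model.score(X_test, y_test))
```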
Important Considerations and Limitations
While accuracy is easy to understand, it can be misleading in certain situations. Here’s when you should be cautious:
Imbalanced Datasets
This is the biggest drawback of accuracy. If one class is much more common than the other, a model can achieve high accuracy simply by always predicting the majority class.
Example: Imagine a medical test for a rare disease that affects only 1% of the population.
```python
import numpy as np
from sklearn.metrics import accuracy_score

# True labels: 99 are healthy (0), 1 is sick (1)
y_true = np.array([0] * 99 + [1] * 1)

# A "dumb" model that always predicts "healthy" (0)
y_pred = np.array([0] * 100)

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)

print(f"Accuracy of the 'always predict healthy' model: {accuracy:.2f}")
# Output: Accuracy of the 'always predict healthy' model: 0.99
```
Conclusion: The model has 99% accuracy, but it's completely useless because it failed to identify the single sick person. In this case, you should use other metrics like Precision, Recall, F1-Score, or the AUC-ROC curve.
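To make that concrete, here is a minimal sketch on the same made-up data showing how recall and balanced accuracy expose the failure that plain accuracy hides (`balanced_accuracy_score` averages the recall of each class):

```python
from sklearn.metrics import recall_score, balanced_accuracy_score

y_true = [0] * 99 + [1]
y_pred = [0] * 100

# Recall on the positive class: of the actually sick, how many did we find?
print(recall_score(y_true, y_pred))             # 0.0 -- the model found none

# Balanced accuracy averages recall over both classes: (1.0 + 0.0) / 2
print(balanced_accuracy_score(y_true, y_pred))  # 0.5 -- no better than chance
```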
Multi-Class Classification
accuracy_score works perfectly for multi-class problems as well. It still just calculates the total number of correct predictions over the total number of predictions.
```python
from sklearn.metrics import accuracy_score

y_true = ['cat', 'dog', 'bird', 'cat', 'dog']
y_pred = ['cat', 'dog', 'cat', 'cat', 'dog']

accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy:.2f}")
# Output: Accuracy: 0.80 (4 out of 5 are correct)
```
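Accuracy alone won't tell you *which* classes the model mixes up, though. A confusion matrix (also in `sklearn.metrics`) breaks the same predictions down per class; the `labels=` argument fixes the row and column order:

```python
from sklearn.metrics import confusion_matrix

y_true = ['cat', 'dog', 'bird', 'cat', 'dog']
y_pred = ['cat', 'dog', 'cat', 'cat', 'dog']

# Rows = true class, columns = predicted class, in the order given by labels=
print(confusion_matrix(y_true, y_pred, labels=['bird', 'cat', 'dog']))
# [[0 1 0]   <- the one 'bird' was predicted as 'cat'
#  [0 2 0]
#  [0 0 2]]
```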
Parameters of accuracy_score
The function has a few useful parameters:
- `y_true`: 1D array-like. The ground truth (correct) labels.
- `y_pred`: 1D array-like. The predicted labels returned by the classifier.
- `normalize` (default: `True`):
  - If `True` (default), returns the fraction of correct predictions (a float between 0.0 and 1.0).
  - If `False`, returns the number of correct predictions (an integer).

  ```python
  y_true = [1, 0, 1, 1, 0]
  y_pred = [1, 0, 1, 0, 0]

  print(accuracy_score(y_true, y_pred, normalize=True))   # Output: 0.8
  print(accuracy_score(y_true, y_pred, normalize=False))  # Output: 4
  ```
- `sample_weight`: 1D array-like. Weights for each sample. This allows you to give more importance to certain predictions when calculating the score (see the sketch below).
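For example, here is a small sketch (with made-up weights) of how `sample_weight` shifts the score: the weighted accuracy is the total weight of the correct predictions divided by the total weight of all predictions.

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 0]  # one mistake, at index 2

# Unweighted: 3 of 4 correct
print(accuracy_score(y_true, y_pred))  # Output: 0.75

# Give the mistake at index 2 six times the weight of the other samples:
# correct weight / total weight = (1 + 1 + 1) / (1 + 1 + 6 + 1) = 3/9
print(accuracy_score(y_true, y_pred, sample_weight=[1, 1, 6, 1]))  # Output: 0.333...
```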
When to Use accuracy_score
- Your classes are balanced: When each class has a similar number of samples.
- You need a simple, intuitive metric: It's the easiest way to explain model performance to a non-technical audience.
- All types of errors are equally important: When misclassifying a "cat" as a "dog" is just as bad as misclassifying a "dog" as a "cat".
When to Avoid accuracy_score
- Your classes are imbalanced: As shown in the medical example, it can be very deceptive.
- The cost of false positives and false negatives is different: For example, in spam detection, a false positive (marking a real email as spam) is usually much worse than a false negative (letting a spam email into the inbox). Here, Precision and Recall are more appropriate.
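For instance, a minimal sketch with made-up spam labels (1 = spam, 0 = legitimate) shows how the two metrics capture the two error costs separately:

```python
from sklearn.metrics import precision_score, recall_score

# Illustrative (made-up) labels: 1 = spam, 0 = legitimate
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]

# Precision: of the emails we flagged as spam, how many really were spam?
print(precision_score(y_true, y_pred))  # 0.5 -- one real email was flagged

# Recall: of the actual spam, how much did we catch?
print(recall_score(y_true, y_pred))     # 0.333... -- two spam emails got through
```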
