Of course! Let's dive deep into accuracy_score from Python's popular scikit-learn library.

What is accuracy_score?
accuracy_score is a function used to evaluate the performance of a classification model. It measures the fraction of correct predictions made by the model compared to the total number of predictions.
In simple terms, it answers the question: "Out of all the predictions, what percentage did the model get right?"
The Formula
The formula for accuracy is straightforward:
$$ \text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} = \frac{TP + TN}{TP + TN + FP + FN} $$

Where:
- TP (True Positives): The model correctly predicted the positive class.
- TN (True Negatives): The model correctly predicted the negative class.
- FP (False Positives): The model incorrectly predicted the positive class (Type I Error).
- FN (False Negatives): The model incorrectly predicted the negative class (Type II Error).
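As a quick sanity check, you can recover the same number from a confusion matrix. Here is a minimal sketch with made-up labels, using scikit-learn's `confusion_matrix` to pull out the four counts and plug them into the formula:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative (made-up) binary labels
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

# For binary labels {0, 1}, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
manual_accuracy = (tp + tn) / (tp + tn + fp + fn)

print(manual_accuracy)                 # 0.75
print(accuracy_score(y_true, y_pred))  # 0.75 -- matches the formula
```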
How to Use It (Code Examples)
First, you need to have scikit-learn installed. If you don't, run this in your terminal:
```bash
pip install scikit-learn
```
Example 1: Basic Usage
This is the simplest example where you provide the true labels and the model's predicted labels.
```python
from sklearn.metrics import accuracy_score

# The actual, correct labels
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
# The labels predicted by our model
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)

print(f"True Labels: {y_true}")
print(f"Predicted Labels: {y_pred}")
print(f"Accuracy: {accuracy:.2f}")  # Format to 2 decimal places
# Output: Accuracy: 0.80
```
Explanation:

- Total predictions = 10
- Correct predictions = 8 (indices 0, 1, 2, 4, 5, 7, 8, 9)
- Incorrect predictions = 2 (indices 3 and 6)
- Accuracy = 8 / 10 = 0.80 or 80%.
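If you want to see exactly where those two errors sit, a per-sample correctness mask makes the count explicit; accuracy is simply the mean of that mask. A small sketch with the same labels:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

# Boolean mask of per-sample correctness; its mean is exactly the accuracy
correct = y_true == y_pred
print(np.flatnonzero(~correct))  # [3 6] -- the two misclassified indices
print(correct.mean())            # 0.8
```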
Example 2: With a Real Model (e.g., Logistic Regression)
This is a more realistic workflow where you train a model and then evaluate it.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Generate a sample dataset
X, y = make_classification(
    n_samples=1000,     # 1000 data points
    n_features=20,      # 20 features
    n_informative=10,   # 10 useful features
    n_redundant=5,      # 5 redundant features
    n_classes=2,        # 2 classes (0 and 1)
    random_state=42     # for reproducibility
)

# 2. Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 3. Initialize and train a model
model = LogisticRegression(max_iter=1000)  # raise max_iter to avoid convergence warnings
model.fit(X_train, y_train)

# 4. Make predictions on the test set
y_pred = model.predict(X_test)

# 5. Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the Logistic Regression model: {accuracy:.4f}")
# Output might be: Accuracy of the Logistic Regression model: 0.8567
```
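A small convenience worth knowing: scikit-learn classifiers expose a `.score()` method that computes accuracy on the given data by default, so steps 4 and 5 can be collapsed into one call:

```python
# Equivalent shortcut: for classifiers, .score() returns mean accuracy
# on the supplied test data, matching the accuracy_score call above
print(model.score(X_test, y_test))
```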
Important Considerations and Limitations
While accuracy is easy to understand, it can be misleading in certain situations. Here’s when you should be cautious:
Imbalanced Datasets
This is the biggest drawback of accuracy. If one class is much more common than the other, a model can achieve high accuracy simply by always predicting the majority class.
Example: Imagine a medical test for a rare disease that affects only 1% of the population.
```python
import numpy as np
from sklearn.metrics import accuracy_score

# True labels: 99 are healthy (0), 1 is sick (1)
y_true = np.array([0] * 99 + [1] * 1)

# A "dumb" model that always predicts "healthy" (0)
y_pred = np.array([0] * 100)

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)

print(f"Accuracy of the 'always predict healthy' model: {accuracy:.2f}")
# Output: Accuracy of the 'always predict healthy' model: 0.99
```
Conclusion: The model has 99% accuracy, but it's completely useless because it failed to identify the single sick person. In this case, you should use other metrics like Precision, Recall, F1-Score, or the AUC-ROC curve.
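To make that concrete, here is a minimal sketch on the same made-up data showing how recall and balanced accuracy expose the failure that plain accuracy hides (`balanced_accuracy_score` averages the recall of each class):

```python
from sklearn.metrics import recall_score, balanced_accuracy_score

y_true = [0] * 99 + [1]
y_pred = [0] * 100

# Recall on the positive class: of the actually sick, how many did we find?
print(recall_score(y_true, y_pred))             # 0.0 -- the model found none

# Balanced accuracy averages recall over both classes: (1.0 + 0.0) / 2
print(balanced_accuracy_score(y_true, y_pred))  # 0.5 -- no better than chance
```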
Multi-Class Classification
accuracy_score works perfectly for multi-class problems as well. It still just calculates the total number of correct predictions over the total number of predictions.
```python
from sklearn.metrics import accuracy_score

y_true = ['cat', 'dog', 'bird', 'cat', 'dog']
y_pred = ['cat', 'dog', 'cat', 'cat', 'dog']

accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy:.2f}")
# Output: Accuracy: 0.80 (4 out of 5 are correct)
```
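Accuracy alone won't tell you *which* classes the model mixes up, though. A confusion matrix (also in `sklearn.metrics`) breaks the same predictions down per class; the `labels=` argument fixes the row and column order:

```python
from sklearn.metrics import confusion_matrix

y_true = ['cat', 'dog', 'bird', 'cat', 'dog']
y_pred = ['cat', 'dog', 'cat', 'cat', 'dog']

# Rows = true class, columns = predicted class, in the order given by labels=
print(confusion_matrix(y_true, y_pred, labels=['bird', 'cat', 'dog']))
# [[0 1 0]   <- the one 'bird' was predicted as 'cat'
#  [0 2 0]
#  [0 0 2]]
```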
Parameters of accuracy_score
The function has a few useful parameters:
- `y_true`: 1D array-like. The ground truth (correct) labels.
- `y_pred`: 1D array-like. The predicted labels returned by the classifier.
- `normalize` (default: `True`):
  - If `True` (default), returns the fraction of correct predictions (a float between 0.0 and 1.0).
  - If `False`, returns the number of correct predictions (an integer).

  ```python
  y_true = [1, 0, 1, 1, 0]
  y_pred = [1, 0, 1, 0, 0]

  print(accuracy_score(y_true, y_pred, normalize=True))   # Output: 0.8
  print(accuracy_score(y_true, y_pred, normalize=False))  # Output: 4
  ```
- `sample_weight`: 1D array-like. Weights for each sample. This allows you to give more importance to certain predictions when calculating the score (see the sketch below).
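For example, here is a small sketch (with made-up weights) of how `sample_weight` shifts the score: the weighted accuracy is the total weight of the correct predictions divided by the total weight of all predictions.

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 0]  # one mistake, at index 2

# Unweighted: 3 of 4 correct
print(accuracy_score(y_true, y_pred))  # Output: 0.75

# Give the mistake at index 2 six times the weight of the other samples:
# correct weight / total weight = (1 + 1 + 1) / (1 + 1 + 6 + 1) = 3/9
print(accuracy_score(y_true, y_pred, sample_weight=[1, 1, 6, 1]))  # Output: 0.333...
```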
When to Use accuracy_score
- Your classes are balanced: When each class has a similar number of samples.
- You need a simple, intuitive metric: It's the easiest way to explain model performance to a non-technical audience.
- All types of errors are equally important: When misclassifying a "cat" as a "dog" is just as bad as misclassifying a "dog" as a "cat".
When to Avoid accuracy_score
- Your classes are imbalanced: As shown in the medical example, it can be very deceptive.
- The cost of false positives and false negatives is different: For example, in spam detection, a false positive (marking a real email as spam) is usually much worse than a false negative (letting a spam email into the inbox). Here, Precision and Recall are more appropriate.
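For instance, a minimal sketch with made-up spam labels (1 = spam, 0 = legitimate) shows how the two metrics capture the two error costs separately:

```python
from sklearn.metrics import precision_score, recall_score

# Illustrative (made-up) labels: 1 = spam, 0 = legitimate
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]

# Precision: of the emails we flagged as spam, how many really were spam?
print(precision_score(y_true, y_pred))  # 0.5 -- one real email was flagged

# Recall: of the actual spam, how much did we catch?
print(recall_score(y_true, y_pred))     # 0.333... -- two spam emails got through
```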
