
How do you implement the multinomial distribution in Python?

Of course! The term "multinomial" in Python can refer to two main things:

  1. The Multinomial Distribution: A fundamental concept in probability theory, which is an extension of the binomial distribution for scenarios with more than two outcomes (e.g., rolling a 6-sided die).
  2. The Multinomial Naive Bayes Classifier: A popular machine learning algorithm used for text classification and other tasks where features represent counts or frequencies.

Let's break down both.


The Multinomial Distribution (from scipy.stats)

The multinomial distribution models the probability of observing a particular set of counts for each possible outcome when you run n independent trials, each of which produces exactly one of k possible outcomes.

Key Parameters:

  • n: The total number of trials (e.g., the number of times you roll the die).
  • p: A list of probabilities for each of the k possible outcomes. The probabilities must sum to 1.
  • k: The number of possible outcomes (in scipy.stats.multinomial this is not passed explicitly; it is inferred from the length of p).

Use Case: You want to know the probability of getting a specific count for each face of a die when you roll it 20 times.
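For a count vector (x_1, ..., x_k) with x_1 + ... + x_k = n, the probability that scipy evaluates is the standard multinomial probability mass function:

P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}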

Example: Rolling a Loaded Die

Imagine a 4-sided die with the following probabilities:

  • Face 1: 10% chance (p=0.1)
  • Face 2: 20% chance (p=0.2)
  • Face 3: 30% chance (p=0.3)
  • Face 4: 40% chance (p=0.4)

We want to find the probability of rolling the die 10 times and getting:

  • Face 1: 1 time
  • Face 2: 2 times
  • Face 3: 3 times
  • Face 4: 4 times

(Note: 1 + 2 + 3 + 4 = 10, which matches our number of trials, n=10).

import numpy as np
from scipy.stats import multinomial

# 1. Define the parameters
n_trials = 10  # The total number of die rolls
probabilities = [0.1, 0.2, 0.3, 0.4]  # p-values for each face (must sum to 1)

# 2. Define the specific outcome we're interested in
# The counts for each face, in the order of the probabilities list
outcome_counts = [1, 2, 3, 4]

# 3. Calculate the probability of this exact outcome
# The .pmf() method calculates the Probability Mass Function
probability = multinomial.pmf(outcome_counts, n=n_trials, p=probabilities)
print(f"The probability of the outcome {outcome_counts} is: {probability:.6f}")

# You can also generate random samples from the distribution
# This simulates rolling the die 10 times, 5 different times.
samples = multinomial.rvs(n=n_trials, p=probabilities, size=5)
print("\n5 random samples (each sample is a list of counts for the 4 faces):")
print(samples)

Output:

The probability of the outcome [1, 2, 3, 4] is: 0.034836
5 random samples (each sample is a list of counts for the 4 faces):
[[2 3 2 3]
 [1 1 5 3]
 [0 4 3 3]
 [2 2 2 4]
 [1 2 2 5]]
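
As a quick sanity check, you can reproduce that pmf value by hand from the formula above using only the standard library (a small sketch; the variable names are just for illustration):

from math import factorial

n = 10
p = [0.1, 0.2, 0.3, 0.4]
counts = [1, 2, 3, 4]  # note: these are exactly the expected counts, n * p_i

# Multinomial coefficient: 10! / (1! * 2! * 3! * 4!) = 12600
coefficient = factorial(n)
for c in counts:
    coefficient //= factorial(c)

# Multiply by p_i ** x_i for each face
probability = float(coefficient)
for p_i, c in zip(p, counts):
    probability *= p_i ** c

print(coefficient)            # 12600
print(round(probability, 6))  # 0.034836, matching multinomial.pmf above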

The Multinomial Naive Bayes Classifier (from sklearn)

This is a classification algorithm based on Bayes' theorem. It's "naive" because it assumes that all features are independent of each other, which is often not true in practice but works surprisingly well, especially for text data.
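Concretely, given a document represented by word counts x = (x_1, ..., x_k), multinomial Naive Bayes picks the class c that maximizes the product of the class prior and the per-word likelihoods (this is the standard formulation; alpha is the Laplace/Lidstone smoothing parameter exposed by sklearn's MultinomialNB):

\hat{y} = \arg\max_{c}\; P(c) \prod_{i=1}^{k} P(w_i \mid c)^{x_i},
\qquad
\hat{P}(w_i \mid c) = \frac{N_{ci} + \alpha}{N_c + \alpha k}

where N_ci is the total count of word i across the training documents of class c, and N_c is the sum of those counts over all words.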

Why "Multinomial"? It's called "Multinomial" because it is specifically designed for features that are counts or frequencies. For example:

  • Text Classification: Each feature can represent the count of a particular word in a document.
  • Image Classification: Each feature can represent the count of a particular pixel intensity.

Example: Document Classification (Spam vs. Ham)

Let's classify text messages as "spam" or "ham" (not spam).

Step 1: Setup and Data Preparation

We'll use CountVectorizer to convert the text documents into a matrix of token counts; these word counts are exactly the kind of count-valued features the "multinomial" model expects.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# 1. Sample data: text messages and their labels
# 'spam' = 1, 'ham' = 0
corpus = [
    'Get a free vacation now!',      # spam
    'Your package has arrived',      # ham
    'Exclusive offer just for you',  # spam
    'Meeting at 3pm today',          # ham
    'Claim your prize now',          # spam
    'Lunch tomorrow?',               # ham
    'Free money, click here',        # spam
    'Thanks for your message'        # ham
]
labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1 for spam, 0 for ham

# 2. Vectorize the text data into a matrix of token counts
# This creates our multinomial features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

# The features are now counts of each word
# (CountVectorizer lowercases the text and, by default, drops single-character tokens like 'a')
print("Feature Matrix (X):")
print(X.toarray())

# The feature names (words)
print("\nFeature Names (vocabulary):")
print(vectorizer.get_feature_names_out())

# 3. Split data into training and testing sets
# stratify=labels keeps one spam and one ham message in the tiny test set
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=42, stratify=labels
)

Step 2: Train the Multinomial Naive Bayes Model

# 4. Initialize and train the Multinomial Naive Bayes classifier
# The model learns the probability of each word given the class (spam or ham)
model = MultinomialNB()
model.fit(X_train, y_train)
print("\nModel trained successfully!")

Step 3: Make Predictions

# 5. Make predictions on the test data
y_pred = model.predict(X_test)
# 6. Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"\nAccuracy: {accuracy:.2f}")
# The test set is small, so the report might be simple
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['Ham', 'Spam']))

Step 4: Classify New, Unseen Data

# 7. Classify a new, unseen message
new_messages = [
    "You won a free iPhone, claim now!", # Should be spam
    "Are we still on for dinner?"         # Should be ham
]
# IMPORTANT: You must use the SAME vectorizer to transform the new data
new_messages_counts = vectorizer.transform(new_messages)
# Predict the labels
predictions = model.predict(new_messages_counts)
print("\n--- Predictions for New Messages ---")
for message, prediction in zip(new_messages, predictions):
    label = "Spam" if prediction == 1 else "Ham"
    print(f"Message: '{message}' -> Predicted Label: {label}")

Output of the full example:

Feature Matrix (X):
[[0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0]
 [0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1]
 [0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0]
 [1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1]
 [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0]
 [0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1]]
Feature Names (vocabulary):
['3pm' 'arrived' 'at' 'claim' 'click' 'exclusive' 'for' 'free' 'get' 'has'
 'here' 'just' 'lunch' 'meeting' 'message' 'money' 'now' 'offer' 'package'
 'prize' 'thanks' 'today' 'tomorrow' 'vacation' 'you' 'your']
Model trained successfully!
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support
         Ham       1.00      1.00      1.00         1
        Spam       1.00      1.00      1.00         1
    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
--- Predictions for New Messages ---
Message: 'You won a free iPhone, claim now!' -> Predicted Label: Spam
Message: 'Are we still on for dinner?' -> Predicted Label: Ham
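
If you also want the model's confidence rather than only the hard label, MultinomialNB provides predict_proba; continuing from the example above (the exact probabilities will depend on the tiny training corpus):

# Per-class probabilities for the new messages
probas = model.predict_proba(new_messages_counts)
for message, proba in zip(new_messages, probas):
    # Columns follow model.classes_, i.e. index 0 = ham, index 1 = spam
    print(f"'{message}' -> P(ham) = {proba[0]:.3f}, P(spam) = {proba[1]:.3f}")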

Summary: Which One to Use?

scipy.stats.multinomial

  • Purpose: Probability theory. Calculate probabilities or generate samples from a multinomial distribution.
  • Input: n (trials), p (probabilities), and the outcome_counts you are interested in.
  • Output: A single probability value (.pmf) or random samples (.rvs).
  • Common use case: Modeling experiments with multiple categorical outcomes (e.g., election results, A/B/C testing).

sklearn.naive_bayes.MultinomialNB

  • Purpose: Machine learning. Build a classification model.
  • Input: A feature matrix X (counts/frequencies) and a target vector y (labels).
  • Output: A trained model that can predict the class of new data.
  • Common use case: Text classification (spam detection, sentiment analysis), document categorization.
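
As a final practical note, the vectorizer-plus-classifier workflow above is commonly wrapped in a scikit-learn Pipeline, which makes it impossible to forget the "use the same vectorizer" rule. A minimal sketch using the same toy corpus and labels from the example (the predicted labels shown are only what you would typically expect on such a small dataset):

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# One object that vectorizes and classifies in a single fit/predict call
# (corpus and labels are the same lists defined in the example above)
text_clf = make_pipeline(CountVectorizer(), MultinomialNB())
text_clf.fit(corpus, labels)  # raw strings go in; no manual transform step

print(text_clf.predict(["Claim your free prize now", "See you at lunch tomorrow"]))
# Expected output along the lines of: [1 0]  (1 = spam, 0 = ham)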