Python value_counts()如何统计频次？-杰瑞科技汇

Of course! In Python, "value counts" most commonly refers to the value_counts() method, which is a powerful and essential tool for data analysis, especially when working with tabular data.

（图片来源网络，侵删）

Let's break it down.

The Short Answer: `value_counts()`

The value_counts() method is a built-in function for Pandas Series objects. Its primary job is to count the occurrences of each unique value in a Series and return the results in a new Series, sorted in descending order.

Basic Usage with Pandas

First, you need to have the Pandas library installed. If you don't, you can install it via pip:

pip install pandas

Now, let's look at how it works.

（图片来源网络，侵删）

Example 1: Simple List of Values

Imagine you have a list of survey responses and you want to know how many people chose each option.

import pandas as pd
# Create a pandas Series from a list of categories
data = ['Red', 'Blue', 'Red', 'Green', 'Blue', 'Blue', 'Red']
colors = pd.Series(data)
# Use value_counts() to get the frequency of each color
color_counts = colors.value_counts()
print("Original Data:")
print(colors)
print("\nValue Counts:")
print(color_counts)

Output:

Original Data:
0      Red
1     Blue
2      Red
3    Green
4     Blue
5     Blue
6      Red
dtype: object
Value Counts:
Blue     3
Red      3
Green    1
dtype: int64

As you can see, value_counts() automatically counted the occurrences and sorted them from most frequent (Blue and Red, both with 3) to least frequent (Green with 1).

Example 2: Real-world Scenario (DataFrame Column)

value_counts() is most often used on a single column of a DataFrame. Let's create a sample DataFrame.

（图片来源网络，侵删）

import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {
    'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo', 'Paris', 'New York'],
    'Age': [25, 30, 25, 40, 30, 35, 25, np.nan], # Including a NaN value
    'Product': ['A', 'B', 'A', 'A', 'C', 'B', 'A', 'C']
}
df = pd.DataFrame(data)
# Get the value counts for the 'City' column
city_counts = df['City'].value_counts()
print("Original DataFrame:")
print(df)
print("\nValue Counts for 'City':")
print(city_counts)

Output:

Original DataFrame:
     City   Age Product
0  New York  25.0       A
1    London  30.0       B
2     Paris  25.0       A
3  New York  40.0       A
4    London  30.0       C
5     Tokyo  35.0       B
6     Paris   NaN       A
7  New York   NaN       C
Value Counts for 'City':
New York    3
London      2
Paris       2
Tokyo       1
Name: City, dtype: int64

Notice how New York appears 3 times, London and Paris appear twice, and Tokyo appears once.

Key Parameters of `value_counts()`

The real power of value_counts() comes from its parameters.

`normalize: bool` (Default: `False`)

If set to True, it returns the relative frequencies (proportions or percentages) instead of counts.

# Get the proportion of each city
city_proportions = df['City'].value_counts(normalize=True)
print("City Proportions:")
print(city_proportions)

Output:

City Proportions:
New York    0.375000
London      0.250000
Paris       0.250000
Tokyo       0.125000
Name: City, dtype: float64

(Note: 3/8 = 0.375, 2/8 = 0.25, etc.)

`sort: bool` (Default: `True`)

By default, it's sorted. You can set sort=False to get the counts in the order of appearance.

# Get counts in the order they first appear in the data
city_counts_unsorted = df['City'].value_counts(sort=False)
print("Unsorted City Counts:")
print(city_counts_unsorted)

Output:

Unsorted City Counts:
New York    3
London      2
Paris       2
Tokyo       1
Name: City, dtype: int64

(This might look the same in this specific case, but if the data was ['London', 'Paris', 'London'], the unsorted output would be London: 2, Paris: 1.)

`ascending: bool` (Default: `False`)

Controls the sort order. False is descending (default), True is ascending.

# Get counts sorted from least to most common
city_counts_ascending = df['City'].value_counts(ascending=True)
print("Ascending City Counts:")
print(city_counts_ascending)

Output:

Ascending City Counts:
Tokyo       1
London      2
Paris       2
New York    3
Name: City, dtype: int64

`dropna: bool` (Default: `True`)

By default, NaN (Not a Number) values are excluded. You can set dropna=False to include them in the count.

# Get counts including NaN values
age_counts_with_nan = df['Age'].value_counts(dropna=False)
print("Age Counts (with NaN):")
print(age_counts_with_nan)

Output:

Age Counts (with NaN):
25.0    3
30.0    2
NaN     2
40.0    1
35.0    1
Name: Age, dtype: int64

`bins: int` or `sequence of scalars`

This is a very useful feature for converting numerical data into categorical bins and then counting the occurrences in each bin. This is called binning or discretization.

# Create a Series with numerical data
ages = pd.Series([18, 22, 25, 27, 30, 35, 40, 45, 50, 55])
# Count how many people fall into age groups (bins)
# We'll create 3 bins: 18-30, 31-40, 41-55
age_group_counts = ages.value_counts(bins=3)
print("Age Group Counts:")
print(age_group_counts)

Output:

Age Group Counts:
(17.967, 36.333]    7
(36.333, 54.667]    2
(54.667, 73.0]      1
Name: Age, dtype: int64

Pandas automatically calculated the bin edges. You can also define your own bins:

my_bins = [0, 25, 35, 100]
age_group_counts_custom = ages.value_counts(bins=my_bins)
print("\nCustom Age Group Counts:")
print(age_group_counts_custom)

Output:

Custom Age Group Counts:
(0.0, 25.0]     3
(25.0, 35.0]    3
(35.0, 100.0]   4
Name: Age, dtype: int64

Alternatives (Without Pandas)

While value_counts() is the standard, it's good to know how to do it with Python's built-in tools.

Using `collections.Counter`

The Counter class from the collections module is designed for exactly this purpose.

from collections import Counter
data = ['Red', 'Blue', 'Red', 'Green', 'Blue', 'Blue', 'Red']
# Counter returns a dictionary-like object
counts = Counter(data)
print(counts)
print(f"Most common: {counts.most_common()}") # .most_common() sorts for you

Output:

Counter({'Blue': 3, 'Red': 3, 'Green': 1})
Most common: [('Blue', 3), ('Red', 3), ('Green', 1)]

Using Pure Python (Dictionary)

You can also achieve this with a simple loop and a dictionary.

data = ['Red', 'Blue', 'Red', 'Green', 'Blue', 'Blue', 'Red']
counts = {}
for item in data:
    counts[item] = counts.get(item, 0) + 1
print(counts)

Output:

{'Red': 3, 'Blue': 3, 'Green': 1}

Summary: When to Use What?

Method	Best For	Pros	Cons
`pandas.Series.value_counts()`	Data analysis on tabular data (DataFrames/Series).	Extremely fast, integrates with the rest of the Pandas ecosystem, powerful parameters (`normalize`, `bins`, `dropna`), returns a Series which is easy to work with.	Requires the Pandas library.
`collections.Counter`	Counting items in any iterable (lists, tuples, etc.).	Part of the standard library, very fast, simple and intuitive API (`.most_common()`).	Less flexible than Pandas for data analysis tasks (e.g., no direct `bins` parameter).
Pure Python Dictionary	Simple counting or when you can't use external libraries.	No dependencies, great for learning the logic.	More verbose, slower for large datasets, requires manual implementation of sorting.

For any serious data manipulation in Python, pandas.value_counts() is the undisputed champion and the tool you should be using.

Python value_counts()如何统计频次？

The Short Answer: `value_counts()`

Basic Usage with Pandas

Example 1: Simple List of Values

Example 2: Real-world Scenario (DataFrame Column)

Key Parameters of `value_counts()`

`normalize: bool` (Default: `False`)

`sort: bool` (Default: `True`)

`ascending: bool` (Default: `False`)

`dropna: bool` (Default: `True`)

`bins: int` or `sequence of scalars`

Alternatives (Without Pandas)

Using `collections.Counter`

Using Pure Python (Dictionary)

Summary: When to Use What?

99ANYc3cd6

Java如何发布WebService？

Java如何调用JavaScript方法？

Multisim 11教程从哪开始学？基础操作怎么掌握？

ckfinder for java如何快速集成使用？

JS如何高效调用Java方法？

androidkiller安装教程

Adobe Audition安装教程在哪找？

Java 16进制字符串如何转10进制字符串？

Adobe Audition录音教程从哪开始学？

Java Web应用如何高效开发与部署？

Ubuntu 12.04如何安装Java？

OpenCV Python示例如何快速入门？

Verilog数字系统设计教程第二版，核心难点在哪？

Oracle for Linux教程，如何快速上手？

Win10装Win8系统，详细教程步骤是怎样的？

GarageBand iPad教程，新手如何快速上手创作？

Python value_counts()如何统计频次？

The Short Answer: value_counts()

Basic Usage with Pandas

Example 1: Simple List of Values

Example 2: Real-world Scenario (DataFrame Column)

Key Parameters of value_counts()

normalize: bool (Default: False)

sort: bool (Default: True)

ascending: bool (Default: False)

dropna: bool (Default: True)

bins: int or sequence of scalars

Alternatives (Without Pandas)

Using collections.Counter

Using Pure Python (Dictionary)

Summary: When to Use What?

相关推荐

androidkiller安装教程

The Short Answer: `value_counts()`

Key Parameters of `value_counts()`

`normalize: bool` (Default: `False`)

`sort: bool` (Default: `True`)

`ascending: bool` (Default: `False`)

`dropna: bool` (Default: `True`)

`bins: int` or `sequence of scalars`

Using `collections.Counter`