杰瑞科技汇

Python collections模块有哪些核心工具类?

Of course! Let's dive deep into Python's collections module.

Python collections模块有哪些核心工具类?-图1
(图片来源网络,侵删)

What is the collections Module?

In simple terms, the collections module is a built-in Python library that provides specialized container datatypes, in addition to the standard built-in types like list, dict, tuple, and set.

Think of it as a "toolkit for data structures." It offers high-performance alternatives and specialized containers that are not available in the core Python language. These are often optimized for specific use cases, making your code more efficient, readable, and robust.

To use it, you simply need to import it:

import collections

The Most Important Container: namedtuple

This is one of the most popular and useful tools in the module. A namedtuple is a tuple with named fields, making it much more readable and self-documenting than a regular, index-based tuple.

Python collections模块有哪些核心工具类?-图2
(图片来源网络,侵删)

Problem with regular tuples:

# A regular tuple
person_data = ("John", 30, "Engineer")
# How do you know what "30" represents? You have to remember the index.
print(person_data[0]) # John
print(person_data[1]) # 30... but what is this? Age? ID?
print(person_data[2]) # Engineer

Solution with namedtuple:

from collections import namedtuple
# 1. Define the "blueprint" for the tuple
Person = namedtuple('Person', ['name', 'age', 'profession'])
# 2. Create an instance (it's still a tuple, but with names!)
person1 = Person(name="John", age=30, profession="Engineer")
person2 = Person("Jane", 28, "Data Scientist") # Positional arguments also work
# 3. Access data using clear, readable attribute names
print(person1.name)
# Output: John
print(person2.profession)
# Output: Data Scientist
# You can still access by index (it's a tuple after all)
print(person1[1])
# Output: 30
# You can get a list of field names
print(Person._fields)
# Output: ('name', 'age', 'profession')

Use Case: Perfect for storing records or rows of data where you want the structure and immutability of a tuple but with the clarity of a dictionary.


Dictionary-like Containers

These are alternatives to the standard dict and defaultdict.

Python collections模块有哪些核心工具类?-图3
(图片来源网络,侵删)

defaultdict

A defaultdict is a subclass of dict that calls a factory function to supply missing values, instead of raising a KeyError.

Problem with standard dictionaries:

# Imagine counting words in a text
text = "the quick brown fox jumps over the lazy dog"
word_counts = {}
for word in text.split():
    if word not in word_counts:
        word_counts[word] = 1  # This is tedious!
    else:
        word_counts[word] += 1
print(word_counts)

Solution with defaultdict:

from collections import defaultdict
# The int() function returns 0 when called with no arguments.
# This becomes the default value for any new key.
word_counts = defaultdict(int)
for word in text.split():
    word_counts[word] += 1 # No need to check for key existence!
print(word_counts)
# Output: defaultdict(<class 'int'>, {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1})

You can use any "zero-argument" function as the default factory:

  • list: Creates an empty list [] for a new key.
  • set: Creates an empty set set() for a new key.
  • lambda: 0: A custom function that returns 0.

Use Case: Counting frequencies, grouping items into lists, or any situation where you want to initialize a new key with a default value automatically.

Counter

The Counter is a specialized dictionary subclass for counting hashable objects. It's a subclass of dict, so it behaves like one, but with extra features.

from collections import Counter
# You can initialize it with an iterable (like a list)
letters = 'abracadabra'
letter_counts = Counter(letters)
print(letter_counts)
# Output: Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
# It behaves like a dictionary
print(letter_counts['a'])
# Output: 5
# It has useful methods
print(letter_counts.most_common(2))
# Output: [('a', 5), ('b', 2)] # Get the n most common elements and their counts
# You can perform arithmetic
other_letters = Counter('mississippi')
print(letter_counts + other_letters)
# Output: Counter({'a': 5, 'i': 4, 's': 4, 'b': 2, 'p': 2, 'r': 2, 'c': 1, 'd': 1, 'm': 1})

Use Case: Quickly tallying frequencies, finding the most common items in a dataset.

OrderedDict

In Python 3.7+, standard dictionaries remember the order in which items were inserted. Before that, you needed an OrderedDict to guarantee this behavior. While less critical in modern Python, it's still useful if you need to be explicit about ordering or need features like move_to_end().

from collections import OrderedDict
# A regular dict also preserves order in Python 3.7+
# But let's use OrderedDict for clarity and its extra features
d = OrderedDict()
d['a'] = 1
d['b'] = 2
d['c'] = 3
print(d)
# Output: OrderedDict([('a', 1), ('b', 2), ('c', 3)])
# Useful method: Move an item to the end
d.move_to_end('a')
print(d)
# Output: OrderedDict([('b', 2), ('c', 3), ('a', 1)])
# Useful method: Move an item to the front
d.move_to_end('c', last=False)
print(d)
# Output: OrderedDict([('c', 3), ('b', 2), ('a', 1)])

Use Case: Implementing LRU (Least Recently Used) caches, or when you need explicit control over item ordering.


List-like Containers

deque (Double-Ended Queue)

A deque is a generalization of a stack and queue. It's a list-like container optimized for fast appends and pops from both ends.

Problem with list:

  • list.append() and list.pop() from the end are very fast (O(1) time complexity).
  • list.pop(0) (from the beginning) is very slow (O(n) time complexity) because all other elements must be shifted.

Solution with deque:

from collections import deque
# Create a deque
d = deque(['a', 'b', 'c'])
print(d)
# Output: deque(['a', 'b', 'c'])
# Add to the right (end)
d.append('d')
print(d)
# Output: deque(['a', 'b', 'c', 'd'])
# Add to the left (beginning)
d.appendleft('z')
print(d)
# Output: deque(['z', 'a', 'b', 'c', 'd'])
# Pop from the right (end)
print(d.pop())
# Output: 'd'
# Pop from the left (beginning)
print(d.popleft())
# Output: 'z'
print(d)
# Output: deque(['a', 'b', 'c'])

Use Case: Implementing queues (like for task scheduling), breadth-first search (BFS) algorithms, or any scenario where you need to frequently add/remove items from both ends of a sequence.

ChainMap

A ChainMap groups multiple dictionaries or mappings into a single, updateable view. It's like stacking several dictionaries on top of each other.

from collections import ChainMap
# Create some dictionaries
dict1 = {'a': 1, 'b': 2}
dict2 = {'c': 3, 'd': 4}
dict3 = {'b': 20, 'e': 5} # Note the overlapping key 'b'
# Chain them together
combined = ChainMap(dict1, dict2, dict3)
print(combined)
# Output: ChainMap({'a': 1, 'b': 2}, {'c': 3, 'd': 4}, {'b': 20, 'e': 5})
# Look up a key. It searches the dictionaries in order.
print(combined['b'])
# Output: 2  # It found 'b' in the first dictionary in the chain
print(combined['e'])
# Output: 5 # It found 'e' in the last dictionary
# You can add a new "context" layer
combined_new = combined.new_child({'f': 99})
print(combined_new['f'])
# Output: 99

Use Case: Managing multiple configuration sources (e.g., default settings, user settings, command-line arguments), where you want to look up a value across all sources without merging them.


Other Containers

  • UserDict, UserList, UserTuple: These are classes that act as wrappers around the built-in dict, list, and tuple types. They are designed to be subclassed. The main advantage is that you can override methods without having to worry about the complexities of the underlying C implementation of the built-in types. For example, you can subclass UserList to create a custom list-like object with your own methods.

Summary Table

Container Type Key Feature Best For
namedtuple Tuple-like Immutable tuples with named fields for readability. Storing structured data records (e.g., database rows, CSV entries).
defaultdict Dict-like Automatically creates a default value for a missing key. Counting frequencies, grouping items into lists.
Counter Dict-like Specialized for counting hashable objects. Tallying frequencies, finding the most common items.
OrderedDict Dict-like Remembers insertion order and has move_to_end() method. LRU caches, explicit control over key order.
deque List-like Fast appends and pops from both ends. Queues, stacks, and algorithms requiring two-ended operations (like BFS).
ChainMap Dict-like Groups multiple dictionaries into a single, searchable view. Managing layered configurations.
UserDict Wrapper A class to subclass for creating custom dictionary-like objects. When you need to deeply customize the behavior of a dictionary.

Conclusion

The collections module is a testament to Python's philosophy of providing powerful, expressive tools for common programming tasks. By understanding these specialized containers, you can write code that is not only more concise and readable but also more performant and robust for specific problems. It's an essential part of any Python programmer's toolkit.

分享:
扫描分享到社交APP
上一篇
下一篇