杰瑞科技汇

How Does Python's defaultdict Automatically Initialize Default Values?

This article takes a close look at collections.defaultdict in Python.

What is a defaultdict?

A defaultdict is a subclass of Python's built-in dict. It behaves almost exactly like a regular dictionary, with one key difference: indexing a key that doesn't exist (d[key]) does not raise a KeyError, provided a default factory was supplied.

Instead, it automatically creates a default value for that key the first time it's accessed. This is incredibly useful for avoiding repetitive if key in dict: or try...except KeyError: blocks.
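
The difference can be sketched in a few lines. The stock data below is invented purely for illustration:

```python
from collections import defaultdict

# Hypothetical inventory data, invented for this illustration.
plain_stock = {"apples": 3}
default_stock = defaultdict(int, {"apples": 3})

# A regular dict raises KeyError when a missing key is indexed.
try:
    plain_stock["pears"]
    plain_lookup = "no error"
except KeyError:
    plain_lookup = "KeyError"

# A defaultdict calls int() instead, stores the result, and returns it.
default_lookup = default_stock["pears"]

print(plain_lookup)          # KeyError
print(default_lookup)        # 0
print(dict(default_stock))   # {'apples': 3, 'pears': 0}
```

Note that the lookup itself added 'pears' to the defaultdict, which is the behavior the rest of this article builds on.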


The Problem: Why Do We Need defaultdict?

Imagine you want to count the frequency of each word in a sentence. With a standard dictionary, you might write code like this:

# The "manual" way with a regular dict
text = "the quick brown fox jumps over the lazy dog"
word_counts = {}
for word in text.split():
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1
print(word_counts)
# Output: {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1}

This works, but it's a bit clunky. You have to check for the key's existence every time. A try...except block is another common but verbose pattern.
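
For comparison, here is a sketch of the try...except variant mentioned above, reusing the same sentence. It produces the same counts but still spends four lines on bookkeeping:

```python
# The try/except way with a regular dict
text = "the quick brown fox jumps over the lazy dog"
word_counts = {}
for word in text.split():
    try:
        word_counts[word] += 1      # fails with KeyError the first time a word is seen
    except KeyError:
        word_counts[word] = 1       # so the first occurrence is handled here
print(word_counts["the"])   # 2
print(len(word_counts))     # 8
```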

The Solution: Using defaultdict

Now, let's solve the same problem using defaultdict. The magic happens when you initialize it. You provide a "factory function" that will be called to create a default value whenever a new key is accessed.

The most common factory is list, which creates an empty list []. Another is int, which creates the integer 0.
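
To see where those defaults come from, it helps to call the factories directly. The short sketch below also shows that any zero-argument callable can serve as a factory; the "n/a" placeholder is an arbitrary choice made for this illustration:

```python
from collections import defaultdict

# Calling these built-in types with no arguments yields their "empty" value.
print(int())    # 0
print(list())   # []

# Any zero-argument callable works as a factory, including a lambda.
labels = defaultdict(lambda: "n/a")
print(labels["missing"])   # n/a
```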

Example 1: Counting Words (using int)

We can initialize our defaultdict with int. When a new key is accessed, int() is called, which returns 0.

from collections import defaultdict
text = "the quick brown fox jumps over the lazy dog"
# Initialize defaultdict with int, which returns 0 for new keys
word_counts = defaultdict(int)
for word in text.split():
    # If 'word' is not in word_counts, it's automatically added with a value of 0.
    # Then, we add 1 to it. No 'if' or 'try' needed!
    word_counts[word] += 1
print(word_counts)
# Output: defaultdict(<class 'int'>, {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1})
# You can access it just like a regular dict
print(word_counts['the'])
# Output: 2
# Accessing a non-existent key won't raise an error
print(word_counts['non_existent_word'])
# Output: 0

This is much cleaner and more readable!

Example 2: Grouping Items (using list)

A very common use case is grouping items from a list into categories. Let's say we have a list of pets and we want to group them by their species.

from collections import defaultdict
pets = [
    {'name': 'Fido', 'species': 'dog'},
    {'name': 'Whiskers', 'species': 'cat'},
    {'name': 'Rex', 'species': 'dog'},
    {'name': 'Garfield', 'species': 'cat'},
    {'name': 'Goldie', 'species': 'fish'}
]
# Initialize defaultdict with list, which returns [] for new keys
pets_by_species = defaultdict(list)
for pet in pets:
    # Append the pet's name to the list for its species.
    # If the species key doesn't exist, an empty list is created first.
    pets_by_species[pet['species']].append(pet['name'])
print(pets_by_species)
# Output (wrapped here for readability; print() shows it on a single line):
# defaultdict(<class 'list'>,
#             {'dog': ['Fido', 'Rex'],
#              'cat': ['Whiskers', 'Garfield'],
#              'fish': ['Goldie']})

How Does It Work? The default_factory

The default_factory is the core of defaultdict. It's stored as an attribute and must be a callable that takes no arguments, or None, in which case missing keys raise KeyError just as they do in a regular dict.

  • Initialization: defaultdict(list) sets default_factory to the list function.
  • Access: When you do my_dict['new_key'], defaultdict checks if 'new_key' exists.
    • If it exists, it returns the associated value.
    • If it does not exist, the lookup falls through to defaultdict's __missing__ method, which calls default_factory() (e.g., list()), assigns the result (an empty list []) to 'new_key', and then returns that new value.
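
The steps above can be approximated with a small dict subclass. SimpleDefaultDict is a hypothetical, simplified imitation written for this article, not the actual implementation (the real defaultdict is implemented in C). The second half uses the real class to show which operations trigger the factory:

```python
from collections import defaultdict


class SimpleDefaultDict(dict):
    """Hypothetical, simplified imitation of defaultdict, for illustration only."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ only when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()
        self[key] = value      # store the new value so later lookups find it
        return value


groups = SimpleDefaultDict(list)
groups["a"].append(1)
print(groups)                  # {'a': [1]}

real = defaultdict(list)
print(real.get("x"))           # None: get() does not call the factory
print("x" in real)             # False: membership tests do not create keys
print(real["x"])               # []: indexing does
print("x" in real)             # True

real.default_factory = None    # with no factory, missing keys raise KeyError again
try:
    real["y"]
except KeyError:
    print("KeyError for 'y'")
```

Setting default_factory to None once a dictionary has been built is one way to stop later lookups from silently adding keys.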

defaultdict vs. dict.setdefault()

You might be familiar with the dict.setdefault() method, which also provides a way to handle missing keys. Let's compare it to our first example.

Using setdefault()

text = "the quick brown fox jumps over the lazy dog"
word_counts = {}
for word in text.split():
    # setdefault returns the value for the key if it exists.
    # If the key doesn't exist, it sets the key to the default value
    # and returns that default value.
    word_counts[word] = word_counts.setdefault(word, 0) + 1
print(word_counts)
# Output: {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1}

Comparison:

| Feature | defaultdict | dict.setdefault() |
|---|---|---|
| Readability | Excellent. The logic is clear: counts[word] += 1. | Good, but slightly more verbose. The setdefault call is mixed with the update logic. |
| Performance | Usually faster in loops. The factory is called only when a key is missing. | Usually slower. The default argument is evaluated on every call, even when the key already exists. |
| Best For | When you are frequently accessing and mutating keys that may not exist. | When you need to set a default value only once for a specific key and then perform other operations. |

Conclusion: For the common use cases of counting and grouping, defaultdict is generally preferred for its cleaner syntax and typically lower overhead.
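
One way to make the difference visible without timing anything is to count how often a default value gets built. In the sketch below, tracked_list and the pairs data are hypothetical helpers invented for this illustration:

```python
from collections import defaultdict

pairs = [("dog", "Fido"), ("cat", "Whiskers"), ("dog", "Rex")]

calls = {"setdefault": 0, "defaultdict": 0}


def tracked_list(label):
    # Hypothetical helper that records how often a new list is built.
    calls[label] += 1
    return []


# setdefault: the default argument is evaluated before the call, on every iteration.
by_setdefault = {}
for species, name in pairs:
    by_setdefault.setdefault(species, tracked_list("setdefault")).append(name)

# defaultdict: the factory is called only when the key is missing.
by_defaultdict = defaultdict(lambda: tracked_list("defaultdict"))
for species, name in pairs:
    by_defaultdict[species].append(name)

print(by_setdefault == by_defaultdict)   # True
print(calls)                             # {'setdefault': 3, 'defaultdict': 2}
```

Both approaches build the same grouping, but setdefault constructs a throwaway list for the second 'dog' entry, while defaultdict builds one list per distinct key.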


Common Factory Functions

Here are the most common factory functions you'll use with defaultdict:

| Factory | Default Value | Use Case |
|---|---|---|
| int | 0 | Counting, summing, or accumulating numerical values. |
| list | [] (empty list) | Grouping items into lists. |
| set | set() (empty set) | Grouping unique items. |
| dict | {} (empty dict) | Creating nested dictionaries. |
| lambda: None | None | Any situation where a simple None placeholder is sufficient. |
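
As a sketch of the dict factory from the table, the following builds a two-level mapping. The student scores are invented for illustration:

```python
from collections import defaultdict

# Hypothetical (student, subject, score) records, invented for this example.
scores = [("alice", "math", 90), ("alice", "art", 85), ("bob", "math", 72)]

report = defaultdict(dict)
for student, subject, score in scores:
    # The inner dict for each student is created on first access.
    report[student][subject] = score

print(dict(report))
# {'alice': {'math': 90, 'art': 85}, 'bob': {'math': 72}}
```

Only the outer level gets automatic defaults here. The inner values are plain dicts, so reading a missing subject from one of them still raises KeyError.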

Example: Grouping Unique Items with set

from collections import defaultdict
data = ['a', 'b', 'a', 'c', 'b', 'a', 'd']
# Use set to automatically collect unique items
unique_items = defaultdict(set)
for item in data:
    unique_items[item].add(item)  # add() ignores duplicates, so each set keeps a single copy
print(unique_items)
# Output: defaultdict(<class 'set'>, {'a': {'a'}, 'b': {'b'}, 'c': {'c'}, 'd': {'d'}})
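
A slightly more realistic sketch groups values under a different key, so the de-duplication actually matters. The page-visit log below is hypothetical:

```python
from collections import defaultdict

# Hypothetical (page, visitor) log entries, including one repeat visit.
visits = [("home", "ann"), ("home", "bob"), ("home", "ann"), ("docs", "bob")]

visitors_by_page = defaultdict(set)
for page, visitor in visits:
    visitors_by_page[page].add(visitor)

# Sort each set for a stable display order; sets themselves are unordered.
for page, visitors in visitors_by_page.items():
    print(page, sorted(visitors))
# home ['ann', 'bob']
# docs ['bob']
```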

Important Caveats

  1. The default_factory is only for missing keys. It is not called if the key exists and its value is None, 0, [], or any other "falsy" value.

    dd = defaultdict(list)
    dd['existing_key'] = [] # The key exists
    dd['existing_key'].append('item') # This works fine
    print(dd)
    # Output: defaultdict(<class 'list'>, {'existing_key': ['item']})
  2. defaultdict can still have missing keys. Just because it creates a value on access doesn't mean all keys are pre-populated. It only creates the value when you try to access it.

    dd = defaultdict(int)
    # The key 'missing' does not exist in the dictionary yet.
    print('missing' in dd) # False
    print(dd['missing']) # This access creates the key with value 0
    print('missing' in dd) # Now it's True
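  3. Be careful with shared mutable objects. defaultdict(list) itself is safe, because list() is called anew for every missing key. The classic Python pitfall appears when a defaultdict (or any mutable object) is used as a default argument of a function: the default is created once, when the function is defined, and is then shared by every call.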

    # DANGEROUS: Using a mutable default argument in a function
    def add_to_list(key, item, my_dict=defaultdict(list)):
        my_dict[key].append(item)
        return my_dict
    d1 = add_to_list('a', 1)
    d2 = add_to_list('b', 2)
    # The default dictionary is shared between calls!
    print(d2) # You might expect {'b': [2]}
    # Output: defaultdict(<class 'list'>, {'a': [1], 'b': [2]})

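    A similar sharing problem arises if the factory returns the same mutable object every time, for example a lambda that returns a list created outside of it. Every missing key then points at that one list, which is rarely what you want for grouping. Pass list itself, or use lambda: [], so that each key gets a fresh object.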
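
A common way around the default-argument pitfall is a None sentinel. The sketch below rewrites add_to_list that way (one possible fix, not the only one) and then contrasts a factory that returns a shared list with one that builds a fresh list per key:

```python
from collections import defaultdict


def add_to_list(key, item, my_dict=None):
    # Create a fresh defaultdict per call unless the caller passes one in.
    if my_dict is None:
        my_dict = defaultdict(list)
    my_dict[key].append(item)
    return my_dict


d1 = add_to_list("a", 1)
d2 = add_to_list("b", 2)
print(dict(d1), dict(d2))             # {'a': [1]} {'b': [2]}

# A factory that hands back one shared list makes every key alias it.
shared = []
aliased = defaultdict(lambda: shared)
aliased["x"].append(1)
aliased["y"].append(2)
print(aliased["x"] is aliased["y"])   # True
print(dict(aliased))                  # {'x': [1, 2], 'y': [1, 2]}

# Passing list itself (or lambda: []) builds a new list per key.
fresh = defaultdict(list)
fresh["x"].append(1)
fresh["y"].append(2)
print(dict(fresh))                    # {'x': [1], 'y': [2]}
```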

Summary

  • What it is: A dict subclass that provides a default value for missing keys.
  • Why use it: To write cleaner, more readable, and often faster code when dealing with keys that may not exist, especially for counting and grouping.
  • How it works: You initialize it with a default_factory function (like int, list, or set). This function is called to create a default value when a new key is first accessed.
  • Key Takeaway: If you find yourself writing if key in my_dict: or my_dict.setdefault(key, default), you should probably be using a defaultdict.