杰瑞科技汇

Python generators: how to use and optimize them efficiently?

Let's dive deep into Python generators, a powerful and memory-efficient feature of the language.

What is a Generator?

At its core, a generator function is a special kind of function that returns an iterator (a generator object). Instead of returning a single value and exiting, it yields a sequence of values, one at a time. Think of it as a lazy, on-demand producer of values.

The key difference between a regular function and a generator function is the use of the yield keyword.

  • Regular Function: Uses return. When called, it runs to completion and returns a single value; its local state is discarded when it returns. The next time you call it, it starts fresh from the top.
  • Generator Function: Uses yield. When called, it doesn't run immediately. It returns a generator object. This object is an iterator. The function's code only runs when you ask it for the next value (e.g., using next() or in a for loop).

The yield Keyword

The yield keyword is the heart of a generator. It does two things:

  1. Pauses the function's execution: It "freezes" the function at the point where yield is called.
  2. Yields a value back to the caller: It sends the value following the yield keyword to whoever is asking for it.

When the generator is asked for the next value, it resumes execution from where it left off.

Creating and Using a Generator

Example 1: The Basics

Let's create a simple generator that yields the first 5 square numbers.

def square_generator(n):
    """A generator that yields the squares of numbers from 0 to n-1."""
    print("Generator function started")
    for i in range(n):
        # 'yield' pauses the function and returns the value
        result = i * i
        print(f"Yielding {result}")
        yield result
    print("Generator function finished")
# 1. Create the generator object. The function's code has NOT run yet.
gen = square_generator(5)
# 2. Use the generator object
print(f"Generator object created: {gen}")
# 3. Get the first value using next()
#    This will cause the function to run until it hits the first 'yield'
first_value = next(gen)
print(f"First value received: {first_value}\n")
# 4. Get the next value. The function resumes from where it left off.
second_value = next(gen)
print(f"Second value received: {second_value}\n")
# 5. You can also use it in a for loop, which handles StopIteration automatically
print("Using the generator in a for loop:")
for square in gen:
    print(f"Got square: {square}")

Output:

Generator object created: <generator object square_generator at 0x...>
Generator function started
Yielding 0
First value received: 0

Yielding 1
Second value received: 1

Using the generator in a for loop:
Yielding 4
Got square: 4
Yielding 9
Got square: 9
Yielding 16
Got square: 16
Generator function finished

Notice how the function "paused" at each yield and resumed on the next request. The for loop stopped on its own: once the function body finished, the generator raised StopIteration, which the loop handles for you.
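
To make the stopping behaviour concrete, here is a minimal sketch of roughly what a for loop does behind the scenes: it calls next() repeatedly until StopIteration is raised. The names countdown and drain are hypothetical, chosen only for this illustration.

```python
def countdown(n):
    """Yield n, n-1, ..., 1 (hypothetical example generator)."""
    while n > 0:
        yield n
        n -= 1


def drain(iterable):
    """Roughly what a for loop does: call next() until StopIteration."""
    iterator = iter(iterable)
    collected = []
    while True:
        try:
            value = next(iterator)
        except StopIteration:
            break  # the generator is exhausted
        collected.append(value)
    return collected


gen = countdown(3)
values = drain(gen)
print(values)  # [3, 2, 1]

# An exhausted generator stays exhausted. Passing a default to next()
# returns that default instead of raising StopIteration.
leftover = next(gen, None)
print(leftover)  # None
```

If you call next() by hand, either catch StopIteration or pass a default value as shown in the last lines.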


Why Use Generators? The Big Advantage: Memory Efficiency

This is the most important reason to use generators. Imagine you want to process a very large sequence of numbers, like all numbers from 1 to a billion.

The "Bad" Way: Using a List

def create_number_list(n):
    """Creates a list of all numbers from 1 to n."""
    print("Creating list... This will use a lot of memory!")
    return [i for i in range(1, n+1)]
# This list will consume gigabytes of RAM
# huge_list = create_number_list(1_000_000_000) # This will likely crash your program

This approach is terrible for large n because it tries to store all 1 billion numbers in your computer's memory at once.

The "Good" Way: Using a Generator

def number_generator(n):
    """Yields numbers from 1 to n one by one."""
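    # Note: this line runs on the first next() call, not when the generator object is created.
    print("Generator started. Values are produced one at a time.")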
    for i in range(1, n+1):
        yield i
# This creates a generator object. It uses almost no memory.
# It doesn't generate any numbers yet.
gen = number_generator(1_000_000_000)
# You can process one number at a time. Memory usage is constant.
# for number in gen:
#     # Do something with 'number'
#     pass
# You can get the first number without creating a huge list
first_number = next(gen)
print(f"First number: {first_number}") # Prints 1

The generator produces values on-demand. It only holds the current state of the loop in memory. This makes it perfect for handling large or infinite data streams.
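
The difference can be measured on a smaller scale. The sketch below compares the size of a list with the size of a generator object using sys.getsizeof. The function name count_up_to and the value of n are assumptions for this illustration, and getsizeof reports only the container itself (for the list, its array of references and not the integer objects), so the real gap is even larger than the numbers printed.

```python
import sys


def count_up_to(n):
    """Yield numbers from 1 to n one by one (hypothetical helper)."""
    for i in range(1, n + 1):
        yield i


n = 100_000  # small enough to build the list safely

numbers_list = list(range(1, n + 1))
numbers_gen = count_up_to(n)

list_size = sys.getsizeof(numbers_list)  # grows with n
gen_size = sys.getsizeof(numbers_gen)    # stays small regardless of n

print(f"list:      {list_size} bytes")
print(f"generator: {gen_size} bytes")

# Both give the same result, but the generator never holds all values at once.
total_from_list = sum(numbers_list)
total_from_gen = sum(numbers_gen)
print(total_from_list == total_from_gen)  # True
```

The exact byte counts depend on the Python version and platform, so they are not shown here.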


Generator Expressions (A More Concise Syntax)

Python also has a syntax similar to list comprehensions for creating simple generators. It's called a generator expression.

The syntax is (expression for item in iterable).

List Comprehension (creates a list in memory):

squares_list = [x*x for x in range(10)]
print(squares_list)
# Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Generator Expression (creates a generator object):

squares_gen = (x*x for x in range(10))
print(squares_gen)
# Output: <generator object <genexpr> at 0x...>
# You can iterate over it
for square in squares_gen:
    print(square, end=" ")
# Output: 0 1 4 9 16 25 36 49 64 81

Generator expressions are a quick and readable way to create simple generators without defining a full function.
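
A common use is passing a generator expression straight into a function that consumes an iterable, which avoids building an intermediate list. The sketch below shows this with sum() and any(); the helper is_big and the list checked are hypothetical names used only to show which values actually get evaluated.

```python
# When a generator expression is the only argument, the extra parentheses can be dropped.
total = sum(x * x for x in range(10))
print(total)  # 285

# any() stops at the first true value, so the remaining items are never computed.
checked = []


def is_big(x):
    """Record each value that is tested (hypothetical helper)."""
    checked.append(x)
    return x > 2


found = any(is_big(x) for x in range(1_000_000))
print(found)    # True
print(checked)  # [0, 1, 2, 3]
```

With a list comprehension in place of the generator expression, is_big would have been called a million times before any() even started.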


Advanced: yield from (Python 3.3+)

If you have a generator that wants to delegate part of its work to another sub-generator, you can use yield from. This makes the code cleaner and more efficient.

def sub_generator():
    yield "A"
    yield "B"
def main_generator():
    yield "Start"
    # Instead of a loop like: for val in sub_generator(): yield val
    # We can use 'yield from'
    yield from sub_generator()
    yield "End"
for item in main_generator():
    print(item)

Output:

Start
A
B
End
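
Beyond replacing the loop, yield from also captures the value that the sub-generator returns. The following is a minimal sketch of that behaviour; emit_and_sum and report are hypothetical names for this illustration.

```python
def emit_and_sum(values):
    """Sub-generator: yield each value, then return their sum."""
    total = 0
    for v in values:
        total += v
        yield v
    return total  # becomes the value of the 'yield from' expression


def report(values):
    """Delegate to emit_and_sum, then yield a summary line."""
    subtotal = yield from emit_and_sum(values)
    yield f"total={subtotal}"


items = list(report([1, 2, 3]))
print(items)  # [1, 2, 3, 'total=6']
```

A plain "for val in sub: yield val" loop would pass the values through but would not give you the sub-generator's return value.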

When to Use Generators

Generators are perfect for:

  1. Large Data Processing: Reading files line-by-line, processing massive datasets from a database, or iterating over huge lists of items without running out of memory.

  2. Infinite Sequences: You can create a generator that theoretically never ends.

    def infinite_counter():
        i = 0
        while True:
            yield i
            i += 1
    counter = infinite_counter()
    print(next(counter)) # 0
    print(next(counter)) # 1
    print(next(counter)) # 2
    # ... and so on
  3. Pipelining Data: You can chain generators together, where the output of one is the input of the next. This is a core concept in functional programming and data processing pipelines.

  4. When you only need to iterate once: If you are going to loop over a sequence of items only once, a generator is more memory-efficient than creating a list.
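
Items 1 and 3 can be combined in a small sketch: a chain of generators that reads lines, filters them, and converts them, one line at a time. The stage names (read_lines, non_empty, parse_ints) are hypothetical, and io.StringIO stands in for a real file so the example is self-contained.

```python
import io


def read_lines(stream):
    """Stage 1: yield lines from a file-like object without the newline."""
    for line in stream:
        yield line.rstrip("\n")


def non_empty(lines):
    """Stage 2: skip blank lines."""
    for line in lines:
        if line.strip():
            yield line


def parse_ints(lines):
    """Stage 3: convert each remaining line to an int."""
    for line in lines:
        yield int(line)


# io.StringIO is a stand-in for open("some_file.txt") in this sketch.
fake_file = io.StringIO("10\n\n20\n30\n\n")

# Nothing is read until the pipeline is consumed.
pipeline = parse_ints(non_empty(read_lines(fake_file)))
result = list(pipeline)
print(result)  # [10, 20, 30]
```

Each stage holds only one line at a time, so the same pipeline would work on a file far larger than available memory, provided the final consumer does not collect everything into a list as this demo does.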

Summary Table: Generator vs. List

| Feature | Generator | List |
|---|---|---|
| Creation | Uses the yield keyword in a function, or (expr for item in iterable) | Uses [expr for item in iterable] or list() |
| Memory Usage | Very low. Stores only the current state. | High. Stores all elements in memory at once. |
| Evaluation | Lazy. Values are generated on demand. | Eager. All values are computed and stored immediately. |
| Iteration | Can only be iterated over once (unless recreated or materialized, e.g. with list()). | Can be iterated over multiple times. |
| Use Case | Large or infinite data streams, pipelining, memory efficiency. | Small datasets, or when you need random access or multiple passes. |
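
The single-pass limitation noted in the table is easy to trip over, so here is a short sketch of it together with one possible workaround. The function make_squares is a hypothetical helper introduced for illustration.

```python
squares = (x * x for x in range(4))

first_pass = list(squares)
second_pass = list(squares)  # the generator is already exhausted

print(first_pass)   # [0, 1, 4, 9]
print(second_pass)  # []


def make_squares():
    """Return a fresh generator each time it is called (hypothetical helper)."""
    return (x * x for x in range(4))


# Workaround: create a new generator whenever another pass is needed.
again = list(make_squares())
print(again)  # [0, 1, 4, 9]
```

If the data is small and you need several passes, storing it in a list once is usually the simpler choice.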