杰瑞科技汇

Python random seed: How Do You Make Random Numbers Repeatable?

This article breaks down Python's random module and the concept of a "seed" in detail.

What is a Seed?

Think of a computer's random number generator (RNG) not as a true source of randomness, but as a shuffling machine.

  • A seed is the initial "starting point" or "shuffle order" you give to this machine.
  • If you start with the same seed, the machine will produce the exact same sequence of "shuffles" (random numbers) every single time.
  • If you start with a different seed, you'll get a completely different sequence.

This is called a pseudo-random number generator (PRNG). The numbers appear random, but they are actually generated by a deterministic algorithm. The seed is the secret ingredient that determines the entire sequence.
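
To make the idea of a deterministic algorithm concrete, here is a minimal sketch of a toy generator. It is a linear congruential generator, which is not the algorithm Python's random module uses (that is the Mersenne Twister); the function name lcg and the constants are chosen purely for illustration.

```python
def lcg(seed, count, a=1664525, c=1013904223, m=2**32):
    """Return `count` pseudo-random floats in [0, 1) from a toy generator."""
    state = seed
    values = []
    for _ in range(count):
        state = (a * state + c) % m   # deterministic update rule
        values.append(state / m)      # scale the state into [0, 1)
    return values


first = lcg(seed=42, count=3)
second = lcg(seed=42, count=3)
other = lcg(seed=43, count=3)

print(first == second)  # True: same seed, same sequence
print(first == other)   # False: different seed, different sequence
```

Nothing in the function is random at all. The seed is the only input, so it fully determines every value that follows.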


Why is Seeding Useful?

This "reproducibility" is the primary reason for using a seed. It's incredibly useful in several scenarios:

  1. Debugging: If your code has a bug that only appears with certain random values, you can set a seed. This ensures that every time you run the code, it uses the exact same random numbers, making the bug easy to reproduce and fix.
  2. Testing: When testing a machine learning model or a statistical function, you need to ensure it works consistently. By setting a seed, you can guarantee that your test data (e.g., randomly shuffled data) is the same every time, leading to reliable and repeatable test results.
  3. Sharing Results: If you share your code with a colleague or publish it, setting a seed allows others to run your code and get the exact same results you did, which is crucial for scientific integrity and collaboration.
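
As a rough sketch of the testing scenario, the snippet below shuffles and splits a dataset in a repeatable way. The helper name shuffled_split and the seed value 7 are hypothetical; it uses random.Random(seed), which creates a private seeded generator so the helper does not disturb randomness elsewhere in the program.

```python
import random


def shuffled_split(items, seed):
    """Shuffle a copy of `items` with a seeded generator and split it in half."""
    rng = random.Random(seed)   # private generator, seeded for repeatability
    data = list(items)
    rng.shuffle(data)
    middle = len(data) // 2
    return data[:middle], data[middle:]


train_a, test_a = shuffled_split(range(10), seed=7)
train_b, test_b = shuffled_split(range(10), seed=7)

print(train_a == train_b and test_a == test_b)  # True: the split is repeatable
```

A test that relies on this split sees the same "random" data on every run, so a failure can be reproduced instead of appearing only occasionally.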

How to Use a Seed in Python

The most common way to set a seed is by using the random.seed() function from Python's built-in random module.

Example 1: Basic Reproducibility

Let's generate three random numbers twice. First, without a seed, and then with a seed.

Without a Seed (Different results each time):

import random
print("Run 1 (no seed):")
print(random.random()) # A float between 0.0 and 1.0
print(random.randint(1, 10)) # An integer between 1 and 10
print(random.choice(['apple', 'banana', 'cherry'])) # A random choice from a list
print("\nRun 2 (no seed):")
print(random.random())
print(random.randint(1, 10))
print(random.choice(['apple', 'banana', 'cherry']))

Possible Output:

Run 1 (no seed):
0.8444218515250481
7
banana

Run 2 (no seed):
0.7579544029403025
2
apple

As you can see, the results are different.

With a Seed (Same results every time):

Now, let's set a seed before generating the numbers.

import random
# Set the seed to a specific number
random.seed(42) 
print("Run 1 (with seed 42):")
print(random.random())
print(random.randint(1, 10))
print(random.choice(['apple', 'banana', 'cherry']))
print("\nRun 2 (with seed 42):")
# Reset the seed to the same number
random.seed(42) 
print(random.random())
print(random.randint(1, 10))
print(random.choice(['apple', 'banana', 'cherry']))

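Sample output (the float is what current CPython releases print for seed 42; the integer and the fruit shown here are illustrative and may differ on your interpreter, but Run 1 and Run 2 always match each other):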

Run 1 (with seed 42):
0.6394267984578837
2
banana

Run 2 (with seed 42):
0.6394267984578837
2
banana

The outputs are identical because we started with the same seed (42) in both runs.


Important Best Practices

Set the Seed Once at the Beginning

You should generally set your seed only once at the start of your script or program. Each call to random.seed() restarts the sequence from the beginning for that seed, so re-seeding partway through makes values repeat that you probably expected to differ.

import random
random.seed(123)
print(random.random()) # 0.052363598850944326
random.seed(123) # Restarting the sequence
print(random.random()) # 0.052363598850944326 (Same number again)
random.seed(456)
print(random.random()) # A different number from the 456 sequence
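
A common way this goes wrong is re-seeding inside a function that is called repeatedly. The following sketch illustrates the pitfall; the function names roll_reseeding and roll and the seed 2024 are made up for this example.

```python
import random


def roll_reseeding():
    random.seed(2024)             # re-seeding on every call restarts the sequence
    return random.randint(1, 6)


def roll():
    return random.randint(1, 6)   # relies on a seed set once by the caller


stuck = [roll_reseeding() for _ in range(5)]
print(len(set(stuck)))            # 1: every call returned the same value

random.seed(2024)                 # seed once, up front
varied = [roll() for _ in range(50)]
print(len(set(varied)) > 1)       # True: the sequence advances between calls
```

Both versions are reproducible from run to run, but only the second one behaves like a sequence of dice rolls.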

When NOT to Set a Seed

You should not set a seed in production applications that require true unpredictability, such as:

  • Generating security tokens or passwords.
  • Creating unique IDs.
  • Anywhere in a web application where each user needs a unique, unpredictable experience.

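Setting a seed in these scenarios would create a major security vulnerability. In fact, the random module is not suitable for security purposes even when unseeded, because its output can be predicted from earlier values; use the secrets module instead.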
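
For security-sensitive values, a minimal sketch using the standard library's secrets module might look like this. The variable names and sizes are illustrative only; pick lengths that suit your own application.

```python
import secrets

token = secrets.token_hex(16)          # 16 random bytes as 32 hex characters
url_token = secrets.token_urlsafe(16)  # URL-safe text built from 16 random bytes
pin = secrets.randbelow(10**6)         # integer in the range [0, 1_000_000)

print(len(token))      # 32
print(f"{pin:06d}")    # a zero-padded six-digit code, different on every run
```

There is deliberately no way to seed secrets: it draws from the operating system's entropy source, so its output cannot be replayed.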


random.seed() vs. numpy.random.seed()

When you start working with data science and machine learning, you'll likely use the NumPy library, which has its own random number generator. It's crucial to understand the difference.

  • random.seed(): Affects only the built-in random module.
  • numpy.random.seed(): Affects only NumPy's legacy global random number generator (functions such as np.random.rand()). It does not affect Python's random module.

If you are using both in the same script, you need to seed both to ensure full reproducibility.

import random
import numpy as np
# Seed both for full reproducibility
random.seed(42)
np.random.seed(42)
# This number comes from Python's random
print(random.random()) 
# This number comes from NumPy's random
print(np.random.rand()) 
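
To check that the two generators really are separate, the sketch below seeds NumPy, then calls random.seed() and observes that NumPy's sequence is not rewound. It assumes NumPy is installed; the seed values are arbitrary.

```python
import random

import numpy as np

np.random.seed(0)
first_np = np.random.rand()

random.seed(42)                 # touches only the built-in random module
second_np = np.random.rand()    # NumPy simply continues its own sequence

np.random.seed(0)               # only this call rewinds NumPy
replayed_np = np.random.rand()

print(first_np == second_np)    # False: random.seed() did not reset NumPy
print(first_np == replayed_np)  # True: np.random.seed() did
```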

Modern Alternatives: random and numpy Generators

Both libraries also let you avoid the global seed altogether. Python's random module has long provided the random.Random class, and NumPy 1.17 introduced Generator objects, created with numpy.random.default_rng(). Instead of setting a global seed, you create a specific generator instance and use that.

Advantages of Generators:

  • No global state: You can have multiple independent generators in your code, each with its own seed.
  • Better performance: NumPy's Generator is generally faster than the legacy numpy.random functions.
  • More features: NumPy's Generator offers additional methods and options that the legacy functions lack.
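
The "no global state" point can be illustrated with a short sketch: two generator instances created from the same seed stay in step with each other, regardless of what happens to the module-level generator in between. The names rng_a and rng_b and the seed values are arbitrary.

```python
import random

rng_a = random.Random(42)
rng_b = random.Random(42)

a_values = [rng_a.random() for _ in range(3)]

# Re-seeding or drawing from the module-level generator does not touch rng_b.
random.seed(999)
random.random()

b_values = [rng_b.random() for _ in range(3)]

print(a_values == b_values)  # True: each instance keeps its own state
```

Passing such an instance into the functions that need randomness keeps reproducibility local, so one part of a program cannot silently disturb another.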

Example with Python's random Generator

import random
# Create a generator with a specific seed
rg = random.Random(42)
# Use the generator's methods
print(rg.random()) # 0.6394267984578837
print(rg.randint(1, 10)) # An integer from 1 to 10 (inclusive), identical on every run
# Create another independent generator
rg2 = random.Random(123)
print(rg2.random()) # 0.052363598850944326 (different from rg)

Example with NumPy's Generator

import numpy as np
# Create a generator with a specific seed
rng = np.random.default_rng(42)
# Use the generator's methods
print(rng.random()) # 0.7739560485559633
print(rng.integers(1, 11)) # An integer from 1 to 10 (NumPy's upper bound is exclusive)
# Create another independent generator
rng2 = np.random.default_rng(123)
print(rng2.random()) # A different value, drawn from rng2's own stream

Summary

random.seed()
  • Purpose: Sets the global seed for the random module.
  • Scope: Global. Affects all calls to random.
  • When to use: Simple scripts, quick debugging, or when you need to affect the entire random module.

random.Random(seed)
  • Purpose: Creates a local, independent generator.
  • Scope: Local. Only affects methods on that specific generator object.
  • When to use: Recommended. When you need multiple independent streams of randomness or want to avoid global state.

numpy.random.seed()
  • Purpose: Sets the global seed for NumPy's random module.
  • Scope: Global. Affects all calls to numpy.random.
  • When to use: Necessary for older NumPy code (< 1.17) or when you need to affect the entire numpy.random module.

numpy.random.default_rng(seed)
  • Purpose: Creates a local, independent generator (modern approach).
  • Scope: Local. Only affects methods on that specific generator object.
  • When to use: Recommended. The modern, flexible, and efficient way to handle randomness in NumPy.