
Python Shared Arrays

Let's dive into the concept of "shared arrays" in Python.


What is a Shared Array?

A shared array is a data structure that allows multiple independent processes to read from and write to the same block of memory. This differs from passing a standard Python list or NumPy array between processes, where each process ends up with its own copy of the data. Copying large arrays is slow and memory-intensive.

The primary use case for shared arrays is inter-process communication (IPC), especially when using Python's multiprocessing module to perform parallel or concurrent computations.


The Challenge with multiprocessing

When you use multiprocessing.Process, each process runs in its own memory space. If you pass a large NumPy array to a child process, the entire array is copied (pickled) and sent over to the child process. This is known as "pass-by-value".

Example of the problem:

import numpy as np
import multiprocessing as mp
def worker(arr):
    # This function modifies a COPY of the array, not the original
    arr[0] = 999
    print(f"Inside process, arr[0] is: {arr[0]}")
if __name__ == "__main__":
    # Create a large array
    large_array = np.arange(1000000)
    print(f"Initial value in main: {large_array[0]}")
    p = mp.Process(target=worker, args=(large_array,))
    p.start()
    p.join()
    print(f"Value in main after process: {large_array[0]}")

Output:

Initial value in main: 0
Inside process, arr[0] is: 999
Value in main after process: 0  # <-- The original array was NOT changed!

As you can see, the change made by the child process (999) is lost because the child worked on a copy.
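To see why this copy is also costly, you can measure the pickled payload directly. This is a minimal sketch using the standard pickle module; multiprocessing serializes arguments the same way before sending them to the child:

```python
import pickle

import numpy as np

# Pickling an array serializes its entire data buffer, which is
# essentially what multiprocessing sends to the child process.
arr = np.arange(1_000_000)
payload = pickle.dumps(arr)
print(arr.nbytes)    # size of the raw data buffer (8 MB for int64 on 64-bit)
print(len(payload))  # pickled payload: the whole buffer plus a small header
```

For a million-element array this means megabytes serialized, transferred, and deserialized on every process start.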


Solution 1: multiprocessing.Array (The Built-in)

Python's multiprocessing module provides its own Array type. It's a low-level, C-style array that can be shared.

Key Features:

  • Low-Level: It's a C-style array, so you must specify a typecode for the data type ('i' for a C signed int, 'd' for a C double, 'f' for a C float, etc.).
  • Locking: Its optional lock parameter (on by default) guards against race conditions when multiple processes write simultaneously.
  • NumPy interop: You can wrap the array in a NumPy view, which is very efficient because no data is copied.

How to Use It:

import numpy as np
import multiprocessing as mp
def worker(shared_array):
    # Get a NumPy view of the shared array
    arr_np = np.ctypeslib.as_array(shared_array.get_obj())
    # Now we can modify the original memory
    arr_np[0] = 999
    print(f"Inside process, arr[0] is: {arr_np[0]}")
if __name__ == "__main__":
    # 'i' is the typecode for a C signed int
    # ('d' would be a C double, 'f' a C float)
    shared_array = mp.Array('i', [0] * 5) 
    print(f"Initial values in main: {list(shared_array)}")
    p = mp.Process(target=worker, args=(shared_array,))
    p.start()
    p.join()
    print(f"Values in main after process: {list(shared_array)}")

Output:

Initial values in main: [0, 0, 0, 0, 0]
Inside process, arr[0] is: 999
Values in main after process: [999, 0, 0, 0, 0] # <-- SUCCESS! The original was changed.

Pros and Cons of multiprocessing.Array

  • Pros:
    • Built into the standard library (no extra installation needed).
    • Very efficient for basic data types.
    • Backed by ctypes, so the underlying buffer is a plain C array that C extensions can work with directly.
  • Cons:
    • Low-level and less flexible than NumPy.
    • Requires you to manage data types manually.
    • Creating a NumPy view (np.ctypeslib.as_array) adds a small overhead, but it's much better than a full copy.
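The lock mentioned above matters for compound operations: with the default lock=True, each individual element access is protected, but a read-modify-write like arr[0] += 1 is still two separate accesses. A minimal sketch of holding the lock explicitly with get_lock():

```python
import multiprocessing as mp

def add_many(shared_array, n):
    # Hold the array's lock across the whole read-modify-write,
    # so increments from different processes cannot interleave
    for _ in range(n):
        with shared_array.get_lock():
            shared_array[0] += 1

if __name__ == "__main__":
    shared_array = mp.Array('i', [0])  # lock=True is the default
    workers = [mp.Process(target=add_many, args=(shared_array, 1000))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(shared_array[0])  # 4000: no increments were lost
```

Without the explicit `with shared_array.get_lock():`, some of the 4000 increments would typically be lost to races.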

Solution 2: multiprocessing.shared_memory (The Modern Approach)

Python 3.8 introduced the multiprocessing.shared_memory module. This is a more powerful and flexible way to share memory. It's designed to work seamlessly with NumPy arrays.

Key Features:

  • NumPy Native: It's the preferred way to share NumPy arrays.
  • Flexible: You can create a shared memory block (SharedMemory) and then attach NumPy arrays to it. You can even have multiple arrays point to the same block.
  • No Locking: It does not provide built-in locking. You are responsible for synchronization (e.g., using multiprocessing.Lock if needed).
  • Cleanup: It's crucial to explicitly close() and unlink() shared memory blocks to avoid memory leaks.
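As a sketch of that flexibility, two NumPy arrays with different dtypes can be attached to the same block; both are views over the same bytes. The explicit '<i8' dtype here is an assumption to pin a little-endian int64 layout regardless of platform:

```python
from multiprocessing import shared_memory

import numpy as np

# One 16-byte block, viewed two ways
shm = shared_memory.SharedMemory(create=True, size=16)
as_ints = np.ndarray((2,), dtype='<i8', buffer=shm.buf)       # two little-endian int64s
as_bytes = np.ndarray((16,), dtype=np.uint8, buffer=shm.buf)  # the same 16 raw bytes

as_ints[:] = [1, 2]
first_byte = int(as_bytes[0])  # low byte of the first int64
print(first_byte)  # 1

# Drop the views before releasing the block, then clean up
del as_ints, as_bytes
shm.close()
shm.unlink()
```

Note the `del` before `close()`: closing the block while NumPy views still reference `shm.buf` raises a BufferError.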

How to Use It:

import numpy as np
import multiprocessing as mp
from multiprocessing import shared_memory
def worker(shm_name, shape, dtype):
    # Attach to the existing shared memory block using its name
    shm = shared_memory.SharedMemory(name=shm_name)
    # Create a NumPy array backed by the shared memory block
    shared_arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    # Modify the original memory
    shared_arr[0] = 999
    print(f"Inside process, arr[0] is: {shared_arr[0]}")
    # IMPORTANT: Drop the view, then close the connection in the child
    del shared_arr
    shm.close()
if __name__ == "__main__":
    # Create a NumPy array in the main process
    arr = np.arange(10)
    print(f"Initial values in main: {arr}")
    # Allocate a shared memory block big enough for the array
    # (allocation alone does not copy any data)
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
    # Create a NumPy view of the shared memory
    shared_arr = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    # Copy the data from the original array into the shared memory view
    np.copyto(shared_arr, arr)
    # Create a process, passing the name of the shared memory block
    p = mp.Process(target=worker, args=(shm.name, arr.shape, arr.dtype))
    p.start()
    p.join()
    # The main process can now see the changes
    print(f"Values in main after process: {shared_arr}")
    # CRITICAL: Clean up the shared memory block
    del shared_arr  # drop the view before closing the buffer
    shm.close()
    shm.unlink() # Release the shared memory back to the OS

Output:

Initial values in main: [0 1 2 3 4 5 6 7 8 9]
Inside process, arr[0] is: 999
Values in main after process: [999   1   2   3   4   5   6   7   8   9] # <-- SUCCESS!

Pros and Cons of multiprocessing.shared_memory

  • Pros:
    • The modern, recommended way for sharing NumPy arrays.
    • Extremely efficient and flexible.
    • Avoids the C-style array limitations.
  • Cons:
    • Requires manual cleanup (close() and unlink()), which can be error-prone.
    • No built-in locking, requiring you to implement your own synchronization.
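If the manual close()/unlink() bookkeeping worries you, the standard library also provides multiprocessing.managers.SharedMemoryManager (Python 3.8+), which unlinks every block it created when its context exits. A minimal sketch:

```python
from multiprocessing.managers import SharedMemoryManager

import numpy as np

with SharedMemoryManager() as smm:
    # Blocks created through the manager are tracked for cleanup
    shm = smm.SharedMemory(size=10 * 8)
    arr = np.ndarray((10,), dtype=np.int64, buffer=shm.buf)
    arr[:] = 0
    arr[0] = 999
    result = int(arr[0])  # read the value out before the block is released
    del arr               # drop the view before the manager shuts down

print(result)  # 999
# On exit, the manager has already unlinked the block for us
```

This trades a little setup for not having to remember unlink() on every exit path.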

Solution 3: multiprocessing.Manager (The Easiest but Slowest)

The multiprocessing.Manager provides a way to create a server process that manages shared objects (like lists, dictionaries, and even custom classes). Other processes can use proxies to access these objects.

How to Use It:

import multiprocessing as mp
def worker(shared_list):
    # Modify the shared list through the proxy
    shared_list[0] = 999
    print(f"Inside process, list[0] is: {shared_list[0]}")
if __name__ == "__main__":
    with mp.Manager() as manager:
        # Create a shared list using the manager
        shared_list = manager.list([0] * 5)
        print(f"Initial values in main: {list(shared_list)}")
        p = mp.Process(target=worker, args=(shared_list,))
        p.start()
        p.join()
        print(f"Values in main after process: {list(shared_list)}")

Output:

Initial values in main: [0, 0, 0, 0, 0]
Inside process, list[0] is: 999
Values in main after process: [999, 0, 0, 0, 0] # <-- SUCCESS!

Pros and Cons of multiprocessing.Manager

  • Pros:
    • Very easy to use with familiar Python objects (lists, dicts).
    • Built-in synchronization (the manager handles locking).
  • Cons:
    • Very slow. Every access to the shared object (e.g., shared_list[0]) involves a network-like call to the manager process. This makes it unsuitable for performance-critical, high-frequency data access.
    • Can become a bottleneck if many processes are constantly accessing the shared object.
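One way to soften that per-access cost, sketched below, is to batch: copy the proxy's contents into a local list, do the work locally, then write everything back with a single slice assignment instead of many per-element proxy calls:

```python
import multiprocessing as mp

def bump_all(shared_list):
    # One proxy round-trip to read everything...
    local = list(shared_list)
    local = [x + 1 for x in local]
    # ...and one proxy round-trip to write everything back
    shared_list[:] = local

if __name__ == "__main__":
    with mp.Manager() as manager:
        shared_list = manager.list([0, 1, 2])
        p = mp.Process(target=bump_all, args=(shared_list,))
        p.start()
        p.join()
        print(list(shared_list))  # [1, 2, 3]
```

Two manager round-trips instead of one per element makes a large difference for big lists, though the data is still copied both ways.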

Summary and Recommendation

  • multiprocessing.Array
    • Best for: simple, fixed-type C-style arrays.
    • Performance: High. Ease of use: Medium.
    • Key considerations: low-level; requires manual type management.
  • multiprocessing.shared_memory
    • Best for: sharing NumPy arrays (recommended for modern Python).
    • Performance: Very High. Ease of use: Medium.
    • Key considerations: manual cleanup (unlink) is essential; no built-in locks.
  • multiprocessing.Manager
    • Best for: sharing complex Python objects (lists, dicts) with minimal code.
    • Performance: Low. Ease of use: Very High.
    • Key considerations: can become a major performance bottleneck.

Final Recommendation:

  • If you are using Python 3.8+ and need to share NumPy arrays, multiprocessing.shared_memory is the best and most efficient choice.
  • If you need to share simple, fixed-size arrays and want to avoid dependencies, multiprocessing.Array is a solid, built-in option.
  • If you need to share complex Python objects (like a list of dictionaries) and performance is not the primary concern, multiprocessing.Manager is the easiest to implement.