Of course! Let's dive into the concept of "shared arrays" in Python.

What is a Shared Array?
A shared array is a data structure that lets multiple independent processes read from and write to the same block of memory. This differs from passing a standard Python list or NumPy array between processes, where each process receives its own copy of the data; copying large arrays is slow and memory-intensive.
The primary use case for shared arrays is inter-process communication (IPC), especially when using Python's multiprocessing module to perform parallel or concurrent computations.
The Challenge with multiprocessing
When you use multiprocessing.Process, each process runs in its own memory space. If you pass a large NumPy array to a child process, the array is serialized (pickled) and a copy is reconstructed in the child, so changes made there never reach the parent. This is effectively "pass-by-value".
Example of the problem:

```python
import numpy as np
import multiprocessing as mp

def worker(arr):
    # This function modifies a COPY of the array, not the original
    arr[0] = 999
    print(f"Inside process, arr[0] is: {arr[0]}")

if __name__ == "__main__":
    # Create a large array
    large_array = np.arange(1_000_000)
    print(f"Initial value in main: {large_array[0]}")

    p = mp.Process(target=worker, args=(large_array,))
    p.start()
    p.join()

    print(f"Value in main after process: {large_array[0]}")
```
Output:

```
Initial value in main: 0
Inside process, arr[0] is: 999
Value in main after process: 0   # <-- The original array was NOT changed!
```
As you can see, the change made by the child process (999) is lost because the child worked on a copy.
Solution 1: multiprocessing.Array (The Built-in)
Python's multiprocessing module provides its own Array type. It's a low-level, C-style array that can be shared.
Key Features:
- Low-level: it is a C-style array, so you must specify a type code (`'i'` for a signed int, `'f'` for a float, `'d'` for a double, etc.).
- Locking: it is created with a synchronizing lock by default (`lock=True`), which prevents race conditions when multiple processes write simultaneously.
- NumPy interop: you can create a NumPy view of the underlying buffer, which avoids copying and is very efficient.
How to Use It:
```python
import numpy as np
import multiprocessing as mp

def worker(shared_array):
    # Get a NumPy view of the shared array's underlying buffer
    arr_np = np.ctypeslib.as_array(shared_array.get_obj())
    # Now we can modify the original memory
    arr_np[0] = 999
    print(f"Inside process, arr[0] is: {arr_np[0]}")

if __name__ == "__main__":
    # 'i' is the type code for a C signed int ('d' would be a C double)
    shared_array = mp.Array('i', [0] * 5)
    print(f"Initial values in main: {list(shared_array)}")

    p = mp.Process(target=worker, args=(shared_array,))
    p.start()
    p.join()

    print(f"Values in main after process: {list(shared_array)}")
```
Output:

```
Initial values in main: [0, 0, 0, 0, 0]
Inside process, arr[0] is: 999
Values in main after process: [999, 0, 0, 0, 0] # <-- SUCCESS! The original was changed.
```
Pros and Cons of multiprocessing.Array
- Pros:
  - Built into the standard library (no extra installation needed).
  - Very efficient for basic numeric types.
  - The underlying buffer is a plain C array, so it interoperates easily with ctypes and NumPy.
- Cons:
  - Low-level and less flexible than NumPy.
  - Requires you to manage type codes manually.
  - Creating a NumPy view (`np.ctypeslib.as_array`) adds a small overhead, but it is far cheaper than a full copy.
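The built-in lock is worth seeing in action. Below is a minimal sketch (the `increment` worker and the counts are illustrative, not from the original text) of the classic shared-counter scenario: each worker acquires the array's lock via `get_lock()` around its read-modify-write, so no updates are lost.

```python
import multiprocessing as mp

def increment(shared_array, n):
    # Acquire the array's built-in lock so concurrent
    # read-modify-write cycles don't race with each other
    for _ in range(n):
        with shared_array.get_lock():
            shared_array[0] += 1

if __name__ == "__main__":
    counter = mp.Array('i', [0])  # lock=True by default
    procs = [mp.Process(target=increment, args=(counter, 1000))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter[0])  # 4000 -- no lost updates
```

Without the `with shared_array.get_lock():` block, the final count would usually come out below 4000 because the `+= 1` is not atomic across processes.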
Solution 2: multiprocessing.shared_memory (The Modern Approach)
Python 3.8 introduced the multiprocessing.shared_memory module. This is a more powerful and flexible way to share memory. It's designed to work seamlessly with NumPy arrays.
Key Features:
- NumPy-friendly: it is the preferred way to share NumPy arrays between processes.
- Flexible: you create a raw `SharedMemory` block and attach NumPy arrays to it; multiple arrays can even view the same block.
- No locking: there is no built-in synchronization; you are responsible for it yourself (e.g., with `multiprocessing.Lock`) if needed.
- Cleanup: you must explicitly `close()` the block in every process and `unlink()` it exactly once, or the memory leaks.
How to Use It:
```python
import numpy as np
import multiprocessing as mp
from multiprocessing import shared_memory

def worker(shm_name, shape, dtype):
    # Attach to the existing shared memory block using its name
    shm = shared_memory.SharedMemory(name=shm_name)
    # Create a NumPy array backed by the shared memory block
    shared_arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    # Modify the original memory
    shared_arr[0] = 999
    print(f"Inside process, arr[0] is: {shared_arr[0]}")
    # IMPORTANT: drop the view, then close this process's handle
    del shared_arr
    shm.close()

if __name__ == "__main__":
    # Create a NumPy array in the main process
    arr = np.arange(10)
    print(f"Initial values in main: {arr}")

    # Allocate a shared memory block large enough to hold the array
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
    # Create a NumPy view of the shared memory
    shared_arr = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    # Copy the data from the original array into the shared block
    np.copyto(shared_arr, arr)

    # Create a process, passing the name of the shared memory block
    p = mp.Process(target=worker, args=(shm.name, arr.shape, arr.dtype))
    p.start()
    p.join()

    # The main process can now see the changes
    print(f"Values in main after process: {shared_arr}")

    # CRITICAL: clean up the shared memory block
    del shared_arr  # release the buffer reference first
    shm.close()
    shm.unlink()  # release the shared memory back to the OS
```
Output:

```
Initial values in main: [0 1 2 3 4 5 6 7 8 9]
Inside process, arr[0] is: 999
Values in main after process: [999   1   2   3   4   5   6   7   8   9] # <-- SUCCESS!
```
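Since `shared_memory` provides no locking of its own, a common pattern is to pass a `multiprocessing.Lock` to the workers alongside the block's name. The sketch below (the `add_one` worker is illustrative) guards a read-modify-write on the shared block so concurrent workers cannot race:

```python
import numpy as np
import multiprocessing as mp
from multiprocessing import shared_memory

def add_one(shm_name, shape, dtype, lock):
    shm = shared_memory.SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    # Guard the read-modify-write so concurrent workers don't race
    with lock:
        arr[0] += 1
    del arr  # drop the buffer reference before closing
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=8)
    arr = np.ndarray((1,), dtype=np.int64, buffer=shm.buf)
    arr[0] = 0
    lock = mp.Lock()

    procs = [mp.Process(target=add_one, args=(shm.name, (1,), np.int64, lock))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    print(arr[0])  # 4
    del arr
    shm.close()
    shm.unlink()
```

Note that the lock must be created in the parent and passed to each `Process` at construction time; locks cannot be sent over queues or pipes after the fact.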
Pros and Cons of multiprocessing.shared_memory
- Pros:
  - The modern, recommended way to share NumPy arrays.
  - Extremely efficient and flexible.
  - Avoids the C-style limitations of `multiprocessing.Array`.
- Cons:
  - Requires manual cleanup (`close()` and `unlink()`), which can be error-prone.
  - No built-in locking; you must implement your own synchronization.
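The manual-cleanup burden can be delegated: since Python 3.8 the standard library also ships `multiprocessing.managers.SharedMemoryManager`, which unlinks every block it created when it shuts down. A minimal sketch:

```python
import numpy as np
from multiprocessing.managers import SharedMemoryManager

# SharedMemoryManager unlinks every block it created when the
# with-block exits, so no manual unlink() bookkeeping is needed.
with SharedMemoryManager() as smm:
    shm = smm.SharedMemory(size=10 * np.dtype(np.int64).itemsize)
    arr = np.ndarray((10,), dtype=np.int64, buffer=shm.buf)
    arr[:] = np.arange(10)
    print(arr.sum())  # 45
    del arr  # drop the buffer reference before the manager cleans up
```

This trades a little startup cost (the manager runs as a helper process) for the guarantee that shared blocks are released even if your code raises midway.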
Solution 3: multiprocessing.Manager (The Easiest but Slowest)
The multiprocessing.Manager provides a way to create a server process that manages shared objects (like lists, dictionaries, and even custom classes). Other processes can use proxies to access these objects.
How to Use It:
```python
import multiprocessing as mp

def worker(shared_list):
    # Modify the shared list through the proxy
    shared_list[0] = 999
    print(f"Inside process, list[0] is: {shared_list[0]}")

if __name__ == "__main__":
    with mp.Manager() as manager:
        # Create a shared list using the manager
        shared_list = manager.list([0] * 5)
        print(f"Initial values in main: {list(shared_list)}")

        p = mp.Process(target=worker, args=(shared_list,))
        p.start()
        p.join()

        print(f"Values in main after process: {list(shared_list)}")
```
Output:

```
Initial values in main: [0, 0, 0, 0, 0]
Inside process, list[0] is: 999
Values in main after process: [999, 0, 0, 0, 0] # <-- SUCCESS!
```
Pros and Cons of multiprocessing.Manager
- Pros:
  - Very easy to use with familiar Python objects (lists, dicts).
  - Built-in synchronization (the manager process serializes access).
- Cons:
  - Very slow: every access to the shared object (e.g., `shared_list[0]`) is an IPC round trip to the manager process, making it unsuitable for performance-critical, high-frequency data access.
  - The manager can become a bottleneck when many processes access the shared object constantly.
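Because every proxy access is a round trip, the usual mitigation is to batch reads and writes. The sketch below (the `scale` worker is illustrative) fetches the whole list in one call, works on a local copy, and writes everything back in one slice assignment, so it costs two round trips instead of one per element:

```python
import multiprocessing as mp

def scale(shared_list, factor):
    # One round trip to fetch the entire list
    local = [x * factor for x in list(shared_list)]
    # One round trip to write everything back
    shared_list[:] = local

if __name__ == "__main__":
    with mp.Manager() as manager:
        data = manager.list([1, 2, 3, 4])
        p = mp.Process(target=scale, args=(data, 10))
        p.start()
        p.join()
        print(list(data))  # [10, 20, 30, 40]
```

For element-by-element loops over large manager-backed collections, this pattern can make a difference of orders of magnitude.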
Summary and Recommendation

| Method | Best For | Performance | Ease of Use | Key Considerations |
|---|---|---|---|---|
| `multiprocessing.Array` | Sharing basic C-style arrays, simple data types. | High | Medium | Low-level; requires type-code management. |
| `multiprocessing.shared_memory` | Sharing NumPy arrays (recommended for modern Python). | Very high | Medium | Manual cleanup (`unlink()`) is essential; no built-in locks. |
| `multiprocessing.Manager` | Sharing complex Python objects (lists, dicts) with minimal code. | Low | Very high | Can be a major performance bottleneck. |
Final Recommendation:
- If you are using Python 3.8+ and need to share NumPy arrays, `multiprocessing.shared_memory` is the best and most efficient choice.
- If you need to share simple, fixed-size arrays of a basic numeric type, `multiprocessing.Array` is a solid built-in option.
- If you need to share complex Python objects (like a list of dictionaries) and performance is not the primary concern, `multiprocessing.Manager` is the easiest to implement.
