杰瑞科技汇

How Does Python's bytearray Handle Binary Data Efficiently?

Let's dive deep into Python's bytearray type, a powerful and flexible tool for handling raw binary data.


What is a bytearray?

A bytearray is a mutable sequence of bytes. In simpler terms, it's like a list, but instead of holding any type of object, it only holds integers in the range 0 to 255, which represent bytes.

This "mutability" is its most important feature and the key difference from its sibling, the bytes type.


Key Differences: bytes vs. bytearray

Feature    | bytes | bytearray
-----------|-------|----------
Mutability | Immutable | Mutable
Syntax     | b'hello' | bytearray(b'hello') or bytearray(...)
Use case   | Fixed, read-only binary data (e.g., file contents, network packets) | Binary data that needs to be modified in place (e.g., building a packet, streaming data)

Think of it like a string:

  • A string (str) is immutable. You can't change a character in place. You have to create a new string.
  • A list of characters (list) is mutable. You can change characters directly.

bytearray is the "list" version of the "string" that is bytes.
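
The contrast can be checked directly. The following is a minimal sketch (variable names are illustrative) that attempts the same item assignment on both types and confirms that the bytearray is edited in place rather than replaced:

```python
# bytes: item assignment is rejected.
frozen = b'hello'
try:
    frozen[0] = 72
    bytes_error = None
except TypeError as exc:
    bytes_error = type(exc).__name__

# bytearray: the same assignment edits the existing object.
editable = bytearray(b'hello')
identity_before = id(editable)
editable[0] = 72  # ord('H')
same_object = id(editable) == identity_before

print(bytes_error)   # TypeError
print(editable)      # bytearray(b'Hello')
print(same_object)   # True
```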


Creating a bytearray

You can create a bytearray in several ways:

From a bytes Literal

You pass a bytes literal to the bytearray() constructor.

# From a bytes literal
ba1 = bytearray(b'hello')
print(ba1)
# Output: bytearray(b'hello')
# From a list of integer values (one per byte)
ba2 = bytearray([72, 101, 108, 108, 111]) # H, e, l, l, o
print(ba2)
# Output: bytearray(b'Hello')

From an Iterable of Integers

The constructor accepts any iterable of integers where each integer is between 0 and 255.

# From a list of integers
data = [10, 20, 30, 40, 50]
ba3 = bytearray(data)
print(ba3)
# Output: bytearray(b'\n\x14\x1e(2')
# From a range
ba4 = bytearray(range(5)) # Creates [0, 1, 2, 3, 4]
print(ba4)
# Output: bytearray(b'\x00\x01\x02\x03\x04')
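
It may help to see what happens when a value falls outside the 0 to 255 range. This is a small sketch; try_bytearray is a hypothetical helper written only for this illustration:

```python
def try_bytearray(values):
    """Return the bytearray, or the name of the exception that was raised."""
    try:
        return bytearray(values)
    except (ValueError, TypeError) as exc:
        return type(exc).__name__

print(try_bytearray([0, 127, 255]))  # bytearray(b'\x00\x7f\xff')
print(try_bytearray([256]))          # ValueError (too large for one byte)
print(try_bytearray([-1]))           # ValueError (negative)
print(try_bytearray([1.5]))          # TypeError (not an integer)
```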

From a String with an Encoding

This is a very common use case. Pass a string together with an encoding name (such as 'utf-8') to the constructor, and it encodes the string into a bytearray.

text = "Hello, 世界"
# Encode the string to UTF-8 bytes
ba5 = bytearray(text, 'utf-8')
print(ba5)
# Output: bytearray(b'Hello, \xe4\xb8\x96\xe7\x95\x8c')
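
Two details are worth checking here: the encoding argument is required for strings, and the byte length can differ from the character count. A short sketch using the same text:

```python
text = "Hello, 世界"

# A string without an encoding is rejected by the constructor.
try:
    bytearray(text)
    missing_encoding = None
except TypeError as exc:
    missing_encoding = type(exc).__name__

encoded = bytearray(text, 'utf-8')
round_trip = encoded.decode('utf-8')

print(missing_encoding)         # TypeError
print(len(text), len(encoded))  # 9 13 (each CJK character takes 3 bytes in UTF-8)
print(round_trip == text)       # True
```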

Creating a Zeroed-Out Buffer

You can create a bytearray of a specific size, initialized with zeros. This is useful for pre-allocating memory.

# Create a bytearray of 10 zero bytes
ba6 = bytearray(10)
print(ba6)
# Output: bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
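
One way a pre-allocated buffer gets used is with readinto(), which fills an existing buffer instead of returning a new bytes object on every read. The sketch below uses io.BytesIO as a stand-in for a file or socket; the sample data is made up for the illustration:

```python
import io

# Stand-in for a real file or socket; io.BytesIO supports readinto().
source = io.BytesIO(b'abcdefghij-klmnop')

buffer = bytearray(10)  # allocated once, reused for every read
chunks = []
while True:
    n = source.readinto(buffer)  # fills buffer in place, returns bytes read
    if n == 0:                   # 0 means end of stream
        break
    chunks.append(bytes(buffer[:n]))  # copy out only the filled part

print(chunks)  # [b'abcdefghij', b'-klmnop']
```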

Common Operations (Since it's Mutable)

Because bytearray is mutable, you can perform list-like operations on it.

Modifying Elements

You can change individual bytes using their index.

ba = bytearray(b'hello')
print(f"Original: {ba}")
# Change the first character to 'H' (ASCII 72)
ba[0] = 72
print(f"After change: {ba}")
# Output:
# Original: bytearray(b'hello')
# After change: bytearray(b'Hello')

Appending and Extending

You can add new bytes to the end.

ba = bytearray(b'start')
print(f"Original: {ba}")
# Append a single integer
ba.append(33) # ASCII for '!'
print(f"After append: {ba}")
# Extend with another iterable
ba.extend([32, 87, 111, 114, 108, 100]) # Space, 'W', 'o', 'r', 'l', 'd'
print(f"After extend: {ba}")
# Output:
# Original: bytearray(b'start')
# After append: bytearray(b'start!')
# After extend: bytearray(b'start! World')

Inserting and Removing

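You can insert a byte at a specific position or remove bytes. Note that insert() accepts a single integer, so inserting several bytes at once requires slice assignment.

ba = bytearray(b'World')
print(f"Original: {ba}")
# Insert 'Hello ' at the beginning (insert() takes one integer, so use slice assignment)
ba[0:0] = b'Hello '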
print(f"After insert: {ba}")
# Remove the byte at index 5 (the space)
del ba[5]
print(f"After delete: {ba}")
# Output:
# Original: bytearray(b'World')
# After insert: bytearray(b'Hello World')
# After delete: bytearray(b'HelloWorld')
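
The other list-style removal methods are available too. A brief sketch of insert() with a single integer, pop(), remove(), and slice deletion:

```python
ba = bytearray(b'Hello World')

ba.insert(5, ord(','))   # insert() takes one integer -> b'Hello, World'
last = ba.pop()          # removes and returns the last byte
ba.remove(ord('o'))      # removes the first occurrence of that byte value
del ba[0:2]              # slice deletion removes several bytes at once

print(last)  # 100
print(ba)    # bytearray(b'll, Worl')
```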

Slicing

Slicing works just like with lists, and it creates a new bytearray.

ba = bytearray(b'Hello, Python!')
print(f"Original: {ba}")
# Get a slice
sub_ba = ba[7:13]
print(f"Slice: {sub_ba}")
print(f"Type of slice: {type(sub_ba)}")
# Modify the original
ba[0] = 106 # 'j'
print(f"After modifying original: {ba}")
print(f"Slice is unchanged: {sub_ba}")
# Output:
# Original: bytearray(b'Hello, Python!')
# Slice: bytearray(b'Python')
# Type of slice: <class 'bytearray'>
# After modifying original: bytearray(b'jello, Python!')
# Slice is unchanged: bytearray(b'Python')
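
Because a slice is a copy, slicing large buffers repeatedly can be costly. When a copy is not wanted, a memoryview gives a window onto the same memory. A minimal sketch comparing the two:

```python
ba = bytearray(b'Hello, Python!')

copy_slice = ba[7:13]              # independent copy
view_slice = memoryview(ba)[7:13]  # zero-copy window onto ba

ba[7] = ord('J')                   # change 'P' to 'J' in the original

print(copy_slice)         # bytearray(b'Python')
print(bytes(view_slice))  # b'Jython'

view_slice[0] = ord('C')  # writing through the view edits ba itself
print(ba)                 # bytearray(b'Hello, Cython!')

view_slice.release()      # release the view once it is no longer needed
```

While a view is alive the bytearray cannot be resized (append, extend, and del raise BufferError), so views are best kept short-lived.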

When to Use bytearray?

bytearray is your go-to choice when you need to build or modify binary data on the fly.

  • Network Programming: Constructing a network packet header where you need to set flags or a length field.
  • Binary File I/O: Reading a file in chunks, modifying a chunk, and writing it back without loading the entire file into memory.
  • Data Manipulation: Working with binary formats like images, audio files, or custom protocols.
  • Performance: For very large binary datasets that you need to process incrementally, a mutable bytearray can be more memory-efficient than creating many new immutable bytes objects.
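
The streaming case can be sketched as follows, assuming a hypothetical newline-delimited protocol. The helper name feed is illustrative and not part of any library; the point is that one buffer grows and shrinks in place as chunks arrive:

```python
def feed(buffer, chunk):
    """Append a chunk to the buffer and return any complete messages."""
    buffer += chunk  # in-place extend of the caller's bytearray
    messages = []
    while True:
        end = buffer.find(b'\n')
        if end == -1:
            break  # no complete message yet; keep the partial data
        messages.append(bytes(buffer[:end]))
        del buffer[:end + 1]  # drop the consumed message and its newline
    return messages

pending = bytearray()
first = feed(pending, b'PING\nPO')
second = feed(pending, b'NG\nQU')

print(first)    # [b'PING']
print(second)   # [b'PONG']
print(pending)  # bytearray(b'QU')
```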

When to Use bytes?

Use bytes when the binary data is fixed and should not be changed. This is the safer and more common default.

  • Constants: Hardcoded binary data.
  • Function Return Values: A function that reads a file should return a bytes object to signal that the data is a snapshot and shouldn't be altered by the caller.
  • Keys in Dictionaries: bytes objects are hashable and can be used as dictionary keys. bytearray objects are not.
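
The hashability point is easy to verify. In this sketch the lookup table and its values are made up for illustration; converting with bytes() is one way to use a bytearray's contents as a key:

```python
# Hypothetical lookup table keyed by file signatures.
lookup = {b'\x89PNG': 'png', b'GIF8': 'gif'}

header = bytearray(b'GIF8')

# A bytearray cannot be hashed, so it fails as a dictionary key.
try:
    lookup[header]
    lookup_error = None
except TypeError as exc:
    lookup_error = type(exc).__name__

# Converting to bytes produces an immutable, hashable snapshot.
kind = lookup[bytes(header)]

print(lookup_error)  # TypeError
print(kind)          # gif
```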

A Practical Example: Building a Simple TCP-like Packet

Let's simulate building a simple TCP-like packet. A packet might have a header and a body.

# 1. Create a header with a fixed size (e.g., 8 bytes)
# Let's say the first 4 bytes are a sequence number, and the next 4 are a checksum.
# We start with zeros.
header = bytearray(4) # Sequence number
header.extend(bytearray(4)) # Checksum
print(f"Initial Header: {header}")
# 2. Simulate setting the sequence number to 1
sequence_number = 1
header[0] = (sequence_number >> 24) & 0xFF
header[1] = (sequence_number >> 16) & 0xFF
header[2] = (sequence_number >> 8) & 0xFF
header[3] = sequence_number & 0xFF
print(f"Header with Seq Num: {header}")
# 3. Create the packet body
body = bytearray(b"This is the actual data we are sending.")
# 4. Combine header and body into the final packet
# `header + body` would also work; either way a new object is built.
# Here we start from an empty bytearray and extend it in place.
packet = bytearray()
packet.extend(header)
packet.extend(body)
print(f"\nFinal Packet: {packet}")
print(f"Packet Length: {len(packet)} bytes")
# 5. Now, let's pretend we received a packet and want to read the sequence number
# We can access it directly from the bytearray.
received_seq = (packet[0] << 24) | (packet[1] << 16) | (packet[2] << 8) | packet[3]
print(f"\nExtracted Sequence Number: {received_seq}")

This example highlights how bytearray's mutability is essential for constructing a complex data structure piece by piece.
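
The manual shifting and masking can also be handled by the standard struct module, which writes into a bytearray in place. This is a sketch under the same assumed layout (4-byte sequence number followed by a 4-byte checksum, big-endian); the byte-sum checksum is a placeholder chosen for illustration, not a real protocol's algorithm:

```python
import struct

HEADER_FORMAT = '>II'  # big-endian: 4-byte sequence number, 4-byte checksum
header_size = struct.calcsize(HEADER_FORMAT)  # 8 bytes

body = b"This is the actual data we are sending."
packet = bytearray(header_size) + body  # zeroed header followed by the body

# Placeholder checksum (assumption): sum of body bytes, truncated to 32 bits.
checksum = sum(body) & 0xFFFFFFFF

# Write both header fields directly into the existing buffer.
struct.pack_into(HEADER_FORMAT, packet, 0, 1, checksum)

# Read them back without slicing or shifting by hand.
seq, received_checksum = struct.unpack_from(HEADER_FORMAT, packet, 0)

print(seq)                            # 1
print(received_checksum == checksum)  # True
```

Using a format string keeps the field sizes and byte order in one place, which may be easier to maintain than separate shift expressions for each byte.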
