杰瑞科技汇

Python bytearray与str如何转换?

Of course! This is a very common and important concept in Python, especially when dealing with low-level data, network communication, or binary file I/O.

Python bytearray与str如何转换?-图1
(图片来源网络,侵删)

Let's break down the relationship between bytearray and str (strings) in Python.

The Core Difference: Text vs. Bytes

  • str (String): Represents text. It's a sequence of Unicode characters. Unicode is a universal standard for representing most of the world's writing systems. Python 3 strings are, by default, Unicode strings.
    • Example: "hello" or "你好"
  • bytearray (Byte Array): Represents a mutable (changeable) sequence of bytes. A byte is 8 bits and can hold a value from 0 to 255. It's a raw, low-level representation of data.
    • Example: b'hello' (this is an immutable bytes object, but bytearray works with the same kind of data).

You cannot directly mix str and bytearray. You must encode a string to get bytes, and decode bytes to get a string.


Converting str to bytearray (Encoding)

To create a bytearray from a string, you need to encode the string into a specific byte representation. The most common encoding is UTF-8.

Syntax: bytearray_object = your_string.encode(encoding)

Python bytearray与str如何转换?-图2
(图片来源网络,侵删)

Example:

my_string = "Hello, Python! 🐍"
# Encode the string into a bytearray using UTF-8 encoding
my_bytearray = my_string.encode('utf-8')
print(f"Original string: {my_string}")
print(f"Type of original: {type(my_string)}")
print("-" * 20)
print(f"Resulting bytearray: {my_bytearray}")
print(f"Type of result: {type(my_bytearray)}")

Output:

Original string: Hello, Python! 🐍
Type of original: <class 'str'>
--------------------
Resulting bytearray: b'Hello, Python! \xf0\x9f\x90\x8d'
Type of result: <class 'bytearray'>

Key Observations:

  • The emoji is a complex Unicode character. When encoded in UTF-8, it takes up 4 bytes (\xf0\x9f\x90\x8d). This is why the length of the bytearray is longer than the length of the string.
  • If you try to encode without specifying an encoding, Python 3 will raise a TypeError:
    # This will fail!
    # bytearray(my_string) 
    # TypeError: string argument without an encoding

Converting bytearray to str (Decoding)

To get a str back from a bytearray, you need to decode the bytes using the same encoding that was used to create them.

Python bytearray与str如何转换?-图3
(图片来源网络,侵删)

Syntax: string_object = your_bytearray.decode(encoding)

Example:

# Let's use the bytearray from the previous example
my_bytearray = bytearray(b'Hello, Python! \xf0\x9f\x90\x8d')
# Decode the bytearray back into a string using UTF-8
my_string = my_bytearray.decode('utf-8')
print(f"Original bytearray: {my_bytearray}")
print(f"Type of original: {type(my_bytearray)}")
print("-" * 20)
print(f"Resulting string: {my_string}")
print(f"Type of result: {type(my_string)}")

Output:

Original bytearray: bytearray(b'Hello, Python! \xf0\x9f\x90\x8d')
Type of original: <class 'bytearray'>
--------------------
Resulting string: Hello, Python! 🐍
Type of result: <class 'str'>

Crucial Point: Decoding Errors What if the bytes are corrupted or use a different encoding than you expect?

# Create a bytearray from a string encoded with 'latin-1'
corrupted_bytearray = "café".encode('latin-1')
# Now try to decode it as if it were UTF-8 (which it's not)
try:
    # This will fail because the byte for 'é' is invalid in UTF-8
    corrupted_bytearray.decode('utf-8')
except UnicodeDecodeError as e:
    print(f"Error: {e}")
# The correct way: decode it with the original encoding, 'latin-1'
correct_string = corrupted_bytearray.decode('latin-1')
print(f"Correctly decoded string: {correct_string}")

Output:

Error: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte
Correctly decoded string: café

Key Differences: str vs. bytearray vs. bytes

It's important to distinguish bytearray from its close relative, bytes.

Feature str bytes bytearray
Purpose Text Immutable sequence of bytes Mutable sequence of bytes
Mutability Immutable (cannot be changed) Immutable (cannot be changed) Mutable (can be changed)
Syntax "text" b'bytes' bytearray(b'bytes') or bytearray([65, 66, 67])
Methods .upper(), .split(), .find() .hex(), .decode(), .find() All bytes methods AND .append(), .insert(), .pop(), etc.

Example of bytearray Mutability

This is the primary reason to use bytearray over bytes.

# Create a bytearray
data = bytearray(b'hello world')
# You CAN change its contents
data[0] = 74  # J (ASCII value for 'J' is 74)
data[6] = 80  # P (ASCII value for 'P' is 80)
print(f"Modified bytearray: {data}") # Output: b'hello Pyorld'
# You CAN append new bytes
data.append(33) # ! (ASCII value for '!' is 33)
print(f"After appending: {data}") # Output: b'hello Pyorld!'
# This would FAIL with a `bytes` object
# immutable_bytes = b'hello'
# immutable_bytes[0] = 74
# TypeError: 'bytes' object does not support item assignment

Summary and When to Use

Task Method Example
Convert str -> bytearray .encode() my_str = "text"; my_ba = my_str.encode('utf-8')
Convert bytearray -> str .decode() my_ba = bytearray(b'text'); my_str = my_ba.decode('utf-8')
Create empty bytearray bytearray() ba = bytearray(10) # Creates 10 zero bytes
Create bytearray from list bytearray(iterable) ba = bytearray([65, 66, 67]) # Creates b'ABC'

When to use str:

  • For all general-purpose text manipulation in your application.
  • When reading from or writing to text files (use open(..., 'r') or open(..., 'w', encoding='utf-8')).

When to use bytearray:

  • When you need to modify a sequence of bytes in place (e.g., building a network packet).
  • When dealing with low-level data formats like binary file structures, image pixel data, or cryptography.
  • As a more memory-efficient alternative to bytes if you know you'll be modifying the data frequently.

When to use bytes:

  • When you have a fixed, immutable sequence of bytes (e.g., a network protocol header, a file hash, or a constant piece of binary data).
  • It's the more common and generally preferred choice for immutable binary data.
分享:
扫描分享到社交APP
上一篇
下一篇