杰瑞科技汇

How to Use decode and encode Correctly in Python

Python's encode() and decode() methods convert between text and bytes, and understanding them is fundamental to handling text correctly in Python.

The Core Problem: Text vs. Bytes

In Python, there are two main types for representing data:

  1. str (String): A sequence of Unicode characters. This is how humans think about text. Python 3 strings are Unicode by default. Examples: "hello", "你好".
  2. bytes: A sequence of raw 8-bit bytes (integers from 0 to 255). This is how computers actually store and transmit data. Examples: b'hello', b'\xe4\xbd\xa0\xe5\xa5\xbd'.

The Golden Rule:

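  • Files and network connections ultimately carry bytes. A file opened in text mode accepts str only because Python encodes it for you behind the scenes.
  • Text operations that depend on characters (counting characters, changing case, searching for a Unicode substring) belong on str objects, not on bytes.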

encode() and decode() are the bridges between these two worlds.
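
As a minimal sketch of the difference between the two types (the variable names are illustrative only), the snippet below compares how str and bytes count and index the same text, and shows that Python will not mix them implicitly:

```python
text = "你好"                  # str: a sequence of Unicode characters
data = text.encode("utf-8")    # bytes: a sequence of integers in 0..255

char_count = len(text)         # counts characters
byte_count = len(data)         # counts bytes
first_char = text[0]           # indexing a str yields a one-character str
first_byte = data[0]           # indexing bytes yields an int

print(char_count, byte_count)          # 2 6
print(repr(first_char), first_byte)    # '你' 228

# Mixing the two types is an error; Python never converts implicitly.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)                        # False
```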


encode(): From String (str) to Bytes (bytes)

You use encode() when you have a string and you need to convert it into a sequence of bytes to save it to a file or send it over a network.

How it works:

my_string.encode(encoding)

  • my_string: The str object you want to convert.
  • encoding: (Optional, but highly recommended) The character encoding to use (e.g., 'utf-8', 'ascii', 'latin-1'). If you don't provide it, Python 3 uses 'utf-8'. Specifying it anyway makes the intent explicit and keeps the code consistent with APIs such as open(), whose default encoding can depend on the platform.
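
The sketch below illustrates the call signature with a sample string chosen for this example. It also uses the second optional parameter of str.encode(), errors, which the text above does not cover; it controls what happens when a character cannot be represented in the target encoding.

```python
text = "café"

# With no argument, str.encode() uses UTF-8 in Python 3.
default_bytes = text.encode()
explicit_bytes = text.encode("utf-8")
print(default_bytes == explicit_bytes)   # True

# 'é' has no ASCII representation. The default handler ("strict") would
# raise UnicodeEncodeError; these handlers substitute or drop it instead.
replaced = text.encode("ascii", errors="replace")
ignored = text.encode("ascii", errors="ignore")
escaped = text.encode("ascii", errors="backslashreplace")
print(replaced)   # b'caf?'
print(ignored)    # b'caf'
print(escaped)    # b'caf\\xe9'
```

The lossy handlers are convenient for logging, but they discard information, so they are rarely the right choice for data that must round-trip.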

Example: Encoding "hello"

my_text = "hello world"
# Encode the string into bytes using UTF-8 encoding
my_bytes = my_text.encode('utf-8')
print(f"Original type: {type(my_text)}")
print(f"Original string: {my_text}")
print(f"Encoded type: {type(my_bytes)}")
print(f"Encoded bytes: {my_bytes}")

Output:

Original type: <class 'str'>
Original string: hello world
Encoded type: <class 'bytes'>
Encoded bytes: b'hello world'

For pure ASCII text, UTF-8 stores each character as a single byte with the same value, so the bytes literal looks identical to the string apart from the b prefix.

Example: Encoding "你好" (Non-ASCII Characters)

This is where encoding becomes critical.

my_text = "你好"
# Encode using UTF-8
utf8_bytes = my_text.encode('utf-8')
# Encode using GBK (another common encoding for Chinese)
gbk_bytes = my_text.encode('gbk')
print(f"Original string: {my_text}")
print(f"UTF-8 encoded bytes: {utf8_bytes}")
print(f"GBK encoded bytes:  {gbk_bytes}")

Output:

Original string: 你好
UTF-8 encoded bytes: b'\xe4\xbd\xa0\xe5\xa5\xbd'
GBK encoded bytes:  b'\xc4\xe3\xba\xc3'

Notice how the same text results in completely different byte sequences depending on the encoding. This is why specifying the correct encoding is so important!
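
To make the comparison concrete, here is a small sketch that encodes the same string with several codecs and checks that each one round-trips. UTF-16-LE is added here purely as a third illustration and is not part of the example above.

```python
text = "你好"
encodings = ("utf-8", "gbk", "utf-16-le")

sizes = {}
for encoding in encodings:
    encoded = text.encode(encoding)
    # Decoding with the same encoding always restores the original text.
    assert encoded.decode(encoding) == text
    sizes[encoding] = len(encoded)
    print(f"{encoding:10} {len(encoded)} bytes  {encoded.hex(' ')}")

print(sizes)   # {'utf-8': 6, 'gbk': 4, 'utf-16-le': 4}
```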


decode(): From Bytes (bytes) to String (str)

You use decode() when you receive a sequence of bytes (from a file, a network request, etc.) and you want to convert it into a human-readable string.

How it works:

my_bytes.decode(encoding)

  • my_bytes: The bytes object you want to convert.
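  • encoding: (Optional, but highly recommended) The character encoding that was used to create the bytes; it defaults to 'utf-8'. You must use the same encoding that was used for encode(). Otherwise you get either a UnicodeDecodeError or, if the bytes happen to be valid in the wrong encoding, garbled text (called "mojibake").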
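
The following sketch shows how mojibake arises. It uses Latin-1 as the "wrong" encoding because that codec accepts every byte value and therefore never raises. The errors="replace" argument is a standard bytes.decode() parameter not discussed above.

```python
utf8_bytes = "你好".encode("utf-8")

# Latin-1 maps every byte to a character, so the wrong decode "succeeds"
# silently and yields mojibake instead of raising.
garbled = utf8_bytes.decode("latin-1")
print(len(garbled))           # 6 (one character per byte)
print(garbled == "你好")       # False

# The damage is reversible here because Latin-1 loses no information.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered)              # 你好

# With errors="replace", undecodable bytes become U+FFFD instead of raising.
lossy = utf8_bytes.decode("ascii", errors="replace")
print(lossy == "\ufffd" * 6)  # True
```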

Example: Decoding the "hello" bytes

my_bytes = b'hello world'
# Decode the bytes back into a string
my_text = my_bytes.decode('utf-8')
print(f"Original type: {type(my_bytes)}")
print(f"Original bytes: {my_bytes}")
print(f"Decoded type: {type(my_text)}")
print(f"Decoded string: {my_text}")

Output:

Original type: <class 'bytes'>
Original bytes: b'hello world'
Decoded type: <class 'str'>
Decoded string: hello world

Example: Decoding "你好" bytes

This shows what happens when you use the wrong encoding.

# These bytes were created using UTF-8 encoding
utf8_bytes = b'\xe4\xbd\xa0\xe5\xa5\xbd'
# Correctly decode with UTF-8
correct_text = utf8_bytes.decode('utf-8')
# Incorrectly decode with ASCII (will cause an error)
try:
    incorrect_text = utf8_bytes.decode('ascii')
except UnicodeDecodeError as e:
    print(f"Error decoding with ASCII: {e}")
print(f"Original bytes: {utf8_bytes}")
print(f"Correctly decoded (UTF-8): {correct_text}")

Output:

Error decoding with ASCII: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
Original bytes: b'\xe4\xbd\xa0\xe5\xa5\xbd'
Correctly decoded (UTF-8): 你好

The ASCII codec failed because ASCII only defines byte values 0 to 127, and 0xe4 (228) falls outside that range. This is why you must know the encoding of the bytes you are trying to decode.
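
When the encoding is genuinely unknown, one possible approach is to try a short list of likely candidates in order. The helper below, decode_with_fallback, is a hypothetical name introduced for this sketch and is not part of the standard library. Treat it as a heuristic: a wrong candidate can sometimes decode without error and produce mojibake, so the order of candidates matters.

```python
def decode_with_fallback(data, candidates=("utf-8", "gbk")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings could decode the data")


print(decode_with_fallback(b'\xe4\xbd\xa0\xe5\xa5\xbd'))  # ('你好', 'utf-8')
print(decode_with_fallback(b'\xc4\xe3\xba\xc3'))          # ('你好', 'gbk')
```

Putting the strictest encoding first (UTF-8 rejects most byte sequences that were not produced by UTF-8) reduces the chance of a false match.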


Practical Workflow: Reading from a File

Reading a file is one of the most common places this matters: you read the raw bytes and then decode them into a string.

# 1. Open the file in binary mode ('rb') to read bytes
# Let's assume 'my_chinese_file.txt' contains the text "你好世界"
try:
    with open('my_chinese_file.txt', 'rb') as f:
        file_bytes = f.read()
        print(f"Read from file (bytes): {file_bytes}")
    # 2. Decode the bytes into a string
    # We must know the encoding the file was saved with. Let's assume it's UTF-8.
    file_content = file_bytes.decode('utf-8')
    print(f"Decoded content (string): {file_content}")
    print(f"Type of content: {type(file_content)}")
except FileNotFoundError:
    print("File not found. Creating a dummy one for demonstration.")
    # Create a dummy file to make the example runnable
    with open('my_chinese_file.txt', 'w', encoding='utf-8') as f:
        f.write("你好世界")
    # The code above will now work on the next run.

Output (assuming the file was created with UTF-8):

Read from file (bytes): b'\xe4\xbd\xa0\xe5\xa5\xbd\xe4\xb8\x96\xe7\x95\x8c'
Decoded content (string): 你好世界
Type of content: <class 'str'>
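
In everyday code the same result is usually obtained by opening the file in text mode with an explicit encoding, which lets Python do the decoding. The sketch below assumes a temporary directory as a stand-in for the file's location so that nothing is left on disk; the file name is reused from the example above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_chinese_file.txt")

    # Text mode with an encoding: Python encodes on write ...
    with open(path, "w", encoding="utf-8") as f:
        f.write("你好世界")

    # ... and decodes on read, so read() returns a str directly.
    with open(path, "r", encoding="utf-8") as f:
        text_content = f.read()

    # Binary mode shows the bytes that were actually stored.
    with open(path, "rb") as f:
        raw_bytes = f.read()

print(text_content)                               # 你好世界
print(raw_bytes.decode("utf-8") == text_content)  # True
```

Binary mode plus a manual decode() remains useful when the encoding has to be detected first or when the file mixes text with binary data.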

Summary Table

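| Method   | Purpose                                        | Input Type | Output Type | Common Use Case                                                      |
|----------|------------------------------------------------|------------|-------------|----------------------------------------------------------------------|
| encode() | Convert text to bytes for storage/transmission | str        | bytes       | Preparing a string to be written to a file or sent over a network    |
| decode() | Convert bytes to text for processing           | bytes      | str         | Reading data from a file or a network request to make it readable    |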

Key Takeaways

  1. str is for text, bytes is for data.
  2. encode() turns str -> bytes.
  3. decode() turns bytes -> str.
  4. ALWAYS specify the encoding (like 'utf-8') when using these methods. UTF-8 is the modern standard and a safe default for most cases.
  5. If you get a UnicodeDecodeError, you are likely trying to decode bytes with the wrong encoding.
  6. If you get a UnicodeEncodeError, you are trying to encode a character that is not supported by your chosen encoding (e.g., trying to encode Chinese characters with 'ascii').
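
As an illustration of the last point, the sketch below triggers a UnicodeEncodeError on a made-up sample string and inspects the attributes the exception carries (encoding, object, start, end), which help locate the offending character:

```python
text = "price: 5€"

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # The exception records which codec failed and where in the string.
    failed_encoding = exc.encoding
    bad_index = exc.start
    bad_slice = exc.object[exc.start:exc.end]
    print(f"{failed_encoding!r} cannot encode {bad_slice!r} at index {bad_index}")
    # 'ascii' cannot encode '€' at index 8

# UTF-8 can represent every Unicode character, so this succeeds.
safe_bytes = text.encode("utf-8")
print(safe_bytes)   # b'price: 5\xe2\x82\xac'
```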