杰瑞科技汇

What Is a Python Bytestring, and How Do You Use It?

Let's dive deep into Python's bytes and bytearray types, often collectively referred to as "bytestrings."


What is a Bytestring? The Core Concept

At its simplest, a bytestring is a sequence of bytes. A byte is a unit of digital information that consists of 8 bits. It's the fundamental building block for storing data in computers.

Think of it like this:

  • A string (str) in Python is a sequence of Unicode characters (like 'a' or '你'). It's an abstract representation of text.
  • A bytestring (bytes or bytearray) is a sequence of raw bytes, each an integer from 0 to 255 (like 97, 195, 226, 240). It's a concrete, binary representation of data.

To store a string on a disk or send it over a network, you must first encode it into a sequence of bytes. To read that data back, you must decode it from bytes back into a string.

The Golden Rule:


str --encode()--> bytes --decode()--> str
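A minimal sketch of the golden rule in action. The sample text is arbitrary, and UTF-8 is assumed on both sides; the last two lines show what happens when the decoding codec does not match the encoding codec (latin-1 is used here only because it accepts any byte, so the mismatch produces garbled text instead of an error).

```python
# Round trip with UTF-8; the sample text is an arbitrary example
text = "naïve café"

encoded = text.encode("utf-8")     # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str

print(encoded)                     # b'na\xc3\xafve caf\xc3\xa9'
print(decoded == text)             # True: same codec both ways, nothing lost
print(len(text), len(encoded))     # 10 12: characters vs. bytes

# Decoding with a different codec than the one used to encode silently garbles the text
wrong = encoded.decode("latin-1")
print(wrong)                       # naÃ¯ve cafÃ©
```

The lesson: the bytes alone do not say which encoding produced them, so the reader and the writer must agree on it.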


The bytes Type: Immutable Bytestrings

The bytes type represents an immutable sequence of bytes. Once you create a bytes object, you cannot change it. This makes it similar to a tuple or a regular str.
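To make the immutability concrete, here is a small sketch: assigning to an index of a bytes object raises TypeError, so any "change" has to build a new object. The values are arbitrary examples.

```python
data = b"Hello"

try:
    data[0] = 74  # try to overwrite 'H' (72) with 'J' (74)
    mutated = True
except TypeError as e:
    mutated = False
    print(f"TypeError: {e}")  # bytes objects do not support item assignment

# "Changing" a bytes object really means creating a new one
new_data = b"J" + data[1:]
print(new_data)  # b'Jello'
print(data)      # b'Hello' (the original is untouched)
```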

Creating bytes Objects

There are several common ways to create a bytes object.

a) From a String (The Most Common Way)

You call the .encode() method on a string. It takes an optional encoding name; the default is 'utf-8', the most common and recommended standard, but passing it explicitly documents your intent.

text = "Hello, World! 你好 🌎"
# Encode the string into bytes using UTF-8 encoding
encoded_bytes = text.encode('utf-8')
print(f"Original string: {text}")
print(f"Type of original: {type(text)}")
print(f"Encoded bytes: {encoded_bytes}")
print(f"Type of encoded: {type(encoded_bytes)}")

Output:

Original string: Hello, World! 你好 🌎
Type of original: <class 'str'>
Encoded bytes: b'Hello, World! \xe4\xbd\xa0\xe5\xa5\xbd \xf0\x9f\x8c\x8e'
Type of encoded: <class 'bytes'>

Notice the b'' prefix: it marks a bytes literal, and Python uses the same notation when it displays a bytes object. Bytes that are not printable ASCII characters are shown as \x hex escapes, so "你" appears as its three UTF-8 bytes, \xe4\xbd\xa0.
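Because one character can occupy several bytes, a string's length and the length of its encoded form usually differ. This sketch measures a few sample characters (chosen arbitrarily) and the example string from above, assuming UTF-8.

```python
# Bytes needed per character in UTF-8
sizes = {ch: len(ch.encode("utf-8")) for ch in ["A", "é", "你", "🌎"]}
print(sizes)  # {'A': 1, 'é': 2, '你': 3, '🌎': 4}

text = "Hello, World! 你好 🌎"
print(len(text), len(text.encode("utf-8")))  # 18 25
```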

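b) From a Literal, a Length, or a List of Integers

You can also call the bytes() constructor with a length (producing that many zero bytes) or with an iterable of integers in the range 0-255, or you can write a b'...' literal directly.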

# A bytes object with 10 bytes, all initialized to the value 0
zero_bytes = bytes(10)
print(f"Zero bytes: {zero_bytes}")
# A bytes object from a list of integers (0-255)
from_list = bytes([65, 66, 67, 255]) # 65='A', 66='B', 67='C'
print(f"From list: {from_list}")
# A bytes literal (b'...')
literal_bytes = b'ABC'
print(f"Literal bytes: {literal_bytes}")

Output:

Zero bytes: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
From list: b'ABC\xff'
Literal bytes: b'ABC'
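The constructor is strict about its input. A short sketch of the two most common mistakes: an integer outside 0-255, and a str passed without an encoding. The sample values are arbitrary.

```python
errors = []

# Every element must be an integer in range(0, 256)
try:
    bytes([65, 256])
except ValueError as e:
    errors.append(type(e).__name__)
    print(f"ValueError: {e}")

# A str cannot become bytes unless you name an encoding
try:
    bytes("ABC")
except TypeError as e:
    errors.append(type(e).__name__)
    print(f"TypeError: {e}")

print(bytes("ABC", "utf-8"))  # b'ABC', same result as "ABC".encode("utf-8")
```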

c) From an Existing bytes Object

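Passing an existing bytes object (or any other bytes-like object, such as a bytearray) to the bytes() constructor gives you a bytes object with the same contents. Because bytes objects are immutable, CPython typically hands back the very same object instead of a true copy, which is harmless since it can never change.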

original = b'Hello'
copy = bytes(original)
print(copy) # Output: b'Hello'

Accessing and Slicing bytes

You can access bytes just like you access items in a str or list.

data = b'Hello, World!'
print(f"First byte: {data[0]}")   # Access by index -> returns an integer
print(f"Slice: {data[0:5]}")     # Slice -> returns a new bytes object
print(f"Length: {len(data)}")

Output:

First byte: 72
Slice: b'Hello'
Length: 13

Key Point: When you access a single byte with data[0], you get an integer (the value of that byte, between 0 and 255). When you slice it, you get a new bytes object.
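A common pitfall follows from this: iterating or indexing yields integers, so comparing data[0] with b'H' is always False. This sketch (sample data is arbitrary) shows the working alternatives, ord() or a one-byte slice.

```python
data = b"Hi!"

# Iterating over bytes yields integers, not 1-character strings
values = list(data)
print(values)               # [72, 105, 33]

print(data[0] == ord("H"))  # True: compare the int with the character's code
print(data[0:1] == b"H")    # True: a 1-byte slice is still a bytes object
print(data[0] == b"H")      # False: int vs. bytes never compare equal

# Turn a single integer back into a 1-byte bytes object
print(bytes([data[0]]))     # b'H'
```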


The bytearray Type: Mutable Bytestrings

Sometimes you need a sequence of bytes that you can modify, for example when filling a buffer while reading a file, or when building a network packet piece by piece. This is where bytearray comes in.

A bytearray is exactly like a bytes object, except it is mutable. You can change its contents after it's created.

Creating bytearray Objects

The syntax is very similar to bytes, but you use the bytearray() constructor.

# From a string
text = "mutable"
encoded_bytes = text.encode('utf-8')
mutable_ba = bytearray(encoded_bytes)
print(f"Mutable bytearray: {mutable_ba}")
print(f"Type: {type(mutable_ba)}")
# From a list of integers
from_list = bytearray([65, 66, 67])
print(f"From list: {from_list}")
# From a bytes literal
literal_ba = bytearray(b'ABC')
print(f"From literal: {literal_ba}")

Output:

Mutable bytearray: bytearray(b'mutable')
Type: <class 'bytearray'>
From list: bytearray(b'ABC')
From literal: bytearray(b'ABC')

Modifying a bytearray

This is where bytearray shines. You can use indexing and slicing to change its contents.

ba = bytearray(b'Jam and eggs')
# Change a single byte
ba[0] = 72 # 72 is the ASCII code for 'H'
print(ba) # Output: bytearray(b'Ham and eggs')
# Change a slice
ba[3:7] = b' Ham' # The replacement may be bytes-like or an iterable of ints (0-255)
print(ba) # Output: bytearray(b'Ham Ham eggs')
# Append a byte
ba.append(33) # 33 is the ASCII code for '!'
print(ba) # Output: bytearray(b'Ham Ham eggs!')
# You cannot append an integer > 255, it will raise an error
try:
    ba.append(256)
except ValueError as e:
    print(f"Error: {e}")

Output:

bytearray(b'Ham and eggs')
bytearray(b'Ham Ham eggs')
bytearray(b'Ham Ham eggs!')
Error: byte must be in range(0, 256)
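Besides indexing, slicing and .append(), bytearray has several other in-place methods. A quick sketch chaining a few of them on arbitrary sample data; each comment describes the state change.

```python
ba = bytearray(b"abc")

ba.extend(b"def")  # append several bytes at once  -> b"abcdef"
ba.insert(0, 95)   # insert '_' (95) at index 0     -> b"_abcdef"
last = ba.pop()    # remove and return the last byte as an int (102, 'f')
ba.reverse()       # reverse in place              -> b"edcba_"
ba += b"!"         # in-place concatenation also mutates ba

print(ba, last)    # bytearray(b'edcba_!') 102
```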

Key Differences: bytes vs. bytearray

| Feature | bytes | bytearray |
|---|---|---|
| Mutability | Immutable (cannot be changed) | Mutable (can be changed) |
| Creation | b'...' literal or bytes() constructor | bytearray() constructor (no literal syntax) |
| Use case | Data that shouldn't change, like constants, file contents read once, or cryptographic hashes | Building or modifying binary data, like assembling output for a file, processing a network stream, or parsing a binary protocol |
| Methods | Non-mutating methods only (e.g. .decode(), .split(), .replace(), .hex()); "modifying" methods return a new object | All methods of bytes plus in-place mutators such as .append(), .extend(), .insert(), .pop(), .remove(), .reverse() |
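In practice you often move between the two types: build data in a bytearray, then freeze it as bytes. This sketch (sample contents are arbitrary) shows that the constructors copy, so the objects stay independent, and that only bytes is hashable and can therefore serve as a dict key.

```python
buffer = bytearray(b"header:")
buffer += b"payload"        # grow the mutable buffer in place

frozen = bytes(buffer)      # immutable snapshot of the current contents
thawed = bytearray(frozen)  # a new, independent mutable copy

buffer[0] = ord("H")        # later changes to buffer do not reach frozen
thawed[-1] = ord("D")       # and changes to thawed do not reach frozen either

print(frozen)  # b'header:payload'
print(buffer)  # bytearray(b'Header:payload')
print(thawed)  # bytearray(b'header:payloaD')

# Only the immutable type is hashable
lookup = {frozen: "seen"}
try:
    lookup[buffer] = "fails"
    bytearray_hashable = True
except TypeError:
    bytearray_hashable = False
```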

Common Use Cases

Bytestrings are everywhere in programming, especially when dealing with:

a) File I/O (Reading/Writing Binary Files)

When you open a file in binary mode ('rb' or 'wb'), you work directly with bytes.

# Writing a string to a binary file
data_to_write = "This is some data".encode('utf-8')
with open('my_file.bin', 'wb') as f:
    f.write(data_to_write)
# Reading the binary file back
with open('my_file.bin', 'rb') as f:
    data_from_file = f.read()
# You must decode it to use it as a string
original_string = data_from_file.decode('utf-8')
print(f"Read from file: {original_string}")
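Binary mode also pairs naturally with bytearray: a file object's readinto() fills a preallocated buffer in place, which avoids creating a new bytes object for every read. A sketch under some assumptions: the file is a throwaway created in a temporary directory, and the 4-byte buffer size is deliberately tiny for demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")  # throwaway file used only for this demo

    with open(path, "wb") as f:
        f.write(bytes(range(10)))  # ten bytes with values 0..9

    with open(path, "rb") as f:
        whole = f.read()           # binary mode returns bytes, never str

    buf = bytearray(4)             # reusable, mutable 4-byte buffer
    chunks = []
    with open(path, "rb") as f:
        while True:
            n = f.readinto(buf)    # fills buf in place; returns the number of bytes read
            if n == 0:             # 0 means end of file
                break
            chunks.append(bytes(buf[:n]))  # keep only the bytes actually read

print(type(whole).__name__, len(whole))  # bytes 10
print([len(c) for c in chunks])          # [4, 4, 2]
```

Note the slice buf[:n]: the final read only fills part of the buffer, and the rest still holds stale bytes from the previous read.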

b) Network Communication (Sockets)

Data sent over a network is always transmitted as a sequence of bytes.

# Conceptual client example: it assumes a server is already listening on HOST:PORT
import socket
# Host and port
HOST = '127.0.0.1'
PORT = 65432
# Create a socket
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((HOST, PORT))
    # You must SEND bytes, not a string
    message_to_send = "Hello, server!".encode('utf-8')
    s.sendall(message_to_send)
    # You will RECEIVE bytes
    data_received = s.recv(1024) # Read up to 1024 bytes; one recv() call may return less than the full reply
    # You must DECODE the received bytes to use them as a string
    response = data_received.decode('utf-8')
    print(f"Server said: {response}")
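The example above needs a running server. As a self-contained alternative, this sketch uses socket.socketpair(), which returns two already-connected sockets, so both ends live in one script. The message and the 4-byte recv() size are arbitrary; the small size just demonstrates that data can arrive in pieces, which is why the bytes are collected in a bytearray and decoded only once everything has arrived (decoding each piece separately could split a multi-byte character).

```python
import socket

left, right = socket.socketpair()  # two connected sockets; no server required
with left, right:
    left.sendall("Hello, server! 你好".encode("utf-8"))  # send bytes
    left.shutdown(socket.SHUT_WR)  # tell the peer no more data is coming

    received = bytearray()
    while True:
        chunk = right.recv(4)  # may return fewer bytes than were sent
        if not chunk:          # b'' means the peer has finished sending
            break
        received += chunk

    message = received.decode("utf-8")  # decode only after all bytes arrived

    # Passing a str instead of bytes is rejected before anything is sent
    try:
        right.sendall("oops")
        str_accepted = True
    except TypeError:
        str_accepted = False

print(message)  # Hello, server! 你好
```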

c) Cryptography and Hashing

Cryptographic libraries like hashlib and cryptography expect bytes-like input (bytes, bytearray, memoryview) rather than str, so text must be encoded first.

import hashlib
# You must hash bytes, not a string
data = "my secret password".encode('utf-8')
# Create a SHA256 hash object
sha256_hash = hashlib.sha256()
# Update the hash object with the bytes
sha256_hash.update(data)
# Get the final hash in two forms
hex_digest = sha256_hash.hexdigest() # .hexdigest() returns a str of hex digits
full_digest = sha256_hash.digest()   # .digest() returns the raw bytes
print(f"Hex digest: {hex_digest}")
print(f"Raw bytes digest: {full_digest}")
print(f"Type of digest: {type(full_digest)}")
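A short sketch confirming these points: hashlib raises TypeError for a str, accepts any bytes-like object (including bytearray), gives the same result whether the data arrives in one piece or in several update() calls, and .hexdigest() is simply the hex form of .digest(). The input text is the same arbitrary example used above.

```python
import hashlib

# Passing a str is rejected: it must be encoded first
try:
    hashlib.sha256("my secret password")
    str_accepted = True
except TypeError:
    str_accepted = False

one_shot = hashlib.sha256(b"my secret password").hexdigest()

h = hashlib.sha256()
h.update(b"my secret ")
h.update(bytearray(b"password"))  # bytearray is accepted too
incremental = h.hexdigest()

print(one_shot == incremental)          # True
print(h.digest().hex() == incremental)  # True
print(len(h.digest()))                  # 32 (SHA-256 produces 32 bytes)
```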

Summary

| Concept | Description | Python type or method |
|---|---|---|
| String | Abstract sequence of characters. Used for text. | str |
| Bytestring | Concrete sequence of raw bytes (0-255). Used for binary data. | bytes, bytearray |
| Encoding | The process of converting a str to bytes. | my_str.encode('utf-8') |
| Decoding | The process of converting bytes to a str. | my_bytes.decode('utf-8') |

Mastering bytes and bytearray is essential for any Python developer who works with files, the internet, or any kind of binary data.
