Python str vs. unicode

This is a fundamental and often confusing topic in Python, especially for developers who have worked with both Python 2 and Python 3. Let's break it down clearly.

The Core Idea: Bytes vs. Characters

At the heart of the str vs. unicode issue is the distinction between bytes and characters.

Bytes: A sequence of 8-bit values (0-255). This is how data is actually stored on your disk or transmitted over a network. It's just raw, numerical data.
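Characters: An abstract concept, like the letter 'A', the Chinese character '中', or the euro symbol '€'. A character is not a byte. Unicode identifies each one by an abstract number called a code point (for example, U+4E2D for '中'), independent of how it is stored.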

The job of an encoding (like UTF-8, ASCII, Latin-1) is to translate between these two:

Encoding: Translating characters into bytes.
Decoding: Translating bytes into characters.
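A minimal Python 3 sketch of both directions, using one character and two of the encodings named above. It shows that the same character becomes different bytes under different encodings, and that decoding with the wrong encoding silently produces the wrong characters:

```python
text = "é"  # a single character (code point U+00E9)

# Encoding: characters -> bytes. Different encodings give different bytes.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
print(utf8_bytes)    # b'\xc3\xa9'  (two bytes)
print(latin1_bytes)  # b'\xe9'      (one byte)

# Decoding: bytes -> characters. The matching encoding recovers the text.
print(utf8_bytes.decode("utf-8") == text)      # True
print(latin1_bytes.decode("latin-1") == text)  # True

# Decoding with the wrong encoding does not fail here; it yields different characters.
print(utf8_bytes.decode("latin-1"))  # Ã©

# ASCII has no byte for 'é', so encoding fails outright.
try:
    text.encode("ascii")
except UnicodeEncodeError:
    print("ASCII cannot encode 'é'")
```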

The Difference: Python 2 vs. Python 3

This is the most critical point. The meaning of str and unicode changed dramatically between these two versions.

Python 2 (The "Old" Way)

In Python 2, there were two distinct string types:

`str` (The "Byte String")

What it is: A sequence of bytes.
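Default Encoding: Python 2 treated a str as ASCII whenever it implicitly converted between str and unicode (sys.getdefaultencoding() returns 'ascii'), and source files were assumed to be ASCII unless they declared another encoding with a coding comment.
Problem: You could create a str containing non-ASCII characters, but Python had no idea what encoding it was in. As soon as such a str was mixed with unicode, Python 2 implicitly decoded it as ASCII, which led to cryptic UnicodeDecodeError and UnicodeEncodeError exceptions.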

Example:

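```python
# -*- coding: utf-8 -*-
# Python 2: this is a byte string. Python 2 doesn't know its encoding.
my_str = "Hello, world! 你好"
# Because this file is saved as UTF-8, my_str actually holds UTF-8 bytes,
# but Python 2 just sees it as a sequence of bytes.
print len(my_str)  # 20 (14 ASCII bytes + 3 bytes each for '你' and '好')
```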
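Python 2 is end-of-life and may not be installed, so the sketch below uses Python 3 to reproduce the implicit step behind those errors: when a byte string met a unicode string (for example `"你好" + u"!"`), Python 2 quietly decoded the bytes as ASCII. Here `utf8_bytes` is only a stand-in for a Python 2 `str` literal:

```python
# Stand-in for a Python 2 str literal saved in a UTF-8 source file.
utf8_bytes = "你好".encode("utf-8")

try:
    # Python 2 silently ran the equivalent of this ASCII decode when mixing types.
    coerced = utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    # 0xe4 is the first byte of the UTF-8 encoding of '你'.
    print(exc.encoding, hex(exc.object[exc.start]))  # ascii 0xe4

# Decoding explicitly with the correct encoding avoids the error.
print(utf8_bytes.decode("utf-8") + "!")  # 你好!
```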

`unicode` (The "Unicode String")

What it is: A sequence of abstract characters. It's an internal representation that is not tied to any specific encoding.
Purpose: To correctly handle text from all languages without ambiguity.
How to create: Decode a str (byte string) with a specific encoding, or write a unicode literal directly with the u prefix (for example, u"hello").

Example:

```python
# -*- coding: utf-8 -*-
# my_str is a byte string (UTF-8 encoded, because this file is saved as UTF-8)
my_str = "Hello, world! 你好"
# To get a proper unicode string, you must DECODE it
my_unicode = my_str.decode('utf-8')
print type(my_str)      # <type 'str'>
print type(my_unicode)  # <type 'unicode'>
# Now you can work with characters instead of bytes
print len(my_str)       # 20 (counts bytes)
print len(my_unicode)   # 16 (counts characters: 'H','e','l','l','o',...,'你','好')
```

The Golden Rule in Python 2: "Unicode sandwich".

The "bread" is your external interface (reading from a file, receiving data over the network). This should be bytes (str).
The "filling" is all your internal processing. This should be unicode.
You decode bytes to unicode when you read them in, and encode unicode back to bytes when you write them out.

```python
# Python 2 Golden Rule Example
# 1. Read bytes from a file (the top slice of bread)
with open('my_file.txt', 'r') as f:
    # f.read() returns a byte string ('str')
    data_from_file = f.read()
# 2. Decode to unicode for processing (the filling)
text_data = data_from_file.decode('utf-8')
# ... do all your text manipulation here with text_data (unicode) ...
# 3. Encode back to bytes to write or send (the bottom slice of bread)
data_to_write = text_data.encode('utf-8')
with open('another_file.txt', 'w') as f:
    f.write(data_to_write)
```

Python 3 (The "New" Way)

Python 3 was designed to fix this confusion by making the str vs. bytes distinction explicit and by using UTF-8 as the default encoding for source files and for str.encode() and bytes.decode().

`str` (The "Text String")

What it is: A sequence of abstract characters. This is what Python 2 called unicode.
Default Encoding: A str has no encoding of its own. What does default to UTF-8 is your source code, so you can write non-ASCII characters directly in your string literals.
Purpose: This is the type you should use for all your text processing.

Example:

```python
# This is a text string. It stores characters, not bytes.
# Python 3 knows this is a string of characters.
my_str = "Hello, world! 你好"
print(type(my_str))     # <class 'str'>
print(len(my_str))      # 16 (counts characters)
print(my_str[0])        # H
```
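Because a Python 3 str is a sequence of code points, indexing and slicing always return whole characters, and `ord()`/`chr()` convert between a character and its code point. A short sketch using the same example string:

```python
my_str = "Hello, world! 你好"

# Each element of a str is one character (one Unicode code point).
print(my_str[-2])            # 你
print(hex(ord(my_str[-2])))  # 0x4f60  (code point U+4F60)
print(chr(0x597D))           # 好

# Slicing and iteration operate on characters, never on raw bytes.
print(my_str[-2:])                          # 你好
print([hex(ord(c)) for c in my_str[-2:]])   # ['0x4f60', '0x597d']
```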

`bytes` (The "Byte String")

What it is: A sequence of bytes. This is what Python 2 called str.
Purpose: Used for raw binary data (like images, network packets, or when you need to interface with a legacy system that only works with bytes).
How to create: You create a bytes object by encoding a str (text string).

Example:

```python
# my_str is a text string ('str')
my_str = "Hello, world! 你好"
# To get a byte string, you must ENCODE it
my_bytes = my_str.encode('utf-8')
print(type(my_bytes))   # <class 'bytes'>
print(my_bytes)         # b'Hello, world! \xe4\xbd\xa0\xe5\xa5\xbd'
# The \xe4... are the UTF-8 byte representations for '你' and '好'
# You can also create a bytes literal with a 'b' prefix
my_bytes_literal = b"Hello, world!"
print(type(my_bytes_literal)) # <class 'bytes'>
```
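Indexing a bytes object behaves very differently from indexing a str: it yields integers in the range 0-255, and a slice can cut a multi-byte character in half. A small sketch, reusing the example string:

```python
my_bytes = "Hello, world! 你好".encode("utf-8")

# Indexing bytes yields an int (0-255), not a one-character string.
print(my_bytes[0])    # 72  (the byte for 'H')
print(my_bytes[-6])   # 228 (0xe4, the first of the three bytes for '你')

# Slicing yields bytes, and a slice can split a character's encoding.
partial = my_bytes[-6:-4]  # only 2 of the 3 bytes that encode '你'
print(partial)             # b'\xe4\xbd'

try:
    partial.decode("utf-8")
except UnicodeDecodeError:
    print("incomplete UTF-8 sequence")
```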

The Golden Rule in Python 3: It's much simpler.

Use str for all text.
Use bytes for all binary data.
Encode str -> bytes when you need to send or store text.
Decode bytes -> str when you receive or read text.

```python
# Python 3 Golden Rule Example
# 1. Read bytes from a file
with open('my_file.txt', 'rb') as f:  # Note the 'rb' (read bytes)
    data_from_file = f.read()  # data_from_file is 'bytes'
# 2. Decode to text (str) for processing
text_data = data_from_file.decode('utf-8')  # text_data is 'str'
# ... do all your text manipulation here with text_data ...
# 3. Encode back to bytes to write
data_to_write = text_data.encode('utf-8')  # data_to_write is 'bytes'
with open('another_file.txt', 'wb') as f:  # Note the 'wb' (write bytes)
    f.write(data_to_write)
```
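In everyday Python 3 code the same sandwich is usually written with text mode, where `open()` performs the decode and encode steps itself. The sketch below reuses the file names from the example (they are only illustrative) and writes `my_file.txt` first so that it runs on its own. It passes `encoding="utf-8"` explicitly because, without it, `open()` falls back to a platform- and locale-dependent default that is not always UTF-8:

```python
sample = "Hello, world! 你好"

# Text-mode write: open() encodes str -> bytes using the given encoding.
with open("my_file.txt", "w", encoding="utf-8") as f:
    f.write(sample)

# Text-mode read: open() decodes bytes -> str, so text_data is already a str.
with open("my_file.txt", "r", encoding="utf-8") as f:
    text_data = f.read()

# ... do all your text manipulation here with text_data (str) ...

with open("another_file.txt", "w", encoding="utf-8") as f:
    f.write(text_data)

# The bytes on disk match an explicit .encode("utf-8") of the text.
with open("another_file.txt", "rb") as f:
    print(f.read() == sample.encode("utf-8"))  # True
```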

Summary Table

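| Feature | Python 2 `str` | Python 2 `unicode` | Python 3 `str` | Python 3 `bytes` |
|---|---|---|---|---|
| Type | Byte string | Text string | Text string | Byte string |
| Content | Raw bytes (0-255) | Abstract characters (code points) | Abstract characters (code points) | Raw bytes (0-255) |
| Encoding | Treated as ASCII in implicit conversions | N/A (it's the decoded form) | N/A (source files default to UTF-8) | N/A (it's the encoded form) |
| Literal | `"hello"` | `u"hello"` | `"hello"` | `b"hello"` |
| How to get | Literals, files, network, `my_unicode.encode('utf-8')` | `u"..."` literals, `my_str.decode('utf-8')` | Literals, text-mode files, `my_bytes.decode('utf-8')` | `b"..."` literals, binary-mode files, sockets, `my_str.encode('utf-8')` |
| `len()` | Counts bytes | Counts characters | Counts characters | Counts bytes |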
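The `len()` row is easy to check in Python 3: the character count is fixed, while the byte count depends on which encoding produced the bytes. A quick sketch with the document's example string:

```python
text = "Hello, world! 你好"

print(len(text))                      # 16 characters
print(len(text.encode("utf-8")))      # 20 bytes: 14 ASCII bytes + 3 each for '你' and '好'
print(len(text.encode("utf-16-le")))  # 32 bytes: 2 per character for this string
```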

Key Takeaway

In Python 3, you should almost always work with str for text. Only use bytes when you are dealing with binary data or need to interface with a low-level system that requires it.
The unicode type does not exist in Python 3. It has been replaced by the improved str type.
The most common source of errors in Python 2 was mixing str and unicode without proper encoding/decoding. Python 3 makes this much harder to do by accident: it never converts between str and bytes implicitly, and concatenating the two raises a TypeError.
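A short Python 3 sketch of that stricter behavior. Mixing the two types in concatenation fails loudly instead of guessing an encoding, so the conversion has to be made explicitly. Note that comparing them does not raise; it simply returns False:

```python
text = "Hello, "
data = "world".encode("utf-8")  # bytes

try:
    text + data  # Python 3 refuses to guess an encoding
except TypeError as exc:
    print("TypeError:", exc)

# Convert explicitly at the boundary instead.
print(text + data.decode("utf-8"))  # Hello, world
print(text.encode("utf-8") + data)  # b'Hello, world'

# Comparison does not raise, but a str never equals a bytes object.
print("world" == data)  # False
```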
