杰瑞科技汇

python unicode print

Printing Unicode characters in Python is a common task, and Python 3 handles it well. The key is to ensure your source file, your Python process, and your terminal all agree on one encoding, ideally UTF-8.

Here’s a complete guide, from the basics to handling potential issues.

The Short Answer (TL;DR)

For modern Python (3.x), it's usually this simple:

# This works out-of-the-box in Python 3 if your terminal supports UTF-8
print("Hello, 世界")  # Chinese characters
print("I ❤️ Python")  # Heart emoji
print("€100")         # Euro symbol

If you're on Python 2, you need to be more careful, but it's highly recommended to use Python 3 for new projects.


Detailed Explanation

Python 3: The Modern and Easy Way

Python 3 was designed from the ground up to handle Unicode as the default string type. This is the biggest reason to prefer Python 3 for international applications.

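Key Concept: In Python 3, a string literal like "hello" is a Unicode string by default. The print() function writes to sys.stdout, which encodes the string using the stream's encoding (sys.stdout.encoding) before sending it to the console. That encoding is UTF-8 on most modern Linux and macOS systems; on Windows it can be a legacy code page, particularly when output is redirected to a file or pipe.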
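As a rough sketch of that encoding step, the snippet below inspects the encoding attached to sys.stdout and performs the conversion by hand. The helper name encode_for_stdout is made up for this illustration and is not part of Python; the UTF-8 fallback and the backslashreplace error handler are assumptions chosen so the sketch never raises.

```python
import sys


def encode_for_stdout(text, fallback="utf-8"):
    """Encode text roughly the way print() would for the current stdout.

    Hypothetical helper for illustration; not part of the standard library.
    """
    # sys.stdout.encoding can be None for some replaced or captured streams
    encoding = getattr(sys.stdout, "encoding", None) or fallback
    # Assumption: backslashreplace is used here so limited encodings do not
    # raise UnicodeEncodeError (print() itself is usually strict).
    return encoding, text.encode(encoding, errors="backslashreplace")


encoding, payload = encode_for_stdout("Hello, 世界")
print("stdout encoding:", encoding)
print("bytes sent to the terminal:", payload)  # repr of bytes is ASCII-safe

# With UTF-8, each of these CJK characters takes three bytes
utf8_bytes = "世界".encode("utf-8")
print(utf8_bytes)  # b'\xe4\xb8\x96\xe7\x95\x8c'
```

If the reported encoding is not UTF-8, the payload will differ from utf8_bytes, which is usually the first hint that the terminal and the script disagree.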


Example:

# A simple script named `unicode_print.py`
# Basic Unicode characters
print("Café")        # é is a common accented character
print("München")     # ü is another common one
# Emojis (which are technically also Unicode characters)
print("Python 🐍")
print("Let's go! 🚀")
# Mathematical symbols
print("The sum is: 1 + 1 = 2")
print("The square root of 2 is √2")
# Chinese, Japanese, Korean (CJK) characters
print("你好")  # Chinese for "Hello"
print("こんにちは") # Japanese for "Hello"
print("안녕하세요") # Korean for "Hello"

How to run it:

  1. Save the code as unicode_print.py.
  2. Open your terminal or command prompt.
  3. Run the script: python unicode_print.py

As long as your terminal uses a UTF-8 encoding and its font has glyphs for these characters (most modern ones do), you will see them printed correctly.


Python 2: The "Old School" Way (Not Recommended)

If you are stuck with Python 2, you must be explicit about encodings. In Python 2, there are two main string types:

  • str: A sequence of bytes. This is the default.
  • unicode: A sequence of Unicode code points.

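If you print a unicode object, Python 2 encodes it using sys.stdout.encoding, falling back to ASCII when the encoding cannot be determined (for example, when output is piped). That fallback is the usual source of UnicodeEncodeError. If you print a non-ASCII str object, the raw bytes are sent unchanged, which produces mojibake (garbled text) whenever the terminal expects a different encoding.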
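For comparison, the same split between bytes and text can be sketched in Python 3, where the two roles are played by bytes and str. This is an illustration in Python 3 syntax, not Python 2 code.

```python
text = "Hello, 世界"            # str: a sequence of Unicode code points
data = text.encode("utf-8")     # bytes: the encoded form

print(len(text))   # 9 code points
print(len(data))   # 13 bytes, since each CJK character takes 3 bytes in UTF-8

# Decoding with the matching encoding recovers the original text
round_trip = data.decode("utf-8")
print(round_trip == text)

# Mixing the two types is an error in Python 3 rather than a silent conversion
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)
```

Python 2 would attempt an implicit ASCII conversion in that last step, which is why its errors tend to appear far from the line that caused them.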


The Golden Rule for Python 2: Always define a source code encoding at the very top of your file.

# -*- coding: utf-8 -*-
# The declaration above tells Python 2 how to decode the bytes of this
# source file. It does not change the type of string literals.
my_unicode_string = u"Hello, 世界"  # The 'u' prefix makes it a Unicode string
# Encode to a byte string (e.g., UTF-8) before printing, so the output
# does not depend on the encoding Python detects for the terminal.
my_encoded_string = my_unicode_string.encode('utf-8')
print(my_encoded_string)
# A more concise way:
print u"Hello, 世界".encode('utf-8')
# You may also see the default encoding changed for the whole process.
# This hack is widely discouraged because it hides bugs and can break
# libraries that rely on the ASCII default; prefer explicit encoding.
import sys
reload(sys)  # Only in Python 2; restores the deleted setdefaultencoding
sys.setdefaultencoding('utf-8')
print "世界" + u"!"  # Implicit str/unicode mixing now assumes UTF-8

Common Problems and Solutions

Problem 1: UnicodeEncodeError

This is the most common error. It happens when Python has a Unicode string in memory but needs to send it to a destination (like your terminal or a file) that can only handle a specific encoding (like ASCII).

Error Message: UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e16' in position 7: ordinal not in range(128)

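Cause: The destination's encoding cannot represent the character. In Python 2 this typically happens when you print a unicode object while stdout's encoding is ASCII (for example, when output is piped or the locale is C/POSIX). In Python 3 the same error can occur on Windows when the console or a redirected stream uses a legacy code page rather than UTF-8.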

Solution:


  • Best: Switch to Python 3.
  • For Python 2: Ensure you are encoding your strings to UTF-8 before printing, as shown in the section above.
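  • For Windows: Use Windows Terminal with a recent Python 3 (3.6 and later write to the console through the Unicode API), and set the PYTHONIOENCODING=utf-8 environment variable (or PYTHONUTF8=1 on Python 3.7+) when output is redirected. A font such as Consolas or Lucida Console affects which glyphs are displayed, but it does not prevent this error.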
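The error can be reproduced without touching the terminal by encoding to ASCII by hand, as in the sketch below. It also shows two common mitigations in Python 3: an error handler that degrades gracefully, and reconfiguring stdout. The helper name use_utf8_stdout is hypothetical, and sys.stdout.reconfigure() is only available on Python 3.7 and later.

```python
import sys

text = "Hello, 世界"

# Reproduce the failure deterministically by encoding to ASCII by hand
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error_position = exc.start  # index of the first character that failed
    print("failed at position", error_position)

# Option 1: degrade gracefully instead of raising
escaped = text.encode("ascii", errors="backslashreplace").decode("ascii")
replaced = text.encode("ascii", errors="replace").decode("ascii")
print(escaped)   # Hello, \u4e16\u754c
print(replaced)  # Hello, ??


# Option 2 (Python 3.7+): switch stdout itself to UTF-8
def use_utf8_stdout():
    """Hypothetical helper: reconfigure stdout if the stream supports it."""
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")
```

Calling use_utf8_stdout() once at program start has a similar effect to launching Python with PYTHONIOENCODING=utf-8, but it only helps if the terminal really does interpret the bytes as UTF-8.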

Problem 2: Mojibake (Garbled Text)

This looks like Ã© instead of é, or HÃ©llo instead of Héllo.

Cause: This is a classic encoding mismatch. For example:

  1. Your source code file is saved as Latin-1 (a common default on some systems).
  2. Python 2 reads it as Latin-1, so the byte 0xE9 becomes the Unicode character é.
  3. When printing, Python encodes the Unicode string back to bytes, this time as UTF-8.
  4. The UTF-8 encoding for é is the two-byte sequence 0xC3 0xA9.
  5. Your terminal, which expects Latin-1, receives these two bytes and displays them as two separate characters: Ã©.

Solution:

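  • For Python 2: Always specify the source encoding at the top of your file: # -*- coding: utf-8 -*-, and make sure your editor actually saves the file in that encoding. This tells Python how to interpret the bytes in your .py file correctly. Also check that the encoding you print with matches the one your terminal expects.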
  • For Python 3: This is less of an issue, but it can still happen if you are reading text from a file or network that has the wrong encoding. You must decode the incoming bytes using the correct encoding (e.g., data.decode('latin-1')) before you can work with it as a string.
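The mismatch described above can be reproduced in a few lines of Python 3. The sketch encodes text as UTF-8, decodes it with the wrong codec to produce mojibake, and then reverses the mistake. The repair step only works when the wrong codec is known and no bytes were lost along the way, which is an assumption here.

```python
original = "Héllo"

# Encode as UTF-8, then decode with the wrong codec (Latin-1)
garbled = original.encode("utf-8").decode("latin-1")
print(ascii(garbled))  # 'H\xc3\xa9llo', i.e. the two characters Ã and ©

# Reverse the mistake: re-encode with the wrong codec, decode with the right one
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True

# Decoding incoming bytes with the correct codec up front avoids the problem
latin1_bytes = b"caf\xe9"
decoded = latin1_bytes.decode("latin-1")
print(ascii(decoded))  # 'caf\xe9', i.e. café
```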

Writing Unicode to a File

The same principles apply when writing to a file. Specify the encoding explicitly; if you omit it, Python 3 falls back to a locale-dependent default that is not always UTF-8.

Python 3 (Recommended):

# The 'with' statement ensures the file is closed automatically
with open('my_unicode_file.txt', 'w', encoding='utf-8') as f:
    f.write("This is a test file.\n")
    f.write("It contains special characters: émojis ❤️ and symbols €.\n")
    f.write("And some Chinese: 你好.\n")
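To confirm that a file round-trips correctly, it can be read back with the same encoding and compared, as in the sketch below. It writes to a temporary directory so nothing is left on disk; the file name is reused from the example above and the sample text is only illustrative.

```python
import os
import tempfile

text = "Special characters: é ❤ € 你好\n"

# Write to a temporary directory so the sketch leaves nothing behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_unicode_file.txt")

    # newline="\n" disables newline translation so the bytes are predictable
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)

    # Reading with the same encoding returns the original text
    with open(path, "r", encoding="utf-8") as f:
        read_back = f.read()

    # Binary mode shows the raw bytes stored on disk
    with open(path, "rb") as f:
        raw = f.read()

print(read_back == text)
print(len(text), "characters stored as", len(raw), "bytes")
```

The byte count is larger than the character count because every non-ASCII character here takes two or three bytes in UTF-8.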

Python 2:

# -*- coding: utf-8 -*-
# Open the file in binary mode ('wb') and write encoded bytes.
# The 'u' prefix is required: calling .encode() on a non-ASCII byte str
# triggers an implicit ASCII decode and raises UnicodeDecodeError.
with open('my_unicode_file.txt', 'wb') as f:
    f.write(u"This is a test file.\n".encode('utf-8'))
    f.write(u"It contains special characters: émojis ❤️ and symbols €.\n".encode('utf-8'))
    f.write(u"And some Chinese: 你好.\n".encode('utf-8'))

Summary

Feature             | Python 3                           | Python 2
Default String Type | Unicode (str)                      | Bytes (str)
Literal             | "你好" is Unicode                  | u"你好" is Unicode
Printing            | print("你好") (Just works)         | print u"你好".encode('utf-8')
File Writing        | open(..., encoding='utf-8')        | open(..., 'wb') and .encode()
Recommendation      | Use Python 3 for all new projects. | Only for legacy maintenance.
