杰瑞科技汇

How do you convert between int and Unicode in Python?

This article breaks down the relationship between Python's int type and Unicode. It is a common point of confusion because the word "Unicode" can refer to two different things in this context:

  1. Unicode Characters (like 'A', 'é', '中'): These are abstract representations of text.
  2. Unicode Code Points: The unique integer number assigned to every character in the Unicode standard (e.g., 'A' is U+0041, which is 65 in decimal).

The key takeaway is: A Python int is not a Unicode character, but it can represent a Unicode code point. You use Python's chr() and ord() functions to convert between integers and Unicode characters.
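
To make the U+XXXX notation concrete, here is a minimal sketch that formats a character's code point in that form. The helper name code_point_label is hypothetical and used only for illustration; it is not part of Python.

```python
def code_point_label(char):
    """Return the code point of a single character in U+XXXX notation."""
    # ord() gives the integer; the 04X format spec pads to at least four hex digits
    return f"U+{ord(char):04X}"

print(code_point_label('A'))   # U+0041
print(code_point_label('中'))  # U+4E2D
```

The decimal value and the U+ form are the same number written in two bases: 0x41 is 65, and 0x4E2D is 20013.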


The Core Relationship: chr() and ord()

These two built-in functions are the bridge between integers and Unicode characters.

ord(): Character to Integer (Code Point)

The ord() function takes a single Unicode character (a string of length 1) and returns its corresponding integer code point.

# Get the integer code point for 'A'
code_point_a = ord('A')
print(f"The code point for 'A' is: {code_point_a}")
# Output: The code point for 'A' is: 65
# Get the code point for the Euro sign '€'
code_point_euro = ord('€')
print(f"The code point for '€' is: {code_point_euro}")
# Output: The code point for '€' is: 8364
# Get the code point for a Chinese character
code_point_chinese = ord('中')
print(f"The code point for '中' is: {code_point_chinese}")
# Output: The code point for '中' is: 20013
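
Because ord() accepts exactly one character, passing a longer or empty string raises a TypeError. The sketch below shows one way to guard against that; safe_ord is a hypothetical helper name, not a built-in.

```python
def safe_ord(text):
    """Return the code point of a one-character string, or None otherwise."""
    try:
        return ord(text)
    except TypeError:
        # ord() raises TypeError for strings whose length is not exactly 1
        return None

print(safe_ord('A'))   # 65
print(safe_ord('AB'))  # None
print(safe_ord(''))    # None

# To get every code point in a longer string, iterate over it
print([ord(ch) for ch in 'AB'])  # [65, 66]
```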

chr(): Integer (Code Point) to Character

The chr() function does the opposite. It takes an integer (a valid Unicode code point) and returns the corresponding character as a string.

# Get the character for code point 65
char_a = chr(65)
print(f"The character for code point 65 is: '{char_a}'")
# Output: The character for code point 65 is: 'A'
# Get the character for code point 8364
char_euro = chr(8364)
print(f"The character for code point 8364 is: '{char_euro}'")
# Output: The character for code point 8364 is: '€'
# Get the character for code point 20013
char_chinese = chr(20013)
print(f"The character for code point 20013 is: '{char_chinese}'")
# Output: The character for code point 20013 is: '中'
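
Since the two functions are inverses, converting a character to its code point and back should return the original character. A quick check, using the same three sample characters as above:

```python
samples = ['A', '€', '中']

# chr(ord(ch)) should give back ch for every single character
round_trip_ok = all(chr(ord(ch)) == ch for ch in samples)
print(round_trip_ok)  # True

# The same holds in the other direction for valid code points
print(ord(chr(8364)) == 8364)  # True
```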

Practical Examples

Example 1: Generating a Range of Characters

You can use chr() in a loop to generate a sequence of characters.

# Generate the alphabet from 'a' to 'z'
start_code = ord('a')  # 97
end_code = ord('z')    # 122
alphabet = [chr(code) for code in range(start_code, end_code + 1)]
print("".join(alphabet))
# Output: abcdefghijklmnopqrstuvwxyz

Example 2: Checking if a String Contains Only ASCII Characters

ASCII characters have code points from 0 to 127. You can use this knowledge to validate a string.

def is_ascii(s):
    """Checks if a string contains only ASCII characters."""
    try:
        s.encode('ascii')
        return True
    except UnicodeEncodeError:
        return False
# A more manual way using ord()
def is_ascii_manual(s):
    """Checks if a string contains only ASCII characters using ord()."""
    return all(ord(char) < 128 for char in s)
print(f"'Hello' is ASCII: {is_ascii('Hello')}")
print(f"'Café' is ASCII: {is_ascii('Café')}") # 'é' has a code point > 127
print(f"'Hello' is ASCII (manual): {is_ascii_manual('Hello')}")
print(f"'Café' is ASCII (manual): {is_ascii_manual('Café')}")

Output:

'Hello' is ASCII: True
'Café' is ASCII: False
'Hello' is ASCII (manual): True
'Café' is ASCII (manual): False
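
If you are on Python 3.7 or later, the built-in str.isascii() method performs the same check without a helper function. A short sketch, assuming that Python version:

```python
# str.isascii() is available from Python 3.7 onward
print('Hello'.isascii())  # True
print('Café'.isascii())   # False

# An empty string counts as ASCII, which matches the all(...) version
print(''.isascii())       # True
```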

Example 3: Rotating Characters (like a Caesar Cipher)

This classic cipher shifts each character by a fixed number of positions in the alphabet. We can implement it using ord() and chr().

def rotate_char(char, shift):
    """Rotates a single character by 'shift' positions."""
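    # Only rotate ASCII letters; isalpha() alone is also True for letters such as 'é' or '中'
    if not ('a' <= char <= 'z' or 'A' <= char <= 'Z'):
        return char # Leave every other character unchanged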
    # Determine the starting code point ('a' or 'A')
    start = ord('a') if char.islower() else ord('A')
    # Calculate the new code point
    # 1. Get the character's position in the alphabet (0-25)
    # 2. Add the shift and use modulo 26 to wrap around
    # 3. Add the starting code point back to get the final code point
    new_code_point = (ord(char) - start + shift) % 26 + start
    return chr(new_code_point)
def caesar_cipher(text, shift):
    """Applies the Caesar cipher to a full string."""
    return "".join(rotate_char(char, shift) for char in text)
original_text = "Hello, World!"
encrypted_text = caesar_cipher(original_text, 3)
decrypted_text = caesar_cipher(encrypted_text, -3) # or caesar_cipher(encrypted_text, 23)
print(f"Original:  {original_text}")
print(f"Encrypted: {encrypted_text}")
print(f"Decrypted: {decrypted_text}")

Output:

Original:  Hello, World!
Encrypted: Khoor, Zruog!
Decrypted: Hello, World!
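
The wrap-around arithmetic is easiest to see with two worked cases. The sketch below repeats the same formula for a lowercase letter near the end of the alphabet and for a negative shift. The negative shift works because Python's % operator returns a non-negative result when the divisor is positive.

```python
A = ord('a')  # 97

# Forward shift that wraps past 'z': 'y' is position 24, 24 + 3 = 27, 27 % 26 = 1
forward = (ord('y') - A + 3) % 26 + A
print(chr(forward))   # b

# Negative shift: 'b' is position 1, 1 - 3 = -2, and -2 % 26 is 24 in Python
backward = (ord('b') - A - 3) % 26 + A
print(chr(backward))  # y
```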

Common Pitfalls and Important Notes

int vs. str

The chr() function always returns a string of length 1, not an integer, and you cannot do arithmetic on a character directly.

char = 'A'
# This will cause a TypeError:
# print(char + 1)
# You must convert it to an int first
code_point = ord(char)
print(code_point + 1) # Output: 66
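
To get the next character rather than the next number, convert back with chr() after the arithmetic. A minimal sketch:

```python
char = 'A'

# Adding an int to a str is not allowed
try:
    char + 1
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
print(error_name)  # TypeError

# Convert to int, do the arithmetic, then convert back to a character
next_char = chr(ord(char) + 1)
print(next_char)   # B
```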

Valid Range for chr()

You cannot pass just any integer to chr(). It must be a valid Unicode code point, which is in the range from 0 to 0x10FFFF (inclusive).

# This works
print(chr(65))       # 'A'
print(chr(0x10FFFF)) # The highest valid code point
# This will raise a ValueError
try:
    print(chr(1114112)) # 0x110000 is one too high
except ValueError as e:
    print(f"Error: {e}")

Output:

A
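(U+10FFFF is a noncharacter with no standard glyph, so what appears here depends on the terminal and font)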
Error: chr() arg not in range(0x110000)
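
If the integers come from untrusted input, it can help to check the range before calling chr(). The sketch below uses a hypothetical helper, is_valid_code_point, and also shows a related edge case: surrogate code points (0xD800 to 0xDFFF) are accepted by chr(), but the resulting string cannot be encoded as UTF-8.

```python
def is_valid_code_point(n):
    """Return True if n is in the range chr() accepts (0 to 0x10FFFF)."""
    return 0 <= n <= 0x10FFFF

print(is_valid_code_point(-1))        # False
print(is_valid_code_point(0x10FFFF))  # True
print(is_valid_code_point(0x110000))  # False

# chr() accepts a surrogate code point, but encoding the result fails
lone_surrogate = chr(0xD800)
try:
    lone_surrogate.encode('utf-8')
    surrogate_encodes = True
except UnicodeEncodeError:
    surrogate_encodes = False
print(surrogate_encodes)  # False
```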

int and Encoding

Sometimes you have an integer that represents a byte value in some encoding and want the corresponding character. This is different from a Unicode code point. In this case, you should first put the integer into a bytes object and then decode it with the right encoding.

# Suppose you have an integer representing a byte in Latin-1 (ISO-8859-1) encoding
byte_value = 224 # This is the code for 'à' in Latin-1
# Method 1: Using bytes and decode
# Create a bytes object from the integer, then decode it
char_from_byte = bytes([byte_value]).decode('latin-1')
print(f"Decoded character: '{char_from_byte}'")
# Output: Decoded character: 'à'
# Method 2: Directly with chr(), which is only safe when the encoding is Latin-1
# chr() can also handle code points in the Latin-1 range (0-255)
char_from_byte_alt = chr(byte_value)
print(f"Character from chr(): '{char_from_byte_alt}'")
# Output: Character from chr(): 'à'

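For this specific case (0-255), chr() works because Latin-1 maps each byte value to the Unicode code point with the same number. In UTF-8 the shortcut only holds for values 0-127, because any character outside ASCII occupies several bytes. For UTF-8 and other encodings, build a bytes object and call decode() with the correct encoding name.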
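
To make the UTF-8 case concrete, the following sketch encodes 'à' and shows what happens when the resulting bytes are decoded correctly versus passed to chr() one at a time.

```python
# 'à' is one code point (224) but two bytes in UTF-8
utf8_bytes = 'à'.encode('utf-8')
print(list(utf8_bytes))  # [195, 160]

# Decoding the full byte sequence recovers the character
decoded = bytes([195, 160]).decode('utf-8')
print(decoded)           # à

# Treating each byte as a code point gives two unrelated characters instead
wrong = ''.join(chr(b) for b in utf8_bytes)
print(len(wrong))        # 2
print(wrong == decoded)  # False
```

Note that the single byte 224 on its own is not valid UTF-8, so bytes([224]).decode('utf-8') raises a UnicodeDecodeError.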


Summary Table

Function | Input | Output | Description
ord() | A single character (string of length 1) | An integer (int) | Returns the Unicode code point of the character.
chr() | An integer (a valid Unicode code point) | A single character (string) | Returns the character corresponding to the code point.
int() | A string (e.g., "123") | An integer (int) | Converts a string representation of a number to an actual number.
str() | An integer (e.g., 123) | A string (e.g., "123") | Converts an integer to its string representation.
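
The difference between the two pairs of functions is easiest to see on a digit character, where int() and ord() give different answers. A short sketch:

```python
digit = '7'

print(int(digit))  # 7   (numeric value of the text)
print(ord(digit))  # 55  (code point of the character '7')

print(str(7))        # 7  (the one-character string '7')
print(repr(chr(7)))  # '\x07'  (a control character, not the digit '7')

# For ASCII digits, subtracting ord('0') recovers the numeric value
print(ord(digit) - ord('0'))  # 7
```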