Of course! Getting encoding right between Python and MySQL is crucial for handling text correctly, especially non-ASCII text such as Chinese, Arabic, or emojis. A mismatch in encoding between your Python application, the MySQL connection, and the database itself is a very common cause of garbled text (often called "mojibake").

Here’s a comprehensive guide covering the key concepts, best practices, and code examples.
The Core Problem: Character Sets and Collations
In MySQL, text is stored using a character set (e.g., utf8, utf8mb4, latin1) and a collation (e.g., utf8mb4_unicode_ci, latin1_swedish_ci).
- Character Set: Defines which characters can be stored and how each one is encoded as bytes.
- Collation: Defines how to compare and sort characters within a character set.
For modern applications, you almost always want to use utf8mb4.
Why utf8mb4 and not utf8?
This is a critical point. MySQL's original utf8 character set (now also called utf8mb3) supports a maximum of 3 bytes per character. This covers most common languages but cannot store every Unicode character: anything outside the Basic Multilingual Plane, such as emojis (like 😊) and some rare CJK (Chinese, Japanese, Korean) characters, needs 4 bytes.

- utf8: Max 3 bytes per character. Cannot store emojis. In current MySQL versions it is an alias for utf8mb3.
- utf8mb4: Max 4 bytes per character. This is MySQL's complete implementation of UTF-8 and a superset of utf8. It can store every character in the Unicode standard, including emojis. You should always use utf8mb4 for new projects.
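The 3-byte versus 4-byte distinction is easy to check from Python itself. The sketch below is illustrative only: `utf8_byte_length` and `needs_utf8mb4` are hypothetical helper names, not part of any MySQL driver. It reports whether a string contains a character above U+FFFF, which is the case the legacy utf8 character set cannot store.

```python
def utf8_byte_length(char: str) -> int:
    """Return how many bytes a single character occupies when encoded as UTF-8."""
    return len(char.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """Return True if any character in `text` needs 4 bytes in UTF-8.

    Code points above U+FFFF encode to 4 bytes, which MySQL's legacy
    utf8 (utf8mb3) character set cannot store.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    for sample in ["A", "你", "😊"]:
        print(repr(sample), utf8_byte_length(sample), needs_utf8mb4(sample))
```

A check like this could run before an insert into an older utf8 column, although converting the column to utf8mb4 is usually the better fix.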
The Golden Rule: Consistency is Key
To avoid encoding issues, you must ensure the character set is the same in three places:
- The Database/Table/Column: The character set defined in your MySQL schema.
- The MySQL Connection: The character set used by your Python script to communicate with the MySQL server.
- Your Python File: The encoding of your Python source code file (.py), which matters whenever the script contains non-ASCII string literals.
If any of these three are out of sync, you can end up with garbled text or encoding errors.
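One way to check the connection side of this rule is to read the session's character-set variables and compare them with the value you expect. The sketch below is a minimal version under a few assumptions: `fetch_charset_variables` and `find_charset_mismatches` are hypothetical helper names, the cursor is any open DB-API cursor on a MySQL connection, and only the three client-facing session variables are compared.

```python
# Session variables that describe the encoding used on a MySQL connection.
SESSION_CHARSET_VARIABLES = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def fetch_charset_variables(cursor) -> dict:
    """Read the character-set variables through an open DB-API cursor."""
    cursor.execute("SHOW VARIABLES LIKE 'character_set_%'")
    return {name: value for name, value in cursor.fetchall()}


def find_charset_mismatches(variables: dict, expected: str = "utf8mb4") -> dict:
    """Return the session variables whose value differs from `expected`."""
    return {
        name: variables.get(name)
        for name in SESSION_CHARSET_VARIABLES
        if variables.get(name) != expected
    }


if __name__ == "__main__":
    # Hand-written sample values; with a live connection, use
    # fetch_charset_variables(cursor) instead.
    sample = {
        "character_set_client": "utf8mb4",
        "character_set_connection": "latin1",
        "character_set_results": "utf8mb4",
    }
    print(find_charset_mismatches(sample))
```

An empty result means the three session variables all match the expected character set.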
Step-by-Step Guide to Proper Encoding
Let's walk through setting up a robust connection.
Step 1: Set the MySQL Character Set
First, ensure your database, tables, and text columns (VARCHAR, TEXT, etc.) are created with the utf8mb4 character set and a suitable collation like utf8mb4_unicode_ci.

You can do this in your schema file or when creating the database:
-- Create the database with utf8mb4
CREATE DATABASE my_app_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Use the database
USE my_app_db;

-- Create a table with utf8mb4 columns
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
    bio TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
);
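If a table already exists with an older character set, MySQL's `ALTER TABLE ... CONVERT TO CHARACTER SET` statement can change it. The sketch below only builds that statement as a string; `quote_identifier` and `build_convert_statement` are hypothetical helpers, and the name validation is a simple assumption rather than a complete rule set. Back up the table before running a conversion on real data.

```python
import re

# Character set and collation names are interpolated into SQL, so restrict
# them to letters, digits and underscores.
_NAME_RE = re.compile(r"^[A-Za-z0-9_]+$")


def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_convert_statement(
    table: str,
    charset: str = "utf8mb4",
    collation: str = "utf8mb4_unicode_ci",
) -> str:
    """Build an ALTER TABLE statement that converts a table's text columns."""
    for value in (charset, collation):
        if not _NAME_RE.match(value):
            raise ValueError(f"Unexpected character set or collation name: {value!r}")
    return (
        f"ALTER TABLE {quote_identifier(table)} "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation}"
    )


if __name__ == "__main__":
    print(build_convert_statement("users"))
```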
Step 2: Use a Modern MySQL Connector
A widely used library for connecting Python to MySQL is mysql-connector-python, the official driver maintained by Oracle. It has good support for specifying connection parameters such as the character set.
First, install it:
pip install mysql-connector-python
Step 3: Configure the Connection with charset
When you establish a connection, you must explicitly tell the connector to use utf8mb4. This is the most important step for preventing issues at the connection level.
import mysql.connector
from mysql.connector import Error

connection = None
cursor = None
try:
    # The 'charset' parameter is the key here!
    connection = mysql.connector.connect(
        host='localhost',
        database='my_app_db',
        user='your_username',
        password='your_password',
        charset='utf8mb4'  # <-- IMPORTANT: Use utf8mb4 for full Unicode support
    )
    if connection.is_connected():
        db_info = connection.get_server_info()
        print(f"Successfully connected to MySQL Server version {db_info}")
        cursor = connection.cursor()
        cursor.execute("SELECT database();")
        record = cursor.fetchone()
        print(f"You're connected to database: {record[0]}")
except Error as e:
    print(f"Error while connecting to MySQL: {e}")
finally:
    # Close the cursor and connection only if they were actually created
    if cursor is not None:
        cursor.close()
    if connection is not None and connection.is_connected():
        connection.close()
        print("MySQL connection is closed.")
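The connector's default collation for utf8mb4 can vary between versions, so it may be worth passing the `collation` argument alongside `charset` to match the schema. The following is a minimal sketch, assuming the same placeholder credentials as above; `build_connection_config` and `open_connection` are hypothetical helper names. The driver is imported lazily so the configuration helper can be used and tested without a running server.

```python
def build_connection_config(host, user, password, database=None):
    """Return keyword arguments for mysql.connector.connect() with utf8mb4 set."""
    config = {
        "host": host,
        "user": user,
        "password": password,
        "charset": "utf8mb4",               # full Unicode support
        "collation": "utf8mb4_unicode_ci",  # keep in sync with the schema
    }
    if database is not None:
        config["database"] = database
    return config


def open_connection(config):
    """Open a connection using the config above (requires mysql-connector-python)."""
    import mysql.connector  # imported lazily; only needed when actually connecting

    return mysql.connector.connect(**config)


if __name__ == "__main__":
    print(build_connection_config("localhost", "your_username", "your_password", "my_app_db"))
```

Keeping the settings in one dictionary makes it harder for different parts of an application to connect with different character sets.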
Step 4: Ensure Your Python File is Saved as UTF-8
Most modern code editors (like VS Code, PyCharm, Sublime Text) save files as UTF-8 by default, and Python 3 assumes UTF-8 for source files unless a different encoding is declared. This ensures that if you hardcode strings with special characters in your Python script, they are correctly interpreted before being sent to the database.
# These strings will be correctly handled if the file is saved as UTF-8
greeting = "你好,世界!"
emoji_text = "Hello with an emoji: 😊"
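If you are unsure how a file was saved, you can check whether its bytes decode cleanly as UTF-8. This is a rough sketch rather than a definitive test: `is_valid_utf8` is a hypothetical helper, and a file that passes may still have been intended as another encoding (plain ASCII, for instance, is also valid UTF-8).

```python
from pathlib import Path


def is_valid_utf8(path) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Check this script itself.
    print(is_valid_utf8(__file__))
```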
Complete Working Example
Here is a full script that demonstrates creating a table, inserting data with special characters and emojis, and retrieving it.
import mysql.connector
from mysql.connector import Error


def create_connection(host_name, user_name, user_password, db_name=None):
    """Create a database connection to a MySQL server."""
    connection = None
    try:
        connection = mysql.connector.connect(
            host=host_name,
            user=user_name,
            password=user_password,
            database=db_name,
            charset='utf8mb4'  # Ensure the connection uses utf8mb4
        )
        print("Connection to MySQL successful")
    except Error as e:
        print(f"The error '{e}' occurred")
    return connection


def execute_query(connection, query, data=None):
    """Execute a single query and commit it."""
    cursor = connection.cursor()
    try:
        if data:
            cursor.execute(query, data)
        else:
            cursor.execute(query)
        connection.commit()
        print("Query executed successfully")
    except Error as e:
        print(f"The error '{e}' occurred")
    finally:
        cursor.close()


def execute_read_query(connection, query):
    """Execute a read query and return all rows (an empty list on error)."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        return cursor.fetchall()
    except Error as e:
        print(f"The error '{e}' occurred")
        return []
    finally:
        cursor.close()


# --- Main Execution ---
if __name__ == "__main__":
    # --- Connection Details ---
    HOST = "localhost"
    USER = "your_username"
    PASSWORD = "your_password"
    DB_NAME = "my_app_db"

    # 1. Connect to the MySQL server (no database initially to create it)
    conn = create_connection(HOST, USER, PASSWORD)
    if conn is None:
        raise SystemExit("Could not connect to the MySQL server")

    # 2. Create the database if it doesn't exist (using utf8mb4)
    create_db_query = "CREATE DATABASE IF NOT EXISTS my_app_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"
    execute_query(conn, create_db_query)

    # 3. Close and reconnect to the specific database
    conn.close()
    conn = create_connection(HOST, USER, PASSWORD, DB_NAME)
    if conn is None:
        raise SystemExit(f"Could not connect to database {DB_NAME}")

    # 4. Create a table with utf8mb4 columns
    create_table_query = """
    CREATE TABLE IF NOT EXISTS users (
        id INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
        bio TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
        favorite_emoji VARCHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
    )
    """
    execute_query(conn, create_table_query)

    # 5. Insert data with special characters and emojis
    insert_user_query = "INSERT INTO users (name, bio, favorite_emoji) VALUES (%s, %s, %s)"
    user_data = (
        "张三",  # Chinese characters
        "这是一个关于Python和MySQL编码的测试。",  # More Chinese
        "😊"  # Emoji
    )
    execute_query(conn, insert_user_query, user_data)

    # 6. Retrieve and print the data to verify
    select_users_query = "SELECT id, name, bio, favorite_emoji FROM users"
    users = execute_read_query(conn, select_users_query)
    print("\n--- Retrieved Users ---")
    for user in users:
        # The data should be printed correctly
        print(f"ID: {user[0]}, Name: {user[1]}, Bio: {user[2]}, Emoji: {user[3]}")

    # 7. Close the connection
    conn.close()
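Printed output only shows what your terminal can render, so a stricter check is to compare the stored bytes with the bytes Python expects. The sketch below uses MySQL's `HEX()` function against the `users` table from the script above; `expected_hex` and `stored_bytes_match` are hypothetical helpers, and the cursor is assumed to be an open cursor on a utf8mb4 connection.

```python
def expected_hex(text: str) -> str:
    """Return the uppercase hex of the UTF-8 bytes, in the format HEX() produces."""
    return text.encode("utf-8").hex().upper()


def stored_bytes_match(cursor, user_id: int, expected_text: str) -> bool:
    """Compare HEX(favorite_emoji) for one row with the expected UTF-8 bytes."""
    cursor.execute("SELECT HEX(favorite_emoji) FROM users WHERE id = %s", (user_id,))
    row = cursor.fetchone()
    if row is None:
        return False  # no such row
    stored = row[0]
    # Some driver configurations may return bytes instead of str.
    if isinstance(stored, (bytes, bytearray)):
        stored = stored.decode("ascii")
    return stored == expected_hex(expected_text)


if __name__ == "__main__":
    print(expected_hex("😊"))
```

A stored value of `3F` (a literal question mark) instead of the expected bytes usually means the character was lost on the way in, not just displayed badly.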
Troubleshooting Common Issues
- A mysql.connector error with code 1366 (HY000): Incorrect string value: This means you are trying to insert a character that the column's character set cannot store, typically a 4-byte character such as an emoji going into a utf8 or latin1 column. Solution: Ensure your MySQL table columns are defined as CHARACTER SET utf8mb4 and your connection uses charset='utf8mb4'.
- UnicodeEncodeError or UnicodeDecodeError in Python: This usually happens when there's a mismatch between what your Python script expects (UTF-8) and what the MySQL connection is sending/receiving (e.g., latin1). Solution: Double-check that your connection call includes charset='utf8mb4'.
- Data is correct in the database but shows as garbled characters in your application: This is almost always a display issue. The data is stored correctly, but the tool you're using to view it (like a GUI or command-line client) is not configured for utf8mb4. Check the client's connection settings and ensure it's also connecting with utf8mb4.
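To see what a mismatch looks like, you can reproduce mojibake in pure Python by decoding UTF-8 bytes with the wrong codec. The sketch below is a diagnostic aid under stated assumptions: `simulate_mojibake` and `try_repair_mojibake` are hypothetical helpers, and Python's `latin-1` codec stands in for the misconfigured side (MySQL's latin1 is closer to Windows-1252, so real cases can differ for some bytes).

```python
def simulate_mojibake(text: str) -> str:
    """Show what UTF-8 bytes look like when they are misread as latin-1."""
    return text.encode("utf-8").decode("latin-1")


def try_repair_mojibake(garbled: str) -> str:
    """Reverse the misreading above; return the input unchanged if it does not apply."""
    try:
        return garbled.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled


if __name__ == "__main__":
    garbled = simulate_mojibake("你好 😊")
    print(repr(garbled))
    print(try_repair_mojibake(garbled))
```

A repair like this only helps with inspection. The lasting fix is still to make the column, the connection, and the client agree on utf8mb4.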
