杰瑞科技汇

How to Convert Efficiently Between Protobuf and JSON in Python?

Converting between Protocol Buffers (protobuf) and JSON is a common task in modern software development. This guide covers working with both in Python, from defining your schema to converting data.

We'll cover:

  1. What is Protobuf? (A quick refresher)
  2. Setting Up Your Environment
  3. Defining Your Schema (.proto file)
  4. Compiling the Schema to Python Code
  5. Working with Protobuf Messages in Python
  6. Converting Protobuf to JSON
  7. Converting JSON to Protobuf
  8. Putting It All Together: A Complete Example

What is Protocol Buffers (Protobuf)?

Protobuf is a method developed by Google for serializing structured data. It's language-neutral, platform-neutral, and extensible.

  • Schema Definition: You define your data structure in a .proto file.
  • Code Generation: A compiler (protoc) generates data access classes in your chosen language (e.g., Python, Java, C++).
  • Serialization: The generated classes can be serialized into a compact, binary format.
  • Key Advantage: This binary format is typically smaller and faster to parse than text-based formats like XML or JSON, which makes it well suited to network transmission and storage. The trade-off is that it is not human-readable.
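
To make the size difference concrete, the following sketch hand-encodes a two-field record in the protobuf wire format using only the standard library and compares it with compact JSON. It assumes a hypothetical message with a string field numbered 1 and an integer field numbered 2; the helper names are illustrative, and real code would use the generated classes rather than encoding by hand.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field tag is (field_number << 3) | wire_type, stored as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2 (length-delimited): tag, byte length, then UTF-8 bytes."""
    payload = text.encode("utf-8")
    return encode_tag(field_number, 2) + encode_varint(len(payload)) + payload


def encode_int_field(field_number: int, value: int) -> bytes:
    """Wire type 0 (varint): tag, then the value itself."""
    return encode_tag(field_number, 0) + encode_varint(value)


record = {"name": "Alice", "age": 30}
binary = encode_string_field(1, record["name"]) + encode_int_field(2, record["age"])
text = json.dumps(record, separators=(",", ":")).encode("utf-8")

print(binary.hex(), len(binary))  # 0a05416c696365101e 9
print(len(text))                  # 25
```

The field names never appear in the binary form; only the field numbers do, which is where most of the saving comes from in this small case.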

Setting Up Your Environment

First, you need to install the necessary tools and libraries.

Install the Protocol Buffer Compiler (protoc): This is a separate tool that compiles your .proto files. You can download it from the official GitHub releases page. Make sure to add it to your system's PATH.

Install the Python Protobuf Library: This library provides the runtime for your generated Python code.

pip install protobuf
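
To confirm the setup, a short check along these lines can report the installed runtime version and whether the compiler is visible on PATH. This is only a sketch: the variable names are illustrative, and it looks for an executable literally named protoc.

```python
import shutil

import google.protobuf

# Version string of the installed Python runtime library.
runtime_version = google.protobuf.__version__

# Full path to the compiler, or None when protoc is not on PATH.
protoc_path = shutil.which("protoc")

print(f"protobuf runtime: {runtime_version}")
print(f"protoc: {protoc_path or 'not found on PATH'}")
```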

Defining Your Schema (.proto file)

Let's create a simple schema for a Person. Create a file named person.proto:

// person.proto
syntax = "proto3";
// Define the package to help avoid name conflicts
package tutorial;
// The message definition for a person.
message Person {
  // The name of the person.
  string name = 1;
  // The age of the person.
  int32 age = 2;
  // An email address.
  string email = 3;
  // Nested message for a person's address
  message Address {
    string street = 1;
    string city = 2;
    string country = 3;
  }
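  // An address field.
  // Message-typed fields always track presence in proto3, so the
  // 'optional' keyword here is redundant (proto3 'optional' is supported
  // by default since protoc 3.15); it is kept to make the intent explicit.
  optional Address address = 4;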
}

Key Points:

  • syntax = "proto3";: Specifies we are using version 3 of the protobuf syntax.
  • message: Defines a data structure, similar to a class in Python.
  • = 1, = 2, etc.: These are field numbers. They are unique identifiers within a message and must never be changed once your data is in production.
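  • optional: In proto3, any field may be left unset. A singular scalar field declared without optional does not track presence, so an unset field is indistinguishable from one set to its default value; adding optional enables presence checks with HasField(). Message-typed fields such as address always track presence, so the keyword is redundant there and only makes the intent explicit.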
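
Field presence can be observed without compiling anything. The sketch below uses google.protobuf.type_pb2.Option, a proto3 message that ships with the protobuf package, as a stand-in: it has a plain string field (name) and a message-typed field (value). The type URL is a made-up placeholder, and the behaviour shown is assumed to carry over to any proto3 message with the same kinds of fields.

```python
from google.protobuf import type_pb2

option = type_pb2.Option()

# A singular scalar field without 'optional' reads as its default when unset.
name_default = option.name  # ""

# A message-typed field tracks presence.
value_present_before = option.HasField("value")  # False

# Writing to the nested message marks it as present in the parent.
option.value.type_url = "type.googleapis.com/example.Placeholder"
value_present_after = option.HasField("value")  # True

# Asking for presence of a non-optional proto3 scalar is an error.
try:
    option.HasField("name")
    scalar_has_presence = True
except ValueError:
    scalar_has_presence = False

print(repr(name_default), value_present_before, value_present_after)
print("scalar field tracks presence:", scalar_has_presence)
```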

Compiling the Schema to Python Code

Now, use the protoc compiler to generate Python classes from your .proto file.

Open your terminal in the same directory as person.proto and run:

# The --python_out=. flag tells protoc to generate Python code
# in the current directory (indicated by the dot .).
protoc --python_out=. person.proto

This will create a new file: person_pb2.py. This is the generated file you will import and use in your Python code. Do not edit this file manually.


Working with Protobuf Messages in Python

Let's create a Python script (create_person.py) to use the generated classes.

# create_person.py
from person_pb2 import Person
# Create a new Person message object
person = Person()
# Set the fields of the message
person.name = "Alice"
person.age = 30
person.email = "alice@example.com"
# Create and set the nested Address message
address = person.address
address.street = "123 Main St"
address.city = "Wonderland"
address.country = "Fiction"
print("--- Protobuf Message Object ---")
print(person)
print("\n--- Accessing a specific field ---")
print(f"Name: {person.name}")
print(f"City: {person.address.city}")
# You can also serialize the message to its binary wire format (a bytes object)
serialized_data = person.SerializeToString()
print("\n--- Serialized Binary Data (as bytes) ---")
print(serialized_data)
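
Serialization is only half of the round trip. The sketch below shows the reverse direction with ParseFromString() and the FromString() shortcut. So that it runs without a compiled schema, it uses google.protobuf.api_pb2.Method, a message bundled with the protobuf package, as a stand-in; the field values are placeholders, and the same calls are available on any generated class.

```python
from google.protobuf import api_pb2

# Stand-in message bundled with the protobuf package.
original = api_pb2.Method(
    name="GetPerson",
    request_type_url="type.example/PersonRequest",  # placeholder value
)

# Serialize to the binary wire format.
payload = original.SerializeToString()

# Option 1: create an empty message and fill it from the bytes.
restored = api_pb2.Method()
restored.ParseFromString(payload)

# Option 2: build a new message directly from the bytes.
also_restored = api_pb2.Method.FromString(payload)

print(type(payload).__name__, len(payload))
print(restored == original, also_restored == original)
```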

Converting Protobuf to JSON

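Generated message classes do not have a ToJson() method. JSON conversion is handled by the google.protobuf.json_format module, which ships with the protobuf package; its MessageToJson() function takes a message and returns a JSON string.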

Let's modify our script to include the conversion.

# create_and_convert_to_json.py
from person_pb2 import Person
from google.protobuf import json_format
# --- 1. Create a Protobuf Message (same as before) ---
person = Person()
person.name = "Bob"
person.age = 25
person.email = "bob@example.com"
person.address.street = "456 Oak Ave"
person.address.city = "Tech City"
person.address.country = "Future"
print("--- Original Protobuf Message ---")
print(person)
# --- 2. Convert Protobuf to JSON ---
# json_format.MessageToJson returns the message as a JSON-formatted string.
json_output = json_format.MessageToJson(person)
print("\n--- Converted to JSON String ---")
print(json_output)
print("\n--- Type of the output ---")
print(type(json_output)) # It's a standard Python string

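Why json_format.MessageToJson? Protobuf message objects have no ToJson() method of their own. JSON conversion is provided by the google.protobuf.json_format module, whose MessageToJson() function follows the canonical proto3 JSON mapping: field names are emitted in lowerCamelCase by default, enum values as their names, 64-bit integers as strings, and nested messages as JSON objects.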
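
The Person schema only has single-word field names, so the lowerCamelCase conversion is not visible there. The sketch below uses google.protobuf.api_pb2.Method, a message bundled with the protobuf package, as a stand-in because it has multi-word fields such as request_type_url. It shows the default output, the preserving_proto_field_name option, and MessageToDict(); the field values are placeholders.

```python
import json

from google.protobuf import api_pb2, json_format

# Stand-in message with snake_case field names.
method = api_pb2.Method(
    name="GetPerson",
    request_type_url="type.example/PersonRequest",  # placeholder value
    request_streaming=True,
)

# Default: keys are converted to lowerCamelCase.
camel_json = json_format.MessageToJson(method)

# Keep the field names exactly as written in the .proto file.
snake_json = json_format.MessageToJson(method, preserving_proto_field_name=True)

# Skip the string step and get a plain Python dict.
as_dict = json_format.MessageToDict(method, preserving_proto_field_name=True)

print(sorted(json.loads(camel_json)))  # ['name', 'requestStreaming', 'requestTypeUrl']
print(sorted(json.loads(snake_json)))  # ['name', 'request_streaming', 'request_type_url']
print(as_dict["request_streaming"])    # True
```

Fields left at their default value are omitted from the output by default, which is why only the three populated fields appear.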


Converting JSON to Protobuf

You can also convert a JSON string back into a Protobuf message object using json_format.Parse(). This is useful when you receive JSON data from an API and want to serialize it efficiently.

# convert_json_to_protobuf.py
from person_pb2 import Person
from google.protobuf import json_format
# A JSON string that matches our Person schema
json_data = """
{
  "name": "Charlie",
  "age": 42,
  "email": "charlie@example.com",
  "address": {
    "street": "789 Pine Ln",
    "city": "Data Valley",
    "country": "Protocol Land"
  }
}
"""
# Create an empty message object to populate
new_person = Person()
# Parse the JSON string into the Protobuf message
json_format.Parse(json_data, new_person)
print("--- JSON String ---")
print(json_data)
print("\n--- Parsed Protobuf Message ---")
print(new_person)
# You can now access the fields as usual
print(f"\nName from parsed Protobuf: {new_person.name}")
print(f"Country from parsed Protobuf: {new_person.address.country}")

Key Considerations for JSON -> Protobuf:

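  • JSON keys may use either the lowerCamelCase JSON name or the original field name from your .proto file. A key that matches no field (e.g., Name or full_name) raises a ParseError unless you pass ignore_unknown_fields=True.
  • JSON values must be compatible with the field type. Under the proto3 JSON mapping, integer fields accept a JSON number or a decimal string, so both 25 and "25" parse for age, while a value such as "abc" raises a ParseError.
  • If a field is missing in the JSON, it is not set in the Protobuf message, and reading it returns the default value (0 for numbers, "" for strings, False for booleans, an empty message for message fields).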
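
The considerations above can be sketched as follows: strict parsing that rejects an unknown key, lenient parsing with ignore_unknown_fields=True, and ParseDict() for input that is already a Python dict. To stay runnable without a compiled schema, it uses google.protobuf.api_pb2.Method from the protobuf package as a stand-in; the payload and the "nickname" key are invented for illustration.

```python
from google.protobuf import api_pb2, json_format

# "nickname" is not a field of the stand-in message.
payload = (
    '{"name": "GetPerson",'
    ' "request_type_url": "type.example/PersonRequest",'
    ' "nickname": "gp"}'
)

# Strict parsing (the default): unknown keys raise ParseError.
try:
    json_format.Parse(payload, api_pb2.Method())
    strict_error = None
except json_format.ParseError as exc:
    strict_error = str(exc)

# Lenient parsing: unknown keys are skipped. Parse returns the message it filled.
lenient = json_format.Parse(payload, api_pb2.Method(), ignore_unknown_fields=True)

# ParseDict accepts a dict; lowerCamelCase keys are accepted as well.
from_dict = json_format.ParseDict(
    {"name": "ListPeople", "responseStreaming": True},
    api_pb2.Method(),
)

print("strict parse failed:", strict_error is not None)
print(lenient.name, lenient.request_type_url)
print(from_dict.name, from_dict.response_streaming)
```

Strict parsing is usually the safer default for data you control, while the lenient mode suits third-party payloads that may gain fields over time.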

Putting It All Together: A Complete Example

Here is a single, complete script that demonstrates the full cycle.

# main.py
import sys
from person_pb2 import Person
from google.protobuf import json_format
def main():
    """
    Demonstrates the full lifecycle of a Protobuf message:
    1. Create a message from scratch.
    2. Serialize it to a JSON string.
    3. Parse a JSON string back into a message.
    4. Serialize the final message to binary.
    """
    print("--- 1. Creating a Protobuf Message ---")
    # Create a new Person message object
    person = Person()
    person.name = "David"
    person.age = 35
    person.email = "david@example.com"
    person.address.street = "101 Data Drive"
    person.address.city = "Pythonville"
    person.address.country = "USA"
    print(person)
    print("\n--- 2. Converting Protobuf to JSON ---")
    # Convert the Protobuf message to a JSON string
    json_string = json_format.MessageToJson(person)
    print("JSON Output:")
    print(json_string)
    print("\n--- 3. Converting JSON back to Protobuf ---")
    # Define a new JSON string to parse
    new_json_data = """
    {
      "name": "Eve",
      "age