Exchanging structured data between languages is a common need in modern, high-performance applications. This guide breaks down how to use Protocol Buffers (Protobuf) with Python and C++, covering the workflow, best practices, and a complete example.

What is Protocol Buffers (Protobuf)?
Protobuf is a mechanism developed by Google for serializing structured data. Think of it as a more compact, schema-driven alternative to XML or JSON.
Key Advantages:
- Schema-based: You define your data structure in a .proto file.
- Language Agnostic: You can generate code for many languages (Python, C++, Java, Go, etc.) from the same .proto file.
- Performance: The generated code is optimized for both speed and size. The binary format is typically much smaller and faster to parse than text-based formats like JSON.
- Strictness: The schema enforces data types and structure, catching many errors early (at compile time in C++, or when a field is assigned in Python) rather than deep into a run.
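To make the size claim concrete, here is a minimal sketch that hand-encodes a tiny record using the Protobuf wire format and compares it with compact JSON. It uses only the Python standard library, not the protobuf package. The helper names are hypothetical, and the sketch assumes a message with a string field numbered 1 and a non-negative integer field numbered 2.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Length-delimited field (wire type 2): tag, byte length, UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_int_field(field_number: int, value: int) -> bytes:
    """Varint field (wire type 0); this sketch handles non-negative values only."""
    tag = field_number << 3
    return encode_varint(tag) + encode_varint(value)


record = {"name": "Jane Doe", "id": 12345}
binary = encode_string_field(1, record["name"]) + encode_int_field(2, record["id"])
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

print(binary.hex(), len(binary))  # 0a084a616e6520446f6510b960 13
print(len(as_json))               # 30
```

The binary form carries no field names, only small numeric tags, which is where most of the saving comes from in this example. Real savings depend on the data.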
The Core Workflow (The 3-Step Process)
No matter which languages you use, the process is always the same:
- Define the Schema: Write a .proto file that describes your data structures.
- Generate Code: Use the Protobuf compiler (protoc) to generate Python and C++ classes from your .proto file.
- Use the Generated Code: In your Python and C++ applications, import the generated modules and use them to serialize/deserialize data.
Step 1: Define the Schema (person.proto)
This is the single source of truth for your data. Let's create a simple person.proto file.

// person.proto
syntax = "proto3"; // Use proto3 syntax

package tutorial; // A namespace to avoid name collisions

// The message definition for a Person.
message Person {
  string name = 1;
  int32 id = 2; // Unique ID for this person
  string email = 3;

  // A nested message for a phone number.
  message PhoneNumber {
    string number = 1;

    enum PhoneType {
      MOBILE = 0;
      HOME = 1;
      WORK = 2;
    }
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4; // 'repeated' means this is a list/array
}
Key Concepts:
- syntax = "proto3";: Specifies the version.
- package tutorial;: Creates a namespace for the generated code.
- message Person { ... }: Defines a struct-like object.
- string name = 1;: name is a field of type string. The number 1 is a unique field number (tag). It's crucial for the binary format and should not be changed once data is serialized with it.
- repeated PhoneNumber phones = 4;: Defines a list of PhoneNumber objects.
- enum PhoneType { ... }: Defines a set of named constants.
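To see why field numbers must stay stable, the sketch below computes the tag that precedes each Person field on the wire: the field number shifted left by three bits, combined with a wire type. The helper names are hypothetical and only the standard library is used. Field numbers above 15 need a multi-byte varint tag, which this sketch does not cover.

```python
from typing import Tuple

WIRE_VARINT = 0  # int32, enum, bool, ...
WIRE_LEN = 2     # string, bytes, nested messages


def make_tag(field_number: int, wire_type: int) -> int:
    """Combine a field number and wire type into the tag value written on the wire."""
    return (field_number << 3) | wire_type


def split_tag(tag: int) -> Tuple[int, int]:
    """Recover (field_number, wire_type) from a tag value."""
    return tag >> 3, tag & 0x07


# Tags for the fields of Person as declared in person.proto.
person_tags = {
    "name": make_tag(1, WIRE_LEN),
    "id": make_tag(2, WIRE_VARINT),
    "email": make_tag(3, WIRE_LEN),
    "phones": make_tag(4, WIRE_LEN),
}

for field, tag in person_tags.items():
    print(f"{field}: 0x{tag:02x}")
# name: 0x0a
# id: 0x10
# email: 0x1a
# phones: 0x22

# Renumbering email from 3 to 5 changes its tag, so a reader built from the
# old schema would no longer recognise the field as email.
print(split_tag(make_tag(5, WIRE_LEN)))  # (5, 2)
```

The field name never appears in the encoded bytes, which is why renaming a field is safe for the binary format but renumbering it is not.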
Step 2: Prerequisites & Code Generation
Before you can generate code, you need to install the necessary tools for both Python and C++.
Prerequisites
- Install the Protobuf Compiler (protoc):
  - macOS (using Homebrew): brew install protobuf
  - Ubuntu/Debian: sudo apt-get install protobuf-compiler
  - Windows (using vcpkg): vcpkg install protobuf
  - Pre-compiled Binaries: You can also download binaries directly from the GitHub Releases page.
- Install Python Protobuf Library: pip install protobuf
- Install C++ Protobuf Library (Headers and Libraries):
  - macOS (using Homebrew): brew install protobuf
  - Ubuntu/Debian: sudo apt-get install libprotobuf-dev
  - Windows (using vcpkg): vcpkg install protobuf
Generate the Code
Now, run the protoc compiler. It's best practice to create a separate directory for the generated code, e.g., python_pb and cpp_pb.
# Create directories for generated code
mkdir python_pb cpp_pb

# Generate Python code
# The --python_out flag specifies the output directory for Python files.
# It will create person_pb2.py.
protoc --python_out=python_pb person.proto

# Generate C++ code
# The --cpp_out flag specifies the output directory for C++ files.
# It will create person.pb.h and person.pb.cc.
protoc --cpp_out=cpp_pb person.proto
After running these commands, you will have:
- python_pb/person_pb2.py
- cpp_pb/person.pb.h (header file)
- cpp_pb/person.pb.cc (source file)
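If you regenerate often, the two protoc invocations can be wrapped in a small helper. The following is a sketch only: the function names are hypothetical, and it assumes protoc is on your PATH and that person.proto sits in the current directory. It passes both output flags to a single protoc call, which protoc accepts.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_protoc_command(proto_file: str, python_out: str, cpp_out: str) -> List[str]:
    """Return the protoc argument list that generates Python and C++ code."""
    return [
        "protoc",
        f"--python_out={python_out}",
        f"--cpp_out={cpp_out}",
        proto_file,
    ]


def generate(proto_file: str = "person.proto",
             python_out: str = "python_pb",
             cpp_out: str = "cpp_pb") -> bool:
    """Run protoc if it is installed; return True when code was generated."""
    if shutil.which("protoc") is None or not Path(proto_file).is_file():
        return False
    # protoc does not create output directories, so make them first.
    for directory in (python_out, cpp_out):
        Path(directory).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_protoc_command(proto_file, python_out, cpp_out), check=True)
    return True


if __name__ == "__main__":
    print(" ".join(build_protoc_command("person.proto", "python_pb", "cpp_pb")))
    print("generated" if generate() else "skipped: protoc or person.proto not found")
```

Keeping the command construction separate from the call that runs it makes the helper easy to check without protoc installed.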
Step 3: Use the Generated Code
Now let's see how to use these generated modules in Python and C++.
Example in Python
Create a file write_read_python.py:
import sys
import os

# Add the directory containing the generated module to the Python path
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'python_pb'))

# protoc wrote person_pb2.py directly into python_pb/; the proto 'package'
# name does not create a Python package directory.
import person_pb2  # Import the generated module


def main():
    # --- 1. Create and populate a Person object ---
    person = person_pb2.Person()
    person.name = "Jane Doe"
    person.id = 12345
    person.email = "jane.doe@example.com"

    # Add a phone number (PhoneType is nested inside Person.PhoneNumber)
    phone = person.phones.add()
    phone.number = "555-1234"
    phone.type = person_pb2.Person.PhoneNumber.HOME

    # Add another phone number
    phone = person.phones.add()
    phone.number = "555-5678"
    phone.type = person_pb2.Person.PhoneNumber.WORK

    # --- 2. Serialize the object to a byte string ---
    serialized_data = person.SerializeToString()
    print(f"Serialized data (bytes): {serialized_data}")
    print(f"Serialized data (hex): {serialized_data.hex()}\n")

    # --- 3. Deserialize the byte string back into a Person object ---
    new_person = person_pb2.Person()
    new_person.ParseFromString(serialized_data)

    # --- 4. Verify the deserialized data ---
    print("Deserialized Person:")
    print(f" Name: {new_person.name}")
    print(f" ID: {new_person.id}")
    print(f" Email: {new_person.email}")
    for phone_number in new_person.phones:
        # Enum fields are plain integers in Python, so this prints 1 for HOME
        print(f" Phone: {phone_number.number} (Type: {phone_number.type})")


if __name__ == '__main__':
    main()
To run the Python script:
python write_read_python.py
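The hex string the script prints can be read by hand. As a rough aid, here is a standard-library sketch that walks the top-level fields of an encoded message. The function names are hypothetical, it handles only the varint and length-delimited wire types that person.proto uses, and the sample bytes are hand-built for illustration rather than copied from a real run.

```python
from typing import Iterator, Tuple, Union


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def walk_fields(data: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint: int32, enum, ...
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited: string, nested message
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        yield field_number, wire_type, value


# Hand-built bytes for a Person with name "Jane Doe" and id 12345.
sample = bytes.fromhex("0a084a616e6520446f6510b960")
for field_number, wire_type, value in walk_fields(sample):
    print(field_number, wire_type, value)
# 1 2 b'Jane Doe'
# 2 0 12345
```

A nested message such as an entry in phones shows up as a length-delimited field whose payload can be passed to walk_fields again.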
Example in C++
C++ is more verbose. You'll need to link against the libprotobuf library.
First, create a CMakeLists.txt file to manage the build process easily:
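cmake_minimum_required(VERSION 3.10)
project(ProtobufCPlusPlusExample)

# Find the Protobuf package
# (if your installation ships no CMake config files, CMake's bundled module
# can be used instead: find_package(Protobuf REQUIRED))
find_package(protobuf CONFIG REQUIRED)

# Add the executable; the generated person.pb.cc must be compiled too,
# otherwise the Person symbols are undefined at link time
add_executable(write_read_cpp write_read_cpp.cpp cpp_pb/person.pb.cc)

# Link the protobuf library
target_link_libraries(write_read_cpp PRIVATE protobuf::libprotobuf)

# Include the directory where our generated header is
target_include_directories(write_read_cpp PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/cpp_pb)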
Now, create the C++ source file write_read_cpp.cpp:
#include <iostream>
#include <fstream>
#include <string>

// Include the generated header (cpp_pb/ is on the include path)
#include "person.pb.h"

// For convenience, import the whole namespace
using namespace tutorial;

void SerializeToFile(const Person& person, const std::string& filename) {
    std::fstream output(filename, std::ios::out | std::ios::trunc | std::ios::binary);
    if (!person.SerializeToOstream(&output)) {
        std::cerr << "Failed to write person." << std::endl;
    }
}

bool ParseFromFile(const std::string& filename, Person* person) {
    std::fstream input(filename, std::ios::in | std::ios::binary);
    if (!person->ParseFromIstream(&input)) {
        std::cerr << "Failed to parse person." << std::endl;
        return false;
    }
    return true;
}

int main() {
    // --- 1. Create and populate a Person object ---
    Person person;
    person.set_name("John Doe");
    person.set_id(67890);
    person.set_email("john.doe@example.com");

    // Add a phone number (PhoneType is nested inside Person::PhoneNumber)
    Person::PhoneNumber* phone = person.add_phones();
    phone->set_number("555-9876");
    phone->set_type(Person::PhoneNumber::HOME);

    // --- 2. Serialize the object to a file ---
    std::string filename = "person.dat";
    SerializeToFile(person, filename);
    std::cout << "Serialized data to " << filename << std::endl << std::endl;

    // --- 3. Deserialize the file back into a Person object ---
    Person new_person;
    if (!ParseFromFile(filename, &new_person)) {
        return 1; // Exit if parsing failed
    }

    // --- 4. Verify the deserialized data ---
    std::cout << "Deserialized Person:" << std::endl;
    std::cout << " Name: " << new_person.name() << std::endl;
    std::cout << " ID: " << new_person.id() << std::endl;
    std::cout << " Email: " << new_person.email() << std::endl;
    for (const auto& phone_number : new_person.phones()) {
        // type() is an enum, so it prints as its integer value (1 for HOME)
        std::cout << " Phone: " << phone_number.number()
                  << " (Type: " << phone_number.type() << ")" << std::endl;
    }

    // Clean up the serialized data file if you wish (needs <cstdio>)
    // std::remove("person.dat");
    return 0;
}
To compile and run the C++ code:
# Create a build directory
mkdir build
cd build

# Configure with CMake
cmake ..

# Build the project
make

# Run the executable
./write_read_cpp
Summary and Key Takeaways
| Feature | Python | C++ |
|---|---|---|
| Generated Module | person_pb2.py (Python module) | person.pb.h (header) & person.pb.cc (source) |
| Include/Import | import person_pb2 | #include "person.pb.h" |
| Object Creation | person = person_pb2.Person() | Person person; |
| Field Assignment | person.name = "..." | person.set_name("..."); |
| Field Access | person.name | person.name() (accessor) |
| Repeated Fields | person.phones.add() to add items | person.add_phones() to add items |
| Serialization | data = person.SerializeToString() | person.SerializeToString(&data) |
| Deserialization | new_person.ParseFromString(data) | new_person.ParseFromString(data) |
| Building | pip install protobuf | find_package(protobuf) in CMake, link libprotobuf, compile person.pb.cc |
This workflow allows you to define your data model once and share it seamlessly between a high-level language like Python (for scripting, web backends, data analysis) and a high-performance language like C++ (for game engines, embedded systems, or performance-critical components). The data remains consistent and the communication is fast and efficient.
