杰瑞科技汇

How Do You Write a C Extension Module for Python?

Python C extensions let you write performance-critical code in C (or C++) and call it from Python. This article walks through what they are, how to build one, and when an alternative is the better choice.


What is a Python C Extension?

A Python C Extension is a module that allows you to extend Python's functionality with C code. When you write a Python script and import a module, the interpreter loads the module's code. If the module is a C extension, it loads a compiled shared library (a .so file on Linux and macOS, a .pyd file on Windows) that contains C functions which can be called directly from Python.
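
To see which file suffixes your own interpreter accepts for compiled extension modules, a quick check like the following can help. It is only a sketch; the exact strings vary by platform and Python version.

```python
import importlib.machinery

# Suffixes this interpreter recognizes for compiled extension modules.
# On Linux and macOS they end in ".so"; on Windows they end in ".pyd".
suffixes = importlib.machinery.EXTENSION_SUFFIXES
for suffix in suffixes:
    print(suffix)
```

The longest suffix usually encodes the interpreter version and platform, which is why a compiled extension built for one Python version is generally not loadable by another.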

Why would you want to do this?

  1. Performance: This is the primary reason. CPython executes bytecode in an interpreter, which is typically much slower than compiled C for CPU-intensive work. If you have a loop that performs millions of calculations, rewriting that loop in C can result in a large speedup (sometimes by orders of magnitude).
  2. Access to Low-Level System Features: C gives you direct access to memory pointers, system calls, and hardware features that are not available or are difficult to access from pure Python.
  3. Leverage Existing C/C++ Libraries: There are thousands of high-quality, mature C/C++ libraries for scientific computing (e.g., BLAS, LAPACK), image processing (e.g., OpenCV), and networking. C extensions allow you to wrap these libraries and use them from Python, creating powerful tools like NumPy and SciPy.
  4. Create New Built-in Types: You can define new data types in C, which can be more compact or faster than general-purpose Python types for a specialized workload, and use them directly in your Python code.

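How to Create a C Extension: The Modern Approach (setuptools and pyproject.toml)

A convenient way to build a C extension today is setuptools driven by a pyproject.toml file. Declaring the extension directly in pyproject.toml is still marked experimental and requires setuptools 74.1 or newer; the long-standing alternative is a setup.py that passes setuptools.Extension objects to setup(). The older distutils build tool was removed from the standard library in Python 3.12.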

Let's create a simple C extension that adds two numbers.

Step 1: Project Structure

Create a new directory for your project with the following structure:

my_project/
├── my_module/
│   ├── __init__.py
│   └── my_module.c
└── pyproject.toml
  • my_module/: This will be our Python package.
  • __init__.py: Makes my_module a package. So that my_module.add resolves after import my_module, it should contain one line: from .my_module import add
  • my_module.c: The C source code for our extension.
  • pyproject.toml: The build configuration file.

Step 2: Write the C Code (my_module.c)

This is the core of our extension. We need to tell Python about our C function so it can call it. We do this using a special "Method Table".

// my_module.c
// Include the Python.h header. This is essential.
#include <Python.h>
// This is the C function that will do the work.
// Because it is registered with METH_VARARGS, it receives the module object (self)
// and a tuple of positional arguments (args); it takes no keyword arguments.
static PyObject* add_numbers(PyObject* self, PyObject* args) {
    // Variables to hold our two numbers.
    long a, b;
    // Parse the arguments from the Python tuple.
    // "ll" means we expect two integers that fit in a C long.
    // If the arguments are not valid, PyArg_ParseTuple returns false (0) and sets a Python exception.
    if (!PyArg_ParseTuple(args, "ll", &a, &b)) {
        return NULL; // An error has been set by PyArg_ParseTuple.
    }
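    // Perform the calculation. Signed overflow is undefined behavior in C,
    // so production code should range-check a and b before adding.
    long result = a + b;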
    // Create a Python integer object from the C long result.
    // PyLong_FromLong is the function to convert a C long to a Python int.
    return PyLong_FromLong(result);
}
// This is the Method Table. It links Python names to our C functions.
// It's an array of PyMethodDef structures.
// {Python_name, C_function_pointer, flags, doc_string}
static PyMethodDef MyModuleMethods[] = {
    {"add", add_numbers, METH_VARARGS, "Add two numbers."},
    // Add more functions here if needed.
    // The last entry must be all NULLs to signal the end of the table.
    {NULL, NULL, 0, NULL}
};
// This is the module definition structure.
// It tells Python the name of the module and where to find its methods.
static struct PyModuleDef my_module_def = {
    PyModuleDef_HEAD_INIT,
    "my_module", // Name of the module
    NULL,        // Module documentation
    -1,          // Size of per-interpreter state of the module, or -1 if the module keeps state in global variables.
    MyModuleMethods // The method table from above.
};
// This is the function that Python calls when the module is imported.
// It must be named PyInit_<modulename>.
PyMODINIT_FUNC PyInit_my_module(void) {
    // Create and return the module object.
    return PyModule_Create(&my_module_def);
}
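
One consequence of the "ll" format is worth checking before relying on this function: a C long is platform-dependent in width (commonly 64 bits on Linux and macOS, 32 bits on Windows). Python ints outside that range are rejected with OverflowError when the function is called, and the sum a + b can itself exceed the range. The sketch below computes the range from Python using ctypes; fits_in_c_long and sum_fits_in_c_long are illustrative helper names, not part of any API.

```python
import ctypes

# Width of a C long on this platform, in bits (typically 32 or 64).
LONG_BITS = ctypes.sizeof(ctypes.c_long) * 8
LONG_MIN = -(2 ** (LONG_BITS - 1))
LONG_MAX = 2 ** (LONG_BITS - 1) - 1


def fits_in_c_long(value: int) -> bool:
    """Return True if value is representable as a C long on this platform."""
    return LONG_MIN <= value <= LONG_MAX


def sum_fits_in_c_long(a: int, b: int) -> bool:
    """Return True if a, b, and a + b are all representable as a C long."""
    return fits_in_c_long(a) and fits_in_c_long(b) and fits_in_c_long(a + b)


print(LONG_BITS, fits_in_c_long(10), sum_fits_in_c_long(LONG_MAX, 1))
```

A check like this on the Python side, or an equivalent range check in the C function, avoids relying on undefined behavior when the inputs are large.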

Step 3: Configure the Build (pyproject.toml)

This file tells setuptools how to build our C extension.

# pyproject.toml
[build-system]
# Declaring ext-modules in pyproject.toml is experimental and needs setuptools 74.1 or newer.
requires = ["setuptools>=74.1"]
build-backend = "setuptools.build_meta"
[project]
name = "my_project"
version = "0.1.0"
description = "A simple C extension example."
requires-python = ">=3.8"
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
]
[tool.setuptools]
packages = ["my_module"]
# Build the extension "my_module.my_module" (a submodule of the package)
# from my_module/my_module.c. The last component of the name must match
# the PyInit_my_module function in the C source.
# For my_module.add to resolve, my_module/__init__.py needs the line:
#     from .my_module import add
ext-modules = [
    {name = "my_module.my_module", sources = ["my_module/my_module.c"]},
]

Step 4: Build and Install the Extension

Now, open your terminal in the my_project directory and run the following commands. It's highly recommended to do this in a virtual environment.

# Create a virtual environment (optional but good practice)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
# Install the package in editable mode (-e)
# This will build the C extension and install it.
pip install -e .

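pip invokes your platform's C compiler (such as gcc, clang, or MSVC) to compile my_module.c; add -v to the pip command if you want to see the compiler output. The build needs a C compiler and the Python development headers to be installed. If it succeeds, my_module is now installed in your environment.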


Step 5: Use the Extension in Python

Now you can use your C extension just like any other Python module!

Create a file named main.py:

# main.py
import my_module
# Call the C function directly from Python
result = my_module.add(10, 25)
print(f"The result from the C function is: {result}")
# Let's compare it to a pure Python function
def python_add(a, b):
    return a + b
# For a simple operation, the difference is negligible,
# but for complex operations, the C version would be much faster.
print(f"The result from the Python function is: {python_add(10, 25)}")

Run it:

python main.py

Output:

The result from the C function is: 35
The result from the Python function is: 35

Congratulations! You have successfully created and used a Python C extension.
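
To check the claim in main.py that the difference is negligible for a simple operation, you can time both functions. The following is a rough sketch using the standard timeit module; time_call is an illustrative helper name, and operator.add is used as a stand-in (it is also implemented in C) if my_module has not been built.

```python
import operator
import timeit

try:
    # The extension built in the steps above, if it is installed.
    from my_module import add as c_add
except ImportError:
    # Stand-in so the sketch still runs: operator.add is also implemented in C.
    c_add = operator.add


def python_add(a, b):
    return a + b


def time_call(func, repeats=200_000):
    """Return the total seconds taken by `repeats` calls of func(10, 25)."""
    return timeit.timeit(lambda: func(10, 25), number=repeats)


c_seconds = time_call(c_add)
py_seconds = time_call(python_add)
print(f"C-implemented add: {c_seconds:.4f}s")
print(f"Pure Python add:   {py_seconds:.4f}s")
```

For a function this small, the cost of the call itself dominates, so expect the two numbers to be close. The gap only becomes large when the C function does substantial work per call, such as running a long loop internally.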


Advanced Topics & Considerations

  1. The GIL (Global Interpreter Lock):

    • The GIL is a mutex that protects access to Python objects, preventing multiple native threads from executing Python bytecode at the same time within a single process.
    • Implications for C Extensions:
      • Releasing the GIL: If your C function is a long-running, CPU-bound task that doesn't need to interact with Python objects, you can release the GIL. This allows other Python threads to run while your C code is executing. You do this using Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS.
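      • Not a Silver Bullet: Releasing the GIL only helps when other Python threads are waiting to run; in a single-threaded script it just adds a small overhead. It is worthwhile around long computations and around blocking I/O calls made from C. While the GIL is released, the code must not touch Python objects or call most Python C API functions.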
  2. Error Handling:

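    • The key is: if your C function encounters an error, it must set a Python exception and return NULL. Python's C API functions signal failure by setting an exception and returning an error value (NULL for functions that return pointers, 0 for PyArg_ParseTuple), so when one of them fails your function only needs to clean up and return NULL.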
    • To raise a specific Python exception from C, you can use functions like PyErr_SetString(PyExc_TypeError, "A descriptive message.").
  3. Python 2 vs. Python 3:

    • The API changed significantly between Python 2 and Python 3. The method described above using PyModuleDef and PyInit_ is the Python 3 standard. The Python 2 method used Py_InitModule and had a different module structure. Always target Python 3.
  4. Alternatives to Manual C Coding:

    • Cython: This is a superset of the Python language that allows you to add C-like static type declarations. Cython code is translated into highly optimized C code and then compiled. It's much easier and safer than writing raw C extensions, as you can mostly work with Python-like syntax. For most users, Cython is the recommended way to write high-performance extensions.
    • CFFI (C Foreign Function Interface): A more direct way to call existing C libraries from Python. It's less about creating new Python types and more about interfacing with foreign C functions.
    • ctypes: A built-in Python library for calling functions in shared libraries (DLLs on Windows, .so on Linux). It's very easy to use for simple tasks but is less powerful and efficient than Cython or a handwritten extension.
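
The effect of releasing the GIL (item 1 above) can be observed from pure Python. The sketch below relies on the fact that time.sleep is implemented in C and releases the GIL while it waits, which is the same pattern a C extension follows with Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS. The names blocking_call and run_in_threads are illustrative only.

```python
import threading
import time


def blocking_call():
    # time.sleep is implemented in C and releases the GIL while it waits.
    time.sleep(0.2)


def run_in_threads(n_threads=4):
    """Run blocking_call in n_threads threads; return elapsed wall time."""
    threads = [threading.Thread(target=blocking_call) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_in_threads()
# If the GIL were held during the wait, this would take about 4 * 0.2 s.
print(f"4 threads finished in {elapsed:.2f}s")
```

Because the waits overlap, the total should be close to the duration of a single call rather than the sum of all four. A C function that holds the GIL for its whole run would serialize the threads instead.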

Summary: When to Use What

| Method | Complexity | Performance | Use Case |
|---|---|---|---|
| Pure Python | Low | Low | Rapid prototyping, I/O-bound tasks, most applications. |
| Cython | Medium | High | The sweet spot. Best for most performance-critical code. Easier and safer than raw C. |
| Handwritten C Extension | High | Highest | Maximum performance, access to low-level features, wrapping complex C APIs. |
| ctypes | Low | Medium | Quick and easy calls to simple, existing C libraries. |
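
As an illustration of the ctypes row, here is a minimal sketch that calls the C library's strlen without compiling anything. It assumes a POSIX system where CDLL(None) exposes the C library's symbols, and falls back to msvcrt on Windows; treat the loading step as an assumption to adapt for your platform.

```python
import ctypes
import os

# Load the C runtime. CDLL(None) exposes the symbols already loaded into the
# process on POSIX systems; on Windows, msvcrt is assumed to be available.
libc = ctypes.CDLL(None) if os.name == "posix" else ctypes.CDLL("msvcrt")

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello")
print(length)  # prints 5
```

Declaring argtypes and restype is optional in ctypes but strongly advisable: without them, arguments and return values are converted using defaults that may not match the C signature.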