杰瑞科技汇

How Does Python creoleparser Parse Markdown-Style Markup?

Let's dive into python-creoleparser.


What is Creole and Why Use It?

Creole is a lightweight markup language designed to be simple, easy to read, and easy to write. It was created as a common markup for wikis, so that content can move between different wiki engines. It is less complex than HTML but more structured than plain text.

Creole Example:

= Main Title =
== Subtitle ==
This is a paragraph with **bold text** and //italic text//.

* A bulleted list item
* Another item
** A nested item

[[Link to a Page|Link Text]]
<<SomeMacro>>
{{myimage.png|Alt text for the image}}

The python-creoleparser library (imported as creoleparser) parses this Creole syntax and renders it as HTML or XHTML. It builds its output with the Genshi library.
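To make "parsing Creole into HTML" concrete, here is a toy converter for a tiny subset of the syntax above: = headings, **bold** and //italic//. It is a hypothetical, stdlib-only sketch and not how creoleparser is implemented. The names toy_creole_to_html and render_inline are made up for illustration. It treats every non-heading line as its own paragraph and ignores lists, links, escapes and nowiki blocks.

```python
import html
import re

# Hypothetical toy converter (not creoleparser's implementation) for a tiny
# Creole subset: "=" headings, **bold** and //italic//, one block per line.
HEADING = re.compile(r"^(={1,6})\s*(.*?)\s*=*\s*$")
BOLD = re.compile(r"\*\*(.+?)\*\*")
ITALIC = re.compile(r"//(.+?)//")


def render_inline(text: str) -> str:
    # Escape first so raw HTML typed by a user cannot reach the output.
    text = html.escape(text, quote=False)
    text = BOLD.sub(r"<strong>\1</strong>", text)
    return ITALIC.sub(r"<em>\1</em>", text)


def toy_creole_to_html(source: str) -> str:
    blocks = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines only separate blocks
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))  # "=" -> h1, "==" -> h2, ...
            blocks.append(f"<h{level}>{render_inline(match.group(2))}</h{level}>")
        else:
            blocks.append(f"<p>{render_inline(line)}</p>")
    return "\n".join(blocks)


print(toy_creole_to_html("= Main Title =\nThis is **bold** and //italic//."))
# <h1>Main Title</h1>
# <p>This is <strong>bold</strong> and <em>italic</em>.</p>
```

A real Creole parser also has to handle URLs (whose // would confuse this italic rule), the ~ escape character, and multi-line paragraphs.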


Installation

First, you need to install the library. It's available on PyPI.

pip install creoleparser

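Basic Usage: The Ready-Made Parser

The quickest entry point is text2html, a ready-made Parser instance that renders Creole (using the creole11_base dialect) to HTML. A Parser can be called directly, or you can call its render() method.

# text2html is a ready-made Parser using the creole11_base dialect
# (Creole 1.0 plus creoleparser's additions, such as macros).
from creoleparser import text2html

# The Creole text you want to parse.
# Blank lines separate blocks; "**" at the start of a line nests a list item.
creole_text = """
= Welcome to Creole =
This is a **bold** and //italic// paragraph.

Here's a list:

* Item 1
* Item 2
** Nested item
"""

# A Parser instance is callable; calling it renders the text.
html_output = text2html(creole_text)
if isinstance(html_output, bytes):  # some versions return UTF-8 bytes
    html_output = html_output.decode('utf-8')
print(html_output)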

Output (indented here for readability; the actual whitespace differs):

<h1>Welcome to Creole</h1>
<p>This is a <strong>bold</strong> and <em>italic</em> paragraph.</p>
<p>Here's a list:</p>
<ul>
    <li>Item 1</li>
    <li>Item 2
        <ul>
            <li>Nested item</li>
        </ul>
    </li>
</ul>
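In Creole, list nesting depth is the number of leading asterisks: * for the first level, ** for the second. Producing the nested <ul> above is mostly depth bookkeeping. The helper below is a hypothetical, stdlib-only sketch of that bookkeeping and is not part of creoleparser. It assumes every input line is a list item and that depth grows by at most one level at a time. A real parser must also tell a leading ** list marker apart from bold text.

```python
import html
from typing import List


def toy_creole_list_to_html(lines: List[str]) -> str:
    """Hypothetical helper: render Creole bullet lines as nested <ul> HTML.

    Depth is the number of leading '*' characters. Assumes every line is a
    list item and that depth grows by at most one level at a time.
    """
    if not lines:
        return ""
    out: List[str] = []
    depth = 0  # how many <ul> elements are currently open
    for line in lines:
        stripped = line.lstrip()
        level = len(stripped) - len(stripped.lstrip("*"))
        if level == 0 or level > depth + 1:
            raise ValueError(f"unsupported list line: {line!r}")
        text = html.escape(stripped[level:].strip(), quote=False)
        if level > depth:
            out.append("<ul>")  # one level deeper: open a list inside the current <li>
        else:
            out.append("</li>")  # close the previous item at this level...
            out.append("</ul></li>" * (depth - level))  # ...and any deeper lists
        out.append(f"<li>{text}")
        depth = level
    # Close the last item, every nested list with its parent item, then the outer list.
    out.append("</li>" + "</ul></li>" * (depth - 1) + "</ul>")
    return "".join(out)


print(toy_creole_list_to_html(["* Item 1", "* Item 2", "** Nested item"]))
# <ul><li>Item 1</li><li>Item 2<ul><li>Nested item</li></ul></li></ul>
```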

Customization: The Real Power of Creoleparser

The ready-made parsers cover the common case, but the library's real strength is configurability. You build a dialect with create_dialect() and pass it to Parser. This lets you adjust link handling, the recognized markup, and macros.

Customizing the Dialect

A dialect tells the parser which syntax to recognize and how to render it. You start from one of the base dialects shipped with creoleparser and pass options to create_dialect().


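Key Building Blocks:

  • creoleparser.creole10_base: Base dialect for strict Creole 1.0.
  • creoleparser.creole11_base: Creole 1.0 plus creoleparser's additions, such as macros.
  • creoleparser.create_dialect(base, **options): Builds a configured dialect from a base dialect. Options include wiki_links_base_url (URL prefix for [[wiki links]]) and macro_func (your macro handler).
  • creoleparser.Parser(dialect, method=...): The parser itself. method selects the output serializer, such as 'html' or 'xhtml'.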

Example: Disabling Italic Text

Let's say you want to use // for something else and disable the standard italic syntax.

from creoleparser import Parser, create_dialect, creole11_base

# Build a dialect whose simple (inline) markup only includes bold.
# simple_markup takes (delimiter, tag) pairs; it is a create_dialect option
# in creoleparser 0.7.x, so check your installed version's docstring.
# The list replaces the defaults, so list every pair you want to keep.
my_dialect = create_dialect(creole11_base, simple_markup=[('**', 'strong')])

# Create a parser with the custom dialect
parser = Parser(dialect=my_dialect, method='html')

creole_text = "This text is **bold** but not //italic//."
html_output = parser.render(creole_text)
if isinstance(html_output, bytes):  # some versions return UTF-8 bytes
    html_output = html_output.decode('utf-8')
print(html_output)

Output:

<p>This text is <strong>bold</strong> but not //italic//.</p>

Notice that //italic// was not converted.
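The same configuration-as-data idea can be sketched without creoleparser. Treat the inline markup as a list of (delimiter, tag) pairs and build a renderer from that list; omitting a pair disables that markup. make_inline_renderer is a hypothetical, stdlib-only illustration of this design, not creoleparser's internals.

```python
import html
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical sketch: a "dialect" reduced to data, a list of
# (delimiter, tag) pairs. Leaving a pair out disables that markup.
Rule = Tuple[str, str]


def make_inline_renderer(rules: Sequence[Rule]) -> Callable[[str], str]:
    # Compile one non-greedy pattern per delimiter, e.g. \*\*(.+?)\*\*
    compiled: List[Tuple["re.Pattern[str]", str]] = [
        (re.compile(re.escape(delim) + r"(.+?)" + re.escape(delim)), tag)
        for delim, tag in rules
    ]

    def render(text: str) -> str:
        text = html.escape(text, quote=False)  # escape user text first
        for pattern, tag in compiled:
            # Bind tag as a default argument so each rule uses its own tag.
            text = pattern.sub(lambda m, tag=tag: f"<{tag}>{m.group(1)}</{tag}>", text)
        return text

    return render


full = make_inline_renderer([("**", "strong"), ("//", "em")])
bold_only = make_inline_renderer([("**", "strong")])
text = "This text is **bold** but not //italic//."
print(full(text))       # This text is <strong>bold</strong> but not <em>italic</em>.
print(bold_only(text))  # This text is <strong>bold</strong> but not //italic//.
```

Keeping the rules as data means a new dialect is just a different list, with no changes to the renderer itself.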

Adding Custom Macros

Macros embed dynamic content and are available in the creole11_base dialect. Let's define a macro <<today>> that inserts the current date.

from datetime import date
from creoleparser import Parser, create_dialect, creole11_base

# creoleparser calls macro_func for every <<macro>> it finds. The documented
# positional arguments are: macro name, argument string, body (None for a
# macro without a body), isblock (True for block macros) and an environ
# object. Verify the signature against your installed version.
def macro_func(name, arg_string, body, isblock, environ):
    if name == 'today':
        return date.today().isoformat()
    # Returning None leaves other macros rendered as written
    return None

# Build a dialect that routes macros to our function
my_dialect = create_dialect(creole11_base, macro_func=macro_func)
parser = Parser(dialect=my_dialect, method='html')

creole_text = "The current date is <<today>>."
html_output = parser.render(creole_text)
if isinstance(html_output, bytes):  # some versions return UTF-8 bytes
    html_output = html_output.decode('utf-8')
print(html_output)

Output (will vary by day):

<p>The current date is 2025-10-27.</p>
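Under the hood, macro support comes down to finding <<name args>> markup and dispatching on the name. The sketch below does this with a regular expression and a plain dict. expand_macros and its registry are hypothetical, stdlib-only illustrations and not creoleparser's API. Unknown macros are left as written, which is a forgiving choice for user-edited pages.

```python
import re
from datetime import date
from typing import Callable, Dict, Optional

# Hypothetical sketch (not creoleparser's API): <<name optional args>>
MACRO = re.compile(r"<<\s*(\w+)\s*(.*?)>>")

MacroFunc = Callable[[str], Optional[str]]


def expand_macros(text: str, registry: Dict[str, MacroFunc]) -> str:
    def replace(match: "re.Match[str]") -> str:
        func = registry.get(match.group(1))
        if func is None:
            return match.group(0)  # unknown macro: leave the markup as written
        result = func(match.group(2).strip())
        return match.group(0) if result is None else result

    return MACRO.sub(replace, text)


# Each macro receives its raw argument string.
registry: Dict[str, MacroFunc] = {
    "today": lambda args: date.today().isoformat(),
    "upper": lambda args: args.upper(),
}

print(expand_macros("The current date is <<today>>.", registry))
print(expand_macros("<<upper hello>> and <<missing>>", registry))  # HELLO and <<missing>>
```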

Choosing the Output Method (HTML or XHTML)

creoleparser has no reStructuredText or Docutils backend. Its output is HTML-family markup rendered through Genshi, and the method argument of Parser selects the serializer, for example 'html' or 'xhtml'. If you need reStructuredText, PDF or LaTeX, convert the HTML afterwards with a separate tool such as pandoc.

from creoleparser import Parser, create_dialect, creole11_base

# method is handed to Genshi's serializer: 'html' or 'xhtml'
html_parser = Parser(dialect=create_dialect(creole11_base), method='html')
xhtml_parser = Parser(dialect=create_dialect(creole11_base), method='xhtml')

# In Creole, \\ forces a line break (raw string keeps both backslashes)
creole_text = r"Line one\\Line two"

for parser in (html_parser, xhtml_parser):
    output = parser.render(creole_text)
    if isinstance(output, bytes):  # some versions return UTF-8 bytes
        output = output.decode('utf-8')
    print(output)

Output (expected; whitespace may differ):

<p>Line one<br>Line two</p>
<p>Line one<br />Line two</p>
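The main visible difference between HTML and XHTML serialization is how empty elements such as <br> are written. Python's standard-library ElementTree shows the same distinction. It is used here only as an analogy to Genshi's 'html' and 'xhtml' methods; creoleparser itself does not use ElementTree.

```python
import xml.etree.ElementTree as ET

# Build the tree for a paragraph with a forced line break:
# <p>Line one<br/>Line two</p>
p = ET.Element("p")
p.text = "Line one"
br = ET.SubElement(p, "br")
br.tail = "Line two"

# "html" writes void elements without a closing slash; "xml" self-closes them.
html_out = ET.tostring(p, encoding="unicode", method="html")
xhtml_out = ET.tostring(p, encoding="unicode", method="xml")
print(html_out)   # <p>Line one<br>Line two</p>
print(xhtml_out)  # <p>Line one<br />Line two</p>
```

Pick XHTML when the output must be parsed as XML downstream, and HTML when it is only ever served to browsers.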

Summary of Key Classes and Functions

  • creoleparser.Parser: The main class for parsing Creole text with a given dialect. Example: parser = Parser(dialect=my_dialect, method='html')
  • creoleparser.text2html: A ready-made Parser for creole11_base with HTML output. Example: html_output = text2html(creole_text)
  • creoleparser.create_dialect: Builds a configured dialect from a base dialect. Example: my_dialect = create_dialect(creole11_base, macro_func=macro_func)
  • creoleparser.creole10_base / creole11_base: Base dialects for strict Creole 1.0, and for Creole 1.0 plus additions such as macros. Example: from creoleparser import creole11_base
  • Parser.render: Renders Creole text to the output string. Example: html_output = parser.render(creole_text)

When to Use python-creoleparser

  • Building a Wiki: It's a perfect fit for a custom wiki engine.
  • User-Generated Content: Allow users to format their posts or comments with a simple, safe markup.
  • Configuration Files: Use Creole for human-readable configuration files that need some basic structure.
  • Documentation: As a simple alternative to Markdown or reStructuredText for lightweight docs.
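For a wiki engine, the core link rule is easy to picture. [[Page|Text]] becomes an anchor whose URL is a base URL plus the page name, and all surrounding text is HTML-escaped so user input cannot inject tags. render_wiki_links is a hypothetical, stdlib-only sketch of that idea. creoleparser exposes its own options for this (for example wiki_links_base_url), and its exact URL rules may differ.

```python
import html
import re
from urllib.parse import quote

# [[Page]] or [[Page|Link text]]
WIKI_LINK = re.compile(r"\[\[([^|\]]+)(?:\|([^\]]+))?\]\]")


def render_wiki_links(text: str, base_url: str, space_char: str = "_") -> str:
    """Hypothetical sketch: turn Creole wiki links into <a> tags.

    The page name becomes base_url + page (spaces replaced, percent-encoded);
    everything outside the links is HTML-escaped.
    """
    parts = []
    pos = 0
    for match in WIKI_LINK.finditer(text):
        parts.append(html.escape(text[pos:match.start()], quote=False))
        page = match.group(1).strip()
        label = (match.group(2) or page).strip()
        href = base_url + quote(page.replace(" ", space_char))
        parts.append(f'<a href="{html.escape(href)}">{html.escape(label, quote=False)}</a>')
        pos = match.end()
    parts.append(html.escape(text[pos:], quote=False))
    return "".join(parts)


print(render_wiki_links("See [[Link to a Page|Link Text]] <now>", "/wiki/"))
# See <a href="/wiki/Link_to_a_Page">Link Text</a> &lt;now&gt;
```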

When to Look Elsewhere

  • Markdown Content: creoleparser parses Creole, not Markdown. If your content is Markdown (for example CommonMark, or extensions such as footnotes), use a dedicated Markdown parser like mistune or Python-Markdown.
  • Maximum Performance: creoleparser renders every document through a Genshi event stream, so benchmark it against your own workload before relying on it for high-volume parsing.
  • Extremely Complex Documents: For books or complex documentation with cross-references and indices, a full system like Sphinx (which uses reStructuredText) is more suitable.