How to Convert HTML to PDF in Python

Converting HTML to PDF is a common task in Python, and several libraries can do it. The right choice depends on your needs, such as how much CSS, JavaScript, and image handling your documents require.

This guide covers four widely used approaches, from a wrapper around a command-line tool to full browser automation.

Summary of Methods

Method	Best For	Pros	Cons
WeasyPrint	Static HTML/CSS documents with print stylesheets.	Strong CSS and paged-media support, no browser required.	Does not execute JavaScript; needs native libraries such as Pango.
pdfkit	Simple pages, via the wkhtmltopdf command-line tool.	Very easy to use, runs basic JavaScript.	Requires the external wkhtmltopdf binary, which uses an old WebKit engine; limited support for modern CSS and JS.
xhtml2pdf	Generating reports from HTML templates (like Django).	Integrates well with templating engines, handles basic CSS.	Limited CSS support, can be slow on large documents.
Playwright / Selenium	Dynamic websites with heavy JavaScript.	Full browser automation, renders whatever a real browser can.	More complex setup, overkill for simple static HTML.

Method 1: WeasyPrint (Recommended for Most Cases)

WeasyPrint is a visual rendering engine for HTML and CSS. It's fantastic for converting HTML documents to PDFs, especially if you're using print-specific CSS (@media print).

Why use it?

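High-fidelity: It follows the CSS rules for paged media closely, so print layouts render predictably.
No browser required: It is written in Python and does not drive a browser, although it does rely on a few native libraries (see the installation note).
Excellent CSS support: Supports a large subset of CSS 2.1 and CSS 3, including multi-column layout; recent releases also handle flexbox and grid.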

Step 1: Installation

First, you need to install WeasyPrint and its dependencies. It's best to do this in a virtual environment.

# Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
# Install WeasyPrint
pip install WeasyPrint

Note: WeasyPrint depends on native libraries. Current releases need Pango; older releases also required Cairo and GDK-PixBuf. If you encounter issues during installation, check the official installation guide for your operating system.
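If pip reports success but the import still fails, a quick check like the one below can help tell a missing Python package apart from a missing native library. This is a minimal sketch: weasyprint_status is a hypothetical helper name, and it assumes that a missing shared library surfaces as an OSError at import time.

```python
def weasyprint_status() -> str:
    """Report whether WeasyPrint can be imported in this environment."""
    try:
        import weasyprint  # imported lazily so the check itself never crashes
    except ImportError:
        return "not installed (run: pip install weasyprint)"
    except OSError as exc:
        # Assumption: a missing native library (e.g. Pango) raises OSError on import.
        return f"installed, but a native library failed to load: {exc}"
    return f"ok, version {weasyprint.__version__}"


if __name__ == "__main__":
    print(weasyprint_status())
```

If the second message appears, the fix is usually at the operating-system level rather than in pip.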

Step 2: Basic Usage

Here's a simple example. Create a Python script (e.g., convert.py):

import weasyprint
# 1. Define your HTML content as a string
html_string = """
<!DOCTYPE html>
<html>
<head>
    <style>
        body { font-family: sans-serif; }
        h1 { color: #2c3e50; }
        .highlight { background-color: #f1c40f; padding: 5px; }
        @page {
            size: A4;
            margin: 2cm;
        }
    </style>
</head>
<body>
    <h1>Hello, WeasyPrint!</h1>
    <p>This is a paragraph converted to a PDF.</p>
    <p class="highlight">This paragraph is highlighted with CSS.</p>
</body>
</html>
"""
# 2. Convert the HTML string to a PDF
# With no target given, write_pdf() returns the PDF as a bytes object
pdf = weasyprint.HTML(string=html_string).write_pdf()
# 3. Write the PDF to a file
with open("output_weasyprint.pdf", "wb") as f:
    f.write(pdf)
print("PDF generated successfully: output_weasyprint.pdf")

Step 3: Using an External HTML File

You can also point WeasyPrint to an existing .html file.

import weasyprint
# Path to your HTML file
html_file = "my_document.html"
# Path for the output PDF
pdf_file = "output_from_file.pdf"
# Convert the HTML file to PDF
weasyprint.HTML(filename=html_file).write_pdf(pdf_file)
print(f"PDF generated successfully: {pdf_file}")
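Print-specific rules do not have to live inside the HTML file. The sketch below layers an extra stylesheet on top of an existing document to set the page size and add page numbers in the bottom margin. The helper names build_print_css and html_file_to_pdf, and the output file name in the usage line, are hypothetical; the WeasyPrint import is deferred so the CSS builder can be tried on its own.

```python
def build_print_css(size: str = "A4", margin: str = "2cm") -> str:
    """Return a print stylesheet that sets the page box and numbers each page."""
    return (
        f"@page {{ size: {size}; margin: {margin};\n"
        '  @bottom-center { content: "Page " counter(page) " of " counter(pages); }\n'
        "}\n"
    )


def html_file_to_pdf(html_path: str, pdf_path: str, css_text: str) -> None:
    """Render html_path to pdf_path, adding css_text to the document's own CSS."""
    from weasyprint import CSS, HTML  # lazy import: only needed when rendering

    HTML(filename=html_path).write_pdf(pdf_path, stylesheets=[CSS(string=css_text)])


# Example usage (requires WeasyPrint and an existing my_document.html):
# html_file_to_pdf("my_document.html", "output_numbered.pdf", build_print_css())
```

Keeping the page rules in a separate stylesheet lets the same HTML be rendered at different sizes without editing the file.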

Method 2: pdfkit (Using wkhtmltopdf)

pdfkit is a Python wrapper around the command-line tool wkhtmltopdf, which uses the Qt WebKit rendering engine to convert HTML to PDF. It copes well with straightforward pages and basic JavaScript, but the engine is old and the wkhtmltopdf project is no longer actively maintained, so newer CSS and JavaScript features may not render correctly.

Why use it?

Browser-style rendering: Because it renders with WebKit, it handles CSS, basic JavaScript, and SVG that simpler converters miss.
Relatively simple setup.

Step 1: Installation

You need to install two things:

wkhtmltopdf: The underlying binary.
pdfkit: The Python wrapper.

A. Install wkhtmltopdf

Windows: Download the installer from the official site. Important: Note the installation path (e.g., C:\Program Files\wkhtmltopdf\bin).
macOS: brew install wkhtmltopdf
Linux (Debian/Ubuntu): sudo apt-get install wkhtmltopdf

B. Install pdfkit

pip install pdfkit

Step 2: Basic Usage

If wkhtmltopdf is in your system's PATH, the basic usage is very simple.

import pdfkit
# Path to your HTML file
html_file = "my_document.html"
# Convert HTML file to PDF
pdfkit.from_file(html_file, "output_pdfkit.pdf")
print("PDF generated successfully: output_pdfkit.pdf")

Step 3: Handling Custom Paths

If wkhtmltopdf is not in your PATH, you must tell pdfkit where to find it. This is common on Windows.

import pdfkit
# Path to the wkhtmltopdf executable
path_to_wkhtmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe' # Use 'r' for raw string
# Configuration for pdfkit
config = pdfkit.configuration(wkhtmltopdf=path_to_wkhtmltopdf)
# Path to your HTML file
html_file = "my_document.html"
# Convert using the custom configuration
pdfkit.from_file(html_file, "output_pdfkit_custom_path.pdf", configuration=config)
print("PDF generated successfully: output_pdfkit_custom_path.pdf")
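To avoid hard-coding one path per machine, the lookup can be automated: check PATH first, then fall back to a list of likely locations. This is a sketch under assumptions: find_wkhtmltopdf and convert_file are hypothetical helpers, and the fallback paths are examples that you would adjust to your own installation.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional

# Hypothetical fallback locations; adjust them to match your installation.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe",
    "/usr/local/bin/wkhtmltopdf",
)


def find_wkhtmltopdf(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the path to wkhtmltopdf, preferring PATH, or None if it is not found."""
    on_path = shutil.which("wkhtmltopdf")
    if on_path:
        return on_path
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


def convert_file(html_file: str, pdf_file: str) -> None:
    """Convert html_file to pdf_file, passing an explicit binary path to pdfkit."""
    import pdfkit  # lazy import so find_wkhtmltopdf works without pdfkit installed

    binary = find_wkhtmltopdf()
    if binary is None:
        raise FileNotFoundError("wkhtmltopdf was not found on PATH or in the fallback locations")
    config = pdfkit.configuration(wkhtmltopdf=binary)
    pdfkit.from_file(html_file, pdf_file, configuration=config)
```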

You can also convert from a string:

import pdfkit
html_string = "<h1>Hello from pdfkit!</h1><p>This came from a string.</p>"
pdfkit.from_string(html_string, "output_from_string.pdf")
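Page size, margins, and encoding are controlled through an options dictionary whose keys are wkhtmltopdf flags without the leading dashes. The sketch below shows one way to build it; build_options and string_to_pdf are hypothetical helper names, and the values are illustrative defaults rather than recommendations.

```python
def build_options(page_size: str = "A4", margin: str = "20mm") -> dict:
    """Build a pdfkit options dict; keys are wkhtmltopdf flags without '--'."""
    return {
        "page-size": page_size,
        "margin-top": margin,
        "margin-right": margin,
        "margin-bottom": margin,
        "margin-left": margin,
        "encoding": "UTF-8",
        # Flags that take no value are passed as None.
        "no-outline": None,
    }


def string_to_pdf(html: str, pdf_file: str) -> None:
    """Convert an HTML string to pdf_file using the options above."""
    import pdfkit  # lazy import: only needed for the conversion itself

    pdfkit.from_string(html, pdf_file, options=build_options())
```

Setting the encoding explicitly is worth the extra line when the HTML contains non-ASCII text.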

Method 3: xhtml2pdf (Good for Templating)

xhtml2pdf (formerly known as pisa) is another popular library that can be useful, especially if you're generating reports from templates within a framework like Django. It builds on the ReportLab toolkit to produce the PDF.

Why use it?

Good for templating scenarios.
Can be simpler for basic report generation.

Step 1: Installation

pip install xhtml2pdf

Step 2: Basic Usage

from xhtml2pdf import pisa
# Define your HTML content
html_string = """
<html>
<head>
    <style>
        body { font-family: 'Helvetica Neue', Arial, sans-serif; }
        table { border-collapse: collapse; width: 100%; }
        th, td { border: 1px solid #dddddd; text-align: left; padding: 8px; }
        th { background-color: #f2f2f2; }
    </style>
</head>
<body>
    <h1>Report</h1>
    <table>
        <tr><th>Product</th><th>Price</th></tr>
        <tr><td>Apple</td><td>$1.00</td></tr>
        <tr><td>Banana</td><td>$0.50</td></tr>
    </table>
</body>
</html>
"""
# Output file name
output_filename = "output_xhtml2pdf.pdf"
# Open output file for writing (binary mode)
with open(output_filename, "w+b") as result_file:
    # Convert HTML to PDF
    pisa_status = pisa.CreatePDF(
        html_string,                # the HTML to convert
        dest=result_file)           # file handle to receive result
# Check if conversion was successful
if pisa_status.err:
    print("Error converting HTML to PDF!")
else:
    print(f"PDF generated successfully: {output_filename}")
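In a templating scenario the HTML is usually built from data and the PDF is returned in memory (for example as an HTTP response body) rather than written to disk. The sketch below illustrates that flow with plain string formatting standing in for a real template engine; build_report_html and report_to_pdf_bytes are hypothetical helper names.

```python
import io
from html import escape
from typing import Iterable, Tuple


def build_report_html(title: str, rows: Iterable[Tuple[str, str]]) -> str:
    """Build a small HTML report; every value is escaped before insertion."""
    body_rows = "".join(
        f"<tr><td>{escape(name)}</td><td>{escape(price)}</td></tr>"
        for name, price in rows
    )
    return (
        "<html><body>"
        f"<h1>{escape(title)}</h1>"
        "<table><tr><th>Product</th><th>Price</th></tr>"
        f"{body_rows}</table>"
        "</body></html>"
    )


def report_to_pdf_bytes(html: str) -> bytes:
    """Convert html to PDF in memory and return the raw bytes."""
    from xhtml2pdf import pisa  # lazy import: only needed for the conversion step

    buffer = io.BytesIO()
    status = pisa.CreatePDF(html, dest=buffer)
    if status.err:
        raise RuntimeError("xhtml2pdf reported errors during conversion")
    return buffer.getvalue()


# Example usage (requires xhtml2pdf):
# pdf_bytes = report_to_pdf_bytes(build_report_html("Report", [("Apple", "$1.00")]))
```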

Method 4: Playwright / Selenium (For Dynamic Content)

If your HTML requires JavaScript to run before it's ready to be converted (e.g., a single-page application or a page with dynamic charts), you need a tool that can control a real browser.

Why use it?

Handles JavaScript: Can wait for elements to load, click buttons, and fill forms before printing.
Maximum compatibility: Can render anything a modern browser can.

This is a more advanced approach. Here’s an example using Playwright, which is generally faster and more modern than Selenium. Note that Playwright's PDF export is only supported in headless Chromium.

Step 1: Installation

pip install playwright
playwright install  # This installs the browser binaries (Chromium, Firefox, WebKit)

Step 2: Basic Usage

This script writes the HTML to a temporary file, opens it in a headless browser, waits for the JavaScript to update a specific element, and then prints the page to a PDF.

from pathlib import Path
from playwright.sync_api import sync_playwright
# Path for the output PDF
pdf_file = "output_playwright.pdf"
# Your HTML content (for this example, we'll save it to a temp file)
# In a real scenario, you might navigate to a live URL.
html_content = """
<!DOCTYPE html>
<html>
<head>
    <title>Dynamic Page</title>
    <script>
        setTimeout(() => {
            document.getElementById('content').innerHTML = 'Hello, loaded after 2 seconds!';
        }, 2000);
    </script>
</head>
<body>
    <h1>Waiting for JS...</h1>
    <p id="content">Content will appear here.</p>
</body>
</html>
"""
temp_html = Path("temp.html")
temp_html.write_text(html_content, encoding="utf-8")
with sync_playwright() as p:
    # Launch headless Chromium (page.pdf() is only supported there)
    browser = p.chromium.launch()
    # Open a new page
    page = browser.new_page()
    # Go to the local HTML file; a file:// URL needs an absolute path
    page.goto(temp_html.resolve().as_uri())
    # Wait until the element contains the text we expect
    # This ensures the JS has run and the content is ready
    page.locator("#content", has_text="Hello, loaded after 2 seconds").wait_for()
    # Print the page to a PDF
    page.pdf(path=pdf_file)
    # Close the browser
    browser.close()
    print(f"PDF generated successfully: {pdf_file}")
# Clean up the temp file
temp_html.unlink()
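When the HTML is already available as a string, the temporary file can be skipped by loading the markup straight into the page, and the PDF call can be given explicit layout options. The following is a sketch, not a definitive implementation: html_string_to_pdf and PDF_OPTIONS are hypothetical names, and ready_selector is assumed to match an element that your own script adds once rendering has finished.

```python
# Assumed settings for illustration; page.pdf() accepts these keyword arguments.
PDF_OPTIONS = {
    "format": "A4",
    "print_background": True,
    "margin": {"top": "2cm", "right": "2cm", "bottom": "2cm", "left": "2cm"},
}


def html_string_to_pdf(html: str, pdf_file: str, ready_selector: str) -> None:
    """Load html into a page, wait for ready_selector, then print to PDF."""
    from playwright.sync_api import sync_playwright  # lazy import

    with sync_playwright() as p:
        browser = p.chromium.launch()  # PDF export requires headless Chromium
        try:
            page = browser.new_page()
            page.set_content(html)
            # Blocks until an element matching ready_selector is visible.
            page.wait_for_selector(ready_selector)
            page.pdf(path=pdf_file, **PDF_OPTIONS)
        finally:
            browser.close()  # always release the browser, even on errors
```

The print_background setting matters for pages that rely on background colors or images, which are otherwise left out of printed output.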

Which One Should I Choose?

For most static HTML/CSS documents, especially for reports or invoices: Use WeasyPrint. It's powerful, needs no browser, and gives you fine control over page layout.
For simple pages in an environment where wkhtmltopdf is already installed: Use pdfkit. It's quick to set up, but expect gaps with modern CSS and JavaScript.
If you're already in a Django project and need basic PDF generation from templates: xhtml2pdf is a viable, simple option.
If your HTML is not ready until JavaScript has run and user interaction might be needed: Use Playwright or Selenium. This is the most flexible but also the most complex approach.
