杰瑞科技汇

How Do You Convert HTML to PDF in Java?

Converting HTML to PDF in Java is a common requirement for generating reports, invoices, and other documents. Several libraries can do this, each with its own strengths and licensing terms.

Here’s a comprehensive guide covering the most popular and effective methods.

Summary of Popular Libraries

Flying Saucer (xhtmlrenderer)
  Pros:
    - Free & Open Source
    - Solid CSS 2.1 support, plus some CSS3 paged-media features
    - Great for print-style web content
  Cons:
    - Requires well-formed XHTML; no JavaScript, Flexbox, or Grid
    - Can be slow for very large documents
    - API can be a bit verbose
  Best for: High-fidelity conversion of XHTML/CSS pages and complex print layouts.

iText
  Pros:
    - Powerful and mature
    - Great for manipulating existing PDFs
    - Commercial support available
  Cons:
    - The free version (AGPL) has strong "copyleft" restrictions
    - Can be complex for simple HTML-to-PDF tasks
  Best for: Generating PDFs from scratch, merging/splitting PDFs, or when commercial support is needed.

OpenPDF
  Pros:
    - Free & Open Source (LGPL/MPL)
    - Fork of the last LGPL/MPL-licensed iText code line
    - Simple for basic tasks
  Cons:
    - Lacks many features of iText 5 and later
    - HTML/CSS support is basic
  Best for: Simple, straightforward PDF generation where you don't need advanced features.

Apache PDFBox
  Pros:
    - Free & Open Source
    - Pure Java
    - Great for text extraction/manipulation
  Cons:
    - Does not support HTML conversion natively (requires a separate renderer such as Flying Saucer)
  Best for: Projects already using PDFBox for PDF manipulation that need basic HTML rendering.

Method 1: Flying Saucer (Recommended for High-Fidelity Conversion)

Flying Saucer is the go-to library for converting XHTML to PDF. It is a pure-Java renderer that lays out well-formed XHTML with CSS 2.1, plus some CSS3 paged-media features. It is not a browser engine: JavaScript is not executed, and newer layout modules such as Flexbox and Grid are not supported.
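Because the input has to be well-formed XHTML, it can help to check markup before handing it to the renderer. The following is a minimal sketch that uses only the JDK's XML parser; `XhtmlCheck` and `isWellFormed` are hypothetical names for illustration, not part of Flying Saucer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (an assumption for illustration, not a Flying Saucer API).
final class XhtmlCheck {

    /** Returns true if the markup parses as well-formed XML. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve any external DTD to an empty stream so the check never touches the network.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            // Replace the default error handler, which prints parse errors to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the markup
        } catch (Exception e) {
            throw new IllegalStateException("XML parser could not be used", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<html><body><br/></body></html>"));
        System.out.println(isWellFormed("<html><body><br></body></html>"));
    }
}
```

Markup that fails this check (for example an unclosed `<br>`) is a candidate for cleaning with jsoup before rendering.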

Step 1: Add the Dependency

Add Flying Saucer and, optionally, jsoup (for cleaning up HTML that is not well-formed) to your pom.xml.

<dependencies>
    <!-- Flying Saucer: The core library for rendering -->
    <dependency>
        <groupId>org.xhtmlrenderer</groupId>
        <artifactId>flying-saucer-pdf</artifactId>
        <version>9.3.1</version> <!-- Check for the latest version -->
    </dependency>
    <!-- Jsoup: For parsing and cleaning HTML if needed -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.17.2</version> <!-- Check for the latest version -->
    </dependency>
</dependencies>

Step 2: Create a Java Class

Here is a complete, runnable example. It uses a text block for the HTML, so it needs Java 15 or later.

import org.jsoup.Jsoup;
import org.xhtmlrenderer.pdf.ITextRenderer;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
public class HtmlToPdfConverter {
    public static void main(String[] args) {
        // 1. Define your HTML content
        // You can load this from a file, a URL, or a string.
        String html = """
            <!DOCTYPE html>
            <html>
            <head>
                <style>
                    body { font-family: sans-serif; }
                    h1 { color: #2c3e50; }
                    .highlight { background-color: #f1c40f; padding: 5px; }
                    table { width: 100%; border-collapse: collapse; }
                    th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
                    th { background-color: #3498db; color: white; }
                </style>
            </head>
            <body>
                <h1>My First PDF Report</h1>
                <p>This is a paragraph generated from HTML using <span class="highlight">Flying Saucer</span>.</p>
                <h2>Table Example</h2>
                <table>
                    <thead>
                        <tr>
                            <th>Product</th>
                            <th>Quantity</th>
                            <th>Price</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td>Laptop</td>
                            <td>1</td>
                            <td>$1200</td>
                        </tr>
                        <tr>
                            <td>Mouse</td>
                            <td>2</td>
                            <td>$25</td>
                        </tr>
                    </tbody>
                </table>
            </body>
            </html>
            """;
        // 2. Create the output file
        File output = new File("report.pdf");
        // 3. Use Flying Saucer to convert
        try (OutputStream os = new FileOutputStream(output)) {
            ITextRenderer renderer = new ITextRenderer();
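            // Optional: If your HTML is not well-formed XHTML, clean it with Jsoup first.
            // Jsoup must be switched to XML syntax, otherwise it emits HTML (e.g. an unclosed <br>).
            // org.jsoup.nodes.Document doc = Jsoup.parse(html);
            // doc.outputSettings().syntax(org.jsoup.nodes.Document.OutputSettings.Syntax.xml);
            // String cleanXhtml = doc.html(); // pass this to setDocumentFromString instead of html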
            // Set the HTML content to be rendered
            renderer.setDocumentFromString(html);
            // Layout and write the PDF to the output stream
            renderer.layout();
            renderer.createPDF(os);
            System.out.println("PDF created successfully: " + output.getAbsolutePath());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
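The report above hard-codes its table rows. The following is a minimal sketch, assuming the data arrives as a list of items, of how the rows might be generated while keeping the markup well-formed. `ReportRows`, `Item`, `escape`, and `toRows` are hypothetical names for illustration; only the JDK is used.

```java
import java.util.List;

// Hypothetical helper (an assumption for illustration): builds <tr> rows for the report table.
final class ReportRows {

    /** One line of the report; the fields mirror the table columns. */
    record Item(String product, int quantity, String price) {}

    /** Escapes the characters that would otherwise break well-formed XHTML. */
    static String escape(String text) {
        return text.replace("&", "&amp;")   // must run first so later entities stay intact
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    /** Renders each item as one table row, one row per line. */
    static String toRows(List<Item> items) {
        StringBuilder rows = new StringBuilder();
        for (Item item : items) {
            rows.append("<tr><td>").append(escape(item.product()))
                .append("</td><td>").append(item.quantity())
                .append("</td><td>").append(escape(item.price()))
                .append("</td></tr>\n");
        }
        return rows.toString();
    }

    public static void main(String[] args) {
        List<Item> items = List.of(new Item("Laptop", 1, "$1200"), new Item("Pens & Paper", 2, "$25"));
        System.out.print(toRows(items));
    }
}
```

The returned string can be spliced into the `<tbody>` of the template before it is passed to `setDocumentFromString`. Escaping matters here because a single raw `&` in the data would make the document fail XML parsing.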

Key Concepts & Tips

  • CSS Support: Flying Saucer implements CSS 2.1 plus some CSS3 features, mainly for paged media. Flexbox, Grid, and JavaScript are not supported, so build layouts with tables, floats, or inline-block.
  • External Resources (Images, CSS files): Relative links are resolved against a base URL, so pass one together with the document.
    // If your HTML links to a CSS file or images, provide a base URL
    renderer.setDocumentFromString(html, "file:///path/to/your/files/");
  • Page Size & Orientation: Set these in the stylesheet with a CSS @page rule rather than through the Java API.
    @page { size: A4 landscape; margin: 2cm; }
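Hand-written `file:///` base URLs are easy to get wrong (missing trailing slash, unescaped spaces). A minimal sketch, using only the JDK, of deriving the base URL from a directory path; `BaseUrls` and `forDirectory` are hypothetical names for illustration.

```java
import java.nio.file.Path;

// Hypothetical helper (an assumption for illustration): turns a directory into a base URL string.
final class BaseUrls {

    /** Returns a file: URL for the directory that always ends with a slash. */
    static String forDirectory(Path dir) {
        // toUri() percent-encodes characters such as spaces.
        String url = dir.toAbsolutePath().normalize().toUri().toString();
        // toUri() only appends the slash when the directory exists, so add it if missing.
        return url.endsWith("/") ? url : url + "/";
    }

    public static void main(String[] args) {
        System.out.println(forDirectory(Path.of("templates", "invoice assets")));
    }
}
```

The result can then be passed as the second argument, for example `renderer.setDocumentFromString(html, BaseUrls.forDirectory(Path.of("templates")))`. The trailing slash matters because relative links are resolved against the last path segment.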

Method 2: iText (Commercial & Powerful)

iText is a very mature and powerful library. It's important to note the licensing:

  • AGPLv3: The free version. If you distribute your application, or let users interact with it over a network (as with a web server), the AGPL requires you to make the application's complete source code available under the same license. This is often not acceptable for commercial products.
  • Commercial license: A paid license that removes the AGPL obligations and includes vendor support.

Step 1: Add the Dependency

<!-- iText html2pdf (pdfHTML) add-on; the 5.x line pairs with iText Core 8, the 4.x line with iText 7.2 -->
<dependencies>
    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>html2pdf</artifactId>
        <version>5.0.5</version> <!-- Check for the latest version -->
    </dependency>
</dependencies>

Step 2: Create a Java Class

iText provides an HtmlConverter class that makes this very simple.

import com.itextpdf.html2pdf.HtmlConverter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class ItextHtmlToPdfConverter {
    public static void main(String[] args) {
        // 1. Define your HTML content (or load from a file)
        String html = "<h1>Hello iText!</h1><p>This PDF was created with the iText 7 HTML to PDF add-on.</p>";
        // 2. Create the output file
        File output = new File("itext-report.pdf");
        // 3. Convert HTML to PDF
        try (FileOutputStream fos = new FileOutputStream(output)) {
            HtmlConverter.convertToPdf(html, fos);
            System.out.println("PDF created successfully: " + output.getAbsolutePath());
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Example from a file (try-with-resources closes both streams):
        // try (FileInputStream fis = new FileInputStream("source.html");
        //      FileOutputStream out = new FileOutputStream("from-file.pdf")) {
        //     HtmlConverter.convertToPdf(fis, out);
        // } catch (IOException e) {
        //     e.printStackTrace();
        // }
    }
}

Key Concepts & Tips

  • Simplicity: The HtmlConverter is incredibly easy to use for basic conversions.
  • Advanced Features: iText's real power shines when you combine HTML generation with PDF manipulation (e.g., adding page numbers, watermarks, or digital signatures).
  • Licensing: Be very mindful of the AGPL license if you are using the free version in a commercial product.
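The examples above print a success message as soon as the converter returns. A minimal sketch, using only the JDK, of a cheap sanity check that the output file actually starts with the `%PDF-` header that every PDF file begins with; `PdfSanityCheck` and its methods are hypothetical names for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (an assumption for illustration): checks for the PDF file header.
final class PdfSanityCheck {

    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given bytes begin with the PDF header. */
    static boolean hasPdfHeader(byte[] firstBytes) {
        return firstBytes.length >= MAGIC.length
                && Arrays.equals(firstBytes, 0, MAGIC.length, MAGIC, 0, MAGIC.length);
    }

    /** Reads only the first few bytes of the file and checks them. */
    static boolean looksLikePdf(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return hasPdfHeader(in.readNBytes(MAGIC.length));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path output = Path.of(args.length > 0 ? args[0] : "report.pdf");
        System.out.println(Files.exists(output) && looksLikePdf(output));
    }
}
```

This only confirms that the file begins like a PDF; it does not validate the rest of the document.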

Method 3: OpenPDF (Simple & Open Source)

OpenPDF is a fork of the last iText code line that was released under the LGPL and MPL, before iText moved to the AGPL. It's less powerful than iText 5 and later, but it is well suited to simple tasks and keeps the LGPL/MPL dual license, which is far less restrictive than the AGPL.

Step 1: Add the Dependency

<dependencies>
    <dependency>
        <groupId>com.github.librepdf</groupId>
        <artifactId>openpdf</artifactId>
        <version>1.3.30</version> <!-- Check for the latest version -->
    </dependency>
</dependencies>

Step 2: Create a Java Class

OpenPDF does not have a direct HTML-to-PDF converter comparable to iText's HtmlConverter. You typically pair it with a renderer such as Flying Saucer, with OpenPDF acting as the PDF-writing stage of the pipeline.

The alternative is to parse the HTML yourself and build the PDF with OpenPDF's text and layout API, which is considerably more work.

Recommendation: For HTML-to-PDF, prefer Flying Saucer over OpenPDF unless you have a specific reason to stick with the older iText 2.x API.


Which One Should You Choose?

  • For high-fidelity rendering of XHTML with CSS 2.1-style layouts: Flying Saucer is your best bet. It's free, open-source, and designed specifically for this purpose.
  • For a simple, quick conversion and you're okay with the AGPL license: iText is very easy to use.
  • For a commercial project where you need a powerful library and can afford a license: iText Commercial is the industry standard.
  • For a simple, open-source project with basic needs: OpenPDF is a viable, simple option, but Flying Saucer is often better for HTML.
  • If you are already using Apache PDFBox for other PDF tasks: PDFBox cannot render HTML itself, so generate the PDF with a renderer such as Flying Saucer and then post-process the result with PDFBox.