
How Can iText Efficiently Handle Word Documents in Java?

This article takes a detailed look at using iText with Word documents in Java.


First, a crucial point of clarification: iText is a library for creating and manipulating PDFs, not for directly reading or writing .docx (Word) files.

However, there are two primary ways to work with Word documents using iText and related Java libraries:

  1. The Indirect Method (Most Common): Convert the Word document (.docx) to a PDF, then use iText to manipulate the PDF. This is the most powerful and flexible approach.
  2. The Direct Method (Using Apache POI): Use a library like Apache POI to directly read and write the .docx file's XML structure. You can use iText for PDF generation and Apache POI for Word manipulation in the same project.

Let's break down both methods in detail.


Method 1: The Indirect Method (iText + Word-to-PDF Conversion)

This is the recommended approach if your goal is to add advanced features (like digital signatures, forms, complex watermarks, or data extraction) to a document that originated as a Word file.


The Workflow:

  1. Start with a .docx file.
  2. Convert .docx to .pdf. You need a separate tool or library for this conversion. Microsoft Office, LibreOffice, or commercial libraries like Aspose.Words can do this.
  3. Load the generated .pdf into iText.
  4. Use iText to manipulate the PDF (e.g., add a watermark, fill a form, extract text).
  5. Save the final .pdf file.
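
Step 2 of the workflow can be scripted from Java by calling LibreOffice in headless mode. The following is a minimal sketch, not a definitive implementation: it assumes LibreOffice is installed and that its `soffice` executable is on the PATH, and the class and method names (`DocxToPdfConverter`, `buildCommand`, `expectedOutput`, `convert`) are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class DocxToPdfConverter {

    // Builds the LibreOffice command line.
    // Assumption: the "soffice" executable is on the PATH.
    public static List<String> buildCommand(Path docx, Path outDir) {
        return Arrays.asList(
                "soffice", "--headless",
                "--convert-to", "pdf",
                "--outdir", outDir.toString(),
                docx.toString());
    }

    // LibreOffice keeps the base file name and swaps the extension for .pdf.
    public static Path expectedOutput(Path docx, Path outDir) {
        String name = docx.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + ".pdf");
    }

    // Runs the conversion and returns the path where the PDF should appear.
    public static Path convert(Path docx, Path outDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(docx, outDir))
                .inheritIO()
                .start();
        // The two-minute limit is an arbitrary choice for this sketch.
        if (!process.waitFor(2, TimeUnit.MINUTES)) {
            process.destroyForcibly();
            throw new IOException("Conversion timed out");
        }
        if (process.exitValue() != 0) {
            throw new IOException("soffice exited with code " + process.exitValue());
        }
        return expectedOutput(docx, outDir);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // Dry run: only show the command that would be executed.
            System.out.println(String.join(" ",
                    buildCommand(Paths.get("MyDocument.docx"), Paths.get("."))));
            return;
        }
        Path pdf = convert(Paths.get(args[0]), Paths.get("."));
        System.out.println("Converted to " + pdf);
    }
}
```

Run without arguments, the program only prints the command. Run with the path to a .docx file, it performs the conversion and reports where the PDF should have been written. The resulting file can then be passed to iText in step 3.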

Example: Adding a Watermark to a Word-Converted PDF

Let's assume you have a file MyDocument.docx. You've already converted it to MyDocument.pdf using an external tool. Now, you want to add a "CONFIDENTIAL" watermark to it using iText.

Add iText 7 Dependency to your pom.xml

<!-- itext7-core is an aggregate POM that already pulls in the kernel, layout,
     and forms modules, so no further iText dependencies are needed here. -->
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext7-core</artifactId>
    <version>7.2.5</version> <!-- Use the latest version -->
    <type>pom</type>
</dependency>

Java Code to Add a Watermark

This code will open MyDocument.pdf, add a "CONFIDENTIAL" text watermark to every page, and save it as WatermarkedDocument.pdf.

import com.itextpdf.io.font.constants.StandardFonts;
import com.itextpdf.kernel.colors.Color;
import com.itextpdf.kernel.colors.DeviceGray;
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Canvas;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.properties.TextAlignment;
import com.itextpdf.layout.properties.VerticalAlignment;
import java.io.IOException;

public class WordToPdfWatermarker {
    public static void main(String[] args) {
        String srcPath = "MyDocument.pdf"; // The PDF converted from Word
        String destPath = "WatermarkedDocument.pdf";
        // 1. Open the existing PDF for reading and the destination for writing.
        //    try-with-resources closes the document (and saves the changes) even on failure.
        try (PdfDocument pdf = new PdfDocument(new PdfReader(srcPath), new PdfWriter(destPath))) {
            // 2. Set up the watermark properties
            PdfFont font = PdfFontFactory.createFont(StandardFonts.HELVETICA_BOLD);
            Color color = new DeviceGray(0.7f); // Light gray (0 = black, 1 = white)
            float fontSize = 60;
            // iText expects the rotation in radians, not degrees
            float angle = (float) Math.toRadians(45); // Diagonal watermark
            // 3. Loop through all pages of the document
            for (int i = 1; i <= pdf.getNumberOfPages(); i++) {
                PdfPage page = pdf.getPage(i);
                // Get the page dimensions
                float pageWidth = page.getPageSize().getWidth();
                float pageHeight = page.getPageSize().getHeight();
                // Create a Paragraph for the watermark text
                Paragraph watermark = new Paragraph("CONFIDENTIAL")
                        .setFont(font)
                        .setFontSize(fontSize)
                        .setFontColor(color);
                // Create a Canvas and draw the watermark centered on the page.
                // The content is added on top of the existing page content.
                Canvas canvas = new Canvas(page, page.getPageSize());
                canvas.showTextAligned(watermark, pageWidth / 2, pageHeight / 2, i,
                        TextAlignment.CENTER, VerticalAlignment.MIDDLE, angle);
                canvas.close();
            }
            System.out.println("Watermark added successfully!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
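
One detail worth checking in any watermark code: iText 7's rotation arguments, such as the angle parameter of showTextAligned, are expressed in radians, so a 45-degree diagonal has to be converted first. The sketch below uses only the JDK to show the conversion and, as an alternative, how to derive the angle of the page's own diagonal. The class name `WatermarkGeometry`, its helper methods, and the A4 page size are assumptions made for illustration.

```java
public class WatermarkGeometry {

    // Angle (in radians) of the page diagonal running from bottom-left to top-right.
    public static double diagonalAngle(float pageWidth, float pageHeight) {
        return Math.atan2(pageHeight, pageWidth);
    }

    // Center of the page, where the watermark is anchored.
    public static float[] center(float pageWidth, float pageHeight) {
        return new float[] { pageWidth / 2, pageHeight / 2 };
    }

    public static void main(String[] args) {
        // A4 in PDF user units (points): 595 x 842. Assumed here for illustration.
        float width = 595f;
        float height = 842f;

        double fixed = Math.toRadians(45);               // fixed 45-degree tilt
        double diagonal = diagonalAngle(width, height);  // follows the page shape
        float[] c = center(width, height);

        System.out.printf("center = (%.1f, %.1f)%n", c[0], c[1]);
        System.out.printf("45 degrees = %.4f rad%n", fixed);
        System.out.printf("page diagonal = %.4f rad (%.1f degrees)%n",
                diagonal, Math.toDegrees(diagonal));
    }
}
```

On portrait pages the diagonal is steeper than 45 degrees, so a watermark that should run corner to corner may look better with the computed angle than with a fixed one.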

Method 2: The Direct Method (Using Apache POI)

If you need to programmatically create or modify the content of a Word document (.docx), you should use Apache POI. It's the standard Java library for interacting with Microsoft Office formats.

The Workflow:

  1. Use Apache POI to create a new .docx file or read an existing one.
  2. Manipulate the document's structure: Add paragraphs, tables, images, styles, etc.
  3. Save the modified .docx file.
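
To see what Apache POI is working with in step 1, it helps to know that a .docx file is a ZIP package of XML parts, with the main body conventionally stored at word/document.xml. The following JDK-only sketch lists the parts and reads that XML. It is an illustration of the file format rather than a replacement for POI, and the class and method names (`DocxInspector`, `listParts`, `readMainDocumentXml`) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DocxInspector {

    // Lists the part names inside a .docx package (a ZIP archive), sorted by name.
    public static List<String> listParts(String docxPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docxPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Returns the raw XML of the main document part, or null if it is absent.
    // Assumption: the main part is stored at the conventional word/document.xml path.
    public static String readMainDocumentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int read;
                while ((read = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, read);
                }
                return buffer.toString("UTF-8");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java DocxInspector <file.docx>");
            return;
        }
        for (String part : listParts(args[0])) {
            System.out.println(part);
        }
    }
}
```

Editing that XML by hand is error-prone, which is why a library such as Apache POI, with its paragraph, run, and table model, is the practical choice for real work.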

You can use iText and Apache POI together in the same project for different purposes. For example:

  • Use Apache POI to generate a monthly report in .docx format.
  • Use iText to generate a contract in .pdf format and add a digital signature.

Example: Creating a Simple Word Document with Apache POI

Add Apache POI Dependency to your pom.xml

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>5.2.3</version> <!-- Use the latest version -->
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.3</version>
</dependency>

Java Code to Create a .docx File

import org.apache.poi.xwpf.usermodel.*;
import java.io.FileOutputStream;
import java.io.IOException;
public class WordDocCreator {
    public static void main(String[] args) {
        String outputPath = "MyNewDocument.docx";
        try (XWPFDocument document = new XWPFDocument()) {
            // 1. Create a new Paragraph
            XWPFParagraph title = document.createParagraph();
            title.setAlignment(ParagraphAlignment.CENTER);
            // 2. Create a Run (a styled text segment) for the title
            XWPFRun titleRun = title.createRun();
            titleRun.setText("Java iText and Apache POI Integration");
            titleRun.setBold(true);
            titleRun.setFontFamily("Calibri");
            titleRun.setFontSize(16);
            titleRun.addBreak(); // Add a line break
            // 3. Create a body paragraph
            XWPFParagraph body = document.createParagraph();
            body.setAlignment(ParagraphAlignment.LEFT);
            XWPFRun bodyRun = body.createRun();
            bodyRun.setText("This document was created using Apache POI. ");
            bodyRun.setText("Apache POI is a powerful library for manipulating Office files. ");
            bodyRun.addBreak();
            bodyRun.addBreak();
            // 4. Add a table
            XWPFTable table = document.createTable();
            XWPFTableRow tableRowOne = table.getRow(0);
            tableRowOne.getCell(0).setText("Feature");
            tableRowOne.addNewTableCell().setText("Library");
            XWPFTableRow tableRowTwo = table.createRow();
            tableRowTwo.getCell(0).setText("PDF Manipulation");
            tableRowTwo.getCell(1).setText("iText");
            XWPFTableRow tableRowThree = table.createRow();
            tableRowThree.getCell(0).setText("Word Manipulation");
            tableRowThree.getCell(1).setText("Apache POI");
            // 5. Write the document to a file
            try (FileOutputStream out = new FileOutputStream(outputPath)) {
                document.write(out);
            }
            System.out.println("Document created successfully at: " + outputPath);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Summary: iText vs. Apache POI for Word

Primary purpose
  • iText: PDF generation and manipulation.
  • Apache POI: Microsoft Office format manipulation (.docx, .xlsx, .pptx).

Word file support
  • iText: None. It cannot read or write .docx files directly.
  • Apache POI: Excellent. The standard library for creating and editing .docx files.

PDF file support
  • iText: Excellent. Widely used for creating, signing, and securing PDFs.
  • Apache POI: None. It does not read or write PDF files.

Typical use cases
  • iText: generating invoices, reports, and contracts as PDF; adding watermarks, stamps, and digital signatures; extracting data from PDFs; creating PDF forms.
  • Apache POI: generating Word documents from templates or data; modifying existing Word documents; automating document creation (e.g., reports, letters).

Licensing
  • iText: Dual-licensed under the AGPL and a commercial license. Projects that cannot comply with the AGPL (for example, closed-source applications) need the commercial license.
  • Apache POI: Apache 2.0. A permissive license that also allows commercial use.

Conclusion

  • If your task involves Word documents, use Apache POI.
  • If your task involves PDF documents, use iText.
  • If you need to perform advanced operations on a document that started as a Word file, the best practice is to convert the Word file to PDF first (using an external tool or library like Aspose.Words) and then use iText to manipulate the resulting PDF.