杰瑞科技汇

How to Convert Documents to PDF with Java and OpenOffice

The term "Java OpenOffice PDF" typically refers to one of two main tasks:

  1. Generating a PDF from an OpenOffice/LibreOffice document using a Java program. This is the most common use case.
  2. Using Java to extract content (such as text) from a PDF that OpenOffice/LibreOffice created.

I'll cover both, with a strong focus on the first, as it's more practical and widely used.


Summary

| Task | Recommended Approach | Why? |
|---|---|---|
| Create a PDF from ODT/ODS | UNO (Universal Network Objects) | The native, most powerful, and most reliable way to control OpenOffice/LibreOffice from an external language such as Java. |
| Extract text from a PDF | Apache PDFBox or iText | Dedicated Java PDF libraries; generally more robust and easier to use than parsing PDFs via UNO. |

Task 1: Generating a PDF from an OpenOffice Document using Java (UNO)

This is the core of "Java OpenOffice PDF". You will use Java to start a hidden instance of LibreOffice (or OpenOffice), load a document (like an .odt or .ods file), and then save it as a PDF.

Prerequisites

  1. Java Development Kit (JDK): Ensure you have a JDK installed (version 8 or newer is fine).

  2. LibreOffice (Recommended) or OpenOffice: You need a full installation of the office suite on the same machine where your Java code will run. LibreOffice is the modern, actively maintained fork of OpenOffice and is highly recommended.

  3. The UNO JAR files: You need juh.jar (Java UNO Helper) and ridl.jar (Remote Interface Definition Language) from your LibreOffice/OpenOffice installation. The code below also uses jurt.jar (the Java UNO runtime) and unoil.jar (office API types such as XStorable), which sit in the same directory.

    • Typical location on Linux: /usr/lib/libreoffice/program/classes/
    • Typical location on Windows: C:\Program Files\LibreOffice\program\classes\
    • Typical location on macOS: /Applications/LibreOffice.app/Contents/Resources/java/

Step-by-Step Implementation

Step 1: Set up your Java Project

Create a new Java project in your favorite IDE (IntelliJ, Eclipse, etc.). Add the UNO JAR files (juh.jar, jurt.jar, ridl.jar, and unoil.jar) from your installation directory to your project's classpath.

Step 2: Write the Java Code


Here is a complete, well-commented Java class that converts an .odt file to a PDF.

import com.sun.star.beans.PropertyValue;
import com.sun.star.comp.helper.Bootstrap;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.frame.XDesktop;
import com.sun.star.frame.XStorable;
import com.sun.star.lang.DisposedException;
import com.sun.star.lang.XComponent;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;
import com.sun.star.util.CloseVetoException;
import com.sun.star.util.XCloseable;
import java.io.File;

public class OpenOfficePdfConverter {

    public static void main(String[] args) {
        // Define input and output file paths
        String inputFile = "C:\\path\\to\\your\\document.odt";
        String outputFile = "C:\\path\\to\\your\\output.pdf";
        int status = 0;
        try {
            convertOdtToPdf(inputFile, outputFile);
            System.out.println("Successfully converted '" + inputFile + "' to '" + outputFile + "'");
        } catch (Exception e) { // java.lang.Exception: covers UNO and bootstrap errors
            System.err.println("Conversion failed!");
            e.printStackTrace();
            status = 1;
        }
        // The UNO bridge keeps non-daemon threads alive, so exit explicitly
        System.exit(status);
    }

    public static void convertOdtToPdf(String inputFilePath, String outputFilePath) throws Exception {
        // 1. Start an office process and get its component context.
        //    Bootstrap.bootstrap() locates soffice via the UNO JARs on the classpath.
        XComponentContext xContext = Bootstrap.bootstrap();
        // 2. Get the service manager and create the Desktop, the central office component
        XMultiComponentFactory xMCF = xContext.getServiceManager();
        Object desktop = xMCF.createInstanceWithContext("com.sun.star.frame.Desktop", xContext);
        XDesktop xDesktop = UnoRuntime.queryInterface(XDesktop.class, desktop);
        try {
            // 3. Open the input document
            XComponent document = loadDocument(desktop, inputFilePath);
            if (document == null) {
                throw new IllegalStateException("Failed to load document: " + inputFilePath);
            }
            try {
                // 4. Save the document as PDF
                saveDocumentAsPdf(document, outputFilePath);
            } finally {
                // 5. Close the document so it does not linger inside the office process
                closeDocument(document);
            }
        } finally {
            // 6. Shut down the office instance started by bootstrap()
            try {
                xDesktop.terminate();
            } catch (DisposedException e) {
                // The bridge may already be gone once the office exits; safe to ignore
            }
        }
    }

    private static XComponent loadDocument(Object desktop, String filePath) throws Exception {
        // Java UNO objects must be queried for interfaces; a plain Java cast does not work
        XComponentLoader loader = UnoRuntime.queryInterface(XComponentLoader.class, desktop);
        // Prepare the arguments for opening the file
        PropertyValue[] loadProps = new PropertyValue[1];
        loadProps[0] = new PropertyValue();
        loadProps[0].Name = "Hidden"; // Open the document without a visible window
        loadProps[0].Value = Boolean.TRUE;
        // The first argument must be a URL (file:///...), not a plain OS path
        return loader.loadComponentFromURL(toUrl(filePath), "_blank", 0, loadProps);
    }

    private static void saveDocumentAsPdf(XComponent document, String outputPath) throws Exception {
        // Query for the XStorable interface, which allows us to save the document
        XStorable xStorable = UnoRuntime.queryInterface(XStorable.class, document);
        // Prepare the arguments for saving the file
        PropertyValue[] storeProps = new PropertyValue[1];
        storeProps[0] = new PropertyValue();
        storeProps[0].Name = "FilterName";
        storeProps[0].Value = "writer_pdf_Export"; // The filter name for PDF export in Writer
        // storeToURL writes a copy through the export filter; the source file is untouched
        xStorable.storeToURL(toUrl(outputPath), storeProps);
    }

    private static void closeDocument(XComponent document) {
        // Prefer XCloseable.close(); fall back to dispose() for components without it
        XCloseable xCloseable = UnoRuntime.queryInterface(XCloseable.class, document);
        if (xCloseable != null) {
            try {
                xCloseable.close(true);
            } catch (CloseVetoException e) {
                // Another party vetoed the close and now owns the document
            }
        } else {
            document.dispose();
        }
    }

    // Converts an OS path to a file URL, e.g. C:\a\b.odt -> file:///C:/a/b.odt
    private static String toUrl(String path) {
        String rawPath = new File(path).getAbsoluteFile().toURI().getRawPath();
        return rawPath.startsWith("//") ? "file:" + rawPath : "file://" + rawPath;
    }
}

How to Run:

  1. Put the UNO JARs from your LibreOffice installation directory (not copies placed elsewhere) on the classpath; Bootstrap.bootstrap() uses their location to find the soffice executable.
  2. Make sure inputFile and outputFile paths are correct.
  3. Run the main method.

Important Notes:

  • Performance: Starting a new LibreOffice process for every conversion can be slow. For high-volume applications, you should manage a pool of long-running office instances.
  • Headless Mode: The Hidden property only hides the document window. For server environments, start LibreOffice itself in headless mode and have it listen for connections:
    • Windows: soffice.exe --headless --accept="socket,host=127.0.0.1,port=2002;urp;"
    • Linux: soffice --headless --accept="socket,host=127.0.0.1,port=2002;urp;"
    • Then, in your Java code, you would connect to this running instance instead of letting Bootstrap start a new one.
  • Filters: The FilterName is crucial. Here are some common ones:
    • writer_pdf_Export: For Writer documents (.odt -> .pdf)
    • calc_pdf_Export: For Calc documents (.ods -> .pdf)
    • impress_pdf_Export: For Impress presentations (.odp -> .pdf)
    • draw_pdf_Export: For Draw documents (.odg -> .pdf)
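To connect to an office started with the --accept string above instead of calling Bootstrap.bootstrap(), the usual route is the UnoUrlResolver service. The sketch below assumes the office listens on 127.0.0.1:2002 and that the UNO JARs are on the classpath; the class name RunningOfficeConnector and its methods are illustrative, not part of the UNO API. Keeping the returned context (and the Desktop created from it) alive across conversions is one way to avoid the per-document startup cost mentioned under Performance.

```java
import com.sun.star.bridge.XUnoUrlResolver;
import com.sun.star.comp.helper.Bootstrap;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;

// Hypothetical helper class: connects to an already running headless office.
public class RunningOfficeConnector {

    // Builds the UNO URL matching --accept="socket,host=<host>,port=<port>;urp;"
    static String unoUrl(String host, int port) {
        return "uno:socket,host=" + host + ",port=" + port + ";urp;StarOffice.ComponentContext";
    }

    public static XComponentContext connect(String host, int port) throws Exception {
        // A purely local context is enough to instantiate the URL resolver service
        XComponentContext localContext = Bootstrap.createInitialComponentContext(null);
        XMultiComponentFactory localMcf = localContext.getServiceManager();
        Object resolverObj = localMcf.createInstanceWithContext(
                "com.sun.star.bridge.UnoUrlResolver", localContext);
        XUnoUrlResolver resolver = UnoRuntime.queryInterface(XUnoUrlResolver.class, resolverObj);
        // resolve() throws NoConnectException if nothing is listening on host:port
        Object remote = resolver.resolve(unoUrl(host, port));
        return UnoRuntime.queryInterface(XComponentContext.class, remote);
    }

    public static void main(String[] args) throws Exception {
        XComponentContext ctx = connect("127.0.0.1", 2002);
        // Create the Desktop once and reuse it for every document you convert
        Object desktop = ctx.getServiceManager()
                .createInstanceWithContext("com.sun.star.frame.Desktop", ctx);
        XComponentLoader loader = UnoRuntime.queryInterface(XComponentLoader.class, desktop);
        System.out.println("Connected, component loader available: " + (loader != null));
        // The UNO bridge keeps non-daemon threads alive, so exit explicitly
        System.exit(0);
    }
}
```

Because this office was started separately, it is left running after your program exits; that is the point of reusing it.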
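Since the filter must match the kind of document being exported, a small lookup can choose it from the file extension. This is a hypothetical helper (PdfExportFilters and forFile are illustrative names) that only knows the four ODF extensions listed above. A .docx opened in Writer would still need writer_pdf_Export, so a more robust variant might inspect the loaded component (for example via XServiceInfo) rather than the file name.

```java
import java.util.Locale;

// Hypothetical helper: maps an ODF file extension to its PDF export filter name.
public final class PdfExportFilters {

    private PdfExportFilters() {
        // static helper, no instances
    }

    // Returns the filter for a file name, e.g. "Budget.ODS" -> "calc_pdf_Export".
    public static String forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("File name has no extension: " + fileName);
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "odt": return "writer_pdf_Export";
            case "ods": return "calc_pdf_Export";
            case "odp": return "impress_pdf_Export";
            case "odg": return "draw_pdf_Export";
            default:
                throw new IllegalArgumentException("No PDF export filter known for ." + ext);
        }
    }

    public static void main(String[] args) {
        System.out.println(forFile("report.odt")); // prints writer_pdf_Export
    }
}
```

The returned string would replace the hard-coded "writer_pdf_Export" value assigned to storeProps[0].Value in saveDocumentAsPdf.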

Task 2: Extracting Text from a PDF using Java

While you could use UNO to open a PDF and get its text, it's often more complex and less reliable than using a dedicated Java PDF library. If your goal is just text extraction, use a tool built for it.

Recommended Library: Apache PDFBox

Step 1: Add PDFBox Dependency

If you're using Maven, add this to your pom.xml:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.27</version> <!-- Or a newer 2.0.x release; PDFBox 3.x changed the loading API -->
</dependency>

Step 2: Write the Java Code

This example is much simpler than the UNO approach.

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
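Putting those imports to work, here is a minimal sketch of a complete extractor. It assumes the PDFBox 2.0.x API that matches the dependency above (in 3.x, PDDocument.load was replaced by Loader.loadPDF); the class name PdfTextExtractor and the file path are placeholders.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfTextExtractor {

    // Loads the PDF, extracts the text of all pages, and closes the document.
    public static String extractText(String pdfPath) throws IOException {
        // PDDocument.load(File) is the PDFBox 2.x entry point; try-with-resources closes it
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            PDFTextStripper stripper = new PDFTextStripper();
            // Order text by its position on the page rather than content-stream order
            stripper.setSortByPosition(true);
            return stripper.getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path: for example, the PDF produced in Task 1
        String text = extractText("C:\\path\\to\\your\\output.pdf");
        System.out.println(text);
    }
}
```

To restrict extraction to certain pages, PDFTextStripper also offers setStartPage and setEndPage.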