Converting HTML to a Word document (.docx) in Java is a common requirement when generating reports, invoices, or other formatted documents from web-based content.

This guide covers three Java libraries commonly used for the task, with a code example for each.
Summary of Libraries
| Library | Pros | Cons | Best For |
|---|---|---|---|
| Apache POI | Widely used, with full control over the Word document structure. | No built-in HTML importer: you parse the HTML yourself and map each element to paragraphs and runs. The API is verbose. | Highly customized Word documents where you need fine-grained control over structure and formatting. |
| docx4j | Imports XHTML directly into native Word content (through the docx4j-ImportXHTML add-on) and translates a subset of CSS; far less code than POI for this task. | Input must be well-formed XHTML; only part of CSS is carried over. | Most scenarios. The best choice if your primary goal is converting HTML to an editable Word document with good fidelity. |
| Flying Saucer (xhtmlrenderer) | Renders XHTML/CSS to an image, which you can then embed in a Word doc. Preserves the rendered layout. | Indirect method (image-based): the text is not selectable, searchable, or editable. Supports CSS 2.1 only and runs no JavaScript. | A visual snapshot of a styled page in Word when editable text is not required. |
Method 1: Using docx4j (Recommended for HTML Conversion)
docx4j provides an XHTML importer, XHTMLImporterImpl, in its docx4j-ImportXHTML add-on. It translates well-formed XHTML tags and a subset of CSS styles into Word's native format.
Add the Dependency
Add the docx4j libraries to your project. If you're using Maven, add this to your pom.xml:
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>11.4.4</version> <!-- Check for the latest version; docx4j 11.x needs exactly one docx4j-JAXB-* artifact, and this one pulls in docx4j-core -->
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-ImportXHTML-core</artifactId>
    <version>11.4.4</version> <!-- The add-on is released on its own schedule; confirm on Maven Central which version pairs with your docx4j release -->
</dependency>
Java Code Example
This code takes a simple HTML string and converts it into a .docx file.

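import java.io.File;
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart;

public class HtmlToWordDocx4j {
    public static void main(String[] args) {
        try {
            // 1. Create a new Word document package
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
            // 2. Get the main document part
            MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
            // 3. Define the content. The importer parses it as XML, so it must be
            //    well-formed XHTML (every tag closed, attribute values quoted).
            String html = "<html>"
                    + "<head><style>h1 {color: blue;}</style></head>"
                    + "<body>"
                    + " <h1>Hello from HTML!</h1>"
                    + " <p>This is a <b>paragraph</b> with some <i>italic</i> text.</p>"
                    + " <ul>"
                    + " <li>List item 1</li>"
                    + " <li>List item 2</li>"
                    + " </ul>"
                    + "</body>"
                    + "</html>";
            // 4. Add a numbering part so the <ul> can be mapped to a Word list
            //    (docx4j's own import samples do this before converting)
            NumberingDefinitionsPart numbering = new NumberingDefinitionsPart();
            documentPart.addTargetPart(numbering);
            numbering.unmarshalDefaultNumbering();
            // 5. Convert the XHTML and append the result to the document body.
            //    XHTMLImporterImpl ships in the separate docx4j-ImportXHTML artifact.
            //    The second argument is the base URL for relative links and images (null: none).
            XHTMLImporterImpl importer = new XHTMLImporterImpl(wordMLPackage);
            documentPart.getContent().addAll(importer.convert(html, null));
            // 6. Save the document to a file
            wordMLPackage.save(new File("output.docx"));
            System.out.println("Successfully created output.docx");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}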
How to Run:
- Save the code as HtmlToWordDocx4j.java.
- Compile and run it with Maven, or include the JARs in your classpath.
- An output.docx file will be created in your project's root directory.
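A quick way to sanity-check the generated file without opening Word: a .docx is a ZIP package that, by convention, contains `[Content_Types].xml` and a main part at `word/document.xml`. The sketch below uses only the JDK. The names `DocxCheck`, `entryNames`, and `hasRequiredParts` are illustrative and not part of docx4j, and the check assumes the conventional part names rather than resolving the package relationships.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Collection;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DocxCheck {

    /** Lists the names of all entries in the ZIP package at the given path. */
    public static Set<String> entryNames(Path file) throws IOException {
        try (ZipFile zip = new ZipFile(file.toFile())) {
            return zip.stream().map(ZipEntry::getName).collect(Collectors.toSet());
        }
    }

    /** True if the entries include the two parts a Word package conventionally has. */
    public static boolean hasRequiredParts(Collection<String> names) {
        return names.contains("[Content_Types].xml")
                && names.contains("word/document.xml");
    }
}
```

After running the converter, `DocxCheck.hasRequiredParts(DocxCheck.entryNames(java.nio.file.Paths.get("output.docx")))` should return true for a package that was written completely.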
Method 2: Using Apache POI (For Full Control)
Apache POI is the most powerful library for manipulating Office documents, but it's more verbose. Converting HTML with POI is a manual process where you essentially parse the HTML and build the Word document element by element.
Add the Dependency
Add the Apache POI library to your pom.xml:
<dependencies>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>5.2.3</version> <!-- Check for the latest version -->
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>5.2.3</version>
    </dependency>
    <!-- You'll need an HTML parser like Jsoup -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.15.3</version>
    </dependency>
</dependencies>
Java Code Example
This example uses Jsoup to parse the HTML and Apache POI to create the Word document.

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import java.io.FileOutputStream;

public class HtmlToWordApachePOI {
    public static void main(String[] args) {
        // 1. Define the HTML content
        String html = "<h1>Hello from POI!</h1>"
                + "<p>This is a <b>paragraph</b> with some <i>italic</i> text.</p>"
                + "<ul><li>List item 1</li><li>List item 2</li></ul>";
        // 2. Create a new Word document (closed automatically by try-with-resources)
        try (XWPFDocument document = new XWPFDocument()) {
            // 3. Parse the HTML using Jsoup
            Document jsoupDoc = Jsoup.parse(html);
            // 4. Recursively process the HTML body and add content to the Word doc.
            //    The initial paragraph receives any text that sits directly in <body>.
            processNode(document.createParagraph(), jsoupDoc.body());
            // 5. Save the document
            try (FileOutputStream out = new FileOutputStream("output_poi.docx")) {
                document.write(out);
            }
            System.out.println("Successfully created output_poi.docx");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Recursively processes a Jsoup node and adds its content to a Word paragraph.
     * This is a simplified example and would need to be expanded for full HTML/CSS support.
     */
    private static void processNode(XWPFParagraph paragraph, Node node) {
        // The document that owns the current paragraph; block-level tags add new paragraphs to it
        XWPFDocument document = paragraph.getDocument();
        for (Node child : node.childNodes()) {
            if (child instanceof Element) {
                Element element = (Element) child;
                switch (element.tagName().toLowerCase()) {
                    case "h1":
                        XWPFParagraph h1Para = document.createParagraph();
                        h1Para.getCTP().addNewPPr().addNewShd().setFill("E0E0E0"); // Light grey background
                        XWPFRun h1Run = h1Para.createRun();
                        h1Run.setBold(true);
                        h1Run.setFontSize(20);
                        h1Run.setText(element.text());
                        break;
                    case "p":
                        // Create a new paragraph for each <p> tag, then process its children
                        XWPFParagraph pPara = document.createParagraph();
                        processNode(pPara, element);
                        break;
                    case "b":
                    case "strong":
                        // element.text() flattens nested tags, so inner formatting is dropped
                        XWPFRun boldRun = paragraph.createRun();
                        boldRun.setBold(true);
                        boldRun.setText(element.text());
                        break;
                    case "i":
                    case "em":
                        XWPFRun italicRun = paragraph.createRun();
                        italicRun.setItalic(true);
                        italicRun.setText(element.text());
                        break;
                    case "ul":
                        // For simplicity, emit each item as a paragraph with a literal bullet.
                        // A full implementation would use Word numbering and indentation.
                        for (Node li : element.childNodes()) {
                            if (li instanceof Element && li.nodeName().equals("li")) {
                                XWPFRun liRun = document.createParagraph().createRun();
                                liRun.setText("• " + ((Element) li).text());
                            }
                        }
                        break;
                    case "li": // Handled by the 'ul' case for simplicity
                        break;
                    default:
                        // For other tags (div, span, ...), keep the text by processing the children
                        processNode(paragraph, element);
                }
            } else if (child instanceof TextNode) {
                // Add text content to the current paragraph
                TextNode textNode = (TextNode) child;
                if (!textNode.isBlank()) {
                    paragraph.createRun().setText(textNode.text());
                }
            }
        }
    }
}
How to Run:
- Save the code as HtmlToWordApachePOI.java.
- Compile and run it with Maven.
- An output_poi.docx file will be created.
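The POI example handles `<b>` and `<i>` by taking the element's flattened text, so nested markup such as `<b>bold <i>both</i></b>` loses its inner formatting. One way to keep it is to carry the active style flags down the recursion. The following is a minimal, library-free sketch of that idea using the JDK's DOM parser, so it assumes well-formed input; `InlineRuns`, `Run`, and `collectRuns` are hypothetical names, not POI or Jsoup API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InlineRuns {

    /** Hypothetical value type: one span of text plus the formatting in effect for it. */
    public static final class Run {
        public final String text;
        public final boolean bold;
        public final boolean italic;

        Run(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    /** Parses a well-formed fragment and returns its text as styled runs, in document order. */
    public static List<Run> collectRuns(String xhtmlFragment) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtmlFragment)))
                    .getDocumentElement();
            List<Run> runs = new ArrayList<>();
            walk(root, false, false, runs);
            return runs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }

    // Style flags are inherited: once a <b> is entered, everything below it stays bold.
    private static void walk(Node node, boolean bold, boolean italic, List<Run> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue();
            if (!text.trim().isEmpty()) {
                out.add(new Run(text, bold, italic));
            }
            return;
        }
        String tag = node.getNodeName().toLowerCase();
        boolean nowBold = bold || tag.equals("b") || tag.equals("strong");
        boolean nowItalic = italic || tag.equals("i") || tag.equals("em");
        NodeList children = node.getChildNodes();
        for (int k = 0; k < children.getLength(); k++) {
            walk(children.item(k), nowBold, nowItalic, out);
        }
    }
}
```

In the POI version, each `Run` would correspond to one `paragraph.createRun()` call followed by `setBold`, `setItalic`, and `setText`.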
Method 3: Using Flying Saucer (For Image-Based Conversion)
Flying Saucer is an XHTML/CSS renderer. You can use it to render your HTML to a BufferedImage, and then embed that image into a Word document created with Apache POI.
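Because Flying Saucer parses its input as XML, markup that browsers tolerate (an unclosed `<br>`, an unquoted attribute) fails to load. A small pre-check can surface that before rendering. This is a sketch using only the JDK's parser; `XhtmlCheck` and `isWellFormed` are illustrative names, and the check covers XML well-formedness only, not XHTML validity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlCheck {

    /** Returns true if the markup parses as well-formed XML. */
    public static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

For example, `XhtmlCheck.isWellFormed("<p>line<br/></p>")` passes, while the HTML-style `"<p>line<br></p>"` does not.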
Add Dependencies
You'll need Flying Saucer and Apache POI.
<dependencies>
    <!-- Flying Saucer: Java2DRenderer lives in the core artifact; the PDF module is not needed for image output -->
    <dependency>
        <groupId>org.xhtmlrenderer</groupId>
        <artifactId>flying-saucer-core</artifactId>
        <version>9.1.22</version> <!-- Check for the latest version -->
    </dependency>
    <!-- Apache POI -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>5.2.3</version>
    </dependency>
</dependencies>
Java Code Example
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.w3c.dom.Document;
import org.xhtmlrenderer.swing.Java2DRenderer;
import org.xml.sax.InputSource;
import javax.imageio.ImageIO;
import javax.xml.parsers.DocumentBuilderFactory;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.StringReader;

public class HtmlToWordFlyingSaucer {
    public static void main(String[] args) {
        // 1. Define the HTML content (must be XHTML-compliant)
        String html = "<html>"
                + "<head><style>body { font-family: sans-serif; }</style></head>"
                + "<body>"
                + " <h1>Hello from Flying Saucer!</h1>"
                + " <p>This is a paragraph rendered as an image.</p>"
                + "</body>"
                + "</html>";
        try (XWPFDocument document = new XWPFDocument()) {
            // 2. Parse the markup into a DOM. The String constructors of Java2DRenderer
            //    expect a URL, not markup, so the document is passed in directly.
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(html)));
            // 3. Render the document to a BufferedImage (width and height in pixels)
            Java2DRenderer renderer = new Java2DRenderer(dom, 800, 600);
            BufferedImage image = renderer.getImage();
            // 4. Encode the image as PNG in memory
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ImageIO.write(image, "png", baos);
            // 5. Add the image to the Word document. addPicture reads from an InputStream,
            //    and 480 x 360 pt keeps the 4:3 aspect ratio of the 800 x 600 render.
            XWPFParagraph paragraph = document.createParagraph();
            XWPFRun run = paragraph.createRun();
            run.addPicture(new ByteArrayInputStream(baos.toByteArray()),
                    XWPFDocument.PICTURE_TYPE_PNG, "image.png",
                    Units.toEMU(480), Units.toEMU(360));
            // 6. Save the document
            try (FileOutputStream out = new FileOutputStream("output_flying_saucer.docx")) {
                document.write(out);
            }
            System.out.println("Successfully created output_flying_saucer.docx");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
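The example passes fixed dimensions to `addPicture`. If the render size varies, the picture size can be derived from the image instead, scaled down to fit the page while keeping its aspect ratio. Word measures pictures in EMUs, with 914400 EMU per inch and therefore 12700 EMU per point. The sketch below is plain arithmetic with no POI dependency; `PictureSize` and `fitToWidthEmu` are hypothetical names, and treating image pixels as 96 dpi is an assumption.

```java
public class PictureSize {

    /** EMUs per typographic point: 914400 EMU per inch / 72 points per inch. */
    public static final int EMU_PER_POINT = 12700;

    /** Assumption: image pixels are treated as 96 dpi, so one pixel is 0.75 pt. */
    public static final double POINTS_PER_PIXEL = 72.0 / 96.0;

    /**
     * Returns {widthEmu, heightEmu} for an image of the given pixel size,
     * shrunk if necessary so it is no wider than maxWidthPoints.
     */
    public static int[] fitToWidthEmu(int pixelWidth, int pixelHeight, double maxWidthPoints) {
        if (pixelWidth <= 0 || pixelHeight <= 0 || maxWidthPoints <= 0) {
            throw new IllegalArgumentException("Dimensions must be positive");
        }
        // Natural width in points, capped at the available width
        double widthPoints = Math.min(pixelWidth * POINTS_PER_PIXEL, maxWidthPoints);
        // Height follows from the width so the aspect ratio is preserved
        double heightPoints = widthPoints * pixelHeight / pixelWidth;
        return new int[] {
            (int) Math.round(widthPoints * EMU_PER_POINT),
            (int) Math.round(heightPoints * EMU_PER_POINT)
        };
    }
}
```

For the 800 by 600 pixel render and an assumed text width of 468 pt (US Letter with 1-inch margins), this gives 468 by 351 pt, and the two returned values can be passed as the width and height arguments of `addPicture`.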
Which Method Should You Choose?
- For most use cases: Use docx4j. It's the most direct tool for converting HTML to a .docx file, and it handles styles and structure reasonably well.
- For complex, template-based documents: Use Apache POI. If you need to build a Word document from scratch with very specific layouts, tables, headers, and footers, and the HTML is just a source of content, POI gives you the most control.
- For visual snapshots: Use Flying Saucer. If the document must look like the rendered page, including CSS layouts that are hard to translate to Word, rendering it as an image preserves the appearance at the cost of editable text.
