Of course! python-docx is a fantastic and popular library in Python for creating and modifying Microsoft Word (.docx) files. It's not designed for reading complex, formatted documents, but it excels at generating documents from scratch or populating templates with data.

Here's a comprehensive guide covering everything from installation to advanced features.
Installation
First, you need to install the library. Open your terminal or command prompt and run:
pip install python-docx
Core Concepts: The Document Object Model (DOM)
Before you start coding, it's helpful to understand how python-docx structures a Word document. A .docx file is essentially a ZIP archive containing XML files. python-docx gives you a Pythonic way to interact with this structure.
The main components are:

- Document: This is the top-level object. You can think of it as the entire Word file.
- Block Elements: These are the large, structural parts of your document.
  - Paragraph: A block of text. This is the most common element you'll work with.
  - Table: A structured table with rows and cells.
- Inline Elements: These are elements that exist within a paragraph.
  - Run: A contiguous run of text with the same formatting (e.g., bold, italic, font size). You apply character formatting to a Run, not directly to a Paragraph. This is a crucial concept.
- Document Properties: Metadata about the file, like the author, title, and subject.
Creating a New Document from Scratch
Let's start by creating a simple, styled document.
from docx import Document
from docx.shared import Pt, RGBColor, Inches
from docx.enum.text import WD_ALIGN_PARAGRAPH
# 1. Create a new Document object
doc = Document()
# 2. Add a heading to the document
# The level can be 0-9 (0 is the standard "Title" style)
doc.add_heading('Document Title', level=0)
# 3. Add a paragraph
p = doc.add_paragraph('This is a paragraph with some initial text. ')
# 4. Add bold and italic runs to the same paragraph
p.add_run('This is a bold run. ').bold = True
p.add_run('And this is an italic run. ').italic = True
# 5. Add a left-aligned paragraph
p_left = doc.add_paragraph('This is a left-aligned paragraph.')
p_left.alignment = WD_ALIGN_PARAGRAPH.LEFT
# 6. Add a right-aligned paragraph
p_right = doc.add_paragraph('This is a right-aligned paragraph.')
p_right.alignment = WD_ALIGN_PARAGRAPH.RIGHT
# 7. Add a centered paragraph
p_center = doc.add_paragraph('This is a centered paragraph.')
p_center.alignment = WD_ALIGN_PARAGRAPH.CENTER
# 8. Add a paragraph with custom font size and color
p_custom = doc.add_paragraph('This text has a custom style.')
run = p_custom.add_run('Custom font size and color. ')
run.font.size = Pt(14) # 14 points
run.font.color.rgb = RGBColor(0x42, 0x24, 0xE9) # A nice blue color
# 9. Add a page break
doc.add_page_break()
# 10. Add a table
table = doc.add_table(rows=3, cols=3)
table.style = 'Table Grid' # Apply a built-in style
# Populate the table
for i in range(3):
    for j in range(3):
        cell = table.cell(i, j)
        cell.text = f'Row {i+1}, Col {j+1}'
# 11. Save the document
doc.save('my_new_document.docx')
print("Document 'my_new_document.docx' created successfully!")
Working with an Existing Document
You can also open an existing .docx file to read its contents or modify it.
from docx import Document
# Open an existing document
doc = Document('my_new_document.docx') # Use the file from the previous example
# --- Reading Content ---
print("\n--- Reading Document Content ---")
for paragraph in doc.paragraphs:
    # Skip paragraphs that are empty or contain only whitespace
    if paragraph.text.strip():
        print(f"Paragraph Text: '{paragraph.text}'")
# --- Modifying Content ---
print("\n--- Modifying Document Content ---")
# Find the first paragraph and change its text
# (assigning to .text replaces all of the paragraph's runs, so run-level formatting is lost)
if doc.paragraphs:
    first_paragraph = doc.paragraphs[0]
    first_paragraph.text = "This is the new, modified first paragraph."
# Add a new paragraph at the end
doc.add_paragraph("This paragraph was added after opening the document.")
# Find a table and modify its content
if doc.tables:
    table = doc.tables[0]
    # Change the text in the second row, second column (indices are zero-based)
    table.cell(1, 1).text = "Modified Cell!"
# Save the modified document
# It's good practice to save with a new name to avoid overwriting the original
doc.save('modified_document.docx')
print("Document 'modified_document.docx' saved successfully!")
Adding Images
You can easily add images to your document.
from docx import Document
from docx.shared import Inches
doc = Document()
doc.add_heading('Adding Images', level=1)
# Add an image, specifying its width
# The height is automatically adjusted to maintain aspect ratio
doc.add_picture('python_logo.png', width=Inches(2.0))
# You can also specify height
# doc.add_picture('python_logo.png', height=Inches(1.0))
doc.save('document_with_image.docx')
(Make sure you have an image file named python_logo.png in the same directory, or provide the correct path.)

Advanced Features: Using Templates
A very powerful use case is using a Word template (.docx file) with placeholders and populating it with data from Python.
Step 1: Create the Template in Word
- Create a new Word document.
- Type your static text with a bracketed placeholder: "Dear [customer_name],"
- Add another line: "Your total order amount is $[order_total]."
- Optional, for mail-merge tools only (the Python code in Step 2 does not read these): go to the Insert tab -> Quick Parts -> Field..., choose MergeField from the "Field names" list, and type customer_name in the "Field name" box. Repeat the process with the field name order_total.
- Save this file as template.docx.
Step 2: Populate the Template with Python
python-docx has no API for filling true merge fields. A common workaround is to type plain-text placeholders such as [customer_name] into the template and replace them with string replacement on the raw XML of the document.
from xml.sax.saxutils import escape

from docx import Document
from docx.oxml import parse_xml


def populate_template(template_path, output_path, data):
    """
    Populates a Word template by replacing [field_name] placeholders in the body XML.
    Note: This is a simple approach. It fails when Word splits a placeholder
    across several runs, so it may not work for all templates.
    """
    # Open the template document
    doc = Document(template_path)
    # Serialize the body element to an XML string so we can run replacements on it
    body = doc.element.body
    xml_string = body.xml
    for key, value in data.items():
        # Escape &, < and > so the inserted value cannot break the XML
        xml_string = xml_string.replace(f'[{key}]', escape(str(value)))
    # Parse the modified XML and swap it in for the original body element
    # (appending it to the old body would nest one <w:body> inside another)
    new_body = parse_xml(xml_string)
    body.getparent().replace(body, new_body)
    # Save the new document
    doc.save(output_path)


# --- Example Usage ---
# Data to be inserted into the template
customer_data = {
    'customer_name': 'John Doe',
    'order_total': '150.75'
}
# Populate the template
populate_template('template.docx', 'populated_invoice.docx', customer_data)
print("Template 'template.docx' has been populated and saved as 'populated_invoice.docx'.")
Summary and Best Practices
| Task | Key Methods/Properties |
|---|---|
| Create New Document | doc = Document() |
| Save Document | doc.save('filename.docx') |
| Add Paragraph | doc.add_paragraph('text') |
| Add Heading | doc.add_heading('text', level=1) |
| Modify Paragraph Text | paragraph.text = 'new text' |
| Add Formatting (Bold, Italic) | run = paragraph.add_run('text'); run.bold = True |
| Add Font Style/Color | run.font.size = Pt(12); run.font.color.rgb = ... |
| Add Table | doc.add_table(rows=3, cols=2) |
| Modify Table Cell | table.cell(0, 0).text = 'new text' |
| Add Image | doc.add_picture('image.png', width=Inches(1)) |
| Add Page Break | doc.add_page_break() |
| Read Paragraphs | for p in doc.paragraphs: print(p.text) |
Best Practices:
- Use Run for Styling: Always apply character formatting (bold, italic, color) to a Run object, not directly to a Paragraph. This allows for multiple styles within a single paragraph.
- Check for Existence: Before trying to access a paragraph or table, check if the list is empty to avoid IndexError (e.g., if doc.paragraphs:).
- Save with a New Name: When modifying an existing document, save it with a different name to preserve the original file.
- For Complex Templates: For production-level template generation, consider more robust solutions like docx-mailmerge (a separate library that fills Word merge fields) or generating HTML and converting it to .docx.
