Of course! Converting HTML to PDF in Python is a common task, and there are several excellent libraries to choose from. The best one for you depends on your specific needs, such as handling complex CSS, JavaScript, and images.
Here’s a comprehensive guide covering the most popular and effective methods, from simple command-line tools to powerful Python libraries.
Summary of Methods
| Method | Best For | Pros | Cons |
|---|---|---|---|
| WeasyPrint | High-fidelity conversion, print stylesheets, static-to-PDF. | Excellent CSS support, great for print media, no browser needed. | Does not execute JavaScript; needs the Pango system library. |
| pdfkit | Simple conversions through the wkhtmltopdf binary. | Very easy to use, runs basic JavaScript. | Requires installing wkhtmltopdf separately; its older WebKit engine has limited support for modern CSS and JS, and the project is no longer actively maintained. |
| xhtml2pdf | Generating reports from HTML templates (like Django). | Integrates well with templating engines, handles basic CSS. | Older, CSS support can be dated, can be slow. |
| Playwright / Selenium | Dynamic websites with heavy JavaScript. | Full browser automation, can handle anything a browser can. | More complex setup, overkill for simple static HTML. |
Method 1: WeasyPrint (Recommended for Most Cases)
WeasyPrint is a visual rendering engine for HTML and CSS. It's fantastic for converting HTML documents to PDFs, especially if you're using print-specific CSS (@media print).
Why use it?
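- High-fidelity: It follows the CSS specifications closely, including paged-media features such as @page rules, so the PDF layout matches your stylesheet.
- No browser required: It is written in Python and does not need a browser, although it relies on a few system libraries (see the note in Step 1).
- Excellent CSS support: Supports a large subset of CSS 2.1 and CSS 3, including columns, flexbox, and grid (how complete flexbox and grid are depends on the WeasyPrint version).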
Step 1: Installation
First, you need to install WeasyPrint and its dependencies. It's best to do this in a virtual environment.
# Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
# Install WeasyPrint
pip install WeasyPrint
Note: WeasyPrint relies on system libraries. Recent releases need Pango, while older ones also required Cairo. If you encounter issues during installation, check the official installation guide for your operating system.
Step 2: Basic Usage
Here's a simple example. Create a Python script (e.g., convert.py):
import weasyprint
# 1. Define your HTML content as a string
html_string = """
<!DOCTYPE html>
<html>
<head>
<style>
body { font-family: sans-serif; }
h1 { color: #2c3e50; }
.highlight { background-color: #f1c40f; padding: 5px; }
@page {
size: A4;
margin: 2cm;
}
</style>
</head>
<body>
<h1>Hello, WeasyPrint!</h1>
<p>This is a paragraph converted to a PDF.</p>
<p class="highlight">This paragraph is highlighted with CSS.</p>
</body>
</html>
"""
# 2. Convert the HTML string to a PDF
# write_pdf() with no target returns the PDF as bytes
pdf = weasyprint.HTML(string=html_string).write_pdf()
# 3. Write the PDF to a file
with open("output_weasyprint.pdf", "wb") as f:
    f.write(pdf)
print("PDF generated successfully: output_weasyprint.pdf")
Step 3: Using an External HTML File
You can also point WeasyPrint to an existing .html file.
import weasyprint
# Path to your HTML file
html_file = "my_document.html"
# Path for the output PDF
pdf_file = "output_from_file.pdf"
# Convert the HTML file to PDF
weasyprint.HTML(filename=html_file).write_pdf(pdf_file)
print(f"PDF generated successfully: {pdf_file}")
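When the HTML file was written for the screen, it can help to layer a small print stylesheet on top at conversion time. The following is a minimal sketch, assuming WeasyPrint is installed; the helper name `html_file_to_pdf`, the `PRINT_CSS` rules, and the `.no-print` class are illustrative and not part of the original example.

```python
from pathlib import Path

# Hypothetical print rules: page size and margins, plus hiding screen-only elements.
PRINT_CSS = """
@page { size: A4; margin: 2cm; }
@media print { nav, .no-print { display: none; } }
"""


def html_file_to_pdf(html_path: str, pdf_path: str, extra_css: str = PRINT_CSS) -> Path:
    """Convert an HTML file to PDF, applying extra CSS on top of the document's own styles."""
    source = Path(html_path)
    if not source.is_file():
        raise FileNotFoundError(f"HTML file not found: {source}")

    # Imported lazily so this module still loads where WeasyPrint is missing.
    from weasyprint import CSS, HTML

    # `stylesheets` adds user stylesheets in addition to those linked in the HTML.
    HTML(filename=str(source)).write_pdf(pdf_path, stylesheets=[CSS(string=extra_css)])
    return Path(pdf_path)
```

Calling `html_file_to_pdf("my_document.html", "output_from_file.pdf")` would then produce the same file as above, with the extra rules applied.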
Method 2: pdfkit (Using wkhtmltopdf)
pdfkit is a Python wrapper around the command-line tool wkhtmltopdf. This tool uses the Qt WebKit rendering engine, a browser engine considerably older than current Chrome or Safari, to convert HTML to PDF. It handles conventional HTML, CSS, and simple JavaScript well, but newer features such as CSS grid or recent JavaScript syntax may not render correctly.
Why use it?
- Browser-style rendering: Because it uses a real (if dated) browser engine, it can run basic JavaScript and render images and SVGs.
- Relatively simple setup.
Step 1: Installation
You need to install two things:
- wkhtmltopdf: The underlying binary.
- pdfkit: The Python wrapper.
A. Install wkhtmltopdf
- Windows: Download the installer from the official site. Important: Note the installation path (e.g., C:\Program Files\wkhtmltopdf\bin).
- macOS: brew install wkhtmltopdf
- Linux (Debian/Ubuntu): sudo apt-get install wkhtmltopdf
B. Install pdfkit
pip install pdfkit
Step 2: Basic Usage
If wkhtmltopdf is in your system's PATH, the basic usage is very simple.
import pdfkit
# Path to your HTML file
html_file = "my_document.html"
# Convert HTML file to PDF
pdfkit.from_file(html_file, "output_pdfkit.pdf")
print("PDF generated successfully: output_pdfkit.pdf")
Step 3: Handling Custom Paths
If wkhtmltopdf is not in your PATH, you must tell pdfkit where to find it. This is common on Windows.
import pdfkit
# Path to the wkhtmltopdf executable
path_to_wkhtmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe' # Use 'r' for raw string
# Configuration for pdfkit
config = pdfkit.configuration(wkhtmltopdf=path_to_wkhtmltopdf)
# Path to your HTML file
html_file = "my_document.html"
# Convert using the custom configuration
pdfkit.from_file(html_file, "output_pdfkit_custom_path.pdf", configuration=config)
print("PDF generated successfully: output_pdfkit_custom_path.pdf")
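If the same script has to run on machines where wkhtmltopdf may or may not be on the PATH, the lookup can be automated. This is a rough sketch, assuming pdfkit is installed; `find_binary` and `make_pdfkit_config` are hypothetical helper names, and the fallback paths are whatever locations you expect on your own systems.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def find_binary(name: str = "wkhtmltopdf", fallbacks: Iterable[str] = ()) -> Optional[str]:
    """Return the path to `name` from PATH, else the first fallback that exists, else None."""
    found = shutil.which(name)
    if found:
        return found
    for candidate in fallbacks:
        if Path(candidate).is_file():
            return str(candidate)
    return None


def make_pdfkit_config(fallbacks: Iterable[str] = ()):
    """Build a pdfkit configuration pointing at whichever wkhtmltopdf was found."""
    path = find_binary("wkhtmltopdf", fallbacks)
    if path is None:
        raise RuntimeError("wkhtmltopdf not found on PATH or in the fallback locations")

    # Imported lazily so find_binary() is usable without pdfkit installed.
    import pdfkit

    return pdfkit.configuration(wkhtmltopdf=path)
```

For example, `make_pdfkit_config([r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"])` returns a configuration that can be passed as `configuration=` to `pdfkit.from_file`.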
You can also convert from a string:
import pdfkit
html_string = "<h1>Hello from pdfkit!</h1><p>This came from a string.</p>"
pdfkit.from_string(html_string, "output_from_string.pdf")
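pdfkit forwards an `options` dictionary to wkhtmltopdf as command-line flags, which is one way to set page size and margins. The sketch below assumes pdfkit and wkhtmltopdf are installed; `build_options` and `string_to_pdf` are illustrative names, and the default values are arbitrary.

```python
def build_options(page_size: str = "A4", margin: str = "20mm") -> dict:
    """Return wkhtmltopdf flags (without the leading '--') as a pdfkit options dict."""
    return {
        "page-size": page_size,
        "margin-top": margin,
        "margin-right": margin,
        "margin-bottom": margin,
        "margin-left": margin,
        "encoding": "UTF-8",
    }


def string_to_pdf(html: str, output_path: str, options: dict = None) -> bool:
    """Convert an HTML string to a PDF file using the given (or default) options."""
    # Imported lazily so build_options() works without pdfkit installed.
    import pdfkit

    return pdfkit.from_string(html, output_path, options=options or build_options())
```

A call such as `string_to_pdf(html_string, "output_from_string.pdf", build_options(margin="10mm"))` would tighten the margins for a single document.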
Method 3: xhtml2pdf (Good for Templating)
xhtml2pdf is another popular library that can be useful, especially if you're generating reports from templates within a framework like Django. Formerly known as pisa, it is built on top of the ReportLab PDF toolkit.
Why use it?
- Good for templating scenarios.
- Can be simpler for basic report generation.
Step 1: Installation
pip install xhtml2pdf
Step 2: Basic Usage
from xhtml2pdf import pisa
# Define your HTML content
html_string = """
<html>
<head>
<style>
body { font-family: 'Helvetica Neue', Arial, sans-serif; }
table { border-collapse: collapse; width: 100%; }
th, td { border: 1px solid #dddddd; text-align: left; padding: 8px; }
th { background-color: #f2f2f2; }
</style>
</head>
<body>
<h1>Report</h1>
<table>
<tr><th>Product</th><th>Price</th></tr>
<tr><td>Apple</td><td>$1.00</td></tr>
<tr><td>Banana</td><td>$0.50</td></tr>
</table>
</body>
</html>
"""
# Output file name
output_filename = "output_xhtml2pdf.pdf"
# Open output file for writing (binary mode)
with open(output_filename, "w+b") as result_file:
    # Convert HTML to PDF
    pisa_status = pisa.CreatePDF(
        html_string,       # the HTML to convert
        dest=result_file)  # file handle to receive result
# Check if conversion was successful
if pisa_status.err:
    print("Error converting HTML to PDF!")
else:
    print(f"PDF generated successfully: {output_filename}")
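Since templating is the main use case here, it may help to see the table rows generated from data rather than hard-coded. This sketch uses the standard library's `string.Template` as a stand-in for a Django template and returns the PDF as bytes; `REPORT_TEMPLATE`, `render_report`, and `report_to_pdf_bytes` are hypothetical names, and xhtml2pdf is assumed to be installed for the conversion step.

```python
import io
from html import escape
from string import Template

# Stand-in for a real template file; $rows is filled in by render_report().
REPORT_TEMPLATE = Template("""<html>
<head><style>
table { border-collapse: collapse; width: 100%; }
th, td { border: 1px solid #dddddd; text-align: left; padding: 8px; }
</style></head>
<body>
<h1>Report</h1>
<table>
<tr><th>Product</th><th>Price</th></tr>
$rows
</table>
</body>
</html>""")


def render_report(items) -> str:
    """Render (name, price) pairs into the report HTML, escaping the names."""
    rows = "\n".join(
        f"<tr><td>{escape(name)}</td><td>${price:.2f}</td></tr>"
        for name, price in items
    )
    return REPORT_TEMPLATE.substitute(rows=rows)


def report_to_pdf_bytes(items) -> bytes:
    """Convert the rendered report to PDF in memory and return the raw bytes."""
    # Imported lazily so render_report() works without xhtml2pdf installed.
    from xhtml2pdf import pisa

    buffer = io.BytesIO()
    status = pisa.CreatePDF(render_report(items), dest=buffer)
    if status.err:
        raise RuntimeError("xhtml2pdf reported errors during conversion")
    return buffer.getvalue()
```

Returning bytes instead of writing a file is convenient in a web view, where the result would typically go straight into an HTTP response.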
Method 4: Playwright / Selenium (For Dynamic Content)
If your HTML requires JavaScript to run before it's ready to be converted (e.g., a single-page application or a page with dynamic charts), you need a tool that can control a real browser.
Why use it?
- Handles JavaScript: Can wait for elements to load, click buttons, and fill forms before printing.
- Maximum compatibility: Can render anything a modern browser can.
This is a more advanced approach. Here’s a conceptual example using Playwright, which is generally faster and more modern than Selenium.
Step 1: Installation
pip install playwright
playwright install  # This installs the browser binaries (Chromium, Firefox, WebKit)
Step 2: Basic Usage
This script will open a browser, navigate to a URL (or serve local HTML), wait for a specific element, and then print the page to a PDF.
from playwright.sync_api import sync_playwright
# Path for the output PDF
pdf_file = "output_playwright.pdf"
# Your HTML content (for this example, we'll save it to a temp file)
# In a real scenario, you might navigate to a live URL.
html_content = """
<!DOCTYPE html>
<html>
<head>
<title>Dynamic Page</title>
<script>
setTimeout(() => {
document.getElementById('content').innerHTML = 'Hello, loaded after 2 seconds!';
}, 2000);
</script>
</head>
<body>
<h1>Waiting for JS...</h1>
<p id="content">Content will appear here.</p>
</body>
</html>
"""
from pathlib import Path  # used to build an absolute file:// URL
temp_path = Path("temp.html").resolve()
temp_path.write_text(html_content, encoding="utf-8")
with sync_playwright() as p:
    # Launch a headless browser (page.pdf() is only supported in Chromium)
    browser = p.chromium.launch()
    # Open a new page
    page = browser.new_page()
    # Go to the local HTML file; a file:// URL needs an absolute path
    page.goto(temp_path.as_uri())
    # Wait until the element contains the text we expect
    # This ensures the JS has run and the content is ready
    page.locator("#content", has_text="Hello, loaded after 2 seconds").wait_for()
    # Print the page to a PDF
    page.pdf(path=pdf_file)
    # Close the browser
    browser.close()
print(f"PDF generated successfully: {pdf_file}")
# Clean up the temp file
temp_path.unlink()
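When the HTML already lives in a string, the temporary file can be skipped by loading it directly into the page. The following is a sketch under the assumption that Playwright and its Chromium build are installed; `ready_expression` and `html_to_pdf` are hypothetical helpers, and the A4 format and timeout are arbitrary choices.

```python
import json


def ready_expression(selector: str, text: str) -> str:
    """Build a JS expression that is truthy once `selector` contains `text`."""
    # json.dumps produces safely quoted JavaScript string literals.
    return (
        f"document.querySelector({json.dumps(selector)})"
        f"?.textContent.includes({json.dumps(text)})"
    )


def html_to_pdf(html: str, pdf_path: str, selector: str, text: str,
                timeout_ms: int = 10_000) -> None:
    """Load an HTML string, wait for JS-rendered text to appear, then save a PDF."""
    # Imported lazily so ready_expression() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            # Load the markup directly instead of navigating to a file or URL.
            page.set_content(html)
            # Poll the page until the expression evaluates to a truthy value.
            page.wait_for_function(ready_expression(selector, text), timeout=timeout_ms)
            page.pdf(path=pdf_path, format="A4", print_background=True)
        finally:
            browser.close()
```

With the page from the example above, `html_to_pdf(html_content, "output_playwright.pdf", "#content", "Hello, loaded after 2 seconds")` would wait for the delayed text before printing. Note that relative links inside the HTML have no base URL when it is loaded this way.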
Which One Should I Choose?
- For most static HTML/CSS documents, especially for reports or invoices: Use WeasyPrint. It's powerful, needs no browser, and gives you great control.
- For straightforward pages where you want a quick conversion and already have wkhtmltopdf available: Use pdfkit. Keep in mind that its engine lags behind modern browsers, so test any page that relies on recent CSS or JavaScript.
- If you're already in a Django project and need basic PDF generation from templates: xhtml2pdf is a viable, simple option.
- If your HTML is not ready until JavaScript has run and user interaction might be needed: Use Playwright or Selenium. This is the most flexible but also the most complex approach.
