Four serious ways to convert HTML to PDF in Python. Each has a specific niche; picking the right one saves you from debugging a class of problems that only exist because of the library choice.
TL;DR
- Playwright: Chromium-based, full CSS support, works with JS-heavy pages. Best for production Python PDF work.
- WeasyPrint: Python library, no browser binary (only the Pango system library). Good for simple reports with plain CSS. No JS, limited modern layout.
- Managed API: 30 lines of requests. Zero Chromium to operate.
- pdfkit / wkhtmltopdf: Skip. Dead engine.
Option 1: Playwright
Playwright is Microsoft’s browser automation library — same underlying Chromium as Puppeteer, with first-class Python support.
Install
```bash
pip install playwright
playwright install chromium
```
Downloads ~180MB of Chromium. For Docker, pre-install via the Playwright Docker image (mcr.microsoft.com/playwright/python).
Minimum viable example
```python
from playwright.sync_api import sync_playwright


def html_to_pdf(html: str) -> bytes:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            # networkidle: wait until there are no network connections for 500ms
            page.set_content(html, wait_until="networkidle")
            return page.pdf(
                format="A4",
                margin={"top": "20mm", "bottom": "20mm", "left": "15mm", "right": "15mm"},
                print_background=True,
                prefer_css_page_size=True,  # a CSS @page size takes priority over format
            )
        finally:
            browser.close()


pdf = html_to_pdf("<h1>Invoice #2041</h1><p>Total $1,284</p>")
with open("out.pdf", "wb") as f:
    f.write(pdf)
```
Working HTML-to-PDF in Python, 15 lines.
Async version (for FastAPI, aiohttp, asyncio)
```python
from playwright.async_api import async_playwright


async def html_to_pdf(html: str) -> bytes:
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.set_content(html, wait_until="networkidle")
            return await page.pdf(
                format="A4",
                margin={"top": "20mm", "bottom": "20mm", "left": "15mm", "right": "15mm"},
                print_background=True,
            )
        finally:
            await browser.close()
```
Use sync_playwright in sync code (Flask, scripts); async_playwright in async code (FastAPI, Starlette).
Browser pool pattern
Launching Chromium costs 500-1500ms. For any real throughput, reuse the browser:
```python
import asyncio

from playwright.async_api import Browser, Playwright, async_playwright

_playwright: Playwright | None = None
_browser: Browser | None = None
_request_count = 0
_lock = asyncio.Lock()
RECYCLE_AFTER = 1000


async def get_browser() -> Browser:
    global _playwright, _browser, _request_count
    async with _lock:
        if _browser is None or _request_count >= RECYCLE_AFTER:
            # Caveat: closing here also kills pages that in-flight requests are still rendering.
            if _browser:
                await _browser.close()
            # Start the Playwright driver once and reuse it across recycles.
            if _playwright is None:
                _playwright = await async_playwright().start()
            _browser = await _playwright.chromium.launch(
                args=["--no-sandbox", "--disable-dev-shm-usage"]
            )
            _request_count = 0
        _request_count += 1
        return _browser


async def html_to_pdf(html: str) -> bytes:
    browser = await get_browser()
    page = await browser.new_page()  # fresh page (in its own context) per request
    try:
        await page.set_content(html, wait_until="networkidle")
        return await page.pdf(format="A4", print_background=True)
    finally:
        await page.close()  # also closes the context new_page() created
```
Fresh page per request, recycle browser every 1000 requests. For higher throughput, run several worker processes (uvicorn/gunicorn workers).
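The pool above has one caveat. When a browser hits RECYCLE_AFTER it is closed immediately, even if other requests still have pages open on it. Below is a minimal sketch of an alternative, assuming you are willing to track borrowers yourself. RecyclingPool and its launch/close callbacks are hypothetical names, and the test uses fake string "browsers" instead of Chromium. A retired browser is closed only when its last borrower releases it.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class _Slot:
    resource: Any
    users: int = 0         # borrowers currently holding this resource
    retired: bool = False  # True once a newer resource has replaced it


class RecyclingPool:
    """Share one resource (e.g. a Browser) and replace it after `recycle_after` borrows.

    A retired resource is closed only when its last borrower releases it,
    so renders still in flight are not cut off.
    """

    def __init__(self, launch: Callable[[], Awaitable[Any]],
                 close: Callable[[Any], Awaitable[None]], recycle_after: int = 1000):
        self._launch = launch
        self._close = close
        self._recycle_after = recycle_after
        self._lock = asyncio.Lock()
        self._slot: Optional[_Slot] = None
        self._borrows = 0

    @asynccontextmanager
    async def borrow(self):
        async with self._lock:
            if self._slot is None or self._borrows >= self._recycle_after:
                old = self._slot
                self._slot = _Slot(await self._launch())
                self._borrows = 0
                if old is not None:
                    old.retired = True
                    if old.users == 0:  # nobody is using it: close right away
                        await self._close(old.resource)
            self._borrows += 1
            slot = self._slot
            slot.users += 1
        try:
            yield slot.resource
        finally:
            async with self._lock:
                slot.users -= 1
                if slot.retired and slot.users == 0:  # last borrower of a retired resource
                    await self._close(slot.resource)
```

With Playwright you might pass launch=lambda: pw.chromium.launch() and close=lambda b: b.close(), where pw is an already-started async_playwright() instance. Then wrap each render in async with pool.borrow() as browser:. Launching under the lock stops concurrent requests from starting several browsers at once.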
Playwright in Docker
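```dockerfile
FROM mcr.microsoft.com/playwright/python:v1.47.0-jammy
WORKDIR /app
# Pin playwright==1.47.0 in requirements.txt so the package matches the browsers in the image
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "server.py"]
```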
The Microsoft-maintained image has Chromium and all system deps preinstalled. Alternative: build your own Debian image and run playwright install --with-deps chromium.
Option 2: WeasyPrint
WeasyPrint is an HTML-to-PDF renderer written in Python. No Chromium, no headless browser, no Docker complexity: it parses HTML+CSS and produces PDF directly, relying on the Pango system library for text layout.
Install
```bash
pip install weasyprint
```
On every platform it also needs the Pango system library (since version 53, WeasyPrint no longer depends on cairo). The WeasyPrint installation docs cover platform specifics.
Example
```python
from weasyprint import HTML


def html_to_pdf(html: str) -> bytes:
    return HTML(string=html).write_pdf()


pdf = html_to_pdf("""
<!doctype html>
<html>
<head>
  <style>
    @page { size: A4; margin: 20mm; }
    body { font-family: sans-serif; }
  </style>
</head>
<body>
  <h1>Invoice #2041</h1>
  <p>Total $1,284</p>
</body>
</html>
""")
with open("out.pdf", "wb") as f:
    f.write(pdf)
```
Five lines. No async complexity.
What WeasyPrint is good at
- CSS @page rules: historically better than Chromium here, because WeasyPrint implements much of the CSS Paged Media Level 3 spec, including margin boxes (@top-left { content: ... }).
- Simple layout: paragraphs, tables, lists, and basic positioning all work well.
- Reproducible output: there is no browser engine whose version drifts between machines. Fonts still come from the system, so bundle them with @font-face if output must match everywhere.
- Low memory footprint — ~50MB RSS vs Chromium’s 500MB+.
What WeasyPrint is not good at
- Modern CSS layout: no container queries, partial flexbox, and display: grid only partly supported (and only in recent releases). Support for anything past CSS 2.1 is uneven.
- JavaScript: none. If your HTML depends on JS rendering, WeasyPrint shows what the server sent, not what the user sees.
- Web fonts: @font-face is supported, but font fetching is synchronous and slower than in Chromium.
- Complex typography: no variable fonts, limited OpenType features, fewer kerning options than a browser.
When WeasyPrint wins
For server-rendered, statically-styled documents (invoices, receipts, government forms, reports), WeasyPrint is excellent. Pure Python, fast, deterministic. No Docker pains.
For anything with modern frontend layout (Tailwind, utility CSS, flex-heavy designs, JS-rendered content), WeasyPrint will frustrate you. Use a Chromium-based option.
WeasyPrint example: invoice
```python
from html import escape

from weasyprint import CSS, HTML

CSS_TMPL = """
@page { size: Letter; margin: 20mm 15mm; }
body { font-family: 'Noto Sans', sans-serif; font-size: 11pt; }
.head { display: flex; justify-content: space-between; margin-bottom: 24pt; }
.head .brand { font-weight: 700; font-size: 20pt; }
table { width: 100%; border-collapse: collapse; margin: 16pt 0; }
th, td { padding: 6pt 8pt; text-align: left; border-bottom: 0.5pt solid #e4e4ec; }
.num { text-align: right; font-variant-numeric: tabular-nums; }
.total { margin-top: 16pt; text-align: right; font-weight: 700; font-size: 13pt; }
"""


def render_invoice(inv: dict) -> bytes:
    # Escape text interpolated into markup so customer data can't inject HTML.
    rows = "".join(
        f"<tr><td>{escape(line['desc'])}</td>"
        f"<td class='num'>{line['qty']}</td>"
        f"<td class='num'>${line['total']:.2f}</td></tr>"
        for line in inv["lines"]
    )
    html = f"""<!doctype html>
<html><head><meta charset="utf-8"></head><body>
<div class="head">
  <div class="brand">Acme Software Inc.</div>
  <div>Invoice {escape(str(inv['number']))}<br/>{escape(str(inv['date']))}</div>
</div>
<table>
  <thead><tr><th>Item</th><th class="num">Qty</th><th class="num">Total</th></tr></thead>
  <tbody>{rows}</tbody>
</table>
<div class="total">Total ${inv['total']:.2f}</div>
</body></html>"""
    return HTML(string=html).write_pdf(stylesheets=[CSS(string=CSS_TMPL)])
```
Works well. Would also work with Playwright; the pick is operational preference.
Option 3: Managed API
If you don’t want to operate Chromium yourself and WeasyPrint can’t handle your CSS, a managed API is a 30-line integration using requests:
```python
import os
import time

import requests

QPDF_BASE = "https://login.21pdf.com/v1"
QPDF_KEY = os.environ["QPDF_KEY"]
HEADERS = {"Authorization": f"Bearer {QPDF_KEY}"}


def html_to_pdf(html: str) -> bytes:
    # 1. Submit render job.
    sub = requests.post(
        f"{QPDF_BASE}/convert",
        headers={**HEADERS, "Content-Type": "application/json"},
        json={
            "html": html,
            "options": {
                "page_size": "A4",
                "margin_top": 20, "margin_bottom": 20,
                "margin_left": 15, "margin_right": 15,
                "wait_for_network_idle": True,
            },
        },
        timeout=30,
    )
    sub.raise_for_status()
    job_id = sub.json()["job_id"]

    # 2. Poll until complete (60 x 0.5s, roughly 30s).
    for _ in range(60):
        time.sleep(0.5)
        resp = requests.get(f"{QPDF_BASE}/jobs/{job_id}", headers=HEADERS, timeout=10)
        resp.raise_for_status()
        st = resp.json()
        if st["status"] == "succeeded":
            break
        if st["status"] == "failed":
            raise RuntimeError(st.get("message", "render failed"))
    else:
        # Loop ran out without a break: don't try to download an unfinished job.
        raise TimeoutError(f"render job {job_id} did not finish in time")

    # 3. Download PDF bytes.
    pdf = requests.get(f"{QPDF_BASE}/jobs/{job_id}/download", headers=HEADERS, timeout=30)
    pdf.raise_for_status()
    return pdf.content
```
Works with any Python environment that has requests. No system deps, no Chromium, no fonts to bundle, no Docker complexity.
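The polling loop above sleeps a fixed 0.5s between checks. Below is a sketch of a variant with an explicit deadline and capped exponential backoff. It assumes the same "succeeded"/"failed" status values. wait_for_job and its parameters are hypothetical. The clock and sleep functions can be swapped out, so it can be tested without actually waiting.

```python
import time
from typing import Callable, Mapping


def wait_for_job(fetch_status: Callable[[], Mapping], timeout: float = 30.0,
                 first_delay: float = 0.25, max_delay: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> Mapping:
    """Poll fetch_status() until the job succeeds, fails, or `timeout` seconds pass."""
    deadline = clock() + timeout
    delay = first_delay
    while True:
        status = fetch_status()
        if status["status"] == "succeeded":
            return status
        if status["status"] == "failed":
            raise RuntimeError(status.get("message") or "render failed")
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"job not finished after {timeout}s")
        sleep(min(delay, remaining))       # never sleep past the deadline
        delay = min(delay * 2, max_delay)  # back off: 0.25, 0.5, 1, 2, 2, ...
```

In html_to_pdf it could replace step 2 as wait_for_job(lambda: requests.get(f"{QPDF_BASE}/jobs/{job_id}", headers=HEADERS, timeout=10).json()).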
Async version for FastAPI
```python
import asyncio
import os

import httpx

QPDF_BASE = "https://login.21pdf.com/v1"
QPDF_KEY = os.environ["QPDF_KEY"]
HEADERS = {"Authorization": f"Bearer {QPDF_KEY}"}


async def html_to_pdf(html: str) -> bytes:
    async with httpx.AsyncClient(timeout=90) as client:
        sub = await client.post(
            f"{QPDF_BASE}/convert",
            headers=HEADERS,
            json={
                "html": html,
                "options": {"page_size": "A4", "wait_for_network_idle": True},
            },
        )
        sub.raise_for_status()
        job_id = sub.json()["job_id"]

        for _ in range(60):
            await asyncio.sleep(0.5)  # yields to the event loop between polls
            resp = await client.get(f"{QPDF_BASE}/jobs/{job_id}", headers=HEADERS)
            resp.raise_for_status()
            st = resp.json()
            if st["status"] == "succeeded":
                break
            if st["status"] == "failed":
                raise RuntimeError(st.get("message", "render failed"))
        else:
            raise TimeoutError(f"render job {job_id} did not finish in time")

        pdf = await client.get(f"{QPDF_BASE}/jobs/{job_id}/download", headers=HEADERS)
        pdf.raise_for_status()
        return pdf.content
```
Option 4: pdfkit
pdfkit is a Python wrapper around wkhtmltopdf. Don’t start new projects here.
- wkhtmltopdf upstream is dead: its GitHub repository was archived in 2023. No modern CSS (no flexbox, no grid) and no ongoing security patches.
- If you have a legacy pdfkit integration, budget migration to Playwright or a managed API. The longer you wait, the more weird CSS edge cases you’ll discover that wkhtmltopdf renders differently than any real browser.
Don’t mistake pdfkit-the-Python-library for pdfkit (Node.js PDF construction library) or qpdf (the C++ PDF manipulation CLI). Different projects.
Framework integrations
Flask
```python
from flask import Flask, Response
from weasyprint import HTML  # or your preferred option

app = Flask(__name__)


@app.route("/invoices/<invoice_id>/pdf")
def invoice_pdf(invoice_id):
    # load_invoice / render_invoice_html stand for your app's own data and template helpers
    html = render_invoice_html(load_invoice(invoice_id))
    pdf = HTML(string=html).write_pdf()
    return Response(
        pdf,
        mimetype="application/pdf",
        headers={"Content-Disposition": f'inline; filename="invoice-{invoice_id}.pdf"'},
    )
```
FastAPI
```python
from fastapi import FastAPI
from fastapi.responses import Response

app = FastAPI()


@app.get("/invoices/{invoice_id}/pdf")
async def invoice_pdf(invoice_id: str):
    invoice = await load_invoice(invoice_id)  # your app's own async data access
    html = render_invoice_html(invoice)
    # Any async html_to_pdf above (Playwright async or httpx);
    # wrap sync ones (WeasyPrint, sync Playwright) in asyncio.to_thread.
    pdf = await html_to_pdf(html)
    return Response(
        content=pdf,
        media_type="application/pdf",
        headers={"Content-Disposition": f'inline; filename="invoice-{invoice_id}.pdf"'},
    )
```
Django
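```python
# views.py
from django.http import HttpResponse
from django.shortcuts import get_object_or_404
from django.template.loader import render_to_string
from weasyprint import HTML

from .models import Invoice  # assumes the Invoice model lives in this app


def invoice_pdf(request, invoice_id):
    invoice = get_object_or_404(Invoice, pk=invoice_id)  # 404 instead of a 500 for unknown ids
    html = render_to_string("invoice.html", {"invoice": invoice}, request=request)
    pdf = HTML(string=html, base_url=request.build_absolute_uri("/")).write_pdf()
    return HttpResponse(pdf, content_type="application/pdf")
```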
base_url is important in Django — relative asset URLs (/static/...) need a base to resolve against.
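Relative references resolve against base_url much as a browser resolves them against the page URL. The stdlib's urljoin shows what difference the base makes (the hosts below are placeholders). With the site root as base, static/img/logo.png lands under /static/. Against the page's own URL it would land under /invoices/42/ instead.

```python
from urllib.parse import urljoin

# Placeholder values: in Django the first would come from request.build_absolute_uri("/"),
# the second is what the PDF view's own URL might look like.
SITE_ROOT = "https://app.example.com/"
PAGE_URL = "https://app.example.com/invoices/42/pdf"

refs = ["/static/css/invoice.css", "static/img/logo.png"]

# Map each reference to (resolved against site root, resolved against page URL).
resolved = {ref: (urljoin(SITE_ROOT, ref), urljoin(PAGE_URL, ref)) for ref in refs}

for ref, (from_root, from_page) in resolved.items():
    print(ref)
    print("  against site root:", from_root)
    print("  against page URL: ", from_page)
```

Root-relative paths (leading slash) resolve the same either way. Path-relative ones depend entirely on the base, which is why passing the site root is the safer default.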
Jinja2 template → PDF
Any of the three viable options works with a rendered Jinja2 template:
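```python
from jinja2 import Environment, FileSystemLoader, select_autoescape

# A bare Environment does not autoescape; turn it on for HTML templates.
env = Environment(loader=FileSystemLoader("templates"), autoescape=select_autoescape(["html"]))
template = env.get_template("invoice.html")
html = template.render(invoice=invoice_data)  # invoice_data: your invoice dict or object
pdf = html_to_pdf(html)  # Playwright / WeasyPrint / 21pdf
```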
Performance reference
Rough numbers on a modern laptop (M2 Pro, warmed up):
| Scenario | Time | RAM |
|---|---|---|
| Playwright cold start | 800-1500ms | ~600MB |
| Playwright warm (new page) | 50-150ms | - |
| WeasyPrint (simple HTML) | 80-200ms | ~50MB |
| WeasyPrint (complex HTML + fonts) | 300-800ms | ~80MB |
| 21pdf API round-trip (warm) | 300-800ms | negligible |
| pdfkit + wkhtmltopdf | 400-1200ms | ~200MB |
WeasyPrint’s low memory footprint makes it attractive for cost-sensitive deployments. Playwright trades RAM for full CSS support. Managed API trades dollars for zero operational surface.
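These numbers vary a lot with hardware, HTML, and fonts, so it is worth measuring with your own templates. Below is a minimal stdlib harness. It assumes render is any of the sync html_to_pdf functions above, and the name benchmark is hypothetical. Warm-up calls are discarded so one-time costs such as font loading don't skew the median.

```python
import statistics
import time
from typing import Callable


def benchmark(render: Callable[[str], bytes], html: str, runs: int = 20, warmup: int = 3) -> dict:
    """Time render(html) in milliseconds, ignoring the first `warmup` calls."""
    for _ in range(warmup):
        render(html)  # lets caches, fonts, and any pooled browser warm up
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        render(html)
        timings.append((time.perf_counter() - start) * 1000)
    return {
        "runs": runs,
        "median_ms": statistics.median(timings),
        "min_ms": min(timings),
        "max_ms": max(timings),
    }
```

The sync Playwright example from Option 1 launches a browser on every call, so its median should land near the cold-start row rather than the warm one.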
Async gotcha
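If you call sync_playwright() inside an async context (like a FastAPI handler), Playwright refuses: it raises an error telling you to use the Async API instead. Other sync libraries, such as WeasyPrint, will run, but they block the event loop for the whole render, which undoes the async benefit.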
```python
# BAD: Playwright's sync API raises an error inside a running asyncio loop
@app.get("/pdf")
async def pdf_handler():
    with sync_playwright() as p:  # raises here: "use the Async API instead"
        browser = p.chromium.launch()
        # ...


# GOOD: use async_playwright
@app.get("/pdf")
async def pdf_handler():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        # ...


# ALSO GOOD: run sync code in a worker thread, where no event loop is running
import asyncio


@app.get("/pdf")
async def pdf_handler():
    html = "<h1>Report</h1>"  # placeholder
    # sync_playwright_render: a sync function such as html_to_pdf from Option 1
    pdf = await asyncio.to_thread(sync_playwright_render, html)
    # ...
```
WeasyPrint is sync-only; wrap it in asyncio.to_thread() inside async handlers.
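Here is a stdlib-only sketch of why the thread matters. blocking_render stands in for a sync renderer, with time.sleep simulating 300ms of work. A ticker coroutine records how often the event loop gets to run during the render. Called directly, the render freezes the loop. Through asyncio.to_thread (Python 3.9+), the ticker keeps running.

```python
import asyncio
import time


def blocking_render(html: str) -> bytes:
    """Stand-in for a sync renderer such as HTML(string=html).write_pdf()."""
    time.sleep(0.3)  # simulates 300ms of blocking work
    return b"%PDF-1.7 ..."


async def ticker(stop: asyncio.Event, ticks: list) -> None:
    """Record a tick every 50ms, but only while the event loop is free to run it."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def measure(offload: bool) -> int:
    """Render once and return how many ticks the loop managed meanwhile."""
    stop, ticks = asyncio.Event(), []
    task = asyncio.create_task(ticker(stop, ticks))
    await asyncio.sleep(0)  # let the ticker start
    if offload:
        await asyncio.to_thread(blocking_render, "<h1>hi</h1>")
    else:
        blocking_render("<h1>hi</h1>")  # runs on the loop thread: nothing else can run
    stop.set()
    await task
    return len(ticks)


if __name__ == "__main__":
    print("ticks while blocking the loop:", asyncio.run(measure(offload=False)))
    print("ticks with asyncio.to_thread: ", asyncio.run(measure(offload=True)))
```

With WeasyPrint the real call would be pdf = await asyncio.to_thread(HTML(string=html).write_pdf). Rendering is largely CPU work, so threads keep the loop responsive but don't add much parallelism. For heavy volume, separate worker processes spread the load.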
Which option should you pick?
- Simple reports, plain CSS, no JS? WeasyPrint. Pure Python, low memory, fast.
- Modern CSS (flex, grid, container queries), web fonts, JS rendering? Playwright or managed API.
- Serverless Python (Lambda, Cloud Functions)? Managed API. Getting Chromium into a serverless function from Python is more painful than from Node.
- High volume self-hosted? Playwright with the browser pool pattern, or a self-hosted Gotenberg deployment that you call from Python via HTTP.
- Low-to-moderate volume, don’t want to think about Chromium? Managed API.
The general considerations — @page support, SSRF hardening, wait conditions, async job models — apply across all Python options. See the HTML-to-PDF API guide for the Python-independent picture.
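SSRF hardening matters whenever rendered HTML can contain user-controlled URLs, because the renderer fetches them from inside your network. Below is a minimal sketch that assumes a check-before-fetch approach. is_public_url is a hypothetical helper that allows only http(s) URLs whose host resolves to globally routable addresses. It does not defend against DNS rebinding between the check and the fetch, so an egress firewall remains the stronger control.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_url(url: str) -> bool:
    """True only for http(s) URLs whose host resolves exclusively to globally routable IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False  # blocks file://, data:, and other schemes outright
    try:
        port = parts.port  # raises ValueError for an out-of-range port
        infos = socket.getaddrinfo(parts.hostname, port)
    except (ValueError, OSError, UnicodeError):
        return False  # unparseable or unresolvable: refuse rather than guess
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0].split("%")[0])  # strip any IPv6 zone id
        if ip.version == 6 and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped  # judge ::ffff:a.b.c.d by its IPv4 address
        if not ip.is_global:
            return False  # loopback, private, link-local (cloud metadata), etc.
    return True
```

To enforce it, you could wrap weasyprint's default_url_fetcher and pass the wrapper as HTML(..., url_fetcher=...). Alternatively, call it from a Playwright page.route handler and route.abort() when it returns False. It also rejects data: URLs, so allow those separately if your templates embed images that way.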
— 21pdf Engineering