HTML to PDF in Python — 4 options compared, with working code

Four serious ways to convert HTML to PDF in Python. Each has a specific niche; picking the right one saves you from debugging a class of problems that only exist because of the library choice.

TL;DR

  • Playwright — Chromium-based, full CSS support, works with JS-heavy pages. Best for production Python PDF work.
  • WeasyPrint — Pure Python, no binary deps. Good for simple reports with plain CSS. No JS, no modern layout.
  • Managed API — 30 lines of requests. Zero Chromium to operate.
  • pdfkit / wkhtmltopdf — Skip. Dead engine.

Option 1: Playwright

Playwright is Microsoft’s browser automation library — same underlying Chromium as Puppeteer, with first-class Python support.

Install

pip install playwright
playwright install chromium

Downloads ~180MB of Chromium. For Docker, pre-install via the Playwright Docker image (mcr.microsoft.com/playwright/python).

Minimum viable example

from playwright.sync_api import sync_playwright

def html_to_pdf(html: str) -> bytes:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.set_content(html, wait_until="networkidle")
            return page.pdf(
                format="A4",
                margin={"top": "20mm", "bottom": "20mm", "left": "15mm", "right": "15mm"},
                print_background=True,
                prefer_css_page_size=True,
            )
        finally:
            browser.close()

pdf = html_to_pdf("<h1>Invoice #2041</h1><p>Total $1,284</p>")
with open("out.pdf", "wb") as f:
    f.write(pdf)

Working HTML-to-PDF in Python, 15 lines.

Async version (for FastAPI, aiohttp, asyncio)

from playwright.async_api import async_playwright

async def html_to_pdf(html: str) -> bytes:
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.set_content(html, wait_until="networkidle")
            return await page.pdf(
                format="A4",
                margin={"top": "20mm", "bottom": "20mm", "left": "15mm", "right": "15mm"},
                print_background=True,
            )
        finally:
            await browser.close()

Use sync_playwright in sync code (Flask, scripts); async_playwright in async code (FastAPI, Starlette).

Browser pool pattern

Launching Chromium costs 500-1500ms. For any real throughput, reuse the browser:

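import asyncio
from playwright.async_api import async_playwright, Browser, Playwright

_playwright: Playwright | None = None
_browser: Browser | None = None
_request_count = 0
_lock = asyncio.Lock()
RECYCLE_AFTER = 1000

async def get_browser() -> Browser:
    global _playwright, _browser, _request_count
    async with _lock:
        if _browser is None or _request_count >= RECYCLE_AFTER:
            if _browser:
                # Note: this also closes pages still rendering on the old browser.
                await _browser.close()
            if _playwright is None:
                # Start the Playwright driver once; starting a new one on every
                # recycle would leave the previous driver running.
                _playwright = await async_playwright().start()
            _browser = await _playwright.chromium.launch(args=["--no-sandbox", "--disable-dev-shm-usage"])
            _request_count = 0
        _request_count += 1
        return _browser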

async def html_to_pdf(html: str) -> bytes:
    browser = await get_browser()
    page = await browser.new_page()
    try:
        await page.set_content(html, wait_until="networkidle")
        return await page.pdf(format="A4", print_background=True)
    finally:
        await page.close()

Fresh page per request, recycle browser every 1000 requests. For higher throughput, run several worker processes (uvicorn/gunicorn workers).
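
Reusing one browser does not by itself cap how many pages render at once, and each open page costs memory. A minimal sketch of one way to add that cap, assuming an asyncio.Semaphore around whatever async renderer you use (the name limit_concurrency and the default of 4 are illustrative, not part of Playwright):

```python
import asyncio
from typing import Awaitable, Callable, Optional

Renderer = Callable[[str], Awaitable[bytes]]

def limit_concurrency(render: Renderer, max_concurrent: int = 4) -> Renderer:
    """Wrap an async renderer so at most max_concurrent calls run at once."""
    semaphore: Optional[asyncio.Semaphore] = None

    async def limited(html: str) -> bytes:
        nonlocal semaphore
        if semaphore is None:
            # Created lazily so it belongs to the running event loop.
            semaphore = asyncio.Semaphore(max_concurrent)
        async with semaphore:
            return await render(html)

    return limited
```

Usage would look like render = limit_concurrency(html_to_pdf, max_concurrent=4), then await render(html) in the handler. The right limit depends on your CPU and memory budget, so treat 4 as a starting point to measure against.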

Playwright in Docker

FROM mcr.microsoft.com/playwright/python:v1.47.0-jammy

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "server.py"]

The Microsoft-maintained image has Chromium and all system deps preinstalled. Alternative: build your own Debian image and run playwright install chromium --with-deps in it.

Option 2: WeasyPrint

WeasyPrint is a pure-Python HTML-to-PDF renderer. No Chromium, no browser binary to ship, no Docker complexity. It parses HTML+CSS and produces PDF directly.

Install

pip install weasyprint

On Linux you may need system libraries such as libcairo and libpango. The WeasyPrint install docs cover platform specifics.

Example

from weasyprint import HTML

def html_to_pdf(html: str) -> bytes:
    return HTML(string=html).write_pdf()

pdf = html_to_pdf("""
<!doctype html>
<html>
<head>
  <style>
    @page { size: A4; margin: 20mm; }
    body { font-family: sans-serif; }
  </style>
</head>
<body>
  <h1>Invoice #2041</h1>
  <p>Total $1,284</p>
</body>
</html>
""")
with open("out.pdf", "wb") as f:
    f.write(pdf)

Five lines. No async complexity.

What WeasyPrint is good at

  • CSS @page rules — actually better than Chromium here, because WeasyPrint implements the full Paged Media Level 3 spec including margin boxes (@top-left { content: ... }).
  • Simple layout — paragraphs, tables, lists, basic positioning all work well.
  • Reproducible output — pure Python means no “works on my machine” font issues.
  • Low memory footprint — ~50MB RSS vs Chromium’s 500MB+.
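
The margin-box support mentioned above can be sketched as a stylesheet that puts a running header and "Page X of Y" footer on every page. This is a minimal example, assuming WeasyPrint is installed; the constant and function names and the header text are illustrative:

```python
# Page margin boxes: content placed in the page margins, repeated on every page.
PAGED_CSS = """
@page {
  size: A4;
  margin: 25mm 15mm;
  @top-left { content: "Acme Software Inc."; font-size: 9pt; }
  @bottom-right {
    content: "Page " counter(page) " of " counter(pages);
    font-size: 9pt;
  }
}
"""

def html_to_pdf_with_page_numbers(html: str) -> bytes:
    # Imported lazily so this module loads even where WeasyPrint is missing.
    from weasyprint import HTML, CSS

    return HTML(string=html).write_pdf(stylesheets=[CSS(string=PAGED_CSS)])
```

With Chromium the usual route to the same result is the header and footer templates of page.pdf(), which are configured outside the document's CSS.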

What WeasyPrint is not good at

  • Modern CSS layout — no display: grid, no container queries, half-baked flex. Anything past CSS 2.1 is inconsistent.
  • JavaScript — none. If your HTML depends on JS rendering, WeasyPrint shows what the server sent, not what the user sees.
  • Web fonts — supports @font-face, but font fetching is synchronous and slower than Chromium.
  • Complex typography — no variable fonts, limited OpenType features, fewer kerning options than a browser.

When WeasyPrint wins

For server-rendered, statically-styled documents (invoices, receipts, government forms, reports), WeasyPrint is excellent. Pure Python, fast, deterministic. No Docker pains.

For anything with modern frontend layout (Tailwind, utility CSS, flex-heavy designs, JS-rendered content), WeasyPrint will frustrate you. Use Chromium-based.
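
If one service has to handle both kinds of document, the split above can be approximated with a crude pre-check on the HTML. This is only a sketch: the patterns are hypothetical heuristics, they only see inline markup and styles (not linked stylesheets), and pick_engine is an illustrative name.

```python
import re

# Hypothetical signals that a document needs a real browser engine.
_NEEDS_BROWSER = [
    re.compile(r"<script\b", re.IGNORECASE),                     # JS-rendered content
    re.compile(r"display\s*:\s*(inline-)?grid", re.IGNORECASE),  # CSS grid layout
    re.compile(r"@container\b", re.IGNORECASE),                  # container queries
]

def pick_engine(html: str) -> str:
    """Return "chromium" if the HTML looks browser-dependent, else "weasyprint"."""
    if any(pattern.search(html) for pattern in _NEEDS_BROWSER):
        return "chromium"
    return "weasyprint"
```

A check like this is a routing hint, not a guarantee; comparing the output of both engines on your real templates is the more reliable test.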

WeasyPrint example: invoice

from weasyprint import HTML, CSS

CSS_TMPL = """
@page { size: Letter; margin: 20mm 15mm; }
body { font-family: 'Noto Sans', sans-serif; font-size: 11pt; }
.head { display: flex; justify-content: space-between; margin-bottom: 24pt; }
.head .brand { font-weight: 700; font-size: 20pt; }
table { width: 100%; border-collapse: collapse; margin: 16pt 0; }
th, td { padding: 6pt 8pt; text-align: left; border-bottom: 0.5pt solid #e4e4ec; }
td.num { text-align: right; font-variant-numeric: tabular-nums; }
.total { margin-top: 16pt; text-align: right; font-weight: 700; font-size: 13pt; }
"""

def render_invoice(inv):
    html = f"""<!doctype html>
    <html><head></head><body>
      <div class="head">
        <div class="brand">Acme Software Inc.</div>
        <div>Invoice {inv['number']}<br/>{inv['date']}</div>
      </div>
      <table>
        <thead><tr><th>Item</th><th class="num">Qty</th><th class="num">Total</th></tr></thead>
        <tbody>
        {"".join(f"<tr><td>{l['desc']}</td><td class='num'>{l['qty']}</td><td class='num'>${l['total']:.2f}</td></tr>" for l in inv['lines'])}
        </tbody>
      </table>
      <div class="total">Total ${inv['total']:.2f}</div>
    </body></html>"""
    return HTML(string=html).write_pdf(stylesheets=[CSS(string=CSS_TMPL)])

Works well. Would also work with Playwright; the pick is operational preference.
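
One caveat with building HTML through f-strings, whichever renderer you choose: values such as line descriptions go into the markup unescaped. A minimal sketch of escaping them with the standard library, assuming the same line dictionaries (desc, qty, total) as the invoice example; render_rows is an illustrative helper name:

```python
from html import escape

def render_rows(lines: list[dict]) -> str:
    """Build the <tr> rows with user-supplied text HTML-escaped."""
    return "".join(
        "<tr>"
        f"<td>{escape(str(line['desc']))}</td>"
        f"<td class='num'>{int(line['qty'])}</td>"
        f"<td class='num'>${float(line['total']):.2f}</td>"
        "</tr>"
        for line in lines
    )
```

A template engine with autoescaping enabled gives the same protection without manual calls.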

Option 3: Managed API

If you don’t want to operate Chromium yourself and WeasyPrint can’t handle your CSS, a managed API is a 30-line integration using requests:

import os, time, requests

QPDF_BASE = "https://login.21pdf.com/v1"
QPDF_KEY = os.environ["QPDF_KEY"]
HEADERS = {"Authorization": f"Bearer {QPDF_KEY}"}

def html_to_pdf(html: str) -> bytes:
    # 1. Submit render job.
    sub = requests.post(
        f"{QPDF_BASE}/convert",
        headers={**HEADERS, "Content-Type": "application/json"},
        json={
            "html": html,
            "options": {
                "page_size": "A4",
                "margin_top": 20, "margin_bottom": 20,
                "margin_left": 15, "margin_right": 15,
                "wait_for_network_idle": True,
            },
        },
    )
    sub.raise_for_status()
    job_id = sub.json()["job_id"]

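    # 2. Poll until complete (60 polls, 0.5s apart).
    for _ in range(60):
        time.sleep(0.5)
        poll = requests.get(f"{QPDF_BASE}/jobs/{job_id}", headers=HEADERS)
        poll.raise_for_status()
        st = poll.json()
        if st["status"] == "succeeded":
            break
        if st["status"] == "failed":
            raise RuntimeError(st.get("message", "render failed"))
    else:
        # No terminal status after all polls: do not attempt the download.
        raise TimeoutError(f"render job {job_id} did not finish in time")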

    # 3. Download PDF bytes.
    pdf = requests.get(f"{QPDF_BASE}/jobs/{job_id}/download", headers=HEADERS)
    pdf.raise_for_status()
    return pdf.content

Works with any Python environment that has requests. No system deps, no Chromium, no fonts to bundle, no Docker complexity.
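
The fixed 0.5-second polling interval is the simplest choice. A sketch of a variant with a deadline and exponential backoff follows, written against a callable so it is independent of the HTTP client. Only the "succeeded" and "failed" status values come from the example above; wait_for_job, its parameters and the default delays are assumptions for illustration.

```python
import time
from typing import Callable

def wait_for_job(
    fetch_status: Callable[[], dict],
    timeout_s: float = 30.0,
    initial_delay_s: float = 0.25,
    max_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status() until the job succeeds, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status["status"] == "succeeded":
            return status
        if status["status"] == "failed":
            raise RuntimeError(status.get("message", "render failed"))
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {status['status']!r} after {timeout_s}s")
        sleep(delay)
        # Double the wait each round, up to max_delay_s.
        delay = min(delay * 2, max_delay_s)
```

In the sync client above, fetch_status could be a lambda that performs the GET on the job URL and returns the parsed JSON. Short renders are picked up quickly while long ones generate fewer requests.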

Async version for FastAPI

import asyncio, os, httpx

QPDF_BASE = "https://login.21pdf.com/v1"
QPDF_KEY = os.environ["QPDF_KEY"]
HEADERS = {"Authorization": f"Bearer {QPDF_KEY}"}

async def html_to_pdf(html: str) -> bytes:
    async with httpx.AsyncClient(timeout=90) as client:
        sub = await client.post(
            f"{QPDF_BASE}/convert", headers=HEADERS,
            json={
                "html": html,
                "options": {"page_size": "A4", "wait_for_network_idle": True},
            },
        )
        sub.raise_for_status()
        job_id = sub.json()["job_id"]

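        for _ in range(60):
            await asyncio.sleep(0.5)
            poll = await client.get(f"{QPDF_BASE}/jobs/{job_id}", headers=HEADERS)
            poll.raise_for_status()
            st = poll.json()
            if st["status"] == "succeeded":
                break
            if st["status"] == "failed":
                raise RuntimeError(st.get("message", "render failed"))
        else:
            # No terminal status after all polls: do not attempt the download.
            raise TimeoutError(f"render job {job_id} did not finish in time")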

        pdf = await client.get(f"{QPDF_BASE}/jobs/{job_id}/download", headers=HEADERS)
        pdf.raise_for_status()
        return pdf.content

Option 4: pdfkit

pdfkit is a Python wrapper around wkhtmltopdf. Don’t start new projects here.

  • wkhtmltopdf upstream is dead: the project entered deprecation in 2023. No modern CSS (no flex, no grid), no ongoing security patches.
  • If you have a legacy pdfkit integration, budget migration to Playwright or a managed API. The longer you wait, the more weird CSS edge cases you’ll discover that wkhtmltopdf renders differently than any real browser.

Don’t mistake pdfkit-the-Python-library for pdfkit (Node.js PDF construction library) or qpdf (the C++ PDF manipulation CLI). Different projects.

Framework integrations

Flask

from flask import Flask, Response
from weasyprint import HTML  # or your preferred option

# render_invoice_html() and load_invoice() stand in for your app's own helpers.

app = Flask(__name__)

@app.route("/invoices/<invoice_id>/pdf")
def invoice_pdf(invoice_id):
    html = render_invoice_html(load_invoice(invoice_id))
    pdf = HTML(string=html).write_pdf()
    return Response(
        pdf,
        mimetype="application/pdf",
        headers={"Content-Disposition": f'inline; filename="invoice-{invoice_id}.pdf"'},
    )

FastAPI

from fastapi import FastAPI
from fastapi.responses import Response

app = FastAPI()

@app.get("/invoices/{invoice_id}/pdf")
async def invoice_pdf(invoice_id: str):
    invoice = await load_invoice(invoice_id)
    html = render_invoice_html(invoice)
    pdf = await html_to_pdf(html)  # using any option above
    return Response(
        content=pdf,
        media_type="application/pdf",
        headers={"Content-Disposition": f'inline; filename="invoice-{invoice_id}.pdf"'},
    )
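
Both handlers above interpolate invoice_id straight into the Content-Disposition header. A small sketch of building that header from a sanitized name, assuming ASCII-only filenames are acceptable; pdf_content_disposition is an illustrative helper, not a Flask or FastAPI API:

```python
import re

def pdf_content_disposition(name: str, inline: bool = True) -> str:
    """Build a Content-Disposition value with a conservative ASCII filename."""
    # Replace anything outside a safe set: quotes, slashes, CR/LF, spaces, etc.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", name).strip("._") or "document"
    kind = "inline" if inline else "attachment"
    return f'{kind}; filename="{safe}.pdf"'
```

For example, headers={"Content-Disposition": pdf_content_disposition(f"invoice-{invoice_id}")} would replace the f-string in either handler.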

Django

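# views.py
from django.http import HttpResponse
from django.shortcuts import get_object_or_404
from django.template.loader import render_to_string
from weasyprint import HTML

from .models import Invoice  # assumption: your app's own Invoice model

def invoice_pdf(request, invoice_id):
    # Return a 404 instead of a 500 when the invoice does not exist.
    invoice = get_object_or_404(Invoice, pk=invoice_id)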
    html = render_to_string("invoice.html", {"invoice": invoice}, request=request)
    pdf = HTML(string=html, base_url=request.build_absolute_uri("/")).write_pdf()
    return HttpResponse(pdf, content_type="application/pdf")

base_url is important in Django — relative asset URLs (/static/...) need a base to resolve against.
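
To see what "a base to resolve against" means, here is a standard-library sketch of the URL resolution involved. The host name and asset paths are made up, and urljoin is used as a stand-in for the renderer's own resolver, which may differ in details:

```python
from urllib.parse import urljoin

def resolve_asset(base_url: str, href: str) -> str:
    """Resolve an asset reference the way a browser resolves relative URLs."""
    return urljoin(base_url, href)

base = "https://example.com/"  # what request.build_absolute_uri("/") might return

print(resolve_asset(base, "/static/css/invoice.css"))
print(resolve_asset(base, "img/logo.png"))
```

Without a base URL, a reference like /static/css/invoice.css has no scheme or host, so the renderer has nothing to fetch it from.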

Jinja2 template → PDF

Any of the three viable options works with a rendered Jinja2 template:

from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("invoice.html")

html = template.render(invoice=invoice_data)
pdf = html_to_pdf(html)  # Playwright / WeasyPrint / 21pdf

Performance reference

Rough numbers on a modern laptop (M2 Pro, warmed up):

Scenario                            Time          RAM
Playwright cold start               800-1500ms    ~600MB
Playwright warm (new page)          50-150ms      -
WeasyPrint (simple HTML)            80-200ms      ~50MB
WeasyPrint (complex HTML + fonts)   300-800ms     ~80MB
21pdf API round-trip (warm)         300-800ms     negligible
pdfkit + wkhtmltopdf                400-1200ms    ~200MB

WeasyPrint’s low memory footprint makes it attractive for cost-sensitive deployments. Playwright trades RAM for full CSS support. Managed API trades dollars for zero operational surface.
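
These numbers will differ on your hardware and templates, so it is worth measuring your own. A minimal timing harness, assuming a synchronous render function that takes an HTML string and returns bytes; benchmark and its defaults are illustrative:

```python
import statistics
import time
from typing import Callable

def benchmark(
    render: Callable[[str], bytes], html: str, runs: int = 20, warmup: int = 3
) -> dict:
    """Time repeated renders and report median, p95 and max in milliseconds."""
    for _ in range(warmup):
        render(html)  # warm caches (fonts, browser) before measuring
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        render(html)
        samples_ms.append((time.perf_counter() - start) * 1000)
    samples_ms.sort()
    p95_index = min(len(samples_ms) - 1, int(0.95 * len(samples_ms)))
    return {
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "max_ms": samples_ms[-1],
    }
```

Run it with a representative document rather than a one-line test page; fonts, images and table length tend to dominate render time.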

Async gotcha

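If you call sync_playwright() inside an async context (like a FastAPI handler), Playwright refuses to start: the sync API detects the running asyncio loop and raises an error telling you to use the async API. Other blocking renderers called directly in an async handler do run, but they stall the event loop for the whole render, undoing the async benefit.

# BAD: sync API inside a running event loop
@app.get("/pdf")
async def pdf_handler():
    with sync_playwright() as p:      # raises: Sync API inside the asyncio loop
        browser = p.chromium.launch()
        # ...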

# GOOD — use async_playwright
@app.get("/pdf")
async def pdf_handler():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        # ...

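# ALSO GOOD: offload sync work to a thread
import asyncio

@app.get("/pdf")
async def pdf_handler():
    # sync_playwright_render(html) -> bytes is your own blocking helper;
    # html is whatever markup the handler built for this request.
    pdf = await asyncio.to_thread(sync_playwright_render, html)
    return Response(content=pdf, media_type="application/pdf")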

WeasyPrint is sync-only; wrap it in asyncio.to_thread() inside async handlers.
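
The thread-offload pattern can be sketched without any PDF library. Here render_sync is a stand-in that sleeps instead of rendering, so the names and the 50ms delay are illustrative only:

```python
import asyncio
import time

def render_sync(html: str) -> bytes:
    # Stand-in for a blocking renderer such as HTML(string=html).write_pdf().
    time.sleep(0.05)
    return b"%PDF-stub " + html.encode()

async def render_async(html: str) -> bytes:
    # The blocking call runs in a worker thread; the event loop stays free.
    return await asyncio.to_thread(render_sync, html)

async def main() -> list[bytes]:
    # Four renders overlap instead of running back to back.
    return await asyncio.gather(*(render_async(f"<p>{i}</p>") for i in range(4)))

if __name__ == "__main__":
    pdfs = asyncio.run(main())
    print(len(pdfs), "documents rendered")
```

asyncio.to_thread needs Python 3.9 or newer and uses the default thread pool, so CPU-heavy renders still compete for cores; separate worker processes remain the way to scale beyond that.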

Which option should you pick?

  • Simple reports, plain CSS, no JS? WeasyPrint. Pure Python, low memory, fast.
  • Modern CSS (flex, grid, container queries), web fonts, JS rendering? Playwright or managed API.
  • Serverless Python (Lambda, Cloud Functions)? Managed API. Getting Chromium into a serverless function from Python is more painful than from Node.
  • High volume self-hosted? Playwright with the browser pool pattern, or a self-hosted Gotenberg deployment that you call from Python via HTTP.
  • Low-to-moderate volume, don’t want to think about Chromium? Managed API.

The general considerations — @page support, SSRF hardening, wait conditions, async job models — apply across all Python options. See the HTML-to-PDF API guide for the Python-independent picture.

— 21pdf Engineering

Frequently asked questions

What is the best Python library for HTML to PDF conversion?

For production: Playwright (if you're OK operating Chromium) or a managed HTML-to-PDF API (if you're not). WeasyPrint is viable for simple reports but lacks modern CSS (no flex, no grid, no container queries). Skip pdfkit + wkhtmltopdf — the engine is unmaintained.

Is WeasyPrint a good choice for HTML to PDF in Python?

For simple, print-oriented documents with straightforward CSS, yes — it's pure Python, no binary deps, reasonable output. For anything using modern layout (flex, grid), JavaScript-rendered content, or complex fonts, it will disappoint. It's ~30% of what Chromium does.

Can I use Puppeteer from Python?

There's a port called Pyppeteer, but it's lightly maintained. Use Playwright's Python binding instead — same underlying Chromium, officially maintained by Microsoft, works the same way.

How do I use Playwright for PDF generation in Python?

pip install playwright, then playwright install chromium. Launch browser, new_page(), set_content() with wait_until='networkidle', call page.pdf(). 15 lines total. Example in this post.

Does WeasyPrint support JavaScript?

No. WeasyPrint parses HTML+CSS only — no JS execution. Pages that fetch data after load or render to canvas will produce blank PDFs. Use a Chromium-based solution (Playwright, managed API) if your HTML depends on JS.

Can I generate PDFs from Flask or FastAPI?

Yes: render your template to an HTML string, pass it to any of the options in this post, and return the PDF bytes as a Response with Content-Type application/pdf. Working examples are in the Framework integrations section above.