Pandoc

Document conversion CLI for turning Markdown, HTML, DOCX, EPUB, notebooks, and other markup formats into HTML, DOCX, slides, ebooks, and PDF output.

$ brew install pandoc

Language

Haskell

Stars

42,457

Category

Data Processing

Agent

Ready

Agent Compatibility

JSON Output

Agent Skill

MCP Support

AI Analysis

Pandoc is a document conversion CLI that reads many markup, publishing, and office formats into a shared document AST, then writes that AST back out as HTML, Markdown, DOCX, EPUB, slides, man pages, or PDF (the last via external engines). It is most useful when you need repeatable content transformation rather than interactive editing.

What It Enables

Convert source documents between Markdown, HTML, DOCX, EPUB, Jupyter notebooks, wiki formats, slide decks, and other text-centric formats in scripts or CI.
Generate publishable outputs such as HTML, DOCX, EPUB, man pages, presentations, and PDF files from source documents plus templates, metadata, citations, and style settings.
Apply custom document transformations with built-in citeproc, Lua filters, or JSON AST filters before emitting the target format.
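The JSON AST filter route can be sketched in Python without any Pandoc-specific library. The example below is a minimal illustration, not an official API: `walk`, `upper_str`, and `transform` are hypothetical helper names, and the sample document is hand-written in the shape that `pandoc -t json` emits (nodes tagged with `"t"` and carrying content in `"c"`).

```python
import json


def walk(node, fn):
    """Rebuild a JSON AST fragment, applying fn to every tagged node."""
    if isinstance(node, list):
        return [walk(item, fn) for item in node]
    if isinstance(node, dict):
        rewritten = {key: walk(value, fn) for key, value in node.items()}
        # Pandoc AST nodes are objects with a "t" (type) field.
        return fn(rewritten) if "t" in rewritten else rewritten
    return node


def upper_str(node):
    """Example rewrite: upper-case the text of every Str inline."""
    if node["t"] == "Str":
        return {"t": "Str", "c": node["c"].upper()}
    return node


def transform(doc):
    """Rewrite only the document body; metadata and API version are kept."""
    return {**doc, "blocks": walk(doc["blocks"], upper_str)}


# Hand-written sample in the shape of `pandoc -t json` output (assumed version).
sample = {
    "pandoc-api-version": [1, 23, 1],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Str", "c": "hello"},
            {"t": "Space"},
            {"t": "Emph", "c": [{"t": "Str", "c": "world"}]},
        ]}
    ],
}

print(json.dumps(transform(sample)))
```

To use this as a real filter, the script would read the AST from stdin with `json.load(sys.stdin)` and write the result to stdout, since that is the contract for programs passed to `--filter`. For anything beyond simple rewrites, Lua filters avoid the JSON round trip and the external interpreter.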

Agent Fit

Explicit --from and --to flags, stdin/stdout operation, defaults files, and listing and help options (for example --list-input-formats and --list-output-formats) make conversion jobs easy to inspect and rerun.
JSON support is real but AST-oriented: -t json and -f json expose Pandoc's document tree for filters, while most ordinary conversions emit target documents rather than machine-readable status.
Useful for agents that need to normalize content, generate derived artifacts, or apply repeatable document rewrites; less relevant for service control or state inspection tasks.
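As a rough sketch of how an agent might drive Pandoc through explicit flags and stdin/stdout, the wrapper below builds the argument list separately from running it, so the command can be logged or inspected before execution. The function names `build_pandoc_args` and `convert` are hypothetical, and `convert` assumes a `pandoc` binary is on `PATH`.

```python
import subprocess


def build_pandoc_args(src_format, dst_format, defaults_file=None):
    """Return an explicit, reproducible pandoc argument list."""
    args = ["pandoc", f"--from={src_format}", f"--to={dst_format}"]
    if defaults_file is not None:
        # A defaults file keeps the remaining options in one reviewable place.
        args.append(f"--defaults={defaults_file}")
    return args


def convert(text, src_format, dst_format, defaults_file=None):
    """Pipe text through pandoc on stdin and return what it writes to stdout."""
    result = subprocess.run(
        build_pandoc_args(src_format, dst_format, defaults_file),
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Inspect the command without running it.
print(build_pandoc_args("markdown", "html"))
```

Because ordinary conversions emit the target document rather than a status report, the exit code and stderr (available on the raised `CalledProcessError`) are the main signals a caller has that a conversion failed.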

Caveats

PDF generation depends on external engines such as LaTeX, Groff ms, or HTML-based tooling, so unattended environments need those dependencies installed.
Conversions between richer formats can be lossy, because features without an equivalent in the target format are dropped or approximated. Separately, the server mode disables filters, PDF output, and HTTP resource fetching.
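For unattended environments, a pre-flight check along these lines can catch a missing PDF engine before a job is scheduled. This is a sketch under assumptions: the candidate list and its order are illustrative rather than Pandoc's own defaults, and `pick_pdf_engine` and `pdf_args` are hypothetical helper names.

```python
import shutil

# Assumed candidates covering LaTeX, groff, and HTML-based engines; adjust as needed.
PDF_ENGINE_CANDIDATES = [
    "pdflatex", "xelatex", "lualatex", "pdfroff", "weasyprint", "wkhtmltopdf",
]


def pick_pdf_engine(candidates=PDF_ENGINE_CANDIDATES, which=shutil.which):
    """Return the first candidate engine found on PATH, or None."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


def pdf_args(input_path, output_path, engine):
    """Build a pandoc command that names the PDF engine explicitly."""
    return ["pandoc", input_path, "--output", output_path, f"--pdf-engine={engine}"]


engine = pick_pdf_engine()
if engine is None:
    print("No PDF engine found on PATH; install one before requesting PDF output.")
else:
    print("Would run:", " ".join(pdf_args("report.md", "report.pdf", engine)))
```

Passing `which` as a parameter keeps the lookup easy to stub in tests. Finding the binary does not guarantee success, since LaTeX engines in particular may still lack packages that the chosen template requires.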