
Pandoc


Document conversion CLI for turning Markdown, HTML, DOCX, EPUB, notebooks, and other markup formats into HTML, DOCX, slides, ebooks, and PDF output.

$ brew install pandoc
Language: Haskell
Stars: 42,457
Category: Data Processing
Agent: Ready
AI Analysis

Pandoc is a document conversion CLI that reads many markup, publishing, and office formats into a shared document AST, then writes that AST out as HTML, Markdown, DOCX, EPUB, slides, man pages, or PDF (the last via external engines). It is most useful for repeatable content transformation, not interactive editing.

What It Enables
  • Convert source documents between Markdown, HTML, DOCX, EPUB, Jupyter notebooks, wiki formats, slide decks, and other text-centric formats in scripts or CI.
  • Generate publishable outputs such as HTML, DOCX, EPUB, man pages, presentations, and PDF files from source documents plus templates, metadata, citations, and style settings.
  • Resolve citations with the built-in citeproc processor (--citeproc), and apply custom document transformations with Lua filters or JSON AST filters before emitting the target format.
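A JSON AST filter is just a program that reads Pandoc's document tree as JSON on stdin and writes a modified tree to stdout. The sketch below demotes every heading by one level. It assumes Python 3 and the JSON layout used by recent pandoc-types, where a heading is `{"t": "Header", "c": [level, attr, inlines]}`; the function name `shift_headers` is hypothetical. Pandoc already has a built-in `--shift-heading-level-by` option for this exact case, so treat the filter as an illustration of the mechanics rather than a recommended replacement.

```python
#!/usr/bin/env python3
"""Hypothetical JSON AST filter: demote every heading by one level."""
import json
import sys


def shift_headers(node, by=1):
    """Return a copy of a Pandoc JSON AST fragment with Header levels raised by `by`."""
    if isinstance(node, list):
        return [shift_headers(item, by) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Header":
            # Assumed layout: {"t": "Header", "c": [level, attr, inlines]}
            level, attr, inlines = node["c"]
            new_level = min(level + by, 6)  # cap at 6, matching HTML's h1..h6
            return {"t": "Header", "c": [new_level, attr, shift_headers(inlines, by)]}
        # Any other node (Para, Div, ...): rebuild it, descending into its contents.
        return {key: shift_headers(value, by) for key, value in node.items()}
    return node  # strings, numbers, and None pass through unchanged


def main():
    # pandoc --filter sends the whole document as JSON on stdin
    # and passes the target format (e.g. "html") as the first argument.
    doc = json.load(sys.stdin)
    doc["blocks"] = shift_headers(doc["blocks"])
    json.dump(doc, sys.stdout)


# Only act as a filter when a target-format argument is present, as pandoc supplies one.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Possible invocations: `pandoc input.md --filter ./shift_headers.py -o output.html`, or an explicit pipeline such as `pandoc -t json input.md | python3 shift_headers.py html | pandoc -f json -o output.html`.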
Agent Fit
  • Explicit --from and --to flags, stdin/stdout operation, defaults files (--defaults), and listing and help flags such as --list-input-formats, --list-output-formats, and --help make conversion jobs easy to inspect and rerun.
  • JSON support is real but AST-oriented: -t json and -f json expose Pandoc's document tree for filters, while most ordinary conversions emit target documents rather than machine-readable status.
  • Useful for agents that need to normalize content, generate derived artifacts, or apply repeatable document rewrites; less relevant for service control or state inspection tasks.
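For an agent or CI job, the explicit flags above translate naturally into a fixed argument list that can be logged and rerun. The sketch below is one way to do that from Python; the helper names are hypothetical, and placing `--defaults` before the explicit flags assumes that later command-line options override settings from the defaults file. Pandoc writes warnings to stderr, so the runner returns that text for inspection.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[str, Path]


def build_pandoc_command(src: PathLike, dst: PathLike, from_fmt: str, to_fmt: str,
                         defaults: Optional[PathLike] = None) -> List[str]:
    """Assemble an explicit pandoc argv; formats are stated, not inferred from extensions."""
    cmd = ["pandoc"]
    if defaults is not None:
        # Defaults file (YAML) first, assuming the explicit flags that follow take precedence.
        cmd += ["--defaults", str(defaults)]
    cmd += ["--from", from_fmt, "--to", to_fmt, "--output", str(dst), str(src)]
    return cmd


def run_conversion(cmd: List[str]) -> str:
    """Run pandoc and return its stderr (warnings); raises CalledProcessError on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stderr


def supported_formats(direction: str = "input") -> List[str]:
    """Ask the installed pandoc which formats it can read ("input") or write ("output")."""
    flag = "--list-input-formats" if direction == "input" else "--list-output-formats"
    result = subprocess.run(["pandoc", flag], capture_output=True, text=True, check=True)
    return result.stdout.split()
```

For example, `run_conversion(build_pandoc_command("notes.md", "notes.docx", "gfm", "docx"))` converts GitHub-flavored Markdown to DOCX. Checking `supported_formats()` first lets a job fail early if the installed pandoc lacks a reader or writer.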
Caveats
  • PDF generation depends on external engines such as LaTeX, Groff ms, or HTML-based tooling, so unattended environments need those dependencies installed.
  • Conversions can be lossy when the source format expresses more than Pandoc's AST or the target format can represent.
  • Server mode disables filters, PDF output, and HTTP resource fetching.
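Because PDF output depends on an external engine, an unattended job can probe for one before starting the conversion. The sketch below checks `PATH` with `shutil.which` and turns the result into pandoc's `--pdf-engine` flag. The preference order and helper names are hypothetical; whether a LaTeX engine or an HTML-based one suits a given document (templates, fonts, CSS) is a separate decision this sketch does not make.

```python
import shutil
from typing import Callable, List, Optional, Sequence

# Hypothetical preference order: LaTeX engines first, then HTML-to-PDF tools.
PREFERRED_ENGINES = ("xelatex", "lualatex", "pdflatex", "weasyprint", "wkhtmltopdf")


def pick_pdf_engine(preferred: Sequence[str] = PREFERRED_ENGINES,
                    which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the first engine found on PATH, or None if none is installed."""
    for engine in preferred:
        if which(engine):
            return engine
    return None


def pdf_engine_args(engine: Optional[str]) -> List[str]:
    """Turn the probe result into pandoc flags, failing before conversion instead of mid-run."""
    if engine is None:
        raise RuntimeError("no PDF engine on PATH; install a LaTeX distribution or an HTML-to-PDF tool")
    return ["--pdf-engine", engine]
```

A job might then build `["pandoc", "report.md", "-o", "report.pdf"] + pdf_engine_args(pick_pdf_engine())`. The `which` parameter exists so the probe can be replaced in tests without touching the real `PATH`.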