pup


HTML query CLI for selecting nodes with CSS selectors and emitting matching markup, text, attributes, or JSON.

$ brew install https://raw.githubusercontent.com/EricChiang/pup/master/pup.rb
Language: Go
Stars: 8,399
Category: Data Processing
Agent: Ready
Agent Compatibility
JSON Output
Agent Skill
MCP Support
AI Analysis

pup is a small HTML parsing CLI that reads markup from stdin or a file, applies CSS selectors, and prints the matching nodes. It is useful for lightweight scraping, inspection, and preprocessing when you already have the HTML and do not need a browser session.

What It Enables
  • Extract specific HTML fragments, text content, attribute values, or match counts from fetched pages or saved documents.
  • Turn selected nodes into a simple JSON structure for downstream parsing in shell pipelines.
  • Pretty-print messy markup or narrow a large page down to the subsection another tool or script should inspect next.
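
As a minimal sketch of these extraction modes, the shell functions below wrap pup's text{} and attr{} display functions and its -n (count) flag. The sample HTML, the `#links a` selector, and the function names are illustrative assumptions, not part of pup, and the sketch assumes pup is installed and on PATH.

```shell
# Hypothetical sample document, used only for illustration.
sample_html='<ul id="links"><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>'

# Print the text content of each matching anchor.
link_text() { pup '#links a' 'text{}'; }

# Print the href attribute value of each matching anchor.
link_hrefs() { pup '#links a' 'attr{href}'; }

# Print how many nodes match the selector (-n flag).
link_count() { pup -n '#links a'; }
```

Each function reads HTML on stdin, for example `printf '%s\n' "$sample_html" | link_hrefs`.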
Agent Fit
  • Stdin or file input, explicit flags, and selector-based queries make it easy to compose with curl, saved fixtures, and follow-up shell steps.
  • The json{} display function emits machine-readable output, but its schema is limited to node tags, attributes, text, comments, and nested children.
  • Best as a lightweight HTML extraction primitive inside a larger fetch and parse workflow, not as a complete web interaction surface.
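
The composition described above might look like the following sketch: curl fetches the page and pup parses it from stdin, or pup reads a saved fixture through its -f flag. The function names are hypothetical, the URL or path is supplied by the caller, and both curl and pup are assumed to be installed.

```shell
# Fetch a page and emit its <title> node as JSON.
# curl does the fetching; pup only parses what arrives on stdin.
title_json_from_url() {
  curl -fsSL "$1" | pup 'title' 'json{}'
}

# Run the same query against a saved HTML file using pup's -f flag.
title_json_from_file() {
  pup -f "$1" 'title' 'json{}'
}
```

Because the result is JSON, a follow-up step can hand it to any JSON-aware tool instead of matching on raw text.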
Caveats
  • It only processes static HTML you provide; it does not fetch pages, run JavaScript, or maintain login state.
  • The project README contains at least one stale note about the shape of json{} output, so the source code is the better reference for edge cases.