HTML query CLI for selecting nodes with CSS selectors and emitting matching markup, text, attributes, or JSON.
$ brew install https://raw.githubusercontent.com/EricChiang/pup/master/pup.rb
AI Analysis
pup is a small HTML parsing CLI that reads markup from stdin or a file, applies CSS selectors, and prints the matching nodes. It is useful for lightweight scraping, inspection, and preprocessing when you already have the HTML and do not need a browser session.
What It Enables
- Extract specific HTML fragments, text content, attribute values, or match counts from fetched pages or saved documents.
- Turn selected nodes into a simple JSON structure for downstream parsing in shell pipelines.
- Pretty-print messy markup or narrow a large page down to the subsection another tool or script should inspect next.
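A minimal sketch of how the tasks above map onto pup invocations when pup is called from a script. It assumes pup is on PATH and uses the display functions text{}, attr{NAME}, and json{} and the -n (print match count) and -f (read from file) flags described in the pup README. The helper names pup_argv and run_pup are hypothetical, not part of pup.

```python
import subprocess
from typing import Optional


def pup_argv(selector: str, display: Optional[str] = None,
             file: Optional[str] = None, count: bool = False) -> list[str]:
    """Build an argv list for pup (hypothetical helper).

    display is one of pup's display functions, e.g. "text{}",
    "attr{href}" or "json{}"; it is appended to the selector so the
    whole query is passed to pup as a single argument.
    """
    argv = ["pup"]
    if file is not None:
        argv += ["-f", file]   # read HTML from a file instead of stdin
    if count:
        argv.append("-n")      # print only the number of matches
    query = selector if display is None else f"{selector} {display}"
    argv.append(query)
    return argv


def run_pup(argv: list[str], html: Optional[str] = None) -> str:
    """Run pup and return stdout; html is fed on stdin when given.

    Raises subprocess.CalledProcessError if pup exits non-zero.
    """
    result = subprocess.run(argv, input=html, capture_output=True,
                            text=True, check=True)
    return result.stdout


# Query shapes for each task listed above (illustrative selectors):
#   HTML fragments:   pup_argv("div#content")
#   text content:     pup_argv("h1", "text{}")
#   attribute values: pup_argv("a", "attr{href}")
#   match counts:     pup_argv("a", count=True)
#   JSON:             pup_argv("a", "json{}")
```

For example, run_pup(pup_argv("a", "attr{href}"), html) should return the matching href values, one per line, ready for a further shell or script step.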
Agent Fit
- Stdin or file input, explicit flags, and selector-based queries make it easy to compose with curl, saved fixtures, and follow-up shell steps.
- json{} provides real machine-readable output, but the schema is limited to node tags, attributes, text, comments, and nested children.
- Best used as a lightweight HTML extraction primitive inside a larger fetch-and-parse workflow, not as a complete web interaction surface.
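Because json{} carries only tags, attributes, text, comments, and children, downstream code usually has to walk the tree itself. The sketch below assumes the shape shown in pup's README examples: a list of node objects with a "tag" key, optional "text", "comment", and "children" keys, and every other key treated as an attribute. The sample input and the names load_pup_json and iter_nodes are illustrative, not captured pup output; check the pup source for edge cases such as attribute escaping.

```python
import json
from typing import Any, Iterator

# Keys assumed to describe structure; every other key is an attribute.
RESERVED = {"tag", "text", "comment", "children"}


def load_pup_json(raw: str) -> list[dict[str, Any]]:
    """Parse json{} output, wrapping a bare object in a list defensively."""
    data = json.loads(raw)
    return [data] if isinstance(data, dict) else data


def iter_nodes(nodes: list[dict[str, Any]],
               depth: int = 0) -> Iterator[tuple[int, str, dict[str, Any], str]]:
    """Yield (depth, tag, attributes, text) for every node, depth-first."""
    for node in nodes:
        attrs = {k: v for k, v in node.items() if k not in RESERVED}
        yield depth, node.get("tag", ""), attrs, node.get("text", "")
        yield from iter_nodes(node.get("children", []), depth + 1)


# Illustrative input in the assumed json{} shape.
sample = load_pup_json("""
[
  {"tag": "ul", "class": "links", "children": [
    {"tag": "li", "children": [
      {"tag": "a", "href": "/docs", "text": "Docs"}
    ]},
    {"tag": "li", "children": [
      {"tag": "a", "href": "/blog", "text": "Blog"}
    ]}
  ]}
]
""")

# Print an indented outline of the selected subtree.
for depth, tag, attrs, text in iter_nodes(sample):
    print("  " * depth + tag, attrs, text)

# Collect every link target found anywhere in the tree.
hrefs = [attrs["href"] for _, tag, attrs, _ in iter_nodes(sample)
         if tag == "a" and "href" in attrs]
```

Flattening the tree this way keeps the pup step limited to selection, while any reshaping into an application-specific schema happens in ordinary code.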
Caveats
- It only processes static HTML you provide; it does not fetch pages, run JavaScript, or maintain login state.
- The project README has at least one stale behavior note around the json{} output shape, so the source code is a better reference for edge cases.