htmlq

CLI for querying HTML with CSS selectors and extracting matching fragments, text, or attributes in shell pipelines.

$ brew install htmlq
Language: Rust
Stars: 7,504
Category: Data Processing
Agent
AI Analysis

htmlq is a small CLI for selecting parts of an HTML document with CSS selectors and sending the result to stdout. It is useful when a script or agent needs lightweight HTML extraction or cleanup without a browser session.

What It Enables
  • Extract matching elements, text content, or attribute values from HTML read from stdin or a file.
  • Strip unwanted nodes before output so downstream tools see only the fragment you care about.
  • Rewrite relative links against a supplied or detected base URL before passing results into later shell steps.
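
The capabilities above can be sketched with a few invocations. This is a minimal illustration, not a full reference: the sample page and the `example.com` base URL are made up for the example, and the flags (`--attribute`, `--text`, `--remove-nodes`, `--base`) are the ones listed in the htmlq README, so check `htmlq --help` on your installed version.

```shell
# Hypothetical sample page, inlined so the sketch needs no network access.
page='<html><body><h1>Releases</h1><nav><a href="/about">About</a></nav><ul id="downloads"><li><a href="/files/v1.tar.gz">v1</a></li></ul></body></html>'

# Matching elements, printed as HTML fragments.
printf '%s' "$page" | htmlq '#downloads a'

# One attribute value per matching element.
printf '%s' "$page" | htmlq --attribute href '#downloads a'

# Text content only.
printf '%s' "$page" | htmlq --text h1

# Strip unwanted nodes before output. The selector comes first because
# --remove-nodes accepts several values and would otherwise absorb it.
printf '%s' "$page" | htmlq body --remove-nodes nav

# Resolve relative links against a supplied base URL (assumed here).
printf '%s' "$page" | htmlq --base https://example.com '#downloads a'
```

To read from a file instead of stdin, the README lists `--filename`; `--detect-base` is the variant that takes the base URL from the document itself.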
Agent Fit
  • CSS selectors plus stdin/stdout make it easy to drop into fetch, inspect, and follow-up pipeline loops.
  • Output is deterministic plain text or HTML, but there is no JSON mode, so downstream parsing stays string-based.
  • Best for lightweight scraping and preprocessing of static HTML, not for broader browser automation or stateful web workflows.
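
Because the output is line-oriented text rather than JSON, a follow-up step usually reads it one line at a time. The sketch below assumes a made-up page and a hypothetical helper named `extract_links`; a real loop would fetch each link instead of printing it.

```shell
# Hypothetical sample page standing in for fetched HTML.
page='<ul id="downloads"><li><a href="/files/v1.tar.gz">v1</a></li><li><a href="/files/v2.tar.gz">v2</a></li></ul>'

# extract_links SELECTOR: print the href of every match, one per line.
# (Helper name is illustrative, not part of htmlq.)
extract_links() {
  htmlq --attribute href "$1"
}

printf '%s' "$page" | extract_links '#downloads a' |
  while IFS= read -r href; do
    # Skip blank lines so the follow-up step only sees real values.
    [ -n "$href" ] || continue
    printf 'next: %s\n' "$href"
  done
```

Values that may contain newlines or need typed fields are a poor fit for this pattern, which is the practical cost of having no structured output mode.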
Caveats
  • The feature surface is intentionally narrow: selection, text or attribute extraction, node removal, pretty printing, and link rewriting.
  • If a page needs JavaScript execution, login state, or form interaction, another tool has to fetch or render the HTML first; htmlq only processes the markup it is given.