csvkit

Command-line suite for converting tabular files to CSV, transforming CSV data, and querying it with SQL.

$ pip install csvkit
Language: Python
Stars: 6,354
Category: Data Processing
Agent: Ready
Agent Compatibility
JSON Output
Agent Skill
MCP Support
AI Analysis

csvkit is a collection of small commands for moving tabular data into CSV, reshaping it with Unix-style filters, and bridging CSVs to JSON or SQL. It is most useful when you need quick inspection, cleanup, joins, or ad hoc queries without opening a spreadsheet or writing a custom script.

What It Enables
  • Convert Excel, JSON, NDJSON, DBF, GeoJSON, and fixed-width sources into CSV with in2csv, ready for shell pipelines.
  • Inspect, filter, sort, join, stack, clean, and reformat CSV files with single-purpose commands like csvcut, csvgrep, csvjoin, and csvsort.
  • Generate summary stats, emit JSON or GeoJSON, and run ad hoc SQL queries or move CSV data into databases with csvstat, csvjson, csvsql, and sql2csv.
Agent Fit
  • Most commands are non-interactive, accept stdin or file inputs, and write predictable stdout, so they compose cleanly in shell loops and scripts.
  • Structured output is present but uneven across the suite: csvjson emits JSON, GeoJSON, or NDJSON, and csvstat --json returns machine-readable stats, while many other commands stay CSV or text-first.
  • Best fit for small to medium tabular workflows, preprocessing steps, and inspect-then-transform loops, with larger analysis handed off to SQL engines or faster CSV tooling.
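
The composability described above can be sketched from a Python script that chains csvcut, csvgrep, and csvjson through stdin and stdout, then parses the final JSON. This is a minimal sketch, not a recommended integration: the helper names (build_pipeline, run_pipeline) and the sample data are hypothetical, and the commands only run if csvkit is already installed and on PATH.

```python
import json
import shutil
import subprocess


def build_pipeline(columns, match_column, match_text):
    """Return argv lists for a csvcut -> csvgrep -> csvjson chain (hypothetical helper)."""
    return [
        ["csvcut", "-c", ",".join(columns)],                # keep only the named columns
        ["csvgrep", "-c", match_column, "-m", match_text],  # keep rows whose column contains the text
        ["csvjson"],                                        # emit the remaining rows as a JSON array
    ]


def run_pipeline(stages, csv_text):
    """Feed csv_text through each stage in order, passing stdout to the next stdin."""
    data = csv_text
    for argv in stages:
        result = subprocess.run(
            argv, input=data, capture_output=True, text=True, check=True
        )
        data = result.stdout
    return data


# Made-up sample data, for illustration only.
SAMPLE = "name,city,age\nAda,Boston,36\nLin,Denver,41\n"
stages = build_pipeline(["name", "city"], "city", "Boston")

if all(shutil.which(stage[0]) for stage in stages):
    print(json.loads(run_pipeline(stages, SAMPLE)))
else:
    print("csvkit commands not found on PATH; skipping the run")
```

Passing each stage's stdout as the next stage's input mirrors a shell pipe; because only the last stage produces JSON, the intermediate data stays CSV throughout.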
Caveats
  • CSV dialect sniffing and type inference are convenient but can misread edge cases; the docs recommend --snifflimit 0 and --no-inference when you need deterministic parsing.
  • The docs explicitly warn that csvkit reaches its limits on larger files. In particular, csvsql --query works by loading the data into an in-memory SQLite database, so every queried file has to fit in memory.
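
To make the in-memory SQLite caveat concrete, here is a stdlib-only sketch of the general technique: read a CSV, load every row into a `:memory:` database, then query it. It is not csvkit's actual implementation. The function name query_csv and the sample data are hypothetical, and all columns are stored as TEXT, which loosely resembles parsing with type inference turned off.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, table, sql):
    """Load CSV text into an in-memory SQLite table and run one query (illustrative only)."""
    rows = list(csv.reader(io.StringIO(csv_text)))  # the whole file is held in memory
    header, body = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    try:
        # Every column is declared TEXT, so values such as "02134" keep their leading zero.
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Made-up sample data, for illustration only.
SAMPLE = "zip,city\n02134,Boston\n80202,Denver\n"
result = query_csv(SAMPLE, "places", "SELECT zip, city FROM places WHERE city = 'Boston'")
print(result)  # [('02134', 'Boston')]
```

Because the rows are materialized before any query runs, memory use grows with file size, which is why this approach suits small to medium inputs.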