Local model runtime CLI for pulling models, serving a local API with OpenAI-compatible endpoints, creating Modelfile-based variants, and launching supported integrations.
$ curl -fsSL https://ollama.com/install.sh | sh
Agent Compatibility
- JSON Output
- Agent Skill
- MCP Support
AI Analysis
Ollama is a local model runtime and control CLI for pulling models, running them locally or through Ollama Cloud, and exposing them over a local HTTP API. It also packages customized models and can launch supported coding tools against that runtime.
What It Enables
- Pull, run, stop, and inspect local or cloud-backed models from the shell, including interactive chat and embedding generation.
- Start a local Ollama server that exposes native and OpenAI-compatible JSON APIs for chat, embeddings, structured outputs, vision, and tool-calling requests.
- Create and import customized models from Modelfile, Safetensors, or GGUF assets, then launch supported tools like codex or claude against the local runtime.
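As a minimal sketch of the two API surfaces mentioned above, the snippet below builds (but does not send) request objects for the native /api/chat endpoint and the OpenAI-compatible /v1/chat/completions route; the model name, prompt, and default port 11434 are assumptions:

```python
import json
import urllib.request

OLLAMA = "http://localhost:11434"  # assumed default server address

def chat_request(model, prompt):
    """Build a request against Ollama's native /api/chat endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        f"{OLLAMA}/api/chat",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )

def openai_compat_request(model, prompt):
    """Same conversation shape against the OpenAI-compatible /v1 route."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{OLLAMA}/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("llama3.2", "Say hello in one word.")
print(req.full_url)  # http://localhost:11434/api/chat
```

Sending either request with urllib.request.urlopen against a running ollama serve instance returns a JSON response; the OpenAI-compatible route is what lets existing OpenAI client code point at the local runtime by swapping the base URL.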
Agent Fit
- Agents usually get the most value from ollama serve plus the JSON API, with the CLI handling model lifecycle, setup, and simple one-shot runs.
- ollama run accepts piped stdin, supports --format json, and embedding models print a JSON array, so it can participate in shell pipelines even without wrapping the HTTP API.
- Fit is mixed rather than fully deterministic: the default entrypoint is a TUI, model responses are still probabilistic, and unattended use depends on local hardware limits or Ollama account auth for cloud flows.
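The pipeline-style use of ollama run described above can be sketched as follows; the model name is a placeholder, and the subprocess call only fires when the CLI is actually on PATH:

```python
import json
import shutil
import subprocess

def build_cmd(model: str) -> list[str]:
    """Explicit subcommand invocation: no TUI, JSON-constrained output."""
    return ["ollama", "run", model, "--format", "json"]

def run_json(model: str, prompt: str) -> dict:
    """Pipe a prompt into `ollama run` on stdin and parse the JSON reply."""
    result = subprocess.run(
        build_cmd(model),
        input=prompt,          # equivalent to `echo prompt | ollama run ...`
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)

if shutil.which("ollama"):  # skip gracefully on machines without the CLI
    print(run_json("llama3.2", 'Reply with a JSON object {"ok": true}.'))
```

Because --format json constrains the model to emit valid JSON, the stdout can be parsed directly rather than scraped, which is what makes the CLI usable in unattended pipelines.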
Caveats
- Large local models are constrained by available CPU, GPU, memory, and disk; cloud models require sign-in or API-key setup.
- The bare ollama and ollama launch flows are human-oriented Bubble Tea menus, so automation should call explicit subcommands or the API directly.
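As one hedged example of calling the API rather than the menus, an inventory check against the native /api/tags endpoint (default port 11434 assumed) degrades to an empty list when no server is running:

```python
import json
import urllib.error
import urllib.request

def list_local_models(base="http://localhost:11434"):
    """Query /api/tags for installed models; return [] if no server answers."""
    try:
        with urllib.request.urlopen(f"{base}/api/tags", timeout=2) as resp:
            return [m["name"] for m in json.load(resp).get("models", [])]
    except (urllib.error.URLError, OSError):
        return []  # server down or unreachable: empty inventory, no crash

print(list_local_models())
```

A probe like this lets automation verify that ollama serve is up and that a required model is pulled before issuing chat requests, instead of relying on the interactive entrypoint.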