Ollama

ValorIDE can use Ollama as a local model provider for private, offline, or low-latency workflows. A typical setup pairs a fast instruction-tuned model with streaming enabled and a request timeout long enough to cover model loading and the first token.

Quick Start

Install Ollama, start the server, and pull a model. The serve command runs in the foreground, so run the pull command from a second terminal:

ollama serve
ollama pull mistral

Configure ValorIDE with the local Ollama endpoint:

{
  "apiProvider": "ollama",
  "ollamaBaseUrl": "http://localhost:11434",
  "ollamaModelId": "mistral",
  "ollamaRequestTimeout": "120000",
  "ollamaKeepAlive": "10m"
}
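Before pasting a configuration like the one above, it can help to sanity-check it. The following is a minimal sketch, not ValorIDE's own validation: the function name `check_ollama_config` and the specific rules (an http(s) base URL, a non-empty model ID, a positive integer timeout) are assumptions chosen for illustration.

```python
import json
from urllib.parse import urlparse


def check_ollama_config(raw: str) -> list[str]:
    """Return a list of problems found in a JSON config fragment (empty if none).

    The rules here are illustrative assumptions, not ValorIDE's actual checks.
    """
    cfg = json.loads(raw)
    problems = []

    # The base URL should parse as http(s) with a host part.
    url = urlparse(cfg.get("ollamaBaseUrl", ""))
    if url.scheme not in ("http", "https") or not url.hostname:
        problems.append("ollamaBaseUrl must be an http(s) URL")

    # An empty or missing model ID cannot be resolved by Ollama.
    if not cfg.get("ollamaModelId"):
        problems.append("ollamaModelId is empty")

    # The timeout is only checked when present; accept "120000" or 120000.
    if "ollamaRequestTimeout" in cfg:
        timeout = str(cfg["ollamaRequestTimeout"])
        if not timeout.isdecimal() or int(timeout) <= 0:
            problems.append("ollamaRequestTimeout must be a positive integer")

    return problems
```

Calling the function with the Quick Start configuration as a string should return an empty list; each returned string names one field to fix.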

Model Selection

Start with a small or mid-sized model before moving to larger models.

| Model | Typical Use | Resource Profile |
| --- | --- | --- |
| mistral | General coding and chat | Fast on most developer machines |
| neural-chat | Lightweight instruction following | Good for lower-memory machines |
| phi | Small tasks and constrained hardware | Very small, lower capability ceiling |
| Larger Llama-family models | Higher-quality reasoning when hardware allows | Requires significantly more memory |

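The first response may be slow because Ollama loads the model into memory on the first request. The keep-alive setting (ollamaKeepAlive, set to 10m in the Quick Start example) keeps the model loaded between requests so that subsequent responses start faster.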
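One way to avoid a slow first response is to load the model before opening ValorIDE. Ollama's HTTP API loads a model when `/api/generate` receives a request without a prompt, and its `keep_alive` field controls how long the model stays in memory afterwards. The sketch below uses only that API; the helper names are hypothetical, and whether ValorIDE's `ollamaKeepAlive` setting maps directly onto `keep_alive` is an assumption.

```python
import json
import urllib.request


def preload_request(base_url: str, model: str, keep_alive: str) -> urllib.request.Request:
    """Build a POST to /api/generate with no prompt, which asks Ollama to load the model."""
    body = json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(base_url: str, model: str, keep_alive: str, timeout_s: float = 120.0) -> int:
    """Send the preload request and return the HTTP status code.

    Requires a running Ollama server; raises urllib.error.URLError otherwise.
    """
    req = preload_request(base_url, model, keep_alive)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        resp.read()  # drain the short JSON reply
        return resp.status
```

For example, `preload("http://localhost:11434", "mistral", "10m")` would warm the model used in the Quick Start, assuming the server is running and the model has been pulled.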

Timeout Strategy

ValorIDE streams Ollama responses and tolerates pauses between chunks. Use a larger request timeout when:

  • the model is large
  • the machine is memory constrained
  • the first token takes longer than expected
  • the prompt includes a large project context

Example, raising the timeout from the Quick Start value of 120000 to 180000:

{
  "ollamaRequestTimeout": "180000"
}
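To see how a streaming client interacts with a timeout, the sketch below reads Ollama's streaming output from `/api/generate`, which is one JSON object per line with a `response` text fragment and a `done` flag. This is a generic Python client, not ValorIDE's implementation. The function names are hypothetical, and treating the timeout value as milliseconds is an assumption based on the magnitude of the examples above.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_response_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from a newline-delimited JSON stream."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines
        chunk = json.loads(raw)
        if "error" in chunk:
            raise RuntimeError(chunk["error"])
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break  # the final object marks the end of the stream


def stream_generate(base_url: str, model: str, prompt: str, timeout_ms: int) -> Iterator[str]:
    """Stream a completion from a running Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen's timeout applies to each blocking socket operation, so it bounds
    # the wait for the first bytes and each pause between chunks, not the total
    # duration of the response.
    with urllib.request.urlopen(req, timeout=timeout_ms / 1000) as resp:
        yield from iter_response_text(resp)
```

With this client, a long generation does not hit the timeout as long as chunks keep arriving; only a single stall longer than the limit does, which is why slow model loading and large prompts are the cases that call for a larger value.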

Troubleshooting

Connection Refused

Start the Ollama server:

ollama serve
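To confirm that the server is listening before changing any ValorIDE settings, a plain TCP connection test against the configured base URL is usually enough. This is a minimal sketch with hypothetical helper names; it checks only that something accepts connections on the port, not that it is Ollama.

```python
import socket
from urllib.parse import urlparse


def endpoint_of(base_url: str) -> tuple[str, int]:
    """Extract (host, port) from a base URL, falling back to the scheme's default port."""
    url = urlparse(base_url)
    default_port = 443 if url.scheme == "https" else 80
    return url.hostname or "localhost", url.port or default_port


def is_reachable(base_url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds within the timeout."""
    try:
        with socket.create_connection(endpoint_of(base_url), timeout=timeout_s):
            return True
    except OSError:  # covers connection refused, timeouts, and DNS failures
        return False
```

If `is_reachable("http://localhost:11434")` returns False while the server is running, the base URL in the ValorIDE configuration may point at a different host or port than the one Ollama is bound to.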

Model Not Found

Pull the model before selecting it in ValorIDE:

ollama pull mistral
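To check which models the server already has, Ollama exposes `GET /api/tags`, which returns a JSON object with a `models` list whose entries carry a `name` such as `mistral:latest`. The sketch below compares a model ID against that list; the helper names are hypothetical, and it assumes ValorIDE passes `ollamaModelId` to Ollama unchanged.

```python
import json
import urllib.request


def model_is_pulled(tags_payload: dict, model_id: str) -> bool:
    """Check whether model_id appears in an /api/tags response payload."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    if ":" in model_id:
        return model_id in names
    # Ollama resolves an ID without a tag to the "latest" tag.
    return f"{model_id}:latest" in names


def fetch_tags(base_url: str, timeout_s: float = 5.0) -> dict:
    """Fetch the list of local models from a running Ollama server."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.load(resp)
```

For example, `model_is_pulled(fetch_tags("http://localhost:11434"), "mistral")` should return True once the pull above has finished.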

Slow First Response

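The model may still be loading into memory. Let the first request finish and then retry; later requests should start faster while the model stays loaded. If the first request times out, use a smaller model or increase the request timeout.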

Stream Pauses

Short pauses are expected with local inference. If pauses become frequent, reduce context size, choose a smaller model, or close other memory-heavy applications.
