Ollama
ValorIDE can use Ollama as a local model provider for private, offline, or low-latency workflows. The best setup is usually a fast instruction model, streaming enabled, and a timeout long enough for first-token startup.
Quick Start
Install and start Ollama, then pull a model. `ollama serve` stays in the foreground, so run the pull from a second terminal:
ollama serve
ollama pull mistral
Configure ValorIDE with the local Ollama endpoint:
{
"apiProvider": "ollama",
"ollamaBaseUrl": "http://localhost:11434",
"ollamaModelId": "mistral",
"ollamaRequestTimeout": "120000",
"ollamaKeepAlive": "10m"
}
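Before pointing ValorIDE at the endpoint, it can help to confirm that the server answers and that the model has been pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models. The helper names `list_local_models` and `model_is_available` are illustrative and not part of ValorIDE; the base URL and model ID simply mirror the config above.

```python
import json
import urllib.request


def list_local_models(base_url: str = "http://localhost:11434",
                      timeout_s: float = 5.0) -> list[str]:
    """Return the names of locally pulled models via GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout_s) as resp:
        payload = json.load(resp)
    return [entry["name"] for entry in payload.get("models", [])]


def model_is_available(model_id: str, local_models: list[str]) -> bool:
    """Match an untagged ID such as 'mistral' against 'mistral:latest'."""
    wanted = model_id if ":" in model_id else f"{model_id}:latest"
    return wanted in local_models
```

Calling `model_is_available("mistral", list_local_models())` returns `False` if the model has not been pulled yet. If the server is not running, `list_local_models` raises an `OSError` subclass such as `urllib.error.URLError`.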
Model Selection
Start with a small or mid-sized model before moving to larger models.
| Model | Typical Use | Resource Profile |
|---|---|---|
| mistral | General coding and chat | Fast on most developer machines |
| neural-chat | Lightweight instruction following | Good for lower-memory machines |
| phi | Small tasks and constrained hardware | Very small, lower capability ceiling |
| Larger Llama-family models | Higher-quality reasoning when hardware allows | Requires significantly more memory |
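The first response after a model loads may be slower, because Ollama has to read the model into memory. The `ollamaKeepAlive` setting (for example, `10m`) keeps the model loaded between requests, so subsequent requests skip the load step and stay responsive.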
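To show what keep-alive looks like at the API level, here is a minimal sketch that builds a request body for Ollama's `/api/generate` endpoint with a `keep_alive` value. Whether ValorIDE forwards `ollamaKeepAlive` in exactly this form is an assumption, and `build_generate_request` is a hypothetical helper used only for illustration.

```python
import json


def build_generate_request(model: str, prompt: str,
                           keep_alive: str = "10m",
                           stream: bool = True) -> bytes:
    """Encode a JSON body for POST /api/generate.

    keep_alive asks Ollama to keep the model loaded for that long after
    the request finishes, so the next request does not pay the load cost.
    """
    body = {
        "model": model,
        "prompt": prompt,
        "stream": stream,
        "keep_alive": keep_alive,
    }
    return json.dumps(body).encode("utf-8")
```

A longer keep-alive trades memory for latency: the model stays resident, which helps interactive use but leaves less memory for other applications.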
Timeout Strategy
ValorIDE streams Ollama responses and tolerates pauses between chunks. Use a larger request timeout when:
- the model is large
- the machine is memory constrained
- the first token takes longer than expected
- the prompt includes a large project context
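Example, raising the timeout from the Quick Start value of 120000 to 180000 (the values appear to be milliseconds, so two and three minutes):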
{
"ollamaRequestTimeout": "180000"
}
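The idea of streaming while tolerating pauses can be sketched outside ValorIDE as follows. This is not ValorIDE's implementation: it reads Ollama's newline-delimited JSON stream from `/api/generate` with Python's standard library, and it assumes the timeout setting is a millisecond string. The names `timeout_ms_to_seconds`, `iter_response_text`, and `stream_generate` are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def timeout_ms_to_seconds(value: str) -> float:
    """Convert a millisecond setting such as "180000" to seconds (assumed unit)."""
    return int(value) / 1000.0


def iter_response_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from a newline-delimited JSON stream until done."""
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break


def stream_generate(prompt: str, model: str = "mistral",
                    base_url: str = "http://localhost:11434",
                    timeout_ms: str = "180000") -> str:
    """POST /api/generate with streaming on and join the fragments."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    # urllib applies the timeout to each blocking socket operation, so a
    # pause between chunks only fails if one single wait exceeds it.
    with urllib.request.urlopen(req, timeout=timeout_ms_to_seconds(timeout_ms)) as resp:
        return "".join(iter_response_text(resp))
```

In this sketch the timeout mainly has to cover the longest single wait, which is usually the wait for the first token while the model loads.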
Troubleshooting
Connection Refused
Start the Ollama server:
ollama serve
Model Not Found
Pull the model before selecting it in ValorIDE:
ollama pull mistral
Slow First Response
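The model may still be loading into memory. Requests made after the first response are usually faster because the model is already loaded. If the first response still times out, use a smaller model or increase the request timeout.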
Stream Pauses
Short pauses are expected with local inference. If pauses become frequent, reduce context size, choose a smaller model, or close other memory-heavy applications.
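One way to judge whether pauses are becoming frequent is to record when each chunk arrives and look at the gaps. The sketch below is a hypothetical helper, not something ValorIDE exposes, and the 2-second threshold is an arbitrary assumption to adjust for your hardware.

```python
def pause_stats(arrival_times_s: list[float],
                long_pause_s: float = 2.0) -> tuple[float, int]:
    """Return (longest gap, count of gaps above long_pause_s) between chunks."""
    gaps = [later - earlier
            for earlier, later in zip(arrival_times_s, arrival_times_s[1:])]
    if not gaps:
        return 0.0, 0
    return max(gaps), sum(1 for gap in gaps if gap > long_pause_s)
```

Timestamps can be collected with `time.monotonic()` as each chunk is read. A rising count of long gaps across requests suggests memory pressure or an oversized context.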
Related guides: