Sunday, March 30, 2025

Ollama Setup for a small LLM on Linux

Download and install Ollama: 

curl -fsSL https://ollama.com/install.sh | sh

Pull two models (to compare their performance later):

ollama pull llama2
ollama pull gemma:7b

Check that the models are available and the service is running:

ollama list
systemctl status ollama

Let's run the gemma:7b model:

ollama run gemma:7b

ollama show gemma:7b
  Model
    architecture        gemma
    parameters          8.5B
    context length      8192
    embedding length    3072
    quantization        Q4_0

  Parameters
    penalize_newline    false
    repeat_penalty      1
    stop                "<start_of_turn>"
    stop                "<end_of_turn>"

  License
    Gemma Terms of Use
    Last modified: February 21, 2024

ollama ps
NAME        ID              SIZE      PROCESSOR    UNTIL
gemma:7b    a72c7f4d0a15    9.6 GB    100% CPU     4 minutes from now
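The UNTIL column reflects Ollama's keep-alive timer: by default a model is unloaded a few minutes after its last request. The generate API documents a keep_alive parameter to control this. A minimal sketch of building such a request body (the helper function name is mine, not part of any API):

```python
import json

def generate_payload(model, prompt, keep_alive="5m", stream=False):
    """Build a JSON body for Ollama's /api/generate endpoint.

    Per the Ollama API docs, keep_alive controls how long the model
    stays loaded after the request: a duration string like "10m",
    0 to unload immediately, or -1 to keep it loaded indefinitely.
    """
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": stream,
        "keep_alive": keep_alive,
    })

# Ask for a completion, but unload gemma:7b as soon as it finishes:
body = generate_payload("gemma:7b", "Why is Redis so fast?", keep_alive=0)
```

POSTing this body to http://localhost:11434/api/generate (e.g. with curl -d) frees the 9.6 GB of memory right after the answer is produced.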

Query the model through the generate API:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma:7b",
  "prompt": "Why is Redis so fast?",
  "stream": false
}'
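With "stream": false the endpoint returns a single JSON object that, besides the "response" text, carries timing fields such as eval_count (generated tokens) and eval_duration (nanoseconds). Those are handy for the model comparison mentioned earlier. A small sketch; the sample response values below are made up for illustration:

```python
import json

def tokens_per_second(response_json: str) -> float:
    """Compute generation speed from an /api/generate response.

    eval_count and eval_duration are fields documented in the
    Ollama API; eval_duration is in nanoseconds.
    """
    resp = json.loads(response_json)
    return resp["eval_count"] / resp["eval_duration"] * 1e9

# Hypothetical, truncated response for illustration only:
sample = json.dumps({
    "model": "gemma:7b",
    "response": "Redis is fast because it keeps data in memory...",
    "done": True,
    "eval_count": 120,
    "eval_duration": 60_000_000_000,  # 60 s of generation
})
print(tokens_per_second(sample))  # roughly 2 tokens/s on this sample
```

Running the same prompt against llama2 and gemma:7b and comparing this number gives a quick, if rough, throughput benchmark on CPU.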