Ollama

Ollama is an open-source, cross-platform tool for deploying large language models (LLMs) locally, designed to simplify running, managing, and serving LLMs in on-premises environments. It lets users deploy and invoke pre-trained models (e.g., LLaMA, DeepSeek) directly on personal devices, including PCs and edge servers, via simple CLI commands, with no dependency on cloud services or high-end GPU hardware.

Installation

sudo apt update
sudo apt install spacemit-ollama-toolkit

Verify installation:

ollama list

Output showing the column headers NAME, ID, SIZE, and MODIFIED indicates a successful installation.
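On a fresh installation no models are present yet, so the command prints only the header row:

NAME    ID    SIZE    MODIFIED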

Verify the version:

sudo apt show spacemit-ollama-toolkit

Confirm that the version is 0.0.8 or above; this is required for the new model formats and the direct pulling functionality described below.
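To extract just the version field, a one-liner such as the following also works (the version shown is illustrative; yours may be newer):

apt show spacemit-ollama-toolkit 2>/dev/null | grep ^Version
# Expected output (illustrative): Version: 0.0.8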

Pull Models

Method 1: Direct Pull (Recommended)

Starting with spacemit-ollama-toolkit 0.0.8, the q4_K_M and q4_1 model formats are supported. You can use the ollama pull command to fetch q4_K_M models directly from the official Ollama library while retaining the toolkit's acceleration support:

# Directly pull q4_K_M format models (recommended)
ollama pull qwen3:0.6b
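After the pull completes, the model should appear in the local model list:

# Confirm the pulled model is registered
ollama list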

Method 2: Manual Model Creation

q4_0 quantized models also run very efficiently on the K1 development board. As an alternative to pulling, you can create a model manually: download a GGUF model with q4_0 quantization precision from a platform such as ModelScope or Hugging Face and transfer it to your K1 board or MuseBook device.

Below is an example of the model creation workflow:

sudo apt install wget
# Download the q4_0 GGUF model (note: use the resolve/ path for the raw file, not blob/)
wget -P ~/ https://huggingface.co/second-state/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/Qwen2.5-0.5B-Instruct-Q4_0.gguf
# Download the matching modelfile
wget -P ~/ https://archive.spacemit.com/spacemit-ai/modelfile/qwen2.5:0.5b.modelfile
cd ~/
# Register the model with Ollama
ollama create qwen2.5:0.5b -f qwen2.5:0.5b.modelfile
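The modelfile tells ollama create where the GGUF weights live and how the model should be prompted. You can inspect the downloaded file before creating the model; a minimal Ollama Modelfile looks roughly like the sketch in the comments below (illustrative only; the actual contents served by archive.spacemit.com may differ):

cat ~/qwen2.5:0.5b.modelfile
# Illustrative structure of a minimal Modelfile:
#   FROM ./Qwen2.5-0.5B-Instruct-Q4_0.gguf
#   TEMPLATE """{{ .Prompt }}"""
#   PARAMETER stop "<|im_end|>"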

Usage

# Run directly pulled models
ollama run qwen3:0.6b

# Or run manually created models
ollama run qwen2.5:0.5b
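Besides the interactive CLI, the Ollama server exposes a local REST API, by default on port 11434. Assuming the SpacemiT build keeps the standard Ollama API, other applications can call it, for example with curl (the prompt is illustrative):

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:0.6b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'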