Ollama

Ollama is an open-source, cross-platform tool for deploying large language models (LLMs) locally. It simplifies running, managing, and serving inference from LLMs in on-premises environments. With a few CLI commands, users can deploy and invoke pre-trained models (e.g., LLaMA, DeepSeek) directly on personal devices, including PCs and edge servers, without depending on cloud services or high-end GPU hardware.

Installation

sudo apt update
sudo apt install spacemit-ollama-toolkit

Verify:

ollama list

If the command prints the column header NAME ID SIZE MODIFIED, the installation succeeded. No models are listed under the header until one has been created.
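The same check can be scripted. The sketch below is a minimal example, not part of the toolkit: the helper name has_list_header is hypothetical, and it only assumes that the first line of the ollama list output contains the four column names in order.

```shell
#!/bin/sh
# Hypothetical helper: reads `ollama list`-style output from stdin and
# succeeds if the first line contains the expected column names in order.
has_list_header() {
    head -n 1 | grep -Eq 'NAME[[:space:]]+ID[[:space:]]+SIZE[[:space:]]+MODIFIED'
}

# Example use on the board (requires the ollama command to be installed):
#   ollama list | has_list_header && echo "ollama is installed and responding"
```

The function reads from stdin rather than calling ollama itself, so it can be tried on saved output as well.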

Pull models

For best performance on the K1 development board, we strongly recommend q4_0 quantized models. Download a GGUF-format model with q4_0 quantization from a platform such as ModelScope or Hugging Face, then transfer it to your K1 board or MuseBook device.

The following example downloads a q4_0 model and its modelfile, then creates the model in Ollama:

sudo apt install wget
wget -P ~/ https://huggingface.co/second-state/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/Qwen2.5-0.5B-Instruct-Q4_0.gguf
wget -P ~/ https://archive.spacemit.com/spacemit-ai/modelfile/qwen2.5:0.5b.modelfile
cd ~/
ollama create qwen2.5:0.5b -f qwen2.5:0.5b.modelfile
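A download can occasionally yield something other than the model itself, for example an HTML page when the URL points to a web view rather than the raw file. GGUF files begin with the four ASCII bytes GGUF, so a quick check of the file before running ollama create can catch this. The sketch below is a minimal example; the helper name is_gguf is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given file starts with the
# 4-byte GGUF magic "GGUF". A missing or unreadable file also fails.
is_gguf() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# Example use on the board:
#   is_gguf ~/Qwen2.5-0.5B-Instruct-Q4_0.gguf || echo "not a GGUF file, download again"
```

This only inspects the file header. It does not verify the quantization type or that the file is complete.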

Usage

ollama run qwen2.5:0.5b
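The command above opens an interactive chat session. For scripted use, Ollama also exposes an HTTP API, and the sketch below sends one non-streaming request to its /api/generate endpoint. It assumes the server is running and listening on Ollama's default address, localhost:11434, which may differ on your installation. The function names generate_payload and ask_ollama are hypothetical.

```shell
#!/bin/sh
# Build the JSON body for Ollama's /api/generate endpoint.
# Simplification: model and prompt are inserted verbatim, so they must not
# contain double quotes or backslashes.
generate_payload() {
    printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send a single non-streaming request to a local Ollama server.
# Assumption: the server listens on the default address localhost:11434.
ask_ollama() {
    curl -s http://localhost:11434/api/generate -d "$(generate_payload "$1" "$2")"
}

# Example use on the board:
#   ask_ollama qwen2.5:0.5b "Why is the sky blue?"
```

The reply is a JSON object whose response field holds the generated text. For a one-off prompt without the API, ollama run also accepts the prompt as a trailing argument, as in ollama run qwen2.5:0.5b "Why is the sky blue?".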