5.1.3 Large Language Models (LLM)
Last updated: 11/09/2025
Overview
This section explains how to use Ollama, a local deployment tool for Large Language Models (LLMs). Ollama is an open-source, cross-platform framework that allows you to quickly deploy popular pre-trained LLMs (such as LLaMA, DeepSeek, Qwen) on PCs, edge servers, or other devices. It can run offline, without cloud services or high-performance GPUs.
Install Ollama
Update your system and install the Ollama toolkit:
sudo apt update
sudo apt install spacemit-ollama-toolkit
Verify the installation
ollama list
If the output shows the header row NAME ID SIZE MODIFIED, the installation was successful.
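On a fresh installation with no models downloaded yet, the listing typically shows only the empty header row (exact column spacing may vary):
NAME    ID    SIZE    MODIFIED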
Check version (must be 0.0.8 or higher)
sudo apt show spacemit-ollama-toolkit
Confirm the version is 0.0.8 or higher to support new model formats and direct pull functionality.
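If you only want the version field, one convenient (if unofficial) one-liner is to filter the output with grep; the 2>/dev/null redirect suppresses apt's warning about its CLI output being unstable:
apt show spacemit-ollama-toolkit 2>/dev/null | grep '^Version:'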
Download Models
Method 1: Direct Pull (Recommended, requires version 0.0.8+)
Starting with spacemit-ollama-toolkit version 0.0.8, the q4_K_M and q4_1 model formats are supported. You can use the ollama pull command to pull q4_K_M models directly from the official Ollama website and benefit from the acceleration features:
# Directly pull q4_K_M format models (recommended)
ollama pull qwen3:0.6b
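After the pull completes, you can confirm the model is available locally:
# Confirm the pulled model appears in the local model list
ollama list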
Method 2: Manual Model Creation
Alternatively, you can manually download and create a model.
This method is useful for q4_0 models, which also run well on K1 development boards.
- Download the model file (.gguf) and its corresponding Modelfile (.modelfile).
- In ModelScope, select a .gguf model, download the q4_0 quantized version, and copy both files to your development board or MUSEBOOK.
Below is an example of model creation:
sudo apt install wget
wget -P ~/ https://modelscope.cn/models/second-state/Qwen2.5-0.5B-Instruct-GGUF/resolve/master/Qwen2.5-0.5B-Instruct-Q4_0.gguf
wget -P ~/ https://archive.spacemit.com/spacemit-ai/modelfile/qwen2.5:0.5b.modelfile
cd ~/
ollama create qwen2.5:0.5b -f qwen2.5:0.5b.modelfile
⚠️ Note:
- Adjust the .modelfile content to match the downloaded model's name, path, format, etc.
- For the specific format, check the official Ollama website: https://ollama.com/search
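For orientation, a minimal Modelfile for a GGUF model usually follows the sketch below. The FROM path and the ChatML-style template are illustrative assumptions based on the Qwen chat format; adjust them to the file you actually downloaded and to your model's prompt format:
# Point FROM at the downloaded GGUF file (adjust the path/filename as needed)
FROM ./Qwen2.5-0.5B-Instruct-Q4_0.gguf
# The template and stop token must match the model's chat format;
# the ChatML-style markers below are what Qwen-family models expect.
TEMPLATE """<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"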
Use the Model
# Run directly pulled model
ollama run qwen3:0.6b
# Or run manually created model
ollama run qwen2.5:0.5b
Ollama starts the model and waits for interactive user input; you can also call the model from an application.
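For the application route: a running Ollama instance exposes an HTTP API on localhost port 11434 by default, so a one-shot generation request can be sent with curl (the model name and prompt below are just examples):
# Query the local Ollama HTTP API (default port 11434)
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:0.6b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'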