5.2.3 Large Language Models

Feature Introduction

This section introduces Ollama, a local deployment tool for Large Language Models (LLMs). Ollama is an open-source, cross-platform framework for deploying LLMs locally. It supports rapid deployment of mainstream pre-trained language models (such as LLaMA, DeepSeek, and Qwen) on PCs, edge servers, and other devices, and it can run fully offline without depending on cloud services or high-performance GPUs.

Installation

sudo apt update
sudo apt install spacemit-ollama-toolkit

Verify installation:

ollama list

If the command prints the header line NAME ID SIZE MODIFIED (the list is empty on a fresh installation), the toolkit is installed correctly.

Verify version (ensure version 0.0.8 or above):

sudo apt show spacemit-ollama-toolkit

Confirm that the version is 0.0.8 or above; earlier versions do not support the new model formats or the direct pull functionality.
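
If you prefer a one-line check, the installed version can also be read with dpkg (a minimal sketch; it assumes the package was installed through apt/dpkg as shown above):

# Print only the Version field of the installed package
dpkg -s spacemit-ollama-toolkit | grep '^Version:'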

Download Models

Method 1: Direct Pull (Recommended)

Starting from spacemit-ollama-toolkit version 0.0.8, support for the q4_K_M and q4_1 model formats has been added. You can use the ollama pull command to pull q4_K_M format models directly from the official Ollama website and benefit from the acceleration features:

# Directly pull q4_K_M format models (recommended)
ollama pull qwen3:0.6b
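
After the pull completes, you can confirm the model is available locally with the same listing command used above:

# The pulled model should now appear in the local model list
ollama list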

Method 2: Manual Model Creation

Since q4_0 models also perform very well on the K1 development board, you can alternatively download a model manually and create it in Ollama yourself. On ModelScope, select the GGUF-format model you want, and download its q4_0 quantized version to your development board or MuseBook.

Below is an example of model creation:

sudo apt install wget
wget -P ~/ https://modelscope.cn/models/second-state/Qwen2.5-0.5B-Instruct-GGUF/resolve/master/Qwen2.5-0.5B-Instruct-Q4_0.gguf
wget -P ~/ https://archive.spacemit.com/spacemit-ai/modelfile/qwen2.5:0.5b.modelfile
cd ~/
ollama create qwen2.5:0.5b -f qwen2.5:0.5b.modelfile

⚠️ The .modelfile content must be adapted to match the downloaded model's name, path, and format. For the exact Modelfile syntax, refer to the documentation on the official Ollama website: https://ollama.com/search
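
As a rough illustration, a minimal modelfile for the model downloaded above might look like the following. This is a sketch only; the actual file from archive.spacemit.com may define additional parameters and a chat template, and the sampling value shown here is illustrative:

# Minimal example: point Ollama at the local GGUF file
FROM ./Qwen2.5-0.5B-Instruct-Q4_0.gguf
# Optional sampling parameter (value chosen for illustration)
PARAMETER temperature 0.7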

Usage

# Run directly pulled model
ollama run qwen3:0.6b

# Or run manually created model
ollama run qwen2.5:0.5b

The system starts the model and waits for interactive user input at the prompt, or for calls through the API.
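
For programmatic access, the Ollama server listens on port 11434 by default and exposes an HTTP API. Below is a minimal sketch using curl against the standard /api/generate endpoint, assuming the SpacemiT build keeps the upstream Ollama API and that the qwen3:0.6b model pulled earlier is installed:

# Send a single non-streaming prompt to the local Ollama HTTP API
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:0.6b",
  "prompt": "Hello, who are you?",
  "stream": false
}'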