5.1.3 Large Language Models (LLM)

Last updated: 11/09/2025

Overview

This section explains how to use Ollama, a local deployment tool for Large Language Models (LLMs). Ollama is an open-source, cross-platform framework that allows you to quickly deploy popular pre-trained LLMs (such as LLaMA, DeepSeek, Qwen) on PCs, edge servers, or other devices. It can run offline, without cloud services or high-performance GPUs.

Install Ollama

Update your package lists and install the Ollama toolkit:

sudo apt update
sudo apt install spacemit-ollama-toolkit

Verify the installation

ollama list

If the output shows NAME ID SIZE MODIFIED, the installation was successful.

Check version (must be 0.0.8 or higher)

sudo apt show spacemit-ollama-toolkit

Confirm that the reported version is 0.0.8 or higher; this is required for the new model formats and the direct-pull functionality described below.

Download Models

Method 1: Direct Pull (Recommended)

Starting with spacemit-ollama-toolkit version 0.0.8, the q4_K_M and q4_1 model formats are supported. You can pull q4_K_M models directly from the official Ollama library with the ollama pull command and take advantage of the acceleration features:

# Directly pull q4_K_M format models (recommended)
ollama pull qwen3:0.6b

Method 2: Manual Model Creation

Alternatively, you can download a model manually and create it locally. This method is useful for q4_0 models, which also run well on K1 development boards.

  1. In ModelScope, select a .gguf model and download its q4_0 quantized version.

  2. Download the corresponding Modelfile (.modelfile).

  3. Copy both files to your development board or MUSEBOOK.

Below is an example of model creation:

sudo apt install wget
wget -P ~/ https://modelscope.cn/models/second-state/Qwen2.5-0.5B-Instruct-GGUF/resolve/master/Qwen2.5-0.5B-Instruct-Q4_0.gguf
wget -P ~/ https://archive.spacemit.com/spacemit-ai/modelfile/qwen2.5:0.5b.modelfile
cd ~/
ollama create qwen2.5:0.5b -f qwen2.5:0.5b.modelfile

⚠️ Note:

  • Adjust the .modelfile content to match the downloaded model's name, path, format, etc.
  • For the specific format, see the official Ollama model library: https://ollama.com/search
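For reference, a minimal Modelfile for the example above might look like the sketch below. The FROM path is an assumption based on the downloaded filename; the actual qwen2.5:0.5b.modelfile from the SpacemiT archive may set additional parameters and a chat template, so prefer the downloaded file when available.

```
# Point Ollama at the locally downloaded GGUF weights (path is an assumption;
# it must match where you placed the .gguf file)
FROM ./Qwen2.5-0.5B-Instruct-Q4_0.gguf

# Optional sampling parameter; tune as needed
PARAMETER temperature 0.7
```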

Use the Model

# Run directly pulled model
ollama run qwen3:0.6b

# Or run manually created model
ollama run qwen2.5:0.5b

Ollama will start the model and wait for interactive user input; you can also call the model from an application.
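For calling the model from an application, the Ollama server exposes a REST API on port 11434 by default. Below is a minimal Python sketch (no third-party dependencies) against the /api/generate endpoint; the model name qwen3:0.6b assumes you pulled it as shown above, and the endpoint URL assumes the default local configuration.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes the server is running on this machine)
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> bytes:
    """Build a JSON request body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The non-streaming response is a single JSON object; the generated
        # text is in its "response" field.
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Requires the Ollama server to be running and the model to be pulled/created.
    print(generate("qwen3:0.6b", "Why is the sky blue?"))
```

The same endpoint also supports streaming responses (one JSON object per line) when "stream" is set to true, which is usually preferable for interactive applications.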