OpenHarmony-5.0.0 on RISC-V: Setting Up DeepSeek-R1

Published: 2025-04-08



This article follows the Laval community post "OpenHarmony带你玩转DeepSeek R1大模型" (OpenHarmony: hands-on with the DeepSeek-R1 large model).


Preface

To get started with AI on OpenHarmony, I first surveyed how AI is developing on OH. Along the way I found the Laval article above and wanted to reproduce it on a RISC-V device, hence this post.
The survey itself will be covered in a separate article: 《not written yet, to be added later》.


I. Prerequisites

  • A RISC-V device flashed with OH (LicheePi, rvbook, SpacemiT musepaper, etc.; QEMU untested)
  • HDC (get it from the official Huawei/OpenHarmony releases; a quick connectivity check is sketched after this list)
  • An OHOS SDK adapted for RISC-V (musepaper-5.0.0 branch)
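
Before going any further, it is worth confirming that HDC actually sees the device and that it reports riscv64. A minimal sketch, assuming hdc is on the PATH and uname is available in the device shell:

hdc list targets
hdc shell uname -m
# expected output: riscv64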

II. Getting the Source Code

1. A false start

I first grabbed llama.cpp from Gitee, but after building it, loading the model on the device failed at runtime. I suspected a branch issue, so I fetched the source from GitHub instead (via a proxy).

2. Download

https://github.com/ggml-org/llama.cpp
If you have your SSH key configured, you can simply run:

git clone https://github.com/ggml-org/llama.cpp.git

If not, just download the master branch as a zip from the project page and unpack it.
If you cannot reach GitHub at all, a copy is available here: https://download.csdn.net/download/qqq1112345/90576842

III. Building llama.cpp

1. Generating the makefiles

Enter the llama.cpp directory and generate the makefiles with the command below; adjust the toolchain paths to match your own setup.

root@liusai-ubuntu-01:/opt/liusai/llama.cpp-master# /opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/build-tools/cmake/bin/cmake \
    -DCMAKE_TOOLCHAIN_FILE=/opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/build/cmake/ohos.toolchain.cmake \
    -DOHOS_ARCH=riscv64 \
    -DOHOS_ABI=riscv64-unknown-ohos \
    -DCMAKE_C_COMPILER=/opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/llvm/bin/clang \
    -DCMAKE_CXX_COMPILER=/opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/llvm/bin/clang \
    -B build

-- The C compiler identification is Clang 15.0.4
-- The CXX compiler identification is Clang 15.0.4
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/llvm/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1")
fatal: not a git repository (or any parent up to mount point /opt)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: not a git repository (or any parent up to mount point /opt)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
-- Setting GGML_NATIVE_DEFAULT to OFF
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- Including CPU backend
-- ARM detected
-- Performing Test GGML_COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test GGML_COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Adding CPU backend variant ggml-cpu:
fatal: not a git repository (or any parent up to mount point /opt)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: not a git repository (or any parent up to mount point /opt)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
CMake Warning at common/CMakeLists.txt:32 (message):
  Git repository not found; to enable automatic generation of build info,
  make sure Git is installed and the project is a Git repository.
-- Configuring done (1.1s)
-- Generating done (0.1s)
CMake Warning:
  Manually-specified variables were not used by the project:
    OHOS_ABI
-- Build files have been written to: /opt/liusai/llama.cpp-master/build

For the Gitee version (https://gitee.com/src-openeuler/llama.cpp), the configure command used was:

root@liusai-ubuntu-01:/opt/liusai/llama.cpp/llama# /opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/build-tools/cmake/bin/cmake \
    -DCMAKE_TOOLCHAIN_FILE=/opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/build/cmake/ohos.toolchain.cmake \
    -DANDROID_ABI=riscv64 \
    -DANDROID_PLATFORM=OHOS \
    -DCMAKE_C_FLAGS="-march=rv64g" \
    -DCMAKE_CXX_FLAGS="-march=rv64g" \
    -DGGML_OPENMP=OFF \
    -DGGML_LLAMAFILE=OFF \
    -B build

-- The C compiler identification is Clang 15.0.4
-- The CXX compiler identification is Clang 15.0.4
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/llvm/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- Including CPU backend
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Adding CPU backend variant ggml-cpu:
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
CMake Warning at common/CMakeLists.txt:32 (message):
  Git repository not found; to enable automatic generation of build info,
  make sure Git is installed and the project is a Git repository.
-- Configuring done (2.0s)
-- Generating done (0.1s)
-- Build files have been written to: /opt/liusai/llama.cpp/llama/build

Note that for the RISC-V architecture the flag -DCMAKE_C_FLAGS="-march=rv64g" is used (together with the matching -DCMAKE_CXX_FLAGS="-march=rv64g").

2. Building

The following command is all that is needed; it completed without errors:

/opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/build-tools/cmake/bin/cmake --build build --config Release
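
If the build feels slow, CMake's generic parallel-job option can be added. This is only a sketch of a standard CMake invocation, nothing OHOS-specific:

/opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/build-tools/cmake/bin/cmake --build build --config Release -j"$(nproc)"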

If you want the build to go through without warnings, specifying the LLVM path in the build configuration should be enough.
As before, go into the build directory and check the architecture of the build artifacts; one way to do that is sketched below.
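A sketch of that check, assuming the binaries ended up under build/bin (the layout of current llama.cpp; adjust if your revision differs), using file on the host or llvm-readelf from the SDK:

file build/bin/llama-cli
# expect something like: ELF 64-bit LSB executable, UCB RISC-V ...

/opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/llvm/bin/llvm-readelf -h build/bin/llama-cli | grep Machine
# expect: Machine: RISC-V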

IV. Downloading the Model File

Download the DeepSeek-R1-Distill-Qwen-1.5B-GGUF model from the ModelScope community (https://modelscope.cn/).

Here it is in a bit more detail, to save you the search: as in the Laval article, go to Model Library -> Model Files, search for "DeepSeek-R1-Distill-Qwen-1.5B-GGUF", and download the file that is 1.12 GB. I have not tested the other quantizations; if you are curious, download one and see whether loading and inference behave any differently.
https://modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/files
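
If you prefer the command line, the ModelScope CLI can fetch just that one file. This is only a sketch (I downloaded through the web page); the modelscope CLI and its flags are an assumption here and may differ between versions:

pip install modelscope
modelscope download --model unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf --local_dir .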

V. Running on the Device

After connecting to the OH device over hdc:

PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc file send libggml.so /lib/
[Fail]Error opening file: read-only file system, path:/lib//libggml.so
PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc file send libggml.so /lib
[Fail]Error opening file: illegal operation on a directory, path:/lib
PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc target mount    // remount for write access, otherwise the sends fail as above
Mount finish

PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc file send libggml.so /lib/
FileTransfer finish, Size:107136, File count = 1, time:17ms rate:6302.12kB/s
PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc shell ls /lib/
chipset-pub-sdk
chipset-sdk
extensionability
firmware
ld-musl-aarch64.so.1
ld-musl-arm.so.1
ld-musl-riscv64.so.1
libggml.so
media
module
ndk
platformsdk
seccomp
PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc file send libggml-base.so /lib/
FileTransfer finish, Size:518192, File count = 1, time:47ms rate:11025.36kB/s
PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc file send libggml-cpu.so /lib/
FileTransfer finish, Size:357856, File count = 1, time:35ms rate:10224.46kB/s
PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc file send libllama.so /lib/
FileTransfer finish, Size:1959784, File count = 1, time:153ms rate:12809.05kB/s
PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc file send libllava_shared.so /lib/
FileTransfer finish, Size:528624, File count = 1, time:48ms rate:11013.00kB/s
PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc file send llama-cli /data/
FileTransfer finish, Size:1560432, File count = 1, time:123ms rate:12686.44kB/s
PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc file send llama-server /data/
FileTransfer finish, Size:3484456, File count = 1, time:255ms rate:13664.53kB/s
PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc file send DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf /mnt/
FileTransfer finish, Size:1117320576, File count = 1, time:74188ms rate:15060.66kB/s
PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc shell
# cd data
# ls
app llama-cli lost+found power update
bluetooth llama-server misc samgr updater
chipset local module_update service vendor
data log nfc system
# chmod -R 777 llama-cli llama-server

In total: five shared libraries, two executables, and one model file.
The shared libraries go under /lib, which appears to be a symlink to /system/lib.
The executables go under /data.
Because the device's data partition is short on space, the .gguf goes under /mnt. The whole push sequence is scripted in the sketch below.
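
For repeatability, the push sequence can be collected into a small script. This is a sketch for a POSIX shell on the host; on Windows, the individual hdc commands in the transcript above do the same job:

#!/bin/sh
# remount the system partition writable first, otherwise /lib rejects the transfer
hdc target mount

# the five shared libraries go to /lib (a symlink to /system/lib)
for lib in libggml.so libggml-base.so libggml-cpu.so libllama.so libllava_shared.so; do
    hdc file send "$lib" /lib/
done

# the two executables go to /data, the model to /mnt where there is more space
hdc file send llama-cli /data/
hdc file send llama-server /data/
hdc file send DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf /mnt/

hdc shell chmod -R 777 /data/llama-cli /data/llama-server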
Running it then reports errors:

# ./llama-cli -m /mnt/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
Error loading shared library libc++_shared.so: No such file or directory (needed by ./llama-cli)
Error loading shared library libc++_shared.so: No such file or directory (needed by /system/lib/libllama.so)
Error loading shared library libc++_shared.so: No such file or directory (needed by /system/lib/libggml.so)
Error loading shared library libc++_shared.so: No such file or directory (needed by /system/lib/libggml-cpu.so)
Error loading shared library libc++_shared.so: No such file or directory (needed by /system/lib/libggml-base.so)
Error relocating /system/lib/libllama.so: _Znwm: symbol not found
Error relocating /system/lib/libllama.so: _ZdlPv: symbol not found
Error relocating /system/lib/libllama.so: _Znam: symbol not found
Error relocating /system/lib/libllama.so: _ZdaPv: symbol not found
Error relocating /system/lib/libllama.so: _ZNKSt4__n112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE4findEcm: symbol not found
Error relocating /system/lib/libllama.so: _ZNSt4__n112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1ERKS5_mmRKS4_: symbol not found
Error relocating /system/lib/libllama.so: _ZNSt4__n112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1ERKS5_: symbol not found

Find ./prebuilts/ohos-sdk/linux/12/native/llvm/lib/riscv64-linux-ohos/c++/libc++_shared.so in the SDK and push it to /lib as well:

PS C:\Users\刘赛\Desktop\AI\llama\riscv64> hdc file send libc++_shared.so /lib/
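
To find out in advance which shared objects a binary will ask the loader for (rather than discovering them one error at a time), llvm-readelf from the SDK can list the DT_NEEDED entries. A sketch, run on the build host with my SDK path:

/opt/liusai/musepaper-5.0.0/prebuilts/ohos-sdk/linux/12/native/llvm/bin/llvm-readelf -d llama-cli | grep NEEDED
# among the entries: libllama.so, libggml.so, libc++_shared.so, ...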

Run it again. Here is the log of a successful run:

build: 0 (unknown) with OHOS (dev) clang version 15.0.4 (llvm-project 27758cb0b7fa926968c05dbdce074da617e2408b) for x86_64-unknown-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 27 key-value pairs and 339 tensors from /mnt/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = DeepSeek R1 Distill Qwen 1.5B
llama_model_loader: - kv 3: general.organization str = Deepseek Ai
llama_model_loader: - kv 4: general.basename str = DeepSeek-R1-Distill-Qwen
llama_model_loader: - kv 5: general.size_label str = 1.5B
llama_model_loader: - kv 6: qwen2.block_count u32 = 28
llama_model_loader: - kv 7: qwen2.context_length u32 = 131072
llama_model_loader: - kv 8: qwen2.embedding_length u32 = 1536
llama_model_loader: - kv 9: qwen2.feed_forward_length u32 = 8960
llama_model_loader: - kv 10: qwen2.attention.head_count u32 = 12
llama_model_loader: - kv 11: qwen2.attention.head_count_kv u32 = 2
llama_model_loader: - kv 12: qwen2.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 13: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 14: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 15: tokenizer.ggml.pre str = deepseek-r1-qwen
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 18: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t", ...
llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 151646
llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 151654
llama_model_loader: - kv 22: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 23: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 24: tokenizer.chat_template str = {% if not add_generation_prompt is de…
llama_model_loader: - kv 25: general.quantization_version u32 = 2
llama_model_loader: - kv 26: general.file_type u32 = 15
llama_model_loader: - type f32: 141 tensors
llama_model_loader: - type q4_K: 169 tensors
llama_model_loader: - type q6_K: 29 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 1.04 GiB (5.00 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen2
print_info: vocab_only = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 1536
print_info: n_layer = 28
print_info: n_head = 12
print_info: n_head_kv = 2
print_info: n_rot = 128
print_info: n_swa = 0
print_info: n_swa_pattern = 1
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 6
print_info: n_embd_k_gqa = 256
print_info: n_embd_v_gqa = 256
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 8960
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 131072
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 1.5B
print_info: model params = 1.78 B
print_info: general.name = DeepSeek R1 Distill Qwen 1.5B
print_info: vocab type = BPE
print_info: n_vocab = 151936
print_info: n_merges = 151387
print_info: BOS token = 151646 '<｜begin▁of▁sentence｜>'
print_info: EOS token = 151643 '<｜end▁of▁sentence｜>'
print_info: EOT token = 151643 '<｜end▁of▁sentence｜>'
print_info: PAD token = 151654 '<|vision_pad|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<｜end▁of▁sentence｜>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while… (mmap = true)
load_tensors: CPU_Mapped model buffer size = 1059.89 MiB

llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: freq_base = 10000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_context: CPU output buffer size = 0.58 MiB
init: kv_size = 4096, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 28, can_shift = 1
init: CPU KV buffer size = 112.00 MiB
llama_context: KV self size = 112.00 MiB, K (f16): 56.00 MiB, V (f16): 56.00 MiB
llama_context: CPU compute buffer size = 299.75 MiB
llama_context: graph nodes = 1042
llama_context: graph splits = 1
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 8
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
main: chat template example:
You are a helpful assistant
<｜User｜>Hello<｜Assistant｜>Hi there<｜end▁of▁sentence｜><｜User｜>How are you?<｜Assistant｜>
system_info: n_threads = 8 (n_threads_batch = 8) / 8 | CPU : AARCH64_REPACK = 1 |
main: interactive mode on.
sampler seed: 381697047
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 1
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
- Not using system message. To change it, set a different value via -sys PROMPT

Success!
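llama-server was pushed to /data as well but not exercised above. As a follow-up sketch (untested on this device; the flags and the OpenAI-compatible endpoint are standard upstream llama.cpp behaviour, and the curl call is written for a POSIX shell), the server can be started on the device and queried from the PC through an hdc port forward:

# on the device (hdc shell)
cd /data && ./llama-server -m /mnt/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf --host 127.0.0.1 --port 8080

# on the PC
hdc fport tcp:8080 tcp:8080
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages":[{"role":"user","content":"Hello"}]}'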

Summary

  • I initially used the SDK from the OpenHarmony-5.0.0-rvbook branch for the build configuration, but that SDK is not fully adapted to RISC-V, which caused errors along the way.
  • How smoothly it runs comes down to the device's performance.
