---
sidebar_position: 8
---

# Distributed Inference of DeepSeek model on Raspberry Pi

## Introduction

This wiki explains how to deploy the [DeepSeek](https://github.com/deepseek-ai/DeepSeek-LLM) model on multiple Raspberry Pi AI Boxes with [distributed-llama](https://github.com/b4rtaz/distributed-llama). In this setup, a **Raspberry Pi with 8GB of RAM** acts as the **root node** and **three Raspberry Pis with 4GB of RAM** act as **worker nodes** to run the **DeepSeek 8B model**. The inference speed reached **6.06 tokens per second**.
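A back-of-the-envelope check of why this hardware split works (assuming Q40 stores roughly 4.5 bits per weight, as in 4-bit block quantization with per-block scales; the figures below are estimates, not measurements):

```shell
# Rough memory estimate for an 8B-parameter model at ~4.5 bits/weight (assumption),
# with the weights split evenly across 4 nodes by distributed-llama.
awk 'BEGIN {
  params = 8e9; bits_per_weight = 4.5; nodes = 4
  total_gb = params * bits_per_weight / 8 / 1e9   # bits -> bytes -> GB
  printf "total ~%.1f GB, ~%.1f GB per node\n", total_gb, total_gb / nodes
}'
```

Each node then holds only around 1.1 GB of weights, which fits comfortably in the 4GB workers' RAM.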

## Prepare Hardware

<div class="table-center">
<table align="center">
  <tr>
      <th>reComputer AI R2130</th>
  </tr>
  <tr>
      <td><div style={{textAlign:'center'}}><img src="https://media-cdn.seeedstudio.com/media/catalog/product/cache/bb49d3ec4ee05b6f018e93f896b8a25d/1/_/1_24_1.jpg" style={{width:600, height:'auto'}}/></div></td>
  </tr>
  <tr>
      <td><div class="get_one_now_container" style={{textAlign: 'center'}}>
          <a class="get_one_now_item" href="https://www.seeedstudio.com/reComputer-AI-R2130-12-p-6368.html">
              <strong><span><font color={'FFFFFF'} size={"4"}> Get One Now 🖱️</font></span></strong>
          </a>
      </div></td>
  </tr>
</table>
</div>

## Prepare Software

### Update the system

Open a terminal with `Ctrl+Alt+T` and enter the commands below. The first command sets the system clock from Google's HTTP `Date` response header, which helps when a wrong clock breaks HTTPS and `apt`:

```
sudo date -s "$(wget -qSO- --max-redirect=0 google.com 2>&1 | grep Date: | cut -d' ' -f5-8)Z"
sudo apt update
sudo apt full-upgrade
```
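To see what the `grep`/`cut` pipeline in the first command extracts, here is a sketch with a sample `Date` header line (the two leading spaces mimic how `wget -S` indents response headers):

```shell
# A sample HTTP response header line as `wget -S` prints it (two leading spaces).
header='  Date: Mon, 01 Jan 2024 12:00:00 GMT'

# With a space delimiter (empty fields counted), fields 5-8 are
# day, month, year, and time -- exactly what `date -s` needs.
echo "$header" | grep 'Date:' | cut -d' ' -f5-8
# → 01 Jan 2024 12:00:00
```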

### Install distributed-llama on your root and worker nodes

Open a terminal with `Ctrl+Alt+T` and enter the commands below on the root node and on every worker node to install [distributed-llama](https://github.com/b4rtaz/distributed-llama.git):

```
git clone https://github.com/b4rtaz/distributed-llama.git
cd distributed-llama
make dllama
make dllama-api
```

### Run on your worker nodes

Enter the commands below on each worker node to start it. `nice -n -20` gives the worker process the highest scheduling priority:

```
cd distributed-llama
sudo nice -n -20 ./dllama worker --port 9998 --nthreads 4
```

### Run on your root node

#### Create and activate a Python virtual environment

```
cd distributed-llama
python -m venv .env
source .env/bin/activate
```
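After activation, `python` should resolve to the interpreter inside the environment. A quick sanity check (the throwaway `/tmp` path is only for illustration; the real setup uses `.env` inside `distributed-llama`):

```shell
# Create and activate a throwaway virtual environment, then confirm that
# `python` now points inside it. The /tmp path is only for this demo.
python3 -m venv /tmp/venv-demo
. /tmp/venv-demo/bin/activate
command -v python
# → /tmp/venv-demo/bin/python
```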

#### Install the necessary libraries

```
pip install numpy==1.23.5
pip install torch==2.0.1
pip install safetensors==0.4.2
pip install sentencepiece==0.1.99
pip install transformers
```
#### Download the DeepSeek 8B Q40 model

```
sudo mkdir model && cd model
git lfs install
git clone https://huggingface.co/b4rtaz/Llama-3_1-8B-Q40-Instruct-Distributed-Llama
```
#### Run distributed inference on the root node

> **Note:** `--workers 10.0.0.139:9998 10.0.0.175:9998 10.0.0.124:9998` lists the IP addresses and ports of the worker nodes; replace them with your workers' addresses.

```
cd ..
./dllama chat --model ./model/dllama_model_deepseek-r1-distill-llama-8b_q40.m --tokenizer ./model/dllama_tokenizer_deepseek-r1-distill-llama-8b.t --buffer-float-type q80 --prompt "What is 5 plus 9 minus 3?" --nthreads 4 --max-seq-len 2048 --workers 10.0.0.139:9998 10.0.0.175:9998 10.0.0.124:9998 --steps 256
```
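If you change the number of workers, the `--workers` list must be rebuilt. A small sketch that assembles the flag from a list of worker IPs (the addresses are the examples from this wiki):

```shell
# Build the --workers argument from a list of worker IPs
# (example addresses from this wiki; substitute your own).
WORKER_IPS="10.0.0.139 10.0.0.175 10.0.0.124"
WORKER_PORT=9998

WORKERS=""
for ip in $WORKER_IPS; do
  WORKERS="$WORKERS $ip:$WORKER_PORT"
done

echo "--workers$WORKERS"
# → --workers 10.0.0.139:9998 10.0.0.175:9998 10.0.0.124:9998
```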

> **Note:** To measure the inference speed, use the following command instead.

```
cd ..
./dllama inference --model ./model/dllama_model_deepseek-r1-distill-llama-8b_q40.m --tokenizer ./model/dllama_tokenizer_deepseek-r1-distill-llama-8b.t --buffer-float-type q80 --prompt "What is 5 plus 9 minus 3?" --nthreads 4 --max-seq-len 2048 --workers 10.0.0.139:9998 10.0.0.175:9998 10.0.0.124:9998 --steps 256
```
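The 6.06 tokens per second quoted in the introduction is simply generated tokens divided by elapsed time. A sketch with illustrative numbers (the 42,244 ms timing is hypothetical, chosen to reproduce the wiki's figure for 256 tokens):

```shell
# Tokens per second = generated tokens / elapsed seconds.
# The elapsed time below is a hypothetical example, not a measurement.
tokens=256
elapsed_ms=42244

awk -v t="$tokens" -v ms="$elapsed_ms" 'BEGIN { printf "%.2f tok/s\n", t * 1000 / ms }'
# → 6.06 tok/s
```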

## Result

The following shows the [DeepSeek Llama 8B](https://huggingface.co/b4rtaz/Llama-3_1-8B-Q40-Instruct-Distributed-Llama) model running inference across the four Raspberry Pis.

<div align="center">
  <img width={900}
  src="https://files.seeedstudio.com/wiki/distributed-inference/distributed_llama.gif" />
</div>