Run DeepSeek-R1:70b on CPU and RAM
Utilize CPU, RAM and GPU computational resources together
With Ollama you can run LLM models such as DeepSeek-R1:70b not only on the GPU but also on the CPU with regular RAM. Of course, you need both a fast CPU and fast RAM, and plenty of it. My lab setup has 24 vCPUs (2 sockets x 6 cores x 2 threads) and between 128 and 384 GB of RAM. Once started, Ollama allocates 22.4 GB of resident memory (RES) and 119 GB of virtual memory. It drives CPU utilization to about 1200%, pushing the system load up to 12, which on 24 vCPUs is still only about 50% of total CPU capacity.
It loads over 20 GB into RAM and puts the system under load
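
The run itself is just a request to the local Ollama server. Below is a minimal sketch that streams a response from DeepSeek-R1:70b over Ollama's REST API; it assumes Ollama is listening on the default localhost:11434 and that the deepseek-r1:70b model has already been pulled.

```python
import json
import requests

# Ask the local Ollama server (default port 11434) to generate a response
# from deepseek-r1:70b and stream the tokens as they arrive.
url = "http://localhost:11434/api/generate"
payload = {
    "model": "deepseek-r1:70b",
    "prompt": "Explain the difference between RAM and VRAM in one paragraph.",
    "stream": True,
}

with requests.post(url, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each streamed chunk carries a partial "response" string.
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()
```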

On the GPU side it allocates 2 x 10 GB of VRAM, but stays silent in terms of actual core usage.
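
To confirm that the cards hold memory but do almost no work, you can simply poll nvidia-smi. A small sketch (assuming nvidia-smi is on the PATH) that prints VRAM usage and GPU core utilization every few seconds:

```python
import subprocess
import time

# Poll nvidia-smi every 5 seconds and print per-GPU memory usage and core
# utilization, so you can watch VRAM stay allocated while the cores idle.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,utilization.gpu",
    "--format=csv,noheader",
]

while True:
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    print(out.stdout.strip())
    time.sleep(5)
```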

Thinking…
DeepSeek-R1 starts with a “Thinking” part, where it converses with itself about what it knows and tries to work out the question aloud. It could ask me those clarifying questions, but it chooses not to and instead picks whatever it thinks is best at the moment. This runs fully on the CPU, with no significant GPU usage.
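
If you want to separate that reasoning from the final answer programmatically, you can filter the output. A minimal sketch, assuming the model wraps its reasoning in <think>…</think> tags in the returned text (as the R1 models served by Ollama typically do) and that Ollama runs on the default endpoint:

```python
import json
import requests

# Collect the full streamed reply from deepseek-r1:70b, then split the
# <think>...</think> reasoning block from the final answer.
url = "http://localhost:11434/api/generate"
payload = {"model": "deepseek-r1:70b", "prompt": "Why is the sky blue?", "stream": True}

text = ""
with requests.post(url, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            text += json.loads(line).get("response", "")

# Everything between <think> and </think> is the self-dialogue; the rest is the answer.
if "</think>" in text:
    thinking, answer = text.split("</think>", 1)
    thinking = thinking.replace("<think>", "").strip()
else:
    thinking, answer = "", text

print("--- Thinking (truncated) ---")
print(thinking[:500])
print("--- Answer ---")
print(answer.strip())
```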

It keeps generating this “Thinking” stage for minutes… and after an hour or so it gave the full answer:

So, it works. Just very, very slowly.