Tag: Exo

AI/ML

Ollama with Open WebUI on 2 x RTX 3060 12 GB

Ollama with WebUI on 2 “powerful” GPUs feels like commercial GPTs online. I thought that Exo would do the job and utilize both of my lab servers. Unfortunately, it does not work on Linux/NVIDIA with my setup, even when following the official documentation. So I went back to Ollama and found it great. I have 2 x NVIDIA RTX 3060 with 12 GB of VRAM each, giving me 24 GB in total, which can run Gemma3:27b or DeepSeek-r1:32b. Ollama can utilize both GPUs in my system, which can be seen in nvidia-smi. How to run Ollama in Docker with GPU acceleration you can read
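As a rough sanity check on the 24 GB figure, the memory arithmetic can be sketched in Python. The numbers here are assumptions rather than measurements: about 4.5 bits per weight for a 4-bit quantized model, and a flat 2 GB allowance for the KV cache and runtime buffers. The function name `estimate_vram_gb` is made up for this illustration and is not part of Ollama.

```python
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.5,
                     overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate: quantized weights plus a flat overhead.

    bits_per_weight and overhead_gb are assumed values, not measured ones.
    """
    # 1e9 parameters * bits / 8 bits-per-byte = size in GB
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


total_vram_gb = 2 * 12  # two RTX 3060 cards, 12 GB each

for name, size_b in [("gemma3:27b", 27), ("deepseek-r1:32b", 32)]:
    need = estimate_vram_gb(size_b)
    fits = need <= total_vram_gb
    print(f"{name}: ~{need:.1f} GB estimated, fits in {total_vram_gb} GB: {fits}")
```

Under these assumptions both models land below 24 GB but above 12 GB, which is consistent with needing both cards. Actual usage depends on the quantization and context length in use.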

AI/ML

Exo: the GPU cluster (tinygrad | MLX)

Theory: running AI workloads spread across various devices using pipeline parallel inference. In theory, Exo provides a way to run memory-heavy AI/LLM workloads on many different devices, spreading memory and computation across them. They say: “Unify your existing devices into one powerful GPU: iPhone, iPad, Android, Mac, NVIDIA, Raspberry Pi, pretty much any device!” People say: “It requires mlx but it is an Apple silicon-only library as far as I can tell. How is it supposed to be (I quote) “iPhone, iPad, Android, Mac, Linux, pretty much any device” ? Has it been tested on anything else than the
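The idea behind pipeline parallel inference can be sketched in a few lines of Python: each device holds a contiguous slice of the model's layers, sized in proportion to its memory, and activations are passed from one slice to the next. This is only an illustration of the general technique, not Exo's actual code. The function `partition_layers` and the device names are hypothetical.

```python
def partition_layers(num_layers: int,
                     device_memory_gb: dict[str, float]) -> dict[str, range]:
    """Assign contiguous layer ranges to devices in proportion to their memory."""
    total = sum(device_memory_gb.values())
    shards: dict[str, range] = {}
    start = 0
    cumulative = 0.0
    for name, mem in device_memory_gb.items():
        cumulative += mem
        # The cumulative memory share decides where this device's slice ends.
        end = round(num_layers * cumulative / total)
        shards[name] = range(start, end)
        start = end
    return shards


# Hypothetical setup: two lab servers with one 12 GB GPU each, 32-layer model.
shards = partition_layers(32, {"lab-server-1": 12, "lab-server-2": 12})
for device, layers in shards.items():
    print(f"{device}: layers {layers.start}..{layers.stop - 1}")
```

With equal memory on both servers, each gets half of the layers. A device with more memory would get a proportionally larger slice.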