NVIDIA – MICHAŁ SOBCZAK

AI/ML

GPU pass-thru in Proxmox 7 and Ubuntu 20, follow-up

2025-04-25 1 Min Reading

In the previous article about GPU pass-thru, which can be found here, I described how to set things up, mostly from the Proxmox perspective. From the VM perspective, however, I would like to make a little follow-up, just to make things clear. It has been said that you need to set up a q35 machine with VirtIO-GPU and UEFI. That is true, but the most important thing is to actually disable secure boot, which otherwise prevents the NVIDIA driver modules from loading. Add an EFI disk, but do not check “pre-enroll keys”. This option would enroll keys and enable secure boot by default. Just add EFI disk
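Whether secure boot really ended up disabled can be checked from inside the guest before installing the driver. Below is a minimal sketch in Python, assuming the `mokutil` tool is available in the Ubuntu VM (the article does not mention it); the function names `parse_sb_state` and `secure_boot_enabled` are hypothetical and exist only for this illustration.

```python
import shutil
import subprocess
from typing import Optional


def parse_sb_state(output: str) -> Optional[bool]:
    """Return True if secure boot is enabled, False if disabled, None if unknown."""
    text = output.lower()
    if "secureboot enabled" in text:
        return True
    if "secureboot disabled" in text:
        return False
    return None


def secure_boot_enabled() -> Optional[bool]:
    """Ask mokutil for the secure boot state; None if it cannot be determined."""
    # Assumption: mokutil is installed; if it is missing we report "unknown".
    if shutil.which("mokutil") is None:
        return None
    result = subprocess.run(
        ["mokutil", "--sb-state"], capture_output=True, text=True, check=False
    )
    return parse_sb_state(result.stdout)


if __name__ == "__main__":
    state = secure_boot_enabled()
    if state is None:
        print("Could not determine secure boot state")
    elif state:
        print("Secure boot is enabled: NVIDIA modules will likely fail to load")
    else:
        print("Secure boot is disabled")
```

If the script reports that secure boot is enabled, the EFI disk was most likely created with pre-enrolled keys.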

AI/ML

Configuring NVIDIA RTX A6000 ADA in Ubuntu 22

2025-03-25 1 Min Reading

I thought that installing an NVIDIA RTX A6000 ADA in a default Ubuntu 22 server installation would be an easy one. However, installing drivers from the repository did no good. I checked whether secure boot was enabled, and no, it was disabled. We need to install a few things first. We need to get rid of previously installed drivers. Verify that secure boot is disabled. Get the NVIDIA driver, such as NVIDIA-Linux-x86_64-535.216.01.run, from their website and install it. Once you have gotten rid of previously installed drivers, disabled secure boot and installed build tools, kernel headers… you will be good to go to compile

AI/ML

Single vs multiple GPU power load

2025-03-16 2025-03-23 1 Min Reading

Slight utilization drop when dealing with a multi-GPU setup. TLDR: Power usage and GPU utilization vary between single-GPU models and multi-GPU models. Deal with it. My latest finding is that single-GPU load in Ollama/Gemma or Automatic1111/StableDiffusion is higher than the load on multiple GPUs with Ollama when the model does not fit into one GPU’s memory. Take a look. GPU utilization of Stable Diffusion is at 100%, with 90 – 100% fan speed and temperature over 80 degrees C. Compare this to load spread across two GPUs. You can clearly see that GPU utilization is much lower, as well

AI/ML

Generate images with Stable Diffusion, Gemma and WebUI on NVIDIA GPUs

2025-03-15 2025-03-23 3 Min Reading

With Ollama paired with the Gemma3 model, Open WebUI with RAG and search capabilities, and finally Automatic1111 running Stable Diffusion, you can have quite a complete set of AI features at home for the price of 2 consumer-grade GPUs and some home electricity. With 500 iterations and an image size of 512×256, it took around a minute to generate a response. I find it funny to be able to generate images with AI techniques. I tried Stable Diffusion in the past, but now, with the help of Gemma and integration with Automatic1111 on WebUI, it’s damn easy. Step by step. Prerequisites: You can find information

AI/ML

Run DeepSeek-R1:70b on CPU and RAM

2025-03-14 1 Min Reading

Utilize CPU, RAM and GPU computational resources. With Ollama you can use not only the GPU but also the CPU with regular RAM to run LLM models, like DeepSeek-R1:70b. Of course you need both a fast CPU and fast RAM, and plenty of it. My Lab setup contains 24 vCPU (2 x 6 cores * 2 threads) and from 128 to 384 GB of RAM. Once started, Ollama allocates 22.4GB in RAM (RES) and 119GB of virtual memory. It occupies 1200% CPU utilization, that is 12 of the 24 vCPUs fully busy, causing system load to go up to 12. However, CPU utilization is only 50% in total. It

AI/ML

Ollama with Open WebUI on 2 x RTX 3060 12 GB

2025-03-13 3 Min Reading

Ollama with WebUI on 2 “powerful” GPUs feels like commercial GPTs online. I thought that Exo would do the job and utilize both of my Lab servers. Unfortunately, it does not work on Linux/NVIDIA with my setup, even when following the official documentation. So I went back to Ollama and I found it great. I have 2 x NVIDIA RTX 3060 with 12GB VRAM each, giving me 24GB in total, which can run Gemma3:27b or DeepSeek-r1:32b. Ollama can utilize both GPUs in my system, which can be seen in nvidia-smi. How to run Ollama in Docker with GPU acceleration you can read

AI/ML

Exo: the GPU cluster (tinygrad | MLX)

2025-03-13 6 Min Reading

Theory: running AI workload spread across various devices using pipeline parallel inference. In theory, Exo provides a way to run memory-heavy AI/LLM model workloads on many different devices, spreading memory and computation across them. They say: “Unify your existing devices into one powerful GPU: iPhone, iPad, Android, Mac, NVIDIA, Raspberry Pi, pretty much any device!” People say: “It requires mlx but it is an Apple silicon-only library as far as I can tell. How is it supposed to be (I quote) “iPhone, iPad, Android, Mac, Linux, pretty much any device” ? Has it been tested on anything else than the

Tag: NVIDIA