Well, in one of the previous articles I described how to invoke Ollama with the moondream:1.8b model using cURL, but I forgot to explain how to even run it in a Docker container in the first place. So here you go.

You can run a particular model in the background (with `-d`) or in the foreground (without `-d`). You can also configure parallelism and the maximum request queue of the Ollama server.

One more important note regarding stability of the Ollama server: once it runs for more than a few hours, a GPU driver issue can appear that requires a restart, so Ollama needs to be monitored for that scenario. Moreover, after a few minutes of idle time it unloads the model from memory, so the first request afterwards takes longer while the model is loaded again.
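A sketch of the Docker setup described above. The container name `ollama`, the GPU flag, and the concrete values for parallelism and queue size are my assumptions; adjust them to your machine:

```shell
# Start the Ollama server in the background (-d); drop -d to keep it
# in the foreground. Two env vars control server-side queuing:
#   OLLAMA_NUM_PARALLEL - requests served in parallel per model
#   OLLAMA_MAX_QUEUE    - requests allowed to wait in the queue
docker run -d --gpus=all \
  -e OLLAMA_NUM_PARALLEL=2 \
  -e OLLAMA_MAX_QUEUE=128 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama

# Pull and run the moondream:1.8b model inside the container:
docker exec -it ollama ollama run moondream:1.8b
```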
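For the stability issue, a minimal watchdog sketch: periodically hit the server and restart the container when the check fails. The health URL, the container name `ollama`, and the restart strategy are assumptions, not the only way to do it:

```shell
#!/bin/sh
# Run a health check command; if it fails, run a restart command.
check_and_restart() {
  health_cmd=$1    # command that exits 0 when the server is healthy
  restart_cmd=$2   # command to run when it is not
  if $health_cmd >/dev/null 2>&1; then
    echo "healthy"
  else
    echo "unhealthy, restarting"
    $restart_cmd
  fi
}

# In a cron job or loop you would call it like this (assumed names):
# check_and_restart "curl -sf http://localhost:11434/" "docker restart ollama"
```

Hooking this into cron every minute or so is usually enough to catch the GPU driver hang before users notice.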
Given an image you would like to describe using Ollama and the moondream:1.8b model, you can try cURL. First, encode the image in base64. Then prepare the request body. And finally, invoke cURL pointing at your running Ollama server. In the response you get the model's description of the image in the `response` field.
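The three steps above can be sketched as follows. The placeholder file, the prompt text, and the server address `localhost:11434` are assumptions; substitute your own image and host:

```shell
# A placeholder file stands in for a real image here; use your own JPEG/PNG.
printf 'placeholder' > image.jpg

# Step 1: encode the image in base64 (GNU coreutils; -w0 disables line wrapping).
IMG_B64=$(base64 -w0 image.jpg)

# Step 2: prepare the request body for Ollama's /api/generate endpoint.
cat > request.json <<EOF
{
  "model": "moondream:1.8b",
  "prompt": "Describe this image.",
  "stream": false,
  "images": ["${IMG_B64}"]
}
EOF

# Step 3: point cURL at the running Ollama server (assumed on localhost:11434):
# curl -s http://localhost:11434/api/generate -d @request.json
```

With `"stream": false` the server returns a single JSON object instead of a stream of chunks, which is easier to handle in shell scripts.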