AI video surveillance with DeepStack, Python and ZoneMinder

For those using ZoneMinder and trying to figure out how to detect objects, there is the deepquestai/deepstack AI model with a built-in HTTP server. You can grab video frames using the ZoneMinder API or the UI API:

https://ADDR/zm/cgi-bin/nph-zms?scale=100&mode=single&maxfps=30&monitor=X&user=XXX&pass=XXX

You need to specify the address, monitor ID, user and password. You can also request a single frame (mode=single) or a continuous MJPEG stream (mode=jpeg). Internally, ZoneMinder uses the /usr/lib/zoneminder/cgi-bin/nph-zms binary to grab frames from the configured IP ONVIF RTSP camera. It is probably not the quickest option, but it is a convenient one. Using OpenCV in Python you could also access the RTSP stream and grab frames manually. However, for the sake of simplicity I will stay with ZoneMinder's nph-zms.
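For reference, if you wanted to skip nph-zms entirely, a minimal OpenCV sketch along those lines (the RTSP URL and credentials below are placeholders) could look like this:

# Alternative to nph-zms: grab a single frame straight from the camera's RTSP stream with OpenCV.
# The RTSP URL is a placeholder - adjust it to your camera and credentials.
import cv2

cap = cv2.VideoCapture("rtsp://user:pass@x.x.x.x:554/stream1")
ok, frame = cap.read()
cap.release()
if ok:
    cv2.imwrite("frame.jpg", frame)
else:
    print("Unable to read a frame from the RTSP stream")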

So let's say I have the following video frame from the camera:

It is a simple view of a street, with a concrete fence and some wooden boards across it. Now let's say I would like to detect passing objects. First we need to install the drivers and the runtime, and start the server.

NVIDIA drivers

In my test setup I have an RTX 3050 Ti with 4 GB of VRAM running Ubuntu 22.04 LTS desktop. By default, CUDA 12.8+ drivers are not available: you can get up to driver version 550, and starting from 525 you get CUDA 12.x. This card uses the Ampere architecture with Compute Capability 8.6, which corresponds to CUDA 11.5 – 11.7.1 in the compatibility tables. However, you can install driver 570.86.16, which ships with the CUDA 12.8 SDK.

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-driver-570

To check if it is already loaded:

lsmod | grep nvidia

Docker GPU support

A native, default Docker installation does not support direct GPU usage. According to DeepStack, you should run the following commands to configure the Docker NVIDIA runtime. However, ChatGPT suggests installing nvidia-container-toolkit instead. You can find a proper explanation of the differences here. At first glance it seems that those packages are related.

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

Since we installed a new driver and reconfigured Docker, it is a good idea to reboot. After rebooting the machine, check whether nvidia-smi reports the proper driver and CUDA SDK versions:

sudo docker run --gpus '"device=0"' nvidia/cuda:12.8.0-cudnn-devel-ubuntu22.04 nvidia-smi

This should print the output of nvidia-smi run from a Docker container based on the nvidia/cuda image. Please note that this image may differ a little, as it changes over time. You can adjust the --gpus flag in case you have more than one supported NVIDIA video card in your system.

deepquestai DeepStack

In order to run the DeepStack model and API server on the GPU, just use the gpu image tag and set the --gpus all flag. You also need to set the environment variable VISION-DETECTION to True. You can probably configure other things such as face detection, but for now I will stick with this one only:

sudo docker run --rm --gpus all -e VISION-DETECTION=True -v localstorage:/datastore -p 80:5000 deepquestai/deepstack:gpu

Now you have a running DeepStack Docker container with GPU support. Let's look at the program code now.

My Vision AI source code

import requests
from PIL import Image
import urllib3
from io import BytesIO
import time
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

prefix = "data-xxx/xxx"
zmuser = "readonly"
zmpass = "readonly"
zmaddr = "x.x.x.x"
zmmoid = x
deepstackaddr = "localhost:80"

#
# DOWNLOAD AND SAVE ZONEMINDER VIDEO FRAME
#
timestamp = int(time.time() * 1000)  
url = "https://{}/zm/cgi-bin/nph-zms?scale=100&mode=single&maxfps=30&monitor={}&user={}&pass={}".format(zmaddr, zmmoid, zmuser, zmpass)
output_path = "{}_{}.jpg".format(prefix, timestamp)
response = requests.get(url, verify=False)
if response.status_code == 200:
    with open(output_path, "wb") as file:
        file.write(response.content) 
    print(f"Downloaded: {output_path}")
else:
    # abort here, otherwise the script would crash later when opening a missing file
    raise SystemExit("Unable to download video frame")

#
# AI ANALYSE VIDEO FRAME
#
image_data = open(output_path,"rb").read()
image = Image.open(output_path).convert("RGB")
response = requests.post("http://{}/v1/vision/detection".format(deepstackaddr),files={"image":image_data},data={"min_confidence":0.65}).json()

#
# PRINT RECOGNIZED AND PREDICTED OBJECTS
#
for object in response["predictions"]:
    print(object["label"])
print(response)

#
# CROP OBJECTS AND SAVE TO FILES
#
i = 0
for object in response["predictions"]:
    label = object["label"]
    y_max = int(object["y_max"])
    y_min = int(object["y_min"])
    x_max = int(object["x_max"])
    x_min = int(object["x_min"])
    cropped = image.crop((x_min,y_min,x_max,y_max))
    cropped.save("{}_{}_{}_{}_found.jpg".format(prefix, timestamp, i, label))
    i += 1

With this code we grab a ZoneMinder video frame, save it locally, pass it to the DeepStack API server for vision detection and finally take the predicted detections, printing them as text and saving cropped images showing only the detected objects. For instance, the whole frame was as follows:

And the program automatically detected and cropped the following region:

There are several dozen object types/classes that this AI model can detect. It is already pretrained and, as far as I can tell, closed in terms of further learning and correcting detections. I will investigate that matter further, of course. Maybe license plate OCR next?
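To turn the one-shot script above into a very simple surveillance loop, here is a hedged sketch reusing the same nph-zms URL and DeepStack endpoint (the monitor ID, credentials and the set of interesting labels below are placeholders of my choice):

import time
import requests
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# Same nph-zms URL and DeepStack address as in the script above (placeholder values here).
frame_url = "https://x.x.x.x/zm/cgi-bin/nph-zms?scale=100&mode=single&maxfps=30&monitor=1&user=readonly&pass=readonly"
deepstack = "http://localhost:80/v1/vision/detection"
INTERESTING = {"person", "car", "truck"}  # arbitrary choice of classes to react to

while True:
    frame = requests.get(frame_url, verify=False).content
    result = requests.post(deepstack, files={"image": frame}, data={"min_confidence": 0.65}).json()
    labels = {p["label"] for p in result.get("predictions", [])}
    if labels & INTERESTING:
        print("Detected:", ", ".join(labels & INTERESTING))  # here you could save crops or send a notification
    time.sleep(5)  # poll every few seconds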

Run Bielik LLM from SpeakLeash using LM Studio on your local machine

Did you know that you can use the Polish LLM Bielik from SpeakLeash locally, on your private computer? The easiest way to do this is to use LM Studio (from lmstudio.ai):

  • download LM Studio
  • download the model (e.g. Bielik-11B-v2.2-Instruct-GGUF)
  • load model
  • open a new conversation
  • converse…

Why run a model locally? Just for fun. For places where we don't have internet. Because we don't want to share our data and conversations, etc…

You can run it on macOS, Windows and Linux. It requires support for AVX2 CPU instructions, a large amount of RAM and, preferably, a dedicated and modern graphics card.
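If you would rather script the conversation than use the GUI, LM Studio can also expose a local, OpenAI-compatible HTTP server (by default on port 1234; check the server/developer tab in your LM Studio version). A minimal hedged sketch, assuming Bielik is the loaded model:

import requests

# Assumes LM Studio's local server is running; the port and model identifier may differ on your setup.
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "bielik-11b-v2.2-instruct",  # hypothetical identifier - use the one LM Studio shows for the loaded model
        "messages": [{"role": "user", "content": "Przedstaw się w jednym zdaniu."}],  # ask the model to introduce itself in one sentence
        "temperature": 0.7,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])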



Note: for example, on a ThinkPad T460p with an i5-6300HQ and a dedicated 940MX card with 2 GB VRAM it basically refuses to work, but on a Dell G15 with an i5-10200H and an RTX 3050 Ti it works without any problem. I suspect it is a matter of Compute Capability rather than the amount of VRAM on the graphics card, because these models and libraries do not work on my old datacenter cards (Tesla, Quadro) either.

Block AI web-scrapers from stealing your website content

Did you know that you can block AI-related web-scrapers from downloading your whole website and effectively stealing your content? This way LLMs will need a different data source for their learning process!

Why, you may ask? First of all, AI companies make money on their LLMs, so using your content without paying you is just stealing. That applies to texts, images and sounds. It is intellectual property with a certain value. A long time ago I placed an "Attribution-NonCommercial-NoDerivatives" license on my website and guess what… it does not matter. I did not receive any attribution. Dozens of various bots visit my website and just download all the content. So I decided…

… to block those AI-related web-crawling, web-scraping bots. And no, not by modifying the robots.txt file (or any XML sitemaps), as that might not be sufficient in the case of some Chinese bots which just "don't give a damn". Nor did I decide to use any kind of plugins or server extensions. I decided to go the hard way:

location / {
  if ($http_user_agent ~* "Bytespider") { return 403; }
  ...
}

And decide exactly which HTTP User Agents (client "browsers", in other words) you would like to show the middle finger. For those who do not stare at server logs at least a few minutes a day, "Bytespider" is a scraping bot from ByteDance, the company which owns TikTok. It is said that this bot could possibly download content to feed some Chinese LLM. Chinese or US, it actually does not matter. If you would like to use my content, either pay me or attribute the usage of my content. How, you may ask? To be honest, I do not know.
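A quick way to check that the rule actually works is to impersonate the blocked bot yourself, for example with a short Python snippet (the domain below is a placeholder):

import requests

# Pretend to be Bytespider; expect HTTP 403 if the NGINX rule is active.
r = requests.get("https://your-domain.example/", headers={"User-Agent": "Bytespider"})
print(r.status_code)  # 403 means the block works, 200 means the bot would still get through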

There is either the hard way (as with NGINX blocking certain UAs) or the diplomatic way, which could lead to creating a catalogue of websites which do not want to participate in the AI feeding process for free. I think there are many more content creators who would like to get a piece of the AI birthday cake…

BLOOM LLM: how to use?

Asked “what is love?”, BLOOM-560M replies with “The woman who had my first kiss in my life had no idea that I was a man”. wtf?!

Intro

I've been into parallel computing since 2021, playing with OpenCL (you can read about it here) and looking to maximize device capabilities. I've got pretty decent in-depth knowledge about how the computational process works on GPUs, and I'm curious how the most recent AI/ML/LLM technology works. So here is my little introduction to the LLM topic from a practical point of view.

Course of Action

  • BLOOM overview
  • vLLM
  • Transformers
  • Microsoft Azure NV VM
  • What’s next?

What is BLOOM?

It is the BigScience Large Open-science Open-access Multilingual language model. It is based on the transformer deep-learning concept, where text is converted into tokens and then into vectors via lookup tables. Deep learning itself is a machine learning method based on neural networks, in which you train artificial neurons. BLOOM is free and was created by over 1000 researchers. It has been trained on about 1.6 TB of pre-processed multilingual text.
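To make the "tokens and lookup tables" part a bit more concrete, here is a small hedged illustration using the Hugging Face tokenizer for the smallest BLOOM variant (the variants themselves are described just below):

from transformers import AutoTokenizer

# Text is split into sub-word tokens; each token ID then indexes a row of the embedding lookup table.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
ids = tokenizer("What is love?")["input_ids"]
print(ids)                                   # integer token IDs
print(tokenizer.convert_ids_to_tokens(ids))  # the corresponding sub-word pieces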

There are a few variants of this model: the 176-billion-parameter one (called just BLOOM), but also BLOOM 1b7 with 1.7 billion parameters. There is even BLOOM 560M:

  • to load and run 176B you need roughly 700 GB of VRAM with FP32 and about half of that with FP16
  • to load and run 1B7 you need somewhere between 10 and 12 GB of VRAM, and about half of that with FP16

So in order to use my NVIDIA GeForce RTX 3050 Ti with 4 GB of VRAM I would either need to run BLOOM 560M, which requires 2 to 3 GB of VRAM (and even below 2 GB when using FP16 mixed precision), or… use the CPU. On the CPU side, 176B requires about 700 GB of RAM, 1B7 requires 12 – 16 GB of RAM and 560M requires 8 – 10 GB of RAM.
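Those figures follow from simple arithmetic: number of parameters times bytes per parameter (4 for FP32, 2 for FP16), plus runtime overhead on top. A rough back-of-the-envelope sketch:

# Back-of-the-envelope memory estimate: parameters x bytes per parameter.
# Real usage is higher because of activations, KV cache and framework overhead.
def estimate_gb(params, bytes_per_param):
    return params * bytes_per_param / 1e9

for name, params in [("BLOOM 176B", 176e9), ("BLOOM 1b7", 1.7e9), ("BLOOM 560M", 560e6)]:
    print(f"{name}: ~{estimate_gb(params, 4):.1f} GB FP32, ~{estimate_gb(params, 2):.1f} GB FP16")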

Are those solid numbers? Let's find out!

vLLM

“vLLM is a Python library that also contains pre-compiled C++ and CUDA (12.1) binaries.”

“A high-throughput and memory-efficient inference and serving engine for LLMs”

You can download (from Hugging Face, a company founded in 2016 in the USA) and serve language models with these few steps:

pip install vllm
vllm serve "bigscience/bloom"

And then once it’s started (and to be honest it won’t start just like that…):

curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bigscience/bloom",
		"messages": [
			{"role": "user", "content": "Hello!"}
		]
	}'

You can back your vLLM runtime with a GPU or a CPU, but also with ROCm, OpenVINO, Neuron, TPU or XPU. The CUDA backend requires GPU compute capability 7.0 or higher. My RTX 3050 Ti has 8.6, but my Tesla K20Xm with 6 GB of VRAM has only 3.5, so vLLM will not be able to use it.

Here is the Python program:

from vllm import LLM, SamplingParams
model_name = "bigscience/bloom-560m"
llm = LLM(model=model_name, gpu_memory_utilization=0.6, cpu_offload_gb=4, swap_space=2)
question = "What is love?"
sampling_params = SamplingParams(
    temperature=0.5,     
    max_tokens=10,
)
output = llm.generate([question], sampling_params)
print(output[0].outputs[0].text)

In return, there is either:

[rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 736.00 MiB. GPU 0 has a total capacity of 3.81 GiB of which 73.00 MiB is free. Including non-PyTorch memory, this process has 3.73 GiB memory in use. Of the allocated memory 3.56 GiB is allocated by PyTorch, and 69.88 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

or the following:

No available memory for the cache blocks. Try increasing `gpu_memory_utilization` when initializing the engine.

I may try later to check it out on a bigger GPU, but for now I will try to run it using the transformers library, which is the next topic.

Transformers

So I picked the same BLOOM 560M model. First, you need to install the following main packages and plenty of dependencies:

pip install transformers
pip install torch
pip install accelerate

Source code of Python program using those libraries is as follows:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m", 
    device_map="auto", 
    torch_dtype="auto"
)
def ask_bloom(question, max_length=100, temperature=0.7):
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    output = model.generate(
        inputs['input_ids'],
        max_length=max_length,  
        temperature=temperature,   
        pad_token_id=tokenizer.eos_token_id, 
        do_sample=True             
    )
    answer = tokenizer.decode(output[0], skip_special_tokens=True)
    return answer
question = "What is love?"
answer = ask_bloom(question)
print(f"Q: {question}\nAnwser: {answer}")

To run:

python3 transformers-torch.py

It will download the model and run the prompt defined in the Python program.

We can see that my NVIDIA GeForce 940MX with 2 GB VRAM has around 60% of its memory in use with over 90% utilization (results from nvidia-smi).

“What is love?” The answer is as follows:

What is love? She never asked me the question but I am the one who has to give my answer. She is a beautiful, beautiful, very beautiful woman. The first thing you know about love is that it is the most complicated thing in the world. I was so shocked when I saw the man I thought I was marrying. My life was over. The woman who had my first kiss in my life had no idea that I was a man. She was so shocked. She said something

Next I asked "The sun is shining and?":

I love this scene and I like when it comes to it. The sun, the sky and all the other stars are bright and I love that! I really enjoy this scene and it sets the scene for the rest of the story. It seems like the characters are just going about their day and the sun is shining. The world seems to be in its perfect place and everything is beautiful. I love the lighting in this scene and the warmth of the sunlight that

Does it make any sense? What is the concept of "sense" at all? Anyway, it works, somehow. Let's find out the other possibilities.

Microsoft Azure N-series virtual machines

Instead of buying an MSI Vector, ASUS ROG, Lenovo Legion Pro, MSI Raider or any other ultimate gaming laptop, you can go to Azure and pick one of their NV virtual machines, especially since they have 14 or 28 GB of VRAM onboard. It costs around 400 Euro per month, but you will not be using it all the time (I suppose).

We have:

root@z92-az-bloom:/home/adminadmin# lspci 
0002:00:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 [Instinct MI25 MxGPU/MI25x2 MxGPU/V340 MxGPU/V340L MxGPU]

And I was not so sure how to use the AMD GPU, so instead I decided to request a quota increase:

However, that request got rejected for my account:

Unfortunately, changing parameters and virtual machine types did not change the situation; I still got rejected and needed to submit a support ticket to Microsoft in order to have it processed manually. So, until next time!

What’s next to check?

AWS g6 and Hetzner GEX44. Keep reading!

Further reading

Microsoft Azure AI Services: computer vision

Use Microsoft Azure AI Services to analyze images, voice and documents. No AI/ML or coding skills required. Responsible AI applies per the EU AI Act. Formerly Cognitive Services.

Course of Action

  • Create an AI Services multi-service account in Azure
  • Run computer vision OCR on image

What is Microsoft Azure?

It is Microsoft's public cloud platform offering a broad range of products and services, including virtual machines, managed containers, databases, analytics platforms as well as AI Services. Azure's major competitors are Amazon's AWS and Google's GCP.

What are AI Services (formerly Cognitive Services)?

It is a set of various services for recognition and analysis tasks based on already trained ML models (or even traditional programming techniques). You can use them to describe documents, run OCR tasks, do face recognition, etc. Those services used to be categorized under the Cognitive Services umbrella, "cognitive" here being roughly a synonym for recognition. The name change, which happened in July 2023, was more or less a rebranding with non-breaking changes only, done as a part of marketing. It is obvious that "AI Services" would sell better than "Cognitive Services".

Create AI multi-service account in Azure portal

In order to create a Microsoft Azure AI Services multi-service account you need a valid Azure subscription, either a Free Trial or a regular account. Type "AI" in the search field on portal.azure.com and you will find this service through the service catalog.

It is worth mentioning that you get a "Responsible AI Notice", which relates to the AI Act covering the European Union, USA and UK. It defines what AI/ML models can do and what they should not be allowed to do. According to a KPMG source, it covers, among others, social scoring, recruitment and deep-fake disclosure as the most crucial areas requiring regulation. What about the rest of the world? Well, it might be the same situation as with CO2 emissions or plastic recycling.

The deployment process in Azure is especially meaningful when speaking about configurable assets with data. However, in the case of these services, deployment is just a matter of linking them to our account, so the deployment of AI Services finishes within seconds.

AI Services account overview

To use Azure AI Services you need to go to Resource Management, Keys and Endpoint. There you will find the Endpoint to which you should send your API calls/requests, and the access key. This key is then mapped to the "Ocp-Apim-Subscription-Key" header which should be passed with every HTTP call.
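As a rough sketch of how the endpoint, key and header fit together (the resource name and image URL below are illustrative; the curl call later in this post does the same thing):

import requests

# Illustrative values - copy the real endpoint and key from Resource Management -> Keys and Endpoint.
endpoint = "https://YOUR-RESOURCE.cognitiveservices.azure.com"
key = "xxx"

resp = requests.post(
    endpoint + "/computervision/imageanalysis:analyze",
    params={"features": "read", "api-version": "2024-02-01"},
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json={"url": "https://example.com/some-public-image.png"},
)
print(resp.status_code, resp.json())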

As for the S0 standard pricing tier on a Free Tier subscription, an estimated 1 000 API calls (requests made) would cost less than 1 Euro. It might be "cheap", however this is only the starting point of the pricing, and I suspect the real value might differ in a production scenario, especially for decision-making services (which could still be purely ML-based) rather than services which could be replaced by traditional programming techniques, such as OCR, which has been present on the market for a few decades already.

Run example recognition task

Instead of programming (aka coding) against the various SDKs for AI Services (Python, JavaScript, etc.), you can also invoke such services with an HTTP request using the curl utility. As far as I know, every Windows 10 and 11 installation should have curl present. As for Linux distributions, you most probably have curl installed already.

So, in order to invoke a recognition task, pass the subscription key (here replaced by xxx), point at the specific Endpoint URL and pass the url parameter, which should be some publicly available image for the recognition service to run on. I found out that not every feature is available on every Endpoint; in that case you would need to modify the "features" parameter:

curl -H "Ocp-Apim-Subscription-Key: xxx" -H "Content-Type: application/json" "https://z92-azure-ai-services.cognitiveservices.azure.com/computervision/imageanalysis:analyze?features=read&model-version=latest&language=en&api-version=2024-02-01" -d "{'url':'https://michalasobczak.pl/wp-content/uploads/2024/08/image-6.png'}"

I passed this image for analysis; it contains a dozen rectangular boxes with text inside. It should be straightforward to get proper results, as the text is not rotated, it is written in a machine font and the color contrast is at a proper level.

In return we receive the following JSON-formatted output. We can see that it properly detected the word "rkhunter" as well as "process". However, we need an additional layer of processing in order to merge adjacent words in separate lines into phrases instead of just separate words (see the sketch after the JSON output below).

{
   "modelVersion":"2023-10-01",
   "metadata":{
      "width":860,
      "height":532
   },
   "readResult":{
      "blocks":[
         {
            "lines":[
               {
                  "text":"rkhunter",
                  "boundingPolygon":[
                     {
                        "x":462,
                        "y":78
                     },
                     {
                        "x":519,
                        "y":79
                     },
                     {
                        "x":519,
                        "y":92
                     },
                     {
                        "x":462,
                        "y":91
                     }
                  ],

                  ...

                  "words":[
                     {
                        "text":"process",
                        "boundingPolygon":[
                           {
                              "x":539,
                              "y":447
                           },
                           {
                              "x":586,
                              "y":447
                           },
                           {
                              "x":586,
                              "y":459
                           },
                           {
                              "x":539,
                              "y":459
                           }
                        ],
                        "confidence":0.993
                     }
                  ]
               }
            ]
         }
      ]
   }
}
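The extra processing layer mentioned above could start as simply as walking the readResult structure and joining the words of each detected line. A hedged sketch, assuming the response was saved to a file:

import json

# Hypothetical file holding the JSON response shown above.
with open("analysis.json") as f:
    payload = json.load(f)

for block in payload["readResult"]["blocks"]:
    for line in block["lines"]:
        words = [w["text"] for w in line.get("words", [])]
        # Each line already carries its own "text"; merging adjacent lines into longer
        # phrases would be the next step on top of this.
        print(line["text"], "|", " ".join(words))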

Conclusion

I think that, price-wise, this AI Service, formerly known as Cognitive Services, is a reasonable way of running recognition tasks in an online environment. We could include such recognition in our applications for further automation, for instance in ERP FI invoice processing.

Unmasking the "capabilities" of ChatGPT

Intro

Today, as an exception, the topic is ChatGPT. I took the trouble to try out its "capabilities" in programming, since I had heard about such an option. My favorite topic recently has been OpenCL, so…

ME: then write me an OpenCL kernel for sorting floating-point numbers

I got bubble sort… and on floats, no less. Then again, I did not ask for double, so I cannot complain. But seriously, bubble sort, on a GPU, in OpenCL?

__kernel void sort(__global float *data, int length) {
    int i = get_global_id(0);
    int j = get_global_id(1);

    if (i < length && j < length) {
        if (data[i] > data[j]) {
            float temp = data[i];
            data[i] = data[j];
            data[j] = temp;
        }
    }
}

Conversation

I tried to discuss it with the chat, to find out why it proposed this and whether, after a few hints, it could revise its answers into something better suited to the question asked:

ME: but sorting in the global space like that will not work, because other instances of this kernel will be running on the GPU at the same time, which will lead to incorrect results; your answer is wrong, you are not taking the execution context into account.
Chat: I apologize, that is true. Sorting in the global space is incorrect, because many instances of this kernel can be running on the GPU at the same time, which can lead to incorrect results. To avoid this problem, you should use a different sorting algorithm that allows sorting within a single instance, or split the sorting into many smaller sequences and sort them separately.
ME: so which algorithm do you suggest for sorting data using an OpenCL kernel
Chat: There are many sorting algorithms that can be used in OpenCL kernels. The most popular…

And instead of getting a proposal suited to OpenCL, I got a generic textbook answer about algorithms not suited to parallel programming.

ME: in my opinion that is a wrong answer, because it would rather be bitonic sort and radix sort, and possibly shift sort
Chat: Bitonic sort and radix sort are popular sorting algorithms used in OpenCL. Shift sort is less popular, but it can be used in some situations. It all depends on the requirements of the given application.
ME: then please write me an example OpenCL kernel for shift sort.

Well, I did get something that resembles a shift sort, but with a global barrier.

__kernel void shiftSort(__global float *inputArray, int arrayLength) {
    int globalId = get_global_id(0);
    int stride = 1;

    while (stride < arrayLength) {
        int index = globalId * stride * 2;

        if (index + stride < arrayLength) {
            if (inputArray[index] > inputArray[index + stride]) {
                float temp = inputArray[index];
                inputArray[index] = inputArray[index + stride];
                inputArray[index + stride] = temp;
            }
        }

        stride *= 2;
        barrier(CLK_GLOBAL_MEM_FENCE);
    }
}
ME: why did you put a barrier on global memory?
Chat: The global memory barrier is used in the above OpenCL kernel to ensure synchronization between all threads. During shift sort, each thread sorts its own part of the input data and at the same time depends on the sorting results of threads working on other parts of the data. The global memory barrier guarantees that all threads finish sorting their data and are ready for the next sorting step before the sorting continues. This ensures the correctness of the sorting result and avoids problems related to invalid memory access.
ME: but doesn't the global barrier actually apply only to a single work-group of N threads?
Chat: Yes, the global barrier works within a single work-group and synchronizes the threads in that group, but it does not affect other work-groups in the same kernel. In OpenCL, global memory barriers are used to synchronize threads within a work-group and prevent race conditions when many threads access and modify the same data.

Conclusions

And here the adventure with this program ended, because attempts to clarify anything further resulted either in an error or in the same answers. Summing up the facts: we have a very capable tool here, fed with an enormous amount of information, quantities I can hardly even imagine. The knowledge of this mechanism ends in 2021 and it has no live connection to the internet. Asked whether it uses CUDA for its work, it answered affirmatively.

Such fast operation of the model requires enormous amounts of memory and a very large number of compute cores to process the input and generate the answers. It sounds like a gigantic infrastructure built on a distributed system, with thousands of GPU cards processing all of that workload. The question is when it will become paid: once it starts answering well, or once people get hooked on it?