AI/ML

Thoughts on GPU thermal issues

I’ve been playing around with several devices, running OpenCL code on them. They all share one problem: the GPU produces excessive heat and the heatsink is unable to dissipate it. I start with the MacBookPro3,1. It has an NVIDIA 8600M GT, which is known to fail, and I assume the failures may be linked to overheating. The second example is a design flaw in the Lenovo ThinkPad T420s, which has a built-in NVIDIA NVS 4200M. This laptop has the Optimus feature, which in theory should detect whether a workload ought to run on the discrete or the integrated GPU. Unfortunately, enabling either Optimus or discrete-only mode causes extreme overheating, up to 100 degrees Celsius, which makes the setup unusable because it slows down. The last example is the Lenovo ThinkPad T61 with an NVS 140M. Unlike the previous examples, this one shows no overheating issues by itself, but it is extremely fragile in terms of build quality. The CPU has proper heatsink contact by means of four screws, but for some unknown reason the GPU, which sits 2 cm away, has no screws of its own and relies on a separate metal clip that presses on the heatsink and is held by a single screw. I find this quite silly, because if the thermal paste goes bad or that one screw comes loose, the GPU can be completely damaged. Loosen that single screw just a little and the temperature jumps from 50 to 100 degrees, risking a fire, I think…
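On an Optimus machine it is worth checking which GPU the OpenCL runtime actually exposes before blaming the cooling, since a kernel can silently land on the discrete chip and start heating it up. Here is a minimal sketch in plain C against the standard OpenCL host API (nothing specific to these laptops; the file name and buffer sizes are just placeholders) that lists the platforms and devices the runtime sees:

/* list_devices.c: enumerate OpenCL platforms and devices.
 * Assumes OpenCL headers and an ICD loader are installed;
 * build with something like: gcc list_devices.c -lOpenCL */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);

    for (cl_uint p = 0; p < num_platforms; p++) {
        char pname[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME,
                          sizeof(pname), pname, NULL);
        printf("Platform: %s\n", pname);

        cl_device_id devices[8];
        cl_uint num_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL,
                       8, devices, &num_devices);

        for (cl_uint d = 0; d < num_devices; d++) {
            char dname[256];
            cl_device_type type;
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                            sizeof(dname), dname, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_TYPE,
                            sizeof(type), &type, NULL);
            printf("  Device: %s (%s)\n", dname,
                   (type & CL_DEVICE_TYPE_GPU) ? "GPU" : "other");
        }
    }
    return 0;
}

Running this while keeping an eye on the temperatures makes it obvious which device the runtime will hand the work to, and whether the integrated GPU is available as a cooler fallback at all.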

So, back in the days when manufacturers tried to put dedicated GPUs into compact laptops, there were several examples of design flaws, especially inadequate heatsinks and fans. Nowadays, when discrete graphics are far more common on the market, it is not unusual to see several fans and huge blocks of metal carrying away all the heat coming out of the case because you are running a game or trying to compute something in OpenCL.