Sept 9th, 10th & 11th, 2013

Received the nvidia K2000D card for the IBM server. Hardware-wise, installation appears to have gone well (the card fits the riser assembly with no problems whatsoever). But then again, you can't have everything : booting stops at the initial firmware screen (never gets to the <F2>, <F12>, …, screen). Will have to repeat the hardware installation cycle a couple of times, test the card in a different machine, confirm that the problem is due to the card, confirm that it is not an issue of the 3rd PSU needing to be operational, update IBM's firmware(?), …

⇒ Remove and re-install card a couple of times → no

⇒ Bring the 3rd PSU into play → no

⇒ Will try to (i) disable the 16x slot from the BIOS, (ii) confirm that it boots with the card in place but the slot disabled, (iii) reboot, go to the BIOS, enable the slot, and see if that gets us through.

⇒ Disable the ROM option for PCIe slot 1 from within the BIOS → Success, we can boot and see the device.

But, of course, you can't have everything : the kernel module 270.41.06 does not support the K2000D card. Here we go again …

And, even better, the driver version suggested by nvidia (319.49) needs very recent kernels (>3.10 ?). It is getting better and better.
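A minimal sketch of how the driver version mismatch could be checked from a script. It assumes the loaded module reports itself in the usual "NVRM version: … Kernel Module  <version> …" line of /proc/driver/nvidia/version; the function names are hypothetical, and the version to compare against is whatever the caller passes in (nothing here decides which release first supports the K2000D).

```python
import re


def parse_driver_version(text):
    """Return the dotted version that follows 'Kernel Module', or None.

    'text' is expected to be the contents of /proc/driver/nvidia/version
    (assumption: the usual 'NVRM version: ... Kernel Module  X.Y  ...' line).
    """
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def version_tuple(version):
    """Turn '270.41.06' into (270, 41, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(installed, wanted):
    """True if the installed driver version is not older than the wanted one."""
    return version_tuple(installed) >= version_tuple(wanted)


if __name__ == "__main__":
    # Versions taken from the notes above: the old module vs. the suggested driver.
    print(is_at_least("270.41.06", "319.49"))
```

On the machine itself, the text would come from reading /proc/driver/nvidia/version after the module has loaded.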

Give it a try with fc19 kernel + friends :

  • kernel-3.10.4-300.fc19.x86_64.rpm
  • kmod-nvidia-3.10.4-300.fc19.x86_64-319.32-2.fc19.1.x86_64.rpm
  • linux-firmware-20130418-0.1.gitb584174.fc19.noarch.rpm

⇒ OK. Can boot and load the nvidia module. But still not done : there is an incompatibility between the cuda version and the driver. Tried to get a newer libcuda from nvidia, but then the libc was not compatible. Hate it. Try with the EL6 distribution (and then with the EL5 ?).

Finally, all OK. Can get NAMD to run with '+devices 0,0,0,…'. Unfortunately, it wasn't worth all this effort : for large numbers of cores, the quadro K2000D is actually slowing down the calculation. See this page from benchmarks.
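For the record, a small sketch of how a '+devices' argument of that shape could be generated instead of typed by hand. Everything except the '+devices' and '+p' flags is an assumption: the helper names, the binary name namd2, the config file name and the counts are hypothetical, and how many entries the list really needs is left to the caller.

```python
def devices_argument(n_entries, device_id=0):
    """Repeat one GPU index n_entries times, e.g. 3 -> '0,0,0'."""
    if n_entries < 1:
        raise ValueError("need at least one device entry")
    return ",".join([str(device_id)] * n_entries)


def namd_command(n_cores, config, n_entries, binary="namd2"):
    """Build the argument list for a single-node NAMD run (hypothetical names)."""
    return [
        binary,
        f"+p{n_cores}",                        # number of cores to use
        "+devices", devices_argument(n_entries),  # same card listed repeatedly
        config,
    ]


if __name__ == "__main__":
    # Illustrative values only; not the ones used in the benchmarks.
    print(" ".join(namd_command(8, "run.namd", 3)))
```

The resulting list can be handed to subprocess.run as-is, which avoids quoting problems with the comma-separated device list.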

maintenance/sept_9th_2013.txt · Last modified: 2013/09/11 20:18 (external edit)