//en:centro:servizos:servidores_de_computacion_gpgpu — current revision 2024/10/01 17:34 (jorge.suarez); previous revision 2020/05/08 13:19 (fernando.guillen)//
===== Service description =====
==== Servers ====
  * ''
    * PowerEdge R730
    * 128 GB RAM (4 DDR4 DIMM 2400MHz)
    * 2 x Nvidia GP102GL 24GB [Tesla P40]
    * AlmaLinux 9.1
    * CUDA 12.0
    * **Mandatory use of the Slurm queue manager**.
  * HPC cluster servers: [[ en: ]]
  * CESGA servers: [[ en: ]]

==== Restricted access GPU servers ====
| + | * '' | ||
| * PowerEdge R730 | * PowerEdge R730 | ||
| * 2 x [[https:// | * 2 x [[https:// | ||
| * 128 GB RAM (4 DDR4 DIMM 2400MHz) | * 128 GB RAM (4 DDR4 DIMM 2400MHz) | ||
| * 2 x Nvidia GP102GL 24GB [Tesla P40] | * 2 x Nvidia GP102GL 24GB [Tesla P40] | ||
    * Ubuntu
    * **Mandatory use of the Slurm queue manager**.
    * **Modules for library version management**.
    * CUDA 11.0
    * OpenCV 2.4 and 3.4
    * Atlas 3.10.3
    * 192 GB RAM (12 DDR4 DIMM 2933MHz)
    * Nvidia Quadro P6000 24GB (2018)
    * Nvidia Quadro RTX8000 48GB (2019)
    * Operating system Centos 7.7
    * Nvidia Driver 418.87.00 for CUDA 10.1
    * Docker 19.03
    * [[ https:// ]]
  * ''
    * Dell PowerEdge
    * 2 x [[ https:// ]]
    * 128 GB RAM
    * 2 x NVIDIA Ampere A100 80 GB
    * AlmaLinux
  * ''
    * PowerEdge
    * 2 x [[ https:// ]]
    * 128 GB RAM
    * NVIDIA Ampere A100 80 GB
    * Operating system AlmaLinux
    * 256 GB RAM
  * ''
    * Dell PowerEdge R760 server
    * 2 x [[ https://ark.intel.com/content/www/ ]]
    * 384 GB RAM
    * 2 x NVIDIA Hopper H100 80 GB
    * Operating system AlmaLinux 9.2
    * NVIDIA driver 555.42.06 and CUDA 12.5
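On the servers that use the modules system, the installed toolchains are selected per session rather than being on the ''PATH'' by default. A minimal session sketch follows; the module name ''cuda'' is an assumption for illustration — run ''module avail'' to see the names actually installed on each server:

<code bash>
module avail          # list the toolchains available on this server
module load cuda      # load a CUDA toolkit (exact module name is an assumption)
nvcc --version        # confirm which CUDA compiler is now on the PATH
nvidia-smi            # check the driver version and that the GPUs are visible
</code>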
===== Activation =====
Not all servers are available to all users.
===== User Manual =====
Use SSH. Hostnames and IP addresses are:
  * ctgpgpu4.inv.usc.es - 172.16.242.201
  * ctgpgpu5.inv.usc.es - 172.16.242.202
  * ctgpgpu6.inv.usc.es - 172.16.242.205
  * ctgpgpu9.inv.usc.es - 172.16.242.94
  * ctgpgpu10.inv.usc.es - 172.16.242.95
  * ctgpgpu11.inv.usc.es - 172.16.242.96
  * ctgpgpu12.inv.usc.es - 172.16.242.97
Connection is only possible from inside the CITIUS network. To connect from other places or from the RAI network it is necessary to use the [[ https:// ]].
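Once on the CITIUS network (or the VPN), a host entry in ''~/.ssh/config'' saves retyping the full hostname. This is a sketch, not managed configuration; the username is a placeholder you must replace with your own account:

<code bash>
# ~/.ssh/config — convenience alias for one of the GPU servers (sketch).
# Replace <your-citius-username> with your actual account name.
Host ctgpgpu9
    HostName ctgpgpu9.inv.usc.es
    User <your-citius-username>
</code>

After saving it, ''ssh ctgpgpu9'' is enough to open a session on that server.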
==== Job management with SLURM ====
On servers with queue management software installed, jobs must be submitted through the queue manager.
To send a job to the queue command ''
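A typical Slurm batch script looks like the sketch below. The job name, time limit, and the ''module load cuda'' line are assumptions for illustration — adjust them to your job and to the modules available on the server:

<code bash>
#!/bin/bash
#SBATCH --job-name=gpu-test        # job name shown in the queue
#SBATCH --gres=gpu:1               # request one GPU
#SBATCH --time=01:00:00            # wall-time limit for the job
#SBATCH --output=gpu-test-%j.log   # output file; %j expands to the job id

# Load the CUDA toolkit via the modules system
# (the exact module name is an assumption; check with `module avail`).
module load cuda

# Run the program on the allocated GPU.
srun ./my_gpu_program
</code>

Submit the script with ''sbatch job.sh'', inspect the queue with ''squeue'', and cancel a job with ''scancel <jobid>''.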