====== GPGPU computing servers ======
  
===== Service description =====
==== Servers with free access GPUs ====
  * ''ctgpgpu4'':
      * PowerEdge R730
      * 128 GB RAM (4 DDR4 DIMM 2400MHz)
      * 2 x Nvidia GP102GL 24GB [Tesla P40]
      * AlmaLinux 9.1
          * CUDA 12.0
          * **Mandatory use of the Slurm queue manager**

  * HPC cluster servers: [[ en:centro:servizos:hpc | HPC cluster ]]
  * CESGA servers: [[ en:centro:servizos:cesga | Access procedure info ]]

==== Restricted access GPU servers ====
  * ''ctgpgpu5'':
      * PowerEdge R730
      * 2 x [[https://ark.intel.com/products/92980/Intel-Xeon-Processor-E5-2623-v4-10M-Cache-2_60-GHz|Intel Xeon E52623v4]]
      * 128 GB RAM (4 DDR4 DIMM 2400MHz)
      * 2 x Nvidia GP102GL 24GB [Tesla P40]
      * Ubuntu 18.04
          * **Mandatory use of the Slurm queue manager**
          * **Modules for library version management** (see the usage sketch after the server list)
          * CUDA 11.0
          * OpenCV 2.4 and 3.4
          * Atlas 3.10.3
          * Caffe
      
  * ''ctgpgpu6'':
      * Server SIE LADON 4214
      * 2 x [[https://ark.intel.com/content/www/us/en/ark/products/193385/intel-xeon-silver-4214-processor-16-5m-cache-2-20-ghz.html|Intel Xeon Silver 4214]]
      * 192 GB RAM (12 DDR4 DIMM 2933MHz)
      * Nvidia Quadro P6000 24GB (2018)
      * Nvidia Quadro RTX8000 48GB (2019)
      * CentOS 7.7
          * Nvidia Driver 418.87.00 for CUDA 10.1
          * Docker 19.03
          * [[https://github.com/NVIDIA/nvidia-docker | Nvidia-docker ]] (see the container sketch after the server list)
  * ''ctgpgpu9'':
      * Dell PowerEdge R750
      * 2 x [[ https://ark.intel.com/content/www/es/es/ark/products/215274/intel-xeon-gold-6326-processor-24m-cache-2-90-ghz.html |Intel Xeon Gold 6326 ]]
      * 128 GB RAM
      * 2 x NVIDIA Ampere A100 80 GB
      * AlmaLinux 8.6
           * NVIDIA 515.48.07 driver and CUDA 11.7
  * ''ctgpgpu10'':
      * Dell PowerEdge R750
      * 2 x [[ https://ark.intel.com/content/www/es/es/ark/products/215272/intel-xeon-gold-5317-processor-18m-cache-3-00-ghz.html |Intel Xeon Gold 5317 ]]
      * 128 GB RAM
      * NVIDIA Ampere A100 80 GB
      * AlmaLinux 8.7
           * NVIDIA 525.60.13 driver and CUDA 12.0
  * ''ctgpgpu11'':
      * Server Gigabyte G482-Z54
      * 2 x [[ https://www.amd.com/es/products/cpu/amd-epyc-7413 | AMD EPYC 7413 @ 2.65 GHz, 24 cores ]]
      * 256 GB RAM
      * 4 x NVIDIA Ampere A100 80 GB
      * AlmaLinux 9.1
           * NVIDIA 520.61.05 driver and CUDA 11.8
  * ''ctgpgpu12'':
      * Dell PowerEdge R760
      * 2 x [[ https://ark.intel.com/content/www/xl/es/ark/products/232376.html |Intel Xeon Silver 4410Y ]]
      * 384 GB RAM
      * 2 x NVIDIA Hopper H100 80 GB
      * AlmaLinux 9.2
           * NVIDIA 535.104.12 driver for CUDA 12.2
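
On servers that provide environment Modules for library version management (e.g. ''ctgpgpu5''), the desired library versions are loaded per session before compiling or running code. A minimal sketch, assuming a CUDA module exists under that name; actual module names and versions may differ per server:

<code bash>
# List the modules available on this server
module avail

# Load a CUDA toolkit (module name and version are placeholders)
module load cuda/11.0

# Check that the toolkit and the GPUs are visible
nvcc --version
nvidia-smi
</code>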
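On servers with Docker and Nvidia-docker (e.g. ''ctgpgpu6''), GPU jobs can also run inside containers. A minimal sketch using the native GPU support of Docker 19.03; the image tag is only an example and should match the CUDA version supported by the host driver:

<code bash>
# Run nvidia-smi inside a CUDA container to verify GPU access
docker run --rm --gpus all nvidia/cuda:10.1-base nvidia-smi
</code>
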
===== Activation =====
Not all servers are freely available. Access must be requested by filling in the [[https://citius.usc.es/dashboard/enviar-incidencia| requests and problem reporting form]]. Users without access permission will receive an incorrect password error message.
  
===== User Manual =====
==== How to connect the servers ====

Use SSH. Hostnames and IP addresses are:
  
  * ctgpgpu4.inv.usc.es - 172.16.242.201:22
  * ctgpgpu5.inv.usc.es - 172.16.242.202:22
  * ctgpgpu6.inv.usc.es - 172.16.242.205:22
  * ctgpgpu9.inv.usc.es - 172.16.242.94:22
  * ctgpgpu10.inv.usc.es - 172.16.242.95:22
  * ctgpgpu11.inv.usc.es - 172.16.242.96:22
  * ctgpgpu12.inv.usc.es - 172.16.242.97:22
Connection is only possible from inside the CITIUS network. To connect from other places or from the RAI network it is necessary to use the [[https://wiki.citius.usc.es/en:centro:servizos:vpn:start | VPN]] or the [[https://wiki.citius.usc.es/en:centro:servizos:pasarela_ssh|SSH gateway]].
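
For example, a minimal connection sketch; ''username'' is a placeholder, and the gateway hostname is an assumption to be checked against the SSH gateway page:

<code bash>
# Direct connection from inside the CITIUS network
ssh username@ctgpgpu4.inv.usc.es

# From outside, jump through the SSH gateway (hostname is a placeholder)
ssh -J username@ssh.citius.usc.es username@ctgpgpu4.inv.usc.es
</code>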
  
==== Job management with SLURM ====
  
On servers with queue management software installed, its use is mandatory to submit jobs and to avoid conflicts between processes, since two jobs should not be executed at the same time.
  
To send a job to the queue, the ''srun'' command is used.
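
A minimal sketch; the program name is a placeholder, and the ''%%--gres%%'' option assumes GPUs are declared as consumable resources in the Slurm configuration of that server:

<code bash>
# Queue a run of a program on one GPU
srun --gres=gpu:1 ./my_gpu_program

# Check the state of the queue
squeue
</code>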