====== GPGPU computing servers ======
  
===== Service description =====
==== Servers with free access GPUs ====
  * ''ctgpgpu4'':
      * PowerEdge R730
      * 128 GB RAM (4 DDR4 DIMM 2400MHz)
      * 2 x Nvidia GP102GL 24GB [Tesla P40]
      * AlmaLinux 9.1
          * CUDA 12.0
          * **Mandatory use of the Slurm queue manager**

  * HPC cluster servers: [[ en:centro:servizos:hpc | HPC cluster ]]
  * CESGA servers: [[ en:centro:servizos:cesga | Access procedure info ]]

==== Restricted access GPU servers ====
  * ''ctgpgpu5'':
      * PowerEdge R730
      * 2 x [[https://ark.intel.com/products/92980/Intel-Xeon-Processor-E5-2623-v4-10M-Cache-2_60-GHz|Intel Xeon E52623v4]]
      * 128 GB RAM (4 DDR4 DIMM 2400MHz)
      * 2 x Nvidia GP102GL 24GB [Tesla P40]
      * Ubuntu 18.04
          * **Mandatory use of the Slurm queue manager**
          * **Modules for library version management** (see the usage sketch after the server list)
          * CUDA 11.0
          * OpenCV 2.4 and 3.4
          * Atlas 3.10.3
          * Caffe
      
  * ''ctgpgpu6'':
      * Server SIE LADON 4214
      * 2 x [[https://ark.intel.com/content/www/us/en/ark/products/193385/intel-xeon-silver-4214-processor-16-5m-cache-2-20-ghz.html|Intel Xeon Silver 4214]]
      * 192 GB RAM (12 DDR4 DIMM 2933MHz)
      * Nvidia Quadro P6000 24GB (2018)
      * Nvidia Quadro RTX8000 48GB (2019)
      * CentOS 7.7
          * Nvidia Driver 418.87.00 for CUDA 10.1
          * Docker 19.03
          * [[https://github.com/NVIDIA/nvidia-docker | Nvidia-docker ]] (see the container sketch after the server list)
  * ''ctgpgpu9'':
      * Dell PowerEdge R750
      * 2 x [[ https://ark.intel.com/content/www/es/es/ark/products/215274/intel-xeon-gold-6326-processor-24m-cache-2-90-ghz.html |Intel Xeon Gold 6326 ]]
      * 128 GB RAM
      * 2 x NVIDIA Ampere A100 80 GB
      * AlmaLinux 8.6
           * NVIDIA 515.48.07 driver and CUDA 11.7
  * ''ctgpgpu10'':
      * Dell PowerEdge R750
      * 2 x [[ https://ark.intel.com/content/www/es/es/ark/products/215272/intel-xeon-gold-5317-processor-18m-cache-3-00-ghz.html |Intel Xeon Gold 5317 ]]
      * 128 GB RAM
      * NVIDIA Ampere A100 80 GB
      * AlmaLinux 8.7
           * NVIDIA 525.60.13 driver and CUDA 12.0
  * ''ctgpgpu11'':
      * Server Gigabyte G482-Z54
      * 2 x [[ https://www.amd.com/es/products/cpu/amd-epyc-7413 | AMD EPYC 7413 @ 2.65 GHz, 24 cores ]]
      * 256 GB RAM
      * 4 x NVIDIA Ampere A100 80 GB
      * AlmaLinux 9.1
           * NVIDIA 520.61.05 driver and CUDA 11.8
  * ''ctgpgpu12'':
      * Dell PowerEdge R760
      * 2 x [[ https://ark.intel.com/content/www/xl/es/ark/products/232376.html |Intel Xeon Silver 4410Y ]]
      * 384 GB RAM
      * 2 x NVIDIA Hopper H100 80 GB
      * AlmaLinux 9.2
           * NVIDIA 535.104.12 driver for CUDA 12.2
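
On servers that provide environment Modules for library version management (e.g. ''ctgpgpu5''), the desired library versions are loaded per session before compiling or running code. A minimal sketch, assuming a CUDA module exists under that name; actual module names and versions may differ per server:

<code bash>
# List the modules available on this server
module avail

# Load a CUDA toolkit (module name and version are placeholders)
module load cuda/11.0

# Check that the toolkit and the GPUs are visible
nvcc --version
nvidia-smi
</code>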
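On servers with Docker and Nvidia-docker (e.g. ''ctgpgpu6''), GPU jobs can also run inside containers. A minimal sketch using the native GPU support of Docker 19.03; the image tag is only an example and should match the CUDA version supported by the host driver:

<code bash>
# Run nvidia-smi inside a CUDA container to verify GPU access
docker run --rm --gpus all nvidia/cuda:10.1-base nvidia-smi
</code>
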
===== Activation =====
Not all servers are freely available. Access must be requested by filling in the [[https://citius.usc.es/dashboard/enviar-incidencia| requests and problem reporting form]]. Users without access permission will receive an incorrect password error message.
  
===== User Manual =====
==== How to connect the servers ====

Use SSH. Hostnames and IP addresses are:
  
  * ctgpgpu4.inv.usc.es - 172.16.242.201:22
  * ctgpgpu5.inv.usc.es - 172.16.242.202:22
  * ctgpgpu6.inv.usc.es - 172.16.242.205:22
  * ctgpgpu9.inv.usc.es - 172.16.242.94:22
  * ctgpgpu10.inv.usc.es - 172.16.242.95:22
  * ctgpgpu11.inv.usc.es - 172.16.242.96:22
  * ctgpgpu12.inv.usc.es - 172.16.242.97:22
Connection is only possible from inside the CITIUS network. To connect from other places or from the RAI network it is necessary to use the [[https://wiki.citius.usc.es/en:centro:servizos:vpn:start | VPN]] or the [[https://wiki.citius.usc.es/en:centro:servizos:pasarela_ssh|SSH gateway]].
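
For example, a minimal connection sketch; ''username'' is a placeholder, and the gateway hostname is an assumption to be checked against the SSH gateway page:

<code bash>
# Direct connection from inside the CITIUS network
ssh username@ctgpgpu4.inv.usc.es

# From outside, jump through the SSH gateway (hostname is a placeholder)
ssh -J username@ssh.citius.usc.es username@ctgpgpu4.inv.usc.es
</code>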
  
==== Job management with SLURM ====
  
On servers with queue management software installed, its use is mandatory to submit jobs and to avoid conflicts between processes, since two jobs should not be executed at the same time.
  
To send a job to the queue, the ''srun'' command is used.
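
A minimal sketch; the program name is a placeholder, and the ''%%--gres%%'' option assumes GPUs are declared as consumable resources in the Slurm configuration of that server:

<code bash>
# Queue a run of a program on one GPU
srun --gres=gpu:1 ./my_gpu_program

# Check the state of the queue
squeue
</code>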