GPGPU Computing Servers

Service Description

These servers are intended for general-purpose GPU computing (GPGPU): compute-intensive workloads such as machine learning, data processing, and scientific simulation that require hardware acceleration.

Public Access Servers

Any researcher from the center can request access to these servers. Access is granted upon prior request and validation.

| Node | Server | CPU | RAM | GPUs | Operating System | Job Management |
|---|---|---|---|---|---|---|
| ctgpgpu4 | PowerEdge R730 | 2 × Intel Xeon E5-2623 v4 | 128 GB | 2 × Nvidia Tesla P40 24 GB (GP102GL, 2016) | AlmaLinux 9.1 (CUDA 12.0) | Slurm (mandatory) |

Restricted Access Servers

Access to these servers is restricted to a specific group or project, or is more tightly controlled for resource-management and planning reasons.

When requesting the service, it is essential to check the up-to-date information in Xici, which details the particular circumstances of each server (access criteria, priorities, usage conditions, etc.).

| Node | Server | CPU | RAM | GPUs | Operating System | Job Management |
|---|---|---|---|---|---|---|
| ctgpgpu5 | PowerEdge R730 | 2 × Intel Xeon E5-2623 v4 | 128 GB | 2 × Nvidia Tesla P40 24 GB (GP102GL) | Ubuntu 22.04 (Nvidia driver 590; CUDA Toolkit 12.5 and 13.1, 13.1 default) | n/a |
| ctgpgpu6 | SIE LADON 4214 | 2 × Intel Xeon Silver 4214 | 192 GB | Nvidia Quadro P6000 24 GB (2018); Nvidia Quadro RTX 8000 48 GB (2019); 2 × Nvidia A30 24 GB (2020) | CentOS 7.9 (Nvidia driver 535.86.10, CUDA 12.2) | n/a |
| ctgpgpu9 | Dell PowerEdge R750 | 2 × Intel Xeon Gold 6326 | 128 GB | 2 × NVIDIA A100 80 GB | AlmaLinux 8.6 (NVIDIA driver 515.48.07, CUDA 11.7) | n/a |
| ctgpgpu11 | Gigabyte G482-Z54 | 2 × AMD EPYC 7413 @ 2.65 GHz (24c) | 256 GB | 5 × NVIDIA A100 80 GB | AlmaLinux 9.1 (NVIDIA driver 520.61.05, CUDA 11.8) | n/a |
| ctgpgpu12 | Dell PowerEdge R760 | 2 × Intel Xeon Silver 4410Y | 384 GB | 2 × NVIDIA H100 80 GB | AlmaLinux 9.2 (NVIDIA driver 555.42.06, CUDA 12.5) | n/a |
| ctgpgpu15 ⚠️ | SIE LADON (Gigabyte) | 2 × AMD EPYC 9474F (48c) | 768 GB | 4 × NVIDIA H200 NVL | AlmaLinux 9.6 | ts |
| ctgpgpu16 ⚠️ | SIE LADON (Gigabyte) | 2 × AMD EPYC 9474F (48c) | 768 GB | 4 × NVIDIA H200 NVL | AlmaLinux 9.7 | ts |
| ctgpgpu17 ⚠️ | SIE LADON (Gigabyte) | 2 × AMD EPYC 9474F (48c) | 768 GB | 4 × NVIDIA H200 NVL | AlmaLinux 9.7 | ts |
| ctgpgpu18 ⚠️ | SIE LADON (MegaRAC SP-X) | 2 × AMD EPYC 9335 (24c) | 1536 GB | 4 × NVIDIA H200 | Ubuntu 22.04 | ts |

⚠️ The servers ctgpgpu15, ctgpgpu16, ctgpgpu17 and ctgpgpu18 have temporary installations and assignments; their configuration and access conditions may change around May 2026.

Service Registration

Not all servers are available at all times or for every use. To access the servers, you must request access in advance through the incident form. Users without access permission will receive an "incorrect password" message when they try to log in.

User Manual

Connecting to the Servers

To connect to the servers, you must do so via SSH. The names and IP addresses of the servers are as follows:

| Node | FQDN | IP |
|---|---|---|
| ctgpgpu4 | ctgpgpu4.inv.usc.es | 172.16.242.201 |
| ctgpgpu5 | ctgpgpu5.inv.usc.es | 172.16.242.202 |
| ctgpgpu6 | ctgpgpu6.inv.usc.es | 172.16.242.205 |
| ctgpgpu9 | ctgpgpu9.inv.usc.es | 172.16.242.94 |
| ctgpgpu11 | ctgpgpu11.inv.usc.es | 172.16.242.96 |
| ctgpgpu12 | ctgpgpu12.inv.usc.es | 172.16.242.97 |
| ctgpgpu15 | ctgpgpu15.inv.usc.es | 172.16.242.207 |
| ctgpgpu16 | ctgpgpu16.inv.usc.es | 172.16.242.212 |
| ctgpgpu17 | ctgpgpu17.inv.usc.es | 172.16.242.213 |
| ctgpgpu18 | ctgpgpu18.inv.usc.es | 172.16.242.208 |

The connection is only available from the center's network. To connect from other locations or from the RAI network, it is necessary to use the VPN or the SSH gateway.
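Assuming a standard OpenSSH client, the connection details above can be captured once in ~/.ssh/config; the gateway hostname below is a placeholder (check the SSH gateway documentation for the real one):

```
# ~/.ssh/config -- illustrative fragment; replace <user> and the gateway host
Host ctgpgpu*
    HostName %h.inv.usc.es             # %h expands to the alias, e.g. ctgpgpu9
    User <user>
    # ProxyJump <user>@<ssh-gateway>   # uncomment when outside the centre's network
```

With this in place, `ssh ctgpgpu9` expands to the full FQDN automatically.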

Job Management with SLURM

On servers with the Slurm queue manager, its use is mandatory for submitting jobs; this avoids conflicts between processes, since two jobs must not run at the same time.

To submit a job to the queue, use the srun command:

srun cuda_program cuda_program_arguments

The srun process waits for the job to finish before returning control to the user. If you do not want to wait, you can use a console session manager such as screen, which lets you leave the job running, disconnect from the session, and retrieve the console output later.

Alternatively, you can use nohup and send the job to the background with &. In this case, the output is saved in the file nohup.out:

nohup srun cuda_program cuda_program_arguments &
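For longer jobs, a batch script submitted with sbatch (the non-interactive counterpart of srun) avoids keeping any session open. The job name, output file, and GPU request below are illustrative, and --gres assumes GPUs are configured as a consumable resource on the node:

```
#!/bin/bash
#SBATCH --job-name=cuda_job          # illustrative job name
#SBATCH --output=cuda_job-%j.out     # %j expands to the job ID
#SBATCH --gres=gpu:1                 # request one GPU, if GRES is configured

./cuda_program cuda_program_arguments
```

Submit it with `sbatch job.sh` and follow the output file with `tail -f cuda_job-<id>.out`.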

To check the status of the queue, use the squeue command. It produces output similar to this:

JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
9  servidore ca_water pablo.qu    PD       0:00      1 (Resources)
10 servidore ca_water pablo.qu    PD       0:00      1 (Priority)
11 servidore ca_water pablo.qu    PD       0:00      1 (Priority)
12 servidore ca_water pablo.qu    PD       0:00      1 (Priority)
13 servidore ca_water pablo.qu    PD       0:00      1 (Priority)
14 servidore ca_water pablo.qu    PD       0:00      1 (Priority)
 8 servidore ca_water pablo.qu     R       0:11      1 ctgpgpu2

You can also obtain an interactive view, updated every second, with the smap command:

smap -i 1
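If smap is not available on a given server (it has been removed from recent Slurm releases), squeue can refresh its own listing at a fixed interval:

```
# Re-print the queue every second; stop with Ctrl-C
squeue -i 1
```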

Job Management with TS

On servers that use ts as the job manager, its use is mandatory for any task that uses a GPU, in order to avoid conflicts and ensure correct resource allocation.

To request a GPU, pass ts the option -G 1 (or the number of GPUs needed):

ts -G 1 cuda_program cuda_program_arguments

For example:

ts -G 1 python train.py --epochs 100

The system will take care of placing the job in the queue and running it when a GPU is available.
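Assuming the GPU-aware Task Spooler (ts) fork, these are some common follow-up operations; flag behaviour can vary between ts versions, so check `ts -h` on the server:

```
ts              # list queued, running and finished jobs
ts -c 3         # print the stored output of job 3
ts -w 3         # block until job 3 finishes
ts -r 3         # remove job 3 from the queue
```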

For more advanced examples (multiple GPUs, additional resources, specific options, etc.), you can use the command:

usage-overview