* ctgpgpu4
* ctgpgpu5
* ctgpgpu6
* ctgpgpu9
* ctgpgpu10
* ctgpgpu11
* ctgpgpu12
Not all servers are freely available. Access must be requested through the requests and problem reporting form. Users without access permission will receive an incorrect password error message.
Connect to the servers using SSH. Hostnames and IP addresses are:
Connection is only possible from inside the CITIUS network. To connect from elsewhere, including the RAI network, it is necessary to use the VPN or the SSH gateway.
The servers switch themselves off after an hour of being idle. To switch them on again use the remote power service.
Servers won't switch themselves off if there is an open SSH or Screen session.
On servers with queue management software installed, its use is mandatory for submitting jobs. This avoids conflicts between processes, since two jobs should not run at the same time.
To send a job to the queue, use the srun command:
srun cuda_program arguments_of_cuda_program
The srun process waits until the job has been executed before returning control to the user. If you don't want to wait, a console session manager such as screen can be used. This way you can leave the job in the queue and disconnect the session without losing the job's output, which can be recovered at any later moment.
Alternatively, nohup can be used and the job sent to the background with &. The output is then written to the file nohup.out:
nohup srun cuda_program cuda_program_arguments &
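When several jobs are launched this way from the same directory, they all append to the same nohup.out. The following is a minimal sketch of one way to give each job its own log file by redirecting the output explicitly. The helper name run_detached, the variable detached_pid and the log file name are illustrative assumptions, not part of the servers' setup.

```shell
# run_detached LOG_FILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND under nohup in the background and
# sends stdout and stderr to LOG_FILE instead of nohup.out.
run_detached() {
    log_file=$1
    shift
    nohup "$@" > "$log_file" 2>&1 &
    # PID of the background process, kept for later checks (e.g. wait or kill)
    detached_pid=$!
}

# Example (job_01.log is an assumed file name):
#   run_detached job_01.log srun cuda_program cuda_program_arguments
#   tail -f job_01.log
```

Because the redirection is explicit, nohup does not create nohup.out for that job.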
To check the queue status, use the squeue command. It shows output similar to this:
JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
    9 servidore ca_water pablo.qu PD  0:00     1 (Resources)
   10 servidore ca_water pablo.qu PD  0:00     1 (Priority)
   11 servidore ca_water pablo.qu PD  0:00     1 (Priority)
   12 servidore ca_water pablo.qu PD  0:00     1 (Priority)
   13 servidore ca_water pablo.qu PD  0:00     1 (Priority)
   14 servidore ca_water pablo.qu PD  0:00     1 (Priority)
    8 servidore ca_water pablo.qu  R  0:11     1 ctgpgpu2
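In this listing the ST column holds the job state: PD is pending and R is running. As a rough sketch, the output can be summarized with awk to see how many jobs are waiting. The function name count_jobs_by_state is a made-up helper, and it assumes the default column layout shown above, with ST as the fifth column and no spaces in job names.

```shell
# count_jobs_by_state: reads squeue output on stdin and prints one line
# per job state (column ST) with the number of jobs in that state.
count_jobs_by_state() {
    awk 'NR > 1 { count[$5]++ }            # skip the header row
         END { for (s in count) print s, count[s] }' | sort
}

# Example:
#   squeue | count_jobs_by_state
```

For the sample listing above, this prints "PD 6" and "R 1" on separate lines.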
An interactive view, refreshed every second, can be obtained with the smap command:
smap -i 1
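If smap is not available on a given server, a similar periodically refreshed listing can be obtained from squeue itself through its -i (--iterate) option. The wrapper below is only a sketch, and the name watch_queue is an assumption made for illustration.

```shell
# watch_queue [SECONDS]: hypothetical wrapper that makes squeue print the
# queue every SECONDS seconds (default 1) until interrupted with Ctrl-C.
watch_queue() {
    squeue -i "${1:-1}"
}

# Example:
#   watch_queue      # refresh every second
#   watch_queue 5    # refresh every five seconds
```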