Differences
This shows you the differences between two versions of the page.
en:centro:servizos:hpc [2023/01/17 14:22] – [Available Software] fernando.guillen | en:centro:servizos:hpc [2024/10/08 09:55] (current) – [CONDA] jorge.suarez | ||
---|---|---|---|
Line 25: | Line 25: | ||
To access the cluster, access must be requested in advance via [[https:// | To access the cluster, access must be requested in advance via [[https:// | ||
- | The access is done through an SSH connection to the login node: | + | The access is done through an SSH connection to the login node (172.16.242.211): |
<code bash> | <code bash> | ||
ssh < | ssh < | ||
Line 160: | Line 160: | ||
<code bash> | <code bash> | ||
# Getting miniconda | # Getting miniconda | ||
- | wget https:// | + | wget https:// |
# Install | # Install | ||
- | sh Miniconda3-py39_4.11.0-Linux-x86_64.sh | + | bash Miniconda3-latest-Linux-x86_64.sh |
+ | # Initialize for bash shell | ||
+ | ~/ | ||
</ | </ | ||
Line 170: | Line 172: | ||
== Available resources == | == Available resources == | ||
<code bash> | <code bash> | ||
+ | hpc-login2 ~]# ver_estado.sh | ||
+ | ============================================================================================================= | ||
+ | NODO | ||
+ | ============================================================================================================= | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | ============================================================================================================= | ||
+ | TOTALES: [Cores : 3/688] [Mem(MB): 270000/ | ||
hpc-login2 ~]$ sinfo -e -o " | hpc-login2 ~]$ sinfo -e -o " | ||
# There is an alias for that command: | # There is an alias for that command: | ||
Line 240: | Line 262: | ||
# There is an alias that shows only the relevant info: | # There is an alias that shows only the relevant info: | ||
hpc-login2 ~]$ ver_colas | hpc-login2 ~]$ ver_colas | ||
- | Name | + | Name Priority |
- | ---------- | + | ---------- |
- | | + | |
- | interactive | + | interactive |
- | urgent | + | urgent |
- | long 100 DenyOnLimit | + | long |
- | | + | |
- | | + | |
+ | | ||
</ | </ | ||
# Priority: the relative priority of each queue. \\ | # Priority: the relative priority of each queue. \\ | ||
Line 260: | Line 283: | ||
==== Sending a job to the queue system ==== | ==== Sending a job to the queue system ==== | ||
== Requesting resources == | == Requesting resources == | ||
- | By default, if you submit a job without specifying anything, the system submits it to the default (regular) QOS and assigns it a node, a CPU and all available memory. The time limit for job execution is that of the queue (4 days and 4 hours). | + | By default, if you submit a job without specifying anything, the system submits it to the default (regular) QOS and assigns it a node, a CPU and 4 GB. The time limit for job execution is that of the queue (4 days and 4 hours). |
This is very inefficient, | This is very inefficient, | ||
- %%Node number (-N or --nodes), tasks (-n or --ntasks) and/or CPUs per task (-c or --cpus-per-task).%% | - %%Node number (-N or --nodes), tasks (-n or --ntasks) and/or CPUs per task (-c or --cpus-per-task).%% | ||
Line 297: | Line 320: | ||
== Job submission == | == Job submission == | ||
+ | - sbatch | ||
- salloc | - salloc | ||
- srun | - srun | ||
- | - sbatch | ||
- | 1. SALLOC \\ | + | 1. SBATCH \\ |
- | It is used to immediately obtain an allocation of resources (nodes). As soon as it is obtained, the specified command or a shell is executed. | + | |
- | <code bash> | + | |
- | # Get 5 nodes and launch a job. | + | |
- | hpc-login2 ~]$ salloc -N5 myprogram | + | |
- | # Get interactive access to a node (Press Ctrl+D to exit): | + | |
- | hpc-login2 ~]$ salloc -N1 | + | |
- | </ | + | |
- | 2. SRUN \\ | + | |
- | It is used to launch a parallel job (preferable to using mpirun). It is interactive and blocking. | + | |
- | <code bash> | + | |
- | # Launch the hostname command on 2 nodes | + | |
- | hpc-login2 ~]$ srun -N2 hostname | + | |
- | hpc-node1 | + | |
- | hpc-node2 | + | |
- | </ | + | |
- | 3. SBATCH \\ | + | |
Submits a script to the queue system. It runs as a batch job and is non-blocking: the command returns immediately. | Submits a script to the queue system. It runs as a batch job and is non-blocking: the command returns immediately. | ||
<code bash> | <code bash> | ||
Line 336: | Line 343: | ||
hpc-login2 ~]$ sbatch test_job.sh | hpc-login2 ~]$ sbatch test_job.sh | ||
</ | </ | ||
+ | 2. SALLOC \\ | ||
+ | It is used to immediately obtain an allocation of resources (nodes). As soon as it is obtained, the specified command or a shell is executed. | ||
+ | <code bash> | ||
+ | # Get 5 nodes and launch a job. | ||
+ | hpc-login2 ~]$ salloc -N5 myprogram | ||
+ | # Get interactive access to a node (Press Ctrl+D to exit): | ||
+ | hpc-login2 ~]$ salloc -N1 | ||
+ | # Get interactive EXCLUSIVE access to a node | ||
+ | hpc-login2 ~]$ salloc -N1 --exclusive | ||
+ | </ | ||
+ | 3. SRUN \\ | ||
+ | It is used to launch a parallel job (preferable to using mpirun). It is interactive and blocking. | ||
+ | <code bash> | ||
+ | # Launch the hostname command on 2 nodes | ||
+ | hpc-login2 ~]$ srun -N2 hostname | ||
+ | hpc-node1 | ||
+ | hpc-node2 | ||
+ | </ | ||
+ | |||
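The resource options listed under "Requesting resources" (-N, -n, -c) can be combined with an sbatch submission roughly as follows. This is a minimal sketch, not the cluster's official template: the job name, the resource values and the `hostname` payload are illustrative assumptions, and `--mem` and `--time` are standard Slurm options used here to avoid relying on the defaults.

```shell
#!/usr/bin/env bash
# Sketch: write a batch script with explicit resource requests, then submit it.
# Assumption: every value below is an illustrative placeholder, not a site default.

JOB_SCRIPT="${JOB_SCRIPT:-test_job.sh}"

cat > "$JOB_SCRIPT" <<'EOF'
#!/bin/bash
#SBATCH --job-name=example      # hypothetical job name
#SBATCH --nodes=1               # -N: number of nodes
#SBATCH --ntasks=1              # -n: number of tasks
#SBATCH --cpus-per-task=4       # -c: CPUs per task
#SBATCH --mem=8G                # memory per node, instead of the default
#SBATCH --time=01:00:00         # wall-time limit (HH:MM:SS)

# Payload: report the node the job ran on.
srun hostname
EOF

# Submit only where Slurm is installed; elsewhere just report the file written.
if command -v sbatch >/dev/null 2>&1; then
    sbatch "$JOB_SCRIPT"
else
    echo "sbatch not found; wrote $JOB_SCRIPT without submitting"
fi
```

Requesting only what the job needs leaves the remaining cores and memory of the node available to other jobs.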
==== GPU use ==== | ==== GPU use ==== | ||
Line 410: | Line 436: | ||
JOBID PARTITION | JOBID PARTITION | ||
6547 defaultPa | 6547 defaultPa | ||
+ | |||
+ | ## Check status of queue use: | ||
+ | hpc-login2 ~]$ estado_colas.sh | ||
+ | JOBS PER USER: | ||
+ | -------------- | ||
+ | | ||
+ | | ||
+ | |||
+ | JOBS PER QOS: | ||
+ | -------------- | ||
+ | | ||
+ | long: 1 | ||
+ | |||
+ | JOBS PER STATE: | ||
+ | -------------- | ||
+ | | ||
+ | | ||
+ | ========================================== | ||
+ | Total JOBS in cluster: | ||
</ | </ | ||
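A per-state and per-user summary similar to the one above can be approximated with plain `squeue`. This is a hedged sketch: how `estado_colas.sh` is actually implemented is not documented here, and the helper name `count_values` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count queued jobs per state and per user from squeue output.
# Assumption: this only approximates the summary; it is not the site script.

# count_values: read one value per line on stdin and print "count value",
# most frequent first.
count_values() {
    sort | uniq -c | sort -rn
}

if command -v squeue >/dev/null 2>&1; then
    echo "JOBS PER STATE:"
    squeue -h -o "%T" | count_values   # %T = job state, long form
    echo "JOBS PER USER:"
    squeue -h -o "%u" | count_values   # %u = user name
else
    echo "squeue not found; run this on a Slurm login node"
fi
```

The `-h` option suppresses the header line so that only job rows are counted.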
Common job states: | Common job states: |