NCI is Australia’s pre-eminent computing facility, delivering on the critical national need for high-performance data, storage, and computing services. This blog provides a basic user guide for using the NCI servers.
All NCI users must have a validated account to use NCI resources. An account can be created in the NCI online self-service portal (https://my.nci.org.au/mancini). Registering an NCI account requires an email address, name, mobile phone number and project code.
A step-by-step registration guide can be found at the NCI Account Help Center (https://opus.nci.org.au/display/Help/How+to+create+an+NCI+user+account).
Gadi User Guide
Gadi is Australia's most powerful supercomputer, a highly parallel cluster comprising more than 200,000 processor cores on ten different types of compute nodes.
To run jobs on Gadi, we need to SSH to the Gadi login server. Windows users can download MobaXterm (https://mobaxterm.mobatek.net/), Xshell (https://www.netsarang.com/en/xshell/) or PuTTY (https://www.putty.org/) to create SSH connections.
For example, user aaa777 would run
ssh [email protected]
After entering the correct user password, we will be able to use Gadi.
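To avoid typing the full hostname and username on every login, an SSH host alias can be added to ~/.ssh/config (a small sketch; aaa777 is a placeholder username):

```shell
# Append a host alias so that `ssh gadi` connects to Gadi directly.
mkdir -p ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host gadi
    HostName gadi.nci.org.au
    User aaa777
EOF
```

After this, `ssh gadi` is equivalent to the full command above.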
Each user has a project-independent home directory with a storage limit fixed at 10 GiB. Normally, we upload our code to an appropriate subfolder of the home directory.
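Since the 10 GiB home quota is fixed, it is worth checking usage occasionally. Standard coreutils are enough for a quick look (a minimal sketch, nothing Gadi-specific):

```shell
# Report total home-directory usage to compare against the 10 GiB quota.
du -sh "$HOME"

# List the five largest top-level items, useful when cleaning up.
du -sh "$HOME"/* 2>/dev/null | sort -rh | head -n 5
```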
Each user can create folders and run jobs in the home directory. In addition, each user has access to /scratch and /g/data folders for storing data files, and $PBS_JOBFS for per-job temporary storage. The differences between these folders and their usage scenarios can be found in the folder structure introduction section of the NCI documentation (https://opus.nci.org.au/display/Help/0.+Welcome+to+Gadi).
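The layout above can be summarized in a small helper function. This is only a sketch: the path patterns follow the examples used elsewhere in this guide (e.g. /home/777/aaa777), and aaa777/a00 are placeholder user and project names.

```shell
# Print the standard Gadi storage locations for a given user and project.
# Assumes the /home path pattern /home/<nnn>/<user>, where <nnn> is the
# numeric part of the username (e.g. aaa777 -> /home/777/aaa777).
gadi_paths() {
    user="$1"; project="$2"
    group=$(printf '%s' "$user" | tr -cd '0-9')   # digits of the username
    echo "home:    /home/$group/$user"
    echo "scratch: /scratch/$project/$user"
    echo "gdata:   /g/data/$project/$user"
    echo "jobfs:   \$PBS_JOBFS (only set inside a running PBS job)"
}

gadi_paths aaa777 a00
```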
File Transfer to/from Gadi
Gadi has six designated data-mover nodes with the domain name gadi-dm.nci.org.au. We can use these nodes to transfer files to and from Gadi.
For example, aaa777 runs the following command line in the local terminal to transfer the file input.dat in the current directory to the home folder on Gadi.
scp input.dat [email protected]:/home/777/aaa777
If the transfer is going to take a long time, there is a possibility that it could be interrupted by network instability. For that reason, it is better to start the transfer in a resumable way. For example, the following command line allows user aaa777 to download data in the folder /scratch/a00/aaa777/test_dir on Gadi onto the current directory on their local machine using
rsync -avPS [email protected]:/scratch/a00/aaa777/test_dir ./
If the download is interrupted, run the same command again to resume the download from where it left off.
Python Package Installation
Gadi provides preinstalled software, such as Python, as environment modules that we can use directly. Details of these modules are available in the NCI documentation (https://opus.nci.org.au/display/Help/Environment+Modules).
Thanks to these preinstalled modules, we do not need to install Python or PyTorch from scratch. However, we still need to learn how to install Python packages on Gadi for our projects.
Gadi expects users to install Python packages into their own directories in the /g/data or /home file systems. To install a package, for example matplotlib, into the per-user location under the home directory, we can use the following command:
pip install --user matplotlib
The same command is applicable for other packages.
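When pip installs into a per-user location, packages land in Python's user site directory, which sits under the home filesystem. This standard check prints that location (plain Python, no Gadi-specific assumptions):

```shell
# Show the per-user site-packages directory that per-user pip installs target.
python3 -c "import site; print(site.getusersitepackages())"
```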
To run compute tasks such as simulations, weather models, and sequence assemblies on Gadi, users need to submit them as jobs to queues. Job submission enables users to specify the queue, duration and resource needs of their jobs. Gadi uses PBSPro to schedule all submitted jobs and keeps nodes that have different hardware in different queues. See details about the hardware available in the different queues on the Gadi Queue Structure page. Users submit jobs to a specific queue to run jobs on the corresponding type of node.
This means that we need to wrap our code as Gadi jobs to run it in a selected queue. Methods for creating Gadi jobs can be found in the Gadi PBS Jobs guide (https://opus.nci.org.au/display/Help/4.+PBS+Jobs).
Once the job has been created, we can run it by submitting it to Gadi using the qsub command. For example, to submit a job defined in a submission script called job.sh, run the following on the login node:
qsub job.sh
Submission Script Example
Here is an example job submission script to run the Python script main.py, which is assumed to be located in the same folder from which you submit the job.
#!/bin/bash
#PBS -P a00
#PBS -q normal
#PBS -l ncpus=48
#PBS -l mem=190GB
#PBS -l jobfs=200GB
#PBS -l walltime=02:00:00
#PBS -l storage=gdata/a00+scratch/a00
#PBS -l wd

module load python3/3.7.4
python3 main.py $PBS_NCPUS > /g/data/a00/$USER/job_logs/$PBS_JOBID.log
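The script relies on variables that the scheduler sets inside a running job, such as $PBS_NCPUS, $PBS_JOBID and $USER. We can mimic them with illustrative values outside a job to see what the log path and argument become (a sketch; the values shown are placeholders):

```shell
# Simulate the PBS-provided environment to preview the command the
# submission script would run. Inside a real job, PBS sets these itself.
PBS_NCPUS=48 PBS_JOBID=12345678.gadi-pbs USER=aaa777 sh -c '
  echo "argument passed to main.py: $PBS_NCPUS"
  echo "log file: /g/data/a00/$USER/job_logs/$PBS_JOBID.log"
'
```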
To use GPUs on Gadi, we need to submit our job to a queue that supports GPUs, such as gpuvolta, and then specify the number of GPUs required. The gpuvolta queue requires 12 CPU cores to be requested per GPU, so a two-GPU job pairs ngpus=2 with ncpus=24. We can add the following PBS directives:
#PBS -q gpuvolta
#PBS -l ngpus=2
#PBS -l ncpus=24
Once a job submission is accepted, its jobID is shown in the return message and can be used to monitor the job's status. Users are encouraged to keep monitoring their own jobs at every stage of their lifespan on Gadi.
To look up the status of a job in the queue, run the command qstat. For example, to look up job 12345678 in the queue, run
qstat -swx 12345678
To list the status of all of your own jobs, run
qstat -u $USER -Esw
Other commands can be found in the Gadi help documentation.