Working on a cluster
Overview
Teaching: 15 min
Exercises: 10 min
Questions
How do I log on to a cluster?
How do I transfer data to a cluster?
How is a cluster different to my laptop?
How do I run processes on the cluster?
Objectives
Connect to a cluster using ssh.
Transfer files to and from the cluster.
Run the hostname command on a compute node of the cluster.
Throughout this material, we will assist Lola Curious and look over her shoulder as she starts work at the Institute of Things, a side job to earn some extra money. On the first day, her supervisor greets her warmly and welcomes her to the job. She explains what Lola’s task is and suggests that she will need to use the supercomputer on campus. This supercomputer consists of a large number of computers working as a unit, so a better technical term for it is computer cluster, or just cluster. Lola has so far used her laptop at home for her studies, so the idea of using a cluster seems a bit intimidating to her. Her supervisor notices her anxiety and tells her that she will receive an introduction to it once she has requested an account on the cluster.
Lola walks to the IT department and completes the paperwork to get an account. One of the admins promises to sit down with her in the morning to show her around the machine. The admin explains that Lola will use a small to mid-range HPC cluster.
First of all, the admin asks Lola to connect to the cluster. For this, Lola needs a program called the terminal (make sure you have this program on your laptop, and refer to the setup section for more details).
The admin asks Lola to open a terminal on her laptop and type in the following commands:
$ ssh lola@saga.sigma2.no
If you do this material on your own, be sure to replace lola with the username assigned to you on saga.sigma2.no. When you hit enter, you are prompted for your password. Now is your chance to type it in. But watch out, the characters you type are not displayed on the screen.
Last login: Fri Dec 14 14:13:14 2018 from lolas_laptop
$
The admin explains to Lola that she is using a program known as the secure shell, or ssh. This establishes a temporary encrypted connection between Lola’s laptop and saga.sigma2.no. The word before the @ symbol, e.g. lola here, is the user account name that Lola has access permissions for on the cluster.
The authenticity of host
When logging in for the first time, you may be asked whether you trust the server you are trying to connect to. If you typed the address correctly (i.e. saga.sigma2.no), it is safe to answer “yes” to the question at the end of this message, which permanently adds the server to your list of known hosts.
$ ssh lola@saga.sigma2.no
The authenticity of host 'saga.sigma2.no (###.###.###.##)' can't be established.
RSA key fingerprint is SHA256:NwV2/9HMlLfj6hFmXTuA4UVievE/uq36K9EYa20CteI.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'saga.sigma2.no' (RSA) to the list of known hosts.
Where do I get this ssh from?
On Linux and macOS, the ssh command line utility is almost always pre-installed. Open a terminal and type ssh -V to check whether that is the case; it prints the installed version.
At the time of writing, OpenSSH support on Microsoft Windows is still very recent. Alternatives are PuTTY, Bitvise SSH, mRemoteNG or MobaXterm. Download one of them, install it and open the GUI. The GUI asks for your user name and the destination address or IP of the computer you want to connect to. Once these are provided, you will be asked for your password just like in the example above.
Lola is asked to use a UNIX command called ls (for list directory contents) to have a look around.
To prove that Lola is really logged in to another machine, she issues a command that prints the name of the machine she is currently working on:
$ hostname
The admin explains that Lola has to work with this remote shell session in order to
run programs on the cluster. Launching programs that open a Graphical User Interface
(GUI) is possible, but the interaction with the GUI will be slow as everything will
have to be transferred over the WiFi network her laptop is currently connected to.
Before Rob continues, he suggests leaving the cluster node again. For this, Lola can type logout.
Looking around more
The admin continues to encourage Lola to look around and explains that all of a cluster’s nodes have components similar to those of Lola’s laptop or workstation.
- every cluster node offers a certain amount of CPU (Central Processing Unit) cores. To see how many, Lola can run
$ nproc --all
- every cluster node has a certain amount of memory, or RAM (random-access memory). To see how much memory saga.sigma2.no has, in units of gigabytes, Lola can run
$ free -g
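The two commands above can be combined into a small script that records both values in one go. This is a minimal sketch, assuming a Linux machine where nproc and free are available; the variable names cores and mem_gib are illustrative only. Note that free -g counts in binary units (multiples of 1024 x 1024 x 1024 bytes).

```shell
# Collect the core count and total memory of the machine this runs on.
cores=$(nproc --all)                    # number of CPU cores on this node
echo "CPU cores: ${cores}"

# free ships with procps on Linux; skip the memory line where it is missing.
if command -v free >/dev/null 2>&1; then
    # LC_ALL=C keeps the "Mem:" row label untranslated; column 2 is the total.
    mem_gib=$(LC_ALL=C free -g | awk '/^Mem:/ {print $2}')
    echo "Total memory: ${mem_gib} GiB"
else
    echo "free not found; memory not reported"
fi
```

Running the same script on a laptop and on the login node gives two sets of numbers that can be compared directly.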
Units and Language
A computer’s memory and disk are measured in units called Bytes (one Byte is 8 bits). As today’s files and memory have grown large by historical standards, volumes are noted using the SI prefixes. So 1000 Bytes is a Kilobyte (kB), 1000 Kilobytes is a Megabyte (MB), 1000 Megabytes is a Gigabyte (GB), etc.
History and common language have, however, mixed this notation with a different meaning. When people say “Kilobyte”, they often mean 1024 Bytes instead. In that spirit, a Megabyte is 1024 Kilobytes. To address this ambiguity, the International System of Quantities standardizes the binary prefixes (with a base of 1024) as kibi, mebi, gibi, etc.
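The gap between the two conventions can be checked with shell arithmetic. This is a small sketch; the variable names gb_bytes and gib_bytes are made up for illustration.

```shell
# Decimal (SI) versus binary prefixes, expressed in Bytes.
gb_bytes=$((1000 * 1000 * 1000))      # 1 Gigabyte (GB), base 1000
gib_bytes=$((1024 * 1024 * 1024))     # 1 Gibibyte (GiB), base 1024

echo "1 GB  = ${gb_bytes} Bytes"
echo "1 GiB = ${gib_bytes} Bytes"
echo "difference = $((gib_bytes - gb_bytes)) Bytes"
```

At this scale the binary unit is roughly 7% larger than the decimal one, which is why it matters which of the two a tool reports.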
Relative to your Laptop
Note down the number of CPU cores, the amount of RAM and the total disk space available on saga.sigma2.no. Compare it to your laptop!
Bonus: Divide the values obtained from saga.sigma2.no by the numbers obtained for your laptop. How much more powerful is the login node of the cluster compared to your laptop?
Using the login node is not using the cluster
As a final word on this lesson, the admin tells Lola that she should never execute long-running processes or applications on saga.sigma2.no. This is a server that is shared by many users of SAGA. If Lola starts a lot of long-running processes, other users may start seeing their commands take longer to complete. To actually do science and complete the tasks Lola is meant to complete, a piece of software called the scheduler has to be used.
The login node and the worker/compute nodes
At this point, the admin emphasizes the difference between the login nodes and the worker nodes (also known as compute nodes). When Lola reached the cluster using ssh, she landed on a login node, which acts as an entrance point to the cluster. Login nodes are used for lightweight tasks such as managing user logins, copying files to and from the cluster, installing software, and submitting and monitoring jobs. We never perform analysis on the login node. If we do, the login nodes will choke and other users may not be able to log in or copy files. Analyses are performed on the compute nodes, a process we will learn later in the lessons. This is the main difference between executing your program on a remote server and using a cluster to run the program. Lola, being a good citizen, acknowledges the importance of using shared resources in a way that does not hinder other users.
The admin continues to explain that people typically perform computationally heavy tasks on the cluster and prepare files that contain the results, or a subset of the data, from which the final results are created on the individual’s laptop. So communication to and from the cluster is done mostly by transferring files. For example, Lola is asked to create a text file and transfer it over. For this, the admin advises her to use the secure copy command, scp. As before, this establishes a secure, encrypted, temporary connection between Lola’s laptop and the cluster just for the sake of transferring the files. After the transfer has completed, scp closes the connection again.
$ echo "Test transfer from $HOSTNAME on " $(date) > from_laptop.txt
$ scp from_laptop.txt lola@saga.sigma2.no:from_laptop.txt
from_laptop.txt 100% 1KB 27.6KB/s 00:00
She can now ssh into the cluster again and check whether the file she just uploaded has arrived:
$ ssh lola@saga.sigma2.no
Last login: Tue Mar 14 14:17:44 2017 from lolas_laptop
$ ls
from_laptop.txt
Now, let’s try the other way around, i.e. downloading a file from the cluster to Lola’s laptop. For this, Lola has to swap the two arguments of the scp command she just issued. First she creates a file while logged into the cluster and then logs herself out:
$ echo "Test transfer from $HOSTNAME on " $(date) > from_cluster.txt
$ logout
Then, from the laptop, she issues the following command:
$ scp lola@saga.sigma2.no:from_cluster.txt from_cluster.txt
Lola notices how the command line changed. First, she has to enter the source (lola@saga.sigma2.no), then put a : and continue with the path of the file she wants to download. After that, separated by a space, the destination has to be provided, which in this case is a file from_cluster.txt in the current directory.
from_cluster.txt 100% 1KB 27.6KB/s 00:00
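The only difference between the upload and the download is the order of the two arguments. The sketch below builds both command lines as text and prints them without contacting the cluster. remote_spec is a hypothetical helper name, and lola and saga.sigma2.no are the example account and host used in this lesson.

```shell
# Build the "user@host:path" form that scp expects for the remote side.
# remote_spec is a made-up helper for illustration, not part of scp.
remote_spec() {
    printf '%s@%s:%s' "$1" "$2" "$3"
}

user="lola"                 # example account from this lesson
host="saga.sigma2.no"       # example login node from this lesson

# Upload: local source first, remote destination second.
upload_cmd="scp from_laptop.txt $(remote_spec "$user" "$host" from_laptop.txt)"
# Download: remote source first, local destination second.
download_cmd="scp $(remote_spec "$user" "$host" from_cluster.txt) from_cluster.txt"

echo "$upload_cmd"
echo "$download_cmd"
```

Printing the commands first is a cheap way to check the argument order before any file is actually transferred.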
Paths are everywhere
Paths in scp and ssh commands follow the same logic as path or folder descriptions in the regular shell. For example,
$ scp lola@saga.sigma2.no:from_cluster.txt from_cluster
yields two relative paths. For the remote source lola@saga.sigma2.no:from_cluster.txt, the file name mentioned after the colon is a path relative to the home directory. For brevity, the home directory itself is not shown. The same is true for the destination on the local machine, from_cluster. This is a path relative to the folder Lola currently works in. The same command as above, expressed with absolute paths, could look like this (if Lola currently works inside /home/lola/work on her laptop and, as an assumption for illustration, her home directory on the cluster is /home/lola):
$ scp lola@saga.sigma2.no:/home/lola/from_cluster.txt /home/lola/work/from_cluster
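How a relative path maps to an absolute one can be tried locally, without any cluster. The following is a minimal sketch that uses a throwaway directory as a stand-in for Lola’s work folder; all variable names are illustrative.

```shell
# A relative path is resolved against the current working directory.
start_dir=$PWD
workdir=$(mktemp -d)            # temporary stand-in for Lola's work folder
cd "$workdir" || exit 1

relative="from_cluster"         # relative path, as in the scp destination
absolute="$PWD/$relative"       # the same location written as an absolute path

touch "$relative"               # create the file through the relative path
if [ -e "$absolute" ]; then     # ... and find it through the absolute path
    result="both paths name the same file"
else
    result="paths differ"
fi
echo "$result"

cd "$start_dir" || exit 1
rm -rf "$workdir"               # clean up the temporary folder
```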
Lola has a look in the current directory and indeed finds from_cluster.txt. She opens it and can tell that it contains the same content as the original. The admin explains that if a file of the same name had already existed in the destination, scp would have overwritten her local copy.
To finish, the admin tells Lola that she can also transfer entire directories. The admin has prepared a temporary directory for her on the cluster under /tmp/lolas_files and asks Lola to obtain a copy of the entire directory on her laptop.
$ scp -r lola@saga.sigma2.no:/tmp/lolas_files .
The . is shorthand for the current working directory, i.e. the directory Lola currently calls scp from. When inspecting this directory, Lola sees the transferred directory:
lolas_files/ from_cluster.txt from_laptop.txt
A closer look into that directory using the relative path with respect to the current one:
Rob suggests that Lola consult the man page of scp for further details by calling:
$ man scp
All mixed up
Lola needs to obtain a file called results.data from a remote machine that is called safe-store-1. This machine is hidden behind the login node saga.sigma2.no. However, she somehow mixed up the commands that are needed to get the file onto her laptop. Help her and rearrange the following commands into the right order!
$ ssh lola@saga.sigma2.no
$ logout
$ scp lola@saga.sigma2.no:results.data .
$ scp lola@safe-store-1:results.data .

$ ssh lola@saga.sigma2.no
$ scp lola@safe-store-1:results.data .
$ logout
$ scp lola@saga.sigma2.no:results.data .
Who is hanging around ?
The w utility displays a list of logged-in users and what they are currently doing. Use it to check:
- that nobody but yourself is logged into your laptop/desktop
- that a lot of people use the login node of your cluster
Where did they go ?
Rob has a zip file stored under /tmp/passwords.zip on the login node of the cluster saga.sigma2.no. He wants to unzip it on his laptop under /important/passwords. How does he do that?
$ ssh rob@saga.sigma2.no
$ unzip /tmp/passwords.zip

$ scp saga.sigma2.no@rob:/tmp/passwords.zip .
$ unzip passwords.zip

$ cd /important/passwords
$ scp rob@saga.sigma2.no:passwords.zip .
$ unzip passwords.zip

$ cd /important/passwords
$ scp rob@saga.sigma2.no:/tmp/passwords.zip .
$ unzip passwords.zip
- No: Rob only unpacks the zip file, but does not transfer the unpacked files onto his laptop
- No: Rob mixed up the syntax for scp
- No: Rob did not specify the correct path of /tmp/passwords.zip on the login node of the cluster
- Yes: you may also use unzip foo.zip -d /somewhere if you want to omit the first command
Differences Between Nodes
Many HPC clusters have a variety of nodes optimized for particular workloads. Some nodes may have a larger amount of memory, or specialized resources such as Graphics Processing Units (GPUs).
Clusters (almost always) provide a login node.
You connect to the login node by special software using an encrypted connection.
Data has to be transferred to and from the cluster using specialized software.
A cluster is a shared resource.
Running a process on the cluster requires using the scheduler.