This lesson is being piloted (Beta version)

Working on a cluster

Overview

Teaching: 15 min
Exercises: 10 min
Questions
  • How do I log on to a cluster?

  • How do I transfer data to a cluster?

  • How is a cluster different to my laptop?

  • How do I run processes on the cluster?

Objectives
  • Connect to a cluster using ssh.

  • Transfer files to and from the cluster.

  • Run the hostname command on a compute node of the cluster.

The Story

Throughout this material, we will assist Lola Curious and look over her shoulder as she starts working at the Institute of Things, a side job to earn some extra money. On the first day, her supervisor greets her warmly and welcomes her to the job. She explains what Lola's task is and tells her that she will need to use the supercomputer on campus. This supercomputer comprises a large number of computers working as a unit, so a better technical term for it is computer cluster, or just cluster. So far Lola has used her laptop at home for her studies, so the idea of using a cluster seems a bit intimidating. Her supervisor notices her anxiety and tells her that she will receive an introduction once she has requested an account on the cluster.

Lola walks to the IT department and finishes the paperwork to get an account. One of the admins promises to sit down with her in the morning to show her around the machine. The admin explains that Lola will use a small to mid-range HPC cluster.

Going remote

First of all, the admin asks Lola to connect to the cluster. For this, Lola needs a program called the terminal (make sure you have this program on your laptop, and refer to the setup section for more details).

[Figure: Connect to cluster]

The admin asks Lola to open a terminal on her laptop and type in the following command:

$ ssh lola@saga.sigma2.no

Logging in

If you work through this material on your own, be sure to replace lola with the username that was assigned to you on saga.sigma2.no. When you hit enter, a prompt like this might appear:

lola@saga.sigma2.no's password:

Now is your chance to type in your password. But watch out, the characters you type are not displayed on the screen.

Last login: Fri Dec 14 14:13:14 2018 from lolas_laptop
$ 

The admin explains to Lola that she is using a program known as the secure shell or ssh. This establishes a temporary encrypted connection between Lola’s laptop and saga.sigma2.no. The word before the @ symbol, e.g. lola here, is the user account name that Lola has access permissions for on the cluster.

The authenticity of host

When logging in for the first time, you may be asked whether you trust the server you are trying to connect to. If you typed the address correctly (i.e. saga.sigma2.no), it is safe to answer “yes” to the question at the end of this message; the server is then permanently added to your list of known hosts:

$ ssh lola@saga.sigma2.no
The authenticity of host 'saga.sigma2.no (###.###.###.##)' can't be established.
RSA key fingerprint is SHA256:NwV2/9HMlLfj6hFmXTuA4UVievE/uq36K9EYa20CteI.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'saga.sigma2.no' (RSA) to the list of known hosts.

Where do I get this ssh from?

On Linux and macOS, the ssh command-line utility is almost always pre-installed. Open a terminal and type ssh -V to check whether that is the case; if ssh is installed, it prints its version.

At the time of writing, OpenSSH support on Microsoft Windows is still very recent. Alternatives are PuTTY, Bitvise SSH, mRemoteNG or MobaXterm. Download one of them, install it and open its GUI. The GUI asks for your user name and the destination address or IP of the computer you want to connect to. Once you have provided these, you will be asked for your password just like in the example above.

Lola is asked to use a UNIX command called ls (for list directory contents) to have a look around.

$ ls

To prove that she is really logged in to another machine, Lola issues a command that prints the name of the machine she is currently working on:

$ hostname
saga.sigma2.no

The admin explains that Lola has to work with this remote shell session in order to run programs on the cluster. Launching programs that open a Graphical User Interface (GUI) is possible, but interaction with the GUI will be slow because everything has to be transferred over the network her laptop is connected to. Before continuing, the admin suggests leaving the cluster node again. For this, Lola can type logout or exit.

$ logout

Looking around more

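The admin encourages Lola to keep looking around and explains that all of a cluster’s nodes have components similar to those of Lola’s laptop or workstation. The first command below prints the number of CPU cores, the second the amount of memory in gibibytes.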

$ nproc --all
$ free -g
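The exercise further down also asks for disk space, which neither of these two commands reports. A minimal sketch that collects all three figures in one go, assuming a Linux machine (it reads /proc/meminfo, which macOS does not provide); the variable names are illustrative only:

```shell
# Count the CPU cores of the machine you are logged in to.
cores=$(nproc --all)
echo "CPU cores: $cores"

# Total RAM: MemTotal in /proc/meminfo is given in KiB (Linux only).
mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "RAM: $((mem_kib / 1024 / 1024)) GiB"

# Size and free space of the file system that holds the current directory.
df -h .
```

Run it once on the login node and once on your laptop to get comparable numbers.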

Units and Language

A computer’s memory and disk are measured in units called bytes (one byte is 8 bits). As today’s files and memory have grown large by historical standards, volumes are noted using the SI prefixes. So 1000 bytes is a kilobyte (kB), 1000 kilobytes is a megabyte (MB), 1000 megabytes is a gigabyte (GB), etc.

History and common language have, however, mixed this notation with a different meaning. When people say “kilobyte”, they often mean 1024 bytes instead. In that spirit, a megabyte is 1024 kilobytes. To address this ambiguity, the International System of Quantities standardizes the binary prefixes (with a base of 1024) as kibi, mebi, gibi, etc., so that 1024 bytes is a kibibyte (KiB).
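To get a feeling for how far the two conventions drift apart, the arithmetic can be done directly in the shell. This is a small sketch using bash integer arithmetic; the variable names are made up for illustration:

```shell
# Decimal (SI) and binary base units, in bytes.
kb=1000
kib=1024

# Three steps up: kilo -> mega -> giga, and kibi -> mebi -> gibi.
gb=$((kb * kb * kb))
gib=$((kib * kib * kib))

echo "1 GB  = $gb bytes"
echo "1 GiB = $gib bytes"
echo "difference: $((gib - gb)) bytes"
```

At the giga level the gap is already about 7 %, which is why a disk sold as "500 GB" shows up as roughly 465 GiB in tools that count in powers of 1024.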

Relative to your Laptop

Note down the number of CPU cores, the amount of RAM and the total disk space available on saga.sigma2.no. Compare it to your laptop!

Bonus: Divide the values obtained from saga.sigma2.no by the numbers obtained for your laptop. How much more powerful is the login node of the cluster compared to your laptop?

Using the login node is not using the cluster

As a final word on this lesson, the admin tells Lola that she should never execute long-running processes or applications on saga.sigma2.no. This is a server that is shared by many users of SAGA. If Lola starts a lot of long-running processes, other users may see their commands take longer to complete. To actually do science and complete the tasks Lola is meant to complete, a piece of software called the scheduler has to be used.

The login node and the worker/compute nodes

[Figure: Connect to cluster]

At this point the admin emphasizes the difference between the login nodes and the worker nodes (also known as compute nodes). When Lola reached the cluster using ssh, she landed on a login node, which acts as the entrance point to the cluster. Login nodes are used for lightweight tasks such as managing user logins, copying files to and from the cluster, installing software, and submitting and monitoring jobs. We never perform analysis on a login node. If we did, the login nodes would choke and other users might not be able to log in or copy files. Analyses are performed on the compute nodes; how to do that is covered later in the lessons. This is the main difference between executing your program on a remote server and using a cluster to run it. Lola, being a good citizen, acknowledges the importance of using shared resources in a way that does not hinder other users.

Transferring Data

The admin goes on to explain that people typically perform computationally heavy tasks on the cluster and produce files that contain the results, or a subset of the data, from which the final results are created on their own laptops. Communication to and from the cluster is therefore done mostly by transferring files. As an example, Lola is asked to create a text file and transfer it over. For this, the admin advises her to use the secure copy command, scp. As before, this establishes a temporary encrypted connection between Lola’s laptop and the cluster, just for the sake of transferring the files. After the transfer has completed, scp closes the connection again.

$ echo "Test transfer from $HOSTNAME on " $(date) > from_laptop.txt
$ scp from_laptop.txt lola@saga.sigma2.no:from_laptop.txt
from_laptop.txt                                              100%   1KB  27.6KB/s   00:00

She can now ssh into the cluster again and check whether the file she just uploaded has arrived:

$ ssh lola@saga.sigma2.no
Last login: Tue Mar 14 14:17:44 2017 from lolas_laptop
$ ls from_laptop.txt
from_laptop.txt

Now, let’s try the other way around, i.e. downloading a file from the cluster to Lola’s laptop. For this, Lola has to swap the two arguments of the scp command she just issued. First she creates a file while logged in to the cluster and then logs herself out:

$ echo "Test transfer from $HOSTNAME on " $(date) > from_cluster.txt
$ logout 

Then, from her laptop, she issues the following command:

$ scp lola@saga.sigma2.no:from_cluster.txt from_cluster.txt

Lola notices how the command line has changed. The source now comes first: the remote account (lola@saga.sigma2.no), then a :, then the path of the file she wants to download. After that, separated by a space, comes the destination, which in this case is a file from_cluster.txt in the current directory.

from_cluster.txt                                              100%   1KB  27.6KB/s   00:00
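The structure of the remote argument, user@host:path, can be made visible by taking it apart with plain shell parameter expansion. This is only an illustration of the syntax, not something scp requires you to do; the variable names are hypothetical:

```shell
# The remote scp argument used in the text above.
remote="lola@saga.sigma2.no:from_cluster.txt"

user=${remote%%@*}   # everything before the first @
rest=${remote#*@}    # everything after the first @
host=${rest%%:*}     # everything before the first :
path=${rest#*:}      # everything after the first :

echo "user: $user"
echo "host: $host"
echo "path: $path"
```

An argument without a colon has no host part at all, which is how scp tells a local file from a remote one.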

Paths are everywhere

An scp command follows the same logic for describing paths and folders as the regular shell. For example,

$ scp lola@saga.sigma2.no:from_cluster.txt from_cluster

uses two relative paths. For the remote source lola@saga.sigma2.no:from_cluster.txt, the file name after the colon is a path relative to Lola's home directory on the cluster. For brevity, the home directory itself is not shown. The same is true for the destination on the local machine, from_cluster, which is a path relative to the folder Lola currently works in. Expressed with absolute paths, the same command could look like this (assuming, purely for illustration, that Lola's home directory on the cluster is /home/lola and that she currently works inside /home/lola/work on her laptop):

$ scp lola@saga.sigma2.no:/home/lola/from_cluster.txt /home/lola/work/from_cluster

Lola has a look in the current directory and indeed finds from_cluster. She opens it in a text editor and can tell that it has the same content as the original. The admin explains that if she had used the original name as the destination, i.e. from_cluster.txt, scp would have overwritten the local copy she downloaded earlier.
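Both points, relative destinations and silent overwriting, can be rehearsed safely without a cluster. The sketch below uses cp as a stand-in for scp (the path rules for the local side are the same) inside a throwaway directory; every directory name and file content in it is made up for illustration:

```shell
# cp stands in for scp here so the sketch runs without a cluster.
demo=$(mktemp -d)
mkdir -p "$demo/remote_home" "$demo/work"
echo "from the cluster" > "$demo/remote_home/from_cluster.txt"
echo "old local copy"   > "$demo/work/from_cluster.txt"

cd "$demo/work"

# Relative destination: resolved against the current directory.
cp "$demo/remote_home/from_cluster.txt" from_cluster

# Destination name matches an existing file: its old content is replaced
# without any warning.
cp "$demo/remote_home/from_cluster.txt" "$demo/work/from_cluster.txt"

cat from_cluster.txt
```

After the second copy, the text "old local copy" is gone for good, which is exactly the situation the admin warns about.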

To finish, the admin tells Lola that she can also transfer entire directories. The admin has prepared a temporary directory for her on the cluster under /tmp/lolas_files and asks Lola to obtain a copy of the entire directory on her laptop.

$ scp -r lola@saga.sigma2.no:/tmp/lolas_files .

The trailing . is shorthand for the current working directory, i.e. the directory Lola calls scp from. When inspecting this directory, Lola sees the transferred directory:

$ ls
lolas_files/  from_cluster.txt from_laptop.txt  

She can take a closer look into that directory using its path relative to the current one, for example with ls lolas_files.

The admin suggests that Lola consult the man page of scp for further details by calling:

$ man scp
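The recursive copy with a trailing . can also be tried out locally first. As a rough sketch, cp -r stands in for scp -r below; the directory layout and the file names a.txt and b.txt are invented for the example:

```shell
# Build a small source directory to play the role of /tmp/lolas_files.
src=$(mktemp -d)
mkdir -p "$src/lolas_files"
touch "$src/lolas_files/a.txt" "$src/lolas_files/b.txt"

# Move into an empty directory that plays the role of Lola's laptop.
dest=$(mktemp -d)
cd "$dest"

# The trailing . means "into the directory I am in right now".
cp -r "$src/lolas_files" .

ls lolas_files
```

The directory arrives under its own name inside the current directory, just as lolas_files/ does in the listing above.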

All mixed up

Lola needs to obtain a file called results.data from a remote machine called safe-store-1. This machine is hidden behind the login node saga.sigma2.no. However, she has somehow mixed up the commands needed to get the file onto her laptop. Help her and rearrange the following commands into the right order!

$ ssh lola@saga.sigma2.no
$ logout
$ scp lola@saga.sigma2.no:results.data .
$ scp lola@safe-store-1:results.data .

Solution

$ ssh lola@saga.sigma2.no
$ scp lola@safe-store-1:results.data .
$ logout
$ scp lola@saga.sigma2.no:results.data .

Who is hanging around?

The w utility displays a list of logged-in users and what they are currently doing. Use it to check:

  1. that nobody but yourself is logged into your laptop/desktop
  2. that a lot of people use the login node of your cluster saga.sigma2.no
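If the list on the login node is too long to count by eye, the shell can count for you. The sketch below uses who rather than w, a related utility that prints one line per login session without a header, which makes it easier to pipe into other tools; the variable names are illustrative:

```shell
# who lists one line per login session; the first column is the user name.
sessions=$(who | wc -l)

# Collapse repeated names to count each user only once.
users=$(who | awk '{print $1}' | sort -u | wc -l)

echo "$sessions session(s) from $users distinct user(s)"
```

On a laptop this typically reports a single user, possibly with several sessions, one per open terminal.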

Where did they go?

Rob has a zip file stored under /tmp/passwords.zip on the login node of the cluster saga.sigma2.no. He wants to unzip it on his laptop under /important/passwords. How does he do that?

1.

$ ssh rob@saga.sigma2.no
$ unzip /tmp/passwords.zip

2.

$ scp saga.sigma2.no@rob:/tmp/passwords.zip .
$ unzip passwords.zip

3.

$ cd /important/passwords
$ scp rob@saga.sigma2.no:passwords.zip .
$ unzip passwords.zip

4.

$ cd /important/passwords
$ scp rob@saga.sigma2.no:/tmp/passwords.zip .
$ unzip passwords.zip

Solution

  1. No: Rob only unpacks the zip file, but does not transfer the unpacked files onto his laptop
  2. No: Rob mixed up the syntax for scp
  3. No: Rob did not specify the correct path of /tmp/passwords.zip on the login node of the cluster saga.sigma2.no
  4. Yes: you may also use unzip foo.zip -d /somewhere if you want to omit the first command

Differences Between Nodes

Many HPC clusters have a variety of nodes optimized for particular workloads. Some nodes may have larger amounts of memory, or specialized resources such as Graphics Processing Units (GPUs).

Key Points

  • Clusters (almost always) provide a login node.

  • You connect to the login node by special software using an encrypted connection.

  • Data has to be transferred to and from the cluster using specialized software.

  • A cluster is a shared resource.

  • Running a process on the cluster requires using the scheduler.