Setting up the cluster

Step 1. Install Ubuntu on all the nodes in the cluster, as using Linux is the easiest way to set up a Beowulf cluster.

Step 2. Make sure everything is up to date by running the following commands in the terminal.

sudo apt update
sudo apt upgrade -y

Step 3. Install all the required packages on every node. This is done with the following command.

sudo apt install nfs-common ssh openmpi-bin libopenmpi-dev build-essential libatomic1 python gfortran perl wget m4 cmake pkg-config -y
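
To confirm that the compilers and the MPI runtime installed correctly, you can check their versions (this assumes the Open MPI packages from the command above):

mpirun --version
gcc --version
gfortran --version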

Step 4. Edit the hosts file on all the nodes so that they can all communicate by name. An example hosts file is shown below.

Open using: sudo nano /etc/hosts

File:
 127.0.0.1 localhost
 192.168.1.10 masternode
 192.168.1.11 node1
 192.168.1.12 node2
 192.168.1.13 node3
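
To confirm the entries resolve, you can ping each node by name from the master node (the hostnames here assume the example hosts file above):

ping -c 3 node1
ping -c 3 node2
ping -c 3 node3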

Step 5. Create the user account that will be used by the cluster to communicate. Do not use the root account for this. The account can be created using the following command.

sudo adduser Juser --uid 999

Make sure to create an account with the same name and UID on all the nodes in the cluster, otherwise file ownership on the shared NFS directory will not match between nodes. A UID below 1000 is used so that the user does not show up on the login screen.
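
You can verify that the name and UID match by running the following on each node; the output should be identical everywhere:

id Juser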

Step 6. On the master node you need to complete the following steps.

Step 6.1. Use the following commands in the terminal to set up the shared drive space.

Install the NFS kernel server and check that the Juser home directory exists using the following commands.

sudo apt install nfs-kernel-server -y
ls -l /home/ | grep Juser

If you are sharing a directory other than Juser's home, make sure Juser owns it by running the following command.

sudo chown Juser:Juser /path/to/dir

Open the /etc/exports file and add the following line

/home/Juser *(rw,sync,no_subtree_check)

Run,

sudo service nfs-kernel-server restart
sudo exportfs -a

to restart the NFS server and share the drive location on the network. To ensure that the nodes on the network can access the location, add the following exception to ufw.

sudo ufw allow from 192.168.1.0/24

This is done assuming all the nodes are on the same subnet and have been set up with static IP addresses.
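
Once the export and firewall rule are in place, you can confirm the share is being exported. The first command is run on the master node; the second can be run from any other node and assumes the masternode hostname from the example hosts file:

sudo exportfs -v
showmount -e masternode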

Step 6.2. Setting up SSH. This is still done from the master node.

Run the following commands to generate SSH keys and copy them into the shared directory so that all nodes can access each other using the same key. Do not add a passphrase when prompted; just hit Enter. This gives the nodes password-less SSH access to each other.

su Juser
ssh-keygen
ssh-copy-id localhost
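
You can check that the key works by opening an SSH session to the local machine; it should connect and print the hostname without asking for a password:

ssh localhost hostname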

Step 7. On all the other nodes you now need to mount the shared drive location. This is done using the following command.

sudo mount masternode:/home/Juser /home/Juser

To make sure that the location is mounted at startup, add the following line to the /etc/fstab file.

masternode:/home/Juser /home/Juser nfs defaults 0 0
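
You can test the fstab entry without rebooting: the following command mounts everything listed in /etc/fstab and reports any errors (it does nothing if the share is already mounted).

sudo mount -a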

Step 8. Restart all the nodes and make sure they can all access the shared drive location and that you can ssh between all the nodes without entering a password.
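
One quick way to check this from the master node (as Juser, and assuming the hostnames from the example hosts file): each node should print its hostname without a password prompt and show the NFS share mounted at /home/Juser.

su Juser
for n in node1 node2 node3; do ssh $n hostname; done
for n in node1 node2 node3; do ssh $n df -h /home/Juser; done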

Step 9. To install Julia, download the official binaries from the Julia website and extract the downloaded archive into your home directory.

To run Julia without having to cd into its directory, you can link the binary into any of the directories listed in your $PATH. This can be done using the following command.

sudo ln -s /path/to/julia-(version)/bin/julia /usr/local/bin/

In this case the full location was

sudo ln -s /home/Juser/julia-1.1.0/bin/julia /usr/local/bin/

You should now be able to run Julia by typing julia in the terminal window.
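
With password-less SSH and the shared home directory in place, you can also start Julia worker processes on the other nodes directly from the master. A minimal sketch, assuming the hostnames from the example hosts file; the file name machinefile is just an example, and on older Julia releases the flag is spelled --machinefile instead of --machine-file.

# as Juser on the master node, list one worker host per line
printf 'node1\nnode2\nnode3\n' > /home/Juser/machinefile

# launch Julia with a worker on each listed host, using the password-less SSH set up earlier
julia --machine-file /home/Juser/machinefile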
