|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Parallel Shell Utilities" |
| 4 | +tags: hpc software utilities |
| 5 | +--- |
| 6 | + |
| 7 | +> The most fundamental tool needed to administer a cluster is a parallel shell, which allows you to run the same command on a series of nodes. |
| 8 | +> |
| 9 | +> [Linux Magazine](http://www.linux-magazine.com/Issues/2014/166/Parallel-Shells) |
| 10 | + |
| 11 | + |
| 12 | +There are three popular ways to run commands across the nodes of an HPC cluster: |
| 13 | +- Using a shell script |
| 14 | +- Using Pdsh |
| 15 | +- Using ClusterShell |
| 16 | + |
| 17 | +The shell script method needs no extra software, but Pdsh and ClusterShell must be installed before you can use them. The [OpenHPC stack](https://github.com/openhpc/ohpc) ships both pre-packaged. I won't cover installation here, but it is simple with the provided package managers. |
| 18 | + |
| 19 | +Before we dive into these three methods, it's important to note that ssh should be set up to authenticate with ssh keys instead of a password for this to work. Don't skip this, or you'll have to enter your password far too many times! |
| 20 | + |
| 21 | +## Using a shell script |
| 22 | +This method can be done with any scripting language, but I'll use bash. |
| 23 | + |
| 24 | +Create a script containing the command you want to run: |
| 25 | +``` |
| 26 | +#!/bin/bash |
| 27 | + |
| 28 | +# Run this script as a user with key-based (passwordless) ssh access to every node. |
| 29 | + |
| 30 | +for NODE in node1 node2 node3 |
| 31 | +do |
| 32 | +  ssh "$NODE" uptime |
| 33 | +done |
| 34 | +``` |
| 35 | + |
| 36 | +Run it with `bash your-script.sh`. The command runs on each node in turn, one at a time, and all output is printed to the screen. |
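The loop above visits the nodes one after another, so it is not really parallel. A minimal sketch of a parallel variant follows, assuming key-based ssh is already working: each connection is started in the background and `wait` blocks until all of them finish. The function name `run_on_nodes` and the `REMOTE_CMD` override variable are hypothetical, introduced here only for illustration.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): run one command on
# every node at the same time instead of one node after another.
# REMOTE_CMD is an assumed override hook so the function can be tried
# without a cluster; it defaults to ssh.
run_on_nodes() {
  local cmd="$1"
  shift
  local node
  for node in "$@"; do
    # Start each connection in the background and prefix every output
    # line with the node name so interleaved output stays readable.
    "${REMOTE_CMD:-ssh}" "$node" "$cmd" 2>&1 | sed "s/^/$node: /" &
  done
  # Block until every background job has finished.
  wait
}

# Example: run_on_nodes uptime node1 node2 node3
```

Output lines arrive in whatever order the nodes answer, which is why each line carries the node name as a prefix.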
| 37 | + |
| 38 | +## Using Pdsh |
| 39 | + |
| 40 | +> "Pdsh is a multithreaded remote shell client which executes commands on |
| 41 | + multiple remote hosts in parallel." |
| 42 | +> |
| 43 | +> [Pdsh GitHub](https://github.com/grondo/pdsh) |
| 44 | + |
| 45 | +Using pdsh is simple enough: |
| 46 | +``` |
| 47 | +# run the uptime command on node1, node2, node3 and node4 |
| 48 | +$ pdsh -w node[1-4] uptime |
| 49 | +node1: 19:01:18 up 1 day, 19:19, 1 user, load average: 0.29, 0.57, 0.53 |
| 50 | +node2: 19:11:25 up 1 day, 19:19, 1 user, load average: 0.29, 0.57, 0.53 |
| 51 | +node3: 19:19:11 up 1 day, 19:19, 1 user, load average: 0.29, 0.57, 0.53 |
| 52 | +node4: 19:20:45 up 1 day, 19:19, 1 user, load average: 0.29, 0.57, 0.53 |
| 53 | +``` |
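The `node[1-4]` argument in the example above is a host range standing for node1 through node4. As a rough illustration of what that range means (this is not how pdsh is implemented), here is a small sketch in plain shell; `expand_range` is a hypothetical helper name.

```shell
#!/bin/bash

# Hypothetical helper: print prefix<start> .. prefix<end>, one name per
# line, mimicking what a host range such as node[1-4] stands for.
expand_range() {
  local prefix="$1"
  local i="$2"
  local end="$3"
  while [ "$i" -le "$end" ]; do
    echo "${prefix}${i}"
    i=$((i + 1))
  done
}

# Example: expand_range node 1 4
```

The resulting names could feed the earlier loop, for instance `for NODE in $(expand_range node 1 4); do ssh "$NODE" uptime; done`.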
| 54 | + |
| 55 | + |
| 56 | +## Using ClusterShell |
| 57 | + |
| 58 | +ClusterShell is built and maintained by CEA-HPC and claims to be effective on supercomputers with 5,000 compute nodes. Using ClusterShell is very similar to using pdsh: |
| 59 | +``` |
| 60 | +$ clush -w foo[1-5] echo "Hello World" |
| 61 | +``` |
| 62 | +ClusterShell can merge identical output from multiple nodes (for example with `clush -b`) and does a better job than pdsh of handling nodes that hang. |
| 63 | + |
| 64 | +## References |
| 65 | + |
| 66 | +- [Bash for loop syntax](https://www.cyberciti.biz/faq/bash-for-loop/) |
| 67 | +- [Pdsh GitHub](https://github.com/grondo/pdsh) |
| 68 | +- [Pdsh man page](https://linux.die.net/man/1/pdsh) |
| 69 | +- [Pdsh vs ClusterShell](https://github.com/cea-hpc/clustershell/wiki/Pdsh) |
| 70 | +- [ClusterShell GitHub](https://github.com/cea-hpc/clustershell) |
| 71 | +- [ClusterShell main page](http://cea-hpc.github.io/clustershell/) |