Cluster computing with mpirun and Torque
Some time ago I wrote about cluster computing with Torque, with a focus on discrete event simulations (using the method of independent replications). It turned out that method was not really efficient, so we will try something else.
Have a look at this tutorial by the Bioinformatics department of the University of Idaho, which describes how to schedule many independent jobs using pbsdsh.
An alternative is to use mpirun with Torque for this.
Your PBS file:
```shell
#PBS -N runallmyjobs
#PBS -l vmem=20Gb,pmem=20Gb,mem=20Gb,nodes=4,walltime=336:00:00
#PBS -W x=NACCESSPOLICY:SINGLEJOB
#PBS -u skidder

mpirun -np 4 /home/skidder/taskswitcher.sh
```
And then taskswitcher.sh:
```shell
#!/bin/bash
case $PBS_VNODENUM in
  "0" ) echo "run on node null hostname = "`hostname` ;;
  "1" ) echo "run on node one hostname = "`hostname` ;;
  "2" ) echo "run on node two hostname = "`hostname` ;;
  "3" ) echo "run on node three hostname = "`hostname` ;;
esac
```
In runallmyjobs.o[jobID] you will now see the output of the above file.
To keep this elegant we want to minimise the number of files we need to edit, so we make the taskswitcher generic:
```shell
#!/bin/bash
EXECUTE=`cat $PBS_O_WORKDIR/configs/config.$PBS_VNODENUM`
$EXECUTE
```
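To see how this works outside Torque, here is a minimal sketch that fakes the two variables Torque normally sets (PBS_O_WORKDIR and PBS_VNODENUM), creates one hypothetical config file, and then runs the taskswitcher body by hand:

```shell
#!/bin/bash
# Sketch: fake the variables Torque normally provides.
export PBS_O_WORKDIR=`mktemp -d`
export PBS_VNODENUM=0
mkdir -p $PBS_O_WORKDIR/configs

# Each config file holds one complete command line for one virtual node
# (the command here is just a stand-in):
echo "echo hello-from-node-0" > $PBS_O_WORKDIR/configs/config.$PBS_VNODENUM

# This is exactly what the generic taskswitcher does:
EXECUTE=`cat $PBS_O_WORKDIR/configs/config.$PBS_VNODENUM`
$EXECUTE
```

On a real cluster, each virtual node reads its own config.$PBS_VNODENUM and so executes its own command line.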
And in our PBS file we build the config files:
```shell
...
# Must agree with the total number of runs in the scenario,
# and be cleanly divisible by NUMNODES:
NUMRUNS=80
NUMNODES=4
cd $PBS_O_WORKDIR
# Create the configuration files, one per virtual node
for i in `seq 0 $(( NUMNODES - 1 ))`; do
    BEGIN=$(( i * (NUMRUNS / NUMNODES) ))
    END=$(( (i + 1) * (NUMRUNS / NUMNODES) - 1 ))
    RUNS=$BEGIN..$END
    echo /home/skidder/omnetpp-4.1/bin/opp_runall -j8 ./beacon_sim -u Cmdenv -c CSMAtest-HT -r $RUNS > configs/config.$i
done
...
mpirun -np 4 /home/skidder/taskswitcher.sh
```
What this does: it takes all runs in the configuration (found in the omnetpp.ini file for scenario CSMAtest-HT) and divides them across the number of nodes (four in this case), so every node gets 20 runs. The command opp_runall -j8 then runs 8 of them in parallel on each node (assuming dual quad-core Xeons per node). In effect, we run a batch scheduler within a batch scheduler here.
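The BEGIN/END arithmetic from the PBS file can be checked in isolation; with NUMRUNS=80 and NUMNODES=4, each node receives a contiguous block of 20 run numbers:

```shell
#!/bin/bash
# Sketch: the run-splitting arithmetic on its own.
NUMRUNS=80
NUMNODES=4
PER_NODE=$(( NUMRUNS / NUMNODES ))   # 20 runs per node

for i in `seq 0 $(( NUMNODES - 1 )) `; do
    BEGIN=$(( i * PER_NODE ))
    END=$(( (i + 1) * PER_NODE - 1 ))
    echo "node $i gets runs $BEGIN..$END"
done
# node 0 gets 0..19, node 1 gets 20..39, node 2 gets 40..59, node 3 gets 60..79
```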
Scaling things up
Now, running 32 jobs in parallel is not fast enough if we have a heap of runs to do (say 2500). So we configure the job to use more nodes:
- change “nodes=” to the value you want to use (e.g. “nodes=50”)
- change “NUMNODES” to the same value (if you know a more elegant way to get the same results by setting only one var, let me know)
- change “NUMRUNS” to the number of runs (e.g. 2500)
- change the value of “-np” to the number of nodes you want to use (obviously limited by the number of nodes in your cluster :).
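As a possible answer to the single-variable question above (a sketch, not tested on a real cluster; myjob.pbs is a hypothetical PBS script that reads $NUMNODES and $NUMRUNS from its environment, e.g. with mpirun -np $NUMNODES): qsub lets you override the resource request with -l and export environment variables with -v, so a small wrapper can keep everything in one place:

```shell
#!/bin/bash
# Sketch: set the node count in ONE place and hand it to qsub.
NUMNODES=50
NUMRUNS=2500

# -l overrides the nodes= request, -v exports the variables into the job.
# We only build the command string here; run it on the head node:
QSUB_CMD="qsub -l nodes=$NUMNODES -v NUMNODES=$NUMNODES,NUMRUNS=$NUMRUNS myjob.pbs"
echo $QSUB_CMD
```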
That’s it, qsub and you are on your way.