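#!/bin/csh
# Scratch directory for this job's working files
set dir=/scratch/yqg_20
# Program to run and the protein it should process
set cmd=/gpfs0/home/staff/youxing/bin/run_RDCmain_nosec_10_SCOP40_1.5_cluster.pl
set protein=1J6TA_NHAB
mkdir $dir
cd $dir
# Run the program on the protein
$cmd $protein
# Clean up the scratch directory
rm -rf $dir
****************end of sample job file*******************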
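For comparison, here is a minimal bash sketch of the same kind of job. It assumes the scratch area lives under /scratch (overridable through a hypothetical SCRATCH_BASE variable) and names each job's directory after $USER and $JOB_ID, which Grid Engine sets, so several copies of the job can share a node without clobbering each other. The directory is removed even if the program fails. The helper name run_in_scratch is ours, not a cluster tool.

```shell
#!/bin/bash
#$ -S /bin/bash
# run_in_scratch CMD [ARGS...]  (hypothetical helper)
# Runs CMD inside a per-job scratch directory and always removes it afterwards.
# Assumption: scratch space lives under /scratch unless SCRATCH_BASE says otherwise.
run_in_scratch() {
  local dir status
  dir="${SCRATCH_BASE:-/scratch}/${USER:-user}_${JOB_ID:-$$}"
  mkdir -p "$dir" || return 1
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$dir" && "$@" )
  status=$?
  rm -rf "$dir"
  return "$status"
}
```

In a job script it would be called as:
run_in_scratch /gpfs0/home/staff/youxing/bin/run_RDCmain_nosec_10_SCOP40_1.5_cluster.pl 1J6TA_NHAB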
lpr -o sides=two-sided-long-edge FileToPrint.ps
Except for final reports, please print double-sided whenever possible. Likewise, do not print in color unless it is really needed. This saves not only money, but also paper and ink.
First, all compiling and job submission should be done on mgt2, also known as decoder.cc.uga.edu. If you want to compile a PPC application, ssh to pnode001.
Second, both architectures are now visible in the same queue, so you now have access to about 90 additional computers. The only catch is that they use a different type of CPU, so compiled programs must be recompiled for them. If your entire job is Perl or Java code, you don't need to recompile.
You can take advantage of both systems by using the $ARC variable. On the x86 machines, $ARC is set to glinux; on the PPC machines, it is set to ppc64linux.
In your queue submission script, simply call a program with a path that looks like:
$HOME/$ARC/bin/blast
On the x86 machines, /home/you/glinux/bin/blast will be run, while on the PPC machines, /home/you/ppc64linux/bin/blast will be run. We will try to compile all commonly used applications in this manner; ask us if you have any trouble. We should have all the BLAST tools up and running, so just use the line
/gpfs1/$ARC/ncbi/bin/blastpgp
and you'll get blastpgp on either architecture.
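On the cluster $ARC is already set, so $HOME/$ARC/bin/blast is all a job script needs. The sketch below wraps that lookup in a hypothetical helper, arch_bin, and adds a fallback for machines where $ARC is unset. The mapping from uname -m output to glinux or ppc64linux in that fallback is our assumption, not something the cluster defines.

```shell
#!/bin/bash
# detect_arc (hypothetical): print $ARC if set, otherwise guess it from uname -m.
# Assumption: i?86/x86_64 -> glinux and ppc* -> ppc64linux.
detect_arc() {
  if [ -n "${ARC:-}" ]; then
    printf '%s\n' "$ARC"
    return 0
  fi
  case "$(uname -m)" in
    i?86|x86_64) printf '%s\n' glinux ;;
    ppc*) printf '%s\n' ppc64linux ;;
    *) echo "unknown architecture: $(uname -m)" >&2; return 1 ;;
  esac
}

# arch_bin PROGRAM: print $HOME/<arch>/bin/PROGRAM for the current machine.
arch_bin() {
  local arc
  arc="$(detect_arc)" || return 1
  printf '%s\n' "$HOME/$arc/bin/$1"
}
```

Inside a job, "$(arch_bin blast)" then expands to the same path as $HOME/$ARC/bin/blast.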
If you don't want to bother with the dual architecture, you can request that your jobs run on only one type of machine, using the flag -l arch=.... For example, put the line
#$ -l arch=glinux
in your script, and your program will only run on x86 machines.
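A minimal sketch of a job script pinned to the x86 machines, assuming the program lives under $HOME/$ARC/bin as described above. The require_arc and main helpers are hypothetical; the runtime check only catches the case where the "-l arch" line was changed but the program paths were not.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -l arch=glinux
# Grid Engine reads the "#$" lines above; bash treats them as ordinary comments.

# require_arc NAME (hypothetical): fail unless $ARC equals NAME.
require_arc() {
  if [ "${ARC:-}" != "$1" ]; then
    echo "expected ARC=$1, got '${ARC:-unset}'" >&2
    return 1
  fi
}

main() {
  require_arc glinux || return 1
  "$HOME/$ARC/bin/blast" "$@"
}

# Only run the payload when submitted through Grid Engine (which sets JOB_ID).
if [ -n "${JOB_ID:-}" ]; then
  main "$@"
fi
```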
Please keep this in mind, because it is very important: if you don't follow the rule below, you may crash our cluster. When you submit many jobs (say, more than 100), add the two lines below to your job script to discard the jobs' screen output; too much of it can crash our cluster.
#$ -o /dev/null
#$ -e /dev/null
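When many jobs are submitted from a loop, the same two options can also be given on the qsub command line instead of inside the script. A minimal sketch, assuming a job script (job.sh here, hypothetical) that takes one protein ID as its only argument; setting DRY_RUN=1 prints the qsub commands instead of running them.

```shell
#!/bin/bash
# submit_many SCRIPT ID... (hypothetical helper)
# Submits SCRIPT once per ID with stdout and stderr sent to /dev/null.
submit_many() {
  local script="$1" id
  shift
  for id in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo qsub -o /dev/null -e /dev/null "$script" "$id"
    else
      qsub -o /dev/null -e /dev/null "$script" "$id" || return 1
    fi
  done
}
```

For example, DRY_RUN=1 submit_many job.sh 1J6TA_NHAB lets you check the commands before submitting hundreds of jobs for real.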
We have also been trying to clean up the parallel environments. There are two basic types of parallel environment: lammpi and bare. The lammpi environment starts the LAM/MPI daemons for you, so all you have to do is call mpirun. The bare environment simply allocates the machines to you and expects you to set up a communication environment yourself. This is good for programs that use their own parallel communication layer, such as NAMD.
In addition, there are parallel environments that you can select by their 'geographic' location. Each blade center comprises 14 machines on the same switch, so communication between those machines has lower latency. This is better for parallel jobs that need a lot of low-latency communication, such as molecular dynamics. If your job needs only occasional communication between parallel nodes, you don't need to bother with this.
The parallel environments are:
lammpi (all nodes)
lammpi-rsa1 to lammpi-rsa4 (sets of 12 machines from the x335s)
lammpi-mm1 to lammpi-mm12 (sets of 14 machines from the Blades)
bare
bare-rsa1 to bare-rsa4
bare-mm1 to bare-mm12
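The naming scheme above is regular enough to check a requested name before submitting. A minimal sketch with a hypothetical helper, is_known_pe, whose patterns are copied from the list above; the authoritative list is whatever Grid Engine reports on the cluster (qconf -spl), which may differ from this page.

```shell
#!/bin/bash
# is_known_pe NAME (hypothetical): succeed if NAME is one of the listed
# parallel environments (lammpi, bare, and their -rsa1..4 / -mm1..12 variants).
is_known_pe() {
  case "$1" in
    lammpi|bare) return 0 ;;
    lammpi-rsa[1-4]|bare-rsa[1-4]) return 0 ;;
    lammpi-mm[1-9]|lammpi-mm1[0-2]) return 0 ;;
    bare-mm[1-9]|bare-mm1[0-2]) return 0 ;;
    *) return 1 ;;
  esac
}
```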
So, if you want a large, distributed parallel environment, you would put something like:
#$ -pe lammpi 50-150
You can combine this with the architecture selection:
#$ -l arch=glinux
Or, if you want to set up a PPC NAMD run with all machines connected to the same switch:
#$ -pe bare-* 14-28
#$ -l arch=ppc64linux
Notice that the '*' is a wildcard: Grid Engine will choose one of the matching environments, so the job's machines all come from the same blade chassis.
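In a bare environment you receive only the machines, so the job has to find out which hosts it got. Grid Engine sets NSLOTS and PE_HOSTFILE for parallel jobs; each line of the host file starts with a host name followed by the number of slots granted on that host. The sketch below, using a hypothetical helper name, turns that file into one host name per slot, a format many launchers accept. How NAMD itself is started from such a list is site-specific and not shown.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -pe bare-* 14-28
#$ -l arch=ppc64linux

# hosts_from_pe_hostfile FILE (hypothetical): print each host once per slot.
# Each input line looks like: <host> <slots> <queue> <processor range>
hosts_from_pe_hostfile() {
  local host slots rest i
  while read -r host slots rest; do
    for ((i = 0; i < slots; i++)); do
      printf '%s\n' "$host"
    done
  done < "$1"
}

# Example use inside the job:
#   hosts_from_pe_hostfile "$PE_HOSTFILE" > "$TMPDIR/machines"
#   echo "granted $NSLOTS slots"
```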
For LAM/MPI jobs, remember to set the LAM_MPI_SOCKET_SUFFIX environment variable. It is used to tell apart multiple MPI sets that you might have running on the same machine. To set it, add the line:
in BASH:
export LAM_MPI_SOCKET_SUFFIX=$JOB_ID
in TCSH:
setenv LAM_MPI_SOCKET_SUFFIX $JOB_ID
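A minimal bash sketch of the same step with a safety check: the hypothetical helper set_lam_suffix refuses to continue if JOB_ID is empty, since an empty suffix would defeat the purpose of keeping concurrent LAM jobs apart. The program name in the commented mpirun line is a placeholder.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -pe lammpi 50-150

# set_lam_suffix (hypothetical): export LAM_MPI_SOCKET_SUFFIX from JOB_ID,
# which Grid Engine sets for every job.
set_lam_suffix() {
  if [ -z "${JOB_ID:-}" ]; then
    echo "JOB_ID is not set; is this running under Grid Engine?" >&2
    return 1
  fi
  export LAM_MPI_SOCKET_SUFFIX="$JOB_ID"
}

# In the real job you would then call, for example:
#   set_lam_suffix || exit 1
#   mpirun -np "$NSLOTS" ./my_mpi_program
```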
Also remember that Grid Engine ignores the "#!/bin/bash" line at the beginning of the script and uses TCSH by default. To use BASH, add the line:
#$ -S /bin/bash
The LAM boot-up process requires SSH public-key login to be enabled (for password-less login between nodes). You can set this up with:
cd ~/.ssh
ssh-keygen -t dsa   (when it asks for a passphrase, type nothing; just press Return)
cp id_dsa.pub authorized_keys
Then you need to make sure all the known host keys are current. I keep a copy of them in /gpfs1/tools, so type:
cp /gpfs0/tools/known_hosts ~/.ssh/.
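Note that the cp into authorized_keys replaces any keys already listed there. If you already have entries you want to keep, appending is safer. A minimal sketch, with a hypothetical helper add_authorized_key that takes the public key file and, optionally, the .ssh directory (defaulting to ~/.ssh), and only appends the key if that exact line is not present yet.

```shell
#!/bin/bash
# add_authorized_key PUBKEY [SSH_DIR] (hypothetical helper)
# Appends PUBKEY to SSH_DIR/authorized_keys unless it is already there.
add_authorized_key() {
  local pubkey="$1" dir="${2:-$HOME/.ssh}"
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  touch "$dir/authorized_keys" && chmod 600 "$dir/authorized_keys" || return 1
  # -x: whole-line match, -F: fixed string, -f: read the key line from PUBKEY.
  if ! grep -qxFf "$pubkey" "$dir/authorized_keys"; then
    cat "$pubkey" >> "$dir/authorized_keys"
  fi
}
```

Usage: add_authorized_key ~/.ssh/id_dsa.pub. Running it twice leaves a single copy of the key.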