Table of Contents
- Prerequisites
- What Is LoadLeveler?
- LoadLeveler Overview
- Basic LoadLeveler Tasks
- Building a Job Command File
- Submitting a Job
- Displaying Job Status
- Changing a Job's Priority
- Holding a Job
- Release a Held Job
- Displaying a Machine's Status
- Canceling a Job
- Displaying the Central Manager
- Other LoadLeveler Tasks
- Submitting Multiple Jobs
- Using the Job Command File as the Executable
- Submitting Parallel Jobs - General Notes
- Submitting MPI and MPL Parallel Jobs
- Submitting PVM Parallel Jobs
- Submitting PVMe Parallel Jobs
- LoadLeveler Internals
- How LoadLeveler Schedules Parallel Jobs
- LoadLeveler at the MHPCC
- LoadLeveler Job Command File Keywords Reference
- LoadLeveler Commands Reference
- References, Acknowledgements, WWW Resources
- Exercises
Prerequisites
- This tutorial assumes that the reader is already familiar with the
concepts covered in the following tutorials:
What Is LoadLeveler?
- LoadLeveler is a batch job scheduling application and program product
of IBM.
- Provides the facility for building, submitting and processing batch
jobs within a network of machines
- LoadLeveler scheduling matches job requirements with the best
available machine resources
- In a multi-user production environment, LoadLeveler is intended to
improve overall system performance and turnaround time and to
distribute resources equitably among all users
- Flexible - permits each machine in the pool to be configured
differently. For example: a machine may be configured to be used for
batch work only during off-work hours.
- Can schedule serial or parallel (PVMe, PVM, MPL, MPI) jobs
- LoadLeveler environment can include all of the following:
- Machines of different architectures:
- IBM RS/6000
- IBM 9076 (SP) Systems
- Sun SPARCstations
- Silicon Graphics IRIS
- HP Apollo 9000 series workstations
- Submit only machines
- Batch only (dedicated) machines
- Interactive batch machines
- Can be configured to schedule jobs for NQS machines outside the
LoadLeveler pool (see LoadLeveler Administration Guide for
details).
- Provides a graphical user interface called
xloadl for job submission and monitoring
LoadLeveler Overview
- The entire collection of machines available for LoadLeveler
scheduling is called a "pool"
- Every machine in the pool has one or more LoadLeveler daemons running
on it.
- There is one Central Manager machine for the LoadLeveler pool
- Principal function is to coordinate LoadLeveler related activities
on all machines in the pool
- Maintains status information on all machines and jobs - makes
decision on where jobs should be run
- If Central Manager machine goes down
- Job information is not lost.
- Jobs executing on other machines continue to run
- Jobs waiting to run will start when Central Manager is restarted
- Users may continue to submit jobs from other machines; these jobs
will be dispatched when the Central Manager is restarted
- A backup Central Manager can be designated; it automatically assumes
Central Manager responsibilities if the primary Central Manager goes down.
- Normally, users do not need to even know about the Central Manager
- Other machines in the pool may be used to perform one or more of the
following tasks
- Submit jobs
- Execute jobs
- Schedule submitted jobs (in cooperation with Central Manager)
- Every LoadLeveler job must be defined in a job command file
- Only after defining a job command file may a user submit the job for
scheduling and execution. (Construction of a job command file is
discussed in more detail next)
Basic LoadLeveler Tasks
Displaying Job Status
- The llq command is used to obtain information
about jobs in the LoadLeveler job queue.
- The default is for llq to display information about all jobs in the
queue
- Other options include the ability to display information by userids,
hostnames, queue host, cluster id and process id.
- A detailed (long) list is displayed if the -l
option is specified
- See the llq man page for details
- With xloadl, job status information can be displayed in the "Jobs"
window, with selection and sort options provided by the "Select"
and "Sort" menus
- A sample short (default) list appears below
Id           Owner   Submitted   ST PRI Size Class     Running On
------------ ------- ----------- -- --- ---- --------- -----------
fr3n05.14.0  rjz     7/9 09:13   R  50  0.0  large     fr7n09
fr7n05.15.0  bmr3    7/9 10:34   R  50  0.0  large     fr7n15
fr5n09.13.4  ksmith  7/9 11:05   R  50  0.0  medium    fr3n13
fr5n09.13.3  ksmith  7/9 11:05   R  50  0.0  medium    fr7n03
fr5n09.13.2  ksmith  7/9 11:05   P  50  0.0  medium
fr5n09.13.0  ksmith  7/9 11:05   I  50  0.0  medium
fr5n09.13.1  ksmith  7/9 11:05   I  50  0.0  medium
fr2n09.33.1  sokel   7/9 12:19   I  50  0.0  bigmem
fr3n11.02.1  salay   7/9 12:23   I  50  0.0  bigmem
fr3n01.02.1  caldwel 7/8 02:49   H  50  0.0  large
10 jobs in queue 4 waiting, 1 pending, 4 running, 1 held.
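The summary line can be cross-checked by tallying the ST column. The Python sketch below is illustrative only: it assumes the short listing has been captured as text, that fields are separated by whitespace, and that job ids have the form host.cluster.process. The function name tally_states and the label table are made up for this example and are not part of LoadLeveler; in particular, mapping the I code to "waiting" is inferred from the summary line of the sample.

```python
from collections import Counter

# Job lines copied from the sample short listing above.
SAMPLE = """\
Id Owner Submitted ST PRI Size Class Running On
fr3n05.14.0 rjz 7/9 09:13 R 50 0.0 large fr7n09
fr7n05.15.0 bmr3 7/9 10:34 R 50 0.0 large fr7n15
fr5n09.13.4 ksmith 7/9 11:05 R 50 0.0 medium fr3n13
fr5n09.13.3 ksmith 7/9 11:05 R 50 0.0 medium fr7n03
fr5n09.13.2 ksmith 7/9 11:05 P 50 0.0 medium
fr5n09.13.0 ksmith 7/9 11:05 I 50 0.0 medium
fr5n09.13.1 ksmith 7/9 11:05 I 50 0.0 medium
fr2n09.33.1 sokel 7/9 12:19 I 50 0.0 bigmem
fr3n11.02.1 salay 7/9 12:23 I 50 0.0 bigmem
fr3n01.02.1 caldwel 7/8 02:49 H 50 0.0 large
10 jobs in queue 4 waiting, 1 pending, 4 running, 1 held.
"""

# Assumed labels for the ST codes that appear in the sample.
LABELS = {"R": "running", "P": "pending", "I": "waiting", "H": "held"}


def tally_states(listing):
    """Count jobs per status in an llq-style short listing."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # A job line has at least id, owner, date, time, ST, PRI, size, class,
        # and its id contains at least two dots (host.cluster.process).
        if len(fields) < 8 or fields[0].count(".") < 2:
            continue  # header, separator or summary line
        counts[LABELS.get(fields[4], fields[4])] += 1
    return counts


if __name__ == "__main__":
    print(dict(tally_states(SAMPLE)))
```

For the sample listing the tally agrees with the summary line: four running, one pending, four waiting and one held.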
Changing a Job's Priority
Holding a Job
- Use the llhold command to place a temporary
hold on a job in the queue
- Does not affect running jobs - only jobs in the queue
- Only the LoadLeveler system administrator can put other users' jobs
on hold.
- For example, to put a hold on job mymachine.23.0:
llhold mymachine.23.0
- See the llhold man page for details
- With xloadl, jobs may be put on hold by using the "Actions" menu
in the "Jobs" window
Release a Held Job
- Use the llhold command with the
-r option to release a job that is on hold
- Only the LoadLeveler system administrator can release other users' jobs
that are on hold.
- For example, to release the hold on job mymachine.23.0:
llhold -r mymachine.23.0
- See the llhold man page for details
- With xloadl, jobs may be released from a hold by using the "Actions"
menu in the "Jobs" window
Displaying a Machine's Status
- Use the llstatus command to display information
about machines in the LoadLeveler pool
- By default, llstatus will display one line of information for each
machine in the pool. An example appears below:
Name              Schedd InQ Act Startd Run LdAvg Idle Arch  OpSys
cws1.mhpcc.edu    Down     0   0 Down     0 0.00    22 R6000 AIX41
fr1n05.mhpcc.edu  Avail    0   0 Busy     1 0.00  9999 R6000 AIX41
fr1n07.mhpcc.edu  Avail    0   0 Idle     0 0.07  9999 R6000 AIX41
fr2n01.mhpcc.edu  Avail    0   0 Busy     1 0.00  9999 R6000 AIX41
fr2n05.mhpcc.edu  Avail    0   0 Busy     1 1.00  9999 R6000 AIX41
fr9n05.mhpcc.edu  Avail    0   0 Idle     0 0.77  9999 R6000 AIX41
fr9n05.mhpcc.edu  Avail    0   0 Busy     1 1.01  9999 R6000 AIX41
. . . . . . . . . .
. . . . . . . . . .
fr28n15.mhpcc.edu Avail    0   0 Busy     1 1.01  9999 R6000 AIX41
fr28n16.mhpcc.edu Avail    0   0 Busy     1 1.01  9999 R6000 AIX41
R6000/AIX41 217 machines 43 jobs 198 running
217 machines 43 jobs 198 running
The Central Manager is defined on cws1.class.mhpcc.edu
All machines on the machine_list are present
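When the pool is large, it can help to filter the listing by the Startd column. The following Python sketch is one possible way to do that on captured llstatus text; it assumes the first line is the column header shown above and that every machine line has exactly one whitespace-separated field per column. The function name machines_by_startd is hypothetical.

```python
# A few lines copied from the sample llstatus listing above.
SAMPLE = """\
Name Schedd InQ Act Startd Run LdAvg Idle Arch OpSys
cws1.mhpcc.edu Down 0 0 Down 0 0.00 22 R6000 AIX41
fr1n05.mhpcc.edu Avail 0 0 Busy 1 0.00 9999 R6000 AIX41
fr1n07.mhpcc.edu Avail 0 0 Idle 0 0.07 9999 R6000 AIX41
fr2n01.mhpcc.edu Avail 0 0 Busy 1 0.00 9999 R6000 AIX41
R6000/AIX41 217 machines 43 jobs 198 running
"""


def machines_by_startd(listing, state):
    """Return the names of machines whose Startd column equals state."""
    lines = listing.splitlines()
    if not lines:
        return []
    columns = lines[0].split()
    names = []
    for row in lines[1:]:
        fields = row.split()
        if len(fields) != len(columns):
            continue  # not a machine line (for example a summary line)
        record = dict(zip(columns, fields))
        if record["Startd"] == state:
            names.append(record["Name"])
    return names


if __name__ == "__main__":
    print(machines_by_startd(SAMPLE, "Idle"))
```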
Canceling a Job
- Use the llcancel command to cancel either
running or queued jobs
- Options are available to cancel jobs by userid, hostname, queue host,
cluster id or process id
- Only the LoadLeveler System Administrator can cancel other users' jobs
- For example, to cancel job mymachine.23.0:
llcancel mymachine.23.0
- See the llcancel man page for details
- With xloadl, jobs can be cancelled by using the "Actions" menu from the
"Jobs" window
Displaying the Central Manager
- The name of the Central Manager machine is shown at the end of the
llstatus list.
- With xloadl, the Central Manager can be found by using the "Actions"
menu on the "Machines" window
Other LoadLeveler Tasks
Submitting Multiple Jobs
- A single job command file can be used to submit multiple jobs
- Accomplished by using multiple #@ queue statements
- LoadLeveler statements in effect for the first job are generally
in effect for all subsequent jobs in the same job command file
- Can be useful if the same executable is to be run with different
input and output files
- The LoadLeveler System Administrator can set a limit on the number of
jobs any user can run at one time
- This example job command file queues two jobs having different input,
output and error files
#@ executable = longjob
#
#@ input = longjob.in1
#@ output = longjob.out1
#@ error = longjob.err1
#@ queue
#
#@ input = longjob.in2
#@ output = longjob.out2
#@ error = longjob.err2
#@ queue
- This example demonstrates the use of predefined LoadLeveler macros to
generate different output files. Five jobs will be queued, each of
which reads a unique input file and creates unique output and error
files.
#@ executable = longjob
#
#@ input = longjob.in.$(Process)
#@ output = longjob.out.$(Cluster).$(Process)
#@ error = longjob.err.$(Cluster).$(Process)
#@ queue
#@ queue
#@ queue
#@ queue
#@ queue
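To make the macro substitution concrete, the Python sketch below prints the file names that the five queue statements above would produce. It assumes that $(Process) starts at 0 and increases by one for each queue statement, which is consistent with the job ids in the llq sample (fr5n09.13.0 through fr5n09.13.4). The cluster number 13 and the helper name expand are chosen purely for illustration; LoadLeveler performs this substitution itself at submit time.

```python
import re

# File name templates taken from the job command file above.
TEMPLATES = {
    "input": "longjob.in.$(Process)",
    "output": "longjob.out.$(Cluster).$(Process)",
    "error": "longjob.err.$(Cluster).$(Process)",
}


def expand(template, values):
    """Replace each $(Name) in template with values[Name]."""
    return re.sub(r"\$\((\w+)\)", lambda m: str(values[m.group(1)]), template)


if __name__ == "__main__":
    cluster = 13  # hypothetical cluster number assigned at submit time
    for process in range(5):  # one job per "#@ queue" statement
        values = {"Cluster": cluster, "Process": process}
        print({key: expand(text, values) for key, text in TEMPLATES.items()})
```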
Using the Job Command File as the Executable
Submitting Parallel Jobs - General Notes
- The ability to execute parallel jobs with LoadLeveler depends on
whether your system hardware/software supports parallel execution
and on how the LoadLeveler System Administrator has configured your
system.
- #@ job_type statement specifies type of job:
#@ job_type = serial -default
#@ job_type = parallel -MPL, MPI, PVMe
#@ job_type = pvm3 -public domain PVM version 3
- #@ min_processors and
#@ max_processors statements specify number of
machines to use. Should be set equal to each other if job is not
flexible in number of processors it can use.
#@ min_processors = 8
#@ max_processors = 16
- #@ requirements statement with "Adapter"
is used to specify communications interface:
#@ requirements = (Adapter == "hps_user")
-high performance switch in User Space mode
#@ requirements = (Adapter == "hps_ip")
-high performance switch in IP mode
#@ requirements = (Adapter == "ethernet")
-ethernet (default)
#@ requirements = (Adapter == "fddi")
-fiber distributed data interface*
#@ requirements = (Adapter == "tokenring")
-token ring*
#@ requirements = (Adapter == "fcs")
-fiber channel standards*
* = not implemented at the MHPCC
- #@ environment statement is used for MPL and
MPI to specify User Space communications protocol (IP is default).
#@ environment = MP_EUILIB=us
- #@ parallel_path statement is used for PVM.
Specifies where the PVM process
should look for executables when the parallel job spawns tasks.
Absolute pathname should be specified.
#@ parallel_path = /u/jsmith/pvm3/bin/RS6K
- Parallel jobs can not be checkpointed
- You can determine which nodes were used for your parallel execution
in several ways. In your LoadLeveler job command file:
- Output the value of the
LoadLeveler environment variable LOADL_PROCESSOR_LIST to a
file which you can later view interactively. If you do not
specify a file, the output will simply appear in your LoadLeveler
output file. For example:
echo $LOADL_PROCESSOR_LIST > myhosts
- Specify that mail be sent to you. It will automatically include
the nodes used.
#@ notification = complete
#@ notify_user = jsmith@mhpcc.edu
- Set the MP_INFOLEVEL environment variable to a value above 1.
Then examine the file to which standard error is written.
#@ error = myjob.err
#@ environment = MP_INFOLEVEL=2
- Pre-execution setup and post-execution cleanup are possible if
the executing nodes are known. The example below demonstrates
one way of accomplishing both.
#!/bin/csh
#@ initialdir = /u/jsmith/LoadLeveler
#@ error = run1.err
#@ output = run1.out
#@ job_type = parallel
#@ requirements = (Adapter == "hps_user")
#@ environment = MP_EUILIB=us
#@ min_processors = 4
#@ max_processors = 8
#@ class = large
#@ cpu_limit = 12000
#@ queue
#
set mydir = "/u/jsmith/LoadLeveler/"
set infile = "input.1"
set nodes = `echo $LOADL_PROCESSOR_LIST`
# Pre-execution setup
foreach node (${nodes})
rcp ${mydir}${infile} ${node}:/localscratch
echo "copied ${mydir}${infile} to ${node}:/localscratch"
end
run1
# Post-execution cleanup
foreach node (${nodes})
rsh $node "cd /localscratch; rm -f ${infile}"
echo "cleanup on ${node} done"
end
echo "Job completed"
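The node list can also be consumed outside the shell. As a rough sketch, the Python function below builds the same rcp commands as the pre-execution loop above. It assumes that LOADL_PROCESSOR_LIST holds host names separated by whitespace, as the csh foreach loop implies. The function name staging_commands and its parameters are illustrative, and nothing is executed: the commands are only returned as argument lists.

```python
import os
import posixpath


def staging_commands(infile, source_dir, scratch="/localscratch", nodes=None):
    """Return one rcp argument list per node, without running anything."""
    if nodes is None:
        # Assumption: host names in the variable are separated by whitespace.
        nodes = os.environ.get("LOADL_PROCESSOR_LIST", "").split()
    source = posixpath.join(source_dir, infile)
    return [["rcp", source, f"{node}:{scratch}"] for node in nodes]


if __name__ == "__main__":
    for command in staging_commands("input.1", "/u/jsmith/LoadLeveler",
                                    nodes=["fr7n09", "fr7n15"]):
        print(" ".join(command))
```

Returning argument lists rather than one command string keeps each host name a separate argument if the commands are later passed to a process launcher.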
Submitting Parallel MPI and MPL Jobs
Submitting Parallel PVM Jobs
- LoadLeveler will automatically select the hosts on which PVM
processes will execute. You can not choose your own nodes via
a hostfile.
- Under LoadLeveler release 2.1, PVM communications will be IP over
the high performance switch by default. For earlier releases, the
default is IP over ethernet - substantially slower in most cases.
- For optimum performance, users should make sure that communications
are over the high performance switch, use Direct Routing and
pvm_psend/pvm_precv. It is possible to obtain PVM communications
bandwidth of approx. 9 MB/sec if this is done.
- An example job command file for a PVM parallel execution appears
below.
#!/bin/csh
# Example PVM job command file. Communications will be IP over the
# high performance switch for LoadLeveler release 2.1, and IP over
# ethernet for earlier releases.
#
#@ job_name = array
#@ output = $(job_name).$(cluster).$(process).out
#@ error = $(cluster).$(process).err
#@ notification = complete
#@ notify_user = jsmith@mhpcc.edu
#@ checkpoint = no
#@ class = large
#@ job_type = pvm3
#@ parallel_path = /u/jsmith/LoadLeveler/pvm.jobs
#@ requirements = (Adapter == "hps_ip")
#@ min_processors = 6
#@ max_processors = 6
#@ cpu_limit = 14000
#@ queue
/u/jsmith/LoadLeveler/pvm.jobs/array.master
# Save the PVM execution logs created on each node
# These will appear at the end of the output file
set myid=`id -u`
foreach node (`echo $LOADL_PROCESSOR_LIST`)
echo $node
rsh $node cat /tmp/pvml.$myid
end
Submitting Parallel PVMe Jobs
- PVMe is IBM's enhanced PVM product. It is source compatible with
public domain PVM version 3. PVMe is designed to take advantage
of User Space communications over the high performance switch.
- PVMe within LoadLeveler receives its node allocation from the
Job Manager - a host file is not used.
- If your PVMe executables are not in the default location of
$HOME/pvm3/bin/RS6K, you are required to specify where they reside
by setting the PVMEPATH environment variable.
- Your PVMe executable is invoked by calling pvmd3e with the
-exec option. The location of PVMe product
files is typically /usr/lpp/pvm3 on IBM SP systems.
- The following statements are required:
#@ requirements = (Adapter == "hps_user")
#@ job_type = parallel
#@ environment = MP_EUILIB=us
- By default, PVMe will create a file called
pvmnodelist
which contains the names of the nodes it uses for your job's
execution. This file will be located in your initial LoadLeveler
directory.
- To prevent problems you should use the
#@ initialdir statement
to point to your PVMe executables if you do not submit your
LoadLeveler PVMe job from the same directory.
- Please consult the "IBM PVMe User's Guide and Subroutine Reference"
for a full discussion on PVMe.
- An example job command file for a PVMe parallel execution appears
below:
#!/bin/csh
# Example PVMe job command file
#
#@ job_name = array
#@ output = $(job_name).$(cluster).$(process).out
#@ error = $(job_name).$(cluster).$(process).err
#@ initialdir = /u/jsmith/pvme
#@ job_type = parallel
#@ notification = complete
#@ notify_user = jsmith@mhpcc.edu
#@ class = medium
#@ min_processors = 8
#@ max_processors = 8
#@ requirements = (Adapter == "hps_user")
# Both variables are set in one environment statement because only the
# last such statement in a job command file takes effect.
#@ environment = PVMEPATH=/u/jsmith/pvme;MP_EUILIB=us
#@ cpu_limit = 14000
#@ queue
/usr/lpp/pvm3/pvmd3e -exec /u/jsmith/pvme/array.master
LoadLeveler Internals
- Much of LoadLeveler's behavior is controlled by
installation and configuration options set by the LoadLeveler System
Administrator.
- LoadLeveler configuration files are in the home directory of the
loadl userid.
- The LoadL_admin file contains:
- User stanzas - define characteristics and limits at a per user level
- Class stanzas - define class parameters including resource limits
and permissions
- Machine stanzas - define machine characteristics and whether the machine
is the Central Manager.
- The LoadL_config file contains global configuration information
common to all machines across the pool, such as:
- Job management policies
- Which daemons to run on each machine
- Which job classes are defined per machine
- Job limits
- Machine architectures
- Job accounting parameters
- Paths and environment variables
- Time information
- LoadLeveler macros
- Other parameters too numerous to list here
- Local configuration files can be created to tailor requirements for
individual machines in the pool. They override the global configuration file.
- LoadLeveler processes jobs and monitors the workload by running the
following daemons (note that not all daemons run on all machines):
- LoadL_master
- The master daemon - manages all daemons on its resident machine
- LoadL_schedd
- The schedd daemon - manages batch submissions on its resident machine
- LoadL_startd
- The startd - accepts jobs for dispatch on its resident machine
- LoadL_starter
- The starter process - spawned by startd to manage a running job on the
server machine
- LoadL_shadow
- The shadow process - spawned by the schedd to communicate with the
starter process to run a job on the server machine
- LoadL_kbdd
- The keyboard daemon - monitors keyboard and mouse activity (AIX only)
- LoadL_collector
- The collector daemon - central collector of machine status from all
machines in the pool
- LoadL_negotiator
- The negotiator daemon - the central scheduler and collector of job
status from all machines in the pool
- A typical (and slightly simplified) LoadLeveler job cycle might look
like this:
- User builds job command file
- User submits job command file to LoadLeveler from local machine.
(picture 1)
- A LoadLeveler schedd daemon on the local machine
communicates the job request to the negotiator daemon on
the Central Manager.
- The Central Manager determines when and where the job may run.
Since the Central Manager knows about every machine in the
pool, it can match job requirements with machine resources.
After reaching this decision, the Central Manager sends a
"permit to run" back to the schedd daemon. The Central
Manager marks the job as "pending".
- The originating schedd daemon spawns a shadow process
to handle the job.
(picture 2)
- The shadow process contacts the startd daemon on the
target machine where the job is assigned to run. It also
establishes the necessary communication ports with this daemon.
- The startd daemon on the target machine spawns a
starter process which then receives the job information and
executable from the shadow process. The starter
process is responsible for running the user
executable on the target machine. At this point, the shadow
communicates to the Central Manager that the job has been started
and its status is changed to "running".
(picture 3)
- After job completion, the starter process returns the exit
status to the shadow, which then generates mail (optional),
forwards the exit status to the schedd daemon and exits
itself.
(picture 4)
- The schedd daemon finally notifies the Central Manager that
the job has completed.
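The status changes in this cycle can be summarized as a small state machine. The Python sketch below is only a reading aid, not a description of LoadLeveler's implementation: the text above names the "pending" and "running" states, while the state names "submitted" and "completed" and all three event names are labels invented for this example.

```python
# Transitions read off the job cycle described above.
TRANSITIONS = {
    # Central Manager sends a "permit to run" back to the schedd daemon.
    ("submitted", "permit_to_run"): "pending",
    # The shadow reports to the Central Manager that the job has started.
    ("pending", "job_started"): "running",
    # The schedd daemon notifies the Central Manager of completion.
    ("running", "job_finished"): "completed",
}


def advance(state, event):
    """Return the next state, or raise ValueError for an invalid event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}") from None


def run_cycle(events, state="submitted"):
    """Apply events in order and return every state passed through."""
    history = [state]
    for event in events:
        state = advance(state, event)
        history.append(state)
    return history


if __name__ == "__main__":
    print(run_cycle(["permit_to_run", "job_started", "job_finished"]))
```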
How LoadLeveler Schedules Parallel Jobs
- When you submit a job, you are required to specify the number of
nodes needed by using the #@ min_processors and
#@ max_processors keywords.
- LoadLeveler will then use dynamic node allocation to assign nodes to
your parallel job. You can not choose which nodes your job will
run on by using a hostfile of any variety.
- The job will run immediately if all of the following conditions
are met:
- there are no other jobs queued at a higher system priority;
- you do not already have running the maximum number of jobs
permitted to run simultaneously;
- there are enough nodes available - at least min_processors.
- If there is an insufficient number of nodes (such as when most nodes
have been assigned to other jobs), the job is given a set amount of
time to allocate its nodes.
- While the job is allocating its nodes, no other jobs may use those
nodes. They will appear to be "Idle" in the llstatus
display; however, they are actually "allocating".
- If enough nodes are accumulated within the time allowed, the job
will run.
- If the job fails to accumulate enough nodes, it is deferred
(made inactive) for a set time interval. At the end of that time,
it will attempt to accumulate nodes again. The job maintains
its system priority (place in the queue) during the deferral
period.
- The allocation / deferral parameters are site dependent and
controlled by the LoadLeveler system administrator.
- Users may submit as many jobs as they like. However, once a user has
reached the maximum number of jobs permitted to run simultaneously,
submitting more jobs will not cause them to run any sooner relative
to other users' jobs. (Note: at the MHPCC this limit is set to 2).
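The rules above can be condensed into two small functions. This Python sketch is a simplified model under stated assumptions, not the scheduler's actual logic: the function and parameter names are hypothetical, the "wait" outcome for a job that is blocked by priority or by the per-user limit is an assumption (the text only describes the insufficient-nodes case), and the allocation window is modeled as a fixed number of intervals because the real parameters are site dependent.

```python
def start_decision(free_nodes, min_processors, user_running, max_running,
                   higher_priority_queued):
    """Return "run", "allocate" or "wait" for a parallel job being considered."""
    if higher_priority_queued or user_running >= max_running:
        return "wait"  # assumed outcome; not spelled out in the text
    if free_nodes >= min_processors:
        return "run"
    return "allocate"  # too few nodes: start accumulating them


def accumulate(freed_per_interval, min_processors, intervals_allowed):
    """Model the allocation window.

    Nodes freed in each interval are held by the job until it has
    min_processors nodes ("run") or the window expires ("deferred").
    """
    held = 0
    for freed in freed_per_interval[:intervals_allowed]:
        held += freed
        if held >= min_processors:
            return "run"
    return "deferred"


if __name__ == "__main__":
    print(start_decision(free_nodes=2, min_processors=4, user_running=0,
                         max_running=2, higher_priority_queued=False))
    print(accumulate([1, 1, 2, 3], min_processors=4, intervals_allowed=3))
```

The max_running value of 2 in the usage example mirrors the MHPCC limit mentioned above; other sites may configure a different value.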
LoadLeveler at the MHPCC
There are a number of important details which you'll need to consider when
using LoadLeveler at the MHPCC. These are discussed below.
- Users have access to IBM's LoadLeveler manuals by using
InfoExplorer.
- Modify your path variable: Include the LoadLeveler executables location
in your path. For example, if your login shell is the C Shell, add the
following line to your .cshrc file and then "source .cshrc" to make it
take effect.
set path=($path /usr/lpp/LoadL/nfs/bin)
- LoadLeveler Classes: Choose the appropriate LoadLeveler class based
upon your job requirements. The MHPCC SP2
Configuration Summary Table includes the valid MHPCC LoadLeveler
classes. Note that:
- Classes are subject to change.
- Class names are case sensitive.
- Classes differ in time limits, number of nodes and types of nodes
- The MHPCC utility "llclasses" may be used to display a summary of
used and available nodes for all classes.
- The "llsubmit" command includes a front end screen which validates
command files for a number of possible errors. Job command files that
are rejected will usually be accompanied by an error message stating
why they were rejected.
- Job Scheduling: The MHPCC has implemented a "fair share" scheduler which
constantly monitors relative usage between Department of Defense (DoD) and
non-DoD users. The goal is to provide 50% of available cycles to each.
As a result of this, some jobs may "jump ahead" of others in the queues.
- Backfilling: The MHPCC scheduler is able to take advantage of idle cycles
caused by jobs which are "allocating" nodes. Jobs which can run before
an "allocating" job obtains all of its nodes will "jump ahead" in the
queue. The scheduler uses the cpu_limit
keyword to accomplish this. Recommendation: specify the minimum
amount of CPU time that your job requires.
- cpu_limit: CPU time limits are
in effect for each class. The keyword is required for a command script to
be accepted by llsubmit at the MHPCC. This works to the user's advantage
by allowing for a backfill capability.
- max_processors and
min_processors:
Minimum and Maximum processor limits are in effect for each class.
Command scripts that do not specify valid max_processors or min_processors
values will be rejected.
- Selecting nodes by memory size: Nodes vary in their memory
configurations. To select nodes based upon their memory configuration,
follow the examples below (see also the note on using large memory
that follows):
#@ requirements = (Memory >= 64)
#@ requirements = (Memory >= 128)
#@ requirements = (Memory == 256)
#@ requirements = (Memory == 1024)
- Using large memory: AIX compilers have a maximum default of 256 MB for
program data and stack size. To obtain more than this, you must do the
following:
- Compile your program with the -bmaxdata: or
-bmaxstack: option. For example, the following compile
command will allocate a 512 MB data segment:
xlf -bmaxdata:512000000 -o myprog myprog.f
See the xlf man page for details about the -bmaxdata: and
-bmaxstack: options.
- Make sure that the memory limits set in your shell are not too small.
To do this, simply put the unlimit command in your
.cshrc file.
- In your LoadLeveler job command file, be sure that you request nodes
with adequate memory. For example:
#@ Requirements = (Memory == 1024)
- Selecting wide/thin nodes: You can specify that either thin or wide
nodes must be used. For example:
#@ Requirements = (Feature == "Wide")
#@ Requirements = (Feature == "Thin")
- Examples. A few "getting started" examples are available in the
LoadLeveler Exercise.
Be sure to modify the job command files for your own use before attempting
to run them in LoadLeveler. In particular:
- Use your actual userid and directories
- Use your actual email address
- Choose an appropriate class
- Enable the cpu_limit statement and modify it for whichever class you
actually choose
- Number of jobs queued: Be reasonable about the number of jobs you
queue up - no more than
20 please. The MHPCC configuration will permit you to submit as many
jobs as you wish, however, no more than two will be permitted to
run (system wide) at any time.
- Pathnames: When specifying your home directory in a job command file,
do not use pathnames that begin with "/a" such as /a/raid2fr2sw/u8/jsmith.
Instead, use something like /u/jsmith. Use of the "/a" paths will cause
the command file to fail.
- If you plan on using LoadLeveler's GUI, xloadl, add the xloadl
X resource specifications to your .Xdefaults file. This step is
optional, but if you're using xloadl, it makes it look a lot nicer.
- Copy the
xloadl X resource specification file to your own directory:
cp $WORKSHOP/samples/loadl/Xdefaults.xloadl .
- Edit your .Xdefaults file to include the Xdefaults.xloadl file.
- Make sure your DISPLAY variable and xhost permissions are set
correctly.
- Start xloadl with the command "xloadl &"
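The backfilling note earlier in this section can be pictured as a simple feasibility test. The Python sketch below is a guess at the general shape of such a test, not the MHPCC scheduler's actual logic. It assumes that cpu_limit is expressed in seconds and that the scheduler has some estimate of how long the idle nodes will remain unused by the allocating job; that estimate is represented by the hypothetical parameter seconds_until_needed.

```python
def can_backfill(cpu_limit, min_processors, idle_nodes, seconds_until_needed):
    """Return True if a job fits into the gap left by an allocating job.

    The job must fit on the currently idle nodes and its cpu_limit must not
    exceed the time for which those nodes are expected to stay unused.
    """
    fits_on_nodes = min_processors <= idle_nodes
    finishes_in_time = cpu_limit <= seconds_until_needed
    return fits_on_nodes and finishes_in_time


if __name__ == "__main__":
    # A job with a one hour limit fits into a two hour gap on 8 idle nodes.
    print(can_backfill(cpu_limit=3600, min_processors=4,
                       idle_nodes=8, seconds_until_needed=7200))
    # The same job with cpu_limit = 12000 would not fit into that gap.
    print(can_backfill(cpu_limit=12000, min_processors=4,
                       idle_nodes=8, seconds_until_needed=7200))
```

Under this model a smaller cpu_limit can only widen the set of gaps a job fits into, which is the motivation for requesting the minimum CPU time your job requires.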
Other helpful hints
- For MPL and MPI jobs, you will need to specify both of the following
in order to ensure running in User Space mode:
#@ requirements = (Adapter == "hps_user")
#@ environment = MP_EUILIB=us
- If you have multiple #@ environment statements, only the last will
have effect. If you need to specify multiple environment variables,
separate them with semicolons within a single #@ environment statement. For
example:
#@ environment = MP_EUILIB=us;MP_INFOLEVEL=3;MP_LABELIO=yes
- Be careful about setting MP_EUILIB in your .cshrc or .profile file.
This will override whatever you specify in your LoadLeveler command
script. If your communications throughput does not come close to
the figures below, it is probably because of this.
- Communications throughput - rough estimates. You should expect
something close to these for jobs with message sizes greater
than 500,000 bytes in length.
- IP over ethernet = < 1 MB/sec
- IP over switch = 9-12 MB/sec
- US over switch = 30-34 MB/sec
- LoadLeveler will "source" both your .cshrc and .login files. C Shell
users may wish to prevent LoadLeveler from running interactive-only
commands in these files by doing something like:
if ($?prompt) then
setenv TERM vt100
set filec
set prompt = "`hostname -s`% "
setenv MP_EUILIB us
:
:
endif
- Do not use the #@ executable statement if you are running parallel jobs.
Parallel jobs use the job command file as the executable.
- Use your FULL email address if you specify "notify_user" in your
command script.
- Do not try to use LoadLeveler macros, such as $(job_name), $(cluster)
or $(process) as script variables. They will not be recognized by the
shell.
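Because only the last #@ environment statement takes effect, it can be convenient to generate that line from a single table of variables. The Python helper below is a minimal sketch of that idea; the function name environment_statement is hypothetical, and the check that rejects semicolons inside values is a precaution added for this example rather than a documented LoadLeveler rule.

```python
def environment_statement(variables):
    """Build one "#@ environment" line from a mapping of names to values."""
    pairs = []
    for name, value in variables.items():
        text = str(value)
        if ";" in text:
            # A semicolon would be read as a separator between variables.
            raise ValueError(f"value for {name} must not contain ';'")
        pairs.append(f"{name}={text}")
    return "#@ environment = " + ";".join(pairs)


if __name__ == "__main__":
    print(environment_statement(
        {"MP_EUILIB": "us", "MP_INFOLEVEL": 3, "MP_LABELIO": "yes"}))
```

With the three variables shown, the helper reproduces the example statement given in the hints above.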
LoadLeveler Commands Reference
The following commands permit you to perform LoadLeveler related activities.
Each is linked to its LoadLeveler man page.
- llstatus
- Shows you information about the machines in the LoadLeveler pool
- llsubmit
- Submits a LoadLeveler job command file for scheduling and execution
- llq
- Gets information about jobs
- llcancel
- Cancels a job
- llhold
- Holds/releases a job
- llprio
- Changes the priority of a job
- xloadl
- Invokes LoadLeveler's graphical user interface
Additional Information on the WWW
References and Acknowledgements
- "IBM LoadLeveler User's Guide" IBM Corporation
- "IBM LoadLeveler Administration and Planning Guide" IBM Corporation
© Copyright 1995,1996
Maui High Performance Computing Center. All rights reserved.
Documents located on the Maui High Performance Computing Center's WWW server
are copyrighted by the MHPCC. Educational institutions are encouraged to
reproduce and distribute these materials for educational use as long as
credit and notification are provided. Please retain this copyright notice
and include this statement with any copies that you make. Also, the MHPCC
requests that you send notification of their use to help@mail.mhpcc.edu.
Commercial use of these materials is prohibited without prior written
permission.
Revised: 12 December 1996 blaise@mhpcc.edu