condor_submit

Queue jobs for execution on remote machines

Synopsis

condor_submit [-] [-v] [-n schedd_name] [-r schedd_name] [-d] [-a command] submit-description file

   

Description

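condor_submit is the program for submitting jobs to Condor. condor_submit requires a submit-description file which contains commands to direct the queuing of jobs. One submit-description file may contain specifications for the queuing of many Condor jobs at once. All jobs queued by a single invocation of condor_submit must share the same executable, and are referred to as a ``job cluster''. It is advantageous to submit multiple jobs as a single cluster because:
1.
Only one copy of the checkpoint file is needed to represent all jobs in a cluster until they begin execution.
2.
There is much less overhead involved for Condor to start the next job in a cluster than for Condor to start a new cluster. This can make a big difference when submitting lots of short jobs.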

SUBMIT DESCRIPTION FILE COMMANDS

Each Condor submit-description file describes one cluster of jobs to be placed in the Condor execution pool. All jobs in a cluster must share the same executable, but they may have different input and output files, different program arguments, etc. The submit-description file is then used as the only command-line argument to condor_submit.

The submit-description file must contain one executable command and at least one queue command. All of the other commands have default actions.
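For illustration, here is a minimal sketch of the smallest valid submit-description file. It assumes a hypothetical program foo that has already been re-linked with condor_compile, which the default Standard Universe requires:

        # One executable command and one queue command are all that is required
        executable = foo
        queue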

The commands which can appear in the submit-description file are:

executable = <name>
The name of the executable file for this job cluster. Only one executable command may be present in a description file. If submitting into the Standard Universe, which is the default, then the named executable must have been re-linked with the Condor libraries (such as via the condor_compile command). If submitting into the Vanilla Universe, then the named executable need not be re-linked and can be any process which can run in the background (shell scripts work fine as well).
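The following minimal sketch submits a shell script to the Vanilla Universe. The script name run_analysis.sh is hypothetical:

        # Vanilla Universe: the executable need not be re-linked with Condor
        universe   = vanilla
        executable = run_analysis.sh
        queue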

input = <pathname>
Condor assumes that its jobs are long-running, and that the user will not wait at the terminal for their completion. Because of this, the standard files which normally access the terminal (stdin, stdout, and stderr) must refer to files. Thus, the file specified with input should contain any keyboard input the program requires (i.e. this file becomes stdin). If not specified, the default value of /dev/null is used.

output = <pathname>
The output filename will capture any information the program would normally write to the screen (i.e. this file becomes stdout). If not specified, the default value of /dev/null is used. Do not let more than one job use the same output file, since one job will then overwrite the output of another.

error = <pathname>
The error filename will capture any error messages the program would normally write to the screen (i.e. this file becomes stderr). If not specified, the default value of /dev/null is used. Do not let more than one job use the same error file, since one job will then overwrite the errors of another.
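The following sketch queues two runs of a hypothetical program foo. The $(Process) macro, described under Macros, gives each run its own stdin, stdout, and stderr files, so no two jobs share an output or error file:

        executable = foo
        # Process 0 uses foo.in.0, foo.out.0, foo.err.0; process 1 uses foo.in.1, ...
        input      = foo.in.$(Process)
        output     = foo.out.$(Process)
        error      = foo.err.$(Process)
        queue 2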

arguments = <argument_list>
List of arguments to be supplied to the program on the command line.

initialdir = <directory-path>
Used to specify the current working directory for the Condor job. Should be a path to a preexisting directory. If not specified, condor_submit will automatically insert the user's current working directory at the time condor_submit was run as the value for initialdir.
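This sketch gives each run its own working directory by changing initialdir between queue commands. The subdirectories run1 and run2 are hypothetical and must already exist. The sketch also assumes that the relative name foo.out is resolved within each job's initialdir:

        executable = foo
        output     = foo.out
        # First run works in run1
        initialdir = run1
        queue
        # Second run works in run2
        initialdir = run2
        queue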

requirements = <ClassAd Boolean Expression>
The requirements command is a boolean ClassAd expression which uses C-like operators. In order for any job in this cluster to run on a given machine, this requirements expression must evaluate to true on the given machine. For example, to require that whatever machine executes your program has at least 64 megabytes of RAM and has a MIPS performance rating greater than 45, use:
        requirements = Memory >= 64 && Mips > 45
Only one requirements command may be present in a description file. By default, condor_submit appends the following clauses to the requirements expression:
1.
Arch and OpSys are set equal to the Arch and OpSys of the submit machine. In other words: unless you request otherwise, Condor will give your job machines with the same architecture and operating system version as the machine running condor_submit.
2.
Disk > ExecutableSize, to ensure there is enough disk space on the target machine for Condor to copy over your executable.
3.
VirtualMemory >= ImageSize, to ensure the target machine has enough virtual memory to run your job.
4.
If Universe is set to Vanilla, FileSystemDomain is set equal to the submit machine's FileSystemDomain.
You can view the requirements of a job which has already been submitted (along with everything else about the job ClassAd) with the command condor_q -l; see the command reference for condor_q. Also, see the Condor Users Manual for complete information on the syntax and available attributes that can be used in the ClassAd expression.
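To request machines of a particular platform instead of the submit machine's, you can name the Arch and OpSys attributes explicitly in the requirements expression. This sketch reuses the IRIX values from Example 2. It assumes that condor_submit then does not append its own Arch and OpSys clause:

        # Run only on SGI machines running IRIX 6.x with at least 64 megabytes of RAM
        requirements = Memory >= 64 && OpSys == "IRIX6" && Arch == "SGI"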

rank = <ClassAd Float Expression>
A ClassAd Floating-Point expression that states how to rank machines which have already met the requirements expression. Essentially, rank expresses preference. A higher numeric value equals better rank. Condor will give the job the machine with the highest rank. For example,
        requirements = Memory > 60
        rank = Memory
asks Condor to find all available machines with more than 60 megabytes of memory and give the job the one with the most memory. See the Condor Users Manual for complete information on the syntax and available attributes that can be used in the ClassAd expression.

on_exit_remove = <ClassAd Boolean Expression>
This expression is checked when the job exits and if true, then it allows the job to leave the queue normally. If false, then the job is placed back into the Idle state. If the user job is a vanilla job then it restarts from the beginning. If the user job is a standard job, then it restarts from the last checkpoint.

For example, suppose you have a job that occasionally segfaults, but you know that if you run it again on the same data, chances are it will finish successfully. This is how you would represent that with on_exit_remove. The signal number for a segmentation fault (SIGSEGV) is 11 on most UNIX systems:

	on_exit_remove = (ExitBySignal == False) || (ExitSignal != 11)

The above expression will not let the job leave the queue if it exited by a signal and that signal number was 11 (a segmentation fault). Instead, the job is placed back into the Idle state and run again. In any other case of the job exiting, it will leave the queue as it normally would have done.

If left unspecified, this will default to True.

periodic_* expressions (defined elsewhere in this man page) take precedence over on_exit_* expressions, and any *_hold expression takes precedence over a *_remove expression.

This expression is available only under UNIX and only for the standard and vanilla universes.

on_exit_hold = <ClassAd Boolean Expression>
This expression is checked when the job exits and, if true, places the job on hold. If false, nothing happens, and the on_exit_remove expression is checked to determine whether it needs to be applied.

For example, suppose you have a job that you know will run for at least an hour. If the job exits after less than an hour, you would like it to be placed on hold and to be notified by e-mail, instead of letting it leave the queue.

	on_exit_hold = (ServerStartTime - JobStartDate) < 3600

The above expression will place the job on hold if it exits for any reason before running for an hour. An e-mail will be sent to the user explaining that the job was placed on hold because this expression became true.

periodic_* expressions (defined elsewhere in this man page) take precedence over on_exit_* expressions, and any *_hold expression takes precedence over a *_remove expression.

If left unspecified, this will default to False.

This expression is available only under UNIX and only for the standard and vanilla universes.
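You can also combine the two on_exit expressions. The following sketch assumes SIGSEGV is signal 11, as on most UNIX systems. A job killed by a segmentation fault is held for inspection, and any other exit leaves the queue normally. Because a *_hold expression takes precedence over a *_remove expression, the hold wins whenever both expressions are true:

        # Hold the job if it was killed by SIGSEGV (signal 11 on most UNIX systems)
        on_exit_hold   = (ExitBySignal == True) && (ExitSignal == 11)
        # Otherwise let it leave the queue (True is also the default)
        on_exit_remove = True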

periodic_remove = <ClassAd Boolean Expression>
This expression is checked every 20 seconds (this interval is currently not configurable, but may become so in the future). If it becomes true, the job will leave the queue. periodic_remove takes precedence over on_exit_remove if the two describe conflicting states.

For example: Suppose you would like your job removed if the total suspension time of the job is more than half of the run time of the job.

	periodic_remove = CumulativeSuspensionTime > (RemoteWallClockTime / 2.0)

The above expression will remove the job once the conditions have become true.

Notice:
Currently, this option will force a ``terminate'' e...
...ully and a job where the periodic_remove expression had become true.

periodic_* expressions (defined elsewhere in this man page) take precedence over on_exit_* expressions, and any *_hold expression takes precedence over a *_remove expression.

If left unspecified, this will default to False.

This expression is available only under UNIX and only for the standard and vanilla universes.

periodic_hold = <ClassAd Boolean Expression>
This expression is checked every 20 seconds (this interval is currently not configurable, but may become so in the future). If it becomes true, the job will be placed on hold.

For example: Suppose you would like your job held if the total suspension time of the job is more than half of the total run time of the job.

	periodic_hold = CumulativeSuspensionTime > (RemoteWallClockTime / 2.0)

The above expression will place the job on hold if its total suspension time exceeds half of its total run time. An e-mail will be sent to the user explaining that the job was placed on hold because this expression became true.

If left unspecified, this will default to False.

periodic_* expressions (defined elsewhere in this man page) take precedence over on_exit_* expressions, and any *_hold expression takes precedence over a *_remove expression.

This expression is available only under UNIX and only for the standard and vanilla universes.

priority = <priority>
Condor job priorities range from -20 to +20, with 0 being the default. Jobs with higher numerical priority will run before jobs with lower numerical priority. Note that this priority is on a per-user basis. Setting the priority determines the order in which your own jobs are executed, but has no effect on whether your jobs will run ahead of another user's jobs.
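Because priority may be changed between queue commands, you can order two of your own jobs within one cluster. In this sketch, the argument values are hypothetical:

        executable = foo
        # Runs before the job below, among your own jobs only
        arguments  = urgent
        priority   = 10
        queue
        arguments  = background
        priority   = -10
        queue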

notification = <when>
Owners of Condor jobs are notified by email when certain events occur. If when is set to Always, the owner will be notified whenever the job is checkpointed and when it completes. If when is set to Complete (the default), the owner will be notified when the job terminates. If when is set to Error, the owner will only be notified if the job terminates abnormally. If when is set to Never, the owner will not be mailed, regardless of what happens to the job. The statistics included in the email are documented in section 2.6.5.

notify_user = <email-address>
Used to specify the email address to use when Condor sends email about a job. If not specified, Condor will default to using:
        job-owner@UID_DOMAIN
where UID_DOMAIN is specified by the Condor site administrator. If UID_DOMAIN has not been specified, Condor will send the email to:
        job-owner@submit-machine-name
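This sketch requests email only on abnormal termination, sent to an address other than the default. The address jdoe@example.edu is hypothetical:

        notification = Error
        notify_user  = jdoe@example.edu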

copy_to_spool = <True | False>
If copy_to_spool is set to True, then condor_submit will copy the executable to the local spool directory before running it on a remote host. This copy can be time-consuming and is often unnecessary. Setting copy_to_spool to False makes condor_submit skip this step. Defaults to True.

getenv = <True | False>
If getenv is set to True, then condor_submit will copy all of the user's current shell environment variables at the time of job submission into the job ClassAd. The job will therefore execute with the same set of environment variables that the user had at submit time. Defaults to False. You must be careful when using this feature, since the maximum allowed size of the environment in Condor is 10240 characters. If your environment is larger than that, Condor will not allow you to submit your job, and you will have to use the ``Environment'' setting described below, instead.

hold = <True | False>
If hold is set to True, then the job will be submitted in the hold state. Jobs in the hold state will not run until released by condor_release.

environment = <parameter_list>
List of environment variables of the form:
        <parameter> = <value>
Multiple environment variables can be specified by separating them with a semicolon (`` ; ''). These environment variables will be placed into the job's environment before execution. The length of all characters specified in the environment is currently limited to 10240 characters.
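This sketch sets two environment variables for the job. The variable names and values are hypothetical:

        # Two variables separated by a semicolon; the total must stay under 10240 characters
        environment = MY_DATA_DIR=/scratch/data;MY_MODE=fast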

log = <pathname>
Use log to specify a filename where Condor will write a log file of what is happening with this job cluster. For example, Condor will log into this file when and where the job begins running, when the job is checkpointed and/or migrated, when the job completes, etc. Most users find specifying a log file to be very handy; its use is recommended. If no log entry is specified, Condor does not create a log for this cluster.

universe = <vanilla | standard | pvm | scheduler | globus | mpi>
Specifies which Condor Universe to use when running this job. The Condor Universe specifies a Condor execution environment. The standard Universe is the default. It tells Condor that this job has been re-linked via condor_compile with the Condor libraries, and therefore supports checkpointing and remote system calls. The vanilla Universe is an execution environment for jobs which have not been linked with the Condor libraries. Note: use the vanilla Universe to submit shell scripts to Condor. The pvm Universe is for a parallel job written with PVM 3.4. The scheduler Universe is for a job that should act as a metascheduler. The globus Universe uses the Globus GRAM API to contact the specified Globus resource and requests that it run the job. The mpi Universe is for running MPI jobs built with the MPICH package. See the Condor User's Manual for more information about using Universe.

image_size = <size>
This command tells Condor the maximum virtual image size to which you believe your program will grow during its execution. Condor will then execute your job only on machines which have enough resources, (such as virtual memory), to support executing your job. If you do not specify the image size of your job in the description file, Condor will automatically make a (reasonably accurate) estimate about its size and adjust this estimate as your program runs. If the image size of your job is underestimated, it may crash due to inability to acquire more address space, e.g. malloc() fails. If the image size is overestimated, Condor may have difficulty finding machines which have the required resources. size must be in kbytes, e.g. for an image size of 8 megabytes, use a size of 8000.

machine_count = <min..max> | <max>
If machine_count is specified, Condor will not start the job until it can simultaneously supply the job with min machines. Condor will continue to try to provide up to max machines, but will not delay starting the job to do so. If the job is started with fewer than max machines, the job will be notified via a usual PvmHostAdd notification as additional hosts come on line. Important: use machine_count if and only if submitting into the PVM or MPI Universes. Use min..max for the PVM universe, and just max for the MPI universe.
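The following two sketches show how machine_count is written for each universe. Each fragment is a separate submit-description file, and the executable names are hypothetical:

        # PVM (min..max): start once 2 machines are available, grow up to 8
        universe      = pvm
        executable    = master_pvm
        machine_count = 2..8
        queue

        # MPI (max only): 8 machines
        universe      = mpi
        executable    = my_mpi_prog
        machine_count = 8
        queue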

coresize = <size>
Should the user's program abort and produce a core file, coresize specifies the maximum size in bytes of the core file which the user wishes to keep. If coresize is not specified in the submit-description file, the system's user resource limit ``coredumpsize'' is used (except on HP-UX).

nice_user = <True | False>
Normally, when a machine becomes available to Condor, Condor decides which job to run based upon user and job priorities. Setting nice_user equal to True tells Condor not to use your regular user priority, but to give this job last priority amongst all users and all jobs. Jobs submitted in this fashion therefore run only on machines which no other non-nice_user job wants: a true ``bottom-feeder'' job! This is very handy if a user has some jobs to run but does not wish to use resources that could instead be used to run other people's Condor jobs. Jobs submitted in this fashion have ``nice-user.'' prepended to the owner name when viewed from condor_q or condor_userprio. The default value is False.

kill_sig = <signal-number>
When Condor needs to kick a job off of a machine, it will send the job the signal specified by signal-number. signal-number needs to be an integer which represents a valid signal on the execution machine. For jobs submitted to the Standard Universe, the default value is the number for SIGTSTP which tells the Condor libraries to initiate a checkpoint of the process. For jobs submitted to the Vanilla Universe, the default is SIGTERM which is the standard way to terminate a program in UNIX.
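This sketch shows a hypothetical vanilla job that saves its state when it receives SIGINT. The job therefore requests SIGINT (signal 2 on UNIX systems) instead of the default SIGTERM:

        universe   = vanilla
        executable = long_sim.sh
        # Send SIGINT rather than SIGTERM when the job must leave a machine
        kill_sig   = 2
        queue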

compress_files = file1, file2, ...

If your job attempts to access any of the files mentioned in this list, Condor will automatically compress them (if writing) or decompress them (if reading). The compression format is the same as that used by GNU gzip.

The files given in this list may be simple filenames or complete paths and may include * as a wildcard. For example, this list causes the file /tmp/data.gz, any file named event.gz, and any file ending in .gzip to be automatically compressed or decompressed as needed:

compress_files = /tmp/data.gz, event.gz, *.gzip

Due to the nature of the compression format, compressed files must only be accessed sequentially. Random access reading is allowed but is very slow, while random access writing is simply not possible. This restriction may be avoided by using both compress_files and fetch_files at the same time. When this is done, a file is kept in the decompressed state at the execution machine, but is compressed for transfer to its original location.

This option only applies to standard-universe jobs.

fetch_files = file1, file2, ...

If your job attempts to access a file mentioned in this list, Condor will automatically copy the whole file to the executing machine, where it can be accessed quickly. When your job closes the file, it will be copied back to its original location. This list uses the same syntax as compress_files, shown above.

This option only applies to standard-universe jobs.
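As described under compress_files, listing the same file in both compress_files and fetch_files avoids the sequential-access restriction. This sketch is for a standard-universe job with a hypothetical file results.gz:

        # Kept decompressed on the execution machine for random access,
        # and compressed again when copied back to its original location
        compress_files = results.gz
        fetch_files    = results.gz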

append_files = file1, file2, ...

If your job attempts to access a file mentioned in this list, Condor will force all writes to that file to be appended to the end. Furthermore, condor_submit will not truncate it. This list uses the same syntax as compress_files, shown above.

This option may yield some surprising results. If several jobs attempt to write to the same file, their output may be intermixed. If a job is evicted from one or more machines during the course of its lifetime, such an output file might contain several copies of the results. This option should only be used when you wish a certain file to be treated as a running log instead of a precise result.

This option only applies to standard-universe jobs.
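This sketch treats a hypothetical file progress.log as a running log for a standard-universe job:

        # All writes are appended, and condor_submit does not truncate the file
        append_files = progress.log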

local_files = file1, file2, ...

If your job attempts to access a file mentioned in this list, Condor will cause it to be read or written at the execution machine. This is most useful for temporary files not used for input or output. This list uses the same syntax as compress_files, shown above. For example, the following causes any file in /tmp to be read or written at the execution machine:

local_files = /tmp/*

This option only applies to standard-universe jobs.

file_remaps = < `` name = newname ; name2 = newname2 ... ''>

Directs Condor to use a new filename in place of an old one. name describes a filename that your job may attempt to open, and newname describes the filename it should be replaced with. newname may include an optional leading access specifier, local: or remote:. If left unspecified, the default access specifier is remote:. Multiple remaps can be specified by separating each with a semicolon.

If you wish to remap file names that contain equals signs or semicolons, these special characters may be escaped with a backslash.

This option only applies to standard-universe jobs.

Example One:
Suppose that your job reads a file named dataset.1. To instruct Condor to force your job to read other.dataset instead, add this to the submit file:
file_remaps = "dataset.1=other.dataset"
Example Two:
Suppose that you run many jobs which all read in the same large file, called very.big. This file may be found in the same place on a local disk in every machine in the pool (say, /bigdisk/bigfile). In that case, you can tell Condor this by remapping very.big to /bigdisk/bigfile and specifying that the file is to be read locally, which will be much faster than reading over the network.
file_remaps = "very.big = local:/bigdisk/bigfile"
Example Three:
Several remaps can be applied at once by separating each with a semicolon.
file_remaps = "very.big = local:/bigdisk/bigfile ; dataset.1 = other.dataset"

buffer_files = < `` name = (size,block-size) ; name2 = (size,block-size) ... '' >
buffer_size = <bytes-in-buffer>
buffer_block_size = <bytes-in-block>
Condor keeps a buffer of recently-used data for each file a job accesses. This buffer is used both to cache commonly-used data and to consolidate small reads and writes into larger operations that get better throughput. The default settings should produce reasonable results for most programs.

These options only apply to standard-universe jobs.

If needed, you may set the buffer controls individually for each file using the buffer_files option. For example, to set the buffer size to 1MB and the block size to 256KB for the file 'input.data', use this command:

buffer_files = "input.data=(1000000,256000)"

Alternatively, you may use these two options to set the default sizes for all files used by your job:

buffer_size = 1000000
buffer_block_size = 256000

If you do not set these, Condor will use the values given by these two config file macros:

DEFAULT_IO_BUFFER_SIZE = 1000000
DEFAULT_IO_BUFFER_BLOCK_SIZE = 256000

Finally, if no other settings are present, Condor will use a buffer of 512KB and a block size of 32KB.

rendezvousdir = <directory-path>
Used to specify the shared-filesystem directory to be used for filesystem authentication when submitting to a remote scheduler. Should be a path to a preexisting directory.

x509directory = <directory-path>
Used to specify the directory which contains the certificate, private key, and trusted certificate directory for GSS authentication. If this attribute is set, the environment variables X509_USER_KEY, X509_USER_CERT, and X509_CERT_DIR are exported with default values. See section 3.9 for more info.

x509userproxy = <full-pathname>
Used to override the default pathname for X509 user certificates. The default location for X509 proxies is the /tmp directory, which is generally a local filesystem. Setting this value would allow Condor to access the proxy in a shared filesystem (e.g., AFS). Condor will use the proxy specified in the submit file first. If nothing is specified in the submit file, it will use the environment variable X509_USER_CERT. If that variable is not present, it will search in the default location. See section 3.9 for more info.

globusscheduler = <scheduler-name>
Used to specify the Globus resource to which the job should be submitted. More than one scheduler can be submitted to: simply place a queue command after each instance of globusscheduler. Each instance should be a valid Globus scheduler, using either the full Globus contact string or the host/scheduler format shown below:
Example:
To submit to the LSF scheduler of the Globus gatekeeper on lego at Boston University:
...
GlobusScheduler = lego.bu.edu/jobmanager-lsf
queue

globus_rsl = <RSL-string>
Used to provide any additional Globus RSL string attributes which are not covered by regular submit file parameters.

transfer_executable = <True | False>
If transfer_executable is set to False, then Condor will look for the executable on the remote machine and will not transfer it over. This is useful if you have already pre-staged your executable and wish to have Condor behave more like rsh. Defaults to True. This option is only used in the Globus universe.
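This sketch shows a Globus universe job whose executable has already been pre-staged on the remote resource. The gatekeeper is the one from the globusscheduler example, and the path /home/jdoe/bin/foo is hypothetical:

        universe            = globus
        globusscheduler     = lego.bu.edu/jobmanager-lsf
        # Path of the pre-staged executable on the remote machine
        executable          = /home/jdoe/bin/foo
        transfer_executable = False
        queue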

+<attribute> = <value>
A line which begins with a '+' (plus) character instructs condor_submit to simply insert the following attribute into the job ClassAd with the given value.
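This sketch adds a custom attribute to the job ClassAd. The attribute name Department is hypothetical. The sketch assumes the value is parsed as a ClassAd expression, so a string must be enclosed in double quotes. You can check the result with condor_q -l:

        +Department = "Physics"
        executable  = foo
        queue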

queue [number-of-procs]
Places one or more copies of the job into the Condor queue. If desired, new input, output, error, initialdir, arguments, nice_user, priority, kill_sig, coresize, or image_size commands may be issued between queue commands. This is very handy when submitting multiple runs into one cluster with one submit file; for example, by issuing an initialdir between each queue command, each run can work in its own subdirectory. The optional argument number-of-procs specifies how many times to submit the job to the queue, and defaults to 1.

In addition to commands, the submit-description file can contain macros and comments:

Macros
Parameterless macros in the form of $(macro_name) may be inserted anywhere in Condor description files. Macros can be defined by lines in the form of

        <macro_name> = <string>
Two pre-defined macros are supplied by the description file parser. The $(Cluster) macro supplies the number of the job cluster, and the $(Process) macro supplies the number of the job. These macros are intended to aid in the specification of input/output files, arguments, etc., for clusters with lots of jobs, and/or could be used to supply a Condor process with its own cluster and process numbers on the command line. The $(Process) macro should not be used for PVM jobs.
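This sketch combines a user-defined macro with the pre-defined $(Cluster) and $(Process) macros. The macro name BaseName is hypothetical:

        BaseName   = sim
        executable = foo
        # Each process receives its own process number as its argument
        arguments  = $(Process)
        # Produces sim.<cluster>.0.out, sim.<cluster>.1.out, and sim.<cluster>.2.out
        output     = $(BaseName).$(Cluster).$(Process).out
        error      = $(BaseName).$(Cluster).$(Process).err
        queue 3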

If you happen to want a ``$'' as a literal character, then you must use

$(DOLLAR)

In addition to the normal macro, there is also a special kind of macro called a ``Substitution Macro''. It allows you to substitute expressions defined on the resource machine itself (obtained after a match to the machine has been performed) into specific expressions in your submit description file. The special substitution macro is of the form:

$$(attribute)

The substitution macro can only be used in three expressions in the submit description file: executable, environment, and arguments. The most common use of this macro is for heterogeneous submission of an executable:

executable = povray.$$(opsys).$$(arch)
The opsys and arch attributes will be substituted at match time for any given resource. This will allow Condor to automatically choose the right executable for the right machine.

Comments
Blank lines and lines beginning with a '#' (pound-sign) character are ignored by the submit-description file parser.

Options

Supported options are as follows:

-
Accept the command file from stdin.
-v
Verbose output: display the created job ClassAd.

-n schedd_name
Submit to the specified schedd. This option is used when there is more than one schedd running on the submitting machine.

-r schedd_name
Submit to a remote schedd. The jobs will be submitted to the schedd on the specified remote host. On Unix systems, the Condor administrator for your site must override the default AUTHENTICATION_METHODS configuration setting to enable remote filesystem (FS_REMOTE) authentication.

-d
Disable file permission checks.

-a command
Augment the commands in the submit file with the given command. This command will be considered to immediately precede the Queue command in the submit file and come after all other previous commands. The submit file is not modified. You can append multiple commands by using the -a option multiple times. If your command has spaces in it, make sure you quote it.

Exit Status

condor_submit will exit with a status value of 0 (zero) upon success, and a non-zero value upon failure.

Examples

Example 1: The example below queues three jobs for execution by Condor. The first will be given command line arguments of '15' and '2000', and will write its standard output to 'foo.out1'. The second will be given command line arguments of '30' and '2000', and will write its standard output to 'foo.out2'. Similarly, the third will have arguments of '45' and '6000', and will use 'foo.out3' for its standard output. Standard error output (if any) from the three programs will appear in 'foo.err1', 'foo.err2', and 'foo.err3', respectively.

      ####################
      #
      # Example 1: queueing multiple jobs with differing
      # command line arguments and output files.
      #                                                                      
      ####################                                                   
                                                                         
      Executable     = foo                                                   
                                                                         
      Arguments      = 15 2000                                               
      Output  = foo.out1                                                     
      Error   = foo.err1
      Queue                                                                  
                                                                         
      Arguments      = 30 2000                                               
      Output  = foo.out2                                                     
      Error   = foo.err2
      Queue                                                                  
                                                                         
      Arguments      = 45 6000                                               
      Output  = foo.out3                                                     
      Error   = foo.err3
      Queue

Example 2: This submit-description file example queues 150 runs of program 'foo' which must have been compiled and linked for Silicon Graphics workstations running IRIX 6.x. Condor will not attempt to run the processes on machines which have less than 32 megabytes of physical memory, and will run them on machines which have at least 64 megabytes if such machines are available. Stdin, stdout, and stderr will refer to ``in.0'', ``out.0'', and ``err.0'' for the first run of this program (process 0). Stdin, stdout, and stderr will refer to ``in.1'', ``out.1'', and ``err.1'' for process 1, and so forth. A log file containing entries about where/when Condor runs, checkpoints, and migrates processes in this cluster will be written into file ``foo.log''.

      ####################                                                    
      #                                                                       
      # Example 2: Show off some fancy features including                            
      # use of pre-defined macros and logging.                                
      #                                                                       
      ####################                                                    
                                                                          
      Executable     = foo                                                    
      Requirements   = Memory >= 32 && OpSys == "IRIX6" && Arch =="SGI"     
      Rank           = Memory >= 64
      Image_Size     = 28 Meg                                                 
                                                                          
      Error   = err.$(Process)                                                
      Input   = in.$(Process)                                                 
      Output  = out.$(Process)                                                
      Log = foo.log                                                                       
                                                                          
      Queue 150

General Remarks

See Also

Condor User Manual

Author

Condor Team, University of Wisconsin-Madison

Copyright

Copyright © 1990-2001 Condor Team, Computer Sciences Department, University of Wisconsin-Madison, Madison, WI. All Rights Reserved. No use of the Condor Software Program is authorized without the express consent of the Condor Team. For more information contact: Condor Team, Attention: Professor Miron Livny, 7367 Computer Sciences, 1210 W. Dayton St., Madison, WI 53706-1685, (608) 262-0856 or miron@cs.wisc.edu. U.S. Government Rights Restrictions: Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of The Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 or subparagraphs (c)(1) and (2) of Commercial Computer Software-Restricted Rights at 48 CFR 52.227-19, as applicable, Condor Team, Attention: Professor Miron Livny, 7367 Computer Sciences, 1210 W. Dayton St., Madison, WI 53706-1685, (608) 262-0856 or miron@cs.wisc.edu.

See the Condor Version 6.3.1 Manual for additional notices.


condor-admin@cs.wisc.edu