Cluster¶
Usage¶
hs cluster [-h][FILE | --restart | --forever][-N NUM][-t CMD][-b SIZE][-w SEC][-p PORT][-r NUM [--eager]][-f PATH][--capture | [-o PATH] [-e PATH]][--ssh [HOST... | --ssh-group NAME] [--env] | --mpi | --launcher=ARGS...][--no-db | --initdb][--no-confirm][-d SEC][-T SEC][-W SEC][-S SEC][--autoscaling [MODE] [-P SEC] [-F VALUE] [-I NUM] [-X NUM] [-Y NUM]]
Description¶
Start cluster locally or with remote clients over ssh or a custom launcher. This mode should be the most common entry-point for general usage. It fully encompasses all of the different agents in the system in a concise workflow.
The input source for tasks is file-like, either a local path, or from stdin if no argument is
given. The command-line tasks are pulled in and either directly published to a distributed queue
(see --no-db) or committed to a database first before being scheduled later.
For large, long running workflows, it might be a good idea to configure a database and run an
initial submit job to populate the database, and then run the cluster with --restart and no
input FILE. If the cluster is interrupted for whatever reason it can gracefully restart where it
left off.
Use --autoscaling with either fixed or dynamic to run a persistent, elastically scalable
cluster using an external --launcher to bring up clients as needed.
Use hsx in place of hs cluster.
Arguments¶
- FILE
Path to input task file (default: <stdin>).
Modes¶
--sshHOST…Launch directly with SSH host(s). This can be a single host, a comma-separated list of hosts, or an expandable pattern, e.g., “cluster-a[00-04].xyz”.
See also
--ssh-groupand--ssh-args.--mpiSame as
--launcher=mpirun.--launcherARGS…Use specific launch interface. This can be any program that handles process management on a distributed system. For example, on a SLURM cluster one might want to use
srun. In this case you would specify--launcher=srun; however, the ARGS are not merely the executable but the full listing, e.g.,--launcher='srun --mpi=pmi2'.
Options¶
-N,--num-tasksNUMNumber of task executors per client (default: 1).
For example,
-N4would create four workers, but-N4 --ssh 'cluster-a[00-01].xyz'creates two clients and a total of eight workers.-t,--templateCMDCommand-line template pattern (default: “{}”).
This is expanded by the client just before execution. With the default “{}” the input command-line will be run verbatim. Specifying a template pattern allows for simple input arguments (e.g., file paths) to be transformed into some common form; such as
-t './some_command.py {} >outputs/{/-}.out'.See section on templates.
-p,--portNUMPort number (default: 50001).
This is an arbitrary choice and simply must be an available port. The default option chosen here is typically available on most platforms and is not expected by any known major software.
-b,--bundlesizeSIZESize of task bundle (default: 1).
The default value allows for greater concurrency and responsiveness on small scales. This is used by the submit thread to accumulate bundles for either database commits and/or publishing to the queue. If a database is in use, the scheduler thread selects tasks from the database in batches of this size.
Using larger bundles is a good idea for large distributed workflows; specifically, it is best to coordinate bundle size with the number of executors in use by each client.
See also
--num-tasksand--bundlewait.-w,--bundlewaitSECSeconds to wait before flushing tasks (default: 5).
This is used by both the submit thread and forwarded to each client. The client collector thread that accumulates finished task bundles to return to the server will push out a bundle after this period of time regardless of whether it has reached the preferred bundle size.
See also
--bundlesize.-r,--max-retriesNUMAuto-retry failed tasks (default: 0).
If a database is in use, then there is an opportunity to automatically retry failed tasks. A task is considered to have failed if it has a non-zero exit status. Setting this value greater than zero defines the number of attempts for the task. The original is not over-written, a new task is submitted and later scheduled.
See also
--eager.--eagerSchedule failed tasks before new tasks. If
--max-retriesis greater than one, this option defines the appetite for re-submitting failed tasks. By default, failed tasks will only be scheduled when there are no more remaining novel tasks.--no-dbDisable database (submit directly to clients).
By default, a scheduler thread selects tasks from a database that were previously submitted. With
--no-dbenabled, there is no scheduler and instead the submit thread publishes bundles directly to the queue.--initdbAuto-initialize database.
If a database is configured for use with the workflow (e.g., Postgres), auto-initialize tables if they don’t already exist. This is a short-hand for pre-creating tables with the
hs initdbcommand. This happens by default with SQLite databases.Mutually exclusive to
--no-db. Seehs initdbcommand.--no-confirmDisable client confirmation of task bundle received.
To achieve even higher throughput at large scales, optionally disable confirmation payloads from clients. Consider using this option when also using
--no-db.--foreverSchedule forever.
Typically, the cluster will process some finite set of submitted tasks. When there are no more tasks left to schedule, the cluster will begin its shutdown procedure. With
--foreverenabled, the scheduler will continue to wait for new tasks indefinitely.Conflicts with
--no-dband mutually exclusive to--restart.--restartStart scheduling from last completed task.
Instead of pulling a new list of tasks from some input FILE, with
--restartenabled the cluster will restart scheduling tasks where it left off. Any task in the database that was previously scheduled but not completed will be reverted.For very large workflows, an effective strategy is to first use the
submitworkflow to populate the database, and then to use--restartso that if the cluster is interrupted, it can easily continue where it left off, halting if nothing to be done.Conflicts with
--no-dband mutually exclusive to--forever.--ssh-argsARGS…Command-line arguments for SSH. For example,
--ssh-args '-i ~/.ssh/my_key'.--ssh-groupNAMESSH nodelist group in config.
In your configuration under
[ssh.nodelist]can be one or more named lists. These lists should contain host names to associate with the group name.See configuration section.
-E,--envSend environment variables. Only for
--sshmode, allHYPERSHELL_prefixed environment variables can be exported to the remote clients.--remote-exePATHPath to remote executable on the client side.
By default the executable path used to launch clients is the same as that for the cluster. This option is necessary for clients with a different install path on client hosts.
-d,--delay-startSECDelay time in seconds for launching clients (default: 0).
At larger scales it can be advantageous to uniformly delay the client launch sequence. Hundreds or thousands of clients connecting to the server all at once is a challenge. Even if the server could handle the load, your task throughput would be unbalanced, coming in waves.
Use
--delay-startwith a negative number to impose a uniform random delay up to the magnitude specified (e.g.,--delay-start=-600would delay the client up to ten minutes). This also has the effect of staggering the workload. If your tasks take on the order of 30 minutes and you have 1000 nodes, choose--delay-start=-1800.-c,--captureCapture individual task <stdout> and <stderr>.
By default, the stdout and stderr streams of all tasks are fused with that of the client thread, and in turn the cluster. If tasks are producing output that needs to be isolated, the tasks need to manage their own output, you can specify a redirect as part of a
--template, or use--captureto capture these as.outand.errfiles.These are stored local to the client. Task outputs can be automatically retrieved via SFTP, see task usage.
-o,--outputPATHFile path for task outputs (default: <stdout>).
If local only (not
--ssh,--mpior--launcher), then the client can redirect all stdout from tasks to some file PATH together.-e,--errorsPATHFile path for task errors (default: <stderr>).
If local only (not
--ssh,--mpior--launcher), then the client can redirect all stderr from tasks to some file PATH together.-f,--failuresPATHFile path to write failed task args (default: <none>).
The server acts like a sieve, reading task args from stdin and redirecting those original args to stdout if the task had a non-zero exit status. The cluster will run the server for you and if
--failuresis enabled these task args will be sent to a local file PATH.-T,--timeoutSECTimeout in seconds for clients. Automatically shutdown if no tasks received (default: never).
This option is only valid for an
--autoscalingcluster. This feature allows for gracefully scaling down a cluster when task throughput subsides.-W,--task-timeoutSECTask-level walltime limit (default: none).
Executors will send a progression of SIGINT, SIGTERM, and SIGKILL. If the process still persists the executor itself will shutdown.
-S,--signalwaitSECTask-level signal escalation wait period in seconds (default: 10).
When tasks fail to halt following an initial SIGINT, the program waits this interval in seconds before escalating to the next level of interrupt.
See also
--task-timeout.-A,--autoscaling[MODE]Enable autoscaling (default: disabled). Used with
--launcher.Specifying this option on its own triggers the use of the autoscaler, with the default policy or the configured policy. The policy can be specified directly here as either fixed or dynamic (e.g.,
--autoscaling=dynamic). The default is fixed.The specified
--launcheris used to bring up each individual instance of the client as a discrete scaling unit. This is different than using--launcheron its own where it specifies a single invocation that should launch all clients (e.g., like anmpirun). Without this option, clients will simply be run locally.A fixed policy will seek to maintain a definite size and allows for recovery in the event that clients halt for some reason (e.g., due to expected faults or timeouts).
A dynamic policy maintains a
--min-size(default: 0) and grows up to some--max-sizedepending on the observed task pressure given the specified scaling--factor.See also
--factor,--period,--init-size,--min-size, and--max-size.-F,--factorVALUEScaling factor (default: 1).
A configurable, dimensionless quantity used by the
--autoscaling=dynamicpolicy. This value expresses some multiple of the average task duration in seconds.The autoscaler periodically checks
toc / (factor x avg_duration), wheretocis the estimated time of completion for all remaining tasks given current throughput of active clients. This ratio is referred to as task pressure, and if it exceeds 1, the pressure is considered high and we will add another client if we are not already at the given--max-sizeof the cluster.For example, if the average task length is 30 minutes, and we set
--factor=2, then if the estimated time of completion of remaining tasks given currently connected executors exceeds 1 hour, we will scale up by one unit.See also
--period. Only valid with--autoscaling.-P,--periodSECScaling period in seconds (default: 60).
The autoscaler waits for this period of time in between checks and scaling events. A shorter period makes the scaling behavior more responsive but can effect database performance if checks happen too rapidly.
Only valid with
--autoscaling.-I,--init-sizeSIZEInitial size of cluster (default: 1).
When the cluster starts, this number of clients will be launched. For a fixed policy cluster, this should be given with a
--min-size, and likely the same value.Only valid with
--autoscaling.-X,--min-sizeSIZEMinimum size of cluster (default: 0).
Regardless of autoscaling policy, if the number of launched clients drops below this value we will scale up by one. Allowing
--min-size=0is an important feature for efficient use of computing resources in the absence of tasks.Only valid with
--autoscaling.-Y,--max-sizeSIZEMaximum size of cluster (default: 2).
For a dynamic autoscaling policy, this sets an upper limit on the number of launched clients. When this number is reached, scaling stops regardless of task pressure.
Only valid with
--autoscaling.