Release Notes (v2.4.0)
June 2, 2023
The v2.4.0 release includes major new features along with numerous fixes and improvements.
User-defined tags
Walltime limits on tasks
Client timeouts for automatic shutdown
Autoscaling cluster (fixed and dynamic)
New logging style
NO_COLOR environment variable support
Features
Task timeout
Previously, unless the program being executed had a built-in timeout feature, there was no way to preempt a task. Once a task began execution, we would wait indefinitely for it to complete.
The new task-level timeout feature now provides this functionality. A timeout can be specified via the configuration file (task.timeout), an environment variable (HYPERSHELL_TASK_TIMEOUT), or a command-line option (-W, --task-timeout). If none is specified, the default behavior is still to wait indefinitely. If a timeout is given, then once the specified number of seconds has elapsed, a signal is sent to the running program. SIGINT, SIGTERM, and SIGKILL are sent in escalating order, with a brief wait in between, until the program halts. If the program still has not halted (some programs are pathological), the task executor thread itself will halt.
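The escalation sequence described above can be sketched in Python with the standard subprocess module. This is a minimal illustration that assumes a POSIX system. The function name run_with_timeout and the one-second grace period between signals are hypothetical choices, not HyperShell's actual implementation.

```python
import signal
import subprocess

# Signals sent, in order, once the timeout expires.
ESCALATION = (signal.SIGINT, signal.SIGTERM, signal.SIGKILL)


def run_with_timeout(args, timeout=None, grace=1.0):
    """Run a command, escalating signals if it exceeds `timeout` seconds.

    Hypothetical sketch: returns (exit_status, signals_sent). A negative
    exit status means the process was ended by that signal number.
    """
    proc = subprocess.Popen(args)
    sent = []
    try:
        # timeout=None waits indefinitely, matching the default behavior.
        return proc.wait(timeout=timeout), sent
    except subprocess.TimeoutExpired:
        pass
    for sig in ESCALATION:
        proc.send_signal(sig)
        sent.append(sig)
        try:
            # Wait briefly before escalating to the next signal.
            return proc.wait(timeout=grace), sent
        except subprocess.TimeoutExpired:
            continue
    # Still not reaped after SIGKILL: stop waiting on it (the point at
    # which, per the notes above, the executor thread itself halts).
    return None, sent
```

A program that traps SIGINT would receive SIGTERM next, and so on. The last entry in the returned list is the signal that finally stopped it.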
Client timeout
In anticipation of the new autoscaling feature, we’ve implemented a client-level timeout. Many deployment scenarios launch clients on some sort of timer or trigger, but would otherwise require hard-terminating the client instances. We previously implemented a robust mechanism for recovering tasks from evicted clients; however, this is not a graceful or preferred means of intentionally scaling down. It interrupts tasks and requires a waiting period before the server evicts the client.
Instead, you can now specify a client-level timeout via configuration (client.timeout), an environment variable (HYPERSHELL_CLIENT_TIMEOUT), or a command-line option (-T, --timeout). If not given, the client persists indefinitely by default, as before. If given, the client will shut down gracefully and send a disconnect signal to the server once the specified number of seconds has elapsed without any new task bundles arriving.
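The idle-timeout behavior can be sketched as a loop that waits for the next task bundle with a deadline. This is a minimal Python illustration. A queue.Queue stands in hypothetically for the server connection, and the names run_client and disconnect are assumptions. The sketch also assumes the idle period is measured from the moment the client starts waiting for its next bundle.

```python
import queue


def run_client(bundles, handle, timeout=None, disconnect=lambda: None):
    """Process task bundles until none arrive for `timeout` seconds.

    Hypothetical sketch: `bundles` stands in for the server connection.
    With timeout=None the client waits for new bundles indefinitely.
    Returns the number of bundles processed before shutting down.
    """
    processed = 0
    while True:
        try:
            # The idle clock restarts each time we begin waiting.
            bundle = bundles.get(timeout=timeout)
        except queue.Empty:
            # No new bundle within the timeout: shut down gracefully
            # and tell the server we are leaving.
            disconnect()
            return processed
        handle(bundle)
        processed += 1
```

Because the client leaves on its own terms, it never has a bundle in flight when it disconnects, unlike a hard-terminated client that the server must evict.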
Autoscaling
We have added another mode to hs cluster, enabled with the --autoscaling option. This mode combines some behavioral ideas from all three of the previous modes. Previously, the use of --launcher implied a single subprocess responsible for bringing up all client instances (as with mpirun). By contrast, the --ssh mode brought up a distinct subprocess for each of the included hosts. The --autoscaling mode incorporates a new local thread that dynamically brings up new clients, using the --launcher command as a prefix.
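The idea of a local thread that launches clients behind a launcher prefix can be sketched in Python. Everything here is an assumption for illustration: the launcher string, the client arguments, the polling period, and the should_launch callback standing in for a scaling policy. The functions client_command, autoscale, and start_autoscaler are hypothetical names.

```python
import shlex
import subprocess
import threading


def client_command(launcher, client_args):
    """Prefix the (hypothetical) client arguments with a launcher such as "srun -N1"."""
    return shlex.split(launcher) + list(client_args)


def autoscale(launcher, client_args, should_launch, stop, period=1.0):
    """Launch a client whenever should_launch(running) is true, until stop is set.

    Returns every client process that was started.
    """
    clients = []
    # Event.wait(period) returns True as soon as stop is set, ending the loop.
    while not stop.wait(period):
        running = sum(1 for p in clients if p.poll() is None)
        if should_launch(running):
            clients.append(subprocess.Popen(client_command(launcher, client_args)))
    return clients


def start_autoscaler(launcher, client_args, should_launch, period=1.0):
    """Run autoscale() in a background thread; returns (thread, stop, clients)."""
    stop = threading.Event()
    clients = []
    thread = threading.Thread(
        target=lambda: clients.extend(
            autoscale(launcher, client_args, should_launch, stop, period)),
        daemon=True,
    )
    thread.start()
    return thread, stop, clients
```

Each launch is a separate subprocess, as in --ssh mode, but every one is started through the same launcher prefix, as in --launcher mode.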
There are two scaling policies, fixed and dynamic. In both cases, the cluster has an initial size, a minimum size, and a maximum size. The fixed policy is simple: we launch the initial number of clients, and if or when the minimum size is reached, we add a client.
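Here is a minimal sketch of the fixed policy. It assumes that "reaching the minimum size" means the number of running clients has dropped below min_size, for example after clients exit on their client timeout. The function names are hypothetical. Each replacement adds one client, and the cluster never exceeds max_size.

```python
def fixed_should_launch(running, min_size, max_size):
    """Fixed policy (assumed reading): add a client when below the minimum."""
    return running < min_size and running < max_size


def simulate_fixed(initial, min_size, max_size, exits):
    """Start `initial` clients, then let `exits` clients shut down one at a time.

    Returns (clients still running, total clients ever launched).
    """
    running = launched = initial
    for _ in range(exits):
        running -= 1  # e.g. a client shut down after its idle timeout
        while fixed_should_launch(running, min_size, max_size):
            running += 1
            launched += 1
    return running, launched
```

For instance, simulate_fixed(4, 2, 8, 5) returns (2, 7). The first two exits shrink the cluster from 4 to 2 clients, and each of the remaining three exits triggers a replacement.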
The dynamic policy incorporates a scaling factor, a dimensionless quantity that expresses a multiple of the average task duration in seconds. When the expected time to completion of all currently submitted tasks, given the currently running clients, exceeds this period, a new client is launched.
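The dynamic check can be written as a simple comparison. Let N be the number of unfinished tasks, d the average task duration, and C the total number of tasks the running clients can execute concurrently. Then the expected time to completion is roughly N * d / C, and a client is launched when this exceeds factor * d. The Python sketch below assumes that estimate, a hypothetical tasks_per_client parameter, and that the minimum and maximum sizes still bound the decision. It is an illustration, not HyperShell's actual formula.

```python
def dynamic_should_launch(clients, pending, avg_duration, factor,
                          min_size, max_size, tasks_per_client=1):
    """Dynamic policy sketch: launch when the backlog outlasts factor * avg_duration.

    clients: running clients; pending: submitted tasks not yet finished;
    avg_duration: mean task duration in seconds; tasks_per_client: assumed
    number of tasks each client runs concurrently.
    """
    if clients >= max_size:
        return False  # never grow past the maximum size
    if clients < min_size:
        return True   # always keep at least the minimum size
    if clients == 0:
        return pending > 0
    # Expected seconds to finish the backlog with the current clients.
    expected = pending * avg_duration / (clients * tasks_per_client)
    # Scaling threshold: a multiple of the average task duration.
    return expected > factor * avg_duration
```

Because d appears on both sides, for any positive average duration the test reduces to pending / (clients * tasks_per_client) > factor. In other words, a client is added once there are more than factor unfinished tasks per concurrent slot.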
See the detailed description under --autoscaling in the command-line interface documentation.
New logging style
With the previous release, we expanded the set of attributes available for use within logging messages, such as elapsed time instead of absolute time, and shortened versions of the hostname and module name. We’ve now incorporated these into a new predefined logging style.
To enable this new style, just set it in your configuration.
Configure logging style
hs config set logging.style detailed-compact --user
Proper support for NO_COLOR environment variable
This release adds proper support for the NO_COLOR convention. We previously looked for HYPERSHELL_NO_COLOR; however, it is better for this option not to be specific to this software, and to respect the more general convention instead.
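The NO_COLOR convention (no-color.org) says color is disabled whenever the variable is present and non-empty, regardless of its value. A check following that rule might look like this minimal sketch; the color_enabled name is hypothetical.

```python
import os


def color_enabled(environ=None):
    """Return False when NO_COLOR is set to any non-empty value."""
    env = os.environ if environ is None else environ
    # Per the convention the value does not matter: even "0" disables color.
    return not env.get("NO_COLOR")
```

Passing a mapping explicitly makes the check easy to exercise without touching the real environment.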
Fixes
Issue #18
Incorrect type inferred for task search filters.
When using the task search command with -w, --where filters, the type of the value is inferred ‘smartly’ to make values simple to use. For example, exit_status == null and exit_status == 1 choose a Python None and an integer 1, respectively, instead of their string counterparts.
This works as expected, until you have a field that expects a string and happens to have values that could be coerced to integers. As a minimal example:
Run cluster with integer-like command arguments
seq -w 100 | hs cluster -N2 -t 'echo {}'
The task args are 001, 002, etc. But now you cannot issue a command like the following and get the expected results:
Search for task by integer-like argument
hs task search -w args==001
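The failure can be reproduced with a minimal sketch of 'smart' value inference. The infer_value function below is a hypothetical stand-in for the coercion applied to --where values, not HyperShell's actual parser. It shows how the string 001 becomes the integer 1 and then no longer equals the stored args.

```python
def infer_value(text):
    """Hypothetical 'smart' coercion of a --where filter value."""
    if text == "null":
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # fall back to the raw string


stored_args = "001"                 # task args are stored as text
value = infer_value("001")          # int("001") == 1, so the value becomes 1
print(value, stored_args == value)  # prints: 1 False
```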