Release Notes (v2.4.0)

June 2, 2023

The v2.4.0 release includes major new features along with numerous fixes and improvements.

  • User-defined tags

  • Walltime limits on tasks

  • Client timeouts for automatic shutdown

  • Autoscaling cluster (fixed and dynamic)

  • New logging style

  • NO_COLOR environment variable support


Features


Tags

We’ve added custom tag support for tasks. This feature allows users to organize both individual and groups of tasks in new and interesting ways.

Tags are stored with both a key and value, e.g., site:xy. When specifying one or more tags to the group submit command or the individual task submit command, omitting the value results in it being stored as a blank (all tags are accepted as a plain string, non-numeric). When querying for tasks by tag, omitting the value will match on any task with that tag regardless of value. Including the value will target only tasks who both have the tag and the tag has that particular value.

For example, we can submit two sets of tasks with tags x and y.

Submit group of tasks with custom tag

seq 2 | hs submit -t 'echo {}' --tag x:0

Submit individual tasks with alternate tag

hs task submit -t y:0 -- echo foo
hs task submit -t y:1 -- echo bar

Now we can search for tasks based on these tags.

Search by tag

hs task search --count
Output
4
hs task search --count --tag x
Output
2
hs task search --count --tag x:1
Output
0
hs task search -t y:0
Output
---
          id: befbb239-8d91-42dc-b1bc-2170b61a7a50
     command: echo foo (echo foo)
 exit_status: 0
   submitted: 2023-06-02 10:56:29.847831
   scheduled: 2023-06-02 10:56:44.651948
     started: 2023-06-02 10:56:44.673857 (waited: 0:00:14)
   completed: 2023-06-02 10:56:44.678160 (duration: null)
 submit_host: macbook.local (5b676c02-80b0-4d07-9d2f-07367bbd23d1)
 server_host: macbook.local (a6f8b3fa-f635-428f-bd93-96fbbeebb5b3)
 client_host: macbook.local (a6f8b3fa-f635-428f-bd93-96fbbeebb5b3)
     attempt: 1
     retried: false
     outpath: /Users/me/.hypershell/lib/task/befbb239-8d91-42dc-b1bc-2170b61a7a50.out
     errpath: /Users/me/.hypershell/lib/task/befbb239-8d91-42dc-b1bc-2170b61a7a50.err
 previous_id: null
     next_id: null
        tags: y:0

Updating a tag on an existing task works like other fields except that it is additive. Specifying a new tag does not remove previous tags.

Update tags on existing task

hs task update befbb239-8d91-42dc-b1bc-2170b61a7a50 tag mark:false
hs task search -t y:0
Output
---
          id: befbb239-8d91-42dc-b1bc-2170b61a7a50
     command: echo foo (echo foo)
 exit_status: 0
   submitted: 2023-06-02 10:56:29.847831
   scheduled: 2023-06-02 10:56:44.651948
     started: 2023-06-02 10:56:44.673857 (waited: 0:00:14)
   completed: 2023-06-02 10:56:44.678160 (duration: null)
 submit_host: macbook.local (5b676c02-80b0-4d07-9d2f-07367bbd23d1)
 server_host: macbook.local (a6f8b3fa-f635-428f-bd93-96fbbeebb5b3)
 client_host: macbook.local (a6f8b3fa-f635-428f-bd93-96fbbeebb5b3)
     attempt: 1
     retried: false
     outpath: /Users/me/.hypershell/lib/task/befbb239-8d91-42dc-b1bc-2170b61a7a50.out
     errpath: /Users/me/.hypershell/lib/task/befbb239-8d91-42dc-b1bc-2170b61a7a50.err
 previous_id: null
     next_id: null
        tags: y:0 mark:false

Task timeout

Previously, unless the program being executed has a built-in timeout feature, there was no way to preempt a task. Once a task began execution, we would wait indefinitely for it to complete.

The new task-level timeout feature now provides this functionality. If not specified via configuration file (task.timeout), or environment variable (HYPERSHELL_TASK_TIMEOUT), or command-line argument (-W, --task-timeout), the default behavior is still to wait indefinitely. If given, after the specified number of seconds has elapsed, a signal is sent to the running program.

Each of SIGINT, SIGTERM, and SIGKILL are sent, in an escalating fashion, waiting briefly in between, until the program halts. If the program has still not halted (some programs are pathological), the task executor thread itself will halt.


Client timeout

In anticipation of the new autoscaling feature, we’ve implemented a client-level timeout. Numerous deployment scenarios might have a client launching mechanism on some sort of timer or trigger, but would require a hard-termination of the client instances. Previously we implemented a robust mechanism for recovering tasks from evicted clients; however, this is not a graceful or preferred means to intentionally scale down. It interrupts tasks, and requires a waiting period for the server to evict.

Instead, now you can specify a client-level timeout via configuration (client.timeout), environment variable (HYPERSHELL_CLIENT_TIMEOUT), or command-line option (-T, --timeout). If not given, the client will persist indefinitely by default as before. If given, the client will shutdown gracefully and send a disconnect signal to the server after the specified period in seconds has elapsed without any new task bundles arriving.


Autoscaling

We have added another mode to the hs cluster enabled with the --autoscaling option.

This mode combines some behavioral ideas of all three of the previous modes. The use of --launcher previously implied a single subprocess responsible for bringing up all client instances (like an mpirun). This is in contrast to the --ssh mode that brought up a distinct subprocess for each of the included hosts. The --autoscaling mode incorporates a new local thread that dynamically brings up new clients with the --launcher as a prefix.

There are two scaling policies, fixed and dynamic. In both cases, there is an initial size, minimum size, and maximum size for the cluster. The fixed policy is pretty simple. We launch the initial number of clients, and if or when the minimum size is reached, we add a client.

The dynamic policy incorporates a scaling factor, a dimensionless quantity that expresses some multiple of the average task duration in seconds. When the expected time to completion of all currently submitted tasks given currently running clients exceeds this period a new client will be launched.

See the detailed description under --autoscaling for the command-line interface.


New logging style

With the previous release we expanded the set of attributes available for use within logging messages, like elapsed time instead of absolute time, and shortened version of the hostname and module. We’ve now incorporated these in a new predefined logging style.

To enable this new style, just set it in your configuration.

Configure logging style

hs config set logging.style detailed-compact --user

Proper support for NO_COLOR environment variable

This release adds proper support of the NO_COLOR convention. We previously looked for HYPERSHELL_NO_COLOR, however it is better that this option not actually be specific to this software and respect the more general configuration.



Fixes


Issue #18

Incorrect type inferred for task search filters.

When using the task search command with -w, --where filters, the type of the value is inferred ‘smartly’ to make it simple to use values, like exit_status == null vs exit_status == 1 choosing a Python None and an integer 1 instead of their string counterparts.

This works as expected. Until you have a field that expects a string and it happens to have values that could be coerced to integers. As a minimal example:

Run cluster with integer-like command arguments

seq -w 100 | hs cluster -N2 -t 'echo {}'

The task args are 001, 002, etc. But now you cannot issue a command like the following and achieve expected results,

Search for task by integer-like argument

hs task search -w args==001