.. _20240518_2_5_0_release:

Release Notes (v2.5.0)
======================

`May 18, 2024`

The v2.5.0 release includes major new features along with numerous fixes and improvements.

- New command-line program name.
- Task update command capabilities.
- Shell completions.
- UNIX signal interrupts for graceful shutdowns.
- Configuration management improvements.
- Tagging system improvements.

-----

Features
--------

|

What's in a name?
^^^^^^^^^^^^^^^^^

The original ``hyper-shell`` name is still valid and will remain available for backwards
compatibility. But going forward we are dropping the hyphen *everywhere*, preferring simply
*hypershell* in writing and ``hs`` at the command-line.

But why? At the outset of the project we liked the hyphen for the program name at the
command-line; it was aesthetic and distinct. But the hyphen cannot go everywhere we need it to.
It cannot go in environment variables, it cannot go in the package name, and it feels out of
place on the file system. It remains a point of confusion for users who are unsure how to refer
to the project, so we're deprecating the hyphen *everywhere*.

Furthermore, it has been our experience that ``hyper-shell`` is simply a lot of characters to
type when developing workflows interactively. It has the benefit of uniqueness on any given
system, and it remains good practice to use in scripts in the same way long-form options are.
Conversely, just as it is more ergonomic to use short-form, single-letter options at the
command-line (even stacked), so too is it more ergonomic to invoke programs with fewer
characters.

|

New capabilities for task management
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, the ``hs task update`` command would only allow you to specify a single task UUID,
a field to change, and the new value. In Issue #27 we considered the much-needed capability to
*cancel* a task. But what about other sorts of actions (singular and bulk), such as *deletion*,
*reversion*, or even updating tags en masse?

Instead of creating many additional subcommands, we have extended the original update command to
allow arbitrarily many field and tag modifications in a single call. We have also added
``--cancel``, ``--revert``, and ``--delete`` as special cases that apply managed changes.
We've ported over the same machinery that underlies the ``hs task search`` command to allow
updates to target tasks based on search criteria. In the following example, we modify the
`exit_status` of all tasks whose arguments end with a particular file pattern.

.. admonition:: Update fields and tags with positional arguments
    :class: note

    .. code-block:: shell

        hs task update exit_status=-10 -w 'args ~ .20230627.gz$'

These sorts of updates are sent to the database in one transaction. We can also apply updates
based on relative position using ``--order-by`` and ``--limit``; these changes, however, require
us to pull task IDs and apply changes iteratively. In the following example, we apply a special
tag to the most recently submitted task.

.. admonition:: Apply tag to most recently submitted task
    :class: note

    .. code-block:: shell

        hs task update mark:false --order-by submit_time --desc --limit 1

Though not shown in these snippets, the update command first submits a *count* query to check
how many tasks are affected. It will prompt the user with this information and ask for
confirmation before applying the change. To force the update to proceed without user
intervention, use ``--no-confirm``. In the following example, we cancel all tasks that meet our
criteria.

.. admonition:: Cancel remaining tasks without confirmation
    :class: note

    .. code-block:: shell

        hs task update --cancel --remaining --no-confirm

The ``--remaining``, ``--completed``, ``--failed``, and ``--succeeded`` options expand in the
same way as with ``hs task search``, based on the tasks' `exit_status`. The ``--cancel`` option
implies setting `schedule_time` to now and `exit_status` to -1; the `schedule_time` remains
`null` until the scheduler thread on the server selects the task, so setting it to now means the
task cannot be selected. The ``--revert`` option applies the same field changes as when the
server starts and identifies orphaned tasks: the task retains its ID, submission details, and
tags, as if it had never been scheduled. The ``--delete`` option physically removes task records
from the database.
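The same pattern extends to the other managed actions. As a sketch (assuming the status filters
compose with ``--revert`` just as they do with ``--cancel`` above), the following would revert
all failed tasks so they are scheduled again from scratch.

.. admonition:: Revert all failed tasks
    :class: note

    .. code-block:: shell

        # Prompts with the affected count first; add --no-confirm to skip the prompt
        hs task update --revert --failed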
|

Shell completions!
^^^^^^^^^^^^^^^^^^

If you use `BASH` or `ZSH` you can now autocomplete subcommands, options, and arguments!
(Sorry, `CSH`/`TCSH` and `PowerShell` users, nothing for these yet). We'll try to convey some of
the cool completions here, but you'll have to see for yourself. At the command-line, press
``TAB`` (once, or twice depending on your shell) to trigger completions.

**Configuration:** When using ``hs config get`` or ``hs config set``, not only do you get
standard, static option completion; the positional arguments are completed with the application
parameters and valid options. The shell completion function introspects your current
configuration and offers these. Further, when setting values, some options are pre-populated
with valid enumerations (e.g., ``logging.level``). Notably, ``console.theme`` completes with all
of the valid theme names.

**Search:** When invoking ``hs task search`` or ``hs task update``, the positional arguments
represent task fields, which are completed for you. Beyond this, when filtering on tags with
``-t`` or ``--with-tag``, it first completes with all valid, existing, distinct tag `keys`.
If you follow that `key` with a ``:`` character, it completes with all existing, distinct
`values` for that particular key. This all runs on the database side and, unless you have a
database with ~10M+ records, should complete in one second or less.

.. note::

    This feature is so useful, you might be interested to poll the database for this information
    directly using one of two new options for search: ``hs task search --tag-keys`` or
    ``hs task search --tag-values <key>``.

**Server and Client:** When invoking the server and client programs there are additional smart
completions. For the client, when completing the ``--host`` option, we parse your known hosts
(``~/.ssh/known_hosts`` and ``/etc/hosts``) and offer them. This is particularly useful in a
Linux cluster environment. For the server, create an ad-hoc `authkey` with ``-k`` by
tab-completing a 16-digit key generated as a checksum from ``/dev/urandom``.

.. note::

    The completion definition file must be installed to the correct location on your system or
    sourced in your login profile in order for completions to be enabled.

|

UNIX signal interrupts
^^^^^^^^^^^^^^^^^^^^^^

`HyperShell` has the capacity to heal from clients going missing; we've had *heartbeats*
implemented for a long time. The client *timeout* feature allows dynamic clusters to
automatically scale down when task pressure is low. Until now, however, we did not have the
ability to *choose* to scale down because of external factors. An example of this in the context
of typical HPC environments is the finite lifetime of job allocations.

Imagine the database and server running externally in a persistent fashion and clients popping
into existence on a cluster (using a scheduler like Slurm). In this environment, jobs can run up
against their walltime limit in a matter of hours depending on the configuration. This would be
a known scenario, and it would be an unfortunate waste of resources to allow tasks to begin
execution knowing the client will be unceremoniously killed by the scheduler, causing the
eviction process to unfold and the orphaned tasks to be reverted and rescheduled. Wouldn't it be
nice if you had some kind of hook into the system that would send your program a signal that it
is nearing a cliff and should drain tasks and shut down as soon as possible? Thankfully, most
modern HPC schedulers do indeed offer this feature. And now we have added a signal handling
facility to `HyperShell`.

The ``SIGUSR1`` and ``SIGUSR2`` signals are intended for application developers to program
against as fixed, recognized signals. We now use them for both the client and server to indicate
a less catastrophic escalation of shutdown requests. Sending the ``SIGUSR1`` signal will trigger
the schedulers to halt and begin shutdown procedures. On the client side, this means that all
current tasks (and any in the local queue) will be allowed to complete, but the system will
drain and shut down at the completion of these tasks. Sending the ``SIGUSR2`` signal implies the
same, but on the client side will set a flag to send local interrupts to tasks to come down
faster. As described in the previous release with regard to the ``task.timeout`` feature, we
send ``SIGINT``, ``SIGTERM``, and ``SIGKILL`` in an escalating fashion to halt running tasks.

With regard to signals, we have also added a user-configurable parameter, ``-S``,
``--signalwait`` (or ``task.signalwait``, 10 seconds by default). This is the period in seconds
the client will wait between signal escalations when halting a task.
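As an illustration of how this can be wired up under Slurm, consider the following sketch of a
batch script (the host name and timings are placeholder assumptions): Slurm's ``--signal``
directive delivers ``SIGUSR1`` to the batch shell ahead of the walltime limit, and the shell
forwards it to the client.

.. admonition:: Drain a client ahead of the walltime limit (Slurm)
    :class: note

    .. code-block:: shell

        #!/bin/bash
        #SBATCH --time=04:00:00
        #SBATCH --signal=B:USR1@600  # SIGUSR1 to the batch shell 10 minutes before walltime

        # Run the client in the background so the batch shell can forward the signal
        hs client --host server.example.com &
        CLIENT=$!

        # On SIGUSR1, ask the client to finish current tasks, drain, and shut down
        trap 'kill -USR1 "$CLIENT"' USR1

        wait "$CLIENT"
        if [ $? -gt 128 ]; then
            wait "$CLIENT"  # first wait was interrupted by the trap; wait for the drain
        fi

Sending ``USR2`` instead would additionally interrupt running tasks with the escalating
``SIGINT``/``SIGTERM``/``SIGKILL`` sequence, spaced by ``task.signalwait``.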
|

Configuration management improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

At the command-line, the ``hs config`` commands allow the use of ``--system`` or ``--user`` as
an option to target either of these locations. We've now added ``--local`` to all of the
commands and ``--default`` on the ``get`` command.

The ``hs config which`` command now provides much richer output, showing not only the site from
which an option takes precedence but also a comparison to the default value, with improved
presentation. A new ``--site`` option limits output to *only* the site information
(e.g., ``system``, ``user``, ``local``, ``env``, ``default``).

.. admonition:: Query for site of configuration parameter
    :class: note

    .. code-block:: shell

        hs config which logging.level

    .. details:: Output

        .. code-block:: none

            debug (user: /home/user/.hypershell/config.toml | default: warning)

    .. code-block:: shell

        hs config which logging.level --site

    .. details:: Output

        .. code-block:: none

            user

Further, we've added a new ``HYPERSHELL_CONFIG_FILE`` environment variable. When set, it
disables the `system`, `user`, and `local` configuration files in favor of only the named file.
Setting this variable as empty results in only environment variables being considered. This can
be useful in situations where many instances of the program need to coexist on the same system
and incidental modification of the user-level configuration file might break jobs.
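For example (a sketch; the file path is a hypothetical placeholder), a batch job might pin its
configuration to a dedicated file so that edits to the user-level configuration cannot affect it.

.. admonition:: Isolate configuration for a single job
    :class: note

    .. code-block:: shell

        # Only the named file is consulted; system, user, and local files are ignored
        export HYPERSHELL_CONFIG_FILE=/tmp/my-job/hypershell.toml

        # Or, set the variable empty to consider environment variables alone
        export HYPERSHELL_CONFIG_FILE=""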
|

Tagging system improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, all tag `values` were considered text. We have modified the encoding to understand
and store any valid JSON value type. Note, however, that some limitations apply and special
handling has been implemented where possible; e.g., SQLite considers `true` and `false` to be
synonymous with `1` and `0`, respectively.

Previously, all task metadata was injected into tasks' environment variables
(e.g., ``TASK_SUBMIT_HOST``). Tag data, however, was specifically stripped because its more
complex JSON was not amenable to simple encoding. We now deal with it directly and re-inject
tags with a ``TASK_TAG_`` prefix. This means tag data is available at runtime; a task submitted
with ``--tag site:b`` would have ``TASK_TAG_SITE=b`` defined at runtime.

|

-----

Fixes
-----

|

Tags not duplicated on task retry (#26)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When a task fails and ``-r``, ``--max-retries`` is used to automatically re-submit it, tags were
not properly replicated. This was a simple omission in the code that duplicates the task
metadata. This fix will likely be applied as a patch release in `v2.4.1` as well.

|

IntegrityError: duplicate violates unique constraint (#29)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

At high throughput, with many clients connecting and a database involved, the server may decide
to evict a client because the eviction policy is set too low (seconds); the client will be
evicted and its tasks reverted. But if the client in question was only delayed (whether due to
the network, throughput, or the size of the database slowing down operations), it will collide
with the existing database record when the server re-registers the client upon the next
heartbeat message. In other words, if the server ever claims too aggressively that a client
should be evicted when it is not actually gone, there was no protection mechanism to avoid the
collision. We fix this by allowing re-registration, though you should avoid this scenario.