Release Notes (v2.6.0)

November 15, 2024

The v2.6.0 release includes performance improvements and new features.

  • Allow --delete with --limit for task update command

  • Add inline tag assignment for task submission

  • Performance improvements and telemetry

  • Update --version behavior

  • Update logging message behavior


Features


Allow --delete with --limit on task update

In the v2.5.0 release we added new task update capabilities, including relative targeting similar to how we can with the hs task search command, such as with --order-by [--desc] and --limit. With these options you can not only query for but apply updates to groups of tasks.

A common use-case for this would be applying changes to the most recently submitted tasks. This was disallowed for the --delete operation however. In this release, we add the delete_all() method to the library interface and now allow access to this from the program interface.

Delete most recent task submitted

hs task update --delete --order-by submit_time --desc --limit 1
Output
INFO [hypershell.task] Searching database: sqlite (/tmp/task.db)
Update affects 1 tasks, continue? yes/[no]: yes
...
INFO [hypershell.task] Deleted 1 tasks

Users should keep in mind as with all uses of relative targeting with --limit on updates pulls in all the task IDs first and then iteratively applies updates, instead of applying the change entirely within the database in a single UPDATE transaction.


Inline tag assignments

User-defined tags have been a feature within HyperShell for a long time. However, tags are always applied to an entire collection of tasks together. For example, with 10k tasks during hs submit, any tag assigment from the command-line program interface will be applied to all 10k.

The only way to manifest individualized tag assignments, such as a secondary task ID (e.g., my_id:123) unique to each task in the input, was to either use the library API manually from within Python, or to submit each task one at a time. Neither of these scenarios is favorable. While the library interface is a first-class citizen it is meant for developers, not end-users. Using the command-line program interface with separate invocations for each task is horribly inefficient and partially abandons the whole benefit of the hs submit command.

In v2.6.0 we add a completely new inline tag assignment interface using shell comments in the input task file. Similar to some other schedulers, adding #HYPERSHELL as a trailing comment to each task line is interpreted using the same tag syntax as the command-line.

Input task file (as in hs submit TASKFILE)

/some/program --conf=cfg/001.yaml --verbose  #HYPERSHELL  other_id:1  group:5
/some/program --conf=cfg/002.yaml --verbose  #HYPERSHELL  other_id:2  group:5
/some/program --conf=cfg/003.yaml --verbose  #HYPERSHELL  other_id:3  group:5
/some/program --conf=cfg/004.yaml --verbose  #HYPERSHELL  other_id:4  group:5
/some/program --conf=cfg/005.yaml --verbose  #HYPERSHELL  other_id:5  group:5
...

Performance improvements and telemetry


Database Performance

Since the very first v2 release of HyperShell we’ve automatically included the following database indices (both SQLite and PostgreSQL):

  • For new tasks: task.schedule_time

  • For retries: task.exit_status, task.retried

  • For client rescheduling: task.client_id, task.completion_time

In our performance testing it seems to be the case that in normal operations (except in pathological cases) the last index on client tasks is rarely used. This index would help to make the client eviction and task rescheduling more efficient, but as a rare event it is more harmful than helpful in normal high-throughput operations. In the v2.6.0 release this index is not created; in the extreme scale case, net throughput of tasks by the server is improved ~10-15% by eliminating this index.

Also related to database operations, a number of fields are now stored as SMALLINT when using PostgreSQL as a backend (specifically, exit_status and attempt).


Testing and Telemetry

As a multithreaded application with many agents, it is helpful for developers to force delays (random delays in particular) to the state machine transitions within each thread to both slow things down but also to test timing bugs between components. We now include commented out sections within hypershell.core.fsm (lines labeled: # FUZZ:). Uncomment these sections of code to add fuzzing to the program.

Similarly, it is helpful to measure the accumulated time spent in each area of the program. Uncomment the lines labeled # PERF: to enable telemetry collection on the time spent in each state for all threads in the program. This information is emitted as TRACE level messages at the end of the program.



Changes


Debugging message for client task launch

Previously, we logged all three INFO, DEBUG, and TRACE messages on client-side task start. They included different levels of information; DEBUG messages included the actually full task command-line with the task ID and TRACE included the process ID and original command input args. The problem is in order to elevate part of this information (e.g., PID) one would enable TRACE which is actually meant for higher resolution messages on polling/waiting behavior (a lot of messages). Instead, now we include the PID in the DEBUG message and have removed TRACE messages for client-side task start.


Extra details in --version output

Previously, the output of --version only included the semantic version triplet. Some applications will also include more detailed information related to their components and runtime. This seems helpful, so we’ve added the Python runtime version in the output to HyperShell in v2.6.0.