.. _getting_started:
Getting Started
===============
Installation
------------
`HyperShell` should be isolated within its own virtual environment
and only expose the top-level entry point *script* on your `PATH`.
The well-known `pipx `_ utility handles all
of this nicely for unprivileged users installing for themselves.
See the :ref:`installation ` guide for more options
and additional notes and recommendations.
.. tab:: pipx
.. code-block:: shell
pipx install https://github.com/glentner/hypershell/archive/refs/tags/2.6.0.tar.gz
.. tab:: uv
.. code-block:: shell
uv tool install https://github.com/glentner/hypershell/archive/refs/tags/2.6.0.tar.gz
.. tab:: homebrew
.. code-block:: shell
brew tap glentner/tap
brew install hypershell
.. warning::
The `HyperShell` project has transitioned away from using the hyphen in any
context (command-line, filesystem, variables, online documentation, etc.).
Because of a temporary naming issue with the Python Package Index (pypi.org, used by pip),
we have not yet secured the unhyphenated ``hypershell`` name on the index.
Until then, install the package under its old name or directly from GitHub.
-------------------
Features
--------
|
**Simple, Scalable**
Take a listing of shell commands and process them in parallel.
In this example, we use the ``-t`` option to specify a template for the input arguments
which are not fully formed shell commands. Larger workloads will want to use a database
for managing tasks and scheduling. Without a configured database, the program
manages tasks entirely in memory.
.. admonition:: Hello World
:class: note
.. code-block:: shell
seq 4 | hs cluster -t 'echo {}'
.. details:: Output
.. code-block:: none
WARNING [hypershell.server] No database configured - automatically disabled
1
2
3
4
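As a rough mental model of what the template does, the following plain-shell sketch expands ``{}`` for each input line and runs the resulting commands one at a time. It is a serial stand-in for illustration only, not how `HyperShell` is implemented; the helper names ``expand`` and ``run_serial`` are hypothetical, and the substitution is naive (it would break on arguments containing ``|`` or ``&``).

```shell
template='echo {}'

# Substitute the incoming argument line for every {} in the template.
expand() {
    printf '%s\n' "$template" | sed "s|{}|$1|g"
}

# Read argument lines from stdin and run each expanded command in turn.
run_serial() {
    while IFS= read -r args; do
        sh -c "$(expand "$args")"
    done
}

seq 4 | run_serial
```

The parallel program does the same expansion but dispatches the commands to concurrent workers, so output order is not guaranteed to match input order.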
|
Scale out to remote servers with SSH and even define *groups* in your configuration file.
By default, all command `stdout` and `stderr` are joined and written out directly.
Capture individual task `stdout` and `stderr` with ``--capture``.
Set the :ref:`logging ` level to ``INFO`` to see each task start or ``DEBUG`` to
see additional detail about what is running, where, and when.
.. admonition:: Distributed Cluster over SSH
:class: note
.. code-block:: shell
hs cluster tasks.in -N16 --ssh-group=xyz --capture
.. details:: Logs
.. code-block:: none
2022-03-14 12:29:19.659 a00.cluster.xyz INFO [hypershell.client] Running task (5fb74a31-fc38-4535-8b45-c19bc3dbedee)
2022-03-14 12:29:19.665 a01.cluster.xyz INFO [hypershell.client] Running task (c1d32c32-3e76-48e0-b2c3-9420ea20b41b)
2022-03-14 12:29:19.668 a02.cluster.xyz INFO [hypershell.client] Running task (4a6e19ec-d325-468f-a55b-03a797eb51d5)
2022-03-14 12:29:19.671 a03.cluster.xyz INFO [hypershell.client] Running task (09587f55-4b50-4e2b-a528-55c60667b62a)
2022-03-14 12:29:19.674 a04.cluster.xyz INFO [hypershell.client] Running task (1336f778-c9ab-4111-810e-229d572be62e)
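A task file such as ``tasks.in`` can be generated with ordinary shell tools. The sketch below assumes the file holds one command per line; the program name ``./some_program.py`` and the ``data/input_NN.dat`` paths are placeholders rather than anything `HyperShell` requires.

```shell
# Start with an empty task file.
: > tasks.in

# Append 20 commands, one per line, with zero-padded input names.
i=1
while [ "$i" -le 20 ]; do
    printf './some_program.py data/input_%02d.dat\n' "$i" >> tasks.in
    i=$((i + 1))
done

# Quick sanity check of what was written.
wc -l < tasks.in
head -n 2 tasks.in
```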
|
Use the provided launcher on HPC clusters to bring up workers within your job allocation.
Specify which program to use with the ``--launcher`` option. Achieve higher throughput by
aggregating tasks in bundles with ``-b``, ``--bundlesize``. Add a database configuration to
allow for retries with ``-r``, ``--max-retries``. A negative value for ``--delay-start``
causes each remote client to sleep for a random interval, in seconds, up to the magnitude of that value. In this
example we stagger the launch process over one minute.
.. admonition:: Distributed Cluster over Slurm
:class: note
.. code-block:: shell
hs cluster tasks.in -N128 -b128 --launcher=srun --max-retries=2 --delay-start=-60 >task.out
.. details:: Logs
.. code-block:: none
2022-03-14 12:29:19.659 a00.cluster.xyz INFO [hypershell.client] Running task (5fb74a31-fc38-4535-8b45-c19bc3dbedee)
2022-03-14 12:29:19.665 a01.cluster.xyz INFO [hypershell.client] Running task (c1d32c32-3e76-48e0-b2c3-9420ea20b41b)
2022-03-14 12:29:19.668 a02.cluster.xyz INFO [hypershell.client] Running task (4a6e19ec-d325-468f-a55b-03a797eb51d5)
2022-03-14 12:29:19.671 a03.cluster.xyz INFO [hypershell.client] Running task (09587f55-4b50-4e2b-a528-55c60667b62a)
2022-03-14 12:29:19.674 a04.cluster.xyz INFO [hypershell.client] Running task (1336f778-c9ab-4111-810e-229d572be62e)
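To make the staggered start concrete, the sketch below picks a random whole-second delay below a chosen maximum in plain shell. It only illustrates the idea; how `HyperShell` itself draws the interval is not specified here, and ``max_delay`` is a hypothetical variable name.

```shell
max_delay=60

# Pick a whole number of seconds in the range [0, max_delay).
delay=$(awk -v max="$max_delay" 'BEGIN { srand(); print int(rand() * max) }')

echo "would sleep ${delay}s before starting"
# sleep "$delay"   # uncomment to actually wait
```

Spreading start times this way keeps many workers from connecting at the same instant.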
|
**Flexible**
One of several novel features of `HyperShell` is the ability to independently
stand up the *server* on one machine and then connect to that server using a *client* from
a different environment.
Start the server with a bind address of ``0.0.0.0`` to allow remote connections.
The server schedules tasks on a distributed queue. It is recommended that you protect your instance
with a private *key* (``-k/--auth``).
.. admonition:: Server
:class: note
.. code-block:: shell
hs server --forever --bind '0.0.0.0' --auth ''
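One way to obtain a key is to generate a random string with standard tools, as in the sketch below. This is only a suggestion, assuming any sufficiently random string is acceptable as a key; the variable name ``key`` is hypothetical.

```shell
# Read 16 random bytes and render them as 32 lowercase hex characters.
key=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')

# Print only the length so the key itself stays out of logs.
echo "generated a key of ${#key} characters"

# The value could then be passed to the server, for example:
#   hs server --forever --bind '0.0.0.0' --auth "$key"
```

The same value must be supplied to every client that connects.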
Connect to the running server from a different host (even from a different platform, e.g., Windows).
You can connect with any number of clients from any number of hosts. The separate client connections
will each pull tasks off the queue asynchronously, balancing the load.
.. admonition:: Client
:class: note
.. code-block:: shell
hs client --host '' --auth '' --capture
|
**Dynamic**
Individual task metadata is exposed to tasks as environment variables. For example, ``TASK_ID`` provides
the UUID for the task, and ``TASK_SUBMIT_TIME`` records the date and time the task was submitted.
Any environment variable defined with the ``HYPERSHELL_EXPORT_`` prefix will be injected into
the environment of each task, *sans prefix*.
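The prefix rule can be pictured with plain shell tools. The sketch below emulates the stripping step with ``sed`` and shows how a task command might read the result; it is not `HyperShell` code, ``DATA_DIR`` is a hypothetical variable name, and the ``:-unset`` fallbacks are there only so the snippet runs outside of a task.

```shell
# Define a variable with the export prefix in the submitting environment.
export HYPERSHELL_EXPORT_DATA_DIR=/tmp/data

# Emulate the stripping: list prefixed variables without the prefix.
stripped_env() {
    env | sed -n 's/^HYPERSHELL_EXPORT_//p'
}
stripped_env

# Inside a task, the variables would be read like any others.
echo "task=${TASK_ID:-unset} data_dir=${DATA_DIR:-unset}"
```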
Use ``-t`` (short for ``--template``) to expand a template; ``{}`` can be used to insert the incoming
task arguments (alternatively, use ``TASK_ARGS``). Be sure to use single quotes to delay the variable
expansion. Many meta-patterns are supported (see full overview of :ref:`templates `):
* File operations (e.g., the basename ``'{/}'``)
* Slicing on whitespace (e.g., first ``'{[0]}'``, first three ``'{[:3]}'``, every other ``'{[::2]}'``)
* Sub-commands (e.g., ``'{% dirname @ %}'``)
* Lambda expressions in *x* (e.g., ``'{= x + 1 =}'``)
.. admonition:: Templates
:class: note
.. code-block:: shell
hs cluster tasks.in -N12 -t './some_program.py {} >outputs/{/-}.out'
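For readers who think in shell, the meta-patterns have rough plain-shell analogues, sketched below. The correspondences are approximate and meant only as intuition: the sample ``args`` value is made up, ``@`` is assumed to stand for the task arguments, and the lambda line assumes the argument is an integer.

```shell
args='data/raw/run_01.csv alpha beta gamma'
set -- $args                      # split the argument line on whitespace

first=$1                          # like '{[0]}'
first_three="$1 $2 $3"            # like '{[:3]}'

# like '{[::2]}': keep fields 1, 3, 5, ...
every_other=$(printf '%s\n' "$args" |
    awk '{ out = $1; for (i = 3; i <= NF; i += 2) out = out " " $i; print out }')

base=$(basename "$1")             # like '{/}' applied to a path
dir=$(dirname "$1")               # like '{% dirname @ %}'

x=41
next=$((x + 1))                   # like '{= x + 1 =}'

printf '%s\n' "$first" "$first_three" "$every_other" "$base" "$dir" "$next"
```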
In fact, capturing `stdout` and `stderr` is supported directly with the ``--capture`` option.
See the full documentation for environment variables under :ref:`configuration `.
Add arbitrary tags to one or whole collections of tasks to track additional context.
.. admonition:: Include user-defined tags
:class: note
.. code-block:: shell
hs submit tasks.in --tag prod instr:B12 site:us-west-1 batch:12
.. details:: Logs
.. code-block:: none
INFO [hypershell.submit] Submitted 20 tasks
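The tags in this example mix a bare name (``prod``) with ``key:value`` pairs. The sketch below splits such strings on the first colon in plain shell, purely to show their shape; the function name ``split_tags`` is hypothetical, and printing a bare tag with an empty value is an illustrative choice, not a statement about how `HyperShell` stores it.

```shell
# Print each tag as key=value, splitting on the first colon if present.
split_tags() {
    for tag in "$@"; do
        case "$tag" in
            *:*) key=${tag%%:*}; value=${tag#*:} ;;
            *)   key=$tag; value= ;;
        esac
        printf '%s=%s\n' "$key" "$value"
    done
}

split_tags prod instr:B12 site:us-west-1 batch:12
```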
|