What is HyperShell?

Release v2.6.0 (Getting Started)


HyperShell is an elegant, cross-platform, high-throughput computing utility for processing shell commands over a distributed, asynchronous queue. It is a highly scalable workflow automation tool for many-task scenarios.

Built on Python and tested on Linux, macOS, and Windows.

Several tools offer similar functionality, but none brings all of it together in a single tool with the user ergonomics we provide. Novel design elements include but are not limited to:

  • Cross-platform: run on any platform where Python runs. In fact, the server and client can run on different platforms in the same cluster.

  • Client-server: workloads do not need to be monolithic. Run the server as a stand-alone service with SQLite or Postgres as a persistent database and dynamically scale clients as needed.

  • Staggered launch: at the largest scales (1000s of nodes, 100k+ workers), the launch process can be challenging. Clients can come up gradually to balance the workload.

  • Database in-the-loop: run in-memory for quick, ad-hoc workloads. Otherwise, include a database for persistence, recovery when restarting, and search.
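
The idea behind a staggered launch can be illustrated with a few lines of Python. This is only a sketch under stated assumptions and not HyperShell's own mechanism: the function name stagger_delays and the choice of a uniform random delay within a fixed window are hypothetical, chosen to show how start times might be spread out so that many workers do not all connect at the same instant.

```python
import random


def stagger_delays(workers: int, window: float, seed=None) -> list:
    """Return one start delay (seconds) per worker, drawn uniformly from [0, window].

    Hypothetical helper for illustration only; not part of HyperShell.
    """
    rng = random.Random(seed)  # a private generator keeps results reproducible with a seed
    return [rng.uniform(0.0, window) for _ in range(workers)]


# Spread 8 workers over a 30-second launch window.
delays = stagger_delays(workers=8, window=30.0, seed=42)
for rank, delay in enumerate(delays):
    print(f'worker {rank}: start after {delay:.1f}s')
```

Each worker would sleep for its delay before connecting, so the load on the server ramps up over the window instead of arriving all at once.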


Usage


HyperShell is primarily a command-line program. Most users will run hs cluster in a start-to-finish workflow, much as they would with alternatives like xargs, GNU Parallel, or HPC-specific tools like ParaFly, TaskFarmer (NERSC-only), or Launcher (TACC).

Basic usage

seq 1000000 | hs cluster -t 'echo {}' -N64 --ssh 'a[00-32].cluster' > task.out
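
For readers new to this style of tool, the pattern in the command above (input lines substituted into a '{}' template and executed by a pool of workers) can be sketched on a single machine with the Python standard library. This is a conceptual illustration only; it does not use HyperShell's library interface, and the names expand, run_task, and run_all are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def expand(template: str, arg: str) -> str:
    """Substitute the input line for the '{}' placeholder in the template."""
    return template.replace('{}', arg)


def run_task(command: str) -> str:
    """Run one shell command and return its captured standard output."""
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, check=True)
    return result.stdout


def run_all(lines, template: str, workers: int = 4) -> list:
    """Expand each input line into a command and run them on a pool of workers."""
    commands = [expand(template, line.strip()) for line in lines]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, even if tasks finish out of order
        return list(pool.map(run_task, commands))


if __name__ == '__main__':
    outputs = run_all((str(i) for i in range(1, 11)), 'echo {}', workers=4)
    print(''.join(outputs), end='')
```

A thread pool is enough here because each task spends its time waiting on a child process. What this sketch leaves out (distribution across hosts, persistence, and recovery) is what the rest of this page describes.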

See getting started for features and additional usage examples. Specific documentation is available for configuration management, database setup, logging, and using templates.

The HyperShell server can operate in standalone mode alongside the database. Zero or more client instances may come and go as available and process tasks. When deployed in this fashion, the cluster can scale out as necessary and scale down to zero. This strategy is appropriate for creating shared, autoscaling, high-throughput pipelines for facilities with multiple users.

HyperShell also provides a library interface for Python applications to embed components. Developers can add HyperShell to their project to provide all of this functionality within their own applications or Python-based workflows.


Support


Join the Discord server to post questions, discuss your project, share with the community, and keep in touch with announcements and upcoming events!

HyperShell is an open-source project developed on GitHub. If you find bugs or issues with the software, please create an Issue. Contributions are welcome in the form of pull requests for bug fixes, documentation, and minor feature improvements.


License


HyperShell is released under the Apache Software License (v2).

Copyright 2019-2024 Geoffrey Lentner.

This program is free software: you can redistribute it and/or modify it under the terms of the Apache License (v2.0) as published by the Apache Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Apache License for more details.

You should have received a copy of the Apache License along with this program.


Citation


If this software has helped facilitate your research, please consider citing us.

@inproceedings{lentner_2022,
    author = {Lentner, Geoffrey and Gorenstein, Lev},
    title = {HyperShell v2: Distributed Task Execution for HPC},
    year = {2022},
    isbn = {9781450391610},
    publisher = {Association for Computing Machinery},
    url = {https://doi.org/10.1145/3491418.3535138},
    doi = {10.1145/3491418.3535138},
    booktitle = {Practice and Experience in Advanced Research Computing},
    articleno = {80},
    numpages = {3},
    series = {PEARC '22}
}