.. _security: Security ======== | Overview -------- *HyperShell* runs arbitrary shell commands on every machine that connects to its task queue. A *HyperShell* deployment is therefore only ever as secure as the network it operates on and the authentication material that gates it. This document describes the security architecture, the cryptographic protections built into the queue transport, and the recommended deployment patterns for trusted networks, public-internet exposure, and Kubernetes. The two security primitives users need to understand are: * a shared **authentication key** that both server and clients must hold to participate in the queue, and * an **end-to-end TLS layer** that protects every byte of queue traffic on the wire. Both are enabled by default and configured so that the most common deployment scenario — a single-host ``LocalCluster`` — works out of the box with no operator action required. ------------------- Architecture ------------ | Components ^^^^^^^^^^ *HyperShell* is built on Python's :class:`multiprocessing.managers.BaseManager` and :class:`multiprocessing.JoinableQueue` infrastructure. The runtime decomposes into three roles: * **Submit** ingests shell commands from ``stdin`` or files and writes them to the database (SQLite or PostgreSQL) with stable UUIDs. * **Server** pulls pending tasks from the database, bundles them, and exposes four ``JoinableQueue`` instances to clients via RPC over TCP: * ``scheduled`` — task bundles awaiting execution * ``completed`` — results returned from clients * ``heartbeat`` — periodic liveness reports from clients * ``confirmed`` — explicit acknowledgement of received bundles * **Client** connects to the server, pulls bundles from the ``scheduled`` queue, spawns shell subprocesses, and returns results on the ``completed`` queue. The ``cluster`` command (``hsx``) composes the three roles in a single process group for local-only or SSH-distributed workflows. | Threat Model ^^^^^^^^^^^^ Because tasks are arbitrary shell commands, the queue must be treated as a privileged channel: 1. **Arbitrary code execution.** Any party able to enqueue a task can run it on every connected client. Connection authentication is therefore the primary security control, not an optional convenience. 2. **Data exposure.** Task definitions, exit status, timing, and heartbeat data travel through the queue. Task ``stdout`` and ``stderr`` do *not* — they are written to disk on the client (locally, or on a shared filesystem if one is configured) and retrieved on demand over SFTP via :class:`~hypershell.cluster.ssh.SSHConnection` when the operator runs ``hs task info --stdout`` or similar. Without transport encryption, the metadata that *does* flow over the queue (including the authentication handshake) is readable on the wire. 3. **Active network attacks.** Unencrypted traffic is vulnerable to man-in-the-middle modification, command injection, and replay. With encryption but no peer authentication, it is vulnerable to active impersonation. 4. **Rogue clients.** A client that holds the auth key can join the queue and pull tasks; a stolen key effectively grants full task-execution privileges on every node the client can reach. 5. **Denial of service.** Neither connection flooding nor application-level resource exhaustion is mitigated by cryptography. The queue transport imposes no handshake timeout, frame-size cap, or connection limit of its own (see the Built-in TLS *Limitations* below), so DoS resistance must come from firewall and rate-limit controls at the network layer. The defenses below address (1)–(3) directly and partially address (4). DoS mitigation is provided at the network perimeter, not by the queue transport. ------------------- .. _builtin-tls: Built-in TLS ------------ | Overview ^^^^^^^^ *HyperShell* speaks TLS natively for its queue transport. There is no external middleware (``stunnel``, ``nginx``, sidecar) required — the listener, the connect path, and every per-proxy RPC opened by :class:`multiprocessing.managers.BaseProxy` are wrapped with TLS automatically when enabled. The implementation lives in :mod:`hypershell.core.tls` and :mod:`hypershell.core.queue`. TLS is **enabled by default**. The default configuration is designed so that a fresh install on a single host can run ``hsx`` without any operator intervention: certificates are generated on first start, the client trusts the same certificate file via a shared filesystem path, and the queue proceeds over an encrypted, authenticated channel. | How It Works ^^^^^^^^^^^^ A dedicated serializer key, ``'hypershell-tls'``, is registered in :data:`multiprocessing.managers.listener_client` and points to a TLS-aware listener/client pair (:class:`hypershell.core.queue.SecureListener` and :func:`hypershell.core.queue.secure_client`). Whenever a manager is constructed with this serializer, every byte on the wire — the initial handshake, the BaseManager challenge/ response, every per-proxy RPC, and every server-side accept — transparently routes through TLS. The :class:`hypershell.core.queue.SecureManager` class selects this serializer when an active :class:`hypershell.core.tls.TLSConfig` is supplied. If TLS is disabled (or no configuration is found), ``SecureManager`` is byte-for-byte equivalent to a stock ``BaseManager`` with the default ``pickle`` serializer, so cleartext deployments remain supported for constrained environments. | Authentication Key ^^^^^^^^^^^^^^^^^^ Independent of TLS, the queue gates connections with a shared authentication key. The key is never sent over the queue socket. It is used directly (ASCII-encoded) as the key for the standard multiprocessing handshake — an HMAC challenge/response with a fresh per-connection nonce — so the secret itself never crosses the queue transport: .. code-block:: python authkey = config.server.auth.encode('ascii') An eavesdropper on the queue transport cannot recover the auth string from the handshake and cannot forge a valid response without it, and the per-connection nonce precludes replay of a captured exchange. This holds against a passive observer, and against an active attacker only once the TLS peer has been verified (via ``cafile`` + ``servername`` or a pinned ``fingerprint``). The handshake is not cryptographically bound to the TLS channel, so in ``insecure`` mode — where the peer is not authenticated — an active man-in-the-middle can relay the challenge/response. Verify the peer, and do not use ``insecure`` mode on untrusted networks, so the key is exchanged only with the intended server. For ``LocalCluster``, ``RemoteCluster``, ``SSHCluster``, and autoscaling invocations, *HyperShell* generates a fresh random key for every invocation via ``secrets.token_hex(64)``, scoped to that single cluster lifetime; the operator never sees or configures it. For standalone ``hs server`` / ``hs client`` deployments, the operator supplies the key via configuration, environment, or the ``--auth`` CLI option. To keep the shared secret meaningful, ``hs server`` **refuses to start** with the built-in placeholder key and enforces a minimum key policy: the key must be at least 16 characters and drawn from ``[A-Za-z0-9._+/=-]`` (which admits hex, URL-safe tokens, and standard Base64). The key generators used in the deployment guides below satisfy this policy. .. admonition:: The key is visible on the client command line in cluster launches :class: note For ``SSHCluster`` and the MPI/SLURM launchers, the per-invocation key is passed to each client process as a ``-k`` command-line argument. On multi-user client hosts it is therefore visible in the process table (``ps`` / ``/proc``) and scheduler accounting for the lifetime of the client, even though it is redacted from *HyperShell*'s own launch logs. Treat every host that holds the key as inside the trust boundary. A directly configured ``server.auth`` is likewise stored in plaintext in the config file and printed verbatim by ``hs config get``; prefer the ``_eval`` / ``_env`` suffixes (see below) to keep the secret out of the file. | Auto-generated Certificates ^^^^^^^^^^^^^^^^^^^^^^^^^^^ The first time a server starts with TLS enabled and ``cert``/``key`` set to ``''``, *HyperShell* generates a self-signed RSA-3072 certificate covering the local FQDN, the short hostname, ``localhost``, ``127.0.0.1``, and ``::1`` and writes the materials under the site library directory: * **Linux / POSIX:** ``$HOME/.hypershell/lib/tls/server.crt`` and ``server.key`` * **macOS:** ``$HOME/Library/HyperShell/tls/server.crt`` and ``server.key`` * **Windows:** ``%APPDATA%\HyperShell\Library\tls\server.crt`` and ``server.key`` The private key is written with mode ``0600`` (owner-only) and the certificate with mode ``0644``. The SHA-256 fingerprint is logged at ``INFO`` level when the certificate is generated so the operator can paste it into the client configuration of a remote node or pin it via ``server.tls.fingerprint``. Subsequent server starts find the existing materials and reuse them — the operation is idempotent. Delete the two files to force regeneration (for example, after the hostname changes). | Configuration ^^^^^^^^^^^^^ TLS is configured under the ``[server.tls]`` namespace in the *HyperShell* TOML config: .. code-block:: toml [server.tls] enabled = true # default cert = "" # path to server cert, or '' key = "" # path to server key, or '' cafile = "" # trust anchor; '' = server's own cert (single-host / shared-FS) fingerprint = "" # 'SHA256:AB:CD:...' pin (overrides cafile verification) insecure = false # disable peer verification entirely (logs a warning) min_version = "TLSv1.2" # or 'TLSv1.3' ciphers = "" # OpenSSL cipher string servername = "" # SNI / hostname check override Every key is also exposed via environment variables with the standard ``HYPERSHELL_`` prefix and the same precedence as other settings (CLI > environment > local config > user config > system config > defaults): .. code-block:: shell HYPERSHELL_SERVER_TLS_ENABLED=true HYPERSHELL_SERVER_TLS_CERT=/etc/hypershell/tls/server.crt HYPERSHELL_SERVER_TLS_KEY=/etc/hypershell/tls/server.key HYPERSHELL_SERVER_TLS_CAFILE=/etc/hypershell/tls/ca.crt HYPERSHELL_SERVER_TLS_FINGERPRINT=SHA256:AB:CD:EF:... | Peer Verification Modes ^^^^^^^^^^^^^^^^^^^^^^^ The client side of every connection decides how to validate the server certificate. Four modes are supported, in decreasing strength: * **CA verification** — set ``cafile`` to a trust anchor file (PEM bundle) **and** ``servername`` to the expected hostname. The client validates the chain against that anchor (:class:`ssl.CERT_REQUIRED`) and checks that the certificate matches the expected name (SNI / SAN check). This is the recommended mode for multi-host deployments with operator-managed PKI. ``servername`` is required for server-identity verification, not an optional extra: with ``cafile`` set but ``servername`` unset, the client verifies only that the certificate chains to the anchor and does **not** check which host presented it, so any certificate issued by that anchor — from any host — is accepted. Always set ``servername`` on untrusted networks. * **Fingerprint pinning** — set ``fingerprint`` to the server's ``SHA256:AB:CD:...`` digest. The client completes the TLS handshake without CA validation and rejects the connection if the post-handshake fingerprint does not match. Use this when the operator already trusts the server out-of-band and wants minimal configuration — typical for self-signed certificates and small clusters. * **System CA bundle** — leave ``cafile``, ``fingerprint``, and ``insecure`` all unset. The client loads the platform's default CA store and requires hostname verification. Use this with a real public-CA-issued certificate (e.g., Let's Encrypt). * **Insecure mode** — set ``insecure = true``. The handshake completes but the peer identity is not authenticated. The transport is still encrypted, but with no peer verification an active man-in-the-middle can impersonate the server and relay the auth-key handshake (which is not bound to the TLS channel). Suitable only for transient local debugging on a trusted host; logs a warning on every connection. Do not use it on untrusted networks. When both ``cafile`` and ``fingerprint`` are set, ``fingerprint`` takes precedence and ``cafile`` is ignored. | Limitations ^^^^^^^^^^^ Understanding these properties matters most for deployments on untrusted networks. 1. **No mutual TLS.** Clients are not required to present certificates. The shared authentication key authenticates the *client* to the server; TLS authenticates the *server* to the client. There is no certificate-based client identity. 2. **Pickle-serialized RPC.** The queue is built on Python's multiprocessing managers, whose RPC serializes objects with ``pickle`` in both directions. The authentication key and (non-``insecure``) TLS peer verification are the controls that gate this channel: a party that holds the key and reaches the server — or, in ``insecure`` mode, an active man-in-the-middle — can do more than enqueue tasks. Keep the key secret, keep peer verification enabled, and treat every host that holds the key as inside the trust boundary. 3. **Self-signed by default; single-leaf trust.** The auto-generated certificate is self-signed (not chained to a public CA), valid for ten years, and has no revocation path; the private key is written to the site directory unencrypted and owner-only (``0600``), as is normal for an unattended service. With ``cafile = ''`` the client trusts that exact certificate as its own anchor, resolved per host — so this works out of the box only on a single host or a shared filesystem. A standalone remote client must be given the server's certificate or its pinned ``fingerprint`` out of band; otherwise it trusts its own auto-generated certificate and the handshake to the real server fails (do not work around this with ``insecure`` mode). Rotating the auto-generated certificate requires redistributing the new certificate or fingerprint to clients. 4. **No transport-level DoS controls.** The queue sets no per-connection handshake timeout, no inbound frame-size cap, and no limit on concurrent connections or worker threads. A peer that reaches the listener — including a slow peer that stalls before authenticating — can consume server resources, and failed authentication/handshake attempts are handled by the underlying multiprocessing layer without surfacing in *HyperShell*'s logs. Denial-of-service protection must come from the network layer (see the deployment guides). 5. **IPv4 only.** The built-in TLS queue listener binds ``AF_INET`` (IPv4). On IPv6-only or IPv6-preferred networks, clients must reach the server over an IPv4 address. 6. **No pre-shared-key TLS.** PSK would be a natural fit for the shared-secret model but is only available in Python 3.13+; *HyperShell* supports 3.9+. ------------------- Adjacent Trust Surfaces ----------------------- | *HyperShell*'s built-in TLS protects the **queue transport** and nothing else. A production deployment touches at least two additional security boundaries that the *HyperShell* process is not in a position to enforce: the **task database** and (when used) the **SSH transport** for distributing clients or fetching task output. Both must be hardened independently of the queue. | The Task Database ^^^^^^^^^^^^^^^^^ The database (SQLite or PostgreSQL) is the canonical source of truth for what tasks exist and what their command strings are. The server reads pending tasks from the database and hands them to clients for execution. **Anyone who can write rows into the ``task`` table can execute arbitrary shell commands on every connected client** — equivalent to root access on the entire compute fleet if clients run as root, or to user-level access to every client host otherwise. .. admonition:: Securing the database is the operator's responsibility :class: warning *HyperShell* does not authenticate or authorize callers at the database layer. It trusts the database connection. If you use PostgreSQL, the security of that PostgreSQL instance — network exposure, role/permission model, TLS configuration, backup access — is **part of your HyperShell threat surface** and must be treated with the same care as the queue itself. A SQL-injection bug in any unrelated application that shares the same database, or a leaked PostgreSQL password, is functionally equivalent to a stolen *HyperShell* authentication key. **SQLite.** When ``database.provider = 'sqlite'`` (the default), the database is a local file under the site library directory. Security reduces to filesystem permissions: the file should be readable and writable only by the user that runs the *HyperShell* server. Avoid placing the SQLite file on a network filesystem unless every host that mounts it is inside the same trust boundary as the server itself. **PostgreSQL.** When ``database.provider = 'postgres'``, the *HyperShell* server connects via `psycopg (v3) `_ (the ``postgresql+psycopg`` dialect). *HyperShell* constructs a standard SQLAlchemy URL and passes it to :func:`sqlalchemy.create_engine`; everything related to transport security is configured in that URL (or in ``connect_args``) and is delegated to ``psycopg`` and the underlying ``libpq``. *HyperShell* itself adds nothing. For production deployments, prefer the ``postgres-system`` or ``postgres-c`` install extra over ``postgres`` (see :ref:`install`): both link the operating-system ``libpq``/OpenSSL, so the TLS stack receives OS security updates, whereas the default ``postgres`` extra bundles a frozen copy inside the wheel. Concretely, the database does **not** speak TLS unless your URL says so. A bare ``postgresql://user:pass@host/db`` connects in cleartext and accepts the server's certificate (or lack thereof) without verification. To require TLS and verify the server, append the standard ``libpq`` query parameters: .. code-block:: toml [database] provider = "postgres" database = "hypershell" host = "db.example.com" user = "hypershell" password_eval = "cat /etc/hypershell/pg-pass.key" sslmode = "verify-full" sslrootcert = "/etc/hypershell/pg-ca.crt" ``sslmode`` accepts (in order of strength) ``disable``, ``allow``, ``prefer``, ``require``, ``verify-ca``, and ``verify-full``. Anything below ``verify-full`` is vulnerable to man-in-the-middle attack against the database connection. The standard ``libpq`` documentation covers the full parameter set: https://www.postgresql.org/docs/current/libpq-ssl.html. In addition to transport security, harden the PostgreSQL instance itself: * Bind PostgreSQL to a non-public interface and restrict access in ``pg_hba.conf`` to the hosts that actually run the *HyperShell* server. * Create a dedicated PostgreSQL role for *HyperShell* with the minimum privileges needed to operate on the *HyperShell* schema. Do not reuse the ``postgres`` superuser. * Rotate the database password on the same cadence as the *HyperShell* authentication key. * Treat database backups with the same sensitivity as the live database — they contain the same command strings. | SSH for Cluster Distribution and Output Retrieval ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Two *HyperShell* features rely on SSH: * :class:`~hypershell.cluster.ssh.SSHCluster` distributes client processes to remote hosts via ``paramiko``. The host list, key material, and remote shell are all configured through the user's standard ``~/.ssh/config``. * ``hs info --stdout`` / ``--stderr`` fetches the captured task output from the client host via SFTP (also ``paramiko``), since output is *not* sent through the queue. Security for these paths is provided entirely by SSH itself — host key verification, key rotation, ``authorized_keys`` policy, ``StrictHostKeyChecking``, and so on. *HyperShell* does not bypass or weaken any of these controls. Treat the SSH configuration that *HyperShell* uses with the same care you would treat any other SSH-based automation: prefer key-based authentication, disable password auth, keep ``known_hosts`` populated, and audit ``authorized_keys`` for unexpected entries. ------------------- Deployment Guides ----------------- | Local Single-Host (Default) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ For ``LocalCluster`` and SSH-distributed workflows on a shared filesystem (the typical HPC scenario), no operator action is required. The default configuration: * Generates a self-signed certificate on first server start * Writes it to the site library directory readable by every process on the host * Configures the client to trust the same file (``cafile = ''`` mirrors ``cert``) * Generates a fresh random authentication key per cluster invocation The most common command therefore needs no security flags: .. code-block:: shell seq 1000 | hsx -t 'echo {}' -N16 Cross-host workflows over SSH (``hsx --ssh 'a[00-32].cluster'``) inherit the same setup provided that every host shares the site library directory via NFS, Lustre, GPFS, or similar. The server writes the certificate once; every SSH-launched client reads it from the same path. .. admonition:: When the shared filesystem assumption breaks :class: note Each client resolves its TLS trust from its *own* configuration and environment — the launchers do not ship certificate material to clients. On a shared filesystem this resolves to the same server-generated certificate automatically; without one, give each client the server's trust anchor out of band. The simplest option is to pin the server fingerprint (``HYPERSHELL_SERVER_TLS_FINGERPRINT=SHA256:...``, logged by the server at ``INFO`` on first start); alternatively distribute the certificate to each node and set ``HYPERSHELL_SERVER_TLS_CAFILE`` (with ``HYPERSHELL_SERVER_TLS_SERVERNAME``). Never distribute the private key. | Linux Cluster Exposed to the Internet ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ For deployments that accept client connections from outside a trusted network — for example, a head-node *HyperShell* server reachable over the public internet by remote workers — the recommended configuration uses operator-managed certificates and a strong, operator-managed authentication key. **Step 1 — Generate a strong authentication key.** .. code-block:: shell openssl rand -base64 48 > /etc/hypershell/auth.key chmod 600 /etc/hypershell/auth.key chown hypershell:hypershell /etc/hypershell/auth.key Distribute this file securely to every client (e.g., via configuration management, a secrets store, or out-of-band channel). Never check it into source control. **Step 2 — Provision a real TLS certificate.** The cleanest path for a server with a stable hostname is `Let's Encrypt `_ via ``certbot``: .. code-block:: shell certbot certonly --standalone -d hypershell.example.com This produces ``/etc/letsencrypt/live/hypershell.example.com/fullchain.pem`` (cert) and ``privkey.pem`` (key). Configure *HyperShell* to use them: .. code-block:: toml # /etc/hypershell.toml [server] bind = "0.0.0.0" auth_eval = "cat /etc/hypershell/auth.key" [server.tls] enabled = true cert = "/etc/letsencrypt/live/hypershell.example.com/fullchain.pem" key = "/etc/letsencrypt/live/hypershell.example.com/privkey.pem" min_version = "TLSv1.3" .. admonition:: ``_eval`` and ``_env`` suffixes :class: note Any setting with an ``_eval`` suffix is resolved by shelling out to the named command and capturing ``stdout``; ``_env`` reads the named environment variable. These are *cmdkit* features available on every configuration key. They let secrets stay outside the TOML file. For example, ``auth_eval = "vault read -field=value secret/hypershell"`` or ``auth_env = "HYPERSHELL_AUTH_VALUE"``. For environments without internet-reachable HTTP for ACME validation, use an internal CA (e.g., HashiCorp Vault PKI, smallstep, FreeIPA). Configure the same ``cert``/``key`` paths to point at the issued material. **Step 3 — Configure the client to verify the server.** If clients can resolve and connect to a public hostname covered by the server's cert, leave ``cafile`` unset and let the platform's default CA store handle verification: .. code-block:: toml # client-side config [server] host = "hypershell.example.com" auth_eval = "cat /etc/hypershell/auth.key" If you used an internal CA, distribute the CA's root certificate to each client and pin it explicitly: .. code-block:: toml [server.tls] enabled = true cafile = "/etc/hypershell/internal-ca.crt" servername = "hypershell.example.com" # require SAN match min_version = "TLSv1.3" If you accept a self-signed certificate (acceptable for small operator-managed deployments), pin the fingerprint instead — the server logs it at ``INFO`` on first start: .. code-block:: toml [server.tls] enabled = true fingerprint = "SHA256:AB:CD:EF:01:23:..." **Step 4 — Restrict network exposure.** Cryptography is not a substitute for network controls. Combine TLS with: * A host firewall (``ufw``, ``firewalld``, ``iptables``) restricting the server port to known client IP ranges * A perimeter firewall or VPN tunnel for cross-site connectivity * `fail2ban `_ or equivalent to throttle abusive connection attempts * OS-level rate limits (``iptables -m connlimit``) to bound the connection rate **Step 5 — Run with least privilege.** .. code-block:: shell useradd --system --shell /usr/sbin/nologin hypershell chown -R hypershell:hypershell /var/lib/hypershell /etc/hypershell # systemd unit running as User=hypershell, Group=hypershell Never run the *HyperShell* server as ``root``. Tasks executed by clients run as whichever user the client process runs as; constrain that user to the minimum privileges required for the workload. | Kubernetes Deployment ^^^^^^^^^^^^^^^^^^^^^ Kubernetes is a natural fit for *HyperShell*: the server runs as a long-lived workload (``StatefulSet`` if backed by SQLite on a ``PersistentVolume``, ``Deployment`` if backed by PostgreSQL), and clients run as elastically-scaled ``Deployment`` or ``Job`` workloads. **Certificate management with cert-manager.** Use `cert-manager `_ to provision and rotate the server certificate. A ``Certificate`` resource against an internal ``Issuer`` (or ACME-backed ``ClusterIssuer``) produces a ``Secret`` containing ``tls.crt``, ``tls.key``, and ``ca.crt``: .. code-block:: yaml apiVersion: cert-manager.io/v1 kind: Certificate metadata: name: hypershell-server namespace: hypershell spec: secretName: hypershell-server-tls issuerRef: name: internal-ca kind: ClusterIssuer commonName: hypershell-server.hypershell.svc.cluster.local dnsNames: - hypershell-server - hypershell-server.hypershell - hypershell-server.hypershell.svc.cluster.local duration: 2160h # 90 days renewBefore: 360h # 15 days **Secret for the authentication key.** Store the shared auth key as a ``Secret`` rather than embedding it in a manifest: .. code-block:: shell kubectl create secret generic hypershell-auth \ --namespace hypershell \ --from-literal=auth="$(openssl rand -base64 48)" **Server StatefulSet.** Mount both Secrets into the server pod and expose the TLS configuration via environment variables: .. code-block:: yaml apiVersion: apps/v1 kind: StatefulSet metadata: name: hypershell-server namespace: hypershell spec: serviceName: hypershell-server replicas: 1 selector: matchLabels: { app: hypershell-server } template: metadata: labels: { app: hypershell-server } spec: containers: - name: server image: hypershell/hypershell:latest args: ["hs", "server", "--bind", "0.0.0.0"] env: - name: HYPERSHELL_SERVER_TLS_ENABLED value: "true" - name: HYPERSHELL_SERVER_TLS_CERT value: /tls/tls.crt - name: HYPERSHELL_SERVER_TLS_KEY value: /tls/tls.key - name: HYPERSHELL_SERVER_TLS_MIN_VERSION value: TLSv1.3 - name: HYPERSHELL_SERVER_AUTH valueFrom: secretKeyRef: { name: hypershell-auth, key: auth } ports: - containerPort: 50001 name: queue volumeMounts: - { name: tls, mountPath: /tls, readOnly: true } - { name: data, mountPath: /var/lib/hypershell } volumes: - name: tls secret: { secretName: hypershell-server-tls } volumeClaimTemplates: - metadata: { name: data } spec: accessModes: [ReadWriteOnce] resources: { requests: { storage: 10Gi } } **Service and NetworkPolicy.** Expose the server as a ``ClusterIP`` Service and lock down access with a ``NetworkPolicy``: .. code-block:: yaml apiVersion: v1 kind: Service metadata: name: hypershell-server namespace: hypershell spec: selector: { app: hypershell-server } ports: - port: 50001 targetPort: 50001 name: queue --- apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: hypershell-server-allow-clients namespace: hypershell spec: podSelector: matchLabels: { app: hypershell-server } policyTypes: [Ingress] ingress: - from: - podSelector: matchLabels: { app: hypershell-client } ports: - { protocol: TCP, port: 50001 } **Client Deployment.** Mount the same auth ``Secret`` and the CA from the cert-manager-issued ``Secret``: .. code-block:: yaml apiVersion: apps/v1 kind: Deployment metadata: name: hypershell-client namespace: hypershell spec: replicas: 8 selector: matchLabels: { app: hypershell-client } template: metadata: labels: { app: hypershell-client } spec: containers: - name: client image: hypershell/hypershell:latest args: - "hs" - "client" - "--host" - "hypershell-server.hypershell.svc.cluster.local" env: - name: HYPERSHELL_SERVER_TLS_ENABLED value: "true" - name: HYPERSHELL_SERVER_TLS_CAFILE value: /tls/ca.crt - name: HYPERSHELL_SERVER_TLS_SERVERNAME value: hypershell-server.hypershell.svc.cluster.local - name: HYPERSHELL_SERVER_AUTH valueFrom: secretKeyRef: { name: hypershell-auth, key: auth } volumeMounts: - { name: tls, mountPath: /tls, readOnly: true } volumes: - name: tls secret: { secretName: hypershell-server-tls } The client's ``HYPERSHELL_SERVER_TLS_SERVERNAME`` matches the cert's SAN, which makes the TLS layer reject any pod that happens to attach to the same Service but presents a different certificate. .. admonition:: Service mesh as an alternative :class: note If your cluster already runs a service mesh (Istio, Linkerd) with mTLS for all in-cluster traffic, you can rely on the mesh for transport security and disable *HyperShell*'s own TLS layer (set ``HYPERSHELL_SERVER_TLS_ENABLED=false``). The shared authentication key still gates queue access. This is appropriate when the mesh-provided identity model is authoritative and you want to avoid double encryption. | External Tunneling (Legacy) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ For sites that pre-date built-in TLS or that already operate a sidecar tunneling layer for other reasons, *HyperShell* can run over an external tunnel with its own TLS disabled. This is documented for completeness; new deployments should prefer the built-in transport. **VPN.** `WireGuard `_, `Tailscale `_, or `OpenVPN `_ provide a transparent encrypted tunnel between client and server hosts. *HyperShell* runs as if on a private network; no TLS configuration is required on the *HyperShell* side, though enabling it adds defense-in-depth at negligible cost. **stunnel.** A ``stunnel`` sidecar can wrap a cleartext *HyperShell* connection in TLS: .. code-block:: ini # server-side stunnel.conf [hypershell] accept = 0.0.0.0:50001 connect = 127.0.0.1:50000 cert = /path/to/server-cert.pem key = /path/to/server-key.pem # client-side stunnel.conf [hypershell] client = yes accept = 127.0.0.1:50000 connect = server.example.com:50001 CAfile = /path/to/ca-cert.pem verify = 2 With this layout the *HyperShell* server binds ``localhost:50000`` and ``stunnel`` handles TLS on port ``50001``. Disable *HyperShell*'s built-in TLS (``HYPERSHELL_SERVER_TLS_ENABLED=false``) to avoid double-wrapping. ------------------- Operational Recommendations --------------------------- | Independent of the deployment style above: * **Rotate keys and certificates regularly.** A 90-day rotation cadence is a sensible default for both the auth key and the TLS certificate. cert-manager and Let's Encrypt handle the certificate side automatically; the auth key requires an operator-driven rolling restart of server and clients. * **Run as an unprivileged user.** The *HyperShell* server itself does not require elevated privileges. Tasks run as the user that owns the client process; constrain that user accordingly. * **Enable audit logging.** Set ``logging.style = "detailed"`` (or ``"system"``) and ship logs to a central collector. The server logs every client connection, every task dispatched, and every completion. See :ref:`logging ` for details. The per-invocation auth key is redacted from launch logs, but avoid shipping full configuration dumps (which contain a directly-set ``server.auth``) to shared log infrastructure. * **Sanitize task input.** When generating tasks programmatically, treat the command string as a shell-injection vector — escape or template carefully, and never interpolate untrusted input into the command. ``hsx`` accepts arbitrary input from ``stdin`` by design; the trust boundary is whoever can write to that pipe. * **Monitor connection patterns.** *HyperShell* does not itself log failed authentication or failed TLS handshakes — they are handled by the underlying multiprocessing / TLS layer and not surfaced in the application log — so watch for probes and connection surges at the network layer (firewall counters, ``fail2ban``, flow logs) rather than expecting them in *HyperShell*'s output. ------------------- References ---------- | * `Python multiprocessing.managers `_ * `Python ssl module `_ * `cryptography library `_ * `cert-manager `_ * `Let's Encrypt `_ * `WireGuard `_ * `Tailscale `_ * `stunnel `_ ------------------- Summary ------- | *HyperShell* enables powerful parallel computing but requires careful attention to security because it executes arbitrary shell commands by design. The two essential controls *HyperShell* itself provides are the shared **authentication key** that gates the queue and the **TLS layer** that protects queue traffic on the wire. Both are enabled by default and self-provisioning on a single host; both are configurable end-to-end for multi-host deployments through TOML, environment variables, and CLI flags. Beyond the queue, the **task database** (especially PostgreSQL) and any **SSH transports** used for client distribution or output retrieval are adjacent trust surfaces that must be hardened independently. Recommended practice: 1. **Local single-host workflows** — accept the defaults. Auto-generated self-signed certificates and per-invocation auth keys are sufficient. 2. **Multi-host on trusted networks** — accept the defaults if the site library directory is shared; otherwise pin the server fingerprint on each client. 3. **Multi-host across untrusted networks** — provision a real certificate (Let's Encrypt or an internal CA), distribute a strong shared auth key out of band, restrict the server port at the firewall, run as an unprivileged user, and require ``sslmode=verify-full`` for any PostgreSQL connection. 4. **Kubernetes** — use cert-manager for certificates, ``Secret`` for the auth key, ``NetworkPolicy`` for connectivity, and consider an existing service mesh as an alternative transport-security layer. 5. **Database and SSH** — treat the PostgreSQL instance and your SSH key/host-key policy as part of the *HyperShell* threat surface. A leaked database password or a permissive ``authorized_keys`` defeats every protection in the queue. For questions or to report security vulnerabilities, please contact the maintainers via the `GitHub repository `_ or join the `Discord `_ community.