Security


Overview

HyperShell runs arbitrary shell commands on every machine that connects to its task queue. A HyperShell deployment is therefore only ever as secure as the network it operates on and the authentication material that gates it. This document describes the security architecture, the cryptographic protections built into the queue transport, and the recommended deployment patterns for trusted networks, public-internet exposure, and Kubernetes.

The two security primitives users need to understand are:

  • a shared authentication key that both server and clients must hold to participate in the queue, and

  • an end-to-end TLS layer that protects every byte of queue traffic on the wire.

Both are enabled by default and configured so that the most common deployment scenario — a single-host LocalCluster — works out of the box with no operator action required.


Architecture


Components

HyperShell is built on Python’s multiprocessing.managers.BaseManager and multiprocessing.JoinableQueue infrastructure. The runtime decomposes into three roles:

  • Submit ingests shell commands from stdin or files and writes them to the database (SQLite or PostgreSQL) with stable UUIDs.

  • Server pulls pending tasks from the database, bundles them, and exposes four JoinableQueue instances to clients via RPC over TCP:

    • scheduled — task bundles awaiting execution

    • completed — results returned from clients

    • heartbeat — periodic liveness reports from clients

    • confirmed — explicit acknowledgement of received bundles

  • Client connects to the server, pulls bundles from the scheduled queue, spawns shell subprocesses, and returns results on the completed queue.

The cluster command (hsx) composes the three roles in a single process group for local-only or SSH-distributed workflows.


Threat Model

Because tasks are arbitrary shell commands, the queue must be treated as a privileged channel:

  1. Arbitrary code execution. Any party able to enqueue a task can run it on every connected client. Connection authentication is therefore the primary security control, not an optional convenience.

  2. Data exposure. Task definitions, exit status, timing, and heartbeat data travel through the queue. Task stdout and stderr do not — they are written to disk on the client (locally, or on a shared filesystem if one is configured) and retrieved on demand over SFTP via SSHConnection when the operator runs hs task info --stdout or similar. Without transport encryption, the metadata that does flow over the queue (including the authentication handshake) is readable on the wire.

  3. Active network attacks. Unencrypted traffic is vulnerable to man-in-the-middle modification, command injection, and replay. With encryption but no peer authentication, it is vulnerable to active impersonation.

  4. Rogue clients. A client that holds the auth key can join the queue and pull tasks; a stolen key effectively grants full task-execution privileges on every node the client can reach.

  5. Denial of service. Neither connection flooding nor application-level resource exhaustion is mitigated by cryptography. The queue transport imposes no handshake timeout, frame-size cap, or connection limit of its own (see the Built-in TLS Limitations below), so DoS resistance must come from firewall and rate-limit controls at the network layer.

The defenses below address (1)–(3) directly and partially address (4). DoS mitigation is provided at the network perimeter, not by the queue transport.


Built-in TLS


Overview

HyperShell speaks TLS natively for its queue transport. There is no external middleware (stunnel, nginx, sidecar) required — the listener, the connect path, and every per-proxy RPC opened by multiprocessing.managers.BaseProxy are wrapped with TLS automatically when enabled. The implementation lives in hypershell.core.tls and hypershell.core.queue.

TLS is enabled by default. The default configuration is designed so that a fresh install on a single host can run hsx without any operator intervention: certificates are generated on first start, the client trusts the same certificate file via a shared filesystem path, and the queue proceeds over an encrypted, authenticated channel.


How It Works

A dedicated serializer key, 'hypershell-tls', is registered in multiprocessing.managers.listener_client and points to a TLS-aware listener/client pair (hypershell.core.queue.SecureListener and hypershell.core.queue.secure_client()). Whenever a manager is constructed with this serializer, every byte on the wire — the initial handshake, the BaseManager challenge/ response, every per-proxy RPC, and every server-side accept — transparently routes through TLS.

The hypershell.core.queue.SecureManager class selects this serializer when an active hypershell.core.tls.TLSConfig is supplied. If TLS is disabled (or no configuration is found), SecureManager is byte-for-byte equivalent to a stock BaseManager with the default pickle serializer, so cleartext deployments remain supported for constrained environments.


Authentication Key

Independent of TLS, the queue gates connections with a shared authentication key. The key is never sent over the queue socket. It is used directly (ASCII-encoded) as the key for the standard multiprocessing handshake — an HMAC challenge/response with a fresh per-connection nonce — so the secret itself never crosses the queue transport:

authkey = config.server.auth.encode('ascii')

An eavesdropper on the queue transport cannot recover the auth string from the handshake and cannot forge a valid response without it, and the per-connection nonce precludes replay of a captured exchange. This holds against a passive observer, and against an active attacker only once the TLS peer has been verified (via cafile + servername or a pinned fingerprint). The handshake is not cryptographically bound to the TLS channel, so in insecure mode — where the peer is not authenticated — an active man-in-the-middle can relay the challenge/response. Verify the peer, and do not use insecure mode on untrusted networks, so the key is exchanged only with the intended server.

For LocalCluster, RemoteCluster, SSHCluster, and autoscaling invocations, HyperShell generates a fresh random key for every invocation via secrets.token_hex(64), scoped to that single cluster lifetime; the operator never sees or configures it. For standalone hs server / hs client deployments, the operator supplies the key via configuration, environment, or the --auth CLI option.

To keep the shared secret meaningful, hs server refuses to start with the built-in placeholder key and enforces a minimum key policy: the key must be at least 16 characters and drawn from [A-Za-z0-9._+/=-] (which admits hex, URL-safe tokens, and standard Base64). The key generators used in the deployment guides below satisfy this policy.

The key is visible on the client command line in cluster launches

For SSHCluster and the MPI/SLURM launchers, the per-invocation key is passed to each client process as a -k command-line argument. On multi-user client hosts it is therefore visible in the process table (ps / /proc) and scheduler accounting for the lifetime of the client, even though it is redacted from HyperShell’s own launch logs. Treat every host that holds the key as inside the trust boundary. A directly configured server.auth is likewise stored in plaintext in the config file and printed verbatim by hs config get; prefer the _eval / _env suffixes (see below) to keep the secret out of the file.


Auto-generated Certificates

The first time a server starts with TLS enabled and cert/key set to '<auto>', HyperShell generates a self-signed RSA-3072 certificate covering the local FQDN, the short hostname, localhost, 127.0.0.1, and ::1 and writes the materials under the site library directory:

  • Linux / POSIX: $HOME/.hypershell/lib/tls/server.crt and server.key

  • macOS: $HOME/Library/HyperShell/tls/server.crt and server.key

  • Windows: %APPDATA%\HyperShell\Library\tls\server.crt and server.key

The private key is written with mode 0600 (owner-only) and the certificate with mode 0644. The SHA-256 fingerprint is logged at INFO level when the certificate is generated so the operator can paste it into the client configuration of a remote node or pin it via server.tls.fingerprint.

Subsequent server starts find the existing materials and reuse them — the operation is idempotent. Delete the two files to force regeneration (for example, after the hostname changes).


Configuration

TLS is configured under the [server.tls] namespace in the HyperShell TOML config:

[server.tls]
enabled     = true              # default
cert        = "<auto>"          # path to server cert, or '<auto>'
key         = "<auto>"          # path to server key, or '<auto>'
cafile      = "<auto>"          # trust anchor; '<auto>' = server's own cert (single-host / shared-FS)
fingerprint = "<none>"          # 'SHA256:AB:CD:...' pin (overrides cafile verification)
insecure    = false             # disable peer verification entirely (logs a warning)
min_version = "TLSv1.2"         # or 'TLSv1.3'
ciphers     = "<none>"          # OpenSSL cipher string
servername  = "<none>"          # SNI / hostname check override

Every key is also exposed via environment variables with the standard HYPERSHELL_ prefix and the same precedence as other settings (CLI > environment > local config > user config > system config > defaults):

HYPERSHELL_SERVER_TLS_ENABLED=true
HYPERSHELL_SERVER_TLS_CERT=/etc/hypershell/tls/server.crt
HYPERSHELL_SERVER_TLS_KEY=/etc/hypershell/tls/server.key
HYPERSHELL_SERVER_TLS_CAFILE=/etc/hypershell/tls/ca.crt
HYPERSHELL_SERVER_TLS_FINGERPRINT=SHA256:AB:CD:EF:...

Peer Verification Modes

The client side of every connection decides how to validate the server certificate. Four modes are supported, in decreasing strength:

  • CA verification — set cafile to a trust anchor file (PEM bundle) and servername to the expected hostname. The client validates the chain against that anchor (ssl.CERT_REQUIRED) and checks that the certificate matches the expected name (SNI / SAN check). This is the recommended mode for multi-host deployments with operator-managed PKI. servername is required for server-identity verification, not an optional extra: with cafile set but servername unset, the client verifies only that the certificate chains to the anchor and does not check which host presented it, so any certificate issued by that anchor — from any host — is accepted. Always set servername on untrusted networks.

  • Fingerprint pinning — set fingerprint to the server’s SHA256:AB:CD:... digest. The client completes the TLS handshake without CA validation and rejects the connection if the post-handshake fingerprint does not match. Use this when the operator already trusts the server out-of-band and wants minimal configuration — typical for self-signed certificates and small clusters.

  • System CA bundle — leave cafile, fingerprint, and insecure all unset. The client loads the platform’s default CA store and requires hostname verification. Use this with a real public-CA-issued certificate (e.g., Let’s Encrypt).

  • Insecure mode — set insecure = true. The handshake completes but the peer identity is not authenticated. The transport is still encrypted, but with no peer verification an active man-in-the-middle can impersonate the server and relay the auth-key handshake (which is not bound to the TLS channel). Suitable only for transient local debugging on a trusted host; logs a warning on every connection. Do not use it on untrusted networks.

When both cafile and fingerprint are set, fingerprint takes precedence and cafile is ignored.


Limitations

Understanding these properties matters most for deployments on untrusted networks.

  1. No mutual TLS. Clients are not required to present certificates. The shared authentication key authenticates the client to the server; TLS authenticates the server to the client. There is no certificate-based client identity.

  2. Pickle-serialized RPC. The queue is built on Python’s multiprocessing managers, whose RPC serializes objects with pickle in both directions. The authentication key and (non-insecure) TLS peer verification are the controls that gate this channel: a party that holds the key and reaches the server — or, in insecure mode, an active man-in-the-middle — can do more than enqueue tasks. Keep the key secret, keep peer verification enabled, and treat every host that holds the key as inside the trust boundary.

  3. Self-signed by default; single-leaf trust. The auto-generated certificate is self-signed (not chained to a public CA), valid for ten years, and has no revocation path; the private key is written to the site directory unencrypted and owner-only (0600), as is normal for an unattended service. With cafile = '<auto>' the client trusts that exact certificate as its own anchor, resolved per host — so this works out of the box only on a single host or a shared filesystem. A standalone remote client must be given the server’s certificate or its pinned fingerprint out of band; otherwise it trusts its own auto-generated certificate and the handshake to the real server fails (do not work around this with insecure mode). Rotating the auto-generated certificate requires redistributing the new certificate or fingerprint to clients.

  4. No transport-level DoS controls. The queue sets no per-connection handshake timeout, no inbound frame-size cap, and no limit on concurrent connections or worker threads. A peer that reaches the listener — including a slow peer that stalls before authenticating — can consume server resources, and failed authentication/handshake attempts are handled by the underlying multiprocessing layer without surfacing in HyperShell’s logs. Denial-of-service protection must come from the network layer (see the deployment guides).

  5. IPv4 only. The built-in TLS queue listener binds AF_INET (IPv4). On IPv6-only or IPv6-preferred networks, clients must reach the server over an IPv4 address.

  6. No pre-shared-key TLS. PSK would be a natural fit for the shared-secret model but is only available in Python 3.13+; HyperShell supports 3.9+.


Adjacent Trust Surfaces


HyperShell’s built-in TLS protects the queue transport and nothing else. A production deployment touches at least two additional security boundaries that the HyperShell process is not in a position to enforce: the task database and (when used) the SSH transport for distributing clients or fetching task output. Both must be hardened independently of the queue.


The Task Database

The database (SQLite or PostgreSQL) is the canonical source of truth for what tasks exist and what their command strings are. The server reads pending tasks from the database and hands them to clients for execution. Anyone who can write rows into the ``task`` table can execute arbitrary shell commands on every connected client — equivalent to root access on the entire compute fleet if clients run as root, or to user-level access to every client host otherwise.

Securing the database is the operator’s responsibility

HyperShell does not authenticate or authorize callers at the database layer. It trusts the database connection. If you use PostgreSQL, the security of that PostgreSQL instance — network exposure, role/permission model, TLS configuration, backup access — is part of your HyperShell threat surface and must be treated with the same care as the queue itself. A SQL-injection bug in any unrelated application that shares the same database, or a leaked PostgreSQL password, is functionally equivalent to a stolen HyperShell authentication key.

SQLite. When database.provider = 'sqlite' (the default), the database is a local file under the site library directory. Security reduces to filesystem permissions: the file should be readable and writable only by the user that runs the HyperShell server. Avoid placing the SQLite file on a network filesystem unless every host that mounts it is inside the same trust boundary as the server itself.

PostgreSQL. When database.provider = 'postgres', the HyperShell server connects via psycopg (v3) (the postgresql+psycopg dialect). HyperShell constructs a standard SQLAlchemy URL and passes it to sqlalchemy.create_engine(); everything related to transport security is configured in that URL (or in connect_args) and is delegated to psycopg and the underlying libpq. HyperShell itself adds nothing.

For production deployments, prefer the postgres-system or postgres-c install extra over postgres (see Installation): both link the operating-system libpq/OpenSSL, so the TLS stack receives OS security updates, whereas the default postgres extra bundles a frozen copy inside the wheel.

Concretely, the database does not speak TLS unless your URL says so. A bare postgresql://user:pass@host/db connects in cleartext and accepts the server’s certificate (or lack thereof) without verification. To require TLS and verify the server, append the standard libpq query parameters:

[database]
provider      = "postgres"
database      = "hypershell"
host          = "db.example.com"
user          = "hypershell"
password_eval = "cat /etc/hypershell/pg-pass.key"
sslmode       = "verify-full"
sslrootcert   = "/etc/hypershell/pg-ca.crt"

sslmode accepts (in order of strength) disable, allow, prefer, require, verify-ca, and verify-full. Anything below verify-full is vulnerable to man-in-the-middle attack against the database connection. The standard libpq documentation covers the full parameter set: https://www.postgresql.org/docs/current/libpq-ssl.html.

In addition to transport security, harden the PostgreSQL instance itself:

  • Bind PostgreSQL to a non-public interface and restrict access in pg_hba.conf to the hosts that actually run the HyperShell server.

  • Create a dedicated PostgreSQL role for HyperShell with the minimum privileges needed to operate on the HyperShell schema. Do not reuse the postgres superuser.

  • Rotate the database password on the same cadence as the HyperShell authentication key.

  • Treat database backups with the same sensitivity as the live database — they contain the same command strings.


SSH for Cluster Distribution and Output Retrieval

Two HyperShell features rely on SSH:

  • SSHCluster distributes client processes to remote hosts via paramiko. The host list, key material, and remote shell are all configured through the user’s standard ~/.ssh/config.

  • hs info <task> --stdout / --stderr fetches the captured task output from the client host via SFTP (also paramiko), since output is not sent through the queue.

Security for these paths is provided entirely by SSH itself — host key verification, key rotation, authorized_keys policy, StrictHostKeyChecking, and so on. HyperShell does not bypass or weaken any of these controls. Treat the SSH configuration that HyperShell uses with the same care you would treat any other SSH-based automation: prefer key-based authentication, disable password auth, keep known_hosts populated, and audit authorized_keys for unexpected entries.


Deployment Guides


Local Single-Host (Default)

For LocalCluster and SSH-distributed workflows on a shared filesystem (the typical HPC scenario), no operator action is required. The default configuration:

  • Generates a self-signed certificate on first server start

  • Writes it to the site library directory readable by every process on the host

  • Configures the client to trust the same file (cafile = '<auto>' mirrors cert)

  • Generates a fresh random authentication key per cluster invocation

The most common command therefore needs no security flags:

seq 1000 | hsx -t 'echo {}' -N16

Cross-host workflows over SSH (hsx --ssh 'a[00-32].cluster') inherit the same setup provided that every host shares the site library directory via NFS, Lustre, GPFS, or similar. The server writes the certificate once; every SSH-launched client reads it from the same path.

When the shared filesystem assumption breaks

Each client resolves its TLS trust from its own configuration and environment — the launchers do not ship certificate material to clients. On a shared filesystem this resolves to the same server-generated certificate automatically; without one, give each client the server’s trust anchor out of band. The simplest option is to pin the server fingerprint (HYPERSHELL_SERVER_TLS_FINGERPRINT=SHA256:..., logged by the server at INFO on first start); alternatively distribute the certificate to each node and set HYPERSHELL_SERVER_TLS_CAFILE (with HYPERSHELL_SERVER_TLS_SERVERNAME). Never distribute the private key.


Linux Cluster Exposed to the Internet

For deployments that accept client connections from outside a trusted network — for example, a head-node HyperShell server reachable over the public internet by remote workers — the recommended configuration uses operator-managed certificates and a strong, operator-managed authentication key.

Step 1 — Generate a strong authentication key.

openssl rand -base64 48 > /etc/hypershell/auth.key
chmod 600 /etc/hypershell/auth.key
chown hypershell:hypershell /etc/hypershell/auth.key

Distribute this file securely to every client (e.g., via configuration management, a secrets store, or out-of-band channel). Never check it into source control.

Step 2 — Provision a real TLS certificate.

The cleanest path for a server with a stable hostname is Let’s Encrypt via certbot:

certbot certonly --standalone -d hypershell.example.com

This produces /etc/letsencrypt/live/hypershell.example.com/fullchain.pem (cert) and privkey.pem (key). Configure HyperShell to use them:

# /etc/hypershell.toml
[server]
bind     = "0.0.0.0"
auth_eval = "cat /etc/hypershell/auth.key"

[server.tls]
enabled     = true
cert        = "/etc/letsencrypt/live/hypershell.example.com/fullchain.pem"
key         = "/etc/letsencrypt/live/hypershell.example.com/privkey.pem"
min_version = "TLSv1.3"

_eval and _env suffixes

Any setting with an _eval suffix is resolved by shelling out to the named command and capturing stdout; _env reads the named environment variable. These are cmdkit features available on every configuration key. They let secrets stay outside the TOML file. For example, auth_eval = "vault read -field=value secret/hypershell" or auth_env = "HYPERSHELL_AUTH_VALUE".

For environments without internet-reachable HTTP for ACME validation, use an internal CA (e.g., HashiCorp Vault PKI, smallstep, FreeIPA). Configure the same cert/key paths to point at the issued material.

Step 3 — Configure the client to verify the server.

If clients can resolve and connect to a public hostname covered by the server’s cert, leave cafile unset and let the platform’s default CA store handle verification:

# client-side config
[server]
host      = "hypershell.example.com"
auth_eval = "cat /etc/hypershell/auth.key"

If you used an internal CA, distribute the CA’s root certificate to each client and pin it explicitly:

[server.tls]
enabled    = true
cafile     = "/etc/hypershell/internal-ca.crt"
servername = "hypershell.example.com"   # require SAN match
min_version = "TLSv1.3"

If you accept a self-signed certificate (acceptable for small operator-managed deployments), pin the fingerprint instead — the server logs it at INFO on first start:

[server.tls]
enabled     = true
fingerprint = "SHA256:AB:CD:EF:01:23:..."

Step 4 — Restrict network exposure.

Cryptography is not a substitute for network controls. Combine TLS with:

  • A host firewall (ufw, firewalld, iptables) restricting the server port to known client IP ranges

  • A perimeter firewall or VPN tunnel for cross-site connectivity

  • fail2ban or equivalent to throttle abusive connection attempts

  • OS-level rate limits (iptables -m connlimit) to bound the connection rate

Step 5 — Run with least privilege.

useradd --system --shell /usr/sbin/nologin hypershell
chown -R hypershell:hypershell /var/lib/hypershell /etc/hypershell
# systemd unit running as User=hypershell, Group=hypershell

Never run the HyperShell server as root. Tasks executed by clients run as whichever user the client process runs as; constrain that user to the minimum privileges required for the workload.


Kubernetes Deployment

Kubernetes is a natural fit for HyperShell: the server runs as a long-lived workload (StatefulSet if backed by SQLite on a PersistentVolume, Deployment if backed by PostgreSQL), and clients run as elastically-scaled Deployment or Job workloads.

Certificate management with cert-manager.

Use cert-manager to provision and rotate the server certificate. A Certificate resource against an internal Issuer (or ACME-backed ClusterIssuer) produces a Secret containing tls.crt, tls.key, and ca.crt:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: hypershell-server
  namespace: hypershell
spec:
  secretName: hypershell-server-tls
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer
  commonName: hypershell-server.hypershell.svc.cluster.local
  dnsNames:
    - hypershell-server
    - hypershell-server.hypershell
    - hypershell-server.hypershell.svc.cluster.local
  duration: 2160h    # 90 days
  renewBefore: 360h  # 15 days

Secret for the authentication key.

Store the shared auth key as a Secret rather than embedding it in a manifest:

kubectl create secret generic hypershell-auth \
    --namespace hypershell \
    --from-literal=auth="$(openssl rand -base64 48)"

Server StatefulSet.

Mount both Secrets into the server pod and expose the TLS configuration via environment variables:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hypershell-server
  namespace: hypershell
spec:
  serviceName: hypershell-server
  replicas: 1
  selector:
    matchLabels: { app: hypershell-server }
  template:
    metadata:
      labels: { app: hypershell-server }
    spec:
      containers:
        - name: server
          image: hypershell/hypershell:latest
          args: ["hs", "server", "--bind", "0.0.0.0"]
          env:
            - name: HYPERSHELL_SERVER_TLS_ENABLED
              value: "true"
            - name: HYPERSHELL_SERVER_TLS_CERT
              value: /tls/tls.crt
            - name: HYPERSHELL_SERVER_TLS_KEY
              value: /tls/tls.key
            - name: HYPERSHELL_SERVER_TLS_MIN_VERSION
              value: TLSv1.3
            - name: HYPERSHELL_SERVER_AUTH
              valueFrom:
                secretKeyRef: { name: hypershell-auth, key: auth }
          ports:
            - containerPort: 50001
              name: queue
          volumeMounts:
            - { name: tls, mountPath: /tls, readOnly: true }
            - { name: data, mountPath: /var/lib/hypershell }
      volumes:
        - name: tls
          secret: { secretName: hypershell-server-tls }
  volumeClaimTemplates:
    - metadata: { name: data }
      spec:
        accessModes: [ReadWriteOnce]
        resources: { requests: { storage: 10Gi } }

Service and NetworkPolicy.

Expose the server as a ClusterIP Service and lock down access with a NetworkPolicy:

apiVersion: v1
kind: Service
metadata:
  name: hypershell-server
  namespace: hypershell
spec:
  selector: { app: hypershell-server }
  ports:
    - port: 50001
      targetPort: 50001
      name: queue
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: hypershell-server-allow-clients
  namespace: hypershell
spec:
  podSelector:
    matchLabels: { app: hypershell-server }
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels: { app: hypershell-client }
      ports:
        - { protocol: TCP, port: 50001 }

Client Deployment.

Mount the same auth Secret and the CA from the cert-manager-issued Secret:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hypershell-client
  namespace: hypershell
spec:
  replicas: 8
  selector:
    matchLabels: { app: hypershell-client }
  template:
    metadata:
      labels: { app: hypershell-client }
    spec:
      containers:
        - name: client
          image: hypershell/hypershell:latest
          args:
            - "hs"
            - "client"
            - "--host"
            - "hypershell-server.hypershell.svc.cluster.local"
          env:
            - name: HYPERSHELL_SERVER_TLS_ENABLED
              value: "true"
            - name: HYPERSHELL_SERVER_TLS_CAFILE
              value: /tls/ca.crt
            - name: HYPERSHELL_SERVER_TLS_SERVERNAME
              value: hypershell-server.hypershell.svc.cluster.local
            - name: HYPERSHELL_SERVER_AUTH
              valueFrom:
                secretKeyRef: { name: hypershell-auth, key: auth }
          volumeMounts:
            - { name: tls, mountPath: /tls, readOnly: true }
      volumes:
        - name: tls
          secret: { secretName: hypershell-server-tls }

The client’s HYPERSHELL_SERVER_TLS_SERVERNAME matches the cert’s SAN, which makes the TLS layer reject any pod that happens to attach to the same Service but presents a different certificate.

Service mesh as an alternative

If your cluster already runs a service mesh (Istio, Linkerd) with mTLS for all in-cluster traffic, you can rely on the mesh for transport security and disable HyperShell’s own TLS layer (set HYPERSHELL_SERVER_TLS_ENABLED=false). The shared authentication key still gates queue access. This is appropriate when the mesh-provided identity model is authoritative and you want to avoid double encryption.


External Tunneling (Legacy)

For sites that pre-date built-in TLS or that already operate a sidecar tunneling layer for other reasons, HyperShell can run over an external tunnel with its own TLS disabled. This is documented for completeness; new deployments should prefer the built-in transport.

VPN. WireGuard, Tailscale, or OpenVPN provide a transparent encrypted tunnel between client and server hosts. HyperShell runs as if on a private network; no TLS configuration is required on the HyperShell side, though enabling it adds defense-in-depth at negligible cost.

stunnel. A stunnel sidecar can wrap a cleartext HyperShell connection in TLS:

# server-side stunnel.conf
[hypershell]
accept  = 0.0.0.0:50001
connect = 127.0.0.1:50000
cert    = /path/to/server-cert.pem
key     = /path/to/server-key.pem

# client-side stunnel.conf
[hypershell]
client  = yes
accept  = 127.0.0.1:50000
connect = server.example.com:50001
CAfile  = /path/to/ca-cert.pem
verify  = 2

With this layout the HyperShell server binds localhost:50000 and stunnel handles TLS on port 50001. Disable HyperShell’s built-in TLS (HYPERSHELL_SERVER_TLS_ENABLED=false) to avoid double-wrapping.


Operational Recommendations


Independent of the deployment style above:

  • Rotate keys and certificates regularly. A 90-day rotation cadence is a sensible default for both the auth key and the TLS certificate. cert-manager and Let’s Encrypt handle the certificate side automatically; the auth key requires an operator-driven rolling restart of server and clients.

  • Run as an unprivileged user. The HyperShell server itself does not require elevated privileges. Tasks run as the user that owns the client process; constrain that user accordingly.

  • Enable audit logging. Set logging.style = "detailed" (or "system") and ship logs to a central collector. The server logs every client connection, every task dispatched, and every completion. See logging for details. The per-invocation auth key is redacted from launch logs, but avoid shipping full configuration dumps (which contain a directly-set server.auth) to shared log infrastructure.

  • Sanitize task input. When generating tasks programmatically, treat the command string as a shell-injection vector — escape or template carefully, and never interpolate untrusted input into the command. hsx accepts arbitrary input from stdin by design; the trust boundary is whoever can write to that pipe.

  • Monitor connection patterns. HyperShell does not itself log failed authentication or failed TLS handshakes — they are handled by the underlying multiprocessing / TLS layer and not surfaced in the application log — so watch for probes and connection surges at the network layer (firewall counters, fail2ban, flow logs) rather than expecting them in HyperShell’s output.


References



Summary


HyperShell enables powerful parallel computing but requires careful attention to security because it executes arbitrary shell commands by design. The two essential controls HyperShell itself provides are the shared authentication key that gates the queue and the TLS layer that protects queue traffic on the wire. Both are enabled by default and self-provisioning on a single host; both are configurable end-to-end for multi-host deployments through TOML, environment variables, and CLI flags. Beyond the queue, the task database (especially PostgreSQL) and any SSH transports used for client distribution or output retrieval are adjacent trust surfaces that must be hardened independently.

Recommended practice:

  1. Local single-host workflows — accept the defaults. Auto-generated self-signed certificates and per-invocation auth keys are sufficient.

  2. Multi-host on trusted networks — accept the defaults if the site library directory is shared; otherwise pin the server fingerprint on each client.

  3. Multi-host across untrusted networks — provision a real certificate (Let’s Encrypt or an internal CA), distribute a strong shared auth key out of band, restrict the server port at the firewall, run as an unprivileged user, and require sslmode=verify-full for any PostgreSQL connection.

  4. Kubernetes — use cert-manager for certificates, Secret for the auth key, NetworkPolicy for connectivity, and consider an existing service mesh as an alternative transport-security layer.

  5. Database and SSH — treat the PostgreSQL instance and your SSH key/host-key policy as part of the HyperShell threat surface. A leaked database password or a permissive authorized_keys defeats every protection in the queue.

For questions or to report security vulnerabilities, please contact the maintainers via the GitHub repository or join the Discord community.